Seedance 2.0 Video Generation

curl --request POST \
  --url https://api.highwayapi.ai/v3/async/seedance-2.0 \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "fast": true,
  "seed": 123,
  "image": "<string>",
  "ratio": "<string>",
  "prompt": "<string>",
  "duration": 123,
  "watermark": true,
  "last_image": "<string>",
  "resolution": "<string>",
  "web_search": true,
  "generate_audio": true,
  "reference_audios": [
    {}
  ],
  "reference_images": [
    {}
  ],
  "reference_videos": [
    {}
  ],
  "return_last_frame": true
}
'

{
  "task_id": "<string>"
}


The Seedance 2.0 series models accept multimodal input: images, videos, audio, and text. They support video generation, video editing, and video extension, and can accurately reproduce object details, timbre, effects, style, camera movement, and more while keeping character features stable. Supported modes are text-to-video, image-to-video (first frame, or first and last frames), and multimodal-reference video generation (combinations of image, video, and audio). A standard version (seedance-2.0) and a fast version (seedance-2.0-fast) are available; the fast version costs less and generates faster.

Minimum Charge Notice

Applicable SKU: Multimodal-reference video generation (with video input, i.e., the MULTI_REF_VID series)
Billing rule: Actual charge = max(unit price per second × total video seconds, minimum charge)
Trigger scenario: When the user inputs a very short video (such as 1–2 seconds) and the output is also short, the amount calculated by seconds may be lower than the cost corresponding to the supplier’s minimum token consumption. In this case, the minimum charge is applied.
Example scenario: A customer wants to generate a 4-second product promotional video. They upload a 2-second product video and ask only for the background and colors to be changed, with no other input elements. The request is simple, so unit price per second × video seconds comes to only $0.19. However, because a video asset was uploaded, the minimum charge applies, and the 4-second minimum charge ($0.30) is billed instead.

Minimum Charge Table

Output Seconds	2.0-480P	2.0-720P	2.0-1080P	fast-480P	fast-720P
4	$0.30	$0.65	$1.64	$0.23	$0.50
5	$0.39	$0.84	$2.06	$0.30	$0.64
6	$0.43	$0.93	$2.47	$0.33	$0.71
7	$0.52	$1.11	$2.88	$0.40	$0.85
8	$0.61	$1.30	$3.29	$0.46	$1.00
9	$0.65	$1.39	$3.70	$0.50	$1.07
10	$0.73	$1.58	$4.11	$0.56	$1.21
11	$0.82	$1.76	$4.52	$0.63	$1.35
12	$0.86	$1.86	$4.93	$0.66	$1.43
13	$0.95	$2.04	$5.35	$0.73	$1.57
14	$1.04	$2.23	$5.76	$0.79	$1.71
15	$1.08	$2.32	$6.17	$0.83	$1.78
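The billing rule can be sketched as a lookup against the table above. This is a minimal illustration, not the provider's billing code: unit prices per second are not listed here, so the sketch takes the already-computed metered amount (unit price per second × total video seconds) as input. The names `actual_charge`, `TIERS`, and `MIN_CHARGE` are hypothetical; the dollar amounts are copied from the table.

```python
from decimal import Decimal

# Column order matches the Minimum Charge Table.
TIERS = ("2.0-480P", "2.0-720P", "2.0-1080P", "fast-480P", "fast-720P")

# Output seconds -> minimum charge in USD per tier, copied from the table.
MIN_CHARGE = {
    4: ("0.30", "0.65", "1.64", "0.23", "0.50"),
    5: ("0.39", "0.84", "2.06", "0.30", "0.64"),
    6: ("0.43", "0.93", "2.47", "0.33", "0.71"),
    7: ("0.52", "1.11", "2.88", "0.40", "0.85"),
    8: ("0.61", "1.30", "3.29", "0.46", "1.00"),
    9: ("0.65", "1.39", "3.70", "0.50", "1.07"),
    10: ("0.73", "1.58", "4.11", "0.56", "1.21"),
    11: ("0.82", "1.76", "4.52", "0.63", "1.35"),
    12: ("0.86", "1.86", "4.93", "0.66", "1.43"),
    13: ("0.95", "2.04", "5.35", "0.73", "1.57"),
    14: ("1.04", "2.23", "5.76", "0.79", "1.71"),
    15: ("1.08", "2.32", "6.17", "0.83", "1.78"),
}


def actual_charge(metered_amount: str, output_seconds: int, tier: str) -> Decimal:
    """Return max(metered amount, minimum charge) for a request with video input."""
    minimum = Decimal(MIN_CHARGE[output_seconds][TIERS.index(tier)])
    return max(Decimal(metered_amount), minimum)


# The example scenario: $0.19 metered, 4-second output at 2.0-480P.
print(actual_charge("0.19", 4, "2.0-480P"))  # 0.30
```

Decimal is used instead of float so that the comparison and the printed amount stay exact to the cent.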

This is an asynchronous API: the response contains only the task_id of the asynchronous task. Pass this task_id to the Get Task Result API to retrieve the generated result.

Request Headers

Content-Type

string

required

Enum value: application/json

Authorization

string

required

Bearer authentication format: Bearer {{API Key}}.

Request Body

fast

boolean

default:false

Whether to use the fast version of the model (seedance-2.0-fast). The fast version costs less and generates faster.

seed

integer

Random seed, used to control the randomness of the generated content. Value range: [-1, 2^32-1], where -1 means random.

image

string

First-frame image URL or Base64 encoding. Used for image-to-video first-frame mode. Supported formats: jpeg/png/webp/bmp/tiff/gif. Aspect ratio range: (0.4, 2.5). Width and height pixel range: (300, 6000). Each image must not exceed 30MB.
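The image limits above can be checked locally before uploading. The sketch below is illustrative only and rests on assumptions the text does not settle: the parentheses are read as open intervals, the aspect ratio is taken as width / height, and "30MB" is read as 30 × 1024 × 1024 bytes. `image_problems` is a hypothetical helper, not part of the API.

```python
MAX_IMAGE_BYTES = 30 * 1024 * 1024  # assumption: "30MB" read as 30 MiB


def image_problems(width: int, height: int, size_bytes: int) -> list[str]:
    """Return a list of violated limits; an empty list means the image looks acceptable."""
    problems = []
    # Width and height must each fall inside (300, 6000) pixels.
    for name, px in (("width", width), ("height", height)):
        if not 300 < px < 6000:
            problems.append(f"{name} {px}px is outside (300, 6000)")
    # Aspect ratio (assumed width / height) must fall inside (0.4, 2.5).
    if height > 0 and not 0.4 < width / height < 2.5:
        problems.append(f"aspect ratio {width / height:.2f} is outside (0.4, 2.5)")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file size {size_bytes} bytes exceeds 30MB")
    return problems


print(image_problems(1920, 1080, 2_000_000))  # []
```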

ratio

string

default:"adaptive"

Aspect ratio of the generated video. adaptive means the most suitable aspect ratio is automatically selected based on the input. Allowed values: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, adaptive

prompt

string

Text prompt describing the desired video to generate. Chinese and English are supported. It is recommended that Chinese prompts do not exceed 500 characters and English prompts do not exceed 1000 words. Required in text-to-video mode; optional in other modes.

duration

integer

default:5

Duration of the generated video (seconds). Value range: [4, 15]

watermark

boolean

default:false

Whether the generated video contains a watermark.

last_image

string

Last-frame image URL or Base64 encoding. Must be provided together with the image field to implement image-to-video first-and-last-frame mode. Passing only last_image without image is invalid. If the aspect ratios of the first-frame and last-frame images are inconsistent, the first frame takes precedence, and the last frame is automatically cropped to fit.
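The relationship between prompt, image, and last_image can be sketched as a small mode check that runs before a request is sent. This is an assumption-laden illustration: `infer_mode` is a hypothetical helper, and the text does not say whether frame images may be mixed with reference lists, so the sketch simply reports multimodal-reference whenever any reference list is non-empty.

```python
REFERENCE_FIELDS = ("reference_images", "reference_videos", "reference_audios")


def infer_mode(body: dict) -> str:
    """Guess the generation mode from a request body, rejecting invalid combinations."""
    has_first = bool(body.get("image"))
    has_last = bool(body.get("last_image"))
    # Passing only last_image without image is invalid.
    if has_last and not has_first:
        raise ValueError("last_image must be provided together with image")
    # Assumption: any non-empty reference list selects multimodal-reference mode.
    if any(body.get(field) for field in REFERENCE_FIELDS):
        return "multimodal-reference"
    if has_first:
        return "first-and-last-frame" if has_last else "first-frame"
    # With no image input, the request is text-to-video and needs a prompt.
    if not body.get("prompt"):
        raise ValueError("prompt is required in text-to-video mode")
    return "text-to-video"


print(infer_mode({"prompt": "a red kite over a beach"}))  # text-to-video
```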

resolution

string

default:"720p"

Video resolution. 1080p is only supported by the standard version (fast=false). Allowed values: 480p, 720p, 1080p
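The allowed values for ratio, duration, and resolution, plus the rule that 1080p requires fast=false, can be checked client-side. The following is a minimal sketch using the defaults listed in this section; `option_problems` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
ALLOWED_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "adaptive"}
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}


def option_problems(body: dict) -> list[str]:
    """Return a list of option violations; missing fields fall back to the documented defaults."""
    problems = []
    ratio = body.get("ratio", "adaptive")
    duration = body.get("duration", 5)
    resolution = body.get("resolution", "720p")
    fast = body.get("fast", False)

    if ratio not in ALLOWED_RATIOS:
        problems.append(f"ratio {ratio!r} is not an allowed value")
    if not isinstance(duration, int) or not 4 <= duration <= 15:
        problems.append(f"duration {duration!r} is outside [4, 15]")
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"resolution {resolution!r} is not an allowed value")
    elif resolution == "1080p" and fast:
        problems.append("1080p is only supported when fast is false")
    return problems


print(option_problems({"fast": True, "resolution": "1080p"}))
# ['1080p is only supported when fast is false']
```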

web_search

boolean

default:false

Whether to enable web search. When enabled, the model independently determines whether to search internet content based on the prompt, which can improve timeliness but increases latency.

generate_audio

boolean

default:true

Whether to generate audio synchronized with the visuals. When true, the model automatically generates matching vocals, sound effects, and background music based on the text and visual content.

reference_audios

array

List of reference audio files, used for multimodal-reference video generation mode. Each item is an audio URL or Base64 encoding. Formats: wav/mp3. Each audio duration: [2,15]s. The total duration of all audio must not exceed 15s, and each file must not exceed 15MB. Audio cannot be input alone; at least 1 reference image or video must be included. Array length: 1 - 3

reference_images

array

List of reference images, used for multimodal-reference video generation mode. Each item is an image URL or Base64 encoding. Up to 9 images are supported. You can specify how to combine images through the prompt. The recommended format is “[Image 1]xxx, [Image 2]xxx”. Array length: 1 - 9
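The recommended prompt format can be produced mechanically from one description per reference image. This is a small sketch, assuming that labels are numbered from 1 in the same order as the reference_images array; `labelled_prompt` is a hypothetical helper.

```python
def labelled_prompt(descriptions: list[str]) -> str:
    """Join per-image descriptions as "[Image 1]xxx, [Image 2]xxx"."""
    if not 1 <= len(descriptions) <= 9:
        raise ValueError("reference_images supports 1 to 9 images")
    return ", ".join(
        f"[Image {index}]{text}" for index, text in enumerate(descriptions, start=1)
    )


print(labelled_prompt(["the model", "the backdrop"]))
# [Image 1]the model, [Image 2]the backdrop
```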

reference_videos

array

List of reference videos, used for multimodal-reference video generation mode. Each item is a video URL. Formats: mp4/mov. Resolution: 480p/720p. Each video duration: [2,15]s. The total duration of all videos must not exceed 15s, and each file must not exceed 50MB. Array length: 1 - 3
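The array-length limits, the rule that audio cannot be sent alone, and the duration limits for reference audio and video can be sketched as two local checks. Both helpers are hypothetical. `duration_problems` expects clip durations in seconds that the caller has already measured, since the request itself carries only URLs or Base64 data.

```python
MAX_ITEMS = {"reference_images": 9, "reference_videos": 3, "reference_audios": 3}


def reference_problems(body: dict) -> list[str]:
    """Check array lengths and the audio-needs-image-or-video rule."""
    problems = []
    for field, max_len in MAX_ITEMS.items():
        items = body.get(field)
        if items is not None and not 1 <= len(items) <= max_len:
            problems.append(f"{field} must have 1 to {max_len} items, got {len(items)}")
    has_visual = body.get("reference_images") or body.get("reference_videos")
    if body.get("reference_audios") and not has_visual:
        problems.append("reference_audios needs at least one reference image or video")
    return problems


def duration_problems(field: str, durations_s: list[float]) -> list[str]:
    """Check per-clip [2, 15]s and the 15s total for audio or video references."""
    problems = [
        f"{field}[{i}] lasts {d}s, outside [2, 15]"
        for i, d in enumerate(durations_s)
        if not 2 <= d <= 15
    ]
    if sum(durations_s) > 15:
        problems.append(f"{field} total {sum(durations_s)}s exceeds 15s")
    return problems


print(reference_problems({"reference_audios": ["https://example.com/voice.mp3"]}))
# ['reference_audios needs at least one reference image or video']
```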

return_last_frame

boolean

default:false

Whether to return the last-frame image of the generated video (png format, no watermark). This can be used for continuous video generation: use the last frame as the first frame of the next video segment.
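Continuous generation can be sketched as deriving the next request body from the previous one. How the last frame is delivered is defined by the Get Task Result API, which is not described here, so the sketch takes it as a plain argument (a URL or Base64 string, as the image field accepts). `next_segment_body` is a hypothetical helper, and carrying over the other fields unchanged is an assumption about what a caller would want.

```python
def next_segment_body(previous_body: dict, last_frame: str, prompt: str) -> dict:
    """Build the body for the next segment, starting from the previous last frame."""
    # Copy everything except last_image, which belonged to the previous segment.
    body = {key: value for key, value in previous_body.items() if key != "last_image"}
    # The previous last frame becomes the first frame of the next segment.
    body.update(image=last_frame, prompt=prompt, return_last_frame=True)
    return body


first = {"prompt": "a boat leaves the harbor", "duration": 5, "return_last_frame": True}
second = next_segment_body(first, "https://example.com/last_frame.png", "the boat reaches open sea")
print(second["image"])  # https://example.com/last_frame.png
```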

Response Information

task_id

string

required

Use task_id to request the Get Task Result API to retrieve the generated output.
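Putting the headers, body, and response together, a submission can be sketched in Python with only the standard library. This is a minimal illustration rather than an official client: `build_submit_request` and `submit` are hypothetical names, error handling is omitted, and polling is left out because the Get Task Result API endpoint is not given in this section.

```python
import json
import urllib.request

API_URL = "https://api.highwayapi.ai/v3/async/seedance-2.0"


def build_submit_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the POST request; nothing is sent until urlopen is called."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit(api_key: str, body: dict) -> str:
    """Send the request and return the task_id from the JSON response."""
    with urllib.request.urlopen(build_submit_request(api_key, body)) as response:
        return json.load(response)["task_id"]


# Example (performs a real network call, so it is left commented out):
# task_id = submit("YOUR_API_KEY", {"prompt": "a paper plane over a city", "duration": 5})
```

Separating request construction from sending keeps the header and body logic easy to inspect without network access.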
