POST /v3/async/seedance-2.0
Seedance 2.0 Video Generation
curl --request POST \
  --url https://api.highwayapi.ai/v3/async/seedance-2.0 \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "fast": true,
  "seed": 123,
  "image": "<string>",
  "ratio": "<string>",
  "prompt": "<string>",
  "duration": 123,
  "watermark": true,
  "last_image": "<string>",
  "resolution": "<string>",
  "web_search": true,
  "generate_audio": true,
  "reference_audios": [
    {}
  ],
  "reference_images": [
    {}
  ],
  "reference_videos": [
    {}
  ],
  "return_last_frame": true
}
'
{
  "task_id": "<string>"
}
The Seedance 2.0 series models accept multimodal input: images, videos, audio, and text. They support video generation, video editing, and video extension, and can accurately reproduce object details, timbre, effects, style, camera movement, and more while keeping character features stable. Supported modes are text-to-video, image-to-video (first frame, or first and last frames), and multimodal-reference video generation (combinations of image, video, and audio). A standard version (seedance-2.0) and a fast version (seedance-2.0-fast) are available; the fast version costs less and generates faster.
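The modes listed above are selected by which input fields a request carries. The following is a minimal sketch of that mapping, not the service's actual logic: the function name `infer_mode` is hypothetical, and the precedence given to reference inputs when they are combined with `image` is an assumption.

```python
def infer_mode(payload: dict) -> str:
    """Guess the generation mode from the input fields present in a request body."""
    reference_fields = ("reference_images", "reference_videos", "reference_audios")
    # Assumption: any reference_* list puts the request in multimodal-reference mode.
    if any(payload.get(field) for field in reference_fields):
        return "multimodal-reference"
    has_first = bool(payload.get("image"))
    has_last = bool(payload.get("last_image"))
    if has_last and not has_first:
        # The parameter reference states that last_image without image is invalid.
        raise ValueError("last_image must be provided together with image")
    if has_first and has_last:
        return "image-to-video (first and last frames)"
    if has_first:
        return "image-to-video (first frame)"
    return "text-to-video"
```

For example, `infer_mode({"prompt": "a cat on a skateboard"})` returns `"text-to-video"`.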

Minimum Charge Notice

  • Applicable SKU: Multimodal-reference video generation (with video input, i.e., the MULTI_REF_VID series)
  • Billing rule: Actual charge = max(unit price per second × total video seconds, minimum charge)
  • Trigger scenario: When the user inputs a very short video (such as 1–2 seconds) and the output is also short, the amount calculated by seconds may be lower than the cost corresponding to the supplier’s minimum token consumption. In this case, the minimum charge is applied.
  • Example scenario: A customer wants to generate a 4-second product promotional video. They upload a 2-second product promotional video and want to modify the background and colors, with no other input elements. The video is relatively simple, and the unit price per second × video seconds calculates to only $0.19 for this request. However, because a video asset was uploaded, the minimum charge is triggered, and the 4-second minimum charge ($0.30) is applied directly.

Minimum Charge Table

Output Seconds  2.0-480P  2.0-720P  2.0-1080P  fast-480P  fast-720P
4               $0.30     $0.65     $1.64      $0.23      $0.50
5               $0.39     $0.84     $2.06      $0.30      $0.64
6               $0.43     $0.93     $2.47      $0.33      $0.71
7               $0.52     $1.11     $2.88      $0.40      $0.85
8               $0.61     $1.30     $3.29      $0.46      $1.00
9               $0.65     $1.39     $3.70      $0.50      $1.07
10              $0.73     $1.58     $4.11      $0.56      $1.21
11              $0.82     $1.76     $4.52      $0.63      $1.35
12              $0.86     $1.86     $4.93      $0.66      $1.43
13              $0.95     $2.04     $5.35      $0.73      $1.57
14              $1.04     $2.23     $5.76      $0.79      $1.71
15              $1.08     $2.32     $6.17      $0.83      $1.78

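
The billing rule and the table above can be sketched as a small lookup. This is an illustration only: the names `MIN_CHARGE`, `minimum_charge`, and `actual_charge` are hypothetical, the per-second amount is passed in already multiplied because the unit prices are not listed here, and the sketch assumes the minimum is keyed by output seconds, as in the example scenario.

```python
from decimal import Decimal

TIERS = ("2.0-480P", "2.0-720P", "2.0-1080P", "fast-480P", "fast-720P")

# Minimum charge in USD per output second count, columns in TIERS order.
MIN_CHARGE = {
    4: ("0.30", "0.65", "1.64", "0.23", "0.50"),
    5: ("0.39", "0.84", "2.06", "0.30", "0.64"),
    6: ("0.43", "0.93", "2.47", "0.33", "0.71"),
    7: ("0.52", "1.11", "2.88", "0.40", "0.85"),
    8: ("0.61", "1.30", "3.29", "0.46", "1.00"),
    9: ("0.65", "1.39", "3.70", "0.50", "1.07"),
    10: ("0.73", "1.58", "4.11", "0.56", "1.21"),
    11: ("0.82", "1.76", "4.52", "0.63", "1.35"),
    12: ("0.86", "1.86", "4.93", "0.66", "1.43"),
    13: ("0.95", "2.04", "5.35", "0.73", "1.57"),
    14: ("1.04", "2.23", "5.76", "0.79", "1.71"),
    15: ("1.08", "2.32", "6.17", "0.83", "1.78"),
}


def minimum_charge(tier: str, output_seconds: int) -> Decimal:
    """Look up the minimum charge for a tier and an output duration."""
    return Decimal(MIN_CHARGE[output_seconds][TIERS.index(tier)])


def actual_charge(per_second_total: Decimal, tier: str, output_seconds: int) -> Decimal:
    """Apply: actual charge = max(unit price x total seconds, minimum charge)."""
    return max(per_second_total, minimum_charge(tier, output_seconds))
```

With the figures from the example scenario, `actual_charge(Decimal("0.19"), "2.0-480P", 4)` returns `Decimal("0.30")`.
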
This is an asynchronous API and only returns the task_id of the asynchronous task. You should use this task_id to request the Get Task Result API to retrieve the generated result.

Request Headers

Content-Type
string
required
Enum value: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API Key}}.
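
As a rough sketch of how these two headers fit into a request, the snippet below builds (but does not send) the POST using only the Python standard library. The helper name `build_request` and the placeholder key are hypothetical; the URL is the one from the curl example.

```python
import json
import urllib.request

ENDPOINT = "https://api.highwayapi.ai/v3/async/seedance-2.0"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request with the two required headers and a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access.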

Request Body

fast
boolean
default:false
Whether to use the fast version of the model (seedance-2.0-fast). The fast version costs less and generates faster.
seed
integer
Random seed that controls the randomness of the generated content. Value range: [-1, 2^32-1], where -1 means random. (The field constraint is listed as [-1, +∞].)
image
string
First-frame image URL or Base64 encoding. Used for image-to-video first-frame mode. Supported formats: jpeg/png/webp/bmp/tiff/gif. Aspect ratio range: (0.4, 2.5). Width and height pixel range: (300, 6000). Each image must not exceed 30MB.
ratio
string
default:"adaptive"
Aspect ratio of the generated video. adaptive means the most suitable aspect ratio is automatically selected based on the input. Allowed values: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, adaptive
prompt
string
Text prompt describing the desired video to generate. Chinese and English are supported. It is recommended that Chinese prompts do not exceed 500 characters and English prompts do not exceed 1000 words. Required in text-to-video mode; optional in other modes.
duration
integer
default:5
Duration of the generated video, in seconds. Value range: [4, 15]
watermark
boolean
default:false
Whether the generated video contains a watermark.
last_image
string
Last-frame image URL or Base64 encoding. Must be provided together with the image field to implement image-to-video first-and-last-frame mode. Passing only last_image without image is invalid. If the aspect ratios of the first-frame and last-frame images are inconsistent, the first frame takes precedence, and the last frame is automatically cropped to fit.
resolution
string
default:"720p"
Video resolution. 1080p is only supported by the standard version (fast=false). Allowed values: 480p, 720p, 1080p
web_search
boolean
Whether to enable web search. When enabled, the model decides on its own, based on the prompt, whether to search internet content. This can improve timeliness but increases latency.
generate_audio
boolean
default:true
Whether to generate audio synchronized with the visuals. When true, the model automatically generates matching vocals, sound effects, and background music based on the text and visual content.
reference_audios
array
List of reference audio files, used for multimodal-reference video generation mode. Each item is an audio URL or Base64 encoding. Formats: wav/mp3. Each audio duration: [2,15]s. The total duration of all audio must not exceed 15s, and each file must not exceed 15MB. Audio cannot be input alone; at least 1 reference image or video must be included. Array length: 1 - 3
reference_images
array
List of reference images, used for multimodal-reference video generation mode. Each item is an image URL or Base64 encoding. Up to 9 images are supported. You can specify how to combine images through the prompt. The recommended format is “[Image 1]xxx, [Image 2]xxx”. Array length: 1 - 9
reference_videos
array
List of reference videos, used for multimodal-reference video generation mode. Each item is a video URL. Formats: mp4/mov. Resolution: 480p/720p. Each video duration: [2,15]s. The total duration of all videos must not exceed 15s, and each file must not exceed 50MB. Array length: 1 - 3
return_last_frame
boolean
default:false
Whether to return the last-frame image of the generated video (png format, no watermark). This can be used for continuous video generation: use the last frame as the first frame of the next video segment.
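
Several of the constraints above can be checked on the client before a request is submitted. The following is a minimal sketch under stated assumptions: `validate_payload` is a hypothetical helper, the defaults mirror the ones documented above, and media-level limits (file size, duration, pixel dimensions) are not checked because they require inspecting the files themselves.

```python
ALLOWED_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "adaptive"}
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}
MAX_ITEMS = {"reference_images": 9, "reference_videos": 3, "reference_audios": 3}


def validate_payload(payload: dict) -> list[str]:
    """Return human-readable problems found in a request body (empty if none)."""
    errors = []

    if not 4 <= payload.get("duration", 5) <= 15:
        errors.append("duration must be in [4, 15]")
    if payload.get("ratio", "adaptive") not in ALLOWED_RATIOS:
        errors.append("ratio is not an allowed value")

    resolution = payload.get("resolution", "720p")
    if resolution not in ALLOWED_RESOLUTIONS:
        errors.append("resolution is not an allowed value")
    elif resolution == "1080p" and payload.get("fast", False):
        errors.append("1080p requires the standard version (fast=false)")

    seed = payload.get("seed")
    if seed is not None and seed < -1:
        errors.append("seed must be >= -1")

    if payload.get("last_image") and not payload.get("image"):
        errors.append("last_image must be provided together with image")

    for field, limit in MAX_ITEMS.items():
        items = payload.get(field)
        if items is not None and not 1 <= len(items) <= limit:
            errors.append(f"{field} must contain 1 to {limit} items")

    has_visual_refs = bool(payload.get("reference_images") or payload.get("reference_videos"))
    if payload.get("reference_audios") and not has_visual_refs:
        errors.append("reference_audios needs at least 1 reference image or video")

    # Text-to-video (no image and no reference inputs) requires a prompt.
    has_inputs = bool(payload.get("image") or payload.get("reference_audios") or has_visual_refs)
    if not has_inputs and not payload.get("prompt"):
        errors.append("prompt is required in text-to-video mode")

    return errors
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.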

Response Information

task_id
string
required
Use task_id to request the Get Task Result API to retrieve the generated output.
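
A minimal sketch of reading `task_id` out of the response body follows. The helper name `extract_task_id` is hypothetical, and the sketch stops at extraction because the Get Task Result API is not described in this section.

```python
import json


def extract_task_id(response_body: str) -> str:
    """Parse the JSON response body and return its task_id field."""
    data = json.loads(response_body)
    task_id = data.get("task_id") if isinstance(data, dict) else None
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response does not contain a task_id string")
    return task_id
```

For example, `extract_task_id('{"task_id": "abc123"}')` returns `"abc123"`.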