Seedance 2.0 Video Generation
POST
The Seedance 2.0 series models accept multimodal inputs: images, videos, audio, and text. They support video generation, video editing, and video extension, and can accurately reproduce object details, timbre, effects, style, camera movement, and more while keeping character features stable. Supported modes include text-to-video, image-to-video (first frame / first and last frames), and multimodal-reference video generation (image + video + audio combinations). A standard version (seedance-2.0) and a fast version (seedance-2.0-fast) are available; the fast version costs less and generates faster.
Minimum Charge Notice
- Applicable SKU: Multimodal-reference video generation (with video input, i.e., the MULTI_REF_VID series)
- Billing rule: Actual charge = max(unit price per second × total video seconds, minimum charge)
- Trigger scenario: When the user inputs a very short video (such as 1–2 seconds) and the output is also short, the amount calculated by seconds may be lower than the cost corresponding to the supplier’s minimum token consumption. In this case, the minimum charge is applied.
- Example scenario: A customer wants to generate a 4-second product promotional video. They upload a 2-second product promotional video and want to modify the background and colors, with no other input elements. Because the video is relatively simple, unit price per second × video seconds comes to less than the minimum charge, so the minimum charge ($0.30) is applied directly.
Minimum Charge Table
| Output Seconds | 2.0-480P | 2.0-720P | 2.0-1080P | fast-480P | fast-720P |
|---|---|---|---|---|---|
| 4 | $0.30 | $0.65 | $1.64 | $0.23 | $0.50 |
| 5 | $0.39 | $0.84 | $2.06 | $0.30 | $0.64 |
| 6 | $0.43 | $0.93 | $2.47 | $0.33 | $0.71 |
| 7 | $0.52 | $1.11 | $2.88 | $0.40 | $0.85 |
| 8 | $0.61 | $1.30 | $3.29 | $0.46 | $1.00 |
| 9 | $0.65 | $1.39 | $3.70 | $0.50 | $1.07 |
| 10 | $0.73 | $1.58 | $4.11 | $0.56 | $1.21 |
| 11 | $0.82 | $1.76 | $4.52 | $0.63 | $1.35 |
| 12 | $0.86 | $1.86 | $4.93 | $0.66 | $1.43 |
| 13 | $0.95 | $2.04 | $5.35 | $0.73 | $1.57 |
| 14 | $1.04 | $2.23 | $5.76 | $0.79 | $1.71 |
| 15 | $1.08 | $2.32 | $6.17 | $0.83 | $1.78 |
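The billing rule and the table above can be combined into a small calculation. The following is a minimal sketch, not the provider's billing code: the function name, the `tier` labels (taken from the table's column headers), and the unit price in the usage note are illustrative assumptions, and only the 4-second and 5-second rows are copied in.

```python
from decimal import Decimal

# Minimum charge in USD, copied from the 4 s and 5 s rows of the table above.
# Keys are (output seconds, table column label).
MIN_CHARGE = {
    (4, "2.0-480P"): Decimal("0.30"),
    (4, "2.0-720P"): Decimal("0.65"),
    (4, "2.0-1080P"): Decimal("1.64"),
    (4, "fast-480P"): Decimal("0.23"),
    (4, "fast-720P"): Decimal("0.50"),
    (5, "2.0-480P"): Decimal("0.39"),
    (5, "2.0-720P"): Decimal("0.84"),
    (5, "2.0-1080P"): Decimal("2.06"),
    (5, "fast-480P"): Decimal("0.30"),
    (5, "fast-720P"): Decimal("0.64"),
}


def actual_charge(
    unit_price_per_second: Decimal,
    total_video_seconds: int,
    output_seconds: int,
    tier: str,
) -> Decimal:
    """Actual charge = max(unit price per second x total video seconds, minimum charge)."""
    metered = unit_price_per_second * total_video_seconds
    return max(metered, MIN_CHARGE[(output_seconds, tier)])
```

With a hypothetical unit price of $0.05 per second and 4 total video seconds, the metered amount is $0.20, so `actual_charge(Decimal("0.05"), 4, 4, "2.0-480P")` falls back to the $0.30 minimum.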
Request Headers
Content-Type. Enum value: application/json
Authorization. Bearer authentication format: Bearer {{API Key}}.
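As a minimal sketch, the two header values described above could be assembled like this. The header names `Content-Type` and `Authorization` are the standard HTTP headers for these values and are an assumption here, as is the helper name `build_headers`.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Build the request headers: JSON content type plus Bearer authentication."""
    return {
        # Assumed standard header carrying the application/json enum value.
        "Content-Type": "application/json",
        # Bearer authentication format: "Bearer {{API Key}}".
        "Authorization": f"Bearer {api_key}",
    }
```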
Request Body
Whether to use the fast version of the model (seedance-2.0-fast). The fast version costs less and generates faster.
Random seed, used to control the randomness of the generated content. Value range: [-1, 2^32-1], where -1 means random.
First-frame image URL or Base64 encoding. Used for image-to-video first-frame mode. Supported formats: jpeg/png/webp/bmp/tiff/gif. Aspect ratio range: (0.4, 2.5). Width and height pixel range: (300, 6000). Each image must not exceed 30MB.
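The image limits above can be checked on the client before uploading. This is a sketch under stated assumptions: the helper name is hypothetical, the aspect ratio is taken to be width divided by height, the parenthesized ranges are read as exclusive bounds, and 30MB is read as 30 × 1024 × 1024 bytes.

```python
# Assumption: "30MB" means 30 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 30 * 1024 * 1024


def check_image(width: int, height: int, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the image passes these checks."""
    problems = []
    # Width and height pixel range: (300, 6000), read as exclusive bounds.
    if not (300 < width < 6000 and 300 < height < 6000):
        problems.append("width and height must each be within (300, 6000) pixels")
    # Aspect ratio range: (0.4, 2.5), assumed to be width / height.
    if height > 0 and not 0.4 < width / height < 2.5:
        problems.append("aspect ratio must be within (0.4, 2.5)")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("image must not exceed 30MB")
    return problems
```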
Aspect ratio of the generated video. adaptive means the most suitable aspect ratio is automatically selected based on the input.
Allowed values: 16:9, 4:3, 1:1, 3:4, 9:16, 21:9, adaptive
Text prompt describing the desired video to generate. Chinese and English are supported. It is recommended that Chinese prompts do not exceed 500 characters and English prompts do not exceed 1000 words. Required in text-to-video mode; optional in other modes.
Duration of the generated video (seconds). Value range: [4, 15]
Whether the generated video contains a watermark.
Last-frame image URL or Base64 encoding. Must be provided together with the image field to implement image-to-video first-and-last-frame mode. Passing only last_image without image is invalid. If the aspect ratios of the first-frame and last-frame images are inconsistent, the first frame takes precedence, and the last frame is automatically cropped to fit.
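The pairing rule between `image` and `last_image` can be expressed as a small check on the request body. A minimal sketch: the function name and the returned mode labels are illustrative, while the two field names come from the description above.

```python
def frame_mode(body: dict) -> str:
    """Classify the image-to-video mode implied by a request body."""
    has_first = bool(body.get("image"))
    has_last = bool(body.get("last_image"))
    # Passing only last_image without image is invalid.
    if has_last and not has_first:
        raise ValueError("last_image must be provided together with image")
    if has_first and has_last:
        return "first-and-last-frame"
    if has_first:
        return "first-frame"
    return "no-frame"
```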
Video resolution. 1080p is only supported by the standard version (fast=false).
Allowed values: 480p, 720p, 1080p
Whether to enable web search. When enabled, the model independently determines whether to search internet content based on the prompt, which can improve timeliness but increases latency.
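The allowed values and ranges given above for aspect ratio, duration, and resolution, including the rule that 1080p requires the standard version, can be validated together. This is a sketch only: the function and argument names are hypothetical and are not taken from the request schema, except for `fast`, which the resolution description mentions.

```python
ALLOWED_RATIOS = {"16:9", "4:3", "1:1", "3:4", "9:16", "21:9", "adaptive"}
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}


def check_output_options(ratio: str, duration: int, resolution: str, fast: bool) -> list[str]:
    """Return a list of problems; an empty list means the options are consistent."""
    problems = []
    if ratio not in ALLOWED_RATIOS:
        problems.append(f"unsupported aspect ratio: {ratio}")
    # Duration value range: [4, 15] seconds, inclusive.
    if not 4 <= duration <= 15:
        problems.append("duration must be within [4, 15] seconds")
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution: {resolution}")
    elif resolution == "1080p" and fast:
        # 1080p is only supported by the standard version (fast=false).
        problems.append("1080p requires the standard version (fast=false)")
    return problems
```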
Whether to generate audio synchronized with the visuals. When true, the model automatically generates matching vocals, sound effects, and background music based on the text and visual content.
List of reference audio files, used for multimodal-reference video generation mode. Each item is an audio URL or Base64 encoding. Formats: wav/mp3. Each audio duration: [2,15]s. The total duration of all audio must not exceed 15s, and each file must not exceed 15MB. Audio cannot be input alone; at least 1 reference image or video must be included.
Array length: 1 - 3
List of reference images, used for multimodal-reference video generation mode. Each item is an image URL or Base64 encoding. Up to 9 images are supported. You can specify how to combine images through the prompt. The recommended format is “[Image 1]xxx, [Image 2]xxx”.
Array length: 1 - 9
List of reference videos, used for multimodal-reference video generation mode. Each item is a video URL. Formats: mp4/mov. Resolution: 480p/720p. Each video duration: [2,15]s. The total duration of all videos must not exceed 15s, and each file must not exceed 50MB.
Array length: 1 - 3
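The count and duration limits for reference images, audio, and videos can be checked before a request is sent. A minimal sketch, assuming the caller already knows each file's duration in seconds; the function and argument names are hypothetical, and file size and format checks are left out.

```python
def check_references(
    image_count: int,
    audio_durations: list[float],
    video_durations: list[float],
) -> list[str]:
    """Return a list of problems; an empty list means the reference set passes."""
    problems = []
    if image_count > 9:
        problems.append("at most 9 reference images are supported")
    for label, durations in (("audio", audio_durations), ("video", video_durations)):
        if len(durations) > 3:
            problems.append(f"at most 3 reference {label} files are supported")
        # Each file must last between 2 and 15 seconds, inclusive.
        if any(not 2 <= d <= 15 for d in durations):
            problems.append(f"each reference {label} must last within [2, 15] seconds")
        if sum(durations) > 15:
            problems.append(f"total reference {label} duration must not exceed 15s")
    # Audio cannot be input alone.
    if audio_durations and image_count == 0 and not video_durations:
        problems.append("audio requires at least 1 reference image or video")
    return problems
```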
Whether to return the last-frame image of the generated video (png format, no watermark). This can be used for continuous video generation: use the last frame as the first frame of the next video segment.
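The continuous-generation pattern described above, where each segment's last frame becomes the next segment's first frame, can be sketched as a loop. The actual request and response field names are not shown here, so `generate_segment` is a hypothetical caller-supplied function that submits one generation with an optional first frame and returns that segment's last-frame image.

```python
from typing import Callable, Optional


def generate_chain(
    prompts: list[str],
    generate_segment: Callable[[str, Optional[str]], str],
) -> list[str]:
    """Generate segments in order, feeding each last frame into the next segment."""
    last_frames = []
    first_frame: Optional[str] = None  # the first segment has no inherited frame
    for prompt in prompts:
        last_frame = generate_segment(prompt, first_frame)
        last_frames.append(last_frame)
        first_frame = last_frame  # becomes the first frame of the next segment
    return last_frames
```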
Response Information
Use task_id to request the Get Task Result API to retrieve the generated output.
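Retrieving the output with task_id typically means polling until the task finishes. The sketch below makes no assumption about the Get Task Result API's URL or response fields: `fetch_result` and `is_finished` are hypothetical caller-supplied functions, and the interval and attempt limit are arbitrary defaults.

```python
import time
from typing import Any, Callable


def wait_for_result(
    task_id: str,
    fetch_result: Callable[[str], Any],
    is_finished: Callable[[Any], bool],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> Any:
    """Poll fetch_result(task_id) until is_finished(result) is true."""
    for _ in range(max_attempts):
        result = fetch_result(task_id)
        if is_finished(result):
            return result
        time.sleep(interval_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} attempts")
```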