Wanx Wan 2.7 Reference-to-Video

curl --request POST \
  --url https://api.highwayapi.ai/v3/async/wan2.7-r2v \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "seed": 123,
  "size": "<string>",
  "audio": true,
  "media": [
    {
      "url": "<string>",
      "type": "<string>",
      "reference_voice": "<string>"
    }
  ],
  "prompt": "<string>",
  "duration": 123,
  "shot_type": "<string>",
  "watermark": true,
  "negative_prompt": "<string>"
}
'

{
  "task_id": "<string>"
}

POST /v3/async/wan2.7-r2v

The Wanx Wan 2.7 reference-to-video model accepts multimodal input (text, images, and videos). It can use a person or object as the protagonist to generate single-character performance videos or multi-character interaction videos, and it supports intelligent storyboarding for multi-shot videos. Output is available at 720P and 1080P resolutions with durations from 2 to 10 seconds, and is billed by the second. The output includes audio by default.

This is an asynchronous API: the response contains only the task_id of the asynchronous task. Pass this task_id to the Get Task Result API to retrieve the generated result.

Request Headers

Content-Type

string

required

Enumerated value: application/json

Authorization

string

required

Bearer authentication format: Bearer {{API key}}.
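
As a minimal sketch of how the two required headers fit together with the endpoint shown in the curl sample, the following builds the POST request without sending it. The function name build_request and the sample key are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

# Endpoint taken from the curl sample at the top of this page.
ENDPOINT = "https://api.highwayapi.ai/v3/async/wan2.7-r2v"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request with both required headers.

    build_request is a hypothetical helper name used only for this sketch.
    """
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",    # the only documented value
            "Authorization": f"Bearer {api_key}",  # Bearer authentication format
        },
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to actually submit the task; that step is left out here so the sketch has no network side effects.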

Request Body

seed

integer

Random seed, used to improve the reproducibility of generated results. Value range: [0, 2147483647].

size

string

default:"1920*1080"

Output video resolution (width*height), which affects cost. 720P tier: 1280*720 (16:9), 720*1280 (9:16), 960*960 (1:1), 1088*832 (4:3), 832*1088 (3:4). 1080P tier: 1920*1080 (16:9), 1080*1920 (9:16), 1440*1440 (1:1), 1632*1248 (4:3), 1248*1632 (3:4). Available values: 1280*720, 720*1280, 960*960, 1088*832, 832*1088, 1920*1080, 1080*1920, 1440*1440, 1632*1248, 1248*1632.
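
Because the size value determines the resolution tier (and therefore cost), it can help to check it on the client before submitting. The sketch below maps a size string to its tier and pixel dimensions using only the values listed above; describe_size is a hypothetical helper name, not part of the API.

```python
# Allowed values copied from the size parameter description.
SIZES_720P = {"1280*720", "720*1280", "960*960", "1088*832", "832*1088"}
SIZES_1080P = {"1920*1080", "1080*1920", "1440*1440", "1632*1248", "1248*1632"}


def describe_size(size: str):
    """Return (tier, width, height) for a documented size value.

    Raises ValueError for any string that is not in the documented list.
    """
    if size in SIZES_720P:
        tier = "720P"
    elif size in SIZES_1080P:
        tier = "1080P"
    else:
        raise ValueError(f"unsupported size: {size!r}")
    width, height = (int(part) for part in size.split("*"))
    return tier, width, height
```

For example, describe_size("832*1088") reports the 720P tier with a portrait 3:4 frame, while a string such as "1280x720" is rejected because the separator must be an asterisk.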

audio

boolean

default:true

Whether to generate a video with sound, which affects cost. Default is true (video with sound).

media

array

required

Reference media array, used to extract character appearance, motion, and voice timbre. Items correspond to character1, character2, etc. in the prompt, in array order. Number of images: 0–5; number of videos: 0–3; total number does not exceed 5. Image formats: JPEG, JPG, PNG, BMP, WEBP; resolution [240, 8000] pixels; no more than 10 MB. Video formats: MP4, MOV; duration 1–30 seconds; no more than 100 MB. Audio formats: MP3, WAV, FLAC; duration 3–30 seconds. Array length: 1 - 5.

url

string

required

Media file URL.

type

string

required

Media type. reference_image: reference image, used to extract character appearance; reference_video: reference video, used to extract character motion and appearance; first_frame: first-frame image, controls the starting frame of the video. Available values: reference_image, reference_video, first_frame.

reference_voice

string

Character reference audio URL, used to clone the character’s voice timbre and generate a video with sound. Format: MP3, WAV, FLAC; duration 3–30 seconds.
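
The count limits on the media array are easy to get wrong, so a client-side check may be useful. The following is a minimal sketch under one explicit assumption: a first_frame item is counted as an image, which the description above does not state. The name validate_media is hypothetical. File format, size, and duration limits are not checked because they require inspecting the files themselves.

```python
# Assumption: first_frame is an image and counts toward the image total.
IMAGE_TYPES = {"reference_image", "first_frame"}
VIDEO_TYPES = {"reference_video"}


def validate_media(media: list):
    """Check the documented count limits and return (image_count, video_count)."""
    if not 1 <= len(media) <= 5:
        raise ValueError("media must contain 1 to 5 items")
    images = videos = 0
    for index, item in enumerate(media):
        if not item.get("url"):
            raise ValueError(f"media[{index}] is missing the required url")
        kind = item.get("type")
        if kind in IMAGE_TYPES:
            images += 1
        elif kind in VIDEO_TYPES:
            videos += 1
        else:
            raise ValueError(f"media[{index}] has unsupported type: {kind!r}")
    # The image limit (5) is already implied by the array length limit.
    if videos > 3:
        raise ValueError("at most 3 reference videos are allowed")
    return images, videos
```

A typical call passes the same list that will be sent as the media field, for example one reference_image item plus one reference_video item that also carries a reference_voice URL.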

prompt

string

required

Text prompt, used to describe the elements and visual characteristics expected in the generated video. Use character1, character2, etc. to reference the reference characters. Each reference (video or image) contains only a single character. Chinese and English are supported, up to 1500 characters. Length limit: 0 - 1500.
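
Since character1, character2, etc. refer to media items by position, a prompt that mentions a character with no matching media item is likely a mistake. The sketch below checks for that, assuming the tokens appear literally as "character" followed by a number; check_prompt is a hypothetical helper, and the service itself may treat such prompts differently.

```python
import re


def check_prompt(prompt: str, media_count: int):
    """Return the sorted character numbers referenced in the prompt.

    Raises ValueError if the prompt exceeds 1500 characters or references
    a character number with no corresponding media item.
    """
    if len(prompt) > 1500:
        raise ValueError("prompt exceeds the 1500-character limit")
    referenced = sorted({int(n) for n in re.findall(r"character(\d+)", prompt)})
    missing = [n for n in referenced if not 1 <= n <= media_count]
    if missing:
        raise ValueError(f"prompt references characters without media: {missing}")
    return referenced
```

With two media items, a prompt such as "character1 waves at character2 on a beach" passes, while one that mentions character3 is rejected.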

duration

integer

default:5

Duration of the generated video, in seconds, billed by the second. Integer value range: [2, 10].

shot_type

string

default:"single"

Shot type. single indicates a single shot (default), and multi indicates multiple shots. This parameter has higher priority than the prompt. Available values: single, multi.

watermark

boolean

default:false

Whether to add a watermark identifier. The watermark is located in the lower-right corner of the video.

negative_prompt

string

Negative prompt, used to describe content you do not want to appear in the video. Chinese and English are supported, up to 500 characters. Length limit: 0 - 500.
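
To show how the body parameters and their documented defaults combine, here is a minimal sketch of a body builder. It fills in the defaults listed above, checks the documented ranges, and omits seed and negative_prompt when they are not supplied. The name build_body is hypothetical, and leaving optional fields out of the JSON is an assumption about what the service accepts.

```python
def build_body(prompt, media, *, size="1920*1080", duration=5,
               shot_type="single", audio=True, watermark=False,
               seed=None, negative_prompt=None) -> dict:
    """Assemble the JSON request body using the documented defaults."""
    if not 2 <= duration <= 10:
        raise ValueError("duration must be an integer from 2 to 10")
    if shot_type not in ("single", "multi"):
        raise ValueError("shot_type must be 'single' or 'multi'")
    if seed is not None and not 0 <= seed <= 2147483647:
        raise ValueError("seed must be in [0, 2147483647]")
    if negative_prompt is not None and len(negative_prompt) > 500:
        raise ValueError("negative_prompt exceeds the 500-character limit")
    body = {
        "prompt": prompt,
        "media": media,
        "size": size,
        "duration": duration,
        "shot_type": shot_type,
        "audio": audio,
        "watermark": watermark,
    }
    # Optional fields are included only when the caller provides them.
    if seed is not None:
        body["seed"] = seed
    if negative_prompt is not None:
        body["negative_prompt"] = negative_prompt
    return body
```

Keyword-only arguments keep call sites readable, for example build_body(prompt, media, duration=8, shot_type="multi").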

Response

task_id

string

Use task_id to request the Get Task Result API to retrieve the generated output.
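
The response body is a small JSON object, so extracting the task_id is straightforward. A minimal sketch, with extract_task_id as a hypothetical helper name; the Get Task Result request itself is not shown because its endpoint is not described in this section.

```python
import json


def extract_task_id(response_text: str) -> str:
    """Parse the JSON response body and return its task_id string."""
    data = json.loads(response_text)
    task_id = data.get("task_id") if isinstance(data, dict) else None
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response does not contain a task_id")
    return task_id
```

Failing loudly on a missing task_id avoids polling for a result with an empty identifier.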
