POST /v3/minimax-voice-cloning
MiniMax Audio Quick Cloning
curl --request POST \
  --url https://api.highwayapi.ai/v3/minimax-voice-cloning \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "audio_url": "<string>",
  "text": "<string>",
  "model": "<string>",
  "accuracy": 0.7,
  "need_noise_reduction": true,
  "need_volume_normalization": true
}
'
{
  "demo_audio_url": "<string>",
  "voice_id": "<string>"
}
This API clones a voice from a mono or stereo audio file, quickly replicating speech with the same timbre as the specified recording. Typical use cases are IP replication, voice cloning, and other scenarios that require quickly reproducing a specific voice.

A voice cloned through this API is temporary. To retain it permanently, use the voice in any T2A speech synthesis API within 168 hours (7 days); trial listening within this API does not count. Otherwise, the voice is deleted.

Notes:
  • The uploaded audio file must be in mp3, m4a, or wav format.
  • The uploaded audio file must be at least 10 seconds long and no longer than 5 minutes.
  • The uploaded audio file must not exceed 20 MB.
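
The limits above can be checked locally before a request is sent. The following is a minimal sketch, not part of the API: the function name check_source_audio is hypothetical, the format is inferred from the URL's file extension, the caller is assumed to already know the duration and size, and "20 MB" is read as 20 * 1024 * 1024 bytes because the documentation does not say which definition applies.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MIN_DURATION_SECONDS = 10
MAX_DURATION_SECONDS = 5 * 60
# Assumption: "20 MB" means 20 * 1024 * 1024 bytes; the docs do not specify.
MAX_SIZE_BYTES = 20 * 1024 * 1024


def check_source_audio(audio_url: str, duration_seconds: float, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the local checks passed."""
    problems = []

    # Assumption: the format can be inferred from the extension in the URL path.
    extension = PurePosixPath(urlparse(audio_url).path).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")

    if not MIN_DURATION_SECONDS <= duration_seconds <= MAX_DURATION_SECONDS:
        problems.append("duration must be between 10 seconds and 5 minutes")

    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file size must not exceed 20 MB")

    return problems
```

Passing these checks does not guarantee that the server accepts the file; it only catches the documented limits early.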

Request Headers

Content-Type
string
required
Enum value: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API Key}}.
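
As an illustration of the two required headers, the sketch below assembles them into a dictionary. The helper name build_headers is hypothetical, and the key shown in the usage line is a placeholder, not a real credential.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Build the two required request headers for this endpoint."""
    if not api_key.strip():
        raise ValueError("api_key must not be empty")
    return {
        "Content-Type": "application/json",
        # Bearer authentication format: "Bearer " followed by the API key.
        "Authorization": f"Bearer {api_key}",
    }


headers = build_headers("YOUR_API_KEY")
```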

Request Body

audio_url
string
required
The URL of the audio file containing the voice to be cloned. Supports mp3, m4a, and wav formats.
clone_prompt
Voice cloning parameters. Providing this parameter helps improve the timbre similarity and stability of speech synthesis. If this parameter is used, you must also upload a short sample audio clip (shorter than 8 seconds) and the text that corresponds to that audio. The sample audio supports mp3, m4a, and wav formats.
text
string
Cloning trial-listening parameter. The model will read this text using the cloned voice and return the synthesized audio result as a link for previewing the cloning effect. Limited to 2000 characters. Note: trial listening will be charged for speech synthesis based on the number of characters, with pricing consistent with the T2A APIs.
model
string
Cloning trial-listening parameter. Specifies the speech model used for trial listening. This field is required when the “text” field is provided.
Options: speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-2.8-hd, speech-2.8-turbo
accuracy
float
Audio cloning parameter. Sets the accuracy threshold for text verification. Value range: [0, 1]. Defaults to 0.7 if not provided.
need_noise_reduction
bool
Audio cloning parameter. Whether to enable noise reduction. Defaults to false if not provided.
need_volume_normalization
bool
Audio cloning parameter. Whether to enable volume normalization. Defaults to false if not provided.
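
The body rules above (required audio_url, the 2000-character limit on text, model required whenever text is set, accuracy within [0, 1]) can be enforced on the client side before sending. This is a minimal sketch under stated assumptions: build_clone_payload is a hypothetical helper, characters are counted with Python's len(), which may differ from how the service counts them, and clone_prompt is left out because its structure is not documented here.

```python
from typing import Optional

TRIAL_MODELS = {
    "speech-02-hd", "speech-02-turbo",
    "speech-2.5-hd-preview", "speech-2.5-turbo-preview",
    "speech-2.8-hd", "speech-2.8-turbo",
}
MAX_TEXT_CHARS = 2000


def build_clone_payload(
    audio_url: str,
    text: Optional[str] = None,
    model: Optional[str] = None,
    accuracy: Optional[float] = None,
    need_noise_reduction: Optional[bool] = None,
    need_volume_normalization: Optional[bool] = None,
) -> dict:
    """Validate the documented constraints and return the JSON body as a dict."""
    if not audio_url:
        raise ValueError("audio_url is required")
    payload: dict = {"audio_url": audio_url}

    if text is not None:
        # Assumption: len() approximates the service's character count.
        if len(text) > MAX_TEXT_CHARS:
            raise ValueError("text is limited to 2000 characters")
        if model is None:
            raise ValueError("model is required when text is provided")
        payload["text"] = text

    if model is not None:
        if model not in TRIAL_MODELS:
            raise ValueError(f"unknown model: {model}")
        payload["model"] = model

    if accuracy is not None:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("accuracy must be within [0, 1]")
        payload["accuracy"] = accuracy

    # Optional flags are sent only when set, so the server defaults (false) apply otherwise.
    if need_noise_reduction is not None:
        payload["need_noise_reduction"] = need_noise_reduction
    if need_volume_normalization is not None:
        payload["need_volume_normalization"] = need_volume_normalization

    return payload
```

Omitting unset fields rather than sending null keeps the request aligned with the documented defaults. The resulting dict can be serialized with json.dumps and sent with any HTTP client.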

Response Information

demo_audio_url
string
If the request body includes both the trial-listening “text” and “model” fields, this parameter returns a link to the trial-listening audio.
voice_id
string
The voice_id generated for the cloned voice.
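
A response can be parsed alongside the 168-hour retention window described in the overview. The sketch below makes several assumptions that the documentation does not confirm: a successful response is a JSON object with the two fields listed above, demo_audio_url is absent or empty when no trial listening was requested, and the retention window is counted from the moment the caller records as created_at. CloneResult and parse_clone_response are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# From the overview: a cloned voice is deleted unless used in a T2A API within 168 hours.
RETENTION_WINDOW = timedelta(hours=168)


@dataclass
class CloneResult:
    voice_id: str
    demo_audio_url: Optional[str]
    use_before: datetime


def parse_clone_response(body: str, created_at: datetime) -> CloneResult:
    """Parse the response body and compute the assumed retention deadline."""
    data = json.loads(body)
    voice_id = data.get("voice_id")
    if not voice_id:
        raise ValueError("response does not contain a voice_id")
    return CloneResult(
        voice_id=voice_id,
        # Assumption: a missing or empty link means no trial listening was requested.
        demo_audio_url=data.get("demo_audio_url") or None,
        # Assumption: the window starts at created_at, as recorded by the caller.
        use_before=created_at + RETENTION_WINDOW,
    )


result = parse_clone_response(
    '{"voice_id": "example-voice", "demo_audio_url": ""}',
    created_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

Storing use_before next to the voice_id makes it straightforward to schedule a T2A call before the temporary voice is deleted.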