MiniMax Audio Quick Cloning
POST
This API clones a voice from a mono or stereo audio file, so that speech with the same timbre can be synthesized quickly from the specified file.
A voice cloned through this API is temporary. To retain a cloned voice permanently, use it in any T2A speech synthesis API within 168 hours (7 days); trial listening within this API does not count. Otherwise, the voice will be deleted.
Applicable scenarios for this API: IP replication, voice cloning, and other scenarios that require quickly replicating a specific voice.
Notes:
- The uploaded audio file format must be: mp3, m4a, or wav;
- The uploaded audio file must be at least 10 seconds long and no longer than 5 minutes;
- The uploaded audio file size must not exceed 20 MB.
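The notes above can be checked locally before uploading. The sketch below is illustrative only: `check_clone_audio` is a hypothetical helper, the caller is assumed to supply the duration already (measuring it needs an audio library), the file extension is used as a stand-in for the real format, and "20 MB" is assumed to mean 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp3", ".m4a", ".wav"}
MIN_SECONDS = 10
MAX_SECONDS = 5 * 60
MAX_BYTES = 20 * 1024 * 1024  # assumption: 20 MB interpreted as 20 MiB


def check_clone_audio(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means all three notes are met."""
    problems = []
    # The extension is only a proxy for the actual container format.
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be mp3, m4a, or wav")
    if duration_seconds < MIN_SECONDS:
        problems.append("audio must be at least 10 seconds long")
    if duration_seconds > MAX_SECONDS:
        problems.append("audio must be no longer than 5 minutes")
    if size_bytes > MAX_BYTES:
        problems.append("file size must not exceed 20 MB")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a client report all issues in a single pass.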
Request Headers
Content-Type
Enum value: application/json
Authorization
Bearer authentication format: Bearer {{API Key}}.
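As a rough illustration, the two header values can be assembled as follows. This sketch assumes they are carried in the standard HTTP `Content-Type` and `Authorization` headers; `build_headers` is a hypothetical helper name, not part of the API.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Build the request headers for a JSON request with Bearer authentication.

    Assumption: the standard Content-Type and Authorization header names apply.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
```

In practice the key would come from an environment variable or a secret store rather than from source code.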
Request Body
The audio file url for the voice to be cloned. Supports mp3, m4a, and wav formats.
clone_prompt
Voice cloning parameters. Providing this parameter helps improve the timbre similarity and stability of speech synthesis. If this parameter is used, you must also upload a short sample audio clip (less than 8 seconds) and the corresponding text for that audio. The sample audio supports mp3, m4a, and wav formats.
Cloning trial-listening parameter. The model will read this text using the cloned voice and return the synthesized audio result as a link for previewing the cloning effect. Limited to 2000 characters. Note: trial listening will be charged for speech synthesis based on the number of characters, with pricing consistent with the T2A APIs.
Cloning trial-listening parameter. Specifies the speech model used for trial listening. This field is required when the “text” field is provided.
Options: speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-2.8-hd, speech-2.8-turbo
Audio cloning parameter. Value range: [0,1]. Providing this field sets the text verification accuracy threshold. If not provided, the default value is 0.7.
Audio cloning parameter. Whether to enable noise reduction. Defaults to false if not provided.
Audio cloning parameter. Whether to enable volume normalization. Defaults to false if not provided.
Response Information
If the request body includes both the trial-listening text (the “text” field) and the trial-listening model (the “model” field), this parameter returns the trial-listening audio as a link.
The generated voice_id