POST /v3/async/minimax-speech-02-turbo
MiniMax Speech-02-turbo Asynchronous Speech Synthesis
curl --request POST \
  --url https://api.highwayapi.ai/v3/async/minimax-speech-02-turbo \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>",
  "voice_setting": {
    "speed": 123,
    "vol": 123,
    "pitch": 123,
    "voice_id": "<string>",
    "emotion": "<string>",
    "text_normalization": true
  },
  "audio_setting": {
    "sample_rate": 123,
    "bitrate": 123,
    "format": "<string>",
    "channel": 123
  },
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  },
  "language_boost": "<string>",
  "voice_modify": {
    "pitch": 123,
    "intensity": 123,
    "timbre": 123,
    "sound_effects": "<string>"
  }
}
'
{
  "task_id": "<string>"
}
This API provides asynchronous text-to-speech generation. A single text generation request supports transmission of up to 1 million characters, and the complete generated audio can be retrieved asynchronously. You can choose from 100+ system voices as well as cloned voices, and independently adjust intonation, speech rate, volume, bitrate, sample rate, and output format. The URL returned for a long-text speech synthesis task is valid for 24 hours from the time it is returned, so download the audio within that window.
This API is suited to speech generation from long texts such as entire books. Because tasks are queued, results may take a relatively long time. For short-sentence generation, voice chat, and online social interaction, we recommend using synchronous speech synthesis calls instead.
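The 24-hour validity window described above can be tracked on the client side. The following is a minimal sketch, not part of the API: the function names are hypothetical, and it assumes the URL is treated as expired exactly 24 hours after it is returned.

```python
from datetime import datetime, timedelta, timezone

# Per the description above: the URL is valid for 24 hours from when it is returned.
URL_VALIDITY = timedelta(hours=24)


def url_expires_at(returned_at: datetime) -> datetime:
    """Return the moment the download URL stops being valid."""
    return returned_at + URL_VALIDITY


def url_is_valid(returned_at: datetime, now: datetime) -> bool:
    """True while `now` is still inside the 24-hour window (assumed exclusive at the end)."""
    return now < url_expires_at(returned_at)


# Hypothetical timestamp, for illustration only.
returned_at = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(url_expires_at(returned_at).isoformat())  # 2025-01-02T12:00:00+00:00
```

Recording the time at which the URL was received, rather than the time the task was submitted, keeps the check aligned with how the window is defined.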

Request Headers

Content-Type
string
required
Enumerated value: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API Key}}.

Request Body

text
string
required
The text to be synthesized, with a maximum length of 50,000 characters.
voice_setting
object
required
audio_setting
object
pronunciation_dict
object
language_boost
string
default:"null"
Enhances recognition of the specified low-resource language or dialect, which can improve speech performance in that language or dialect. If the language type is unclear, set this to 'auto' and the model will determine it on its own. The following values are supported: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
voice_modify
object
Voice effects settings. Supported audio formats for this parameter: mp3, wav, flac
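Some of the constraints listed above can be checked before a request is sent. The sketch below is a hypothetical client-side helper, not something the API provides. It assumes that the 50,000-character limit on text can be approximated with Python's len(), and it only checks the fields whose rules are stated in this section.

```python
# Limits and values copied from the Request Body section above.
MAX_TEXT_CHARS = 50_000
LANGUAGE_BOOST_VALUES = frozenset({
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
})


def validate_body(body: dict) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []

    text = body.get("text")
    if not isinstance(text, str):
        problems.append("text is required and must be a string")
    elif len(text) > MAX_TEXT_CHARS:
        # Assumption: the server counts characters the way len() does.
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")

    if not isinstance(body.get("voice_setting"), dict):
        problems.append("voice_setting is required and must be an object")

    boost = body.get("language_boost")
    if boost is not None and boost not in LANGUAGE_BOOST_VALUES:
        problems.append(f"unsupported language_boost value: {boost!r}")

    return problems
```

For example, validate_body({"text": "Hello", "voice_setting": {}, "language_boost": "auto"}) returns an empty list, while an unsupported language_boost value produces one entry describing the problem.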

Response Parameters

task_id
string
required
The ID of the asynchronous task. Pass this task_id to the Query Task Result API to obtain the generated result.
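The submit step can be sketched in Python using only the standard library. This is an illustrative sketch under stated assumptions: the helper names are hypothetical, the body carries only text and voice_setting.voice_id (which voice_setting sub-fields are mandatory is not stated here), and the response is assumed to be a JSON object with a task_id field as shown in the response example.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this page.
ENDPOINT = "https://api.highwayapi.ai/v3/async/minimax-speech-02-turbo"


def build_request(api_key: str, text: str, voice_id: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    # Assumption: a voice_setting with only voice_id is accepted.
    body = {"text": text, "voice_setting": {"voice_id": voice_id}}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_task_id(raw: bytes) -> str:
    """Read task_id out of the JSON response body."""
    payload = json.loads(raw)
    task_id = payload.get("task_id")
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response does not contain a task_id")
    return task_id


def submit(api_key: str, text: str, voice_id: str, timeout: float = 30.0) -> str:
    """Send the request and return the task_id for later querying."""
    request = build_request(api_key, text, voice_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_task_id(response.read())
```

Keeping request construction and response parsing separate from the network call makes both easy to check without contacting the service. The returned task_id is then passed to the Query Task Result API.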