POST /v3/async/minimax-speech-2.6-turbo
MiniMax Speech-2.6-turbo Async Speech Synthesis
curl --request POST \
  --url https://api.highwayapi.ai/v3/async/minimax-speech-2.6-turbo \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>",
  "voice_setting": {
    "speed": 123,
    "vol": 123,
    "pitch": 123,
    "voice_id": "<string>",
    "emotion": "<string>",
    "text_normalization": true
  },
  "audio_setting": {
    "sample_rate": 123,
    "bitrate": 123,
    "format": "<string>",
    "channel": 123
  },
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  },
  "language_boost": "<string>",
  "voice_modify": {
    "pitch": 123,
    "intensity": 123,
    "timbre": 123,
    "sound_effects": "<string>"
  }
}
'
{
  "task_id": "<string>"
}
This API supports asynchronous text-to-speech generation. A single request can carry up to 1 million characters of text, and the complete generated audio is retrieved asynchronously. It supports 100+ system voices and custom cloned voices, with independent adjustment of intonation, speed, volume, bitrate, sample rate, and output format. The returned URL is valid for 24 hours from the time it is returned, so download the result before it expires.
This API is suited to long texts such as entire books, and task queueing may take a relatively long time. For short sentences, voice chat, online social scenarios, and similar use cases, we recommend synchronous speech synthesis.

Request Headers

Content-Type
string
required
Enum value: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API Key}}.
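As a rough illustration of how these two headers fit together with the endpoint, the sketch below builds the POST request using only the Python standard library, without sending it. The function name `build_request` is hypothetical, and the minimal body (only `text` plus `voice_setting.voice_id`) is an assumption; the service may require further fields.

```python
import json
import urllib.request

ENDPOINT = "https://api.highwayapi.ai/v3/async/minimax-speech-2.6-turbo"


def build_request(api_key: str, text: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for an async synthesis task.

    Assumption: a body with only `text` and `voice_setting.voice_id` is
    accepted; other fields from the request body are left out here.
    """
    body = {
        "text": text,
        "voice_setting": {"voice_id": voice_id},
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer authentication format: "Bearer <API Key>"
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_request(key, text, voice)) as resp:
#       result = json.load(resp)
```

Keeping request construction separate from sending makes the headers and body easy to inspect before any network call is made.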

Request Body

text
string
required
The text to be synthesized, with a maximum length of 50,000 characters.
voice_setting
object
required
audio_setting
object
pronunciation_dict
object
language_boost
string
default:"null"
Enhances recognition capability for specified low-resource languages and dialects. Setting this parameter can improve voice performance for the specified low-resource language or dialect. If the language type is unclear, select 'auto' and the model will determine it automatically. The following values are supported: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
voice_modify
object
Voice effects settings. Supported audio formats for this parameter: mp3, wav, flac
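The constraints listed above (required `text` and `voice_setting`, the 50,000-character limit on `text`, and the enumerated `language_boost` values) can be checked on the client before submitting. The following is a minimal sketch, not part of the API: `check_payload` is a hypothetical helper, and it assumes the server counts characters the same way Python's `len()` does (Unicode code points), which may not hold.

```python
MAX_TEXT_CHARS = 50_000  # limit stated for the `text` field

LANGUAGE_BOOST_VALUES = {
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
}


def check_payload(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    text = payload.get("text")
    if not isinstance(text, str):
        problems.append("text is required and must be a string")
    elif len(text) > MAX_TEXT_CHARS:
        # Assumption: characters are counted as Unicode code points.
        problems.append(f"text has {len(text)} characters; limit is {MAX_TEXT_CHARS}")

    if "voice_setting" not in payload:
        problems.append("voice_setting is required")

    boost = payload.get("language_boost")
    if boost is not None and boost not in LANGUAGE_BOOST_VALUES:
        problems.append(f"language_boost {boost!r} is not a supported value")

    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all issues with a long job before it is queued.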

Response Parameters

task_id
string
required
The task_id of the asynchronous task. Pass this task_id to the Get Task Result API to obtain the generated result.
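A small sketch of reading `task_id` out of the response body follows. `extract_task_id` is a hypothetical helper name, and the code assumes the response is a JSON object shaped like the example above; the call to the Get Task Result API is not shown because its endpoint is not described here.

```python
import json


def extract_task_id(response_body: str) -> str:
    """Parse the JSON response and return its task_id, or raise ValueError."""
    data = json.loads(response_body)
    task_id = data.get("task_id") if isinstance(data, dict) else None
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response does not contain a task_id string")
    return task_id
```

Failing loudly when `task_id` is absent avoids carrying an empty identifier into later result lookups.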