POST /v3/async/minimax-speech-2.5-hd-preview
MiniMax Speech-2.5-hd-preview Asynchronous Speech Synthesis
curl --request POST \
  --url https://api.highwayapi.ai/v3/async/minimax-speech-2.5-hd-preview \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "text": "<string>",
  "voice_setting": {
    "speed": 123,
    "vol": 123,
    "pitch": 123,
    "voice_id": "<string>",
    "emotion": "<string>",
    "text_normalization": true
  },
  "audio_setting": {
    "sample_rate": 123,
    "bitrate": 123,
    "format": "<string>",
    "channel": 123
  },
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  },
  "language_boost": "<string>",
  "voice_modify": {
    "pitch": 123,
    "intensity": 123,
    "timbre": 123,
    "sound_effects": "<string>"
  }
}
'
{
  "task_id": "<string>"
}
This API supports asynchronous text-to-speech generation. A single request can carry up to 1 million characters of text, and the complete generated audio is retrieved asynchronously. It supports 100+ system voices as well as cloned voices, with custom adjustment of pitch, speed, volume, bitrate, sample rate, and output format. After you submit a long-text speech synthesis request, note that the returned URL is valid for 24 hours from the time it is returned. Download the result within this window.
This API is suited to generating speech from long texts, such as entire books. Tasks may wait in the queue for a relatively long time. For short sentence generation, voice chat, and online social interaction, we recommend synchronous speech synthesis instead.
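
The 24-hour validity window described above can be tracked on the client side. The following is a minimal sketch, assuming the caller records the moment the URL was received; the names `download_deadline` and `is_url_expired` are hypothetical helpers and not part of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The page states that the returned URL is valid for 24 hours.
URL_VALIDITY = timedelta(hours=24)


def download_deadline(url_returned_at: datetime) -> datetime:
    """Return the moment after which the URL should be treated as expired."""
    return url_returned_at + URL_VALIDITY


def is_url_expired(url_returned_at: datetime, now: Optional[datetime] = None) -> bool:
    """Check whether the 24-hour window has passed (defaults to the current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= download_deadline(url_returned_at)
```

Storing the receipt time as a timezone-aware UTC value avoids ambiguity when the download is performed on a different machine.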

Request Headers

Content-Type
string
required
Enum value: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API key}}.
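
As an illustration of the two headers, the sketch below builds the POST request with Python's standard library without sending it. The URL is taken from the curl example on this page. The function name `build_request` and the minimal body (only `text` and `voice_setting.voice_id`) are assumptions made for the example, and the API may require further `voice_setting` fields.

```python
import json
import urllib.request

# Endpoint as shown in the curl example on this page.
API_URL = "https://api.highwayapi.ai/v3/async/minimax-speech-2.5-hd-preview"


def build_request(api_key: str, text: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for an async synthesis task."""
    payload = {
        "text": text,
        # Assumption: voice_id alone is enough inside the required voice_setting object.
        "voice_setting": {"voice_id": voice_id},
    }
    headers = {
        "Content-Type": "application/json",    # the only documented enum value
        "Authorization": f"Bearer {api_key}",  # Bearer authentication format
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(build_request(...))`; keeping construction separate from sending makes the headers easy to inspect.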

Request Body

text
string
required
The text to synthesize, with a maximum length of 50,000 characters.
voice_setting
object
required
audio_setting
object
pronunciation_dict
object
language_boost
string
default:"null"
Enhances recognition of the specified low-resource language or dialect, which can improve speech performance in that language or dialect. If the language is unclear, choose 'auto' and the model will determine it automatically. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'.
voice_modify
object
Voice effect settings. Supported audio formats for this parameter: mp3, wav, flac.
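
The constraints listed for the request body can be checked before a task is submitted. The sketch below is a hypothetical client-side helper (`validate_payload` is not part of the API). It assumes that the 50,000-character limit stated for `text` is counted with Python's `len()`, and that the `voice_modify` format restriction refers to `audio_setting.format`.

```python
from typing import List

MAX_TEXT_CHARS = 50_000  # limit stated for the text field
VOICE_MODIFY_FORMATS = {"mp3", "wav", "flac"}
LANGUAGE_BOOST_VALUES = {
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
}


def validate_payload(payload: dict) -> List[str]:
    """Return a list of problems found in the request body (empty if none)."""
    problems = []
    text = payload.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append("text exceeds 50,000 characters")
    if "voice_setting" not in payload:
        problems.append("voice_setting is required")
    boost = payload.get("language_boost")
    if boost is not None and boost not in LANGUAGE_BOOST_VALUES:
        problems.append("unsupported language_boost value: " + repr(boost))
    if "voice_modify" in payload:
        # Assumption: the restriction applies to audio_setting.format when it is set.
        fmt = payload.get("audio_setting", {}).get("format")
        if fmt is not None and fmt not in VOICE_MODIFY_FORMATS:
            problems.append("voice_modify requires format mp3, wav, or flac")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report everything that needs fixing in a single pass.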

Response Parameters

task_id
string
required
The ID of the asynchronous task. Pass this task_id to the Query Task Result API to obtain the generated result.
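
A minimal sketch of reading `task_id` from the response body follows, assuming the body is the JSON object shown in the response example. `extract_task_id` is a hypothetical helper, and the call to the Query Task Result API is left out because its endpoint is not described on this page.

```python
import json


def extract_task_id(response_body: str) -> str:
    """Parse the JSON response and return its task_id, or raise ValueError."""
    data = json.loads(response_body)
    task_id = data.get("task_id") if isinstance(data, dict) else None
    if not isinstance(task_id, str) or not task_id:
        raise ValueError("response does not contain a task_id string")
    return task_id
```

Failing loudly when `task_id` is missing keeps a malformed or error response from being passed on to the result query.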