POST /v3/minimax-speech-2.5-turbo-preview
MiniMax Speech-2.5-turbo-preview Synchronous Speech Synthesis
curl --request POST \
  --url https://api.highwayapi.ai/v3/minimax-speech-2.5-turbo-preview \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "text": "<string>",
  "voice_setting": {
    "speed": 123,
    "vol": 123,
    "pitch": 123,
    "voice_id": "<string>",
    "emotion": "<string>",
    "latex_read": true,
    "text_normalization": true
  },
  "audio_setting": {
    "sample_rate": 123,
    "bitrate": 123,
    "format": "<string>",
    "channel": 123
  },
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  },
  "timbre_weights": [
    {
      "voice_id": "<string>",
      "weight": 123
    }
  ],
  "stream": true,
  "stream_options": {
    "exclude_aggregated_audio": true
  },
  "language_boost": "<string>",
  "output_format": "<string>",
  "voice_modify": {
    "pitch": 123,
    "intensity": 123,
    "timbre": 123,
    "sound_effects": "<string>"
  }
}
'
{
  "audio": "<string>",
  "status": 123
}
This API provides synchronous text-to-speech generation for text of fewer than 10,000 characters per request. It supports 100+ system voices and cloned voices; adjustment of volume, pitch, speed, and output format; proportional voice mixing; fixed pause intervals; multiple audio specifications and formats, including mp3, pcm, flac, and wav; and streaming output. When the audio is returned as a URL, the URL is valid for 24 hours from the time it is returned, so download the audio within that period.
It suits scenarios such as short sentence generation, voice chat, and online social interaction. Latency is low, but the text must be shorter than 10,000 characters. For longer text, we recommend asynchronous speech synthesis.

Request Headers

Content-Type
string
required
Enum value: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API Key}}.
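
As a minimal sketch, the two required headers could be attached to a request as follows, using only the Python standard library. The helper name `build_request` and the placeholder key are illustrative and not part of the API; the request is constructed here but not sent.

```python
import json
import urllib.request

API_URL = "https://api.highwayapi.ai/v3/minimax-speech-2.5-turbo-preview"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying both required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # "YOUR_API_KEY" and "<voice id>" are placeholders, not real values.
    req = build_request(
        "YOUR_API_KEY",
        {"text": "Hello", "voice_setting": {"voice_id": "<voice id>"}},
    )
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```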

Request Body

text
string
required
The text to be synthesized. The length must be less than 10,000 characters. Use line breaks to separate paragraphs. To insert a custom pause in the synthesized speech, place <#x#> between words, where x is the pause duration in seconds; values from 0.01 to 99.99 are supported, with up to two decimal places. A pause marker must sit between two pieces of pronounceable text, and consecutive pause markers are not allowed.
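
The pause rules above can be checked on the client before a request is sent. The following is a sketch only: `pause` and `join_with_pauses` are hypothetical helper names, and counting the markers toward the 10,000-character limit is a conservative assumption, since the page does not say whether they count.

```python
from decimal import Decimal

MAX_TEXT_LENGTH = 10_000  # the text must be shorter than this


def pause(seconds) -> str:
    """Return a pause marker such as '<#1.5#>' for a duration in seconds."""
    value = Decimal(str(seconds))
    if not (Decimal("0.01") <= value <= Decimal("99.99")):
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    if value != value.quantize(Decimal("0.01")):
        raise ValueError("pause supports at most two decimal places")
    return f"<#{value}#>"


def join_with_pauses(segments, seconds) -> str:
    """Join text segments with exactly one pause marker between each pair.

    Joining this way keeps every marker between two pieces of text and
    never places two markers next to each other.
    """
    cleaned = [s.strip() for s in segments]
    if any(not s for s in cleaned):
        raise ValueError("every segment must contain pronounceable text")
    text = pause(seconds).join(cleaned)
    # Assumption: markers are counted toward the length limit.
    if len(text) >= MAX_TEXT_LENGTH:
        raise ValueError("text must be shorter than 10,000 characters")
    return text


if __name__ == "__main__":
    print(join_with_pauses(["Hello", "world"], 0.5))
```

This helper cannot tell whether a segment is actually pronounceable (for example, a segment made only of punctuation), so that check still rests with the caller.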
voice_setting
object
required
audio_setting
object
pronunciation_dict
object
timbre_weights
object[]
Either timbre_weights or voice_setting.voice_id is required.
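
A client-side check for the "one of the two is required" rule might look like the sketch below. `voice_sources` is a hypothetical helper, and the weight values in the example are made up, since this page does not state the allowed range for weight or what happens when both fields are set.

```python
from typing import List


def voice_sources(body: dict) -> List[str]:
    """List which voice selectors a request body sets; raise if it sets none."""
    sources = []
    if (body.get("voice_setting") or {}).get("voice_id"):
        sources.append("voice_setting.voice_id")
    if body.get("timbre_weights"):
        sources.append("timbre_weights")
    if not sources:
        raise ValueError("set voice_setting.voice_id or timbre_weights")
    return sources


if __name__ == "__main__":
    mixed = {
        "text": "Hello",
        "voice_setting": {"speed": 1},
        # Placeholder voice IDs and illustrative weights only.
        "timbre_weights": [
            {"voice_id": "<voice A>", "weight": 70},
            {"voice_id": "<voice B>", "weight": 30},
        ],
    }
    print(voice_sources(mixed))
```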
stream
boolean
default:"false"
Whether to stream. Default is false, meaning streaming is disabled.
stream_options
object
language_boost
string
default:"null"
Enhances recognition of the specified minor language or dialect, which improves speech performance in that scenario. If the language is unclear, set this to 'auto' and the model determines it automatically. The following values are supported: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
output_format
string
default:"hex"
Controls the form of the output. Allowed values are url and hex; the default is hex. This parameter only takes effect for non-streaming requests; streaming requests always return hex. A returned URL is valid for 24 hours.
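
The interaction between stream and output_format described above can be summarized in a few lines. This is a sketch of the documented rule, not server code; `effective_output_format` is a hypothetical helper name.

```python
def effective_output_format(stream: bool, output_format: str = "hex") -> str:
    """Return the form the audio is expected to come back in.

    output_format is honored only for non-streaming requests;
    streaming requests return hex regardless of the setting.
    """
    if output_format not in ("url", "hex"):
        raise ValueError("output_format must be 'url' or 'hex'")
    return "hex" if stream else output_format


if __name__ == "__main__":
    print(effective_output_format(stream=False, output_format="url"))
    print(effective_output_format(stream=True, output_format="url"))
```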
voice_modify
object
Voice effects settings. This parameter supports the following audio formats:
  • Non-streaming: mp3, wav, flac
  • Streaming: mp3

Response Information

audio
string
The synthesized audio segment, generated in the format defined by audio_setting.format (mp3/pcm/flac) and returned in the form defined by output_format (hex by default). When stream is true, only the hex form is supported.
status
number
Current audio stream status, returned only when stream is true. 1 indicates synthesizing, and 2 indicates synthesis completed.
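
A hex-encoded audio field can be turned back into bytes as sketched below. The helpers `decode_audio` and `describe_status` are hypothetical names, and the code covers only the hex form; a response requested with output_format set to url would need to be downloaded instead.

```python
from typing import Optional

# Meanings of the status field, as listed above.
STATUS_LABELS = {1: "synthesizing", 2: "synthesis completed"}


def decode_audio(response: dict) -> bytes:
    """Decode the hex-encoded audio field of a parsed response into raw bytes."""
    return bytes.fromhex(response["audio"])


def describe_status(response: dict) -> Optional[str]:
    """Return a label for status, or None when status is absent (non-streaming)."""
    status = response.get("status")
    if status is None:
        return None
    return STATUS_LABELS.get(status, f"unknown status {status}")


if __name__ == "__main__":
    # Tiny stand-in payload; a real response carries a full audio file in hex.
    sample = {"audio": "494433", "status": 2}
    print(len(decode_audio(sample)), "bytes,", describe_status(sample))
    # To save the audio: open("speech.mp3", "wb").write(decode_audio(sample))
```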