MiniMax Speech-2.5-hd-preview Synchronous Speech Synthesis
POST
This API supports synchronous text-to-speech generation, with a maximum of 10,000 characters per text submission. It supports 100+ system voices and cloned voices for flexible selection; volume, pitch, speed, and output format adjustment; proportional voice mixing and fixed interval control; and multiple audio specifications and formats, including mp3, pcm, flac, and wav. Streaming output is also supported.
After submitting a long-text speech synthesis request, note that the returned url is valid for 24 hours from the time it is returned. Download the audio before the link expires.
Request Headers
Enum value: application/json
Bearer authentication format: Bearer {{API Key}}.
Request Body
The text to be synthesized. It must be fewer than 10,000 characters. Use line breaks to mark paragraph transitions. To insert a custom pause in the synthesized speech, add <#x#> between characters, where x is the pause length in seconds (0.01 to 99.99, up to two decimal places). A pause marker must sit between two pieces of text that can be pronounced, and multiple consecutive markers are not allowed.
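The pause-marker rules can be checked before a request is sent. The sketch below builds `<#x#>` markers and joins text segments with them; the helper names (`pause`, `join_with_pauses`, `has_consecutive_pauses`) are hypothetical, and the checks only mirror the constraints stated here, not the server's actual validation.

```python
import re
from decimal import Decimal


def pause(seconds) -> str:
    """Return a <#x#> marker after checking the documented range and precision."""
    value = Decimal(str(seconds))
    if not (Decimal("0.01") <= value <= Decimal("99.99")):
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    if value.as_tuple().exponent < -2:
        raise ValueError("pause allows at most two decimal places")
    return f"<#{format(value, 'f')}#>"


def join_with_pauses(segments, seconds) -> str:
    """Join pronounceable text segments with one pause marker between each pair."""
    cleaned = [segment.strip() for segment in segments]
    if any(not segment for segment in cleaned):
        raise ValueError("every segment must contain pronounceable text")
    return pause(seconds).join(cleaned)


def has_consecutive_pauses(text: str) -> bool:
    """True if two markers appear with only whitespace between them."""
    return re.search(r"<#[\d.]+#>\s*<#[\d.]+#>", text) is not None
```

For example, `join_with_pauses(["Hello", "world"], 1.5)` produces `Hello<#1.5#>world`, which places a 1.5-second pause between the two words.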
Required if voice_id is not provided
Whether to stream. Default is false, meaning streaming is not enabled.
Enhances recognition capability for specified low-resource languages and dialects. After setting this parameter, speech performance can be improved in the specified low-resource language/dialect scenario. If the low-resource language type is unclear, you can select “auto”, and the model will determine the low-resource language type on its own. The following values are supported:
'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
Parameter that controls the output result format. Optional values are url and hex. The default value is hex. This parameter only takes effect in non-streaming scenarios; streaming scenarios only support returning data in hex format. The returned url is valid for 24 hours.
Voice effect settings. This parameter supports the following audio formats:
- Non-streaming: mp3, wav, flac
- Streaming: mp3
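The constraints described so far can be collected into a client-side pre-flight check. This is a sketch under stated assumptions: the function `check_request` and its argument names are hypothetical, and it assumes the format lists above restrict audio_setting.format only when voice effect settings are used. The server remains the authority on what is accepted.

```python
MAX_TEXT_CHARS = 10_000
# Formats listed for voice effect settings, split by streaming mode.
EFFECT_FORMATS_NON_STREAMING = {"mp3", "wav", "flac"}
EFFECT_FORMATS_STREAMING = {"mp3"}


def check_request(text, *, stream=False, output_format="hex",
                  audio_format="mp3", voice_effects=False):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(text) >= MAX_TEXT_CHARS:
        problems.append("text must be fewer than 10,000 characters")
    if output_format not in ("url", "hex"):
        problems.append("output_format must be 'url' or 'hex'")
    elif stream and output_format != "hex":
        problems.append("output_format has no effect when streaming; data is returned as hex")
    if voice_effects:
        allowed = EFFECT_FORMATS_STREAMING if stream else EFFECT_FORMATS_NON_STREAMING
        if audio_format not in allowed:
            problems.append(
                f"voice effects with stream={stream} support only: {sorted(allowed)}"
            )
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once before spending a request.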
Response Information
The synthesized audio clip, encoded in hex and generated in the format defined by the input (audio_setting.format): mp3, pcm, or flac. The return format depends on the definition of output_format. When stream is true, only the hex return format is supported.
Current audio stream status, returned only when stream is true. 1 indicates synthesis in progress, and 2 indicates synthesis has ended.
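Hex-encoded audio has to be converted back to raw bytes before it can be played or saved. The sketch below decodes a hex payload and interprets the stream status values; the dictionary keys `audio` and `status` are assumptions about the response shape, since this section does not name the response fields.

```python
STATUS_IN_PROGRESS = 1  # synthesis in progress
STATUS_ENDED = 2        # synthesis has ended


def decode_audio(hex_audio: str) -> bytes:
    """Convert a hex-encoded audio string into raw audio bytes."""
    return bytes.fromhex(hex_audio)


def iter_decoded(chunks):
    """Yield (audio_bytes, ended) for each streamed chunk, stopping at the final one.

    Each chunk is assumed to be a dict with hypothetical keys
    'audio' (hex string) and 'status' (1 or 2).
    """
    for chunk in chunks:
        ended = chunk.get("status") == STATUS_ENDED
        yield decode_audio(chunk.get("audio", "")), ended
        if ended:
            break
```

This section does not say whether the final chunk repeats audio already delivered, so inspect real responses before concatenating chunks into one file.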