MiniMax Speech-2.6-turbo Synchronous Speech Synthesis
POST
This API supports synchronous text-to-speech generation, with a maximum of 10,000 characters per text submission. It supports 100+ system voices and custom cloned voices; volume, pitch, speed, and output format adjustments; proportional voice mixing and fixed interval timing control; and multiple audio specifications and formats, including mp3, pcm, flac, and wav. Streaming output is also supported.
After submitting a long-text speech synthesis request, the returned url is valid for 24 hours from the time it is returned. Download the audio before the link expires.
Request Headers
Content-Type: enum value application/json.
Authorization: Bearer authentication, in the format Bearer {{API Key}}.
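As a minimal sketch, the two request headers described above could be assembled in Python like this. The header names Content-Type and Authorization are assumed to be the standard HTTP names, build_headers is a hypothetical helper, and the API key shown in the usage note is a placeholder.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Build the request headers for a synthesis call.

    Assumption: the service expects the standard HTTP header names
    "Content-Type" and "Authorization".
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        # The only documented content type is application/json.
        "Content-Type": "application/json",
        # Bearer authentication: "Bearer " followed by the API key.
        "Authorization": f"Bearer {api_key}",
    }
```

For example, build_headers("YOUR_API_KEY") returns a dictionary that can be passed to any HTTP client as the request headers.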
Request Body
The text to synthesize, which must be fewer than 10,000 characters. Use line breaks to mark paragraph breaks. To insert a custom pause in the speech, add <#x#> between characters, where x is the pause length in seconds (0.01 to 99.99, with at most two decimal places). A pause marker must sit between two pieces of text that can be pronounced, and multiple consecutive pause markers are not allowed.
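The pause-marker rules above can be sketched as a pair of small Python helpers. Both pause and join_with_pauses are hypothetical names introduced for illustration; the sketch only checks that each segment is non-empty, which is a weaker test than "can be pronounced".

```python
def pause(seconds: float) -> str:
    """Format a pause marker such as <#1.50#>."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    if round(seconds, 2) != seconds:
        raise ValueError("pause supports at most two decimal places")
    return f"<#{seconds:.2f}#>"


def join_with_pauses(segments: list[str], seconds: float) -> str:
    """Join text segments with exactly one pause marker between each pair.

    Because a marker is only ever placed between two non-empty segments,
    the result never starts or ends with a marker and never contains two
    consecutive markers.
    """
    cleaned = [segment.strip() for segment in segments]
    if any(not segment for segment in cleaned):
        raise ValueError("every segment must contain text")
    return pause(seconds).join(cleaned)
```

Calling join_with_pauses(["Hello", "world"], 0.5) produces a single string with one half-second marker between the two words.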
Either this parameter or voice_id is required.
Whether to stream. Default is false, meaning streaming is not enabled.
Enhances recognition of the specified minority language or dialect, which can improve speech performance in those scenarios. If the minority language type is unclear, choose 'auto' and the model will determine it automatically. The following values are supported:
'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
Controls the form of the output result. Optional values are url and hex; the default value is hex. This parameter only takes effect in non-streaming scenarios. Streaming scenarios only support returning data in hex form. The returned url is valid for 24 hours.
Voice effects settings. The audio formats supported by this parameter are:
- Non-streaming: mp3, wav, flac
- Streaming: mp3
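The interaction between streaming, the output form, and the voice-effects format list can be checked before sending a request. The following is a rough sketch, not the service's own validation: check_options and uses_voice_effects are hypothetical names, audio_format stands in for audio_setting.format, and the rules encoded are only those stated above.

```python
# Audio formats that voice effects support, keyed by whether streaming is on.
VOICE_EFFECT_FORMATS = {
    False: {"mp3", "wav", "flac"},  # non-streaming
    True: {"mp3"},                  # streaming
}


def check_options(
    stream: bool,
    output_format: str = "hex",
    audio_format: str = "mp3",
    uses_voice_effects: bool = False,
) -> list[str]:
    """Return a list of problems; an empty list means the options agree."""
    problems = []
    if output_format not in ("url", "hex"):
        problems.append(f"output_format must be 'url' or 'hex', got {output_format!r}")
    elif stream and output_format == "url":
        # output_format only takes effect when streaming is off.
        problems.append("streaming only returns hex; output_format='url' has no effect")
    if uses_voice_effects and audio_format not in VOICE_EFFECT_FORMATS[stream]:
        allowed = ", ".join(sorted(VOICE_EFFECT_FORMATS[stream]))
        problems.append(f"voice effects with stream={stream} support only: {allowed}")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every conflict at once.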
Response Information
The synthesized audio segment, encoded in hex and generated in the format defined by audio_setting.format (mp3/pcm/flac). The return form is determined by output_format. When stream is true, only hex return is supported.
The current audio stream status. Returned only when stream is true. 1 indicates synthesis in progress, and 2 indicates synthesis ended.
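To illustrate how the hex-encoded audio and the stream status might be consumed, here is a hedged Python sketch. The dictionary keys "audio" and "status" are assumed field names that do not appear in the text above, and the sketch assumes the status-2 chunk is an end marker whose audio (if any) should be ignored; adjust both if the actual response differs.

```python
from typing import Iterable


def decode_audio(hex_audio: str) -> bytes:
    """Convert a hex-encoded audio string into raw bytes."""
    return bytes.fromhex(hex_audio)


def collect_stream(chunks: Iterable[dict]) -> bytes:
    """Concatenate audio from streamed chunks until synthesis ends.

    Assumptions: each chunk is a dict with hypothetical keys "audio"
    (hex string) and "status" (1 = in progress, 2 = ended), and the
    status-2 chunk carries no new audio.
    """
    audio = bytearray()
    for chunk in chunks:
        if chunk.get("status") == 2:
            break  # synthesis ended
        audio += decode_audio(chunk.get("audio", ""))
    return bytes(audio)
```

In the non-streaming case, decode_audio alone is enough: pass it the hex string from the response and write the returned bytes to a file opened in binary mode.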