MiniMax Speech-2.5-turbo-preview Synchronous Speech Synthesis
POST
This API supports synchronous text-to-speech generation, with a maximum of 10,000 characters per text submission. It supports more than 100 system voices as well as cloned voices; adjustment of volume, pitch, speed, and output format; proportional voice mixing and fixed pause intervals; multiple audio specifications and formats, including mp3, pcm, flac, and wav; and streaming output.
After you submit a long-text speech synthesis request, the returned URL is valid for 24 hours from the time it is returned. Download the audio within that validity period.
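The 24-hour validity window can be tracked on the client side. The following is a minimal sketch, not part of the API: the helper names `url_expires_at` and `url_is_valid` are hypothetical, and it assumes the caller records the time at which the response carrying the URL was received.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Validity window stated in the documentation: 24 hours from the moment
# the URL is returned.
URL_LIFETIME = timedelta(hours=24)


def url_expires_at(returned_at: datetime) -> datetime:
    """Return the moment after which the download URL should be treated as expired."""
    return returned_at + URL_LIFETIME


def url_is_valid(returned_at: datetime, now: Optional[datetime] = None) -> bool:
    """Check whether the URL is still inside its validity window.

    Assumption: the exact boundary instant is treated as already expired,
    since the documentation does not say whether it is inclusive.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now < url_expires_at(returned_at)
```

Storing `returned_at` as a timezone-aware UTC timestamp keeps the comparison unambiguous when the download happens on a different machine.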
Request Headers
Enum value: application/json
Bearer authentication format: Bearer {{API Key}}.
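As an illustration, the two header values above could be assembled as follows. This is a sketch under assumptions: the page does not show the header names, so the standard HTTP names `Content-Type` and `Authorization` are assumed here, and `build_headers` is a hypothetical helper.

```python
def build_headers(api_key: str) -> dict:
    """Build the request headers described above.

    Assumption: the header names are the standard HTTP ones
    (Content-Type and Authorization); the page lists only their values.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "Content-Type": "application/json",
        # Bearer authentication format: "Bearer {{API Key}}"
        "Authorization": f"Bearer {api_key}",
    }
```

In practice the key would be read from an environment variable or a secrets store rather than hard-coded.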
Request Body
The text to be synthesized. The length must be less than 10,000 characters. Use line breaks to separate paragraphs. To insert a pause of a custom duration into the synthesized speech, place <#x#> between words, where x is the pause length in seconds. Values from 0.01 to 99.99 are supported, with up to two decimal places. A pause marker must sit between two pieces of text that can be pronounced, and multiple consecutive pause markers are not allowed.
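The pause-marker rules above can be checked before a request is sent. The sketch below is illustrative only: the helper names are hypothetical, and it assumes that markers count toward the character limit and that markers separated only by whitespace count as consecutive, neither of which the documentation states.

```python
import re

MAX_CHARS = 10_000  # the text must be shorter than this

# Two markers with nothing but whitespace between them.
CONSECUTIVE_MARKERS = re.compile(r"<#[^#]*#>\s*<#[^#]*#>")


def pause(seconds: float) -> str:
    """Format a pause marker such as <#1.50#> (two decimal places)."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{seconds:.2f}#>"


def join_with_pauses(segments, seconds: float) -> str:
    """Join pronounceable segments with exactly one marker between each pair.

    Empty or whitespace-only segments are dropped so that two markers
    never end up next to each other.
    """
    cleaned = [s.strip() for s in segments if s.strip()]
    return pause(seconds).join(cleaned)


def validate_text(text: str) -> str:
    """Raise ValueError if the text breaks one of the documented rules."""
    if len(text) >= MAX_CHARS:
        raise ValueError("text must be shorter than 10,000 characters")
    if CONSECUTIVE_MARKERS.search(text):
        raise ValueError("consecutive pause markers are not allowed")
    stripped = text.strip()
    # Heuristic check: a marker at either end has no text on one side.
    if stripped.startswith("<#") or stripped.endswith("#>"):
        raise ValueError("a pause marker must sit between two pieces of text")
    return text
```

For example, `join_with_pauses(["Hello", "world"], 0.5)` yields `Hello<#0.50#>world`, which passes `validate_text`.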
Either this or voice_id is required.
Whether to stream. Default is false, meaning streaming is disabled.
Improves recognition of the specified minor languages and dialects. Setting it can improve speech quality when the text is in the specified language or dialect. If the language is unknown, choose “auto” and the model will determine it automatically. The following values are supported:
'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
Parameter that controls the form of the output result. Optional values are url and hex. Default value is hex. This parameter only takes effect in non-streaming scenarios. Streaming scenarios only support returning in hex form. The returned URL is valid for 24 hours.
Voice effects settings. This parameter supports the following audio formats:
- Non-streaming: mp3, wav, flac
- Streaming: mp3
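The interaction between streaming, the output form, and the audio formats allowed with voice effects can be summarized in a small client-side check. This is a sketch rather than the service's own validation logic: the function names are hypothetical, and `stream`, `output_format`, and the format names are taken from this page.

```python
OUTPUT_FORMS = {"url", "hex"}

# Audio formats supported when voice effects settings are used.
VOICE_EFFECTS_FORMATS = {
    False: {"mp3", "wav", "flac"},  # non-streaming
    True: {"mp3"},                  # streaming
}


def effective_output_format(stream: bool, output_format: str = "hex") -> str:
    """Return the form in which audio will actually be returned.

    output_format only takes effect when stream is False; streaming
    responses are always returned in hex form.
    """
    if output_format not in OUTPUT_FORMS:
        raise ValueError("output_format must be 'url' or 'hex'")
    return "hex" if stream else output_format


def voice_effects_format_ok(stream: bool, audio_format: str) -> bool:
    """Check an audio format against the voice-effects list above."""
    return audio_format in VOICE_EFFECTS_FORMATS[stream]
```

Running the check before sending a request makes an unsupported combination, such as streaming with wav and voice effects, visible early.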
Response Information
The synthesized audio segment, encoded in hex and generated according to the format defined in the input (audio_setting.format) (mp3/pcm/flac). The return form follows the definition of output_format. When stream is true, only hex return format is supported.
Current audio stream status, returned only when stream is true. 1 indicates synthesizing, and 2 indicates synthesis completed.
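A hex-encoded audio segment can be turned back into raw bytes with the Python standard library. The sketch below is a minimal illustration: the helper names are hypothetical, and the layout of the surrounding response object is not shown on this page, so the functions take the hex string and the status value directly.

```python
# Status values documented above (returned only when stream is true).
STATUS_SYNTHESIZING = 1
STATUS_COMPLETED = 2


def decode_audio(hex_audio: str) -> bytes:
    """Decode a hex-encoded audio segment into raw bytes.

    bytes.fromhex raises ValueError if the string is not valid hex.
    """
    return bytes.fromhex(hex_audio)


def is_complete(status: int) -> bool:
    """Return True when the stream status reports that synthesis has completed."""
    if status not in (STATUS_SYNTHESIZING, STATUS_COMPLETED):
        raise ValueError(f"unexpected status value: {status}")
    return status == STATUS_COMPLETED
```

The decoded bytes can then be written to a file opened in binary mode, with an extension that matches the format requested in audio_setting.format.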