MiniMax Speech-2.6-hd Synchronous Speech Synthesis
POST
This API supports synchronous text-to-speech generation, with a maximum of 10,000 characters per text submission. It offers 100+ system voices as well as independently selectable cloned voices; adjustable volume, pitch, speech rate, and output format; proportional voice mixing and fixed interval control; multiple audio specifications and formats (mp3, pcm, flac, wav); and streaming output.
After submitting a long-text speech synthesis request, note that the returned url is valid for 24 hours from the time it is returned. Download the audio before the url expires.
Request Headers
Enum value: application/json
Bearer authentication format: Bearer {{API Key}}.
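As a minimal sketch of the two header values above: the header names were lost from this page, so the standard HTTP names `Content-Type` and `Authorization` are assumed here, and `build_headers` is a hypothetical helper rather than part of the API.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Build request headers for a JSON body with Bearer authentication.

    The header names are an assumption (standard HTTP names); only the
    values "application/json" and "Bearer {{API Key}}" come from this page.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


headers = build_headers("my-api-key")
print(headers["Authorization"])
```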
Request Body
The text to synthesize, limited to fewer than 10,000 characters. Replace paragraph breaks with newline characters. To control pauses in the speech, insert <#x#> between text segments, where x is the pause length in seconds (0.01-99.99, with up to two decimal places). A pause marker must be placed between two text segments that can be pronounced, and multiple consecutive pause markers are not allowed.
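The pause rules above can be sketched with two small helpers. `pause` and `join_with_pauses` are hypothetical names, not part of the API. The sketch only enforces the numeric range and marker placement; it cannot tell whether a segment is actually pronounceable.

```python
def pause(seconds: float) -> str:
    """Format a pause marker <#x#> with x in seconds (0.01-99.99)."""
    value = round(seconds, 2)  # at most two decimal places
    if not 0.01 <= value <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    return f"<#{value:.2f}#>"


def join_with_pauses(segments: list[str], seconds: float) -> str:
    """Join text segments with one pause marker between each pair.

    Empty segments are dropped, so a marker never starts or ends the
    text and two markers never appear back to back.
    """
    cleaned = [s.strip() for s in segments if s.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty segment is required")
    return pause(seconds).join(cleaned)


print(join_with_pauses(["Hello", "world"], 0.5))  # Hello<#0.50#>world
```

Dropping empty segments before joining is a design choice that keeps the output within the placement rules without a separate validation pass.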
One of timbre_weights or voice_id is required.
Whether to stream. The default is false, meaning streaming is not enabled.
Enhances recognition capability for specified low-resource languages and dialects. After setting it, speech performance can be improved in the specified low-resource language/dialect scenario. If the low-resource language type is unclear, you can choose “auto”, and the model will determine the language type independently. Supports the following values:
'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
Parameter that controls the output result format. Optional values are url and hex. The default value is hex. This parameter only takes effect in non-streaming scenarios; streaming scenarios only support returning in hex format. The returned url is valid for 24 hours.
Voice effects settings. This parameter supports the following audio formats:
- Non-streaming: mp3, wav, flac
- Streaming: mp3
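A small sketch of the format restriction listed above, useful as a client-side check before sending a request that uses voice effects. `effects_format_allowed` is a hypothetical helper; the format sets are copied from the list above, and `fmt` stands for the value of audio_setting.format.

```python
# Audio formats that support voice effects, as listed on this page.
EFFECTS_FORMATS_NON_STREAMING = {"mp3", "wav", "flac"}
EFFECTS_FORMATS_STREAMING = {"mp3"}


def effects_format_allowed(fmt: str, stream: bool) -> bool:
    """Return True if voice effects can be used with this audio format."""
    allowed = EFFECTS_FORMATS_STREAMING if stream else EFFECTS_FORMATS_NON_STREAMING
    return fmt.lower() in allowed


print(effects_format_allowed("wav", stream=False))
print(effects_format_allowed("wav", stream=True))
```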
Response Information
The synthesized audio segment, encoded in hex and generated according to the format defined in the input (audio_setting.format) (mp3/pcm/flac). The return format depends on the definition of output_format; when stream is true, only the hex return format is supported.
Current audio stream status, returned only when stream is true. 1 indicates synthesis in progress, and 2 indicates synthesis completed.
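A minimal sketch of handling these two response values on the client side. It assumes the hex string and the status code have already been extracted from the response; the response field names are not shown on this page, so none are used here. `decode_audio_hex` and `describe_status` are hypothetical helpers.

```python
def decode_audio_hex(hex_audio: str) -> bytes:
    """Convert hex-encoded audio into raw bytes ready to be written to a file."""
    return bytes.fromhex(hex_audio)


def describe_status(status: int) -> str:
    """Map the stream status code to a readable label."""
    labels = {1: "synthesis in progress", 2: "synthesis completed"}
    if status not in labels:
        raise ValueError(f"unexpected status: {status}")
    return labels[status]


# "494433" is the hex form of the ASCII bytes "ID3", used as stand-in data.
audio_bytes = decode_audio_hex("494433")
print(audio_bytes)         # b'ID3'
print(describe_status(2))  # synthesis completed
```

The decoded bytes are in whatever format was requested in audio_setting.format, so the file extension used when saving them should match that setting.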