POST /v3/async/minimax-speech-2.8-turbo
MiniMax Speech 2.8 Turbo Asynchronous Speech Synthesis
curl --request POST \
  --url https://api.highwayapi.ai/v3/async/minimax-speech-2.8-turbo \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "text": "<string>",
  "text_file_id": 123,
  "voice_modify": {
    "pitch": 123,
    "timbre": 123,
    "intensity": 123,
    "sound_effects": "<string>"
  },
  "audio_setting": {
    "format": "<string>",
    "bitrate": 123,
    "channel": 123,
    "audio_sample_rate": 123
  },
  "voice_setting": {
    "vol": 123,
    "pitch": 123,
    "speed": 123,
    "emotion": "<string>",
    "voice_id": "<string>",
    "english_normalization": true
  },
  "aigc_watermark": true,
  "language_boost": "<string>",
  "continuous_sound": true,
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  }
}
'
{
  "file_id": 123,
  "task_id": "<string>",
  "base_resp": {
    "status_msg": "<string>",
    "status_code": 123
  },
  "task_token": "<string>",
  "usage_characters": 123
}
Use this API to create an asynchronous speech synthesis task. It accepts either text or file input. Text input can be up to 50,000 characters, and file input can be up to 100,000 characters.
This API is asynchronous: it creates the task and returns its task_id rather than the audio itself. Pass this task_id to the Get Task Result API to retrieve the generated result.
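The request described above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not an official client: the function name build_request is hypothetical, the voice_id value in the usage note is a placeholder, and counting characters with len() is an assumption about how the 50,000-character limit is measured. The function only builds the request; nothing is sent until it is passed to urllib.request.urlopen.

```python
import json
import urllib.request

API_URL = "https://api.highwayapi.ai/v3/async/minimax-speech-2.8-turbo"
MAX_TEXT_CHARS = 50_000  # documented limit for the text field


def build_request(api_key, voice_setting, text=None, text_file_id=None):
    """Build (but do not send) the POST request that creates a synthesis task."""
    # The reference states that either text or text_file_id is required.
    if text is None and text_file_id is None:
        raise ValueError("either text or text_file_id is required")
    # Assumption: the limit is counted in Python characters (code points).
    if text is not None and len(text) > MAX_TEXT_CHARS:
        raise ValueError("text exceeds 50,000 characters")

    body = {"voice_setting": voice_setting}  # voice_setting is marked required
    if text is not None:
        body["text"] = text
    if text_file_id is not None:
        body["text_file_id"] = text_file_id

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A call such as build_request("YOUR_API_KEY", {"voice_id": "your-voice-id"}, text="Hello") returns a request object that can be sent with urllib.request.urlopen; the task_id in the response is then used with the Get Task Result API.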

Request Headers

Content-Type
string
required
Enum value: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API Key}}.

Request Body

text
string
The text to synthesize into audio, up to 50,000 characters. Either this or text_file_id is required.

• Interjection tags: Interjection tags can be inserted in the text only when the selected model is speech-2.8-hd or speech-2.8-turbo. Supported tags, each followed by the sound it produces: (laughs) laughter, (chuckle) chuckle, (coughs) cough, (clear-throat) clearing throat, (groans) groan, (breath) normal breathing, (pant) panting, (inhale) inhale, (exhale) exhale, (gasps) gasp, (sniffs) sniff, (sighs) sigh, (snorts) snort, (burps) burp, (lip-smacking) lip smacking, (humming) humming, (hissing) hissing sound, (emm) umm, (whistles) whistle, (sneezes) sneeze, (crying) sob, (applause) applause
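Because only the listed tags are supported, it can help to scan the text for unsupported ones before submitting. The sketch below is a heuristic, not part of the API: the helper name unknown_tags is hypothetical, and it treats every lowercase parenthesized token as a candidate tag, so an ordinary parenthetical such as (see) would also be reported.

```python
import re

# Tags copied from the supported-interjection list above.
SUPPORTED_TAGS = {
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath",
    "pant", "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts",
    "burps", "lip-smacking", "humming", "hissing", "emm", "whistles",
    "sneezes", "crying", "applause",
}

TAG_RE = re.compile(r"\(([a-z-]+)\)")


def unknown_tags(text):
    """Return parenthesized lowercase tokens that are not supported tags."""
    return [tag for tag in TAG_RE.findall(text) if tag not in SUPPORTED_TAGS]
```

For example, unknown_tags("Well (sighs) fine (giggles)") reports only giggles, since (sighs) is in the supported list.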
text_file_id
integer
The ID of the text file to synthesize into audio. A single file must contain fewer than 100,000 characters. Supported file formats: txt, zip. Either this or text is required; the file format is validated automatically once the ID is provided.
txt file: Must be shorter than 100,000 characters. Use <#x#> to mark a custom pause, where x is the pause duration in seconds, in the range [0.01, 99.99] with up to two decimal places. A pause marker must sit between two pieces of speakable text, and markers cannot be used consecutively.
zip file:
• All files in the archive must share one format, either txt or json.
• json file format: Supports three fields: [title, content, extra], representing the title, body, and additional information respectively. If all three fields exist, 3 groups of results are produced, for a total of 9 files, all stored in one folder. If a field does not exist or its content is empty, no corresponding result will be generated for that field
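The pause-marker rules and the zip/json layout described above can be sketched as follows. This is an illustration under stated assumptions: the helper names (pause, pauses_valid, make_json_doc, build_zip) are hypothetical, "speakable text" is approximated as text containing at least one letter or digit, and malformed markers that do not match the pattern are not detected. How the resulting file is uploaded to obtain a text_file_id is not covered on this page and is left out.

```python
import io
import json
import re
import zipfile

# Matches <#x#> where x has up to two integer digits and up to two decimals.
PAUSE_RE = re.compile(r"<#(\d{1,2}(?:\.\d{1,2})?)#>")


def pause(seconds):
    """Format a pause marker, e.g. pause(1.5) gives '<#1.50#>'."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be within [0.01, 99.99] seconds")
    if round(seconds, 2) != seconds:
        raise ValueError("pause supports at most two decimal places")
    return f"<#{seconds:.2f}#>"


def pauses_valid(text):
    """Check that every marker is in range and sits between speakable text."""
    parts = PAUSE_RE.split(text)  # [text, value, text, value, ..., text]
    segments, values = parts[0::2], parts[1::2]
    in_range = all(0.01 <= float(v) <= 99.99 for v in values)
    # Assumption: a segment is speakable if it has a letter or a digit.
    speakable = all(any(ch.isalnum() for ch in seg) for seg in segments)
    return in_range and speakable


def make_json_doc(title=None, content=None, extra=None):
    """Serialize the three supported fields, omitting missing or empty ones."""
    fields = {"title": title, "content": content, "extra": extra}
    return json.dumps({k: v for k, v in fields.items() if v}, ensure_ascii=False)


def build_zip(files):
    """Pack {filename: text} into an in-memory zip of a single file format."""
    extensions = {name.rsplit(".", 1)[-1] for name in files}
    if len(extensions) != 1 or not extensions <= {"txt", "json"}:
        raise ValueError("zip must contain only txt files or only json files")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()
```

Omitting empty fields in make_json_doc mirrors the rule that a missing or empty field produces no result, so the serialized file only carries fields that will actually be synthesized.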
voice_modify
object
audio_setting
object
voice_setting
object
required
aigc_watermark
boolean
default:false
Controls whether to add an audio rhythm marker at the end of the synthesized audio. The default value is False. This parameter only takes effect for non-streaming synthesis
language_boost
string
Enhances recognition of the specified minority language or dialect. The default value is null; set it to auto to let the model determine the language automatically. Available values: Chinese, Chinese,Yue, English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto
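A client can reject unknown language_boost values before sending the request. The sketch below is illustrative only: the helper name check_language_boost is hypothetical, and treating Chinese,Yue as one value (rather than two) is an assumption based on how the list above is punctuated.

```python
# Values copied from the list above. Assumption: "Chinese,Yue" is one value.
LANGUAGE_BOOST_VALUES = {
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
}


def check_language_boost(value):
    """Return value unchanged if it is None (the default) or a listed value."""
    if value is not None and value not in LANGUAGE_BOOST_VALUES:
        raise ValueError(f"unsupported language_boost value: {value!r}")
    return value
```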
continuous_sound
boolean
default:false
Enable this parameter to make transitions between clauses sound more natural. Supported only by the speech-2.8-hd and speech-2.8-turbo models.
pronunciation_dict
object

Response Information

file_id
integer
The ID of the corresponding audio file returned after the task is created successfully.

• After the task completes, the audio file can be retrieved using file_id. This field is not returned when the request fails.
Note: The returned download URL is valid for 9 hours (32400 seconds) from the time it is generated. After it expires, the file becomes invalid and the generated output is lost, so download the file before the URL expires.
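The 9-hour validity window can be tracked on the client side. This is a small sketch with hypothetical helper names; it assumes the caller records the time at which the download URL was generated, since this page does not describe a field that carries that timestamp.

```python
from datetime import datetime, timedelta, timezone

URL_TTL = timedelta(seconds=32400)  # 9 hours, as documented above


def url_expires_at(generated_at):
    """Return the moment a download URL generated at generated_at expires."""
    return generated_at + URL_TTL


def is_url_expired(generated_at, now=None):
    """Return True once the 9-hour window has passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= url_expires_at(generated_at)
```

Using timezone-aware datetimes throughout avoids comparing local and UTC times by accident.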
task_id
string
Use task_id to request the Get Task Result API to retrieve the generated output.
base_resp
object
task_token
string
Key information used to complete the current task
usage_characters
integer
Number of billable characters
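The response fields above can be read into a small structure as sketched below. The class and function names are hypothetical and the sample values in the usage note are made up for illustration. Since file_id is documented as absent when the request fails, the sketch uses its presence as the success signal instead of assuming what any particular status_code means.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreateTaskResponse:
    task_id: Optional[str]
    file_id: Optional[int]
    task_token: Optional[str]
    usage_characters: Optional[int]
    status_code: Optional[int]
    status_msg: Optional[str]

    @property
    def created(self):
        # file_id is not returned when an error occurs in the request.
        return self.file_id is not None


def parse_response(raw):
    """Parse the JSON body returned by the create-task endpoint."""
    data = json.loads(raw)
    base = data.get("base_resp") or {}
    return CreateTaskResponse(
        task_id=data.get("task_id"),
        file_id=data.get("file_id"),
        task_token=data.get("task_token"),
        usage_characters=data.get("usage_characters"),
        status_code=base.get("status_code"),
        status_msg=base.get("status_msg"),
    )
```

For a body such as {"task_id": "t-1", "file_id": 42, "base_resp": {"status_code": 0, "status_msg": "ok"}}, parse_response returns an object whose created property is True and whose task_id can be passed to the Get Task Result API.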