POST /v3/async/minimax-speech-2.8-hd
MiniMax Speech 2.8 HD Asynchronous Speech Synthesis
Request example:
curl --request POST \
  --url https://api.highwayapi.ai/v3/async/minimax-speech-2.8-hd \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "text": "<string>",
  "text_file_id": 123,
  "voice_modify": {
    "pitch": 123,
    "timbre": 123,
    "intensity": 123,
    "sound_effects": "<string>"
  },
  "audio_setting": {
    "format": "<string>",
    "bitrate": 123,
    "channel": 123,
    "audio_sample_rate": 123
  },
  "voice_setting": {
    "vol": 123,
    "pitch": 123,
    "speed": 123,
    "emotion": "<string>",
    "voice_id": "<string>",
    "english_normalization": true
  },
  "aigc_watermark": true,
  "language_boost": "<string>",
  "continuous_sound": true,
  "pronunciation_dict": {
    "tone": [
      {}
    ]
  }
}
'
Response example:
{
  "file_id": 123,
  "task_id": "<string>",
  "base_resp": {
    "status_msg": "<string>",
    "status_code": 123
  },
  "task_token": "<string>",
  "usage_characters": 123
}
Use this endpoint to create an asynchronous speech synthesis task. It supports text or file input. Text input is limited to a maximum of 50,000 characters, and file input is limited to a maximum of 100,000 characters.
This endpoint is asynchronous: it does not return audio directly. The response contains the task_id of the created task (along with file_id and other task metadata). Use this task_id with the Get Task Result API to retrieve the generated result.
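As a minimal sketch of the first step of this workflow, the hypothetical helper `parse_create_response` below pulls task_id and file_id out of the creation response shown in the response example. It assumes, following MiniMax's own convention, that base_resp.status_code is 0 on success; this page does not document the status codes, so confirm that before relying on it. Polling the Get Task Result API is not shown, because its URL and response shape are not described here.

```python
from typing import Any, Dict, Optional, Tuple


def parse_create_response(resp: Dict[str, Any]) -> Tuple[str, Optional[int]]:
    """Return (task_id, file_id) from the task-creation response.

    Assumption: base_resp.status_code == 0 means success (MiniMax convention,
    not confirmed on this page). A missing base_resp is treated as success.
    """
    base = resp.get("base_resp") or {}
    code = base.get("status_code", 0)
    if code != 0:
        raise RuntimeError(f"task creation failed: {code} {base.get('status_msg', '')}")

    task_id = resp.get("task_id")
    if not task_id:
        raise RuntimeError("response did not contain a task_id")

    # file_id is not returned when the request errors, so it may be absent.
    return task_id, resp.get("file_id")
```

The returned task_id is what you pass to the Get Task Result API; keep file_id as well, since the page notes it can be used to query the generated file once the task completes.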

Request Headers

Content-Type
string
required
Enum value: application/json
Authorization
string
required
Bearer authentication, in the format: Bearer {{API Key}}.
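The two required headers can be set with Python's standard library. This sketch builds, but does not send, the task-creation request; `API_URL` and `build_create_request` are illustrative names, and `payload` is whatever request body you assemble from the fields described below.

```python
import json
import urllib.request

API_URL = "https://api.highwayapi.ai/v3/async/minimax-speech-2.8-hd"


def build_create_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a synthesis task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            # Both headers are required; Content-Type must be application/json.
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the response body with `json.load`. Note that urllib normalizes header names internally (for example, `Content-Type` is stored as `Content-type`), which does not affect what the server receives.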

Request Body

text
string
The text to synthesize into audio, limited to a maximum of 50,000 characters. Either text or text_file_id is required.

• Vocalization tags: vocalization tags can be inserted into the text only when the model is speech-2.8-hd or speech-2.8-turbo. Supported tags and the sounds they produce: (laughs): laughter, (chuckle): chuckle, (coughs): cough, (clear-throat): throat clearing, (groans): groan, (breath): normal breathing, (pant): panting, (inhale): inhale, (exhale): exhale, (gasps): gasp, (sniffs): sniff, (sighs): sigh, (snorts): snort, (burps): burp, (lip-smacking): lip smacking, (humming): humming, (hissing): hissing, (emm): "um" filler, (whistles): whistle, (sneezes): sneeze, (crying): sobbing, (applause): applause.
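A client-side pre-flight check can catch the two text constraints above before a task is created. This is a minimal sketch, not the service's own validation: `check_text` is a hypothetical helper, it counts characters with Python's `len` (the service may count differently), and it treats any parenthesized lowercase word such as `(laughs)` as an intended tag, so ordinary parenthetical words would also be flagged.

```python
import re
from typing import List

MAX_TEXT_CHARS = 50_000
TAG_MODELS = {"speech-2.8-hd", "speech-2.8-turbo"}
SUPPORTED_TAGS = {
    "laughs", "chuckle", "coughs", "clear-throat", "groans", "breath", "pant",
    "inhale", "exhale", "gasps", "sniffs", "sighs", "snorts", "burps",
    "lip-smacking", "humming", "hissing", "emm", "whistles", "sneezes",
    "crying", "applause",
}
# Matches a parenthesized lowercase word, optionally hyphenated: "(clear-throat)".
TAG_RE = re.compile(r"\(([a-z]+(?:-[a-z]+)*)\)")


def check_text(text: str, model: str = "speech-2.8-hd") -> List[str]:
    """Return a list of problems found in `text`; an empty list means none."""
    problems = []
    if len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters; the limit is {MAX_TEXT_CHARS}")

    tags = TAG_RE.findall(text)
    if tags and model not in TAG_MODELS:
        problems.append(f"vocalization tags are not supported by {model}")
    for tag in tags:
        if tag not in SUPPORTED_TAGS:
            problems.append(f"unsupported vocalization tag: ({tag})")
    return problems
```

For example, `check_text("Well (sighs) fine.")` returns an empty list, while a misspelled tag such as `(sigh)` is reported as unsupported.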
text_file_id
integer
The ID of the text file to synthesize into audio. A single file must contain fewer than 100,000 characters. Supported file formats: txt, zip. Either text_file_id or text is required. The file format is validated automatically when the ID is submitted.
txt file: must contain fewer than 100,000 characters. Use <#x#> to insert a custom pause, where x is the pause duration in seconds, in the range [0.01, 99.99], with at most two decimal places. A pause must be placed between two pieces of pronounceable text, and pause markers cannot be used back to back.
zip file:
• The archive must contain txt or json files, all in the same format.
• json file format: supports three fields, [title, content, extra], representing the title, body, and additional information respectively. If all three fields are present, 3 groups of results are produced, for a total of 9 files, all stored in one folder. If a field is missing or its content is empty, no result is generated for that field.
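The pause-marker rules for txt files can be checked before upload. This sketch implements the constraints listed above: the `<#x#>` syntax, the [0.01, 99.99] range, at most two decimal places, and text on both sides of every marker (which also rules out consecutive markers). `check_pauses` is a hypothetical helper, and "pronounceable" is approximated as "contains at least one letter or digit", which is an assumption rather than the service's definition.

```python
import re
from typing import List

# Pause marker syntax from this page: <#x#>, where x is seconds.
PAUSE_RE = re.compile(r"<#([^#>]*)#>")
# A non-negative number with at most two decimal places.
DURATION_RE = re.compile(r"\d+(?:\.\d{1,2})?")


def _has_speech(segment: str) -> bool:
    # Heuristic: a segment is pronounceable if it has any letter or digit.
    return any(ch.isalnum() for ch in segment)


def check_pauses(text: str) -> List[str]:
    """Return a list of pause-marker problems in `text`; empty means none."""
    problems = []
    matches = list(PAUSE_RE.finditer(text))
    for i, m in enumerate(matches):
        raw = m.group(1)
        if not DURATION_RE.fullmatch(raw):
            problems.append(f"bad pause value {raw!r}: use a number with at most two decimals")
        elif not 0.01 <= float(raw) <= 99.99:
            problems.append(f"pause {raw}s is outside [0.01, 99.99]")

        # Text between the previous marker (or the start) and this marker.
        prev_end = matches[i - 1].end() if i > 0 else 0
        if not _has_speech(text[prev_end:m.start()]):
            problems.append(f"pause at offset {m.start()} has no pronounceable text before it")

    if matches and not _has_speech(text[matches[-1].end():]):
        problems.append("last pause has no pronounceable text after it")
    return problems
```

Two adjacent markers such as `<#1#><#2#>` are reported because the second one has no text before it, which matches the rule that pause markers cannot be used back to back.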
voice_modify
object
audio_setting
object
voice_setting
object
required
aigc_watermark
boolean
default:false
Controls whether to add an AIGC watermark (an audio rhythm identifier) at the end of the synthesized audio. Default: false. This parameter only takes effect for non-streaming synthesis.
language_boost
string
Enhances recognition of the specified low-resource languages and dialects. Default: null. Set it to auto to let the model determine the language automatically. Available values: Chinese, Chinese,Yue (Cantonese; this is a single value that contains a comma), English, Arabic, Russian, Spanish, French, Portuguese, German, Turkish, Dutch, Ukrainian, Vietnamese, Indonesian, Japanese, Italian, Korean, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, Afrikaans, auto
continuous_sound
boolean
default:false
Enables more natural transitions between clauses. Supported only by the speech-2.8-hd and speech-2.8-turbo models.
pronunciation_dict
object
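The request-body rules on this page (text or text_file_id required, voice_setting required, language_boost restricted to the listed values) can be enforced while assembling the payload. This is a minimal sketch: `build_payload` is a hypothetical helper, the contents of voice_setting (for example, a voice_id) are placeholders, and the nested objects (voice_modify, audio_setting, pronunciation_dict) are passed through unchecked because their fields are not described here.

```python
from typing import Any, Dict, Optional

# Values listed for language_boost on this page ("Chinese,Yue" is one value).
LANGUAGE_BOOST_VALUES = {
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
}


def build_payload(
    voice_setting: Dict[str, Any],
    text: Optional[str] = None,
    text_file_id: Optional[int] = None,
    **optional: Any,
) -> Dict[str, Any]:
    """Assemble a request body, checking the constraints stated on this page."""
    if text is None and text_file_id is None:
        raise ValueError("either text or text_file_id is required")
    if not voice_setting:
        raise ValueError("voice_setting is required")
    boost = optional.get("language_boost")
    if boost is not None and boost not in LANGUAGE_BOOST_VALUES:
        raise ValueError(f"unsupported language_boost value: {boost!r}")

    payload: Dict[str, Any] = {"voice_setting": voice_setting}
    if text is not None:
        payload["text"] = text
    if text_file_id is not None:
        payload["text_file_id"] = text_file_id
    # Omit optional fields left as None so the server-side defaults apply.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Leaving optional fields out, rather than sending null, is a design choice here: it lets documented defaults such as aigc_watermark=false and continuous_sound=false apply without the client restating them.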

Response Information

file_id
integer
The ID of the audio file generated by the task, returned when the task is created successfully.

• After the task completes, use file_id to query the generated file. This field is not returned if the request fails.
Note: The returned download URL is valid for 9 hours (32,400 seconds) from the time it is generated. After it expires, the file becomes invalid and the generated content is lost, so download the file before the URL expires.
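The 9-hour validity window is simple date arithmetic. This sketch assumes the caller records when the download URL was generated (this page does not say the response carries a timestamp); `url_expires_at` and `url_still_valid` are hypothetical helpers, and the URL is treated as expired at exactly 9 hours.

```python
from datetime import datetime, timedelta, timezone

# 9 hours = 32,400 seconds, as stated on this page.
URL_LIFETIME = timedelta(hours=9)


def url_expires_at(generated_at: datetime) -> datetime:
    """Moment at which a download URL generated at `generated_at` expires."""
    return generated_at + URL_LIFETIME


def url_still_valid(generated_at: datetime, now: datetime) -> bool:
    """True while `now` is strictly before the expiry moment."""
    return now < url_expires_at(generated_at)
```

In practice, pass `datetime.now(timezone.utc)` as `now` and record `generated_at` in UTC as well, so that both values are timezone-aware and comparable.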
task_id
string
Use task_id to request the Get Task Result API to retrieve the generated output.
base_resp
object
task_token
string
Key information used to complete the current task
usage_characters
integer
Number of billable characters