This API supports asynchronous text-to-speech generation. A single request can submit up to 1 million characters of text, and the complete generated audio is retrieved asynchronously once the task finishes. It supports 100+ system voices as well as custom cloned voices, and lets you adjust intonation, speed, volume, bitrate, sample rate, and output format. Note that after you submit a long-text speech synthesis request, the returned URL is valid for 24 hours from the time it is returned, so download the audio promptly.
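Because the result URL expires 24 hours after it is returned, it can help to track that deadline on the client side. The following is a minimal sketch, assuming you record the UTC time at which the URL was returned; the helper names `url_expires_at` and `url_still_valid` are hypothetical and not part of this API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The result URL is valid for 24 hours from the moment it is returned.
URL_VALIDITY = timedelta(hours=24)


def url_expires_at(returned_at: datetime) -> datetime:
    """Return the instant after which the result URL should be treated as expired."""
    return returned_at + URL_VALIDITY


def url_still_valid(returned_at: datetime, now: Optional[datetime] = None) -> bool:
    """True while `now` is still inside the 24-hour download window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < url_expires_at(returned_at)
```

For example, a URL returned at 12:00 UTC on one day should be downloaded before 12:00 UTC the next day.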
It is suited to generating speech from long texts such as entire books. Because tasks are queued, results may take a relatively long time to become available. For short-sentence generation, voice chat, online social applications, and similar scenarios, we recommend synchronous speech synthesis instead.
The intonation of the generated voice. Optional. Range: [-12, 12]; the value must be an integer. Default: 0, which outputs the original voice unchanged.
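A client can check the intonation value before submitting a request. This is a minimal sketch of that check, assuming you validate locally; `normalize_intonation` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
from typing import Optional

INTONATION_MIN, INTONATION_MAX = -12, 12
INTONATION_DEFAULT = 0  # 0 outputs the original voice unchanged


def normalize_intonation(value: Optional[int] = None) -> int:
    """Apply the default and enforce the integer range [-12, 12]."""
    if value is None:
        return INTONATION_DEFAULT
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("intonation must be an integer")
    if not INTONATION_MIN <= value <= INTONATION_MAX:
        raise ValueError(
            f"intonation must be in [{INTONATION_MIN}, {INTONATION_MAX}], got {value}"
        )
    return value
```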
Controls the emotion of the synthesized speech. Seven emotions are currently supported. Allowed values: "happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral".
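Since the emotion must be one of a fixed set of strings, a local membership check catches typos early. The sketch below assumes values are matched exactly and case-sensitively, as listed; `check_emotion` is a hypothetical helper, not part of the API.

```python
# The seven emotions listed in the parameter description.
SUPPORTED_EMOTIONS = (
    "happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral",
)


def check_emotion(emotion: str) -> str:
    """Return the emotion unchanged if supported, otherwise raise ValueError."""
    if emotion not in SUPPORTED_EMOTIONS:
        raise ValueError(
            f"unsupported emotion {emotion!r}; expected one of {SUPPORTED_EMOTIONS}"
        )
    return emotion
```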
Enables English text normalization, which can improve performance in number-reading scenarios but may slightly increase latency. Optional; defaults to false.
The bitrate of the generated audio. Optional. Allowed values: 32000, 64000, 128000, 256000; default: 128000. This parameter only takes effect for mp3 output.
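The bitrate accepts a discrete set of values and only matters for mp3 output. This minimal sketch resolves which bitrate actually applies; `effective_bitrate` is a hypothetical helper, and format names other than "mp3" (such as "wav" in the usage note) are illustrative assumptions.

```python
from typing import Optional

ALLOWED_BITRATES = (32000, 64000, 128000, 256000)
DEFAULT_BITRATE = 128000


def effective_bitrate(audio_format: str, bitrate: Optional[int] = None) -> Optional[int]:
    """Return the bitrate that applies, or None when the format ignores it."""
    if bitrate is None:
        bitrate = DEFAULT_BITRATE
    if bitrate not in ALLOWED_BITRATES:
        raise ValueError(f"bitrate must be one of {ALLOWED_BITRATES}, got {bitrate}")
    # The bitrate only takes effect for mp3 output.
    if audio_format.lower() != "mp3":
        return None
    return bitrate
```

For example, `effective_bitrate("mp3")` resolves to the 128000 default, while a non-mp3 format yields `None` because the setting has no effect there.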
Pronunciation replacement for text and symbols that need special annotation: adjust tones or replace the pronunciation of other characters. Each entry has the form "text/pronunciation", for example: ["燕少飞/(yan4)(shao3)(fei1)","达菲/(da2)(fei1)","omg/oh my god"]. Tones are represented by numbers: first tone (Yinping) is 1, second tone (Yangping) is 2, third tone (Shangsheng) is 3, fourth tone (Qusheng) is 4, and the neutral tone is 5.
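To make the entry format concrete, the following sketch parses a single entry into its text and either a list of (syllable, tone) pairs or a plain-text replacement. It assumes the text contains no "/" and that pinyin syllables use lowercase ASCII letters; `parse_entry` is a hypothetical helper, not part of the API.

```python
import re
from typing import List, Tuple, Union

# One syllable such as "(yan4)": lowercase pinyin followed by a tone digit 1-5.
_SYLLABLE = re.compile(r"\(([a-z]+)([1-5])\)")
# A replacement made up entirely of such syllables.
_SYLLABLES = re.compile(r"(?:\([a-z]+[1-5]\))+")


def parse_entry(entry: str) -> Tuple[str, Union[str, List[Tuple[str, int]]]]:
    """Split "text/replacement" and decode tone-annotated pinyin if present."""
    text, sep, replacement = entry.partition("/")
    if not sep or not text or not replacement:
        raise ValueError(f"entry must look like 'text/replacement': {entry!r}")
    if replacement.startswith("("):
        if not _SYLLABLES.fullmatch(replacement):
            raise ValueError(f"malformed pinyin annotation: {replacement!r}")
        # Tones: 1-4 are the four tones, 5 is the neutral tone.
        return text, [(s, int(t)) for s, t in _SYLLABLE.findall(replacement)]
    # Otherwise the replacement is plain text read as-is (e.g. "oh my god").
    return text, replacement
```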
Enhances recognition of the specified language or dialect, including minority languages. Setting this parameter can improve speech quality in scenarios that use that language or dialect. If the language is unclear, set it to 'auto' and the model will determine it automatically. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'.
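One detail worth noting is that 'Chinese,Yue' is a single value that contains a comma, so the list should not be split on commas. This minimal sketch checks a value against the listed set, assuming exact, case-sensitive matching; `check_language_boost` is a hypothetical helper, not part of the API.

```python
# Allowed values copied from the parameter description.
# "Chinese,Yue" is ONE value that happens to contain a comma.
SUPPORTED_LANGUAGE_BOOSTS = frozenset({
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi",
    "Bulgarian", "Danish", "Hebrew", "Malay", "Persian", "Slovak",
    "Swedish", "Croatian", "Filipino", "Hungarian", "Norwegian",
    "Slovenian", "Catalan", "Nynorsk", "Tamil", "Afrikaans", "auto",
})


def check_language_boost(value: str) -> str:
    """Return value unchanged if it is a supported value, else raise ValueError."""
    # Exact, case-sensitive match (an assumption based on the listed strings).
    if value not in SUPPORTED_LANGUAGE_BOOSTS:
        raise ValueError(f"unsupported language value: {value!r}")
    return value
```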