This API supports asynchronous text-to-speech generation. A single request accepts up to 1 million characters of text, and the complete generated audio can be retrieved asynchronously. You can choose from 100+ system voices and cloned voices, and independently adjust intonation, speaking speed, volume, bitrate, sample rate, and output format. After you submit a long-text speech synthesis request, note that the returned url is valid for 24 hours from the time it is returned, so download the result within that window.
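The two limits above (1 million characters per request, 24-hour url validity) can be checked on the client side. The following is a minimal sketch, not part of the API: the helper names are hypothetical, and it assumes that the service counts characters the same way Python's `len` does, which this page does not confirm.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_CHARS = 1_000_000          # per-request text limit stated above
URL_TTL = timedelta(hours=24)  # validity window of the returned url


def check_text_length(text: str) -> None:
    """Raise ValueError if the text exceeds the documented limit.

    Assumption: one Python code point counts as one character.
    """
    if len(text) > MAX_CHARS:
        raise ValueError(f"text has {len(text)} characters, limit is {MAX_CHARS}")


def url_expiry(returned_at: datetime) -> datetime:
    """Return the moment the download url stops being valid."""
    return returned_at + URL_TTL


def url_still_valid(returned_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True while the url is inside its 24-hour window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now < url_expiry(returned_at)
```

Recording the time at which the url was returned, rather than the time the task was submitted, matters here because queuing can delay the result.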
This API is suitable for speech generation from long texts such as entire books. Tasks may wait in the queue for a long time. For scenarios such as short-sentence generation, voice chat, and online social networking, we recommend using synchronous speech synthesis calls.
The intonation of the generated voice. Optional. Range [-12, 12]; default value is 0. The value must be an integer; 0 means output with the original voice.
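A client can reject out-of-range intonation values before sending a request. This is a minimal sketch under stated assumptions: `validate_pitch` is a hypothetical helper name, and only the range, the default, and the integer requirement come from the description above.

```python
PITCH_MIN, PITCH_MAX = -12, 12  # documented range, inclusive
PITCH_DEFAULT = 0               # 0 keeps the original voice


def validate_pitch(value: int = PITCH_DEFAULT) -> int:
    """Return the value unchanged if it is an integer in [-12, 12]."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("intonation must be an integer")
    if not PITCH_MIN <= value <= PITCH_MAX:
        raise ValueError(f"intonation must be in [{PITCH_MIN}, {PITCH_MAX}]")
    return value
```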
Enables English text normalization, which can improve performance in number-reading scenarios but will slightly increase latency. Optional; if not provided, the default value is false.
The bitrate of the generated voice. Optional. Allowed values: 32000, 64000, 128000, 256000; default value is 128000. This parameter only takes effect for audio in mp3 format.
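The bitrate rules above can be sketched as a small client-side check. The function and argument names (`resolve_bitrate`, `audio_format`) are hypothetical; only the allowed values, the default, and the mp3-only behaviour come from the description.

```python
from typing import Optional

ALLOWED_BITRATES = (32000, 64000, 128000, 256000)
DEFAULT_BITRATE = 128000


def resolve_bitrate(audio_format: str, bitrate: Optional[int] = None) -> Optional[int]:
    """Return the bitrate that would take effect, or None if it is ignored.

    The bitrate only applies to mp3 output, so any other format yields None.
    """
    if bitrate is None:
        bitrate = DEFAULT_BITRATE
    if bitrate not in ALLOWED_BITRATES:
        raise ValueError(f"bitrate must be one of {ALLOWED_BITRATES}")
    return bitrate if audio_format == "mp3" else None
```

Returning `None` for other formats makes it explicit in calling code that the setting has no effect there, instead of silently carrying a value that is ignored.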
Pronunciation replacement for text and symbols that require special annotation (adjusting tones or substituting the pronunciation of other characters). Optional. Entries use the following format:
["燕少飞/(yan4)(shao3)(fei1)","达菲/(da2)(fei1)","omg/oh my god"]
Tones are represented by numbers: first tone (Yinping) is 1, second tone (Yangping) is 2, third tone (Shangsheng) is 3, fourth tone (Qusheng) is 4, and neutral tone is 5.
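Writing these entries by hand is error-prone, so they can be assembled programmatically. The sketch below is illustrative only: `tone_entry` and `alias_entry` are hypothetical helper names, and the entry layout is inferred from the example strings above.

```python
from typing import Iterable, Tuple

VALID_TONES = {1, 2, 3, 4, 5}  # 1-4 are the four tones, 5 is the neutral tone


def tone_entry(text: str, syllables: Iterable[Tuple[str, int]]) -> str:
    """Build an entry such as '达菲/(da2)(fei1)' from (pinyin, tone) pairs."""
    parts = []
    for pinyin, tone in syllables:
        if tone not in VALID_TONES:
            raise ValueError(f"tone must be 1-5, got {tone!r}")
        parts.append(f"({pinyin}{tone})")
    return f"{text}/{''.join(parts)}"


def alias_entry(text: str, spoken_as: str) -> str:
    """Build an entry that reads `text` aloud as `spoken_as`."""
    return f"{text}/{spoken_as}"


entries = [
    tone_entry("燕少飞", [("yan", 4), ("shao", 3), ("fei", 1)]),
    tone_entry("达菲", [("da", 2), ("fei", 1)]),
    alias_entry("omg", "oh my god"),
]
```

The resulting `entries` list reproduces the example strings from the format description.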
Enhances recognition capability for specified less common languages and dialects. After setting this, speech performance can be improved in the specified less common language/dialect scenario. If the less common language type is unclear, you can choose 'auto', and the model will determine the language type on its own. The following values are supported: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
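Because the accepted values are a fixed list, a client can validate the setting before submitting a task. This is a minimal sketch: `validate_language` is a hypothetical helper, and whether the service compares values case-sensitively is an assumption.

```python
SUPPORTED_LANGUAGES = frozenset([
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
])


def validate_language(value: str) -> str:
    """Return the value unchanged if it is one of the documented options.

    Note that 'Chinese,Yue' is a single value that contains a comma,
    so the input must not be split on commas.
    """
    if value not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language value: {value!r}")
    return value
```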