This API supports asynchronous text-to-speech generation. A single request accepts up to 1 million characters of text, and the complete generated audio can be retrieved asynchronously. You can choose from 100+ system voices and cloned voices, and independently adjust intonation, speech rate, volume, bitrate, sample rate, and output format. After you submit a long-text speech synthesis request, note that the returned URL is valid for 24 hours from the time it is returned, so download the result within that window.
This API is suited to speech generation from long texts, such as entire books. Tasks may wait in the queue for a relatively long time. For scenarios such as short sentence generation, voice chat, and online social interaction, we recommend using synchronous speech synthesis calls instead.
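The two limits described above (up to 1 million characters per request, and a URL valid for 24 hours from the time it is returned) can be checked on the client side before submitting text or scheduling a download. The following is a minimal sketch, not part of the API itself: the function names are hypothetical, and it assumes characters are counted the way Python's len() counts them, which the service may not do.

```python
from datetime import datetime, timedelta, timezone

MAX_CHARS_PER_REQUEST = 1_000_000   # "up to 1 million characters" per request
URL_VALIDITY = timedelta(hours=24)  # counted from the time the URL is returned


def fits_in_one_request(text: str) -> bool:
    # Assumption: one Python code point counts as one character.
    return len(text) <= MAX_CHARS_PER_REQUEST


def download_deadline(url_returned_at: datetime) -> datetime:
    # Latest moment at which the returned URL should still be usable.
    return url_returned_at + URL_VALIDITY


def url_still_valid(url_returned_at: datetime, now: datetime) -> bool:
    # Treat the URL as expired exactly at the deadline, to stay on the safe side.
    return now < download_deadline(url_returned_at)
```

Recording the timestamp at the moment the URL is received, for example with datetime.now(timezone.utc), lets a worker decide whether to download immediately or to resubmit the task.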
Enables English text normalization, which can improve results in number-reading scenarios but may slightly increase latency. If not provided, the default value is false.
The bitrate of the generated voice. Optional. Allowed values: 32000, 64000, 128000, 256000. Default value is 128000. This parameter only takes effect for audio in mp3 format.
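A client can validate the bitrate before sending a request. The sketch below encodes the rules stated above (four allowed values, a default of 128000, effective only for mp3); the function name and the audio_format argument are hypothetical and do not come from the API definition.

```python
ALLOWED_BITRATES = (32000, 64000, 128000, 256000)
DEFAULT_BITRATE = 128000


def resolve_bitrate(audio_format: str, bitrate=None):
    """Return the bitrate to send, or None when it would have no effect."""
    if audio_format != "mp3":
        # The bitrate only takes effect for mp3 output, so omit it otherwise.
        return None
    if bitrate is None:
        return DEFAULT_BITRATE
    if bitrate not in ALLOWED_BITRATES:
        raise ValueError(f"bitrate must be one of {ALLOWED_BITRATES}, got {bitrate}")
    return bitrate
```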
Defines replacements for text or symbols that need special phonetic annotation. Each entry either adjusts the tones of the original text or substitutes the pronunciation of other characters, in the following format: ["燕少飞/(yan4)(shao3)(fei1)","达菲/(da2)(fei1)","omg/oh my god"]. Tones are represented by numbers: first tone (yinping) is 1, second tone (yangping) is 2, third tone (shangsheng) is 3, fourth tone (qusheng) is 4, and neutral tone is 5.
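Entries in this format are easy to mistype by hand, so it can help to build them programmatically. The helpers below are a minimal sketch with hypothetical names; they only reproduce the string format shown above and check that each tone number is between 1 and 5.

```python
VALID_TONES = (1, 2, 3, 4, 5)  # 1-4 are the four tones, 5 is the neutral tone


def tone_entry(text: str, syllables) -> str:
    """Build an entry such as '达菲/(da2)(fei1)' from (pinyin, tone) pairs."""
    parts = []
    for pinyin, tone in syllables:
        if tone not in VALID_TONES:
            raise ValueError(f"tone must be 1-5, got {tone} for {pinyin!r}")
        parts.append(f"({pinyin}{tone})")
    return f"{text}/{''.join(parts)}"


def replacement_entry(text: str, spoken_as: str) -> str:
    """Build an entry such as 'omg/oh my god' that swaps in another reading."""
    return f"{text}/{spoken_as}"


entries = [
    tone_entry("燕少飞", [("yan", 4), ("shao", 3), ("fei", 1)]),
    tone_entry("达菲", [("da", 2), ("fei", 1)]),
    replacement_entry("omg", "oh my god"),
]
```

The resulting entries list matches the example array given in the format description.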
Enhances recognition of the specified low-resource languages and dialects. Setting this parameter can improve speech performance in the specified low-resource language or dialect. If you are unsure which low-resource language applies, select 'auto' and the model will determine it on its own. The following values are supported: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'
Intensity adjustment (powerful/soft), range [-100,100]. Values closer to -100 make the voice more forceful; values closer to 100 make the voice softer.
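Because the two ends of the range are easy to confuse, a small guard can make the direction explicit in client code. This is a sketch only: the function name is hypothetical, and it assumes nothing about the parameter beyond the stated range of [-100,100].

```python
INTENSITY_MIN = -100  # most forceful end of the range
INTENSITY_MAX = 100   # softest end of the range


def check_intensity(value):
    """Return value unchanged if it lies in [-100, 100], otherwise raise."""
    if not INTENSITY_MIN <= value <= INTENSITY_MAX:
        raise ValueError(
            f"intensity must be within [{INTENSITY_MIN}, {INTENSITY_MAX}], got {value}"
        )
    return value
```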