This API generates speech from text asynchronously. A single request can carry up to 1 million characters of text, and the complete generated audio can be retrieved asynchronously once the task finishes. It supports 100+ system voices as well as cloned voices, and lets you adjust pitch, speed, volume, bitrate, sample rate, and output format. After you submit a long-text speech synthesis request, note that the returned URL is valid for 24 hours from the moment it is returned. Download the audio within this window.
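The asynchronous flow described above (submit a request of at most 1 million characters, wait for the task, then download before the 24-hour URL window closes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the official client: `query_status` is a hypothetical caller-supplied function, the status strings "success" and "failed" are placeholders for whatever the real query endpoint returns, and counting characters with Python's `len()` is assumed to match how the service counts them.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable

MAX_CHARS = 1_000_000               # per-request limit stated in the description
URL_LIFETIME = timedelta(hours=24)  # download window stated in the description


def check_text_length(text: str) -> None:
    """Reject text longer than the per-request limit (len() counting is an assumption)."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"text has {len(text)} characters; limit is {MAX_CHARS}")


def url_expires_at(returned_at: datetime) -> datetime:
    """The audio URL stays valid for 24 hours from the moment it is returned."""
    return returned_at + URL_LIFETIME


def wait_for_task(query_status: Callable[[str], str], task_id: str,
                  interval: float = 5.0, timeout: float = 3600.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a hypothetical status function until the task succeeds or fails.

    The status strings "success" and "failed" are placeholders; map them to
    the values the real query endpoint returns.
    """
    waited = 0.0
    while waited <= timeout:
        status = query_status(task_id)
        if status == "success":
            return status
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed")
        sleep(interval)
        waited += interval
    raise TimeoutError(f"task {task_id} did not finish within {timeout} s")


if __name__ == "__main__":
    # Usage with a fake status source that finishes on the third poll.
    fake = iter(["processing", "processing", "success"])
    print(wait_for_task(lambda _id: next(fake), "demo-task", sleep=lambda s: None))
    print(url_expires_at(datetime.now(timezone.utc)))
```

Because queued long-text tasks can take a while, the sketch uses a generous default timeout and an injectable `sleep` so the polling logic can be exercised without real waiting.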
This API is suited to generating speech from long texts, such as entire books. Because tasks are queued, a task may take a relatively long time to complete. For short-sentence generation, voice chat, online social interaction, and similar scenarios, we recommend using synchronous speech synthesis instead.
Enables English text normalization, which can improve how numbers are read aloud but may slightly increase latency. Optional; defaults to false if not provided.
The bitrate of the generated audio. Optional; allowed values are 32000, 64000, 128000, and 256000, and the default is 128000. This parameter takes effect only for mp3 output.
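The bitrate rules above (a fixed set of allowed values, a default of 128000, and no effect outside mp3) can be checked client-side before a request is sent. The sketch below is a hypothetical helper, not part of the API; the function name is illustrative, and the choice to drop the bitrate for non-mp3 formats (rather than send it and let the service ignore it) is a design assumption.

```python
from typing import Optional

ALLOWED_BITRATES = (32000, 64000, 128000, 256000)
DEFAULT_BITRATE = 128000


def resolve_bitrate(audio_format: str, bitrate: Optional[int] = None) -> Optional[int]:
    """Return the bitrate to send, or None when it would have no effect.

    Bitrate only applies to mp3 output, so this sketch drops it for other formats.
    """
    if audio_format.lower() != "mp3":
        return None
    if bitrate is None:
        return DEFAULT_BITRATE
    if bitrate not in ALLOWED_BITRATES:
        raise ValueError(f"bitrate must be one of {ALLOWED_BITRATES}, got {bitrate}")
    return bitrate
```

For example, `resolve_bitrate("mp3")` yields the default of 128000, while an unsupported value such as 96000 is rejected before any request is made.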
Specifies text or symbols that need a special pronunciation, together with the pronunciation to use. Use it to adjust tones or to replace the pronunciation of other characters, in the following format: ["燕少飞/(yan4)(shao3)(fei1)","达菲/(da2)(fei1)","omg/oh my god"]. Tones are represented by numbers: first tone (Yinping) is 1, second tone (Yangping) is 2, third tone (Shangsheng) is 3, fourth tone (Qusheng) is 4, and the neutral tone is 5.
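Each entry in the list above has the form `text/replacement`, where the replacement is either a run of `(pinyin+tone)` groups or plain substitute text. A minimal parser for that format might look like this. It is a client-side sketch, not the service's own parser, and it assumes that the text part contains no `/`, that pinyin syllables are lowercase ASCII letters (with `v` standing in for ü), and that only tone digits 1 to 5 are valid.

```python
import re
from typing import List, Tuple, Union

# A whole replacement made only of (pinyin+tone) groups, e.g. "(yan4)(shao3)(fei1)".
_TONE_RUN = re.compile(r"(?:\([a-z]+[1-5]\))+")
# One group: syllable letters, then a tone digit (1-4 = four tones, 5 = neutral).
_SYLLABLE = re.compile(r"\(([a-z]+)([1-5])\)")

Replacement = Union[str, List[Tuple[str, int]]]


def parse_entry(entry: str) -> Tuple[str, Replacement]:
    """Split 'text/replacement' and decode tone annotations when present."""
    text, sep, replacement = entry.partition("/")  # split on the first "/" only
    if not sep or not text or not replacement:
        raise ValueError(f"expected 'text/replacement', got {entry!r}")
    if replacement.startswith("("):
        # Looks like a tone annotation, so every group must be well-formed.
        if not _TONE_RUN.fullmatch(replacement):
            raise ValueError(f"malformed tone annotation in {entry!r}")
        return text, [(syl, int(tone)) for syl, tone in _SYLLABLE.findall(replacement)]
    # Otherwise the replacement is plain substitute text, e.g. "omg/oh my god".
    return text, replacement
```

With this sketch, `parse_entry("达菲/(da2)(fei1)")` gives `("达菲", [("da", 2), ("fei", 1)])`, and a typo such as `(yan6)` is reported instead of being silently treated as plain text.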
Enhances recognition of a specified language or dialect, which is mainly useful for less common (low-resource) languages and dialects. Setting this parameter can improve speech performance for the specified language or dialect. If you are unsure which language applies, choose 'auto' and the model will determine it automatically. Supported values: 'Chinese', 'Chinese,Yue', 'English', 'Arabic', 'Russian', 'Spanish', 'French', 'Portuguese', 'German', 'Turkish', 'Dutch', 'Ukrainian', 'Vietnamese', 'Indonesian', 'Japanese', 'Italian', 'Korean', 'Thai', 'Polish', 'Romanian', 'Greek', 'Czech', 'Finnish', 'Hindi', 'Bulgarian', 'Danish', 'Hebrew', 'Malay', 'Persian', 'Slovak', 'Swedish', 'Croatian', 'Filipino', 'Hungarian', 'Norwegian', 'Slovenian', 'Catalan', 'Nynorsk', 'Tamil', 'Afrikaans', 'auto'.
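Because `'Chinese,Yue'` is a single value that itself contains a comma, a client should compare the whole string against the supported list rather than split it on commas. The sketch below does exactly that; the helper name is hypothetical, and exact, case-sensitive matching is an assumption about how the service validates the value.

```python
SUPPORTED_LANGUAGES = frozenset({
    "Chinese", "Chinese,Yue", "English", "Arabic", "Russian", "Spanish",
    "French", "Portuguese", "German", "Turkish", "Dutch", "Ukrainian",
    "Vietnamese", "Indonesian", "Japanese", "Italian", "Korean", "Thai",
    "Polish", "Romanian", "Greek", "Czech", "Finnish", "Hindi", "Bulgarian",
    "Danish", "Hebrew", "Malay", "Persian", "Slovak", "Swedish", "Croatian",
    "Filipino", "Hungarian", "Norwegian", "Slovenian", "Catalan", "Nynorsk",
    "Tamil", "Afrikaans", "auto",
})


def check_language(value: str) -> str:
    """Accept only an exact value from the supported list (no comma splitting)."""
    if value not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language value: {value!r}")
    return value
```

Passing `"auto"` leaves language detection to the model, as described above.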
Adjusts intensity (forceful vs. soft). Range: [-100, 100]. Values closer to -100 make the voice more forceful; values closer to 100 make it softer.
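A client can reject out-of-range intensity values before sending a request. This is a minimal sketch: the helper name is hypothetical, and treating intensity as an integer is an assumption, since the description gives only the range [-100, 100].

```python
INTENSITY_MIN, INTENSITY_MAX = -100, 100  # -100 = most forceful, 100 = softest


def check_intensity(value: int) -> int:
    """Validate an intensity value (integer type is an assumption)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("intensity must be an integer")
    if not INTENSITY_MIN <= value <= INTENSITY_MAX:
        raise ValueError(f"intensity must be in [{INTENSITY_MIN}, {INTENSITY_MAX}], got {value}")
    return value
```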