Gemini 2.5 Flash TTS Text-to-Speech
curl --request POST \
--url https://api.highwayapi.ai/v3/gemini-2.5-flash-tts \
--header 'Authorization: <authorization>' \
--header 'Content-Type: <content-type>' \
--data '
{
"contents": {
"role": "<string>",
"parts": {
"text": "<string>"
}
},
"generation_config": {
"temperature": 123,
"speech_config": {
"voice_config": {
"prebuilt_voice_config": {
"voice_name": "<string>"
}
},
"language_code": "<string>",
"multi_speaker_voice_config": {
"speaker_voice_configs": [
{
"speaker": "<string>",
"voice_config": {
"prebuilt_voice_config": {
"voice_name": "<string>"
}
}
}
]
}
}
}
}
'{
"audioContent": "<string>",
"usageMetadata": {
"totalTokenCount": 123,
"promptTokenCount": 123,
"candidatesTokenCount": 123
}
}音频
Gemini 2.5 Flash TTS Text-to-Speech
POST
/
v3
/
gemini-2.5-flash-tts
Gemini 2.5 Flash TTS Text-to-Speech
curl --request POST \
--url https://api.highwayapi.ai/v3/gemini-2.5-flash-tts \
--header 'Authorization: <authorization>' \
--header 'Content-Type: <content-type>' \
--data '
{
"contents": {
"role": "<string>",
"parts": {
"text": "<string>"
}
},
"generation_config": {
"temperature": 123,
"speech_config": {
"voice_config": {
"prebuilt_voice_config": {
"voice_name": "<string>"
}
},
"language_code": "<string>",
"multi_speaker_voice_config": {
"speaker_voice_configs": [
{
"speaker": "<string>",
"voice_config": {
"prebuilt_voice_config": {
"voice_name": "<string>"
}
}
}
]
}
}
}
}
'{
"audioContent": "<string>",
"usageMetadata": {
"totalTokenCount": 123,
"promptTokenCount": 123,
"candidatesTokenCount": 123
}
}基于 Vertex AI generateContent 接口将文本转换为语音。请求体格式与官方 Vertex AI API 完全一致。支持同步(单请求单响应)和流式(单请求流式响应)两种模式。输出为 LINEAR16 PCM 格式(24kHz, 单声道, 16-bit signed little-endian),不包含 WAV 头。
请求头
枚举值:
application/jsonBearer 身份验证格式: Bearer {{API 密钥}}。
请求体
隐藏 properties
隐藏 properties
角色,固定为 user可选值:
user隐藏 properties
隐藏 properties
要合成为语音的文本内容。Vertex AI API 将提示词和文本合并在一个字段中,格式为 ’: ‘,例如 ‘Say the following in a curious way: OK, so… tell me about this AI thing.’。总大小最多 8000 字节,超出 655 秒的音频将被截断。支持内联标记标签:[sigh]、[laughing]、[uhm]、[sarcasm]、[robotic]、[shouting]、[whispering]、[extremely fast]、[short pause]、[medium pause]、[long pause]长度限制:0 - 8000
隐藏 properties
隐藏 properties
温度参数,控制语音生成的随机性和创造性。值越高越有创意和多样性,值越低越可预测和集中。有效范围 (0.0, 2.0],推荐值为 2.0取值范围:[0, 2]
隐藏 properties
隐藏 properties
单人语音配置。与 multi_speaker_voice_config 二选一
隐藏 properties
隐藏 properties
隐藏 properties
隐藏 properties
预置语音名称(大小写不敏感)。可选的 30 个语音(男女声均有)可选值:
Achernar, Achird, Algenib, Algieba, Alnilam, Aoede, Autonoe, Callirrhoe, Charon, Despina, Enceladus, Erinome, Fenrir, Gacrux, Iapetus, Kore, Laomedeia, Leda, Orus, Pulcherrima, Puck, Rasalgethi, Sadachbia, Sadaltager, Schedar, Sulafat, Umbriel, Vindemiatrix, Zephyr, Zubenelgenubi语言代码(BCP-47 格式,大小写不敏感)。GA 语言:ar-EG, bn-BD, nl-NL, en-IN, en-US, fr-FR, de-DE, hi-IN, id-ID, it-IT, ja-JP, ko-KR, mr-IN, pl-PL, pt-BR, ro-RO, ru-RU, es-ES, ta-IN, te-IN, th-TH, tr-TR, uk-UA, vi-VN。Preview 语言包括 cmn-CN(中文普通话)等 63 种可选值:
af-ZA, am-ET, ar-001, ar-EG, az-AZ, be-BY, bg-BG, bn-BD, ca-ES, ceb-PH, cmn-CN, cmn-TW, cs-CZ, da-DK, de-DE, el-GR, en-AU, en-GB, en-IN, en-US, es-419, es-ES, es-MX, et-EE, eu-ES, fa-IR, fi-FI, fil-PH, fr-CA, fr-FR, gl-ES, gu-IN, he-IL, hi-IN, hr-HR, ht-HT, hu-HU, hy-AM, id-ID, is-IS, it-IT, ja-JP, jv-JV, ka-GE, kn-IN, ko-KR, kok-IN, la-VA, lb-LU, lo-LA, lt-LT, lv-LV, mai-IN, mg-MG, mk-MK, ml-IN, mn-MN, mr-IN, ms-MY, my-MM, nb-NO, ne-NP, nl-NL, nn-NO, or-IN, pa-IN, pl-PL, ps-AF, pt-BR, pt-PT, ro-RO, ru-RU, sd-IN, si-LK, sk-SK, sl-SI, sq-AL, sr-RS, sv-SE, sw-KE, ta-IN, te-IN, th-TH, tr-TR, uk-UA, ur-PK, vi-VN多人语音配置。与 voice_config 二选一。注意:gemini-2.5-flash-lite-preview-tts 不支持多人合成
隐藏 properties
隐藏 properties
说话人语音配置列表
隐藏 properties
隐藏 properties
说话人别名,必须仅由字母数字字符组成,不含空格。需与 contents.parts.text 中的说话人标识一致
隐藏 properties
隐藏 properties
隐藏 properties
隐藏 properties
预置语音名称(大小写不敏感)。可选的 30 个语音(男女声均有)可选值:
Achernar, Achird, Algenib, Algieba, Alnilam, Aoede, Autonoe, Callirrhoe, Charon, Despina, Enceladus, Erinome, Fenrir, Gacrux, Iapetus, Kore, Laomedeia, Leda, Orus, Pulcherrima, Puck, Rasalgethi, Sadachbia, Sadaltager, Schedar, Sulafat, Umbriel, Vindemiatrix, Zephyr, Zubenelgenubi响应信息
Base64 编码的音频内容。格式为 LINEAR16 PCM(24kHz, 单声道, 16-bit signed little-endian),不包含 WAV 头。客户端可使用 ffmpeg 转换:ffmpeg -f s16le -ar 24k -ac 1 -i input.raw output.wav
⌘I