Skip to main content
POST
/
v3
/
gemini-2.5-flash-tts
Gemini 2.5 Flash TTS Text-to-Speech
curl --request POST \
  --url https://api.highwayapi.ai/v3/gemini-2.5-flash-tts \
  --header 'Authorization: <authorization>' \
  --header 'Content-Type: <content-type>' \
  --data '
{
  "contents": {
    "role": "<string>",
    "parts": {
      "text": "<string>"
    }
  },
  "generation_config": {
    "temperature": 123,
    "speech_config": {
      "voice_config": {
        "prebuilt_voice_config": {
          "voice_name": "<string>"
        }
      },
      "language_code": "<string>",
      "multi_speaker_voice_config": {
        "speaker_voice_configs": [
          {
            "speaker": "<string>",
            "voice_config": {
              "prebuilt_voice_config": {
                "voice_name": "<string>"
              }
            }
          }
        ]
      }
    }
  }
}
'
{
  "audioContent": "<string>",
  "usageMetadata": {
    "totalTokenCount": 123,
    "promptTokenCount": 123,
    "candidatesTokenCount": 123
  }
}
Convert text to speech based on the Vertex AI generateContent API. The request body format is fully consistent with the official Vertex AI API. Both synchronous (single request, single response) and streaming (single request, streaming response) modes are supported. The output is in LINEAR16 PCM format (24kHz, mono, 16-bit signed little-endian) and does not include a WAV header.

Request Headers

Content-Type
string
required
Enum value: application/json
Authorization
string
required
Bearer authentication format: Bearer {{API Key}}.

Request Body

contents
object
required
generation_config
object
required

Response Information

audioContent
string
Base64-encoded audio content. The format is LINEAR16 PCM (24kHz, mono, 16-bit signed little-endian) and does not include a WAV header. The client can use ffmpeg to convert it: ffmpeg -f s16le -ar 24k -ac 1 -i input.raw output.wav
usageMetadata
object