Gemini 2.5 Flash TTS Text-to-Speech
Audio
Gemini 2.5 Flash TTS Text-to-Speech
POST
Gemini 2.5 Flash TTS Text-to-Speech
Convert text to speech based on the Vertex AI generateContent API. The request body format is fully consistent with the official Vertex AI API. Both synchronous (single request, single response) and streaming (single request, streaming response) modes are supported. The output is in LINEAR16 PCM format (24kHz, mono, 16-bit signed little-endian) and does not include a WAV header.
Request Headers
Enum value:
application/jsonBearer authentication format: Bearer {{API Key}}.
Request Body
Response Information
Base64-encoded audio content. The format is LINEAR16 PCM (24kHz, mono, 16-bit signed little-endian) and does not include a WAV header. The client can use ffmpeg to convert it: ffmpeg -f s16le -ar 24k -ac 1 -i input.raw output.wav