ElevenLabs Text to Speech Flash V2.5

curl --request POST \
  --url https://api.highwayapi.ai/v3/elevenlabs-tts-flash-v2.5 \
  --header 'Authorization: Bearer <API Key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "seed": 123,
  "text": "<string>",
  "stream": true,
  "voice_id": "<string>",
  "next_text": "<string>",
  "language_code": "<string>",
  "output_format": "<string>",
  "previous_text": "<string>",
  "use_pvc_as_ivc": true,
  "voice_settings": {
    "speed": 123,
    "style": 123,
    "stability": 123,
    "similarity_boost": 123,
    "use_speaker_boost": true
  },
  "next_request_ids": [
    {}
  ],
  "previous_request_ids": [
    {}
  ],
  "apply_text_normalization": "<string>",
  "apply_language_text_normalization": true,
  "pronunciation_dictionary_locators": [
    {
      "version_id": "<string>",
      "pronunciation_dictionary_id": "<string>"
    }
  ]
}
'

POST

elevenlabs-tts-flash-v2.5

Convert text to speech using the voice of your choice and return audio.

Request Headers

Content-Type

string

required

Enum value: application/json

Authorization

string

required

Bearer authentication format: Bearer {{API Key}}.

Request Body

seed

integer

If specified, the system will try to sample deterministically. Repeated requests with the same seed and parameters should return the same result, but full determinism is not guaranteed. Value range: [0, 4294967295]
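As a minimal sketch of the range above, a client could reject out-of-range seeds before sending a request. The helper name `check_seed` is hypothetical and not part of the API.

```python
MAX_SEED = 4294967295  # upper bound of the documented range (2**32 - 1)


def check_seed(seed: int) -> int:
    """Return the seed unchanged if it lies in the documented range, else raise."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be in [0, {MAX_SEED}]")
    return seed
```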

text

string

required

The text to convert to speech.

stream

boolean

Whether to enable streaming mode.

voice_id

string

required

The voice ID to use.

next_text

string

The text after the text in the current request. Used to improve speech continuity when stitching together multiple generations.

language_code

string

The language code (ISO 639-1) used for the model and text normalization. If the model does not support this language code, an error will be returned.

output_format

string

default:"mp3_44100_128"

The output format of the generated audio. The format is codec_sample_rate_bitrate. A 192 kbps bitrate for MP3 requires a Creator account or above; a 44.1 kHz sample rate for PCM requires a Pro account or above. Optional values: mp3_22050_32, mp3_24000_48, mp3_44100_32, mp3_44100_64, mp3_44100_96, mp3_44100_128, mp3_44100_192, pcm_8000, pcm_16000, pcm_22050, pcm_24000, pcm_32000, pcm_44100, pcm_48000, ulaw_8000, alaw_8000, opus_48000_32, opus_48000_64, opus_48000_96, opus_48000_128, opus_48000_192
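The codec_sample_rate_bitrate naming can be split into its parts on the client side, for example to pick a file extension or to configure a PCM player. The sketch below is illustrative only: `OutputFormat` and `parse_output_format` are hypothetical names. It assumes the sample rate is in Hz and the bitrate in kbps, which matches the 44.1 kHz and 192 kbps examples above. Values such as pcm_16000 carry no bitrate component.

```python
from typing import NamedTuple, Optional


class OutputFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None for formats without a bitrate part


def parse_output_format(value: str) -> OutputFormat:
    """Split an output_format string such as 'mp3_44100_128' into its parts."""
    parts = value.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected output_format: {value!r}")
    codec = parts[0]
    sample_rate_hz = int(parts[1])
    bitrate_kbps = int(parts[2]) if len(parts) == 3 else None
    return OutputFormat(codec, sample_rate_hz, bitrate_kbps)
```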

previous_text

string

The text before the text in the current request. Used to improve speech continuity when stitching together multiple generations.
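When a long text is split into several requests, previous_text and next_text can carry the neighbouring chunks. The following sketch builds one request body per chunk. The function name `build_stitched_payloads` is hypothetical, and passing only the directly adjacent chunk as context is an assumption, since the reference does not say how much surrounding text is useful.

```python
from typing import Dict, List


def build_stitched_payloads(chunks: List[str], voice_id: str) -> List[Dict[str, str]]:
    """Build one request body per chunk, attaching neighbouring text as context."""
    payloads = []
    for i, chunk in enumerate(chunks):
        payload = {"text": chunk, "voice_id": voice_id}
        if i > 0:
            # Text spoken immediately before this chunk.
            payload["previous_text"] = chunks[i - 1]
        if i < len(chunks) - 1:
            # Text that will be spoken immediately after this chunk.
            payload["next_text"] = chunks[i + 1]
        payloads.append(payload)
    return payloads
```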

use_pvc_as_ivc

boolean

default:false

If true, use the IVC (Instant Voice Clone) version of the voice instead of the PVC (Professional Voice Clone) version. This is a temporary workaround for the higher latency of the PVC version.

voice_settings

object

Properties of voice_settings:

speed

number

default:1

Adjusts the speed of the voice. 1.0 is the default speed; values below 1.0 slow the speech down, while values above 1.0 speed it up.

style

number

default:0

Determines how exaggerated the voice style is. Attempts to amplify the style of the original speaker. Setting this to a non-zero value consumes more compute resources and may increase latency.

stability

number

Determines the stability of speech generation and the randomness between generations. Lower values produce a wider emotional range, while higher values may make the voice sound monotonous.

similarity_boost

number

Determines how closely the AI tries to replicate the original voice.

use_speaker_boost

boolean

default:true

Enhances similarity to the original speaker. Requires slightly higher compute load and increases latency.
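A voice_settings object could be assembled as sketched below. The function name `build_voice_settings` is hypothetical. The defaults for speed, style, and use_speaker_boost are the ones listed above. stability and similarity_boost have no documented default here, so the sketch leaves them out unless the caller sets them.

```python
from typing import Any, Dict, Optional


def build_voice_settings(
    speed: float = 1.0,
    style: float = 0.0,
    use_speaker_boost: bool = True,
    stability: Optional[float] = None,
    similarity_boost: Optional[float] = None,
) -> Dict[str, Any]:
    """Return a voice_settings object; unset optional fields are omitted."""
    settings: Dict[str, Any] = {
        "speed": speed,
        "style": style,
        "use_speaker_boost": use_speaker_boost,
    }
    if stability is not None:
        settings["stability"] = stability
    if similarity_boost is not None:
        settings["similarity_boost"] = similarity_boost
    return settings
```

For latency-sensitive use, the descriptions above suggest keeping style at 0 and considering use_speaker_boost=False, since both options are documented as adding latency.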

next_request_ids

array

A list of request_id values for subsequent samples. Used to maintain speech continuity when regenerating samples. Up to 3 request_id values can be provided. Array length: 0 - 3

previous_request_ids

array

A list of request_id values for samples generated before the current generation. Can be used to improve speech continuity. Up to 3 request_id values can be provided. Array length: 0 - 3
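Because at most 3 request_id values are accepted, a client that generates many samples in sequence needs to trim its history. The sketch below keeps only the most recent IDs. The name `last_request_ids` is hypothetical, and how a request_id is obtained from a response is not described in this reference, so the sketch assumes the caller already has the IDs as strings.

```python
from typing import List

MAX_REQUEST_IDS = 3  # documented array length limit


def last_request_ids(history: List[str], limit: int = MAX_REQUEST_IDS) -> List[str]:
    """Return the most recent request IDs, oldest first, capped at `limit`."""
    if limit <= 0:
        return []
    return history[-limit:]


# Example: attach the trimmed history to the next request body.
history = ["req-1", "req-2", "req-3", "req-4"]  # placeholder IDs
payload = {
    "text": "<string>",
    "voice_id": "<string>",
    "previous_request_ids": last_request_ids(history),
}
```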

apply_text_normalization

string

default:"auto"

Controls text normalization. ‘auto’ lets the system decide, ‘on’ always normalizes, and ‘off’ skips normalization. Optional values: auto, on, off

apply_language_text_normalization

boolean

default:false

Controls language-specific text normalization for certain supported languages to achieve more natural pronunciation. Warning: this may significantly increase latency. Currently only Japanese is supported.

pronunciation_dictionary_locators

array

A list of pronunciation dictionary locators (id, version_id) to apply to the text. They take effect in order. Each request can include up to 3 locators. Array length: 0 - 3

Properties of each locator:

version_id

string

The ID of the pronunciation dictionary version. If not specified, the latest version is used.

pronunciation_dictionary_id

string

required

The ID of the pronunciation dictionary.
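The locator list could be built as in the sketch below, which preserves order (locators take effect in order), enforces the limit of 3, and omits version_id when the latest version is wanted. The function name `build_locators` and the tuple-based input are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

MAX_LOCATORS = 3  # documented array length limit


def build_locators(
    dictionaries: List[Tuple[str, Optional[str]]]
) -> List[Dict[str, str]]:
    """Turn (dictionary_id, version_id or None) pairs into locator objects."""
    if len(dictionaries) > MAX_LOCATORS:
        raise ValueError(f"at most {MAX_LOCATORS} locators per request")
    locators = []
    for dictionary_id, version_id in dictionaries:
        locator = {"pronunciation_dictionary_id": dictionary_id}
        if version_id is not None:
            # Omitting version_id selects the latest version.
            locator["version_id"] = version_id
        locators.append(locator)
    return locators
```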

Response Information

Generated audio file. Format: binary
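Since the response body is binary audio, a client writes it to a file rather than decoding it as JSON. The sketch below uses only the Python standard library. The helper names `build_request` and `synthesize_to_file` are hypothetical, and reading the body in fixed-size chunks is an assumption meant to work whether or not stream is enabled. Error handling is left out for brevity.

```python
import json
import urllib.request
from typing import Any, Dict

API_URL = "https://api.highwayapi.ai/v3/elevenlabs-tts-flash-v2.5"


def build_request(api_key: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST request with the two required headers."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(
    api_key: str, payload: Dict[str, Any], path: str, chunk_size: int = 8192
) -> int:
    """Send the request, write the audio bytes to `path`, return bytes written."""
    written = 0
    request = build_request(api_key, payload)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

A call such as `synthesize_to_file(api_key, {"text": "Hello", "voice_id": "<string>"}, "hello.mp3")` would rely on the default output_format of mp3_44100_128, which is why the example path ends in .mp3.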
