Large Language Model Monitoring

Metrics Description

All metrics below are broken down by model and sampled once per minute. Depending on the time range you select, the chart may not show a sample point for every minute. In that case, the per-minute samples that fall within each displayed interval are averaged and shown as a single point.
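The averaging described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the function name `downsample`, the `(minute, value)` input shape, and the fixed bucket width are all assumptions made for the example.

```python
from statistics import mean


def downsample(samples, bucket_minutes):
    """Average per-minute samples into wider display buckets.

    samples: list of (minute_index, value) pairs, one per sampled minute.
    bucket_minutes: hypothetical width of one displayed interval, in minutes.
    Returns a list of (bucket_start_minute, average_value), sorted by time.
    """
    buckets = {}
    for minute, value in samples:
        # Align each sample to the start of the bucket it falls into.
        start = (minute // bucket_minutes) * bucket_minutes
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Three samples fall in the 0-4 minute bucket, one in the 5-9 minute bucket.
points = downsample([(0, 10), (1, 20), (2, 30), (5, 40)], bucket_minutes=5)
```

With a 5-minute bucket, the four per-minute samples collapse into two displayed points, each the mean of the samples in its bucket.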

Requests Per Minute (RPM)

Displays the number of API requests made per minute, helping you understand usage patterns and API concurrency levels.

Request Success Rate

Displays the percentage of API requests in each minute that returned a successful response (any non-5xx status code), reflecting API availability.
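As a rough sketch of the success-rate definition above, the following counts every non-5xx status code as a success. The function name `success_rate` and the choice to return `None` for a minute with no requests are assumptions for illustration; how the dashboard treats empty minutes is not stated here.

```python
def success_rate(status_codes):
    """Percentage of responses in one minute that are not 5xx.

    status_codes: list of HTTP status codes observed during that minute.
    Returns None when there were no requests (an assumption for this sketch).
    """
    if not status_codes:
        return None
    # 4xx responses such as 429 still count as successful under this definition.
    ok = sum(1 for code in status_codes if not 500 <= code <= 599)
    return 100.0 * ok / len(status_codes)


rate = success_rate([200, 200, 429, 503])
```

Note that under this definition a client-side error such as 429 does not lower the success rate; only server-side (5xx) failures do.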

Average Number of Tokens per Request

Displays the average number of input tokens and output tokens per request, calculated for each minute, helping you understand token consumption patterns.

End-to-End (E2E) Latency

Displays the total time the model takes to generate a complete response, calculated over the requests in each minute. Includes the 99th percentile, 95th percentile, and average latency.
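To make the three latency figures concrete, here is a minimal sketch that summarizes one minute of request latencies. It uses the nearest-rank percentile method, which is an assumption: the exact percentile method used by the dashboard is not specified. The names `percentile` and `latency_summary` are hypothetical.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(latencies_ms):
    """Summarize the latencies (in milliseconds) of requests from one minute."""
    return {
        "p99": percentile(latencies_ms, 99),
        "p95": percentile(latencies_ms, 95),
        "avg": mean(latencies_ms),
    }


# One slow request (400 ms) dominates the tail but only shifts the average partway.
summary = latency_summary([120, 80, 100, 400, 90])
```

The percentiles expose slow outliers that the average smooths over, which is why the tail values are worth reading alongside the mean.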

Time to First Token (TTFT)

This metric is tracked only for streaming requests with the stream=true parameter enabled.

Displays the time required to process the prompt and generate the first output token, calculated over the requests in each minute. Includes the 99th percentile, 95th percentile, and average latency.

Time per Output Token (TPOT)

This metric is tracked only for streaming requests with the stream=true parameter enabled.

Displays the average time between consecutive output tokens, calculated over the requests in each minute. Includes the 99th percentile, 95th percentile, and average latency.
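For a single streaming request, TTFT and TPOT can be derived from the time the request was sent and the arrival time of each output token. The sketch below follows the descriptions above under stated assumptions: the function name `ttft_and_tpot` is hypothetical, timestamps are in seconds, and TPOT is taken as the span from first to last token divided by the number of gaps between tokens. The platform's exact formula may differ.

```python
def ttft_and_tpot(request_sent, token_times):
    """Compute (TTFT, TPOT) in seconds for one streaming request.

    request_sent: timestamp at which the request was sent.
    token_times: arrival timestamps of each output token, in order.
    TPOT is None when fewer than two tokens arrived, since there is no gap to measure.
    """
    if not token_times:
        return None, None
    # Time to first token: prompt processing plus generation of the first token.
    ttft = token_times[0] - request_sent
    if len(token_times) < 2:
        return ttft, None
    # Average gap between consecutive tokens after the first one.
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot


ttft, tpot = ttft_and_tpot(0.0, [0.5, 0.75, 1.0, 1.25])
```

Because both values depend on per-token arrival times, they can only be measured when the response is streamed, which is consistent with the stream=true requirement noted above.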
