Metrics Description
All metrics below are broken down by model and sampled once per minute. Depending on the time range you select, the chart may not display a sample point for every minute. In that case, the per-minute samples within each displayed interval are averaged into a single point.
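The averaging described above can be sketched as follows. This is a minimal illustration, not the console's implementation: the function name `downsample`, the alignment of buckets to multiples of `bucket_minutes`, and the choice to skip minutes with no sample (rather than count them as zero) are all assumptions.

```python
from statistics import mean


def downsample(samples: list[tuple[int, float]], bucket_minutes: int) -> list[tuple[int, float]]:
    """Average per-minute samples into display points covering `bucket_minutes` minutes each.

    `samples` holds (minute_index, value) pairs (hypothetical format). Each output point
    is keyed by the first minute of its bucket; minutes with no sample are skipped.
    """
    buckets: dict[int, list[float]] = {}
    for minute, value in samples:
        start = minute - minute % bucket_minutes  # align to the bucket boundary
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


per_minute_rpm = [(0, 10.0), (1, 14.0), (2, 12.0), (3, 20.0), (4, 30.0)]
print(downsample(per_minute_rpm, 2))  # [(0, 12.0), (2, 16.0), (4, 30.0)]
```

One consequence worth keeping in mind: if percentile series such as P99 are averaged this way, a displayed point is the mean of several per-minute P99 values, which is generally not the same as the P99 over all requests in that interval.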
- Requests Per Minute (RPM): Displays the number of API requests made per minute, helping you understand usage patterns and API concurrency levels.
- Request Success Rate: Displays the percentage of API requests per minute that received a successful response (any status code other than 5xx), reflecting API availability.
- Average Number of Tokens per Request: Displays the average number of input tokens and output tokens per request in each minute, helping you understand token consumption patterns.
- End-to-End (E2E) Latency: Displays the total time the model takes to generate a complete response, for the requests in each minute. Includes the 99th percentile, 95th percentile, and average latency.
- Time to First Token (TTFT): Displays the time required to process the prompt and generate the first output token, for the requests in each minute. Includes the 99th percentile, 95th percentile, and average latency. This metric is tracked only for streaming requests, that is, requests with the stream=true parameter enabled.
- Time per Output Token (TPOT): Displays the average time between consecutive output tokens, for the requests in each minute. Includes the 99th percentile, 95th percentile, and average latency. This metric is tracked only for streaming requests, that is, requests with the stream=true parameter enabled.
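To make the request-count metrics concrete, the sketch below derives RPM, success rate, and average token counts for one model and one minute from a list of per-request records. The `RequestRecord` fields are hypothetical, and averaging tokens over all requests (including failed ones) is an assumption; the platform may compute it differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestRecord:
    # Hypothetical per-request log entry; field names are illustrative only.
    model: str
    minute: int        # index of the minute the request belongs to
    status_code: int   # HTTP status code of the API response
    input_tokens: int
    output_tokens: int


@dataclass
class MinuteSummary:
    rpm: int
    success_rate: Optional[float]  # percent; None when there were no requests
    avg_input_tokens: Optional[float]
    avg_output_tokens: Optional[float]


def summarize(records: list[RequestRecord], model: str, minute: int) -> MinuteSummary:
    """Aggregate one model's requests in one minute into RPM, success rate and token averages."""
    batch = [r for r in records if r.model == model and r.minute == minute]
    n = len(batch)
    if n == 0:
        return MinuteSummary(rpm=0, success_rate=None, avg_input_tokens=None, avg_output_tokens=None)
    # Success means any status code outside the 5xx range, so a 429 still counts as a success.
    ok = sum(1 for r in batch if not 500 <= r.status_code <= 599)
    return MinuteSummary(
        rpm=n,
        success_rate=100.0 * ok / n,
        avg_input_tokens=sum(r.input_tokens for r in batch) / n,
        avg_output_tokens=sum(r.output_tokens for r in batch) / n,
    )


records = [
    RequestRecord("model-a", 0, 200, 100, 50),
    RequestRecord("model-a", 0, 429, 80, 0),
    RequestRecord("model-a", 0, 503, 120, 0),
    RequestRecord("model-a", 0, 200, 100, 70),
    RequestRecord("model-b", 0, 200, 10, 10),
]
print(summarize(records, "model-a", 0))
# MinuteSummary(rpm=4, success_rate=75.0, avg_input_tokens=100.0, avg_output_tokens=30.0)
```

Because the success criterion excludes only 5xx codes, client-side errors such as rate limiting (429) do not lower the success rate; only server-side failures do.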
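The latency metrics can be illustrated with a similar sketch for a single streaming request, followed by the per-minute P99/P95/average aggregation. Several details here are assumptions rather than documented behavior: the timestamp inputs (`start`, `token_times`, `end`) are hypothetical, TPOT is computed per request and then aggregated, and percentiles use the nearest-rank definition, which is only one of several common methods.

```python
import math
from statistics import mean
from typing import Optional


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def stream_latencies(start: float, token_times: list[float], end: float) -> tuple[float, float, Optional[float]]:
    """Return (E2E, TTFT, TPOT) for one streaming request, in the units of the inputs.

    start: when the request arrived; token_times: arrival time of each output token;
    end: when the complete response finished. All three inputs are hypothetical.
    """
    e2e = end - start
    ttft = token_times[0] - start
    # The mean of consecutive gaps telescopes to (last - first) / (n - 1).
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1) if len(token_times) > 1 else None
    return e2e, ttft, tpot


def latency_summary(values: list[float]) -> dict[str, float]:
    """P99, P95 and average over one minute's per-request latencies."""
    return {"p99": percentile(values, 99), "p95": percentile(values, 95), "avg": mean(values)}


# Times in milliseconds.
print(stream_latencies(start=0, token_times=[200, 250, 300, 350], end=360))  # (360, 200, 50.0)

ttfts_ms = list(range(100, 2100, 100))  # TTFT of 20 requests in one minute
print(latency_summary(ttfts_ms))  # {'p99': 2000, 'p95': 1900, 'avg': 1050}
```

TTFT is measured up to the first token while TPOT covers only the gaps after it, so a long prompt mainly raises TTFT, whereas slow decoding mainly raises TPOT. Both depend on per-token timestamps, which is why they are only available for streaming requests.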