Understanding Rate Limits
Rate limits define how many API requests can be made within a specific period of time. They help optimize API usage:
- Prevent API abuse and misuse
- Ensure fair resource allocation
- Maintain API performance and reliability
- Protect service stability
Default Rate Limits
Each account has default rate limits when calling models, measured in RPM (requests per model per minute) and TPM (tokens per model per minute). Rate limits vary by account tier. The table below lists the eligibility criteria for each tier.
| Quota Tier | Eligibility (USD) |
|---|---|
| T1 | Highest total top-up amount in a single month over the last 3 calendar months < $50 |
| T2 | $50 ≤ Highest total top-up amount in a single month over the last 3 calendar months < $500 |
| T3 | $500 ≤ Highest total top-up amount in a single month over the last 3 calendar months < $3000 |
| T4 | $3000 ≤ Highest total top-up amount in a single month over the last 3 calendar months < $10000 |
| T5 | $10000 ≤ Highest total top-up amount in a single month over the last 3 calendar months |
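
As an illustration of how the table reads, the sketch below maps monthly top-up totals to a tier. The function name `quota_tier` and the input format (one total per calendar month, in USD) are assumptions made for this example and are not part of the API.

```python
def quota_tier(monthly_top_ups_usd):
    """Return the quota tier for the given monthly top-up totals (USD).

    monthly_top_ups_usd: total top-up amount for each of the last
    3 calendar months (hypothetical input format for this sketch).
    """
    # The tier depends only on the highest single-month total.
    highest = max(monthly_top_ups_usd, default=0)
    # Lower bounds are inclusive and upper bounds exclusive, as in the table.
    if highest < 50:
        return "T1"
    if highest < 500:
        return "T2"
    if highest < 3000:
        return "T3"
    if highest < 10000:
        return "T4"
    return "T5"


print(quota_tier([20, 0, 45]))    # highest month is $45
print(quota_tier([20, 500, 45]))  # highest month is $500
```

Because only the highest month counts, one month at $500 qualifies for T3 even if the other two months are far lower.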
Avoiding Rate Limits
If the number of your API requests exceeds the rate limit, the API will return:
- HTTP status code: 429 (Too Many Requests).
- A message in the response body indicating that the rate limit has been exceeded.

To avoid exceeding the rate limit:
- Implement request throttling in your application.
- Use exponential backoff when retrying.
- Monitor your API usage.
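
Request throttling can be done on the client side in several ways. The following is a minimal sketch of one of them, a sliding-window throttle. The class name `RequestThrottle` and its parameters are hypothetical, and the limit you pass in should come from your own account's RPM, which this example does not know.

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side sliding-window throttle: at most max_requests per period."""

    def __init__(self, max_requests, period_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.period = period_seconds
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._sent = deque()   # timestamps of requests inside the window

    def acquire(self):
        """Block until one more request fits in the current window."""
        while True:
            now = self._clock()
            # Forget requests that have already left the window.
            while self._sent and now - self._sent[0] >= self.period:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request leaves it.
            self._sleep(self.period - (now - self._sent[0]))
```

Call `acquire()` immediately before each API request. The sketch limits request count only. A TPM limit would need a similar budget counted in tokens.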
Handling 429 Errors
If you receive a 429 error, you can try the following:
- Try again later: Wait for a period of time before retrying your request.
- Optimize requests: Reduce the request frequency.
- Increase rate limits: If you need higher rate limits, please contact us.
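
The "try again later" advice is commonly implemented as exponential backoff with random jitter. The sketch below shows one way to do this. It assumes a hypothetical zero-argument callable `send` that performs the request and returns a `(status_code, body)` pair. The function name, retry count, and delay values are illustrative and are not values recommended by the API.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() and retry with exponential backoff while it returns 429.

    send is a hypothetical zero-argument callable that returns
    (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # The delay cap doubles on each attempt, up to max_delay.
        cap = min(max_delay, base_delay * 2 ** attempt)
        # Pick a random wait below the cap ("full jitter") so that
        # many clients do not all retry at the same moment.
        sleep(random.uniform(0, cap))
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

With the defaults, the wait before each retry is drawn from a range that grows from at most 1 second to at most 16 seconds. If requests are still rejected after the last retry, reducing the request frequency or asking for a higher limit is the next step.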