Understanding Rate Limits

Rate limits cap the number of API requests that can be made within a given period of time. They are in place to:
  • Prevent API abuse and misuse
  • Ensure fair resource allocation
  • Maintain API performance and reliability
  • Protect service stability

Default Rate Limits

Each account has default rate limits for model calls, measured in RPM (requests per model per minute) and TPM (tokens per model per minute). Rate limits vary by account tier, and the tier is determined by the criteria in the table below.
Quota Tier    Eligibility (USD)
T1            Highest total top-up amount in a single month over the last 3 calendar months < $50
T2            $50 ≤ Highest total top-up amount in a single month over the last 3 calendar months < $500
T3            $500 ≤ Highest total top-up amount in a single month over the last 3 calendar months < $3000
T4            $3000 ≤ Highest total top-up amount in a single month over the last 3 calendar months < $10000
T5            $10000 ≤ Highest total top-up amount in a single month over the last 3 calendar months
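
The eligibility rule in the table can be sketched as a small lookup. This is only an illustration of the thresholds listed above, not the service's actual tiering logic; the function name `quota_tier` and the input format (one top-up total per month, in USD) are assumptions made for the example.

```python
from typing import Sequence

# Lower bounds in USD for tiers T5..T2, copied from the eligibility table.
TIER_LOWER_BOUNDS = [(10000, "T5"), (3000, "T4"), (500, "T3"), (50, "T2")]


def quota_tier(monthly_top_ups: Sequence[float]) -> str:
    """Return the tier for the top-up totals of the last 3 calendar months.

    Hypothetical helper: each element is the total topped up in one month.
    """
    # The table keys on the single highest monthly total, not the sum.
    highest = max(monthly_top_ups, default=0.0)
    for lower_bound, tier in TIER_LOWER_BOUNDS:
        if highest >= lower_bound:
            return tier
    return "T1"
```

For example, an account that topped up $20, $600, and $0 in the last three months would fall in T3 under this reading, because the highest single month ($600) is at least $500 and below $3000.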
Default rate limits for each tier (RPM / TPM):

Avoiding Rate Limits

If your API requests exceed the rate limit, the API returns:
  • HTTP status code: 429 (Too Many Requests).
  • A message in the response body indicating that the rate limit has been exceeded.
To avoid triggering rate limits, you can take the following measures:
  • Implement request throttling in your application.
  • Use exponential backoff when retrying, so that the wait between attempts grows after each failure.
  • Monitor your API usage.
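
As one way to implement request throttling on the client side, the sketch below keeps a sliding window of recent request times and blocks when the window is full. The class name `RequestThrottle` and its parameters are hypothetical, and the limit you pass in should come from your own tier's RPM value, which is not reproduced here.

```python
import time
from collections import deque
from typing import Callable, Deque


class RequestThrottle:
    """Client-side sliding-window throttle (illustrative sketch).

    Allows at most `max_requests` calls to acquire() per `window_seconds`.
    """

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_requests < 1:
            raise ValueError("max_requests must be at least 1")
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def acquire(self) -> None:
        """Block until one more request fits in the current window."""
        while True:
            now = self._clock()
            # Forget requests that have left the window.
            while self._sent and now - self._sent[0] >= self.window_seconds:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self._sleep(self.window_seconds - (now - self._sent[0]))
```

Calling `throttle.acquire()` immediately before each API request keeps the client under the configured request rate. This sketch only counts requests; a TPM limit would need a similar budget that counts tokens.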

Handling 429 Errors

If you receive a 429 error, you can try the following:
  • Try again later: Wait for a period of time before retrying your request.
  • Optimize requests: Reduce the request frequency.
  • Increase rate limits: If you need higher rate limits, please contact us.
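
The "try again later" advice is commonly implemented as a retry loop with exponential backoff. The following is a minimal sketch under stated assumptions: `send` is a hypothetical callable that performs one API request and returns a `(status_code, body)` pair, and the delay values are illustrative defaults rather than values recommended by the service.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send`, retrying with exponential backoff while it returns 429."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            # Success or a non-rate-limit error: hand it back to the caller.
            return status, body
        if attempt < max_retries:
            # Upper bound doubles each attempt: 1s, 2s, 4s, ... capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter spreads out clients that were limited together.
            sleep(random.uniform(0, cap))
    raise RuntimeError("rate limit still exceeded after all retries")
```

If the retries are exhausted, the function raises instead of looping forever, which is a reasonable point to reduce request frequency or ask for a higher limit.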