Rate limits
Per-plan request and daily token caps and how throttling is signalled.
Foundation Machines applies two complementary limits to every API key: a sliding-window cap on requests per minute and a daily cap on total tokens. The request window slides continuously, while the token budget resets at a fixed time each day.
Per-plan limits
| Plan | RPM | Daily tokens |
|---|---|---|
| free | 5 | 50,000 |
| pro | 20 | 500,000 |
| team | 60 | 3,000,000 |
| enterprise | 120 | 10,000,000 |
These limits apply to all gateway endpoints: /v1/chat/completions,
/v1/completions, and /v1/audit share the same per-key budget.
Reset cadence
- RPM: 60-second sliding window. The counter decays continuously.
- Daily tokens: resets at 00:00 UTC. The day boundary is fixed regardless of your timezone.
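
The two reset rules can be mirrored on the client to avoid sending requests that would be throttled anyway. The following is a minimal sketch, not the gateway's implementation: the server's exact window algorithm is not specified here, so `SlidingWindowThrottle` (a hypothetical name) approximates it with a timestamp log, and `seconds_until_daily_reset` is a hypothetical helper that computes the wait until the next 00:00 UTC boundary.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class SlidingWindowThrottle:
    """Client-side approximation of a 60-second sliding-window request cap.

    Assumption: the server counts requests over the trailing 60 seconds.
    This class is illustrative and not part of any official SDK.
    """

    def __init__(self, rpm: int, window_seconds: float = 60.0) -> None:
        self.rpm = rpm
        self.window_seconds = window_seconds
        self._sent = deque()  # timestamps of requests still inside the window

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` (seconds) if the cap allows it."""
        # Drop timestamps that have slid out of the trailing window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.rpm:
            self._sent.append(now)
            return True
        return False


def seconds_until_daily_reset(now: datetime) -> float:
    """Seconds from `now` (a timezone-aware datetime) until the next 00:00 UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()
```

In practice `try_acquire` would be called with `time.monotonic()` before each request, with `rpm` set to the value for your plan from the table above. Passing the clock in explicitly keeps the sketch easy to test.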
When you hit a limit
When throttled you receive HTTP 429 with one of two error.type values:
- rate_limit: your per-minute request rate exceeded the plan cap. Back off briefly.
- quota_exceeded: your daily token budget is exhausted. Upgrade or wait for the UTC reset.
Both responses include a human-readable error.message describing the cap that
was hit. The response body shape matches OpenAI's error format so existing
SDK error handling works unchanged.
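
A client can branch on error.type to decide whether a retry is worthwhile. The sketch below assumes the body is JSON with a top-level "error" object carrying "type" and "message", as in OpenAI's error format. The function name `classify_throttle` and the returned labels are hypothetical, chosen only for illustration.

```python
import json


def classify_throttle(status: int, body: str) -> str:
    """Map an HTTP response to a coarse client action.

    Returns one of: "ok", "retry_soon", "wait_for_reset", "other_error".
    Assumption: a 429 body looks like {"error": {"type": ..., "message": ...}}.
    """
    if status != 429:
        return "ok" if status < 400 else "other_error"
    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        # Body was not JSON, or not a JSON object.
        return "other_error"
    error_type = error.get("type") if isinstance(error, dict) else None
    if error_type == "rate_limit":
        # Per-minute cap: a short backoff is usually enough.
        return "retry_soon"
    if error_type == "quota_exceeded":
        # Daily token budget: retrying before the UTC reset will not help.
        return "wait_for_reset"
    return "other_error"
```

For example, `classify_throttle(429, '{"error": {"type": "quota_exceeded", "message": "..."}}')` returns `"wait_for_reset"`, which signals that the caller should stop retrying until the daily boundary or upgrade the plan.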
Checking your usage
Call GET /v1/usage/me at any time to see the current
counter and remaining headroom for the day. The endpoint is not rate-limited.
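
A usage check might be issued as follows. This is a sketch under stated assumptions: the base URL `https://api.example.com` is a placeholder, the Bearer-token Authorization header is assumed rather than confirmed by this page, and the response schema is not documented here, so the parsed JSON is returned as-is.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder; substitute your gateway host


def build_usage_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for /v1/usage/me (Bearer auth is an assumption)."""
    return urllib.request.Request(
        f"{base_url}/v1/usage/me",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_usage(api_key: str, base_url: str = BASE_URL) -> dict:
    """Send the request and return the decoded JSON body unchanged."""
    request = build_usage_request(api_key, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Because the endpoint is not rate-limited, polling it before a large batch is a cheap way to confirm that enough of the daily token budget remains.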