Rate limits
Per-tenant budgets for the worker and MCP tools, the headers we emit, and the backoff strategy we recommend.
Prometheus enforces per-tenant rate limits to keep a noisy job from starving the rest of the platform. Limits are intentionally generous for normal interactive use and tight for bursty automation that forgets to back off.
Default budgets
| Surface | Budget | Window | Scope |
|---|---|---|---|
| POST /index | 30 requests | 1 hour | per tenant |
| GET /healthz | 600 requests | 1 minute | per source IP |
| MCP tool calls (search_code, get_symbol, ...) | 600 requests | 1 minute | per tenant |
| Dashboard mutations (/app/... server actions) | 120 requests | 1 minute | per user |
| Magic-link sends | 5 requests | 10 minutes | per email |
| Invitation sends | 30 requests | 1 hour | per tenant |
These are the default budgets during the private beta. Owners can mail hello@prom.codes to negotiate a higher ceiling for genuine workloads — we are not in the business of artificial scarcity.
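A budget becomes a pacing rule once you divide the window by the request count. The sketch below assumes a client that spreads its requests evenly across the window and prints the resulting minimum gap for each default budget. The dictionary labels and the min_spacing_seconds helper are hypothetical illustrations, not part of any Prometheus API.

```python
from datetime import timedelta

# Default budgets copied from the table above: (requests, window length).
# The keys are human-readable labels only, not API identifiers.
DEFAULT_BUDGETS = {
    "POST /index": (30, timedelta(hours=1)),
    "GET /healthz": (600, timedelta(minutes=1)),
    "MCP tool calls": (600, timedelta(minutes=1)),
    "Dashboard mutations": (120, timedelta(minutes=1)),
    "Magic-link sends": (5, timedelta(minutes=10)),
    "Invitation sends": (30, timedelta(hours=1)),
}


def min_spacing_seconds(requests: int, window: timedelta) -> float:
    """Gap between requests that spreads the whole budget evenly over one window."""
    return window.total_seconds() / requests


for surface, (requests, window) in DEFAULT_BUDGETS.items():
    print(f"{surface}: one request every {min_spacing_seconds(requests, window):g} s")
```

For example, 30 index requests per hour works out to one every 120 seconds, which is a sensible default interval for a CI job that reindexes on every push.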
Headers on every response
The worker emits standard rate-limit headers on every response, not only on rejected requests:
```http
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 27
X-RateLimit-Reset: 1716220800
```
X-RateLimit-Reset is a Unix timestamp, in seconds, at which the current window expires. Read it before sleeping; do not hard-code a wait interval.
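Reading the reset time instead of hard-coding a sleep looks roughly like the sketch below. It assumes the three header names exactly as shown above; the RateLimitState class and parse_rate_limit_headers function are hypothetical names for illustration, not a published client library.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    limit: int
    remaining: int
    reset_at: int  # Unix timestamp in seconds (X-RateLimit-Reset)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds until the current window expires; never negative."""
        if now is None:
            now = time.time()
        return max(0.0, float(self.reset_at) - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Read the X-RateLimit-* headers; return None if any is missing or malformed."""
    # HTTP header names are case-insensitive, so normalise before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return RateLimitState(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_at=int(lowered["x-ratelimit-reset"]),
        )
    except (KeyError, ValueError):
        return None


state = parse_rate_limit_headers(
    {"X-RateLimit-Limit": "30", "X-RateLimit-Remaining": "27", "X-RateLimit-Reset": "1716220800"}
)
print(state.seconds_until_reset(now=1716220758))  # 42.0
```

Returning None for missing headers lets a caller fall back to its own backoff policy instead of crashing on a proxy or error page that strips the headers.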
When a request is rejected:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 42
X-RateLimit-Limit: 30
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1716220800
Content-Type: application/json

{ "ok": false, "code": "RATE_LIMITED", "retryAfterSec": 42 }
```
Honor Retry-After even if you compute your own delay from X-RateLimit-Reset: the two may differ during partial outages, when we throttle proactively.
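One conservative way to combine the two hints is to wait for whichever is longer, so Retry-After acts as a floor and the reset time can only extend it. The sketch below is that reading, not a documented server contract; wait_seconds_for_429 is a hypothetical helper, and it only understands the delta-seconds form of Retry-After used in the example response.

```python
import json
import time
from typing import List, Mapping, Optional


def _as_seconds(value: str) -> Optional[float]:
    """Parse a numeric header value; return None for anything else."""
    try:
        return float(value)
    except ValueError:
        return None


def wait_seconds_for_429(
    headers: Mapping[str, str], body: str = "", now: Optional[float] = None
) -> float:
    """Conservative wait before retrying a 429: the larger of the available hints."""
    if now is None:
        now = time.time()
    lowered = {name.lower(): value for name, value in headers.items()}
    candidates: List[float] = []

    # Retry-After in delta-seconds form; an HTTP-date value parses to None and is skipped.
    retry_after = _as_seconds(lowered.get("retry-after", ""))
    if retry_after is not None:
        candidates.append(retry_after)

    # Time remaining until the current window expires.
    reset_at = _as_seconds(lowered.get("x-ratelimit-reset", ""))
    if reset_at is not None:
        candidates.append(reset_at - now)

    # Fall back to the JSON body's retryAfterSec when neither header parsed.
    if not candidates:
        try:
            candidates.append(float(json.loads(body)["retryAfterSec"]))
        except (ValueError, KeyError, TypeError):
            pass

    # A reset time already in the past yields a negative candidate; clamp at zero.
    return max([0.0, *candidates])
```

During normal operation both hints agree (42 seconds in the example above); taking the maximum only matters when proactive throttling makes them diverge.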
Recommended backoff
For automated clients (CI, agents, custom scripts):
- Inspect headers on every response, not just on 429. If X-RateLimit-Remaining is ≤ 10% of the limit, slow down pre-emptively.
- Honor Retry-After on 429 and 503. Sleep at least that long, then add up to 25% jitter so coordinated retries do not align.
- Exponential backoff on repeated 429s. Double the wait each time, cap at 5 minutes. Reset on the first 2xx.
- No tight loops. A retry loop without sleep will only ever succeed in being throttled harder — the limit is per tenant, so two of your jobs racing each other share the budget.
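The rules above can be sketched as a small client-side policy object. The 10% threshold, 25% jitter, 5-minute cap, and reset-on-2xx come from the list; the 1-second starting delay, the choice to cap before adding jitter, and the names Backoff and should_slow_down are assumptions made for illustration.

```python
import random
from typing import Callable, Optional

CAP_SECONDS = 300.0     # "cap at 5 minutes"
JITTER_FRACTION = 0.25  # "add up to 25% jitter"
MAX_EXPONENT = 30       # keeps 2 ** n finite; 2 ** 30 s is already far past the cap


def should_slow_down(remaining: int, limit: int) -> bool:
    """True when Remaining is at or below 10% of Limit (integer math avoids float rounding)."""
    return limit > 0 and remaining * 10 <= limit


class Backoff:
    """Backoff state for one client; call next_delay after each 429."""

    def __init__(
        self,
        base_seconds: float = 1.0,  # assumption: the guidance names no starting delay
        rand: Callable[[], float] = random.random,  # returns [0, 1); injectable for tests
    ) -> None:
        self.base_seconds = base_seconds
        self.rand = rand
        self.failures = 0

    def on_success(self) -> None:
        """Reset the doubling on the first 2xx."""
        self.failures = 0

    def next_delay(self, retry_after: Optional[float] = None) -> float:
        """Seconds to sleep before the next attempt."""
        self.failures += 1
        exponent = min(self.failures - 1, MAX_EXPONENT)
        # Double the wait on each consecutive 429, capped at 5 minutes.
        exponential = min(CAP_SECONDS, self.base_seconds * 2 ** exponent)
        # Never sleep less than the server's Retry-After.
        floor = max(exponential, retry_after or 0.0)
        # Add up to 25% on top so coordinated clients do not retry in lockstep.
        return floor * (1.0 + JITTER_FRACTION * self.rand())
```

A client would call should_slow_down on every response, pass the Retry-After value into next_delay on each 429, and call on_success after the first 2xx. Applying jitter on top of the floor keeps every sleep at or above Retry-After, which matches the "sleep at least that long" rule.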
What counts and what does not
- Failed authentication (401) does not consume rate-limit budget. Otherwise a leaked-but-revoked key in a tight loop could DoS its former tenant.
- Schema validation rejections (400 INVALID_BODY) do count: malformed payloads still cost the worker CPU.
- Reads from the dashboard (GET /app/...) are budgeted per user, not per tenant, so a busy operator does not starve their colleagues.
Per-key limits
Per-API-key limits ship with workspace-scoped keys in Phase 4. Until then, every key on a tenant draws from the same per-tenant bucket. That is also why we recommend distinct labels per consumer (see API key lifecycle): once per-key limits exist, the dashboard will let you set them per label.
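To see why two consumers on one tenant interfere, here is a toy model of a shared per-tenant bucket. This page does not say which algorithm the worker uses; a fixed-window counter is assumed here only because X-RateLimit-Reset is described as the moment the current window expires. FixedWindowLimiter and the tenant and key names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


class FixedWindowLimiter:
    """Toy per-tenant budget: every API key on a tenant shares one counter."""

    def __init__(self, limit: int, window_seconds: int) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # (tenant_id, window_start) -> requests already admitted in that window
        self.counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def window_start(self, now: int) -> int:
        return now - now % self.window_seconds

    def reset_at(self, now: int) -> int:
        """End of the current window (what X-RateLimit-Reset reports, under this model)."""
        return self.window_start(now) + self.window_seconds

    def allow(self, tenant_id: str, api_key: str, now: int) -> bool:
        # api_key is deliberately not part of the bucket: until per-key limits
        # ship, two keys on the same tenant drain the same budget.
        bucket = (tenant_id, self.window_start(now))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=30, window_seconds=3600)
    now = 1716217200  # an hour-aligned Unix timestamp
    # Two consumers alternate 20 requests each within the same hour.
    admitted = [limiter.allow("acme", key, now) for key in ["ci-key", "agent-key"] * 20]
    print(sum(admitted), "of", len(admitted), "admitted")  # 30 of 40 admitted
    print("window resets at", limiter.reset_at(now))  # window resets at 1716220800
```

Neither key exceeds 30 requests on its own, yet together they exhaust the tenant's index budget; separate labels will only help once per-key limits exist.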
Telemetry you can read
The audit log records RATE_LIMITED events for POST /index and dashboard actions, so /app/audit shows which actor blew the budget. Worker-side MCP throttling lands in the audit feed in Phase 3; until then it is visible only in response headers.
Related
- Troubleshooting — what to do when a 429 surprises you.
- API key lifecycle — why a rotating key set helps once per-key limits land.
- Security model — the isolation guarantees the limiter sits behind.