Rate limits

Per-tenant budgets for the worker and MCP tools, the headers we emit, and the backoff strategy we recommend.

Prometheus enforces per-tenant rate limits to keep a noisy job from starving the rest of the platform. Limits are intentionally generous for normal interactive use and tight for bursty automation that forgets to back off.

Default budgets

Surface                                          Budget          Window
POST /index                                      30 requests     per hour, per tenant
GET /healthz                                     600 requests    per minute, per source IP
MCP tool calls (search_code, get_symbol, ...)    600 requests    per minute, per tenant
Dashboard mutations (/app/... server actions)    120 requests    per minute, per user
Magic-link sends                                 5 requests      per 10 minutes, per email
Invitation sends                                 30 requests     per hour, per tenant

These are the default budgets during the private beta. Owners can mail hello@prom.codes to negotiate a higher ceiling for genuine workloads — we are not in the business of artificial scarcity.

Headers on every response

The worker emits the conventional X-RateLimit-* headers on every response, including responses to requests it accepts:

X-RateLimit-Limit:      30
X-RateLimit-Remaining:  27
X-RateLimit-Reset:      1716220800

X-RateLimit-Reset is a Unix timestamp (seconds) at which the current window expires. Read it before sleeping — do not hard-code a wait interval.
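As a minimal sketch of reading the reset time instead of hard-coding a wait, the helper below turns X-RateLimit-Reset into a number of seconds to sleep. The function name seconds_until_reset and the plain-dict headers argument are illustrative assumptions, not part of the platform. Real HTTP clients usually expose case-insensitive header mappings, so adapt the lookup to your library.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> float:
    """Seconds until the window named in X-RateLimit-Reset expires.

    `headers` is assumed to be a mapping keyed by the exact header name.
    `now` is injectable so the function can be tested without a real clock.
    """
    if now is None:
        now = time.time()
    reset_at = int(headers["X-RateLimit-Reset"])  # Unix timestamp, seconds
    # Never return a negative wait if the window has already rolled over.
    return max(0.0, reset_at - now)
```

A caller would typically pass the result to time.sleep() only after the budget is exhausted, not after every request.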

When a request is rejected:

HTTP/1.1 429 Too Many Requests
Retry-After: 42
X-RateLimit-Limit:      30
X-RateLimit-Remaining:  0
X-RateLimit-Reset:      1716220800
Content-Type: application/json

{ "ok": false, "code": "RATE_LIMITED", "retryAfterSec": 42 }

Honor Retry-After even if you compute your own delay from X-RateLimit-Reset — they may differ during partial outages when we throttle proactively.
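One way to reconcile the two signals is to treat Retry-After as a floor and take the longer of the two waits. This is a conservative sketch, not a documented requirement. It assumes Retry-After carries delta-seconds, as in the example above, and the name delay_after_429 is hypothetical.

```python
import time
from typing import Mapping, Optional


def delay_after_429(headers: Mapping[str, str],
                    now: Optional[float] = None) -> float:
    """Pick a wait that honors Retry-After and the advertised window reset.

    Assumes Retry-After is expressed in seconds (not an HTTP date) and that
    `headers` is keyed by the exact header names shown in the example.
    """
    if now is None:
        now = time.time()
    retry_after = float(headers.get("Retry-After", "0"))
    reset_raw = headers.get("X-RateLimit-Reset")
    until_reset = max(0.0, int(reset_raw) - now) if reset_raw is not None else 0.0
    # Retry-After is the floor; waiting longer than it asks is always allowed.
    return max(retry_after, until_reset)
```

Taking the maximum means a proactive throttle with a longer Retry-After is never cut short by a nearer reset timestamp.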

For automated clients (CI, agents, custom scripts), we recommend the following backoff strategy:

  1. Inspect headers on every response, not just on 429. If X-RateLimit-Remaining is ≤ 10 % of the limit, slow down pre-emptively.
  2. Honor Retry-After on 429 and 503. Sleep at least that long, then add up to 25 % jitter so coordinated retries do not align.
  3. Exponential backoff on repeated 429s. Double the wait each time, cap at 5 minutes. Reset on the first 2xx.
  4. No tight loops. A retry loop without sleep will only ever succeed in being throttled harder — the limit is per tenant, so two of your jobs racing each other share the budget.
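The four rules above can be sketched as two small helpers. The 10 % threshold, 25 % jitter, doubling, and 5-minute cap come from the list. The 1-second starting backoff, the function names, and the injectable rng parameter are assumptions made for illustration.

```python
import random
from typing import Callable

MAX_BACKOFF_SEC = 300.0  # 5-minute cap on the exponential component


def should_slow_down(limit: int, remaining: int) -> bool:
    """True when remaining budget is at or below 10 % of the limit."""
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    return remaining * 10 <= limit


def next_delay(retry_after: float,
               consecutive_429s: int,
               base: float = 1.0,  # assumed starting backoff, in seconds
               rng: Callable[[], float] = random.random) -> float:
    """Seconds to sleep after the Nth consecutive 429 (N >= 1).

    The exponential backoff doubles per failure and is capped at 5 minutes;
    the wait is never shorter than Retry-After; up to 25 % jitter is added.
    Reset `consecutive_429s` to zero in the caller on the first 2xx.
    """
    backoff = min(MAX_BACKOFF_SEC, base * 2 ** (consecutive_429s - 1))
    wait = max(retry_after, backoff)
    return wait + wait * 0.25 * rng()
```

A retry loop would call should_slow_down on every response, and on a 429 or 503 would sleep for next_delay(...) before the next attempt, so there is always a sleep between retries.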

What does not count

  • Failed authentication (401) does not consume rate-limit budget. Otherwise a leaked-but-revoked key in a tight loop could DoS its former tenant.
  • Schema validation rejections (400 INVALID_BODY) do count — malformed payloads still cost the worker CPU.
  • Reads from the dashboard (GET /app/...) are budgeted per-user, not per-tenant, so a busy operator does not starve their colleagues.

Per-key limits

Per-API-key limits ship with workspace-scoped keys in Phase 4. Until then, every key on a tenant draws from the same per-tenant bucket. That is also why we recommend distinct labels per consumer (see API key lifecycle): once per-key limits exist, the dashboard will let you set them per label.

Telemetry you can read

The audit log records RATE_LIMITED events for POST /index and dashboard actions, so you can see in /app/audit which actor blew the budget. Worker-side MCP throttling lands in the audit feed in Phase 3 — until then it is response-header only.