Rate Limiting

The Gateway enforces rate limits per organization, per model. Each model has its own independent sliding-window limit, so traffic to one model never consumes capacity for another.

How Rate Limits Work

Per-model bucket

Each model your organization accesses gets its own independent rate limit bucket. Requests to different models do not compete with each other.

Model A: 60 req/min and Model B: 60 req/min, tracked independently

Shared across all keys

Within a single model, all API keys in your organization share the same bucket. Multiple keys do not multiply your limit.

Basic plan: 60 req/min per model (all keys combined)

Sliding window

Limits use a sliding 60-second window, not a fixed reset at the top of each minute. Capacity is restored gradually as older requests age out.

No sudden resets, so traffic flows smoothly

The rate limit applies per organization, per model. If your plan allows 60 req/min and two of your API keys call the same model at the same time, requests from both keys count toward that model's single 60 req/min limit. Requests to a different model use a completely separate bucket.
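The bucketing rule above can be modeled as a counter keyed by organization and model, where the API key is used only to look up its organization. This is a minimal bash sketch for illustration (bash 4+ for associative arrays); the org ID, key names, model names, and the `record_request` helper are hypothetical, and it is not how the Gateway actually stores its counters.

```bash
#!/usr/bin/env bash
# Hypothetical model of rate limit buckets (bash 4+). Not the Gateway's implementation.

# Hypothetical: each API key belongs to exactly one organization.
declare -A key_org=(
  [sk_live_key_one]=org_123
  [sk_live_key_two]=org_123
)
# Requests seen per "org:model" bucket.
declare -A bucket_count=()

# record_request API_KEY MODEL: count one request against its (org, model) bucket.
# The API key itself is not part of the bucket name, so keys in one org share capacity.
record_request() {
  local bucket="${key_org[$1]}:$2"
  bucket_count[$bucket]=$(( ${bucket_count[$bucket]:-0} + 1 ))
}

record_request sk_live_key_one model_a
record_request sk_live_key_two model_a   # same org, same model: same bucket
record_request sk_live_key_one model_b   # different model: separate bucket

echo "model_a: ${bucket_count[org_123:model_a]}"   # model_a: 2
echo "model_b: ${bucket_count[org_123:model_b]}"   # model_b: 1
```

Two keys calling `model_a` land in the same bucket, while `model_b` keeps its own count.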

Sliding Window Algorithm

Rate limits use a sliding window rather than a fixed reset at the top of each minute. This prevents the burst of traffic that a fixed window allows right after each reset.

At any point in time, the window covers the past 60 seconds.

Every request inside that window counts toward the limit, and the count is updated continuously.

As old requests age out of the window, capacity is restored automatically.

There is no sudden reset, so traffic is smoothed out naturally.

A per-second burst limit is enforced in addition to the per-minute limit.
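To make the sliding window concrete, here is a minimal bash sketch of a sliding-window log limiter for a single (organization, model) bucket. It keeps the timestamps of accepted requests, discards any older than 60 seconds, and admits a new request only if fewer than `LIMIT` remain. The `try_request` name, the whole-second timestamps, and the choice that a request exactly 60 seconds old has aged out are assumptions for illustration; the Gateway's internal algorithm and its per-second burst limit (whose value is not stated here) are not modeled.

```bash
#!/usr/bin/env bash
# Sliding-window log sketch for one (org, model) bucket. Hypothetical, for illustration.
WINDOW=60      # window length in seconds
LIMIT=60       # requests per window (Basic plan)
timestamps=()  # times (epoch seconds) of accepted requests, oldest first

# try_request NOW: sets DECISION to "allow" or "reject".
# Uses a global instead of echo so the timestamp array survives (no subshell).
try_request() {
  local now=$1 t
  local -a kept=()
  # Age out requests that are no longer inside the last WINDOW seconds.
  for t in "${timestamps[@]}"; do
    if (( now - t < WINDOW )); then kept+=("$t"); fi
  done
  timestamps=("${kept[@]}")

  if (( ${#timestamps[@]} < LIMIT )); then
    timestamps+=("$now")
    DECISION=allow
  else
    DECISION=reject
  fi
}

# Example with a small limit: 3 requests per 60 s.
LIMIT=3
for now in 0 10 20 30 60 61 70; do
  try_request "$now"
  echo "t=${now}s -> $DECISION"
done
```

With `LIMIT=3` the loop prints allow, allow, allow, reject, allow, reject, allow. The fourth request is rejected, and capacity then comes back one request at a time as each old timestamp crosses the 60-second mark, rather than all at once.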

Limits by Plan

Plan	Rate limit (per org per model)
Free	10 req/min
Basic	60 req/min
Pro	300 req/min
Enterprise	9,999 req/min
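Because all keys share one bucket per model, a client that spreads its calls evenly only needs the plan's per-minute limit to pace itself. The bash sketch below turns the table into a minimum spacing between requests to one model. The `plan_limit` and `min_interval_ms` helpers are hypothetical; the spacing covers only the per-minute limit, applies to all processes and keys calling that model combined, and does not account for the per-second burst limit.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: per-org, per-model limits from the plan table (req/min).
plan_limit() {
  case "$1" in
    Free)       echo 10 ;;
    Basic)      echo 60 ;;
    Pro)        echo 300 ;;
    Enterprise) echo 9999 ;;
    *) echo "unknown plan: $1" >&2; return 1 ;;
  esac
}

# min_interval_ms PLAN: smallest even gap between requests (ms) that keeps a single
# model's bucket at or under its per-minute limit. Rounds up so we never exceed it.
min_interval_ms() {
  local limit
  limit=$(plan_limit "$1") || return 1
  echo $(( (60000 + limit - 1) / limit ))
}

for plan in Free Basic Pro Enterprise; do
  echo "$plan: one request every $(min_interval_ms "$plan") ms"
done
```

For example, a Basic organization sending one request every 1000 ms to a model already uses that model's entire 60 req/min budget, so a second process calling the same model with another key would start receiving 429s.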

429 Response


HTTP 429 Too Many Requests

{
  "error": "Rate limit exceeded. Slow down and try again.",
  "code": "RATE_LIMIT_EXCEEDED"
}
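A client can recognize this response by the 429 status together with the `RATE_LIMIT_EXCEEDED` value in the `code` field. The bash sketch below checks both without requiring `jq`. The `is_rate_limited` name is hypothetical, and the `grep` pattern is a plain text match that assumes the body looks like the example above; it is not a full JSON parser.

```bash
#!/usr/bin/env bash
# is_rate_limited HTTP_CODE BODY: succeeds only for the Gateway's rate limit response.
# Hypothetical helper; matches the "code" field as text instead of parsing JSON.
is_rate_limited() {
  local http_code=$1 body=$2
  [ "$http_code" = "429" ] &&
    grep -q '"code"[[:space:]]*:[[:space:]]*"RATE_LIMIT_EXCEEDED"' <<<"$body"
}

body='{"error": "Rate limit exceeded. Slow down and try again.", "code": "RATE_LIMIT_EXCEEDED"}'
if is_rate_limited 429 "$body"; then
  echo "rate limited: back off and retry"
fi
```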

Implementing Retry Logic

When you receive a 429, wait before retrying. Use exponential backoff, doubling the delay after each rate-limited attempt, so retries don't keep hitting the API while the window is still full.

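#!/bin/bash
# Retry on 429 only, doubling the delay after each rate-limited attempt (5s, 10s, ...).
MAX_RETRIES=3
DELAY=5
URL="https://dev-api.onlyfans-api.ai/api/models/MODEL_UUID/users/me"
RESP_FILE=$(mktemp)
trap 'rm -f "$RESP_FILE"' EXIT

for attempt in $(seq 1 "$MAX_RETRIES"); do
  http_code=$(curl -s -o "$RESP_FILE" -w "%{http_code}" "$URL" \
    -H "X-API-Key: sk_live_your_key_here")

  # Any status other than 429 is final: print the body and stop.
  if [ "$http_code" != "429" ]; then
    cat "$RESP_FILE"
    # Exit 0 only on 2xx; a 402 or other error must not be reported as success.
    case "$http_code" in 2??) exit 0 ;; *) exit 1 ;; esac
  fi

  # Don't sleep after the final attempt.
  [ "$attempt" -eq "$MAX_RETRIES" ] && break

  echo "Rate limited. Retrying in ${DELAY}s... (attempt $attempt of $MAX_RETRIES)" >&2
  sleep "$DELAY"
  DELAY=$((DELAY * 2))   # exponential backoff
done
echo "Max retries exceeded" >&2; exit 1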

Don't retry on 402

Only retry on 429. A 402 (Insufficient Credits) won't resolve itself by waiting; you need to top up credits first.
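The two rules in this section (retry only on 429, and back off exponentially) can be combined into one decision helper. In this hypothetical bash sketch, `retry_delay` prints how many seconds to wait before the next attempt, or fails when the request should not be retried at all, as with a 402. The 5-second base and 60-second cap are illustrative values, not limits published by the Gateway.

```bash
#!/usr/bin/env bash
# Hypothetical retry policy. BASE_DELAY and MAX_DELAY are illustrative, not Gateway values.
BASE_DELAY=5    # seconds to wait after the first 429
MAX_DELAY=60    # never wait longer than this

# retry_delay HTTP_CODE ATTEMPT: print the wait (s) before retrying, or return 1 if the
# request should not be retried. ATTEMPT is 1 for the first rate-limited response.
retry_delay() {
  local code=$1 attempt=$2 delay
  case "$code" in
    429)
      # Exponential backoff: 5, 10, 20, 40, then capped at MAX_DELAY.
      delay=$(( BASE_DELAY * (1 << (attempt - 1)) ))
      if (( delay > MAX_DELAY )); then delay=$MAX_DELAY; fi
      echo "$delay"
      ;;
    402) return 1 ;;  # insufficient credits: top up, waiting won't help
    *)   return 1 ;;  # any other status: surface it instead of retrying
  esac
}

for attempt in 1 2 3 4 5; do
  echo "429, attempt $attempt: wait $(retry_delay 429 "$attempt")s"
done
retry_delay 402 1 || echo "402: do not retry; top up credits first"
```

Capping the delay keeps a long run of 429s from producing multi-minute waits, since a 60-second sliding window never needs more than about a minute to free capacity.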

Log Retention

Request logs are retained for 14 days. Logs older than 14 days are automatically deleted on a rolling basis.

The 14-day window is measured from the timestamp of each request, not from the start of any billing period.

To keep records beyond 14 days, export logs via the API before they expire.

Retention applies to request logs only. Credit and billing records are not subject to this limit.
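Since each log's 14-day window starts at its own request timestamp, the deadline for exporting a given log can be computed per request. This bash sketch works in Unix epoch seconds; the helper names are hypothetical. It computes the moment a log passes the 14-day mark, and the exact time the rolling cleanup removes it is not specified here, so treat the result as an export deadline rather than a precise deletion time.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the 14-day retention window (times in Unix epoch seconds).
RETENTION_SECONDS=$(( 14 * 24 * 60 * 60 ))   # 1209600 s

# log_expires_at REQUEST_TS: when the log passes the 14-day window.
log_expires_at() {
  echo $(( $1 + RETENTION_SECONDS ))
}

# days_until_expiry REQUEST_TS NOW: whole days left before the log can be deleted
# (0 means export it now).
days_until_expiry() {
  local remaining=$(( $(log_expires_at "$1") - $2 ))
  if (( remaining < 0 )); then remaining=0; fi
  echo $(( remaining / 86400 ))
}

now=$(date +%s)
request_ts=$(( now - 10 * 86400 ))   # a request made 10 days ago
echo "days left to export: $(days_until_expiry "$request_ts" "$now")"   # days left to export: 4
```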

Logs expire automatically

Logs deleted after the 14-day retention window cannot be recovered. Export any logs you need to keep before they expire.
