Concurrency Limiting
Rate limiting caps requests per time window. Concurrency limiting caps how many requests are being processed at the same time for a key — for example, "at most 5 simultaneous exports per user". It's the right tool for protecting expensive, long-running endpoints from pile-ups.
from django_smart_ratelimit import concurrency_limit

@concurrency_limit(key="user", max_concurrent=5)
def export_view(request):
    ...  # at most 5 of these run concurrently per user
When the limit is reached, additional requests receive an HTTP 429 (Too Many
Requests) response until an in-flight request finishes and frees a slot.
How it works
Each in-flight request takes a slot from an atomic semaphore (a Redis sorted
set, or the in-memory backend) on entry and releases it on exit. If a request
crashes before releasing, its slot is reclaimed after ttl seconds, so the
limiter self-heals rather than deadlocking.
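The acquire/release cycle and the ttl-based reclaim described above can be sketched in plain Python. This is an illustrative in-memory model, not the library's implementation: the `TTLSemaphore` class, its method names, and the injectable `clock` are all hypothetical, and a Redis-backed version would need to run the purge, count, and add steps as one atomic operation.

```python
import time
import uuid
from collections import defaultdict


class TTLSemaphore:
    """Hypothetical in-memory model of a per-key semaphore with leak recovery."""

    def __init__(self, max_concurrent, ttl, clock=time.monotonic):
        self.max_concurrent = max_concurrent
        self.ttl = ttl
        self.clock = clock
        # key -> {slot token: expiry timestamp}; plays the role of the sorted set.
        self._slots = defaultdict(dict)

    def acquire(self, key):
        """Return a slot token, or None if the key is at capacity."""
        now = self.clock()
        held = self._slots[key]
        # Reclaim slots whose holders never released them (e.g. a crashed request).
        for token in [t for t, expiry in held.items() if expiry <= now]:
            del held[token]
        if len(held) >= self.max_concurrent:
            return None
        token = uuid.uuid4().hex
        held[token] = now + self.ttl
        return token

    def release(self, key, token):
        """Free a slot; releasing an already-reclaimed slot is a no-op."""
        self._slots[key].pop(token, None)


# A simulated clock lets the example run instantly.
fake_now = [0.0]
sem = TTLSemaphore(max_concurrent=2, ttl=60, clock=lambda: fake_now[0])

first = sem.acquire("user:1")
second = sem.acquire("user:1")
third = sem.acquire("user:1")   # over capacity, so no token is issued
sem.release("user:1", first)
fourth = sem.acquire("user:1")  # the released slot is available again

fake_now[0] = 61.0              # the two held slots are now older than ttl
fifth = sem.acquire("user:1")   # leaked slots are reclaimed before counting
```

Because expired slots are purged on every acquire, a crashed request can hold capacity for at most ttl seconds, which is why ttl should exceed the longest legitimate request.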
Arguments
| Argument | Default | Description |
|---|---|---|
| key | — | Concurrency key. Resolves exactly like @rate_limit's key: "ip", "user", a template such as "user:{user.id}", or a callable. |
| max_concurrent | — | Maximum requests allowed in flight at once. |
| ttl | 60 | Seconds after which a held slot is assumed leaked and reclaimed. Set it above your longest expected request duration. |
| backend | None | Backend name override (defaults to the configured backend). |
| block | True | When True, an over-capacity request gets a 429. When False, it runs anyway without holding a slot. |
| response_callback | None | Optional (request) -> HttpResponse for the over-capacity response. |
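To make the interaction between block and response_callback concrete, here is a minimal sketch of the control flow such a decorator might follow. It is an assumption-laden model rather than the library's code: `sketch_concurrency_limit` is a hypothetical name, it uses one shared counter instead of per-key slots, and plain dicts stand in for HTTP responses.

```python
import functools
import threading


def sketch_concurrency_limit(max_concurrent, block=True, response_callback=None):
    """Hypothetical decorator mirroring the documented block/response_callback semantics."""
    lock = threading.Lock()
    in_flight = [0]  # one shared counter; a real limiter tracks slots per key

    def decorator(view):
        @functools.wraps(view)
        def wrapper(request):
            with lock:
                acquired = in_flight[0] < max_concurrent
                if acquired:
                    in_flight[0] += 1
            if not acquired and block:
                # Over capacity: short-circuit with the custom or default response.
                if response_callback is not None:
                    return response_callback(request)
                return {"status": 429}  # placeholder for an HTTP 429 response
            try:
                return view(request)
            finally:
                # Only release a slot this request actually holds.
                if acquired:
                    with lock:
                        in_flight[0] -= 1

        return wrapper

    return decorator


# max_concurrent=0 forces every call over capacity, purely for the demo.
@sketch_concurrency_limit(max_concurrent=0)
def blocked_view(request):
    return {"status": 200}


@sketch_concurrency_limit(max_concurrent=0, block=False)
def observed_view(request):
    return {"status": 200}


@sketch_concurrency_limit(
    max_concurrent=0,
    response_callback=lambda request: {"status": 429, "detail": "busy"},
)
def custom_view(request):
    return {"status": 200}


blocked = blocked_view(object())    # default over-capacity response
observed = observed_view(object())  # view runs anyway, holding no slot
custom = custom_view(object())      # response built by the callback
```

The detail worth noticing is the `finally` clause: a request that ran without acquiring a slot must not release one, otherwise the in-flight count would drift below the true number of running requests.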
Examples
from django_smart_ratelimit import concurrency_limit

# Per-IP cap on a heavy report endpoint, with a generous hold time.
@concurrency_limit(key="ip", max_concurrent=3, ttl=300)
def report(request):
    ...

# A callable key in observe-only mode: never returns a 429; over-capacity
# requests run anyway without holding a slot.
@concurrency_limit(key=lambda r: f"team:{r.user.team_id}", max_concurrent=10, block=False)
def bulk_import(request):
    ...

# Async views are supported too.
@concurrency_limit(key="user", max_concurrent=2)
async def async_export(request):
    ...
Backends
Concurrency limiting needs a backend with semaphore support:
- Redis (recommended for production) — atomic and shared across processes and hosts. Use this for any multi-process / multi-host deployment.
- Memory — works in a single process; fine for development. (It is not shared across processes, so it does not enforce a global limit in a multi-worker deployment.)
Other backends raise ImproperlyConfigured when a concurrency-limited view is
called. The backend used for concurrency limiting does not have to match
RATELIMIT_BACKEND, which governs rate limiting: you can pass backend="redis"
to @concurrency_limit specifically.