Concurrency Limiting

Rate limiting caps requests per time window. Concurrency limiting caps how many requests are being processed at the same time for a key — for example, "at most 5 simultaneous exports per user". It's the right tool for protecting expensive, long-running endpoints from pile-ups.

from django_smart_ratelimit import concurrency_limit

@concurrency_limit(key="user", max_concurrent=5)
def export_view(request):
    ...  # at most 5 of these run concurrently per user

When the limit is reached, additional requests for the same key receive a 429 response until an in-flight request finishes and frees a slot.

How it works

Each in-flight request takes a slot from an atomic semaphore (a Redis sorted set, or the in-memory backend) on entry and releases it on exit. If a request crashes before releasing its slot, the slot is reclaimed after `ttl` seconds, so the limiter self-heals rather than deadlocking.
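
The slot lifecycle described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `TTLSemaphore`, its method names, and the explicit `now` argument are hypothetical, and a single-threaded dict stands in for the backend's atomic operation.

```python
import time
import uuid


class TTLSemaphore:
    """Illustrative counting semaphore whose slots expire after a TTL (hypothetical)."""

    def __init__(self, max_concurrent, ttl):
        self.max_concurrent = max_concurrent
        self.ttl = ttl
        self._slots = {}  # key -> {slot_id: time the slot was acquired}

    def acquire(self, key, now=None):
        """Return a slot id if a slot is free for this key, otherwise None."""
        now = time.monotonic() if now is None else now
        holders = self._slots.setdefault(key, {})
        # Reclaim slots held for ttl seconds or longer (presumed leaked).
        for slot_id, acquired_at in list(holders.items()):
            if now - acquired_at >= self.ttl:
                del holders[slot_id]
        if len(holders) >= self.max_concurrent:
            return None
        slot_id = uuid.uuid4().hex
        holders[slot_id] = now
        return slot_id

    def release(self, key, slot_id):
        """Free a slot; releasing an already reclaimed slot is a no-op."""
        self._slots.get(key, {}).pop(slot_id, None)


sem = TTLSemaphore(max_concurrent=2, ttl=60)
first = sem.acquire("user:1", now=0)
second = sem.acquire("user:1", now=1)
third = sem.acquire("user:1", now=2)        # over capacity, so None
sem.release("user:1", first)                # a normal exit frees the slot
fourth = sem.acquire("user:1", now=3)       # succeeds: one slot is free again
# `second` is never released (a simulated crash); it is reclaimed at now=61.
after_reclaim = sem.acquire("user:1", now=61)
```

A shared backend would have to run the reclaim, count, and add steps as one atomic operation; the sketch skips that because it runs in a single thread.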

Arguments

Argument	Default	Description
`key`	—	Concurrency key. Resolves exactly like `@rate_limit`'s `key`: `"ip"`, `"user"`, a template such as `"user:{user.id}"`, or a callable.
`max_concurrent`	—	Maximum requests allowed in flight at once.
`ttl`	`60`	Seconds after which a held slot is assumed leaked and reclaimed. Set it above your longest expected request duration.
`backend`	`None`	Backend name override (defaults to the configured backend).
`block`	`True`	When `True`, an over-capacity request gets a `429`. When `False`, it runs anyway without holding a slot.
`response_callback`	`None`	Optional `(request) -> HttpResponse` for the over-capacity response.
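
To make the interaction between `key`, `max_concurrent`, `block`, and `response_callback` concrete, here is a rough pure-Python stand-in. Everything in it is an assumption for illustration: `sketch_concurrency_limit` is a hypothetical name, it only accepts a callable key, it ignores `ttl` and `backend`, and `SimpleNamespace` objects stand in for Django requests and responses.

```python
import functools
from types import SimpleNamespace


def sketch_concurrency_limit(key, max_concurrent, block=True, response_callback=None):
    """Hypothetical stand-in for the decorator; not the library's code."""
    in_flight = {}  # resolved key -> number of slots currently held

    def decorator(view):
        @functools.wraps(view)
        def wrapper(request):
            resolved = key(request)
            holds_slot = in_flight.get(resolved, 0) < max_concurrent
            if not holds_slot and block:
                # Over capacity and blocking: reject, using the callback if given.
                if response_callback is not None:
                    return response_callback(request)
                return SimpleNamespace(status_code=429)
            if holds_slot:
                in_flight[resolved] = in_flight.get(resolved, 0) + 1
            try:
                # With block=False an over-capacity request reaches this point
                # without holding a slot.
                return view(request)
            finally:
                if holds_slot:
                    in_flight[resolved] -= 1

        return wrapper

    return decorator


def busy(request):
    # Custom over-capacity response, playing the role of response_callback.
    return SimpleNamespace(status_code=429, reason="export already running")


@sketch_concurrency_limit(key=lambda r: r.user_id, max_concurrent=1, response_callback=busy)
def export(request):
    if request.nested:
        return SimpleNamespace(status_code=200, inner=None)
    # Re-enter once to simulate a second request arriving while the first runs.
    inner = export(SimpleNamespace(user_id=request.user_id, nested=True))
    return SimpleNamespace(status_code=200, inner=inner)


outer = export(SimpleNamespace(user_id=7, nested=False))
# outer.status_code is 200; outer.inner is the response built by busy().
```

The outer call holds the only slot for user 7, so the nested call is rejected through the callback. Once the outer call returns, the slot is released and a new request for that user is admitted.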

Examples

# Per-IP cap on a heavy report endpoint, with a generous hold time.
@concurrency_limit(key="ip", max_concurrent=3, ttl=300)
def report(request):
    ...

# A callable key, and non-blocking mode: over-capacity requests still run, without holding a slot.
@concurrency_limit(key=lambda r: f"team:{r.user.team_id}", max_concurrent=10, block=False)
def bulk_import(request):
    ...

# Async views are supported too.
@concurrency_limit(key="user", max_concurrent=2)
async def async_export(request):
    ...

Backends

Concurrency limiting needs a backend with semaphore support:

Redis (recommended for production) — atomic and shared across processes and hosts. Use this for any multi-process / multi-host deployment.
Memory — works in a single process; fine for development. (It is not shared across processes, so it does not enforce a global limit in a multi-worker deployment.)
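
The caveat about the memory backend can be shown with a small simulation. This is a sketch under stated assumptions: `ProcessLocalLimiter` is a hypothetical class standing in for per-process state, and two objects in one script play the role of two worker processes.

```python
class ProcessLocalLimiter:
    """Hypothetical per-process slot counter; its state lives only in this object."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.in_flight = {}  # key -> slots held in this "process"

    def try_acquire(self, key):
        held = self.in_flight.get(key, 0)
        if held >= self.max_concurrent:
            return False
        self.in_flight[key] = held + 1
        return True


# Two workers, each with its own private limiter, as separate processes would have.
workers = [ProcessLocalLimiter(max_concurrent=5) for _ in range(2)]

# Twelve simultaneous requests for one user, spread round-robin over the workers.
admitted = sum(workers[i % 2].try_acquire("user:42") for i in range(12))
print(admitted)  # 10: each worker admits 5, so the effective cap is 5 per worker
```

With a shared store, all twelve requests would be counted against the same five slots.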

Other backends raise `ImproperlyConfigured` when a concurrency-limited view is called. The choice is independent of the `RATELIMIT_BACKEND` used for rate limiting: you can pass `backend="redis"` to `@concurrency_limit` specifically.