Smart Concurrency#

Oban Pro extends queue configuration with global concurrency to limit jobs across all nodes, rate limiting to control jobs per time window, and partitioning to apply limits per worker, argument, or metadata value.

Global Concurrency#

Global concurrency limits the number of concurrent jobs that run across all nodes.

Without a global limit, the effective concurrency across the cluster is local_limit * num_nodes. For example, with three nodes and a local limit of 10, up to 30 jobs may run at once.

The only way to guarantee that at most one job runs concurrently across all connected nodes is to set global_limit = 1. Note that even a single node with limit = 1 may briefly run more than one job during rolling or blue-green deploys, when multiple instances overlap.

Here are some examples:

# Execute 10 jobs concurrently across all nodes, with up to 10 on a single node
[queues.my_queue]
limit = 10
global_limit = 10

# Execute 10 jobs concurrently, but only 3 jobs on a single node
[queues.my_queue]
limit = 3
global_limit = 10

# Execute at most 1 job concurrently
[queues.my_queue]
limit = 1
global_limit = 1

Or programmatically in embedded mode:

queues = {"my_queue": {"limit": 3, "global_limit": 10}}

In this configuration each node can run up to 3 jobs locally, but across all nodes, only 10 jobs will run at once.

Rate Limiting#

Rate limiting controls the number of jobs that execute globally within a period of time. The limit is calculated using tracking data shared by all nodes in the cluster, and job fetching is constrained using a sliding window over the configured period.

Every job execution counts toward the rate limit, regardless of whether the job completes, errors, snoozes, etc.

The period is specified in seconds:

  • period = 30 — 30 seconds

  • period = 60 — 1 minute

  • period = 3600 — 1 hour

Here are a few examples:

# Execute at most 60 jobs per minute, up to 10 locally
[queues.my_queue]
limit = 10
rate_limit = { allowed = 60, period = 60 }

# Execute at most 10 jobs per 30 seconds, up to 5 locally
[queues.my_queue]
limit = 5
rate_limit = { allowed = 10, period = 30 }

# Execute at most 1000 jobs per hour, up to 20 locally
[queues.my_queue]
limit = 20
rate_limit = { allowed = 1000, period = 3600 }

Or programmatically in embedded mode, where period can also be a timedelta:

from datetime import timedelta

queues = {"my_queue": {"limit": 10, "rate_limit": {"allowed": 60, "period": timedelta(minutes=1)}}}

The limit determines how many jobs can run on a single node and must always be set, even when combined with rate_limit or global_limit.

Tip

Using larger time periods allows for smoother tracking of rate limits. For example, expressing “1 job per second” as “60 jobs per minute” provides the same throughput but reduces the granularity of tracking, resulting in more consistent job execution patterns.

Queue Partitioning#

In addition to global and rate limits at the queue level, you can partition a queue so that it’s treated as multiple queues where concurrency or rate limits apply separately to each partition.

Partitions are specified with fields like worker, args, or meta. When partitioning by args or meta, choosing specific keys is required to keep partitioning meaningful. Focused partitioning minimizes the amount of data a queue needs to track and simplifies job-fetching queries.

Configuring Partitions#

The partition syntax is identical for global and rate limits.

Here are a few examples of viable partitioning schemes:

# Partition by worker alone
partition = "worker"

# Partition by the `account_id` from args
partition = { args = "account_id" }

# Partition by the `id` and `account_id` from args
partition = { args = ["id", "account_id"] }

# Partition by worker and the `account_id` key from args
partition = { worker = true, args = "account_id" }

# Partition by the `tenant_id` from meta
partition = { meta = "tenant_id" }

Take care to minimize partition cardinality by using as few keys as possible. Partitioning based on every permutation of your args makes concurrency or rate limits hard to reason about and can negatively impact queue performance.

Note

You cannot combine args and meta in the same partition configuration. Choose one or the other, optionally combined with worker.

Global Partitioning#

Global partitioning changes global concurrency behavior. Rather than applying a single limit to the queue as a whole, the limit applies separately to every partition within the queue.

Consider the following example:

[queues.default]
limit = 10
global_limit = 1
partition = "worker"

The queue is configured to run one job per worker across every node, while allowing up to 10 concurrent jobs on a single node. That contrasts with the standard behavior of global_limit, which would supersede the limit and allow only 1 concurrent job across every node.

Alternatively, you could partition by args based on a particular key:

[queues.default]
limit = 10
global_limit = 1
partition = { args = "tenant_id" }

That configures the queue to run one job concurrently across the entire cluster per tenant_id.
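Following the earlier embedded-mode examples, the same configuration might be expressed as a dict. The nested partition shape here is an assumption that mirrors the TOML tables:

```python
# Assumed embedded-mode equivalent of the TOML above; the nested
# "partition" key shape mirrors the TOML inline table.
queues = {"default": {"limit": 10, "global_limit": 1, "partition": {"args": "tenant_id"}}}
```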

Rate Limit Partitioning#

Rate limit partitions operate similarly to global partitions. Rather than limiting all jobs within the queue, they limit each partition within the queue.

For example, to allow one job per worker every ten seconds, across every instance of the queue in your cluster:

[queues.default]
limit = 10
partition = "worker"
rate_limit = { allowed = 1, period = 10 }
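The key point is that each partition keeps its own independent window. A minimal sketch of that behavior, using a per-key sliding window (illustrative only, not Pro's implementation):

```python
from collections import defaultdict, deque


def allow(windows, key, now, allowed, period):
    """Per-partition sliding window: each partition key gets its own window."""
    win = windows[key]
    while win and now - win[0] >= period:
        win.popleft()
    if len(win) < allowed:
        win.append(now)
        return True
    return False


windows = defaultdict(deque)

# allowed = 1 per 10s, partitioned by worker: each worker has its own budget
print(allow(windows, "MailWorker", 0.0, 1, 10))   # True
print(allow(windows, "SyncWorker", 0.0, 1, 10))   # True: separate partition
print(allow(windows, "MailWorker", 5.0, 1, 10))   # False: MailWorker window full
print(allow(windows, "MailWorker", 10.0, 1, 10))  # True: window elapsed
```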

Important

Smart concurrency requires the Oban Pro schema. See Installation for details.