# Smart Concurrency

Oban Pro extends queue configuration with global concurrency to limit jobs across all nodes, rate limiting to control jobs per time window, and partitioning to apply limits per worker, argument, or metadata value.

(global-concurrency)=
## Global Concurrency

Global concurrency limits the number of concurrent jobs that run across all nodes.

Without a global limit, total concurrency is typically `local_limit * num_nodes`. For example, with three nodes and a local limit of 10, you'll have an effective global limit of 30.

The only way to guarantee that all connected nodes will run _exactly one job_ concurrently is to set `global_limit = 1`. Note that even a single node with `limit = 1` may run more than one job during rolling or blue-green deploys, when multiple instances briefly overlap.

Here are some examples:

```toml
# Execute 10 jobs concurrently across all nodes, with up to 10 on a single node
[queues.my_queue]
limit = 10
global_limit = 10

# Execute 10 jobs concurrently, but only 3 jobs on a single node
[queues.my_queue]
limit = 3
global_limit = 10

# Execute at most 1 job concurrently
[queues.my_queue]
limit = 1
global_limit = 1
```

Or programmatically in embedded mode:

```python
queues = {"my_queue": {"limit": 3, "global_limit": 10}}
```

In this configuration each node can run up to 3 jobs locally, but across all nodes, only 10 jobs will run at once.

(rate-limiting)=
## Rate Limiting

Rate limiting controls the number of jobs that execute globally within a period of time. The limit is calculated using tracking data from all other nodes in the cluster. Job fetching is then limited using a sliding window over the configured period.

Every job execution counts toward the rate limit, regardless of whether the job completes, errors, snoozes, etc.
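The sliding-window behavior described above can be sketched in plain Python. This is an illustrative, single-process model only; `SlidingWindowLimiter` and its methods are invented names, not part of the library's API, and the real limiter aggregates tracking data from every node in the cluster.

```python
from __future__ import annotations

import time
from collections import deque


class SlidingWindowLimiter:
    """Illustrative sliding-window counter, not the library's implementation.

    Tracks execution timestamps and admits a new job only while fewer than
    `allowed` executions fall inside the trailing `period` seconds.
    """

    def __init__(self, allowed: int, period: float):
        self.allowed = allowed
        self.period = period
        self.executions: deque[float] = deque()

    def try_acquire(self, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the trailing window.
        while self.executions and now - self.executions[0] >= self.period:
            self.executions.popleft()
        if len(self.executions) < self.allowed:
            # Every execution counts, whether the job later completes,
            # errors, or snoozes.
            self.executions.append(now)
            return True
        return False
```

With `allowed=2, period=30`, two acquisitions in quick succession are admitted, a third is refused, and capacity frees up again once the oldest execution falls out of the 30-second window.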
The `period` is specified in seconds:

* `period = 30` — 30 seconds
* `period = 60` — 1 minute
* `period = 3600` — 1 hour

Here are a few examples:

```toml
# Execute at most 60 jobs per minute, up to 10 locally
[queues.my_queue]
limit = 10
rate_limit = { allowed = 60, period = 60 }

# Execute at most 10 jobs per 30 seconds, up to 5 locally
[queues.my_queue]
limit = 5
rate_limit = { allowed = 10, period = 30 }

# Execute at most 1000 jobs per hour, up to 20 locally
[queues.my_queue]
limit = 20
rate_limit = { allowed = 1000, period = 3600 }
```

Or programmatically in embedded mode, where `period` can also be a `timedelta`:

```python
from datetime import timedelta

queues = {"my_queue": {"limit": 10, "rate_limit": {"allowed": 60, "period": timedelta(minutes=1)}}}
```

The `limit` determines how many jobs can run on a single node and *must always* be set alongside `rate_limit` or `global_limit`.

```{tip}
Using larger time periods allows for smoother tracking of rate limits. For example, expressing "1 job per second" as "60 jobs per minute" provides the same throughput but reduces the granularity of tracking, resulting in more consistent job execution patterns.
```

(queue-partitioning)=
## Queue Partitioning

In addition to global and rate limits at the queue level, you can partition a queue so that it's treated as multiple queues, where concurrency or rate limits apply separately to each partition.

Partitions are specified with fields like `worker`, `args`, or `meta`. When partitioning by `args` or `meta`, you must choose specific keys to keep partitioning meaningful. Focused partitioning minimizes the amount of data a queue needs to track and simplifies job-fetching queries.

### Configuring Partitions

The partition syntax is identical for global and rate limits.
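To make partition fields concrete, here is a sketch of how a partition key could be derived from a job. The `partition_key` helper and the job shape are illustrative assumptions, not the library's internals; they only show how `worker`, `args`, and `meta` fields narrow a job down to a group.

```python
def partition_key(job: dict, partition) -> tuple:
    """Build a hashable partition key from a job (illustrative sketch).

    `partition` mirrors the configuration forms below: the string
    "worker", or a dict with an optional `worker` flag plus either
    `args` or `meta` naming the keys to partition by.
    """
    if partition == "worker":
        return (job["worker"],)

    key = []
    if partition.get("worker"):
        key.append(job["worker"])
    for field in ("args", "meta"):
        keys = partition.get(field)
        if keys is None:
            continue
        if isinstance(keys, str):
            keys = [keys]
        # Only the named keys participate, keeping cardinality low.
        key.extend(job[field].get(k) for k in keys)
    return tuple(key)
```

Jobs that produce the same tuple share a partition, so limits apply to each distinct tuple independently.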
Here are a few examples of viable partitioning schemes:

```toml
# Partition by worker alone
partition = "worker"

# Partition by the `account_id` from args
partition = { args = "account_id" }

# Partition by the `id` and `account_id` from args
partition = { args = ["id", "account_id"] }

# Partition by worker and the `account_id` key from args
partition = { worker = true, args = "account_id" }

# Partition by the `tenant_id` from meta
partition = { meta = "tenant_id" }
```

Take care to minimize partition cardinality by using as few keys as possible. Partitioning based on _every permutation_ of your args makes concurrency or rate limits hard to reason about and can negatively impact queue performance.

```{note}
You cannot combine `args` and `meta` in the same partition configuration. Choose one or the other, optionally combined with `worker`.
```

### Global Partitioning

Global partitioning changes global concurrency behavior. Rather than applying a fixed limit to the queue as a whole, it applies the limit to every partition within the queue.

Consider the following example:

```toml
[queues.default]
limit = 10
global_limit = 1
partition = "worker"
```

The queue is configured to run one job per worker across every node, but only 10 concurrently on a single node. That is in contrast to the standard behavior of `global_limit`, which would supersede the `limit` and allow only 1 concurrent job across every node.

Alternatively, you could partition by `args` based on a particular key:

```toml
[queues.default]
limit = 10
global_limit = 1
partition = { args = "tenant_id" }
```

That configures the queue to run one job concurrently across the entire cluster per `tenant_id`.

### Rate Limit Partitioning

Rate limit partitions operate similarly to global partitions. Rather than limiting all jobs within the queue, they limit each partition within the queue.
For example, to allow one job per worker every ten seconds, across every instance of the queue in your cluster:

```toml
[queues.default]
limit = 10
partition = "worker"
rate_limit = { allowed = 1, period = 10 }
```

```{important}
Smart concurrency requires the Oban Pro schema. See [Installation](installation.md) for details.
```
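Putting the pieces together, a combined embedded-mode configuration might look like the following sketch. The queue names and values are illustrative, and it assumes `partition` accepts the same shapes in embedded mode as shown in the TOML examples.

```python
from datetime import timedelta

# Illustrative embedded-mode configuration combining the options above.
queues = {
    # At most one job per tenant_id across the cluster, up to 10 per node.
    "tenants": {
        "limit": 10,
        "global_limit": 1,
        "partition": {"args": "tenant_id"},
    },
    # One job per worker every ten seconds, cluster-wide.
    "mailers": {
        "limit": 10,
        "partition": "worker",
        "rate_limit": {"allowed": 1, "period": timedelta(seconds=10)},
    },
}
```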