# Smart Concurrency

Oban Pro extends queue configuration with global concurrency to limit jobs across all nodes, rate limiting to control jobs per time window, and partitioning to apply limits per worker, argument, or metadata value.

(global-concurrency)=
## Global Concurrency

Global concurrency limits the number of concurrent jobs that run across all nodes.

Without a global limit, total concurrency is typically `local_limit * num_nodes`. For example, with three nodes and a local limit of 10, you'll have an effective global limit of 30.

The only way to guarantee that all connected nodes will run _exactly one job_ concurrently is to set `global_limit = 1`. Note that even a single node with `limit = 1` may run more than one job during rolling or blue-green deploys, when multiple instances briefly overlap.

Here are some examples:

```toml
# Execute 10 jobs concurrently across all nodes, with up to 10 on a single node
[queues.my_queue]
limit = 10
global_limit = 10

# Execute 10 jobs concurrently, but only 3 jobs on a single node
[queues.my_queue]
limit = 3
global_limit = 10

# Execute at most 1 job concurrently
[queues.my_queue]
limit = 1
global_limit = 1
```

Or programmatically in embedded mode:

```python
queues = {"my_queue": {"limit": 3, "global_limit": 10}}
```

In this configuration each node can run up to 3 jobs locally, but across all nodes, only 10 jobs will run at once.

(rate-limiting)=
## Rate Limiting

Rate limiting controls the number of jobs that execute globally within a period of time. The limit is calculated using tracking data from all other nodes in the cluster. Job fetching is then limited using a sliding window over the configured period.

Every job execution counts toward the rate limit, regardless of whether the job completes, errors, snoozes, etc.
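The sliding-window behavior described above can be sketched in plain Python. This is an illustrative, single-process model only; `SlidingWindowLimiter` and its methods are invented names, not part of the library's API, and the real limiter aggregates tracking data from every node in the cluster.

```python
from __future__ import annotations

import time
from collections import deque


class SlidingWindowLimiter:
    """Illustrative sliding-window counter, not the library's implementation.

    Tracks execution timestamps and admits a new job only while fewer than
    `allowed` executions fall inside the trailing `period` seconds.
    """

    def __init__(self, allowed: int, period: float):
        self.allowed = allowed
        self.period = period
        self.executions: deque[float] = deque()

    def try_acquire(self, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have slid out of the trailing window.
        while self.executions and now - self.executions[0] >= self.period:
            self.executions.popleft()
        if len(self.executions) < self.allowed:
            # Every execution counts, whether the job later completes,
            # errors, or snoozes.
            self.executions.append(now)
            return True
        return False
```

With `allowed=2, period=30`, two acquisitions in quick succession are admitted, a third is refused, and capacity frees up again once the oldest execution falls out of the 30-second window.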
The `period` is specified in seconds:

* `period = 30` — 30 seconds
* `period = 60` — 1 minute
* `period = 3600` — 1 hour

Here are a few examples:

```toml
# Execute at most 60 jobs per minute, up to 10 locally
[queues.my_queue]
limit = 10
rate_limit = { allowed = 60, period = 60 }

# Execute at most 10 jobs per 30 seconds, up to 5 locally
[queues.my_queue]
limit = 5
rate_limit = { allowed = 10, period = 30 }

# Execute at most 1000 jobs per hour, up to 20 locally
[queues.my_queue]
limit = 20
rate_limit = { allowed = 1000, period = 3600 }
```

Or programmatically in embedded mode, where `period` can also be a `timedelta`:

```python
from datetime import timedelta

queues = {"my_queue": {"limit": 10, "rate_limit": {"allowed": 60, "period": timedelta(minutes=1)}}}
```

The `limit` determines how many jobs can run on a single node and *must always* be set alongside `rate_limit` or `global_limit`.

```{tip}
Using larger time periods allows for smoother tracking of rate limits. For example, expressing "1 job per second" as "60 jobs per minute" provides the same throughput but reduces the granularity of tracking, resulting in more consistent job execution patterns.
```

(queue-partitioning)=
## Queue Partitioning

In addition to global and rate limits at the queue level, you can partition a queue so that it's treated as multiple queues, where concurrency or rate limits apply separately to each partition.

Partitions are specified with fields like `worker`, `args`, or `meta`. When partitioning by `args` or `meta`, you must choose specific keys to keep partitioning meaningful. Focused partitioning minimizes the amount of data a queue needs to track and simplifies job-fetching queries.

### Configuring Partitions

The partition syntax is identical for global and rate limits.
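To make partition fields concrete, here is a sketch of how a partition key could be derived from a job. The `partition_key` helper and the job shape are illustrative assumptions, not the library's internals; they only show how `worker`, `args`, and `meta` fields narrow a job down to a group.

```python
def partition_key(job: dict, partition) -> tuple:
    """Build a hashable partition key from a job (illustrative sketch).

    `partition` mirrors the configuration forms below: the string
    "worker", or a dict with an optional `worker` flag plus either
    `args` or `meta` naming the keys to partition by.
    """
    if partition == "worker":
        return (job["worker"],)

    key = []
    if partition.get("worker"):
        key.append(job["worker"])
    for field in ("args", "meta"):
        keys = partition.get(field)
        if keys is None:
            continue
        if isinstance(keys, str):
            keys = [keys]
        # Only the named keys participate, keeping cardinality low.
        key.extend(job[field].get(k) for k in keys)
    return tuple(key)
```

Jobs that produce the same tuple share a partition, so limits apply to each distinct tuple independently.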
Here are a few examples of viable partitioning schemes:

```toml
# Partition by worker alone
partition = "worker"

# Partition by the `account_id` from args
partition = { args = "account_id" }

# Partition by the `id` and `account_id` from args
partition = { args = ["id", "account_id"] }

# Partition by worker and the `account_id` key from args
partition = { worker = true, args = "account_id" }

# Partition by the `tenant_id` from meta
partition = { meta = "tenant_id" }
```

Take care to minimize partition cardinality by using as few keys as possible. Partitioning based on _every permutation_ of your args makes concurrency or rate limits hard to reason about and can negatively impact queue performance.

```{note}
You cannot combine `args` and `meta` in the same partition configuration. Choose one or the other, optionally combined with `worker`.
```

### Global Partitioning

Global partitioning changes global concurrency behavior. Rather than applying a fixed limit to the queue as a whole, it applies the limit to every partition within the queue.

Consider the following example:

```toml
[queues.default]
limit = 10
global_limit = 1
partition = "worker"
```

The queue is configured to run one job per worker across every node, but only 10 concurrently on a single node. That is in contrast to the standard behavior of `global_limit`, which would supersede the `limit` and allow only 1 concurrent job across every node.

Alternatively, you could partition by `args` based on a particular key:

```toml
[queues.default]
limit = 10
global_limit = 1
partition = { args = "tenant_id" }
```

That configures the queue to run one job concurrently across the entire cluster per `tenant_id`.

### Rate Limit Partitioning

Rate limit partitions operate similarly to global partitions. Rather than limiting all jobs within the queue, they limit each partition within the queue.
For example, to allow one job per worker every ten seconds, across every instance of the queue in your cluster:

```toml
[queues.default]
limit = 10
partition = "worker"
rate_limit = { allowed = 1, period = 10 }
```

```{important}
Smart concurrency requires the Oban Pro schema. See [Installation](installation.md) for details.
```
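Putting the pieces together, a combined embedded-mode configuration might look like the following sketch. The queue names and values are illustrative, and it assumes `partition` accepts the same shapes in embedded mode as shown in the TOML examples.

```python
from datetime import timedelta

# Illustrative embedded-mode configuration combining the options above.
queues = {
    # At most one job per tenant_id across the cluster, up to 10 per node.
    "tenants": {
        "limit": 10,
        "global_limit": 1,
        "partition": {"args": "tenant_id"},
    },
    # One job per worker every ten seconds, cluster-wide.
    "mailers": {
        "limit": 10,
        "partition": "worker",
        "rate_limit": {"allowed": 1, "period": timedelta(seconds=10)},
    },
}
```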