Changelog for Oban Pro v1.3
This release is entirely dedicated to Smart engine optimizations, from slashing queue transactions to boosting bulk insert performance.
📮 Async Tracking
Rather than synchronously recording updates (acks) in a separate transaction after jobs execute, the Smart engine now bundles acks together to minimize transactions and reduce load on the database.
Async tracking, combined with the other enhancements detailed below, showed the following improvements over the previous Smart engine when executing 1,000 jobs with concurrency set to 20:
- Transactions reduced by 97% (1,501 to 51)
- Queries reduced by 94% (3,153 to 203)
That means less Ecto pool contention, fewer transactions, fewer queries, and fewer writes to the oban_producers table! There are similar, albeit less flashy, improvements over the Basic engine as well.
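The percentages quoted above follow directly from the raw counts. A quick check, using a throwaway anonymous function (not part of Oban Pro):

```elixir
# Percentage reduction between a "before" and "after" count, rounded to one decimal.
reduction = fn before_count, after_count ->
  Float.round((before_count - after_count) / before_count * 100, 1)
end

IO.inspect(reduction.(1_501, 51), label: "transactions") # transactions: 96.6
IO.inspect(reduction.(3_153, 203), label: "queries")     # queries: 93.6
```

Both values round to the 97% and 94% figures listed in the benchmark.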
Notes and Implementation Details
- Acks are stored centrally per queue and flushed with the next transaction using a lock-free mechanism that never drops an operation.
- Acks are grouped and executed as a single query whenever possible. This is most visible in high throughput queues.
- Acks are preserved across transactions to guarantee nothing is lost in the event of a rollback or an exception.
- Acks are flushed on shutdown and when the queue is paused to ensure data remains as consistent as the previous synchronous version.
- Acking is synchronous in testing mode, when draining jobs, and when explicitly enabled by a flag provided to the queue.
See the Smart engine's async tracking section for more details and instructions on how to selectively opt out of async mode.
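As a minimal sketch of the per-queue opt-out, a queue can be given the ack_async: false flag that also appears in the v1.3.2 notes. The application name :my_app, MyApp.Repo, the queue names, and the limits are placeholders chosen for illustration; check the Smart engine documentation for the authoritative option list.

```elixir
# config/config.exs
import Config

config :my_app, Oban,
  engine: Oban.Pro.Engines.Smart,
  repo: MyApp.Repo,
  queues: [
    # Async tracking stays enabled for this queue (the default behaviour).
    default: [local_limit: 20],
    # Hypothetical queue that opts out, so acks are recorded synchronously.
    payments: [local_limit: 5, ack_async: false]
  ]
```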
v1.3.5 — 2024-02-16
Enhancements
[DynamicLifeline] Track rescues with a counter in meta.
Rescued jobs can now be identified by a rescued value in meta. Each rescue increments the rescued count by one.
[Smart] Skip taking unique advisory locks in testing mode.
Advisory locks are global and apply across transactions. That can break async tests with overlapping unique jobs because the lock is held in a concurrent, sandboxed test.
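For the rescued counter that DynamicLifeline now tracks, a small helper can read the value from a job's meta. This is a sketch only: the RescueCheck module is hypothetical, and it assumes meta is a map with string keys, as it is when decoded from the database.

```elixir
defmodule RescueCheck do
  # Returns how many times a job was rescued, or 0 when the counter is absent.
  # Assumes `meta` uses string keys, e.g. %{"rescued" => 2}.
  def rescue_count(%{meta: meta}), do: Map.get(meta, "rescued", 0)

  # True when the job has been rescued at least once.
  def rescued?(job), do: rescue_count(job) > 0
end
```

Any struct or map with a meta field works here, so the helper can be called with a fetched job directly.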
Bug Fixes
[Smart] Prevent stuck jobs with more reliable async ack management.
Unhandled transaction failures or timeouts while acking could cause uncommitted acks to be lost, leaving jobs stuck in an executing state but unable to be rescued.
Now acks are pulled from the producer's ETS table all at once, without a time-based select. Successfully persisted acks are deleted from the table individually rather than by time.
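The "pull everything, then delete only what was persisted" approach can be illustrated with a plain ETS table. This is a simplified sketch and not Oban Pro's implementation: the AckBuffer module, the {job_id, ack} tuple shape, and the persist_fun callback are all assumptions made for the example.

```elixir
defmodule AckBuffer do
  # Hypothetical ack buffer backed by an unnamed, public ETS set.
  def new, do: :ets.new(:acks, [:set, :public])

  # Store a pending ack keyed by job id.
  def put(table, job_id, ack), do: :ets.insert(table, {job_id, ack})

  # Read every pending ack at once, then delete each entry individually,
  # and only after `persist_fun` reports :ok for it. Entries that fail to
  # persist stay in the table and are retried on the next flush.
  # Returns the ids that were persisted and removed.
  def flush(table, persist_fun) do
    pending = :ets.tab2list(table)

    for {job_id, _ack} = entry <- pending, persist_fun.(entry) == :ok do
      :ets.delete_object(table, entry)
      job_id
    end
  end
end
```

Because nothing is removed until it has been persisted, a failure or timeout in the middle of a flush leaves the remaining acks in place rather than dropping them.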
[Smart] Ensure uniqueness across args when no keys are specified regardless of insertion order.
[Smart] Force materializing the CTE when fetching jobs.
The CTE used to prevent optimizations in the engine's fetch query is only referenced once, which may allow the Postgres optimizer to inline it. Inlining can negate the CTE "optimization fence", so we force the CTE to be materialized.
[DynamicPartitioner] Determine date_partition? default lazily at runtime.
Date partitioning should be disabled in the test environment because the plugin doesn't run to pre-create partitions. However, using Mix.env at compilation time wasn't reliable enough to prevent sub-partitioning by date in testing environments. This switches the check to runtime.
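The difference between a compile-time and a runtime check can be shown with a generic sketch. It swaps Mix.env for an environment variable so the example is self-contained; the EnvCheck module and the DATE_PARTITION_DEMO variable are hypothetical and unrelated to the plugin's actual code.

```elixir
defmodule EnvCheck do
  # Evaluated once while the module compiles; the result is baked into the
  # compiled module and never changes afterwards.
  @compile_time_flag System.get_env("DATE_PARTITION_DEMO", "true") == "true"

  def compile_time?, do: @compile_time_flag

  # Evaluated on every call, so it reflects the environment of the running system.
  def runtime?, do: System.get_env("DATE_PARTITION_DEMO", "true") == "true"
end
```

Changing the variable after compilation affects runtime?/0 but not compile_time?/0, which is the kind of mismatch a lazy runtime default avoids.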
v1.3.4 — 2024-02-06
Enhancements
[DynamicPartitioner] Improve partitioned structure and indexes for performance.
The partitioner migration now creates fewer partitions to improve staging queries. To match, it creates more specific indexes for each table to prevent sequential scans during common queries.
All partitions gain a primary key index, and the compound index for "completed" states now omits fields that are only needed for incomplete states.
Indexes on existing partitioned tables should be updated as shown below. However, note that indexes can't be added to partitioned tables concurrently.
-- This will cascade down to all partitions
CREATE INDEX oban_jobs_pkey ON oban_jobs (id);

-- Add an index that includes state for available, scheduled, and retryable
-- (index names must be unique within a schema, so each is named after its table)
CREATE INDEX oban_jobs_available_state_queue_priority_scheduled_at_id_index ON oban_jobs_available (state, queue, priority, scheduled_at, id);
CREATE INDEX oban_jobs_retryable_state_queue_priority_scheduled_at_id_index ON oban_jobs_retryable (state, queue, priority, scheduled_at, id);
CREATE INDEX oban_jobs_scheduled_state_queue_priority_scheduled_at_id_index ON oban_jobs_scheduled (state, queue, priority, scheduled_at, id);

-- Add a simpler index to completed states
CREATE INDEX oban_jobs_cancelled_queue_scheduled_at_id_index ON oban_jobs_cancelled (queue, scheduled_at, id);
CREATE INDEX oban_jobs_completed_queue_scheduled_at_id_index ON oban_jobs_completed (queue, scheduled_at, id);
CREATE INDEX oban_jobs_discarded_queue_scheduled_at_id_index ON oban_jobs_discarded (queue, scheduled_at, id);

-- Drop the previous index that lacked the state
DROP INDEX oban_jobs_queue_priority_scheduled_at_id_index;
[DynamicPartitioner] Allow using DynamicPruner with DynamicPartitioner.
There are situations where it's still useful to use DynamicPruner to aggressively prune a subset of jobs more frequently than partitioning allows.
[Smart] Augment all frequent queries with a state condition to aid partitioned queries.
Partitioned table queries require the state for partition pruning, especially for an id only query. This changes the Smart engine's queries to run optimally with either a standard or partitioned table.
Bug Fixes
[DynamicPartitioner] Override configured prefix during backfills.
DynamicPartitioner backfills retained the configured prefix, which defaults to public, without respecting the new_prefix option. Now the prefix is overridden and correctly escaped before usage.
[Worker] Make embed_one with required fields truly optional.
The recursive nature of embedded structs made it impossible to have an optional embed_one with required fields. Now the embedded fields are only validated when a value is provided.
[Workflow] Use t:name for :names in t:fetch_opts.
Names may be either an atom or a string, and they're always coerced before querying anyhow.
v1.3.3 — 2024-01-23
This release depends on Oban v2.17.3 or greater
Enhancements
[Worker] Add after_process/3 hook callback that includes execution results.
The new after_process/3 callback includes the job's return value as a third argument. That allows hooks to have immediate access to the job's return value without recording it and fetching it from the database.
[Testing] Ensure queues are started before returning from start_supervised_oban/1.
This prevents race conditions between when start_supervised_oban returns and when queues are fully booted, i.e. actively listening to pause/resume/scale signals.
[DynamicPruner] Add by_state_timestamp option to DynamicPruner.
In rare situations where the scheduled_at timestamp isn't accurate enough to identify prunable jobs, e.g. cancelling large swaths of jobs scheduled far into the future, the new by_state_timestamp: true option can be used for increased accuracy.
[DynamicQueues] Accept ack_async/refresh_interval in DynamicQueues.
DynamicQueues now allows the ack_async and refresh_interval virtual fields for parity with standard queues.
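To illustrate the after_process/3 hook from this release, here is a rough sketch of a standalone hook module. The module name MyApp.ResultHook is hypothetical, the :complete state atom is an assumption based on the existing hook states, and the job is treated as any map with an id; in a real application the callback would be defined in, or attached to, a module that uses Oban.Pro.Worker.

```elixir
defmodule MyApp.ResultHook do
  # Called with the hook state, the job, and the value returned by the job.
  # The third argument makes the result available without recording it and
  # reading it back from the database.
  def after_process(:complete, job, result) do
    IO.puts("job #{job.id} returned #{inspect(result)}")
    :ok
  end

  # Ignore every other state in this sketch.
  def after_process(_state, _job, _result), do: :ok
end
```

Matching on the state in the function head keeps the hook cheap for outcomes it doesn't care about.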
Bug Fixes
[Smart] Serialize all meta updates through the producer.
This fixes all of the race conditions and outdated record issues that the mismatch between registry meta and the producer's own meta caused.
[Smart] Force synchronous acking for all Batch jobs.
Batches require accurate status counts to insert callbacks correctly. The slight delay from async jobs can cause incorrect counts in highly active queues with batches.
v1.3.2 — 2024-01-19
Bug Fixes
[Smart] Ensure global queues keep running with ack_async: false.
Global queues that are marked with ack_async: false must refresh the in-memory producer record between job fetching to keep the queue running. Otherwise, tracked jobs linger in the producer record despite successful acking.
[Smart] Prevent a race condition while pausing from stopping global queues.
Pausing a global queue while there are pending acks could trigger a write-after-read race condition that lost tracking changes. Eventually, leaked changes could prevent the queue from fetching new jobs because it looked like the global limit was met.
[Smart] Always split completed ack queries for recorded jobs.
Jobs with different recorded output could mistakenly be written with a single query if they completed within a few ms of each other. This changes the grouping mechanism to only bundle simple completions, never recorded completions.
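The grouping rule described above (bundle simple completions, keep recorded completions separate) can be sketched with ordinary list operations. This is an illustration rather than the engine's code: the AckGrouping module, the ack map shape, and the operation tuples are all made up for the example.

```elixir
defmodule AckGrouping do
  # Each ack is a hypothetical map: %{id: integer, recorded: nil | term}.
  # Simple completions (no recorded output) collapse into one bulk operation,
  # while every recorded completion becomes its own operation.
  def group(acks) do
    {simple, recorded} = Enum.split_with(acks, &is_nil(&1.recorded))

    bundled =
      case simple do
        [] -> []
        _ -> [{:complete_many, Enum.map(simple, & &1.id)}]
      end

    split = Enum.map(recorded, &{:complete_recorded, &1.id, &1.recorded})

    bundled ++ split
  end
end
```

With this rule, two recorded jobs that finish within milliseconds of each other can never share a write, so their outputs cannot be mixed up.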
v1.3.1 — 2024-01-17
Bug Fixes
[Smart] Default to synchronous acking when Oban is in a testing mode.
Acking should always be synchronous during tests to prevent flickering failures from race conditions. Previously, acking relied on a failed registry lookup to switch to synchronous mode, which wasn't accurate enough.
[Smart] Default to synchronous acking for drain_jobs/2 and related test helpers.
Draining runs synchronously in the test process, but not in testing mode. This explicitly disables ack_async when draining jobs.
[DynamicPartitioner] Only sub-partition by date in non-test environments.
To prevent testing errors after migration, the completed, cancelled, and discarded states are sub-partitioned by date only in :dev and :prod environments.
It's possible to enable date partitioning in other production-like environments with the new date_partition? flag.
[DynamicPartitioner] Rename existing args and meta indexes to allow index recreation.
When renaming the existing table to oban_jobs_old, the args and meta indexes weren't renamed. That prevented creating those indexes on the new partitioned table, because Postgres detected that the indexes already existed and skipped their creation.
v1.3.0 — 2024-01-16
Enhancements
[Smart] Skip extra query to "touch" the producer when acking without global or rate limiting enabled.
This change reduces overall producer updates from 1 per job to 2 per minute for standard queues.
[Smart] Avoid refetching the local producer's data when fetching new jobs.
Async acking is centralized through the producer, which guarantees global and rate tracking data is up-to-date before fetching without an additional read.
[Smart] Optimize job insertion with fewer iterations.
Iterating through job changesets as a map/reduce with fewer conversions improves inserting 1k jobs by 10% while reducing overall memory by 9%.
[Smart] Efficiently count changesets during insert_all.
Prevent duplicate iterations through changesets to count unique jobs. Iterating through them once to accumulate multiple counts improved insertion by 3% and reduced overall memory by 2%.
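The single-pass counting idea can be sketched generically. The ChangesetCounts module and the unique? flag on plain maps are stand-ins for illustration; the engine's real changesets and counters will differ.

```elixir
defmodule ChangesetCounts do
  # Two separate traversals: one for the total, one for the unique count.
  def two_pass(changesets) do
    %{total: length(changesets), unique: Enum.count(changesets, & &1.unique?)}
  end

  # One traversal that accumulates both counts at the same time.
  def one_pass(changesets) do
    Enum.reduce(changesets, %{total: 0, unique: 0}, fn changeset, acc ->
      acc = %{acc | total: acc.total + 1}

      if changeset.unique?, do: %{acc | unique: acc.unique + 1}, else: acc
    end)
  end
end
```

Both functions return the same map; the second simply walks the list once, which is where the time and memory savings come from on large inserts.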
[Smart] Acking cancelled jobs is done with a single operation and limited to queues with global limiting.
Bug Fixes
[Smart] Always merge acked meta updates dynamically.
All meta-updating queries are dynamically merged with existing meta. This prevents recorded jobs from clobbering other meta updates made while the job executed.
[Smart] Safely extract producer uuid from attempted_by with more than two elements.
[DynamicCron] Preserve stored opts such as args, priority, etc., on reboot when no new opts are set.
[Relay] Skip attempting relay notifications when the associated Oban pid isn't alive.