Job Maintenance

Oban stores all jobs in the database, which offers several advantages:

  • Durability: Jobs survive application restarts and crashes

  • Visibility: Administrators can inspect job status and history

  • Accountability: Complete audit trail of job execution

Pruning Jobs

Oban automatically prunes completed, cancelled, and discarded jobs after a configurable period (1 day, by default) to prevent the table from growing indefinitely. A pruner periodically runs in the background to delete jobs that are in a final state and have exceeded the retention period.
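The selection rule described above can be sketched in plain Python. This is an illustrative model only, not Oban's implementation: the `Job` dataclass, the `finished_at` field, the `select_prunable` helper, and the `limit` default are all assumptions made for the example. Only the three final states and the 1 day default come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# States the pruner treats as final, per the description above.
FINAL_STATES = {"completed", "cancelled", "discarded"}


@dataclass
class Job:
    id: int
    state: str
    # Hypothetical field: when the job reached its final state.
    finished_at: Optional[datetime]


def select_prunable(jobs, now, max_age=86_400, limit=10_000):
    """Return up to `limit` final-state jobs older than `max_age` seconds.

    `limit=10_000` is an arbitrary illustrative default, not Oban's.
    """
    cutoff = now - timedelta(seconds=max_age)
    expired = [
        job
        for job in jobs
        if job.state in FINAL_STATES
        and job.finished_at is not None
        and job.finished_at < cutoff
    ]
    # Oldest first, so a capped run removes the stalest rows.
    expired.sort(key=lambda job: job.finished_at)
    return expired[:limit]


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
jobs = [
    Job(1, "completed", now - timedelta(days=2)),   # final and expired
    Job(2, "discarded", now - timedelta(hours=1)),  # final but too recent
    Job(3, "executing", None),                      # not in a final state
    Job(4, "cancelled", now - timedelta(days=3)),   # final and expired
]
prunable_ids = [job.id for job in select_prunable(jobs, now)]
```

Jobs 2 and 3 are left alone: one is still inside the retention period and the other has not reached a final state.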

Configuring Pruning

You can customize pruning behavior in oban.toml:

[pruner]
max_age = 3_600 # Keep jobs for 1 hour
limit = 50_000  # Delete up to 50k jobs per run

Or programmatically when running embedded:

oban = Oban(
    pool=pool,
    queues={"default": 10},
    pruner={"max_age": 3_600, "limit": 50_000}
)

Rescuing Jobs

During deployments or unexpected restarts, jobs may be left in the executing state indefinitely. We call these jobs “orphans”, but orphaning isn’t a bad thing. It means the job wasn’t lost and can be retried once the system comes back online.

The “lifeline” process automatically rescues orphaned jobs by periodically checking for jobs stuck in the executing state for too long and moving them back to available so they can run again.

Lifeline is enabled by default and runs every 60 seconds, rescuing jobs that have been executing for more than 5 minutes.

How Rescue Works

Oban uses a timeout-based rescue strategy: a job is rescued if its attempted_at timestamp is older than the configured rescue_after threshold (default: 300 seconds). This approach works reliably across nodes and requires no coordination between them.
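As a rough model of the timeout rule, the sketch below moves any job that has been executing longer than the threshold back to available. It is a simplified in-memory illustration, not Oban's code: the `Job` dataclass and `rescue_orphans` helper are hypothetical, while the `attempted_at` comparison and the 300 second default follow the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Job:
    id: int
    state: str
    attempted_at: Optional[datetime]


def rescue_orphans(jobs, now, rescue_after=300):
    """Move jobs stuck in `executing` past the threshold back to `available`.

    Returns the ids of the rescued jobs.
    """
    cutoff = now - timedelta(seconds=rescue_after)
    rescued = []
    for job in jobs:
        if (
            job.state == "executing"
            and job.attempted_at is not None
            and job.attempted_at < cutoff
        ):
            job.state = "available"
            rescued.append(job.id)
    return rescued


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
jobs = [
    Job(1, "executing", now - timedelta(minutes=10)),  # past the threshold
    Job(2, "executing", now - timedelta(minutes=1)),   # still within it
    Job(3, "completed", now - timedelta(hours=1)),     # not executing
]
rescued_ids = rescue_orphans(jobs, now)
```

The rule looks only at elapsed time, so job 1 would be rescued even if a worker were still legitimately running it. That is the trade-off of a purely timeout-based check.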

For more accurate rescue that detects crashed producers immediately, and won’t rescue jobs that are still legitimately running, see Oban Pro’s Accurate Rescue.

Configuring Lifeline

You can customize lifeline behavior in oban.toml:

[lifeline]
interval = 30       # Check for orphaned jobs every 30 seconds
rescue_after = 600  # Rescue jobs executing for more than 10 minutes

Or programmatically when running embedded:

oban = Oban(
    pool=pool,
    queues={"default": 10},
    lifeline={"interval": 30, "rescue_after": 600}
)

Choosing rescue_after

The rescue_after value should exceed the runtime of your longest-running job, with some headroom. If you have jobs that legitimately run for 10 minutes, set rescue_after to at least 15 minutes (900 seconds) to avoid premature rescue.
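One way to apply this guidance is to derive the threshold from the longest expected runtime plus a safety margin. The helper below is a hypothetical convenience, not part of Oban, and the 1.5x default margin is an assumption taken from the 10 minute to 15 minute example above rather than a documented rule.

```python
import math


def suggest_rescue_after(longest_job_seconds, margin=1.5):
    """Return a rescue_after value padded beyond the longest expected runtime.

    `margin` is an assumed multiplier; tune it to how variable your runtimes are.
    """
    if longest_job_seconds <= 0:
        raise ValueError("longest_job_seconds must be positive")
    if margin <= 1:
        raise ValueError("margin must be greater than 1")
    # Round up so the result never falls below the padded runtime.
    return math.ceil(longest_job_seconds * margin)


# A job that runs for up to 10 minutes gets a 15 minute threshold.
rescue_after = suggest_rescue_after(600)
```

The resulting number is what you would place in the rescue_after setting of the lifeline configuration.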

Maintenance Guidelines

  • All limits are soft: a job may not be pruned the moment it exceeds the configured age. Pruning is best-effort and runs out-of-band.

  • Pruning applies only to jobs that are completed, cancelled, or discarded. It never deletes jobs in an incomplete state.

  • For high-volume systems, consider reducing max_age to keep the jobs table smaller, or increasing limit to prune more jobs per run.
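To reason about the last guideline, a back-of-the-envelope estimate can help. The sketch below assumes a steady job rate, and `prune_interval` (the seconds between pruner runs) is a hypothetical parameter because the pruner's schedule is not stated here. Both function names are illustrative, not Oban APIs.

```python
def estimate_retained_rows(jobs_per_second, max_age):
    """Approximate number of finished jobs kept in the table at steady state."""
    return jobs_per_second * max_age


def pruner_keeps_up(jobs_per_second, limit, prune_interval):
    """Check whether one prune run can delete what finishes between runs.

    `prune_interval` is an assumed value; check how often your pruner runs.
    """
    return limit >= jobs_per_second * prune_interval


# At 50 jobs per second, shortening retention shrinks the table proportionally.
rows_one_day = estimate_retained_rows(50, 86_400)
rows_one_hour = estimate_retained_rows(50, 3_600)

# With an assumed 60 second interval, compare two candidate limits.
large_limit_ok = pruner_keeps_up(50, 50_000, 60)
small_limit_ok = pruner_keeps_up(50, 1_000, 60)
```

If the second check fails, expired jobs accumulate faster than they are deleted, which is the case where raising limit matters more than lowering max_age.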