# Job Maintenance Oban stores all jobs in the database, which offers several advantages: - **Durability**: Jobs survive application restarts and crashes - **Visibility**: Administrators can inspect job status and history - **Accountability**: Complete audit trail of job execution ## Pruning Jobs Oban automatically prunes `completed`, `cancelled`, and `discarded` jobs after a configurable period (1 day, by default) to prevent the table from growing indefinitely. A pruner periodically runs in the background to delete jobs that are in a final state and have exceeded the retention period. ### Configuring Pruning You can customize pruning behavior in `oban.toml`: ```toml [pruner] max_age = 3_600 # Keep jobs for 1 hour limit = 50_000 # Delete up to 50k jobs per run ``` Or programmatically when running embedded: ```python oban = Oban( pool=pool, queues={"default": 10}, pruner={"max_age": 3_600, "limit": 50_000} ) ``` ## Rescuing Jobs During deployment or unexpected restarts, jobs may be left in an executing state indefinitely. We call these jobs "orphans", but orphaning isn't a bad thing. It means that the job wasn't lost and it may be retried again when the system comes back online. The "lifeline" process automatically rescues orphaned jobs by periodically checking for jobs stuck in the `executing` state for too long and moving them back to `available` so they can run again. **Lifeline is enabled by default** and runs every 60 seconds, rescuing jobs that have been executing for more than 5 minutes. ### How Rescue Works Oban uses a **timeout-based rescue** strategy: jobs are rescued if their `attempted_at` timestamp is older than the configured `rescue_after` threshold (default: 300 seconds). This approach works reliably across node and doesn't require coordination between nodes. For more accurate rescue that detects crashed producers immediately, and won't rescue jobs that are still legitimately running, see [Oban Pro's Accurate Rescue][adoption]. [adoption]: https://oban.pro/docs/py_pro/adoption.html ### Configuring Lifeline You can customize lifeline behavior in `oban.toml`: ```toml [lifeline] interval = 30 # Check for orphaned jobs every 30 seconds rescue_after = 600 # Rescue jobs executing for more than 10 minutes ``` Or programmatically when running embedded: ```python oban = Oban( pool=pool, queues={"default": 10}, lifeline={"interval": 30, "rescue_after": 600} ) ``` ### Choosing rescue_after The `rescue_after` value should be longer than your longest-running job. If you have jobs that legitimately run for 10 minutes, set `rescue_after` to at least 15 minutes (900 seconds) to avoid premature rescue. ## Maintenance Guidelines - All limits are soft; jobs beyond a specified age may not be pruned immediately after jobs complete. This means, pruning is best-effort and performed out-of-band. - Pruning is only applied to jobs that are `completed`, `cancelled`, or `discarded`. It'll never delete jobs in an incomplete state. - For high-volume systems, consider reducing `max_age` to keep the jobs table smaller, or increasing `limit` to prune more jobs per run.