Changelog for Oban Met v0.1.0

Initial release!

v0.1.11 — 2024-10-16

[Reporter] Speed up estimates by caching queues.
Pulling queue names from the oban_jobs table causes needlessly expensive sequential scans. Now queue names are extracted periodically and used directly for estimates. The same optimization is applied to the oban_producers table, when available, for a minor performance boost.

[Reporter] Explicitly prefix jobs table and state in estimate.
The previous correction wasn't enough for systems that lack a jobs table in the public schema. The current search path doesn't propagate to the table or state enum in the estimate function. This explicitly uses the prefix internally to ensure the correct table and state are queried.

[Reporter] Use the correct prefixed count estimate function.
A function was created in the correct prefix, but the query didn't automatically select that version of the function. This adds an explicit reference to the configured prefixed function.

[Examiner] Add catch-all handle_info clause to prevent crashing with unexpected notifications.
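A minimal sketch of the catch-all pattern, not the actual Examiner code. The module name `ExampleExaminer` and the `{:notification, channel, payload}` message shape are assumptions made for illustration.

```elixir
defmodule ExampleExaminer do
  # Hypothetical GenServer used only to illustrate a catch-all clause.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{seen: 0}, opts)

  @impl GenServer
  def init(state), do: {:ok, state}

  @impl GenServer
  def handle_call(:seen, _from, state), do: {:reply, state.seen, state}

  # An expected message shape (assumed for this example).
  @impl GenServer
  def handle_info({:notification, _channel, _payload}, state) do
    {:noreply, %{state | seen: state.seen + 1}}
  end

  # Catch-all: ignore anything else instead of raising a FunctionClauseError.
  def handle_info(_message, state), do: {:noreply, state}
end
```

Without the final clause, any message that doesn't match the first `handle_info/2` head would crash the process.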

[Recorder] Explicitly trigger garbage collection after compaction.
Hibernating doesn't guarantee garbage collection in a highly active system. Now the Recorder process triggers garbage collection after it compacts stored metrics to ensure retained binaries are released.
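A rough sketch of collecting explicitly after a compaction step. The module `GcAfterCompact` and its "compaction" (summing binary sizes and dropping the binaries) are stand-ins invented for this example; only `:erlang.garbage_collect/0` is a real call.

```elixir
defmodule GcAfterCompact do
  # Hypothetical module; the real Recorder stores metrics differently.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)

  @impl GenServer
  def init(values), do: {:ok, values}

  @impl GenServer
  def handle_cast({:store, binary}, values), do: {:noreply, [binary | values]}

  @impl GenServer
  def handle_call(:compact, _from, values) do
    # Stand-in for compaction: keep only the total size, dropping the binaries.
    total = values |> Enum.map(&byte_size/1) |> Enum.sum()

    # Request an immediate collection of this process's heap rather than
    # waiting for the runtime to schedule one.
    :erlang.garbage_collect()

    {:reply, total, []}
  end
end
```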

[Met] Add Met.crontab/1 for distributed crontab tracking and aggregation.
The new Met.crontab/1 function exposes tracked, merged, and normalized crontabs from all connected nodes in a cluster.

[Examiner] Broadcast queue and cron gossip immediately after init.
Queue and cron data wasn't broadcast until after the first interval. This was more obvious for cron because it only broadcasts every 15s, but it also benefits queue responsiveness.

[Recorder] Prevent excessive message queue depth in OTP 26.2.5.
A bug fix in OTP 26.2.5 made the performance of :ets.select/3 with a map in the key much slower, effectively linear with the number of objects in the table. That performance degradation was enough to bottleneck the Oban.Met.Recorder message queue, which bloats the process and may cause OOM errors for active systems.
This restores, and even improves, the performance of :ets.select/3 for recorder operations by moving label maps out of the recorded object key.

[Reporter] Increase count frequency by tweaking backoff times.
The leader node now counts larger queues and states more frequently than before by using a smaller exponential backoff.
- The minimum value for clamping is now 10k, 10x larger than it was before.
- The exponential factor is now 2 instead of 3, with a maximum of 128s between counts.
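
To see what a factor of 2 with a 128s cap means in practice, here is a small sketch. The `BackoffSketch` module and its `delay/1` helper are hypothetical, and how the reporter maps a count's size onto a backoff step is not described here, so the step is simply taken as an input.

```elixir
defmodule BackoffSketch do
  # Values taken from the entry above.
  @factor 2
  @max_seconds 128

  # Hypothetical helper: delay in seconds after `step` consecutive backoffs.
  def delay(step) when is_integer(step) and step >= 0 do
    min(Integer.pow(@factor, step), @max_seconds)
  end
end

Enum.map(0..8, &BackoffSketch.delay/1)
# => [1, 2, 4, 8, 16, 32, 64, 128, 128]
```

With the previous factor of 3, the same sequence would grow as 1, 3, 9, 27, 81, and so on, which reaches long delays in fewer steps.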

[Met] Add tests and documentation on running Oban.Met as an Oban plugin.
It's possible, even encouraged, to start Oban.Met as an Oban plugin in order to avoid auto-start race conditions.
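A configuration sketch of the plugin approach. The application name `:my_app` and the repo `MyApp.Repo` are placeholders for this example, and any other Oban options are omitted.

```elixir
# config/config.exs
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  # Listing Oban.Met as a plugin starts it under the Oban supervision tree.
  plugins: [Oban.Met]
```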

[Met] Monitor Oban instances and synchronize shutdown.
Auto-started Met instances may outlive the Oban instance they're linked to, which causes a variety of registry errors when the original process has shut down. To prevent that, a separate process now monitors the linked Oban supervisor process and coordinates shutting down the Met supervisor.

[Met] Start Met on boot for running Oban instances
It's common for oban and oban_met to start in separate applications under an umbrella. When oban_met starts after oban, Met misses the telemetry event and can't start a Met supervisor.
This adds a task on boot that starts a Met instance for any running Oban instances.

[Recorder] Hibernate recorder process after compact cycle
The Recorder process "touches" large batches of JSON received from Reporter processes, but it doesn't operate on the data often enough to trigger a full GC. The full mechanics are explained in a post on the Elixir Forum.
Now, the Recorder hibernates after compacting to trigger a fullsweep garbage collection.
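For illustration, a GenServer can hibernate by adding `:hibernate` to a callback's return tuple. The module below is a hypothetical sketch: its name, messages, and "keep the latest batch" compaction are assumptions, not the Recorder's implementation.

```elixir
defmodule HibernateSketch do
  # Hypothetical GenServer; names and the compaction step are assumptions.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, [], opts)

  @impl GenServer
  def init(batches), do: {:ok, batches}

  @impl GenServer
  def handle_cast({:record, batch}, batches), do: {:noreply, [batch | batches]}

  @impl GenServer
  def handle_call(:count, _from, batches), do: {:reply, length(batches), batches}

  @impl GenServer
  def handle_info(:compact, batches) do
    # Stand-in for compaction: keep only the most recent batch.
    compacted = Enum.take(batches, 1)

    # The :hibernate flag makes the process hibernate until its next message,
    # which garbage collects and shrinks its heap.
    {:noreply, compacted, :hibernate}
  end
end
```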

[Recorder] Differentiate max/sum/pct operations for timeslice
Some gauges should be displayed as a sum (exec count) while others should be a maximum (full count). Now timeslicing can differentiate between the two, and value types gained sum/1 and union/2 functions to make it possible.

[Reporter] Reset reported counts whenever they're checkable
In situations where there wasn't anything new to count, e.g. an empty queue or state, the old checks lingered until there was something to count again. Now any checkable counts are reset to an empty state before storage to ensure we reset back to 0 without new data.