Oban Pro v0.14 Released

Most Oban Pro releases don't get the "blog post" treatment, so why is this release different from all other releases?

First, it contains some mega-features and performance optimizations that shouldn't languish in a changelog (or the snazzy releases section).

Second, this is the final Pro release before the big v1.0! Yes, Pro was pre-1.0, but only in name. We've always treated Pro like a 1.0+ and avoided breaking changes. Now it's time to toss out some lingering deprecations and proudly signal that Pro is mature and stable.

Now, on to the features, starting with one that's been distilling for nearly three years.

⚖️ Horizontal Auto-Scaling Worker Nodes

The new DynamicScaler plugin monitors queue throughput and horizontally scales worker nodes to optimize processing. Horizontal scaling is applied at the node level, not the queue level, so you can distribute processing over more physical hardware. With auto-scaling, you can spin up additional nodes during load spikes, and pare down to a single node during a lull.

The DynamicScaler calculates an optimal scale by predicting the future size of a queue based on throughput per node, not simply available jobs. You provide an acceptable range of node sizes, which queues to track, and a cloud module—auto-scaling takes care of the rest (with observability, of course).

It's designed to integrate with popular cloud infrastructure like AWS, GCP, K8S, and Fly via a simple, flexible behaviour. For example, here we declare auto-scaling rules for two distinct queues and hypothetical AWS autoscaling groups:

{Oban.Pro.Plugins.DynamicScaler,
 scalers: [
   [queues: :reports, range: 0..1, cloud: {MyApp.Cloud, asg: "rep-asg"}],
   [queues: :exports, range: 1..4, cloud: {MyApp.Cloud, asg: "exp-asg"}]
 ]}

With that in place, the reports group can scale down to 0 when the queue is empty and up to 1 when there are jobs available. Meanwhile, exports will never scale below 1 node but can surge up to 4 nodes during a spike.

Beyond filters for scaling by a particular queue, there are scaling steps to optimize responsiveness, and tunable cooldown periods to prevent unnecessary scaling.

💸 Auto-Scaling Can Save Real Money

DynamicScaling based on queue activity is especially exciting because it can peel dollars off your cloud hosting bill. Depending on your scale, auto-scaling can pay for your Pro license by itself (and then some). This table explores the dollars saved by scaling down 16 hours a day, 30 days a month, during off-peak hours across hefty instances on popular clouds:

Cloud	Instance	1 Node Savings	5 Node Savings	10 Node Savings
aws	c7g.2xlarge	$138.72	$693.60	$1387.20
fly	dedicated-cpu-8x	$165.37	$826.85	$1653.70
gcp	c3-standard-8	$200.52	$1002.62	$2005.25

ℹ️ Based on prorated on-demand prices captured in April, 2023

Imagine the potential savings from only scaling up production workers to meet demand, or spinning staging instances over night!

🏗️ Args Schema for Structured Workers

Structured args are indispensable for enforcing an args schema. However, the legacy keyword list syntax with field: :type, implicit enums, and an asterisk symbol for required fields was simply awkward. We're correcting that awkwardness with a new args_schema/1 macro for defining structured workers. The args_schema macro defines a DSL that's a subset of Ecto's schema, optimized for JSON compatibility and without the need for dedicated changeset functions.

Structured args are validated before they're inserted into the database, and cast into structs with defined fields, atom keys, enums, and nested data before processing.

Here's a schema that demonstrates multiple field types, the required option, enums, and embedded structures:

use Oban.Pro.Worker

alias __MODULE__, as: Args

args_schema do
  field :id, :id, required: true
  field :mode, :enum, values: ~w(enabled disabled paused)a

  embeds_one :data do
    field :office_id, :uuid, required: true
    field :addons, {:array, :string}
  end

  embeds_many :address do
    field :street, :string
    field :city, :string
    field :country, :string
  end
end

@impl Oban.Pro.Worker
def process(%{args: %Args{id: id, mode: mode} = args) do
  %Args.Data{office_id: office_id} = args.data
  [%Args.Address{street: street} | _] = args.addresses

  ...
end

The legacy (and legacy-legacy) syntax is still viable and it generates the appropriate field declarations automatically. However, we strongly recommend updating to the new syntax for your own sanity.

🍪 Chunk Partitioning

Previously, chunk workers executed jobs in groups based on size or timeout, with the grouping always consisting of jobs from the same queue, regardless of worker, args, or other job attributes. However, sometimes there's a need for more flexibility in grouping jobs based on different criteria.

To address this, we have introduced partitioning, which allows grouping chunks by worker and/or a subset of args or meta. This improvement enables you to methodically compose chunks of jobs with the same args or meta, instead of running a separate queue for each chunk.

Here's an example that demonstrates using GPT to summarize groups of messages from a particular author:

defmodule MyApp.MessageSummarizer do
  use Oban.Pro.Workers.Chunk,
      by: [:worker, args: :author_id],
      size: 100,
      timeout: :timer.minutes(5)

  @impl true
  def process([%{"author_id" => author_id} | _] = jobs) do
    messages =
      jobs
      |> Enum.map(& &1.args["message_id"])
      |> MyApp.Messages.all()

    {:ok, summary} = MyApp.GPT.summarize(messages)

    # Push the summary
  end
end

By leveraging the enhanced partitioning capabilities, you can now effectively group and process jobs based on specific criteria, providing a more streamlined approach to managing your workloads.

Chunk queries have been optimized for tables of any size to compensate for their newfound advanced complexity. As a result, even with hundreds of thousands of available jobs, queries run in milliseconds.

🗄️ Batch Performance

In short, batches are lazier, their queries are faster, and they'll put less load on your database. If query tuning and performance optimizations are your things, read on for the details!

Some teams run batches containing upwards of 50k and often run multiple batches simultaneously, chewing through millions of jobs a day. That load level exposed some opportunities for performance tuning that we're excited to provide for batches of all sizes.

Debounce batch callback queries so that only one database call is made for each batch within a short window of time. By default, batching debounces for 100ms, but it's configurable at the batch level if you'd prefer to debounce for longer.
Query the exact details needed by a batch's supported callbacks. If a batch only has a cancelled handler, then no other states are checked.
Optimize state-checking queries to avoid using exact counts and ignore completed jobs entirely, as that's typically the state with most jobs.
Use index-powered match queries to find existing callback jobs, again, only for the current batch's supported callbacks.

💛 That's a Wrap

All of the features in this bundle came directly from customer requests. Thanks for helping to refine Pro with your feedback and issues.

See the full Pro Changelog for a complete list of enhancements and bug fixes.

As usual, if you have any questions or comments, ask in the Elixir Forum. For future announcements and insight into what we're working on next, subscribe to our newsletter.