Anthropic Message Batches in Rails: Cut Claude API Costs 50% with Async Batch Processing
A founder I work with runs a content classification pipeline that pushes about forty thousand documents a day through Claude. They were paying just under twelve thousand dollars a week for it. The CFO wanted to know if there was a cheaper way before he renewed the budget. There was: Anthropic Message Batches. We moved the pipeline over a Friday afternoon, the next week’s bill was just under six thousand, and the only behavioural change anyone noticed was that results landed twenty minutes after upload instead of two seconds after each request.
After nineteen years of Rails I have built a lot of “send this to a third party API in the background and store the result” pipelines, and Anthropic Message Batches is the cleanest version of that pattern I have seen for LLM workloads. If you are running any kind of bulk classification, summarisation, extraction, or evaluation against Claude, and you do not need a synchronous answer, the Anthropic Message Batches API is the lever to pull. This is the production playbook.
What Anthropic Message Batches Actually Are
The Claude API has two delivery modes. The synchronous Messages API is what most Rails apps start with — you POST /v1/messages, you get a response in a few seconds, you write it to the database. That is fine for chat, low-volume agents, and anything user-facing where a human is waiting.
Anthropic Message Batches is the async cousin. You submit a batch of up to ten thousand requests or 256 MB of payload in a single call. Anthropic acknowledges the batch, processes it within twenty-four hours (usually much faster, minutes for small batches), and then exposes a results endpoint with one JSONL line per request. Every call inside the batch costs fifty percent of the synchronous price, including cached tokens. The 50% discount stacks with prompt caching, so a batched call hitting a warm cache costs five percent of the synchronous, uncached baseline.
There are three failure modes you need to design around: individual requests inside a batch can fail while the batch as a whole succeeds, the batch can be cancelled, and the batch can expire if Anthropic cannot finish it inside the window. None of these are exotic — they are the same operational concerns as any other async pipeline — but Rails apps usually start with synchronous calls, and the mental model has to shift.
The Anthropic Ruby SDK exposes batches as client.messages.batches. You create a batch, you poll for its status, you stream results when it finishes:
client = Anthropic::Client.new

batch = client.messages.batches.create(
  requests: [
    {
      custom_id: "doc-1",
      params: {
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages: [{ role: "user", content: "Classify: ..." }]
      }
    }
  ]
)

batch.id                # => "msgbatch_01..."
batch.processing_status # => "in_progress"
The custom_id is the only thing you control inside the batch envelope, and it is the single most important thing in the design. It is how you reconcile results with the originating Rails records.
When Anthropic Message Batches Pay Off
The math on Anthropic Message Batches is honestly easier than prompt caching. Half price, no break-even, no warm-up. The only question is whether your workload tolerates async delivery.
These are the Rails workloads where it is a clear win. Nightly enrichment of records that came in during the day — classify, tag, embed metadata, or summarise. Re-processing historical data after a prompt change. Bulk evaluations and red-team runs against a model release. Generating alt text, descriptions, or SEO blurbs for a content library. Anything that ends with “…and then stash the result on the record.”
The places it does not fit are the obvious ones. User-facing chat where a human is waiting. Tool-using agents that need to react to model output and decide the next call. Streaming responses. Workloads where the upstream system needs the answer to make a synchronous decision. For these, stay on the regular Messages API and lean on prompt caching for cost.
Where it gets interesting is the middle ground. A “submit a job, get an email when it is done” workflow inside a SaaS app maps perfectly onto Anthropic Message Batches. So does a “process this CSV of leads through Claude and write back to HubSpot” import. If the user can wait minutes, you should be batching.
Building the Anthropic Message Batches Pipeline in Rails
Here is the production shape I keep returning to. One Rails model for the batch envelope, one for each individual request, a Solid Queue job for submission, a polling job for status, and an idempotent result handler. Nothing exotic.
# db/migrate/20260430000001_create_claude_batches.rb
class CreateClaudeBatches < ActiveRecord::Migration[8.0]
  def change
    create_table :claude_batches do |t|
      t.string :anthropic_id, index: { unique: true }
      t.string :status, null: false, default: "pending"
      t.integer :request_count, null: false, default: 0
      t.integer :succeeded_count, null: false, default: 0
      t.integer :errored_count, null: false, default: 0
      t.datetime :submitted_at
      t.datetime :ended_at
      t.timestamps
    end

    create_table :claude_batch_requests do |t|
      t.references :claude_batch, null: false, foreign_key: true
      t.references :subject, polymorphic: true, null: false
      t.string :custom_id, null: false
      t.jsonb :params, null: false, default: {}
      t.string :result_status
      t.jsonb :result_payload
      t.timestamps

      t.index [:claude_batch_id, :custom_id], unique: true
    end
  end
end
The polymorphic subject is the Rails record the request is about — a Document, a Lead, a Product, whatever. The custom_id is what we send to Anthropic, and we make it deterministic so retries are safe.
class ClaudeBatchRequest < ApplicationRecord
  belongs_to :claude_batch
  belongs_to :subject, polymorphic: true

  before_validation :assign_custom_id, on: :create

  private

  # Deterministic: the same subject always produces the same custom_id,
  # so a rebuilt or retried batch reconciles cleanly. Uniqueness is only
  # enforced per batch via the [claude_batch_id, custom_id] index.
  def assign_custom_id
    self.custom_id ||= "#{subject_type.underscore}-#{subject_id}"
  end
end
The submission job builds the JSONL payload, ships it, and stores the Anthropic batch id. I keep the body construction in a plain Ruby object rather than the job itself — easier to test, easier to swap models later.
class ClaudeBatchSubmitter
  def initialize(claude_batch)
    @claude_batch = claude_batch
    @client = Anthropic::Client.new
  end

  def call
    requests = @claude_batch.claude_batch_requests.map do |req|
      { custom_id: req.custom_id, params: req.params }
    end

    response = @client.messages.batches.create(requests: requests)

    @claude_batch.update!(
      anthropic_id: response.id,
      status: "in_progress",
      request_count: requests.size,
      submitted_at: Time.current
    )

    ClaudeBatchPollJob.set(wait: 30.seconds).perform_later(@claude_batch.id)
  end
end
Polling is the part everyone wants to over-engineer. The Anthropic API does not push webhooks for batches, so you have to poll. Solid Queue makes this cheap because re-enqueuing a job with wait: is a single insert into the database. I poll every thirty seconds for the first five minutes, then back off to a minute, then to five.
class ClaudeBatchPollJob < ApplicationJob
  queue_as :claude_batches

  def perform(claude_batch_id)
    batch = ClaudeBatch.find(claude_batch_id)
    return if batch.status == "ended"

    response = Anthropic::Client.new.messages.batches.retrieve(batch.anthropic_id)

    case response.processing_status
    when "in_progress"
      reschedule(batch)
    when "ended"
      ClaudeBatchResultIngestJob.perform_later(batch.id)
    when "canceling", "canceled", "expired"
      batch.update!(status: response.processing_status, ended_at: Time.current)
    end
  end

  private

  def reschedule(batch)
    age = Time.current - batch.submitted_at
    delay = case age
            when 0..5.minutes then 30.seconds
            when 5.minutes..30.minutes then 1.minute
            else 5.minutes
            end

    self.class.set(wait: delay).perform_later(batch.id)
  end
end
Result ingestion is where idempotency matters. Anthropic exposes results as a streaming JSONL endpoint. You read line by line, look up the request by custom_id, and write the outcome. If the job dies halfway through and re-runs, the unique index on [claude_batch_id, custom_id] plus an if request.result_status.nil? guard keeps you safe.
class ClaudeBatchResultIngestJob < ApplicationJob
  queue_as :claude_batches

  def perform(claude_batch_id)
    batch = ClaudeBatch.find(claude_batch_id)
    client = Anthropic::Client.new

    client.messages.batches.results(batch.anthropic_id).each do |entry|
      ingest_one(batch, entry)
    end

    batch.update!(
      status: "ended",
      ended_at: Time.current,
      succeeded_count: batch.claude_batch_requests.where(result_status: "succeeded").count,
      errored_count: batch.claude_batch_requests.where.not(result_status: "succeeded").count
    )
  end

  private

  def ingest_one(batch, entry)
    request = batch.claude_batch_requests.find_by(custom_id: entry.custom_id)
    return unless request
    return if request.result_status.present?

    request.update!(
      result_status: entry.result.type,
      result_payload: entry.result.to_h
    )

    ClaudeBatchRequestProcessor.new(request).call if entry.result.type == "succeeded"
  end
end
The downstream processor is application-specific — write the classification to the Document, attach the embedding, send the notification. Keep it boring. The whole win of Anthropic Message Batches is moving cost out of the synchronous path; do not give that win back by making the result handler complicated.
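For the classification case, the processor really can be a few lines. This is a minimal sketch rather than a prescription: the classification column on the subject and the exact nesting of result_payload (the stored hash of a succeeded result, with the Messages API response under its message key) are assumptions you would adjust to your own schema.
class ClaudeBatchRequestProcessor
  def initialize(claude_batch_request)
    @request = claude_batch_request
  end

  def call
    # Assumes result_payload holds the succeeded result hash, with the
    # model's reply under "message" -> "content" -> first text block.
    text = @request.result_payload.dig("message", "content", 0, "text")
    return if text.blank?

    # Application-specific from here: this assumes a hypothetical
    # `classification` column on the subject record (a Document, say).
    @request.subject.update!(classification: text.strip)
  end
end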
Anthropic Message Batches and Prompt Caching Together
This is the part most teams miss. Anthropic Message Batches pricing stacks with prompt caching. Cached input tokens inside a batch are billed at 0.05x the synchronous uncached rate: the batch discount halves the cache-read price, which is already a tenth of base input. If your batch shares a system prompt or a large preamble across thousands of requests, structure it so the prefix is identical and add cache_control: { type: "ephemeral" } on the last shared block.
shared_system = [
  { type: "text", text: long_system_prompt,
    cache_control: { type: "ephemeral" } }
]

requests = documents.map do |doc|
  {
    custom_id: "doc-#{doc.id}",
    params: {
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      system: shared_system,
      messages: [{ role: "user", content: "Classify: #{doc.body}" }]
    }
  }
end
The order matters. Anthropic hashes the prefix up to each cache marker, so anything that varies between requests has to come after the marker. If you put the document body inside the cached system, you have just turned every request into a cache miss and burned the discount.
For a forty-thousand-document daily run with an eight-thousand-token shared system prompt, the difference between cached batches and uncached batches is roughly an order of magnitude on the input bill. I covered the cache mechanics in detail in Anthropic Prompt Caching in Rails — the same patterns apply inside batches with the additional 50% discount on top.
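To make that concrete, here is a back-of-envelope sketch for that daily run. The numbers are assumptions for illustration (a Sonnet-class base input price of $3 per million tokens and roughly 500 tokens of document body per request); check your own model's price sheet before trusting the output.
# Back-of-envelope input cost for the 40,000-document nightly run.
# Assumed numbers: $3 per million input tokens, 8,000-token shared
# system prompt, ~500 tokens of document body per request.
base_per_mtok = 3.0
docs          = 40_000
prefix_tok    = 8_000
body_tok      = 500

uncached_batched = docs * (prefix_tok + body_tok) / 1_000_000.0 * base_per_mtok * 0.5
cached_batched   = docs * prefix_tok / 1_000_000.0 * base_per_mtok * 0.05 +
                   docs * body_tok   / 1_000_000.0 * base_per_mtok * 0.5

uncached_batched.round # => 510 (dollars of input per day, batch discount only)
cached_batched.round   # => 78  (dollars per day with the cached shared prefix)
# Roughly 6-7x here; the shorter the per-document suffix, the closer you
# get to the full 10x saving on the shared prefix itself.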
Operational Gotchas with Anthropic Message Batches
Five things have bitten me in production. None of them are subtle once you know to look for them.
Batch size limits matter. You cannot stuff a hundred thousand requests into one batch. The cap is ten thousand requests or 256 MB. Above that you have to split, and the splitting has to be deterministic so retries do not double-process. I shard by subject_id % batch_count and store the shard on the ClaudeBatch row.
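A minimal version of that split, assuming a Document relation, a hypothetical params_for(doc) helper, and a shard integer column on claude_batches (not shown in the migration above), might look like this:
MAX_PER_BATCH = 10_000
batch_count   = (documents.count / MAX_PER_BATCH.to_f).ceil

batch_count.times do |shard|
  # Deterministic: the same document always falls in the same shard, so a
  # re-run of the splitter never assigns a record to two different batches.
  batch = ClaudeBatch.create!(status: "pending", shard: shard)

  documents.where("id % ? = ?", batch_count, shard).find_each do |doc|
    batch.claude_batch_requests.create!(subject: doc, params: params_for(doc))
  end

  ClaudeBatchSubmitter.new(batch).call
end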
Rate limits on batch creation are separate from synchronous rate limits. You can have plenty of synchronous tokens left and still get throttled when submitting batches. Wrap the create call in retry-with-backoff and treat 429s as a signal to slow down submission, not to fail the job.
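A sketch of that wrapper. The rescue targets the SDK's rate-limit error; the exact class name used here (Anthropic::Errors::RateLimitError) is an assumption to verify against the SDK version you are running.
def create_batch_with_backoff(client, requests, attempts: 5)
  attempt = 0
  begin
    client.messages.batches.create(requests: requests)
  rescue Anthropic::Errors::RateLimitError
    attempt += 1
    raise if attempt >= attempts

    # Exponential backoff with a little jitter so parallel submitters
    # do not all retry in lockstep: roughly 2s, 4s, 8s, 16s.
    sleep((2**attempt) + rand)
    retry
  end
end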
Individual request errors do not fail the batch. A batch can land with five thousand succeeded and five thousand errored and Anthropic will report the batch itself as ended cleanly. You have to inspect each result. The most common per-request error I see is overloaded_error — usually safe to re-batch the failures with a delay.
The 24-hour expiry is a real expiry. If Anthropic does not finish your batch inside the window, the requests that did not complete come back as expired and you do not get charged for them. But you also do not get a result. Always plan for partial completion and have a re-submission path.
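One shape for that re-submission path, built on the tables above. Treating expired requests plus overloaded errors as retryable is my reading of the failure types, not something the API enforces, and the dig path into the stored error payload is an assumption about how your ingested result hash is nested.
class ClaudeBatchResubmitter
  def initialize(claude_batch)
    @claude_batch = claude_batch
  end

  def call
    retryable = @claude_batch.claude_batch_requests.select do |req|
      req.result_status == "expired" || overloaded?(req)
    end
    return if retryable.empty?

    # Fresh envelope, same subjects and params. custom_ids are deterministic
    # per subject, so downstream reconciliation works exactly as before.
    retry_batch = ClaudeBatch.create!(status: "pending")
    retryable.each do |req|
      retry_batch.claude_batch_requests.create!(subject: req.subject, params: req.params)
    end

    # In practice, enqueue this with a delay rather than submitting immediately.
    ClaudeBatchSubmitter.new(retry_batch).call
  end

  private

  def overloaded?(req)
    # Adjust the dig path to however your stored payload nests the error type.
    req.result_status == "errored" &&
      req.result_payload.dig("error", "error", "type") == "overloaded_error"
  end
end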
Cost reporting lags. The Anthropic dashboard’s per-batch cost takes longer to appear than synchronous spend. Do not size your savings off the live dashboard the day you ship — wait a week, look at the invoice.
Frequently Asked Questions
How much does the Anthropic Message Batches API actually save versus synchronous Claude API calls?
Anthropic Message Batches are billed at fifty percent of the synchronous Messages API rate for both input and output tokens. The discount applies to every model and stacks with prompt caching, so a batched request hitting a warm cache costs roughly five percent of the synchronous, uncached baseline. For workloads that already use prompt caching, batching is the single biggest remaining cost lever.
What is the maximum batch size for Anthropic Message Batches?
A single Anthropic Message Batches submission is capped at ten thousand requests or 256 MB of payload, whichever you hit first. Above that you have to split into multiple batches. The processing window is up to twenty-four hours from creation, though small batches typically complete in minutes.
How do I handle individual request failures inside an Anthropic Message Batches result set?
Inspect the result.type of each entry in the streamed JSONL response. Possible values are succeeded, errored, canceled, and expired. The batch itself is marked ended even when individual requests fail, so you must iterate every line and decide per-request whether to re-submit. The most common transient failure is overloaded_error, which is safe to re-batch with a backoff.
Can I use Anthropic Message Batches with prompt caching and the Anthropic Ruby SDK at the same time?
Yes. Add cache_control: { type: "ephemeral" } on the shared prefix of each request inside the batch, exactly as you would for synchronous calls. Cached input tokens inside a batch are billed at 0.05x the synchronous uncached rate. The Anthropic Ruby SDK passes the cache control field through unchanged, so the pattern is identical to non-batched code.
When should I not use Anthropic Message Batches?
Anything user-facing where a human is waiting on the response, any agentic loop where the next call depends on the model output, and any streaming workload. Stay on the synchronous Messages API for those and use prompt caching for cost. Batching is for “submit ten thousand jobs, come back later, write the results to the database” work — not interactive traffic.
Need help cutting LLM costs in production Rails? TTB Software specializes in reliable AI integrations for Rails apps — we build the batch pipelines, caching, and async infrastructure that makes Claude affordable at scale. We have been doing Rails for nineteen years.
About the Author
Roger Heykoop is a senior Ruby on Rails developer with 19+ years of Rails experience and 35+ years in software development. He specializes in Rails modernization, performance optimization, and AI-assisted development.