Rails AI Agents: Multi-Step Autonomous Workflows with Claude and Tool Use

Roger Heykoop
Ruby on Rails, Artificial Intelligence
Rails AI agents that actually work in production — agent loop, tool registry, state machines, retries, cost control and observability with Claude tool use.

A founder hired me last winter to “rescue an AI agent.” The pitch was great: a Rails app that took a customer email, looked up the order in their database, checked the carrier API, drafted a refund or a replacement, and sent the response. The demo worked. Production did not. The thing would loop forever on ambiguous emails, hit the model API 1,000 times before realizing it was confused, and occasionally email a customer four refunds for the same order. The team called it “the agent” and was two months from running out of Anthropic credits when I walked in.

We rebuilt it in three weeks. The new version processes 4,200 emails a day, costs 71% less, and has not double-refunded anyone in six months. The trick was not a smarter prompt — it was treating the Rails AI agent as a state machine with a budget, not a chat loop with hopes. After nineteen years of Rails I have a strong opinion about agents in 2026: they are a software-engineering problem dressed up as a prompt-engineering problem, and Rails is unusually good at the engineering part.

This post is the architecture I use for production Rails AI agents: the agent loop, the tool registry, state persistence, retries, cost control, and the gotchas that bite teams who copy-paste from a blog post and ship.

What a Rails AI Agent Actually Is

A Rails AI agent is a controller-of-its-own-execution that, given a goal, picks tools to call until the goal is met or a budget is exhausted. The model decides the next step; your code executes it; the result feeds back into the next decision. That is it. Everything else — memory, planning, “reflection” — is built on top of this loop.

Three pieces make the difference between a demo and a production system:

  • A bounded loop. Every turn either calls a tool or returns a final answer. Total turns and total tokens are capped, hard. No unbounded recursion, ever.
  • A tool registry with explicit contracts. Each tool is a Ruby class with a JSON schema, a deterministic executor, and a permission check. The model never touches your database directly.
  • Durable state. The agent’s run lives in Postgres, not in process memory. If the server restarts mid-run, the agent picks up where it left off — or it doesn’t, and a human gets paged.

If you have read my earlier post on LLM function calling in Rails, you already have the foundation. A Rails AI agent is function calling in a loop, with state, budgets, and observability layered on top.

The Agent Loop

The core of any Rails AI agent is the loop. Mine fits in about 80 lines of Ruby and has exactly three exit conditions: the model returns a final answer, we hit the turn budget, or a tool raises an unrecoverable error.

# app/services/agent_runner.rb
class AgentRunner
  MAX_TURNS = 12
  MAX_TOKENS = 100_000

  def initialize(run:, tools: ToolRegistry.default)
    @run = run
    @tools = tools
    @client = Anthropic::Client.new(api_key: ENV.fetch("ANTHROPIC_API_KEY"))
  end

  def call
    @run.start!

    MAX_TURNS.times do |turn|
      response = call_model(messages: @run.messages_for_api)
      @run.record_assistant_turn!(response, turn:)

      return finalize!(response) if response.stop_reason == "end_turn"

      tool_results = execute_tools(response.tool_uses)
      @run.record_tool_results!(tool_results)

      raise BudgetExceeded if @run.tokens_used >= MAX_TOKENS
    end

    @run.fail!(reason: "max_turns_exceeded")
  end

  private

  def call_model(messages:)
    @client.messages.create(
      model: "claude-sonnet-4-6",
      max_tokens: 4_096,
      system: AgentPrompt.for(@run),
      tools: @tools.to_anthropic_schema,
      messages: messages
    )
  end

  def execute_tools(tool_uses)
    tool_uses.map do |tu|
      tool = @tools.find!(tu.name, @run)
      tool.authorize!(@run.user)
      result = tool.call(**tu.input.symbolize_keys)
      { type: "tool_result", tool_use_id: tu.id, content: result.to_s, is_error: false }
    rescue ToolError => e
      { type: "tool_result", tool_use_id: tu.id, content: e.message, is_error: true }
    end
  end

  def finalize!(response)
    text = response.content.find { |c| c.type == "text" }&.text
    @run.complete!(final_answer: text)
    text
  end
end

A few things to notice. The loop is bounded by MAX_TURNS, not by “until the model is done.” Token usage is checked every turn against a hard ceiling — Claude’s tool-use output is JSON-heavy and runs hot. Tool errors do not crash the loop; they feed back to the model as is_error: true, which lets the model recover or give up. And the run is persisted on every turn, so a process restart does not lose work.

This is the boring core. The interesting work happens in the tool registry and the state model.

The Tool Registry

The single biggest mistake I see in Rails AI agent code is letting the model talk to Active Record directly. “Just give it a find_user tool that takes any SQL.” That’s not an agent, that’s a SQL-injection vector with a budget.

Every tool I ship has four properties: a stable name, a JSON schema for inputs, a permission check tied to the agent’s user, and an executor that returns a small structured result. The registry is a plain Ruby class:

# app/agents/tools/base.rb
module Tools
  class Base
    class_attribute :tool_name, :description, :input_schema

    def initialize(run)
      @run = run
    end

    def authorize!(user)
      raise UnauthorizedTool unless permitted_for?(user)
    end

    def permitted_for?(user) = true

    def call(**) = raise NotImplementedError

    def self.to_anthropic_schema
      {
        name: tool_name,
        description: description,
        input_schema: input_schema
      }
    end
  end
end

A concrete tool — looking up an order by number for the customer-support agent:

# app/agents/tools/lookup_order.rb
module Tools
  class LookupOrder < Base
    self.tool_name = "lookup_order"
    self.description = "Look up an order by its order number. Returns status, " \
                       "items, total, and the most recent shipping event."
    self.input_schema = {
      type: "object",
      properties: {
        order_number: { type: "string", description: "Order number, e.g. ORD-12345" }
      },
      required: ["order_number"]
    }

    def permitted_for?(user)
      user.role.in?(%w[support admin])
    end

    def call(order_number:)
      order = Order.find_by(number: order_number)
      return { error: "not_found" } if order.nil?
      return { error: "forbidden" } unless order.account_id == @run.account_id

      OrderSummarySerializer.new(order).as_json
    end
  end
end

Three things this gets right. The tool only returns the data the agent needs — the serializer is intentional. The permission check ties the tool to the run’s user, not the model’s whim. And the result is small JSON; you do not feed the agent a 4 KB ActiveRecord blob unless you enjoy paying for tokens.

The registry itself is just a list:

# app/agents/tool_registry.rb
class ToolRegistry
  def self.default
    new([
      Tools::LookupOrder,
      Tools::CheckShipmentStatus,
      Tools::IssueRefund,
      Tools::DraftReply
    ])
  end

  def initialize(tool_classes)
    @tool_classes = tool_classes
  end

  def find!(name, run = nil)
    klass = @tool_classes.find { |c| c.tool_name == name }
    raise UnknownTool, name if klass.nil?
    # Instantiate with the current run so authorize! and account scoping
    # inside the tool have the context they need.
    klass.new(run)
  end

  def to_anthropic_schema
    @tool_classes.map(&:to_anthropic_schema)
  end
end

I namespace tools by domain (Tools::Support::*, Tools::Billing::*) once an app has more than ~15. Below that, flat is fine.

Durable State With Active Record

Demos keep agent state in memory. Production keeps it in Postgres. An AgentRun model is the spine of a Rails AI agent:

# db/migrate/20260506_create_agent_runs.rb
class CreateAgentRuns < ActiveRecord::Migration[8.0]
  def change
    create_table :agent_runs do |t|
      t.references :user, null: false, foreign_key: true
      t.references :account, null: false, foreign_key: true
      t.string :agent_kind, null: false
      t.string :status, null: false, default: "pending"
      t.string :failure_reason
      t.jsonb :input, null: false, default: {}
      t.jsonb :messages, null: false, default: []
      t.text :final_answer
      t.integer :turns_used, null: false, default: 0
      t.integer :tokens_used, null: false, default: 0
      t.integer :cost_cents, null: false, default: 0
      t.timestamps
    end

    add_index :agent_runs, [:status, :agent_kind]
    add_index :agent_runs, :created_at
  end
end

Status uses a state machine. I keep it explicit:

# app/models/agent_run.rb
class AgentRun < ApplicationRecord
  STATUSES = %w[pending running completed failed cancelled].freeze
  validates :status, inclusion: { in: STATUSES }

  belongs_to :user
  belongs_to :account

  def start!
    update!(status: "running")
  end

  def complete!(final_answer:)
    update!(status: "completed", final_answer:)
  end

  def fail!(reason:)
    update!(status: "failed", failure_reason: reason)
  end

  def record_assistant_turn!(response, turn:)
    self.messages = messages + [serialize_assistant(response)]
    self.turns_used = turn + 1
    self.tokens_used += response.usage.input_tokens + response.usage.output_tokens
    self.cost_cents += CostCalculator.cents_for(response.usage)
    save!
  end

  def record_tool_results!(results)
    self.messages = messages + [{ role: "user", content: results }]
    save!
  end

  def messages_for_api
    [{ role: "user", content: input.fetch("prompt") }] + messages
  end

  private

  # Store only what the API needs to replay the conversation.
  # Assumes the SDK's content blocks serialize via to_h; adjust for your client.
  def serialize_assistant(response)
    { role: "assistant", content: response.content.map(&:to_h) }
  end
end
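record_assistant_turn! leans on a CostCalculator the post does not show. A minimal sketch, assuming per-model rates in cents per million tokens; the numbers below are illustrative placeholders, not real pricing:

```ruby
# Hypothetical CostCalculator; the rates are illustrative placeholders.
# Always load current per-model pricing from your provider's price list.
module CostCalculator
  INPUT_CENTS_PER_MTOK  = 300    # illustrative: $3.00 per million input tokens
  OUTPUT_CENTS_PER_MTOK = 1_500  # illustrative: $15.00 per million output tokens

  # usage must respond to input_tokens and output_tokens, like the
  # usage object on an Anthropic SDK response.
  def self.cents_for(usage)
    input_cents  = usage.input_tokens  * INPUT_CENTS_PER_MTOK  / 1_000_000.0
    output_cents = usage.output_tokens * OUTPUT_CENTS_PER_MTOK / 1_000_000.0
    # Round up so budget checks err on the conservative side.
    (input_cents + output_cents).ceil
  end
end
```

Rounding up to whole cents loses a little precision but guarantees the budget check never under-counts.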

Every turn is persisted before the next API call. If the worker crashes between turns, a recovery job picks up running runs older than five minutes and either resumes them or marks them failed. This single decision — Postgres as the agent’s memory — is what makes a Rails AI agent safe to run on production traffic.

I run the agent inside a Solid Queue background job. Inline execution from a controller is fine for prototyping; production agents need retries, queues, and concurrency limits.

Cost Control and Prompt Caching

Agents are expensive. A naive customer-support agent cost us €0.18 per email until we added two things: prompt caching and aggressive system-prompt trimming.

The system prompt for a Rails AI agent is long — it lists every tool, the policy, the guardrails. Caching it means you pay a one-time premium of about 25% to write the cache, then roughly 10% of the normal input price on every read within the five-minute TTL. The savings on a 12-turn run are substantial:

@client.messages.create(
  model: "claude-sonnet-4-6",
  max_tokens: 4_096,
  system: [
    {
      type: "text",
      text: AgentPrompt.policy_block,
      cache_control: { type: "ephemeral" }
    },
    {
      type: "text",
      text: AgentPrompt.tools_block(@tools),
      cache_control: { type: "ephemeral" }
    }
  ],
  tools: @tools.to_anthropic_schema,
  messages: messages
)

I covered the full pattern in my prompt caching guide. For agents specifically, cache the policy and the tool definitions; never cache user input. Combined with smaller tool result payloads, we cut cost per run by 71%.
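For intuition, here is the arithmetic on a 12-turn run with an 8K-token system prompt, assuming a 25% cache-write premium and 10% cache reads (check both multipliers against current Anthropic pricing):

```ruby
# Effective billable system-prompt tokens for a multi-turn agent run.
# write_mult and read_mult reflect the cache-write premium and cache-read
# discount; verify both against your provider's current price list.
def effective_system_tokens(system_tokens:, turns:, write_mult: 1.25, read_mult: 0.10)
  first_turn  = system_tokens * write_mult               # cache write on turn one
  later_turns = (turns - 1) * system_tokens * read_mult  # cheap cache reads after
  first_turn + later_turns
end

# A 12-turn run with an 8K-token system prompt:
#   without caching: 8_000 * 12 = 96_000 full-price tokens
#   with caching:    8_000 * 1.25 + 11 * 800 = 18_800 effective tokens
```

Roughly an 80% reduction on the system-prompt portion of the bill, before any savings from trimming tool results.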

Failure Modes That Kill Agents in Production

These are the failures I see on every Rails AI agent rescue engagement.

Infinite tool loops. The model keeps calling lookup_order with slightly different inputs because the policy is unclear. Fix: hard turn cap, plus a same-tool-same-input deduplication check that returns “you already called this tool with these arguments — try a different approach.”
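That dedupe check is a few lines of Ruby. A sketch with illustrative names; in production the seen-keys set would persist on AgentRun so a restarted run keeps its memory:

```ruby
require "set"
require "json"
require "digest"

# Flags repeat (tool, arguments) pairs within a single agent run.
class ToolCallDeduper
  REPEAT_MESSAGE = "You already called this tool with these exact arguments. " \
                   "Try a different tool or different arguments."

  def initialize
    @seen = Set.new
  end

  # Returns nil for a fresh call, or a canned error string for a repeat
  # that the loop feeds back to the model as a tool result.
  def check(tool_name, input)
    canonical = JSON.generate(input.sort.to_h) # stable argument ordering
    key = Digest::SHA256.hexdigest("#{tool_name}:#{canonical}")
    @seen.add?(key) ? nil : REPEAT_MESSAGE
  end
end
```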

Hallucinated tools. The model invents a tool name. Fix: the registry raises UnknownTool, the loop returns the error to the model, and 95% of the time it recovers. Log every UnknownTool — it usually means your tool descriptions are bad.

Permission drift. The agent has tools its user role should not have. Fix: authorize! on every tool call, scoped to the run’s user. Treat the model as an untrusted client.

Token blowup from tool results. A tool returns 200 KB of JSON; the next turn explodes. Fix: every tool returns a result smaller than 4 KB, paginated if necessary. If a tool needs to return more, save the data, return a reference token, and add a read_data(token) tool.
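A sketch of that reference-token pattern; an in-memory hash stands in for Postgres or S3 here, and read backs the hypothetical read_data tool by paging through the stored payload:

```ruby
require "json"

# Parks oversized tool results and hands the model a small reference
# instead. Sketch only: @blobs would be a table or object store in reality.
class ResultStore
  MAX_INLINE_BYTES = 4_096

  def initialize
    @blobs = {}
  end

  # Small results pass through untouched; large ones are stored and
  # replaced by a reference the model can feed to a read_data tool.
  def store_or_inline(result_json)
    return result_json if result_json.bytesize <= MAX_INLINE_BYTES

    token = "blob_#{@blobs.size + 1}"
    @blobs[token] = result_json
    JSON.generate(
      reference: token,
      bytes: result_json.bytesize,
      hint: "Call read_data with this reference to page through the payload."
    )
  end

  # Returns one page of a stored blob.
  def read(token, offset: 0, limit: 2_000)
    @blobs.fetch(token)[offset, limit]
  end
end
```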

Silent overruns. The agent finishes “successfully” but blew past the budget on the way. Fix: cost and tokens are first-class fields on AgentRun, alerted on at the 80th percentile, dashboards in Grafana. You will not control what you do not measure.

Restart-induced double-actions. A tool call sent an email, the worker crashed, the run resumes and calls it again. Fix: side-effect tools are idempotent — keyed by agent_run_id and a stable operation key. Same as webhook idempotency, exactly.
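One way to derive that operation key. The helper name is hypothetical; the real tool would hand the key to the payment provider, or check for an existing record with the same key before acting:

```ruby
require "digest"

# Builds a stable idempotency key from the run id, the operation name,
# and the canonicalized arguments. Re-running the same tool call in the
# same run produces the same key, so the side effect can be skipped.
def idempotency_key(agent_run_id:, operation:, **args)
  canonical = args.sort.map { |k, v| "#{k}=#{v}" }.join("&")
  Digest::SHA256.hexdigest("#{agent_run_id}:#{operation}:#{canonical}")
end
```

Sorting the arguments first means the key is insensitive to the order the model happens to emit them in.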

When to Build a Rails AI Agent — and When Not To

Agents are not the answer to every AI feature. The honest test I run with clients: if a deterministic state machine, a single prompt, or a RAG lookup can solve the problem, do not build an agent. Agents are the right tool when the path branches based on what you find — when step three depends on what step two returned, and you cannot enumerate the branches up front.

Examples where agents earn their cost:

  • Customer support triage that decides between refund, replacement, escalation, or human handoff.
  • Research workflows that combine web search, database lookups, and summarization in unpredictable orders.
  • Code review or QA bots that read files, run checks, and decide what to flag.
  • Sales-ops automations that look up an account, check open opportunities, and draft a follow-up.

Examples where I push back:

  • “Generate a description from these fields.” That’s a single prompt.
  • “Find related documents.” That’s RAG, not an agent.
  • “Process 50,000 records overnight.” That’s Anthropic’s batch API, not a real-time agent.
  • Anything where the cost of a wrong action is high and the model has no human in the loop. Agents that can issue refunds need explicit human approval gates above a threshold. Always.
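The approval gate itself is a threshold check the side-effect tool runs before acting. A sketch with an illustrative threshold; the real version would create an approval record and pause the run rather than just report back:

```ruby
# Human-approval gate for high-stakes tool calls (threshold illustrative).
class ApprovalGate
  THRESHOLD_CENTS = 5_000 # refunds of 50.00 or more need a human

  # Returns nil when the action may proceed autonomously, or a structured
  # "pending approval" result the loop hands back to the model.
  def self.check(amount_cents:)
    return nil if amount_cents < THRESHOLD_CENTS

    {
      status: "pending_approval",
      message: "Refund exceeds the auto-approval threshold. " \
               "A human must approve before it is issued."
    }
  end
end
```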

Production Setup I Recommend

A Rails AI agent I would put on real traffic looks like this:

  • A Solid Queue worker dedicated to agent runs, concurrency capped at 5 per user and 50 globally.
  • A cancel! endpoint that flips the run status and short-circuits the next turn check.
  • Prompt caching on the system and tools blocks.
  • Structured logging on every tool call.
  • A Grafana dashboard with turns-per-run, tokens-per-run, cost-per-run, and failure-reason breakdown.

I monitor three metrics obsessively: p95 turns-per-run (should trend down as prompts mature), p95 cost-per-run (should be flat), and tool error rate (should be under 5%). When any of those drifts, I pull up the last hundred runs and read them. Agent debugging is reading transcripts; tooling does not replace that yet.

FAQ

What’s the difference between a Rails AI agent and just calling an LLM with tools?

An agent runs the tool-use loop autonomously across multiple turns until a goal is achieved or a budget is exhausted. A single LLM-with-tools call returns one tool selection or one answer. Agents add the loop, the state persistence, the budget controls, and the recovery logic. If you only need one tool call, you do not need an agent — you need function calling.

Should I use LangChain or build my own Rails AI agent framework?

For Rails apps, build your own — it is 200 lines of Ruby and you control every primitive. LangChain solves Python-ecosystem problems Ruby does not have, and it adds abstractions that make production debugging harder. The Anthropic and OpenAI Ruby SDKs give you everything you need; the agent loop is genuinely a small piece of code.

How do I keep a Rails AI agent from running up a huge bill?

Hard caps on turns, hard caps on tokens, prompt caching on the system prompt, and per-user concurrency limits. Track cost_cents on every run and alert when the daily spend per agent kind exceeds a threshold. I have never seen a runaway bill from an agent that had a MAX_TOKENS check on every iteration of the loop.
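The daily-spend alert reduces to a group-and-sum. A sketch over plain hashes with illustrative thresholds; the production version would run the same aggregation over an AgentRun scope for today's runs:

```ruby
# Per-kind daily budgets in cents (illustrative numbers).
DAILY_BUDGET_CENTS = {
  "support"  => 50_00,   # EUR 50.00 / day
  "research" => 200_00   # EUR 200.00 / day
}.freeze

# Returns the agent kinds whose summed cost exceeds their daily budget.
# Kinds without a configured budget never alert.
def over_budget_kinds(runs)
  runs.group_by { |r| r[:agent_kind] }
      .select { |kind, rs| rs.sum { |r| r[:cost_cents] } > DAILY_BUDGET_CENTS.fetch(kind, Float::INFINITY) }
      .keys
end
```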

Can I run a Rails AI agent against open-source models instead of Claude?

Yes — the loop is model-agnostic. The pinch point is tool-use reliability. Frontier models (Claude, GPT-4-class) call tools correctly 95-99% of the time; smaller open-source models drop to 70-85% and start hallucinating tool names or arguments. Use open-source models for the cheap parts of the pipeline (extraction, classification) and Claude for the orchestration layer that decides which tool to call.

Building or rescuing a Rails AI agent? TTB Software ships AI features inside Rails apps as a fractional CTO engagement — agents, RAG, tool use, the production hardening that turns demos into systems. Nineteen years of Rails, with the receipts.

#rails-ai-agents #ai-agents-ruby #claude-tool-use #llm-multi-step-workflows #autonomous-agents-rails #ruby-on-rails #rails-8

About the Author

Roger Heykoop is a senior Ruby on Rails developer with 19+ years of Rails experience and 35+ years in software development. He specializes in Rails modernization, performance optimization, and AI-assisted development.
