LLM Function Calling in Rails: Teaching the Model to Use Your App
A client came to me last year with a familiar brief: “We want to add AI to our CRM.” I assumed they wanted a chatbot. What they actually needed was an LLM that could query their customer records, create follow-up tasks, send templated emails, and summarize account history—all through existing Rails models.
The technology that made this possible is called function calling (OpenAI’s term) or tool use (Anthropic’s). The idea is simple: you describe your application’s capabilities to the model in structured JSON, and when the model needs to do something, it tells you which function to call and with what arguments, rather than hallucinating an answer or outputting unstructured text you have to parse.
After building several of these integrations in production Rails apps, I have strong opinions about what works and what doesn’t.
What Function Calling Actually Is
When you call the OpenAI or Anthropic API normally, the model generates text. With tool use, the conversation can pause mid-stream: the model outputs a structured tool call, your code executes something real, you return the result, and the model continues.
Here’s the simplest possible example using the ruby-openai gem:
client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])
tools = [
{
type: "function",
function: {
name: "get_customer",
description: "Look up a customer record by email address",
parameters: {
type: "object",
properties: {
email: {
type: "string",
description: "The customer's email address"
}
},
required: ["email"]
}
}
}
]
response = client.chat(
parameters: {
model: "gpt-4o",
messages: [
{ role: "user", content: "What's the status of roger@example.com's account?" }
],
tools: tools,
tool_choice: "auto"
}
)
If the model decides it needs customer data, response.dig("choices", 0, "message", "tool_calls") will contain a tool call instead of (or alongside) text. Your job is to detect that, dispatch to your actual code, and loop.
The Dispatch Loop
The interesting part is the loop. A single user message can require multiple tool calls—the model checks the customer, then looks up their recent orders, then decides to create a follow-up task. You need to handle all of these.
class AiCrmService
TOOLS = [
{
type: "function",
function: {
name: "get_customer",
description: "Look up a customer by email address. Returns account status, plan, and MRR.",
parameters: {
type: "object",
properties: { email: { type: "string" } },
required: ["email"]
}
}
},
{
type: "function",
function: {
name: "create_task",
description: "Create a follow-up task for a customer. Use when the user asks to schedule or remember something.",
parameters: {
type: "object",
properties: {
customer_id: { type: "integer" },
description: { type: "string" },
due_date: { type: "string", description: "ISO 8601 date, e.g. 2026-04-15" }
},
required: ["customer_id", "description", "due_date"]
}
}
}
].freeze
def initialize(user)
@user = user
@client = OpenAI::Client.new(access_token: ENV["OPENAI_API_KEY"])
@messages = []
end
def run(user_message)
@messages << { role: "user", content: user_message }
loop do
response = @client.chat(
parameters: {
model: "gpt-4o",
messages: @messages,
tools: TOOLS,
tool_choice: "auto"
}
)
assistant_message = response.dig("choices", 0, "message")
@messages << assistant_message
tool_calls = assistant_message["tool_calls"]
break unless tool_calls
tool_calls.each do |tool_call|
result = dispatch(tool_call)
@messages << {
role: "tool",
tool_call_id: tool_call["id"],
content: result.to_json
}
end
end
@messages.last["content"]
end
private
def dispatch(tool_call)
name = tool_call.dig("function", "name")
args = JSON.parse(tool_call.dig("function", "arguments"))
case name
when "get_customer"
customer = @user.accessible_customers.find_by(email: args["email"]&.downcase&.strip)
return { error: "No customer found with that email" } unless customer
customer_as_tool_result(customer)
when "create_task"
customer = @user.accessible_customers.find(args["customer_id"])
task = Task.create!(
customer: customer,
description: args["description"],
due_date: Date.parse(args["due_date"]),
created_by: @user
)
{ task_id: task.id, created: true }
else
{ error: "Unknown tool: #{name}" }
end
end
def customer_as_tool_result(customer)
{
id: customer.id,
name: customer.full_name,
email: customer.email,
plan: customer.plan_name,
mrr: customer.monthly_revenue_cents / 100.0,
status: customer.status,
created_at: customer.created_at.to_date.iso8601,
open_tasks_count: customer.tasks.open.count
}
end
end
This is the core pattern. Everything else is refinement.
Keep Your Tool Descriptions Sharp
The model decides which tools to call based on your descriptions. Vague descriptions lead to wrong choices or the model answering from training data instead of calling your function.
Bad:
{
"name": "get_data",
"description": "Gets data from the system"
}
Good:
{
"name": "get_customer_orders",
"description": "Retrieve recent orders for a customer. Returns order ID, date, total, and status. Use this when asked about purchase history, recent transactions, or order status.",
"parameters": {
"properties": {
"customer_id": {
"type": "integer",
"description": "The internal customer ID (not email address)"
},
"limit": {
"type": "integer",
"description": "Maximum orders to return. Default 10, maximum 50."
}
},
"required": ["customer_id"]
}
}
Specificity in the description saves you from having to compensate in the system prompt. Write descriptions as if you’re documenting a public API—because from the model’s perspective, you are.
Scope What You Return
A common mistake: returning entire ActiveRecord objects or huge result sets as tool output. The model doesn’t need 47 columns and 500 rows. Every byte in the tool result goes back into the context window and costs money.
Be deliberate about what you expose. The customer_as_tool_result method above is intentional—it returns exactly what the model needs to answer questions about a customer, and nothing it shouldn’t see.
This also prevents accidentally leaking sensitive fields. Returning customer.attributes works until it doesn’t.
Error Handling and the Lying Model Problem
The model will sometimes pass arguments that don’t match your data. A customer ID that doesn’t exist. A date in the wrong format. An email address with a typo. Your dispatch method needs to handle this gracefully and return useful error information:
when "get_customer"
return { error: "Email parameter is required" } if args["email"].blank?
email = args["email"].downcase.strip
customer = @user.accessible_customers.find_by(email: email)
unless customer
domain = email.split("@").last
suggestions = @user.accessible_customers
.where("email ILIKE ?", "%#{domain}%")
.limit(3)
.pluck(:email, :full_name)
return {
error: "No customer found with email #{email}",
suggestions: suggestions.map { |e, n| { email: e, name: n } }
}
end
customer_as_tool_result(customer)
When you return an error in the tool result, the model typically tries to recover—it’ll use the suggestions, try a different approach, or ask the user for clarification. This works surprisingly well in practice.
Authorization Is Your Responsibility, Not the Model’s
Your tool dispatch runs with whatever authorization context you give it. The model has no understanding of your permission model. It will happily request data for customers your user shouldn’t see if you let it.
Scope every query to the current user’s accessible records—as the dispatch method above does with @user.accessible_customers. Don’t add an authorization check after the fact. Build it into the query.
This matters more than it sounds. If you’re feeding the model user-generated content—customer notes, support tickets, incoming emails—an adversarial user can write text that instructs the model to take actions it shouldn’t. This is prompt injection, and it’s a real attack vector in production systems.
Log every tool call with the user ID, the tool name, and the arguments. If something goes wrong, you want to know exactly what the model was asked to do and what it did.
Background Processing for Slow Tool Chains
A sequence of five tool calls can take 10-15 seconds. Don’t hold an HTTP request open for that.
Move the work to a background job:
class AiCrmJob < ApplicationJob
queue_as :default
def perform(user_id, message, conversation_id)
user = User.find(user_id)
conversation = Conversation.find(conversation_id)
service = AiCrmService.new(user)
result = service.run(message)
conversation.update!(response: result, status: "complete")
ActionCable.server.broadcast("conversation_#{conversation_id}", {
type: "response",
content: result
})
end
end
The controller enqueues the job and returns immediately with a conversation_id. The client subscribes to the ActionCable channel and receives the result when the job finishes. The model can take as long as it needs without timing out an HTTP connection.
Don’t Load Every Tool for Every Request
Define 40 tools and include them all in every request, and you’re paying tokens for tools the model will never use in that context. You’re also increasing the chance of the wrong tool being selected.
Load only the tools relevant to the current context:
def tools_for_context(context)
base = [LOOKUP_CUSTOMER_TOOL]
case context
when "crm"
base + [CREATE_TASK_TOOL, SEND_EMAIL_TOOL, UPDATE_STATUS_TOOL]
when "reporting"
base + [GENERATE_REPORT_TOOL, EXPORT_CSV_TOOL]
else
base
end
end
The model doesn’t need to know about export_csv when the user is looking at a customer record.
What Anthropic Claude Does Differently
If you’re using Claude—which I prefer for longer conversations with complex tool chains; Claude’s instruction-following in tool use is particularly reliable—the API shape is slightly different:
require "anthropic"
client = Anthropic::Client.new(api_key: ENV["ANTHROPIC_API_KEY"])
response = client.messages(
model: "claude-opus-4-6",
max_tokens: 4096,
tools: [
{
name: "get_customer",
description: "Look up a customer by email address",
input_schema: {
type: "object",
properties: { email: { type: "string" } },
required: ["email"]
}
}
],
messages: @messages
)
# Tool use comes back as a content block with type "tool_use"
tool_use_blocks = response.content.select { |block| block.type == "tool_use" }
tool_use_blocks.each do |block|
result = dispatch_by_name(block.name, block.input)
@messages << {
role: "user",
content: [
{
type: "tool_result",
tool_use_id: block.id,
content: result.to_json
}
]
}
end
The dispatch logic is identical. The API shape differs. The fundamental loop—send messages, check for tool calls, dispatch, return results, continue—is the same across providers.
The Honest Limits
Function calling is powerful. It is not magic.
The model doesn’t understand your domain. It understands your tool descriptions. If your description says create_task creates a task and your code also sends an email, the model has no idea. Keep tools single-responsibility.
The model will sometimes call tools unnecessarily. It will sometimes fail to call tools when it should. Test with real, messy user inputs—not idealized prompts you wrote yourself.
Latency compounds. Every tool call adds a full LLM round-trip. Design your tools so the model can gather what it needs in parallel. Both OpenAI (parallel_tool_calls) and Claude (multiple tool use blocks in one response) support parallel tool execution. Use it.
The client I mentioned at the start shipped in four weeks. The first two weeks were the tool integrations. The last two weeks were hardening: edge cases, authorization, logging, and making sure sales reps couldn’t accidentally delete records by phrasing a question badly.
That ratio tracks.
Frequently Asked Questions
Should I use OpenAI function calling or Anthropic tool use?
Both work well. Claude tends to be more precise with complex multi-step reasoning and less likely to hallucinate tool arguments. GPT-4o is faster and cheaper at scale for simpler flows. For exploratory or long tool chains, I lean toward Claude. For high-volume, well-defined integrations, OpenAI is fine. Use whichever you already have production experience with.
How do I prevent prompt injection through tool results?
Treat tool results the same way you’d treat user input: potentially adversarial. If you’re including customer-provided content (notes, emails, tickets) in tool results or system prompts, instruct the model explicitly that this content may contain instructions that should be ignored. Don’t echo unescaped user input into system prompts.
How many tool calls can happen in one conversation turn?
There’s no API-imposed limit, but context windows are finite. In practice, if a single user request requires more than 8-10 tool calls, your tool design is too granular. Group related reads into a single tool that returns richer data, rather than making the model chain five narrow lookups.
Building an AI-powered feature in Rails and hitting the hard edges? TTB Software has shipped function-calling integrations across CRM, logistics, and analytics domains. Nineteen years of Rails, plus whatever the AI ecosystem throws at us this week.
About the Author
Roger Heykoop is a senior Ruby on Rails developer with 19+ years of Rails experience and 35+ years in software development. He specializes in Rails modernization, performance optimization, and AI-assisted development.
Get in TouchRelated Articles
Need Expert Rails Development?
Let's discuss how we can help you build or modernize your Rails application with 19+ years of expertise
Schedule a Free Consultation