AI Code Review for Rails: Tools and Workflows That Actually Catch Bugs

How to set up AI-assisted code review in Rails projects using GitHub Copilot, Claude, and custom prompts. Real examples of bugs caught, false positives to expect, and CI integration patterns.

AI code review tools catch real bugs in Rails projects — N+1 queries, missing authorization checks, unsafe params usage — but only if you set them up correctly. Most teams bolt on an AI reviewer, get flooded with noise, and turn it off within a week.

This guide covers what works after six months of running AI-assisted review on production Rails 8 codebases: which tools to use, how to configure them, and the specific prompt patterns that produce useful feedback instead of pedantic style complaints.

The Current Tool Landscape

Three categories of AI code review tools exist for Rails projects in 2026:

IDE-integrated reviewers like GitHub Copilot’s code review feature (launched late 2024, now stable) analyze pull requests directly in GitHub. You enable it in repository settings under Code Review > Copilot. It runs automatically on new PRs and posts inline comments.

API-based reviewers where you send diffs to Claude, GPT-4, or similar models through your CI pipeline. This gives you full control over prompts and context but requires more setup.

Dedicated platforms like CodeRabbit, Sourcery, and Codium that wrap AI models with Rails-specific heuristics. These sit between the other two approaches — less customizable than raw API calls, but less work than building your own pipeline.

For Rails projects specifically, the API-based approach with Claude has produced the best results in my experience. Here’s why: Rails conventions matter enormously for catching real issues, and generic AI reviewers miss framework-specific problems like unsafe before_action ordering or missing Strong Parameters on nested attributes.
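To make the ordering problem concrete, here is a plain-Ruby sketch (no Rails required) of why before_action declaration order matters. MiniController is a hypothetical stand-in for ActionController's callback chain; the bug is that a callback reading the current user is declared before authentication runs.

```ruby
# MiniController simulates Rails callback behavior: callbacks run
# in declaration order, before the action itself.
class MiniController
  def self.before_actions
    @before_actions ||= []
  end

  def self.before_action(name)
    before_actions << name
  end

  def run(action)
    self.class.before_actions.each { |callback| send(callback) }
    send(action)
  end
end

class InvoicesController < MiniController
  before_action :load_invoice        # runs first; @current_user is still nil
  before_action :authenticate_user!  # too late, should be declared above

  def show
    "invoice: #{@invoice}"
  end

  private

  def authenticate_user!
    @current_user = "alice"
  end

  def load_invoice
    # Without authentication having run, the scope silently loads
    # unscoped data instead of the current user's records.
    @invoice = @current_user ? "#{@current_user}'s data" : "UNSCOPED"
  end
end

puts InvoicesController.new.run(:show)  # => "invoice: UNSCOPED"
```

A generic reviewer sees two before_action lines and moves on; a Rails-aware prompt can be told that authentication callbacks must precede anything that depends on them.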

Setting Up Claude-Based Review in GitHub Actions

Here’s a working GitHub Actions workflow that sends PR diffs to Claude for review:

# .github/workflows/ai-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get diff
        id: diff
        run: |
git diff origin/${{ github.base_ref }}...HEAD -- '*.rb' '*.erb' > diff.txt
          echo "size=$(wc -c < diff.txt)" >> $GITHUB_OUTPUT

      - name: AI Review
        if: steps.diff.outputs.size > 0
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          ruby scripts/ai_review.rb diff.txt > review.md

      - name: Post Review
        if: steps.diff.outputs.size > 0
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const body = fs.readFileSync('review.md', 'utf8');
            if (body.trim().length > 0) {
              github.rest.pulls.createReview({
                owner: context.repo.owner,
                repo: context.repo.repo,
                pull_number: context.issue.number,
                body: body,
                event: 'COMMENT'
              });
            }

The Ruby script that does the actual review:

# scripts/ai_review.rb
require "net/http"
require "json"

diff = File.read(ARGV[0])

# Truncate massive diffs — Claude handles ~150KB well,
# beyond that quality drops
if diff.bytesize > 150_000
  diff = diff.byteslice(0, 150_000)
  diff += "\n\n[diff truncated]"
end

prompt = <<~PROMPT
  You are reviewing a Ruby on Rails pull request. The codebase uses Rails 8.0, Ruby 3.3, and PostgreSQL.

  Review this diff for:
  1. Security issues (SQL injection, XSS, mass assignment, missing authorization)
  2. N+1 queries or missing eager loading
  3. Missing database indexes for new queries
  4. Incorrect ActiveRecord usage (e.g., pluck vs select, find_each vs each)
  5. Missing error handling for external service calls
  6. Race conditions in concurrent scenarios

  Do NOT comment on:
  - Code style or formatting (RuboCop handles that)
  - Test coverage (separate CI step)
  - Minor naming preferences

  For each issue found, specify the file, line range, severity (critical/warning/info), and a specific fix.
  If the diff looks clean, say "No issues found" and nothing else.

  DIFF:
  #{diff}
PROMPT

uri = URI("https://api.anthropic.com/v1/messages")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Post.new(uri)
request["x-api-key"] = ENV["ANTHROPIC_API_KEY"]
request["anthropic-version"] = "2023-06-01"
request["content-type"] = "application/json"
request.body = JSON.generate({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [{role: "user", content: prompt}]
})

response = http.request(request)
result = JSON.parse(response.body)

puts result.dig("content", 0, "text") || "Review failed: #{response.code}"

What AI Review Actually Catches (With Real Examples)

After running this on several Rails projects, here are the categories where AI review consistently adds value:

Missing authorization checks. When a developer adds a new controller action and forgets before_action :authenticate_user! or a Pundit authorize call, AI review flags it roughly 90% of the time. This is the single highest-value catch.

# AI flagged this — new action without authorization
class InvoicesController < ApplicationController
  def export
    @invoices = Invoice.where(date: params[:from]..params[:to])
    send_data @invoices.to_csv, filename: "invoices.csv"
  end
end

The AI correctly identified that export was added to a controller where every other action had authorize @invoice calls, and this one was missing both authentication and authorization.

Unsafe parameter handling. Especially with nested attributes, where developers permit the parent's fields but forget to whitelist the nested :id, which quietly breaks updates to existing records:

# AI caught the missing :id in tasks_attributes
def project_params
  params.require(:project).permit(:name, :budget,
    tasks_attributes: [:title, :description, :_destroy])
    # Missing :id — every submitted task is treated as a new
    # record, so edits create duplicates and _destroy can't
    # target existing tasks
end

N+1 queries hiding in partials. When a PR adds an association call inside a partial that gets rendered in a collection, AI review catches it if the diff includes both the partial change and the calling view. It misses this when the partial is unchanged and only the caller is in the diff — a genuine blind spot.

Missing database indexes. When a migration adds a column and a separate file queries by that column, AI connects the dots about 70% of the time:

# Migration adds user_uuid column
add_column :audit_logs, :user_uuid, :string

# Model queries by it — AI flags missing index
scope :for_user, ->(uuid) { where(user_uuid: uuid) }

The False Positive Problem

Expect a false positive rate of roughly 30-40% even on a well-tuned setup. The main categories:

Context-dependent patterns. The AI flags find_each suggestions when you’re using each on a scope that returns 10 records maximum. It doesn’t know about your data volumes.

Framework magic. Rails concerns, included blocks, and STI hierarchies confuse AI reviewers. A before_action defined in a concern gets flagged as missing when the AI can’t see the concern is included.

Intentional tradeoffs. Sometimes you skip eager loading because the association is already cached from a parent query. The AI doesn’t know about your request-level caching strategy.

The fix: maintain a .ai-review-ignore file in your repo with patterns to suppress:

# .ai-review-ignore
suppress:
  - pattern: "find_each suggestion"
    paths: ["app/admin/**"]  # Admin panels have small datasets
  - pattern: "missing eager load"
    paths: ["app/models/concerns/**"]  # Concerns handle their own loading

Then filter the AI output in your review script before posting.
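That filtering step might look like the following sketch, which assumes the .ai-review-ignore format shown above: drop any review line that matches a suppressed pattern and mentions a file covered by one of that rule's path globs. The path-extraction regex is a simplifying assumption; a production version would parse the review's structured file references instead.

```ruby
require "yaml"

# Filter AI review output against .ai-review-ignore suppression rules.
# A line is dropped when it matches a rule's pattern AND references a
# Ruby/ERB path covered by one of the rule's globs.
def filter_review(review_text, ignore_file)
  rules = YAML.safe_load(File.read(ignore_file)).fetch("suppress", [])
  review_text.lines.reject do |line|
    path = line[%r{[\w./-]+\.(?:rb|erb)}]  # first Ruby/ERB path in the line
    rules.any? do |rule|
      line.downcase.include?(rule["pattern"].downcase) &&
        path && rule["paths"].any? { |glob| File.fnmatch(glob, path) }
    end
  end.join
end
```

Run this over the model's output just before posting the review comment, so suppressed findings never reach the PR.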

Integrating With Existing CI

AI review works best as one layer in a review stack, not a replacement for anything:

  1. RuboCop — style and lint (fast, deterministic)
  2. Brakeman — security static analysis (fast, Rails-specific)
  3. AI Review — semantic analysis (slow, probabilistic)
  4. Human Review — architecture, business logic, team knowledge

Run AI review in parallel with your test suite. A typical PR gets AI feedback in 15-30 seconds for diffs under 500 lines, which is faster than most test suites.

Cost-wise, reviewing 20 PRs per month with Claude Sonnet runs about $3-5/month. That’s cheaper than a single missed security vulnerability reaching production.

# Run in parallel with tests
jobs:
  test:
    # your existing test job
  lint:
    # RuboCop + Brakeman
  ai-review:
    # the workflow above

Prompt Engineering for Rails-Specific Review

The generic “review this code” prompt produces generic feedback. These additions improved signal quality significantly:

Include your Gemfile.lock in context. Not the whole thing — just the Rails version and key gems. This stops the AI from suggesting gems you’re already using or flagging patterns that are correct for your specific Rails version.
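A hypothetical helper for that extraction: scan Gemfile.lock for just the gems that change review advice, instead of pasting the whole file into the prompt. The KEY_GEMS list is an assumption; adjust it for your stack.

```ruby
# Gems whose presence/version changes what good review feedback looks like.
KEY_GEMS = %w[rails pg devise pundit sidekiq solid_queue solid_cache].freeze

# Gemfile.lock lists resolved gems at exactly four spaces of indentation,
# e.g. "    rails (8.0.1)"; transitive dependency lines are indented deeper
# and so don't match.
def gem_context(lockfile_text)
  lockfile_text.scan(/^    (\S+) \(([\d.]+)\)$/)
               .select { |name, _| KEY_GEMS.include?(name) }
               .map { |name, version| "#{name} #{version}" }
               .join(", ")
end
```

Prepend the result (a one-line summary like "rails 8.0.1, pg 1.5.9") to the review prompt.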

Specify your auth system. “This project uses Devise with Pundit for authorization” eliminates 80% of auth-related false positives.

Reference your database. “PostgreSQL 16 with pgvector extension” prevents the AI from suggesting MySQL-specific solutions or flagging Postgres-specific syntax as errors.

Set the Rails version explicitly. Rails 7 vs Rails 8 changes best practices significantly. Rails 8 ships Solid Queue as the default Active Job backend and Solid Cache as the default cache store, where older apps typically reached for Sidekiq and Redis — the AI needs to know which era you’re in.

When AI Review Isn’t Worth It

Skip AI review for:

  • Solo projects where you’re the only reviewer anyway (just use Copilot inline suggestions while coding)
  • Massive refactors that change hundreds of files — the AI drowns in context and produces garbage
  • Configuration-only PRs (Dockerfiles, CI configs, infrastructure) — the AI hallucinates Docker best practices that don’t match your deployment
  • Dependency updates — Dependabot PRs don’t benefit from AI review; your test suite is the gatekeeper there

FAQ

How much does AI code review cost per month?

For a team of 5 developers merging ~100 PRs per month, expect $15-25/month using Claude Sonnet via API. GitHub Copilot’s built-in code review is included with Copilot Business at $19/user/month or Copilot Enterprise at $39/user/month. Dedicated platforms like CodeRabbit charge $12-24/seat/month.

Can AI code review replace human reviewers?

No. AI catches mechanical issues — missing indexes, N+1 queries, security oversights — but cannot evaluate architecture decisions, business logic correctness, or whether code solves the right problem. Use it to handle the tedious checklist items so human reviewers can focus on design and intent.

Does AI code review work with GitHub Enterprise and private repos?

Yes. The GitHub Actions workflow shown here runs in your own CI environment, but the diff is still sent to the AI provider’s API, so check your organization’s data handling policies. Anthropic and OpenAI both offer zero-retention API options for enterprise customers. Self-hosted options exist using Ollama with models like CodeLlama, though quality drops significantly.

What’s the best model for Rails code review in 2026?

Claude Sonnet (currently claude-sonnet-4-20250514) hits the best balance of cost, speed, and Rails knowledge for code review. Claude Opus produces marginally better results but at 5x the cost and 3x the latency — not worth it for automated PR review. GPT-4o is comparable but tends to produce more style-related noise on Ruby code.

How do you handle sensitive code that can’t leave your network?

Run a local model through Ollama or vLLM. CodeLlama 34B and DeepSeek Coder V2 are the best open-source options for Ruby review, though neither matches Claude’s Rails-specific knowledge. For the middle ground, Anthropic and AWS offer Claude on Bedrock with VPC endpoints, keeping traffic within your AWS account.

#ai #code-review #rails #github-copilot #claude #ci-cd #developer-tools

About the Author

Roger Heykoop is a senior Ruby on Rails developer with 19+ years of Rails experience and 35+ years in software development. He specializes in Rails modernization, performance optimization, and AI-assisted development.
