Debug Memory Leaks in Ruby on Rails: A Production Hunting Guide


Memory leaks in Ruby on Rails apps almost never come from actual C-extension leaks. In eight years of running Rails in production, I’ve traced maybe two issues to genuine memory leaks in native code. The rest? Unbounded growth — hashes that never get pruned, strings retained by accident, callbacks that accumulate references the GC can never reclaim.

The distinction matters because the tools and approach differ. A real leak requires a gem update or patch. Unbounded growth requires you to find the code doing the accumulating and put a limit on it.
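Unbounded growth is easy to reproduce in miniature. In this sketch (a deliberately contrived example, not from any real app), a constant-referenced hash keeps every entry reachable, so even a full GC reclaims nothing:

```ruby
# A constant-level hash: everything added to it stays reachable forever.
LEAKY_CACHE = {}

GC.start
before = GC.stat(:heap_live_slots)

10_000.times { |i| LEAKY_CACHE["key-#{i}"] = "value-#{i}" }

GC.start # a full GC runs, but nothing referenced by LEAKY_CACHE can be reclaimed
after = GC.stat(:heap_live_slots)

puts "live slots grew by #{after - before}" # stays elevated: the strings are still referenced
```

No gem is misbehaving here; the GC is doing its job. The hash simply never lets go.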

Recognizing the Symptoms

Your Rails app has a memory problem when worker RSS grows monotonically across requests. Healthy Ruby processes stabilize after a warm-up period — typically 50-200 requests depending on your app’s complexity. The GC reclaims objects, RSS flattens, and life goes on.

When RSS climbs without stabilizing, you have unbounded growth. A quick diagnostic:

# Watch Puma worker RSS over time (Linux)
while true; do
  ps -o pid,rss,command -p $(pgrep -f 'puma.*worker') | tail -n +2
  sleep 30
done

If RSS increases by 10-50 MB per hour under steady traffic, that’s your signal. Anything under 5 MB/hour might just be fragmentation.
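To separate fragmentation from Ruby-level accumulation before reaching for heavier tools, compare the Ruby object heap to RSS directly. A sketch, assuming Linux (`/proc`) and 4 KB pages; `ruby_heap_snapshot` is a hypothetical helper name:

```ruby
# If heap_live_slots stays flat across snapshots while RSS keeps climbing,
# the growth is outside the Ruby object heap: fragmentation or a native leak.
def ruby_heap_snapshot
  rss_kb = File.read("/proc/#{Process.pid}/statm").split[1].to_i * 4 # statm is in pages, assumed 4 KB
  { live_slots: GC.stat(:heap_live_slots), rss_kb: rss_kb }
end

# Take one snapshot now and another after serving traffic, then compare.
snap = ruby_heap_snapshot
puts "live slots: #{snap[:live_slots]}, RSS: #{snap[:rss_kb]} KB"
```

If live slots grow in step with RSS, skip ahead to the heap dump analysis below; if they stay flat, jump to the native-leak section.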

Step 1: Measure Before You Hunt

Install derailed_benchmarks (gem version 2.2+, Ruby 3.2+):

# Gemfile
group :development, :test do
  gem 'derailed_benchmarks'
  gem 'stackprof'
end

Run the static memory analysis first — it catches the easy wins:

bundle exec derailed bundle:mem

This shows memory consumed at boot by each gem. I’ve seen apps where a single unused gem pulled in 40 MB of dependencies. On a recent Rails 8 project, removing mini_magick (replaced by ActiveStorage’s built-in processing) dropped boot memory by 28 MB across 4 Puma workers.

For request-level analysis:

bundle exec derailed exec perf:mem_over_time

This hits your app repeatedly and tracks memory growth. A flat line means no leak. An upward slope tells you to keep digging.

Step 2: Heap Dumps with ObjectSpace

Ruby’s ObjectSpace module is your primary investigation tool. Enable heap dump support in your Rails app:

# config/initializers/memory_debug.rb (temporary — remove after investigation)
if ENV['MEMORY_DEBUG']
  require 'objspace'
  ObjectSpace.trace_object_allocations_start
end

Trigger a heap dump from a running process:

# Via rails console attached to a production worker, or via a debug endpoint
GC.start(full_mark: true, immediate_sweep: true)
GC.start # run twice so weak references and finalizer-pending objects clear out

file = "/tmp/heap_dump_#{Process.pid}_#{Time.now.to_i}.json"
File.open(file, 'w') { |io| ObjectSpace.dump_all(output: io) } # block form closes the handle
puts "Heap dump written to #{file} (#{File.size(file) / 1024 / 1024} MB)"

The dump is a JSON-lines file where each line represents one live Ruby object. The fields that matter:

  • type: Object type (STRING, HASH, ARRAY, OBJECT, etc.)
  • file: Source file where the object was allocated
  • line: Line number
  • memsize: Memory consumed in bytes
  • generation: GC generation when allocated (lower = older = more suspicious)
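The generation field is worth filtering on directly. This sketch (`old_objects` is a hypothetical helper, not part of any library) counts allocation sites for objects born at or before a given GC generation; survivors that keep accumulating across many GC cycles are the classic leak signature:

```ruby
require 'json'

# Count allocation sites for objects born at or before `max_generation`.
def old_objects(dump_path, max_generation)
  counts = Hash.new(0)
  File.foreach(dump_path) do |line|
    obj = JSON.parse(line)
    gen = obj['generation']
    # Skip young objects and objects with no recorded allocation site
    next unless gen && gen <= max_generation && obj['file']
    counts["#{obj['file']}:#{obj['line']}"] += 1
  end
  counts.sort_by { |_, count| -count }.first(10)
end
```

Objects allocated before tracing started have no file or line recorded, so this filter only sees allocations made while tracing was enabled.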

Step 3: Analyze the Heap Dump

Parse the dump to find accumulation patterns:

# analyze_heap.rb
require 'json'

counts = Hash.new(0)
sizes = Hash.new(0)
locations = Hash.new(0)

File.foreach(ARGV[0]) do |line|
  obj = JSON.parse(line)
  type = obj['type']
  counts[type] += 1
  sizes[type] += obj['memsize'].to_i

  if obj['file']
    loc = "#{obj['file']}:#{obj['line']}"
    locations[loc] += 1
  end
end

puts "=== Object counts by type ==="
counts.sort_by { |_, v| -v }.first(10).each { |k, v| puts "  #{k}: #{v}" }

puts "\n=== Memory by type (MB) ==="
sizes.sort_by { |_, v| -v }.first(10).each { |k, v| puts "  #{k}: #{(v / 1024.0 / 1024).round(2)} MB" }

puts "\n=== Top allocation sites ==="
locations.sort_by { |_, v| -v }.first(20).each { |k, v| puts "  #{k}: #{v} objects" }

Run it:

ruby analyze_heap.rb /tmp/heap_dump_12345_1711353600.json

The “Top allocation sites” output is where the investigation gets real. When you see 500,000 strings allocated from one line in your codebase, you’ve found your culprit.
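A single dump shows what is live; two dumps show what is accumulating. This sketch (`retained_between` is a hypothetical helper) diffs dumps taken before and after a workload by object address. Addresses can be recycled by GC compaction, so treat the counts as approximate:

```ruby
require 'json'
require 'set'

# Objects whose address appears in the second dump but not the first were
# allocated and retained during the workload between the two dumps.
def retained_between(before_path, after_path)
  seen = Set.new
  File.foreach(before_path) do |line|
    addr = JSON.parse(line)['address']
    seen << addr if addr
  end

  retained = Hash.new(0)
  File.foreach(after_path) do |line|
    obj = JSON.parse(line)
    addr = obj['address']
    next if addr.nil? || seen.include?(addr) || obj['file'].nil?
    retained["#{obj['file']}:#{obj['line']}"] += 1
  end
  retained.sort_by { |_, count| -count }.first(20)
end
```

Dump, run a few hundred requests against the worker, dump again, and diff: whatever survives the second full GC was retained by something.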

The Usual Suspects

After analyzing dozens of Rails memory issues across client projects, these patterns account for roughly 80% of cases:

Unbounded Memoization

# The classic leak
class ProductService
  def self.lookup(sku)
    @cache ||= {}
    @cache[sku] ||= Product.find_by(sku: sku)
  end
end

This class-level hash grows with every unique SKU looked up and never shrinks. With 100,000 products, you’re holding 100,000 ActiveRecord objects in memory permanently.

Fix: Use Rails.cache with TTL, or use an LRU cache like lru_redux:

class ProductService
  @cache = LruRedux::TTL::ThreadSafeCache.new(1000, 15 * 60) # 1000 items, 15 min TTL

  def self.lookup(sku)
    @cache.getset(sku) { Product.find_by(sku: sku) }
  end
end

ActiveRecord Callback Accumulation

class Order < ApplicationRecord
  after_commit :notify_warehouse

  def notify_warehouse
    WarehouseNotifier.perform_later(self) # Holds reference to `self`
  end
end

This isn’t a leak by itself, but when combined with bulk operations that load thousands of records, the callback chain holds references to all of them until the transaction completes. For batch processing, use find_each with smaller batch sizes or bypass callbacks entirely:

Order.where(status: :pending).find_each(batch_size: 100) do |order|
  WarehouseNotifier.perform_later(order.id) # Pass ID, not the object
end

String Retention from Logging

Rails.logger.info "Processing order #{order.inspect} with items #{order.items.map(&:inspect)}"

inspect on ActiveRecord objects generates massive strings. In production with debug-level logging accidentally enabled, I’ve seen this consume 2 GB in under an hour. The strings survive longer than you’d expect because the logger may buffer them.

Fix: Use structured logging and lazy evaluation:

Rails.logger.info { "Processing order #{order.id} with #{order.items.count} items" }

The block form means the string is never built if the log level filters it out.

Global Event Subscribers

ActiveSupport::Notifications.subscribe('process_action.action_controller') do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  MetricsCollector.record(event) # If MetricsCollector accumulates without flushing...
end

Check that any metrics collectors, event subscribers, or instrumentation hooks flush their buffers periodically.
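One way to enforce that is to cap the buffer at the collector itself. A sketch; `BoundedMetricsCollector` is a hypothetical class, not part of any library:

```ruby
# Events accumulate in a buffer that is flushed and cleared once it reaches
# a fixed size, so memory stays capped no matter how long the process runs.
class BoundedMetricsCollector
  MAX_BUFFER = 500

  def initialize(&flusher)
    @buffer = []
    @mutex = Mutex.new
    @flusher = flusher
  end

  def record(event)
    to_flush = nil
    @mutex.synchronize do
      @buffer << event
      to_flush, @buffer = @buffer, [] if @buffer.size >= MAX_BUFFER
    end
    @flusher.call(to_flush) if to_flush # flush outside the lock
  end
end
```

Calling the flusher outside the lock keeps `record` fast in the hot path even when the flush does I/O.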

Step 4: Confirm the Fix

After applying a fix, verify with a controlled test. The derailed_benchmarks memory-over-time test works, but for production confirmation, I prefer tracking RSS per worker with a simple Prometheus metric:

# config/initializers/memory_metrics.rb
if defined?(Prometheus)
  MEMORY_GAUGE = Prometheus::Client::Gauge.new(
    :ruby_process_rss_bytes,
    docstring: 'RSS memory of the Ruby process',
    labels: [:worker]
  )
  Prometheus::Client.registry.register(MEMORY_GAUGE) # unregistered metrics are never exported

  Thread.new do
    loop do
      rss = File.read("/proc/#{Process.pid}/statm").split[1].to_i * 4096
      MEMORY_GAUGE.set(rss, labels: { worker: Process.pid.to_s })
      sleep 60
    end
  end
end

Deploy, watch the graph for 24-48 hours under production traffic. RSS should stabilize after warm-up. If it does, you’re done.

When It Actually Is a Native Leak

If heap dumps show stable Ruby object counts but RSS keeps growing, the leak is in a C extension. The approach changes:

  1. Check gem changelogs for known memory fixes — nokogiri, mysql2, and image processing gems are common offenders
  2. Use valgrind on a staging server (not production — the overhead is 10-30x):
valgrind --tool=massif bundle exec rails runner "1000.times { YourSuspiciousCode.call }"
ms_print massif.out.*
  3. Try upgrading the suspect gem. If that fixes it, you’re done. If not, file an issue with a minimal reproduction.

In my experience, upgrading nokogiri fixes about half of all native memory issues in Rails apps. The Nokogiri team is responsive and their recent releases (1.16+) have addressed several memory management issues.

Production-Safe Memory Monitoring

For ongoing protection, configure Puma’s worker killer. It’s a band-aid, not a fix, but it prevents OOM kills while you investigate:

# config/puma.rb
plugin :tmp_restart

before_fork do
  require 'puma_worker_killer'
  PumaWorkerKiller.config do |config|
    config.ram = 2048 # MB total for all workers
    config.frequency = 30 # Check every 30 seconds
    config.percent_usage = 0.90 # Kill at 90% of ram limit
    config.rolling_restart_frequency = 6 * 3600 # Rolling restart every 6 hours
  end
  PumaWorkerKiller.start
end

This buys you time. The rolling restart every 6 hours keeps RSS in check while you track down the root cause with the techniques above.

Frequently Asked Questions

How much memory should a Rails 8 app use per Puma worker?

A typical Rails 8 app uses 150-300 MB per Puma worker after warm-up, depending on gem count and application complexity. Apps with heavy image processing or large ActiveRecord result sets can hit 500 MB+. If a single worker exceeds 1 GB under normal traffic, you likely have unbounded growth somewhere.

Does Ruby’s garbage collector cause memory bloat?

Ruby’s GC (particularly with YJIT enabled) manages object memory well, but it rarely shrinks the process heap. Once Ruby requests memory from the OS via malloc, that RSS is seldom returned for the lifetime of the process, even after the objects inside are freed. This is memory fragmentation, not a leak. The fix is controlling peak memory usage rather than expecting RSS to decrease. Jemalloc as a malloc replacement (LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2) reduces fragmentation by 10-30% in most Rails apps.

Can I use ObjectSpace.dump_all in production safely?

Yes, but with caveats. The dump pauses the Ruby process for 1-10 seconds depending on heap size (a 1 GB process takes about 3-5 seconds). Run it on a single worker during low traffic, not during peak hours. The trace_object_allocations_start call adds 5-10% overhead, so enable it temporarily and disable after collecting your dump. Never leave allocation tracing on permanently.
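The enable/collect/disable cycle looks like this; `trace_object_allocations_stop` halts the bookkeeping, and `trace_object_allocations_clear` frees what was recorded:

```ruby
require 'objspace'

ObjectSpace.trace_object_allocations_start
# ... serve the traffic you want to observe, then dump the heap ...
obj = Object.new # example allocation recorded while tracing is on
file = ObjectSpace.allocation_sourcefile(obj)
line = ObjectSpace.allocation_sourceline(obj)

ObjectSpace.trace_object_allocations_stop  # stop paying the overhead
ObjectSpace.trace_object_allocations_clear # release the tracing metadata itself
```

Query `allocation_sourcefile`/`allocation_sourceline` before calling `clear`; the recorded sites are discarded after that.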

What’s the difference between RSS growth and a memory leak?

RSS (Resident Set Size) growth after warm-up can mean three things: a genuine C-extension leak, unbounded Ruby object accumulation, or memory fragmentation. Check Ruby heap object counts first — if they’re stable but RSS grows, it’s either fragmentation (try jemalloc) or a native leak. If object counts grow proportionally with RSS, you have Ruby-level accumulation. The heap dump analysis technique in this guide distinguishes between these cases.

Should I use puma_worker_killer or just increase server RAM?

Use both, but treat puma_worker_killer as a safety net, not a solution. Adding RAM masks the problem and costs scale linearly — doubling RAM doubles your hosting bill. Worker killers keep your app stable while you fix the root cause. Set a generous RAM limit, enable rolling restarts, and use the monitoring in this guide to find and eliminate the underlying growth pattern.

#ruby #rails #memory-leaks #debugging #production #performance #objectspace #derailed_benchmarks

About the Author

Roger Heykoop is a senior Ruby on Rails developer with 19+ years of Rails experience and 35+ years in software development. He specializes in Rails modernization, performance optimization, and AI-assisted development.
