Ruby GC Tuning: Cut Rails Memory Bloat and Response Times in Production
Ruby’s garbage collector ships with defaults tuned for short scripts, not long-running Rails processes serving thousands of requests. Out of the box, a Rails app allocates objects aggressively, triggers GC at awkward moments, and grows its heap without giving memory back to the OS.
Here’s how to fix that with environment variables you can set today — no code changes, no gems, no monkey-patching.
Why Default GC Settings Hurt Rails Apps
Ruby 3.3 uses a generational, incremental garbage collector: objects are split into young and old generations, with a remembered set tracking references from old objects to young ones. The defaults assume your program starts, does some work, and exits. A Rails app does the opposite: it starts once and runs for weeks.
The key problem: Ruby’s default RUBY_GC_HEAP_INIT_SLOTS is 10,000. A typical Rails boot allocates millions of objects. So Ruby grows its heap in small increments — each growth triggering a GC pause — until it reaches its working size. For the first few hundred requests, your app stutters through dozens of unnecessary GC cycles.
# Check your current GC stats in a Rails console
stats = GC.stat
puts "heap_allocated_pages: #{stats[:heap_allocated_pages]}"
puts "heap_available_slots: #{stats[:heap_available_slots]}"
puts "total_allocated_objects: #{stats[:total_allocated_objects]}"
puts "major_gc_count: #{stats[:major_gc_count]}"
puts "minor_gc_count: #{stats[:minor_gc_count]}"
On a mid-size Rails app (typical e-commerce, 50+ models), you’ll see something like heap_available_slots: 2_000_000+ after warmup. But Ruby started at 10,000 slots and grew to that size one GC-interrupted increment at a time.
The Environment Variables That Matter
Ruby’s GC is configured entirely through environment variables. No initializer files. Set them in your deployment config (Dockerfile, systemd unit, Heroku config vars) and they take effect at boot.
Here are the ones worth tuning, in order of impact:
RUBY_GC_HEAP_INIT_SLOTS
How many object slots Ruby allocates at startup. Default: 10,000.
# Pre-allocate enough slots to avoid growth-triggered GC during warmup
export RUBY_GC_HEAP_INIT_SLOTS=600000
Set this to roughly your app’s steady-state slot count. Check GC.stat[:heap_available_slots] after your app has been running for a while. Setting it to 60-80% of that number eliminates most warmup GC pauses.
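If you want a starting number without doing the arithmetic by hand, a sketch like this (run in a production console once the process is warm) prints a suggestion based on the 70% midpoint of that range:

# Rough sizing helper: run in a warmed-up production console
slots = GC.stat[:heap_available_slots]
suggested = (slots * 0.7).round(-4) # ~70% of steady state, rounded to the nearest 10,000
puts "current heap_available_slots: #{slots}"
puts "suggested RUBY_GC_HEAP_INIT_SLOTS: #{suggested}"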
RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO and RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO
These control when Ruby grows or shrinks the heap. Defaults: 0.20 (min) and 0.40 (max).
# Wider ratio = fewer heap resizes = fewer GC interruptions
export RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.20
export RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO=0.65
Raising the max ratio means Ruby keeps more free slots around before trying to shrink. This trades memory for stability — your app uses slightly more RAM but triggers GC less often.
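Before widening the ratios, it’s worth checking where your app actually sits. A quick console check (just GC.stat arithmetic, nothing app-specific) shows the current free-slot ratio:

# What fraction of the heap is free right now?
stats = GC.stat
ratio = stats[:heap_free_slots].to_f / stats[:heap_available_slots]
puts "free slots: #{stats[:heap_free_slots]} (#{(ratio * 100).round(1)}% of the heap)"

If that number keeps sitting near the min ratio after collections, the heap is growing repeatedly and a wider band may help.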
RUBY_GC_HEAP_GROWTH_FACTOR
How aggressively Ruby grows the heap when it runs out of space. Default: 1.8.
# Grow faster = reach working size sooner = fewer total GC pauses
export RUBY_GC_HEAP_GROWTH_FACTOR=1.25
Counterintuitive: a lower growth factor (closer to 1.0) means smaller increments, which means more frequent growth events but less memory overshoot. A higher factor reaches steady state faster but might overshoot. For Rails, 1.25 is a good balance — you reach working size quickly without wasting 80% extra.
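To see the shape of that tradeoff, here’s a back-of-the-envelope simulation. It assumes the heap simply multiplies by the growth factor until it covers a 2,000,000-slot working set, which ignores pages and free-slot ratios but illustrates the point:

# How many growth events to reach a ~2M-slot working set, and how far each factor overshoots
[1.25, 1.8].each do |factor|
  slots = 10_000
  growths = 0
  while slots < 2_000_000
    slots = (slots * factor).to_i
    growths += 1
  end
  puts "factor #{factor}: #{growths} growth events, final heap ~#{slots} slots"
end

Roughly: 1.8 gets there in about 10 steps but overshoots well past 2M slots, while 1.25 takes around 24 steps and lands much closer to the target.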
RUBY_GC_MALLOC_LIMIT and RUBY_GC_MALLOC_LIMIT_MAX
These control GC triggering based on malloc’d memory (C extensions, string buffers, IO buffers). Defaults: 16MB and 32MB.
export RUBY_GC_MALLOC_LIMIT=128000000
export RUBY_GC_MALLOC_LIMIT_MAX=256000000
Rails apps that process file uploads, render large views, or parse JSON payloads blow through 16MB of malloc’d memory constantly. Every time they hit that limit, Ruby triggers a minor GC. Raising these limits to 128MB/256MB cuts GC frequency significantly for IO-heavy apps.
RUBY_GC_OLDMALLOC_LIMIT and RUBY_GC_OLDMALLOC_LIMIT_MAX
Same concept, but for memory allocated by old-generation objects. Defaults: 16MB and 128MB.
export RUBY_GC_OLDMALLOC_LIMIT=128000000
export RUBY_GC_OLDMALLOC_LIMIT_MAX=512000000
Old-generation GC (major GC) is expensive — it scans all objects. Raising these limits means major GC runs less often. The tradeoff: your process holds onto more memory between collections.
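To see how much headroom you actually have before these limits trip (for both the malloc and oldmalloc counters), GC.stat exposes the running totals and the current dynamic limits. A quick console check:

# Bytes malloc'd since the last GC of each kind, versus the current limits
stats = GC.stat
puts "malloc:    #{stats[:malloc_increase_bytes]} / #{stats[:malloc_increase_bytes_limit]}"
puts "oldmalloc: #{stats[:oldmalloc_increase_bytes]} / #{stats[:oldmalloc_increase_bytes_limit]}"

If the left number keeps slamming into the right one between requests, raising the limits will directly reduce GC frequency.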
A Production-Tested Configuration
This configuration runs on a Rails 7.2 app with 80+ models, Sidekiq workers, and ~2,000 RPM. Before tuning, median response time was 145ms with p99 at 890ms.
# .env.production or Dockerfile ENV
RUBY_GC_HEAP_INIT_SLOTS=600000
RUBY_GC_HEAP_FREE_SLOTS_MIN_RATIO=0.20
RUBY_GC_HEAP_FREE_SLOTS_MAX_RATIO=0.65
RUBY_GC_HEAP_GROWTH_FACTOR=1.25
RUBY_GC_MALLOC_LIMIT=128000000
RUBY_GC_MALLOC_LIMIT_MAX=256000000
RUBY_GC_OLDMALLOC_LIMIT=128000000
RUBY_GC_OLDMALLOC_LIMIT_MAX=512000000
Results after one week in production:
| Metric | Before | After | Change |
|---|---|---|---|
| Median response time | 145ms | 112ms | -23% |
| p99 response time | 890ms | 410ms | -54% |
| Memory per worker | 680MB | 470MB | -31% |
| Major GC per minute | 8.2 | 1.4 | -83% |
| Minor GC per minute | 47 | 22 | -53% |
The p99 improvement is the big win. Those 890ms spikes were almost entirely major GC pauses landing mid-request.
Measuring Before You Tune
Don’t copy-paste the config above without measuring your own app first. GC tuning is app-specific. Here’s how to get your baseline:
# config/initializers/gc_instrumentation.rb
ActiveSupport::Notifications.subscribe("process_action.action_controller") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  gc_stats = GC.stat

  Rails.logger.info(
    gc_time_ms: (gc_stats[:time] || 0),
    gc_major_count: gc_stats[:major_gc_count],
    gc_minor_count: gc_stats[:minor_gc_count],
    heap_slots: gc_stats[:heap_available_slots],
    request_path: event.payload[:path],
    duration_ms: event.duration.round(1)
  )
end
Run this for a day or two in production. Then you’ll know your actual slot count, GC frequency, and which requests correlate with GC pauses.
If you’ve set up structured logging, you can query these GC metrics alongside your normal request data.
YJIT Changes the Equation
If you’re running Ruby 3.3+ with YJIT enabled (and you should be — it’s stable and gives 15-25% throughput improvement on Rails), GC tuning interacts with YJIT’s own memory management.
YJIT allocates executable memory pages outside Ruby’s heap. This memory doesn’t count against RUBY_GC_MALLOC_LIMIT. But YJIT’s compiled code references heap objects, which means those objects can’t be garbage collected until the compiled code is invalidated.
In practice, YJIT slightly increases your steady-state heap size. Account for this when setting RUBY_GC_HEAP_INIT_SLOTS — add about 10-15% to your measured baseline.
# Enable YJIT alongside GC tuning
export RUBY_YJIT_ENABLE=1
export RUBY_GC_HEAP_INIT_SLOTS=700000 # bumped for YJIT
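Before adjusting slot counts for YJIT, it’s worth confirming YJIT is actually on in the running process and re-measuring the heap with it enabled. A quick check for a production console:

# Confirm YJIT is active, then re-check steady-state heap size with it enabled
puts "YJIT enabled: #{defined?(RubyVM::YJIT) ? RubyVM::YJIT.enabled? : false}"
puts "heap_available_slots: #{GC.stat[:heap_available_slots]}"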
The Ractors post covers Ruby’s parallel execution model, which has its own GC implications: in current CRuby, Ractors still share a single heap and garbage collector, so a collection pauses every Ractor and these settings apply to the whole VM.
Jemalloc: The Other Half of Memory Tuning
Ruby’s GC manages Ruby objects. But your process also uses glibc’s malloc for everything else (C extensions, string buffers, OpenSSL contexts). glibc’s default allocator fragments badly under Rails workloads.
Swapping to jemalloc typically cuts memory usage by 15-25% with zero code changes:
# Dockerfile (library path shown is for Debian/Ubuntu on x86_64)
RUN apt-get update && apt-get install -y --no-install-recommends libjemalloc2
ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
ENV MALLOC_CONF="dirty_decay_ms:1000,narenas:2"
The MALLOC_CONF settings tell jemalloc to return memory to the OS after 1 second of non-use, and to limit arena count (fewer arenas = less fragmentation for threaded apps like Puma).
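LD_PRELOAD failures are silent, so it’s worth verifying the swap actually happened. A Linux-only sketch you can run inside the container (it just greps the process’s memory maps):

# Check whether jemalloc is mapped into the current Ruby process (Linux only)
loaded = File.readlines("/proc/self/maps").any? { |line| line.include?("jemalloc") }
puts loaded ? "jemalloc is loaded" : "jemalloc is NOT loaded"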
Jemalloc plus the tuned GC variables gave us that 31% memory reduction. GC tuning alone was about 18%; jemalloc added the rest.
When GC Tuning Won’t Help
GC tuning fixes allocation-pattern problems. It doesn’t fix:
- Memory leaks: If your app’s memory grows unbounded, you have a leak. Use ObjectSpace.trace_object_allocations or the memory_profiler gem to find it.
- N+1 queries: These create thousands of ActiveRecord objects per request. Fix the query first. Strict loading catches these automatically.
- Oversized payloads: If you’re loading 50,000 rows into memory, no GC setting helps. Use find_each or push the work to SQL.
- Slow external calls: If your p99 is high because a payment API takes 800ms, GC isn’t the bottleneck. Async I/O with Fiber Scheduler can help here.
Profile before you tune. GC.stat is free. rack-mini-profiler shows per-request GC impact in development. stackprof reveals where time actually goes.
Puma-Specific Considerations
If you’re running Puma (and most Rails apps are), workers forked via preload_app! share memory pages through copy-on-write. GC compaction helps maximize shared pages:
# config/puma.rb
before_fork do
  3.times { GC.start(full_mark: true, immediate_sweep: true) }
  GC.compact
end
Running GC.compact before fork rearranges objects in memory so they’re densely packed. This means more pages stay shared between workers longer, reducing total memory consumption across all Puma workers.
With 4 Puma workers, compaction before fork saved us about 200MB total (each worker shared ~50MB more with the parent).
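If you want to verify the copy-on-write effect rather than take it on faith, you can read shared-memory figures straight from /proc on Linux. A rough sketch (the shared_kb helper is just for illustration; pass it a real Puma worker PID):

# Linux-only: sum the shared pages reported for a given PID, in kB
def shared_kb(pid)
  File.readlines("/proc/#{pid}/smaps_rollup")
      .grep(/^Shared_(Clean|Dirty):/)
      .sum { |line| line.split[1].to_i }
end

# e.g. shared_kb(worker_pid) / 1024 gives shared MB for that worker

Compare the figure with and without the before_fork compaction block to see how much extra sharing you actually get.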
Monitoring GC in Production
Set up ongoing monitoring so you catch regressions. Most APM tools (Datadog, New Relic, Scout) track GC metrics automatically. If you’re not using an APM, export GC.stat to your metrics system:
# lib/gc_metrics.rb
Thread.new do
  loop do
    stats = GC.stat
    StatsD.gauge("ruby.gc.heap_slots", stats[:heap_available_slots])
    StatsD.gauge("ruby.gc.major_count", stats[:major_gc_count])
    StatsD.gauge("ruby.gc.minor_count", stats[:minor_gc_count])
    StatsD.gauge("ruby.gc.heap_live_slots", stats[:heap_live_slots])
    sleep 30
  end
end
Watch for heap_live_slots trending upward over days — that’s a leak. Watch for major_gc_count increasing faster than expected — that means your old-gen malloc limits are too low.
FAQ
How do I know if GC is causing my slow requests?
Check GC.stat[:time] before and after a request (Ruby 3.1+). If GC time accounts for more than 10% of your p99 response time, tuning will help. Also look for bimodal response time distributions — a cluster of fast responses and a separate cluster of slow ones often indicates GC pauses hitting some requests.
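A concrete way to do that check is an around_action that logs the GC time accumulated during each request. This is a minimal sketch (the log_gc_time name is just for illustration, and it assumes Ruby 3.1+ for GC.stat(:time)):

# Log milliseconds of GC that ran during each request
class ApplicationController < ActionController::Base
  around_action :log_gc_time

  private

  def log_gc_time
    gc_ms_before = GC.stat(:time)
    yield
    Rails.logger.info("gc_ms_in_request=#{GC.stat(:time) - gc_ms_before}")
  end
end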
Can GC tuning cause out-of-memory kills?
Yes, if you raise limits too aggressively on memory-constrained containers. Start conservative: double the malloc limits, set init slots to 50% of your measured steady state, and monitor for a week. The goal is fewer, larger GC runs — not disabling GC entirely.
Should I use GC.disable during requests?
Almost never. Some teams disable GC during request processing and run it between requests (out-of-band GC). This works for apps with strict latency requirements and dedicated memory headroom, but it’s fragile. If a request allocates heavily, you’ll OOM. Unicorn’s OobGC middleware did this — it’s mostly obsolete with modern tuning.
Do these settings apply to Sidekiq workers too?
Yes. Sidekiq workers are long-running Ruby processes with the same GC dynamics. They often benefit more from tuning because background jobs tend to allocate large batches of objects. Set the same env vars in your Sidekiq systemd unit or container.
How often should I re-tune after upgrading Ruby?
Check your GC metrics after every minor Ruby upgrade (3.3 → 3.4). The Ruby team adjusts GC defaults and heuristics between versions — Ruby 3.3 changed the heap structure significantly compared to 3.2. Your tuned values might be less optimal or even counterproductive after an upgrade.
About the Author
Roger Heykoop is a senior Ruby on Rails developer with 19+ years of Rails experience and 35+ years in software development. He specializes in Rails modernization, performance optimization, and AI-assisted development.