Postgres Autovacuum Tuning for Rails: Stop Table Bloat and Transaction ID Wraparound in Production
A client called me on a Sunday morning last year because their production Rails app had suddenly stopped accepting writes. Postgres was logging a message they had never seen before: “database is not accepting commands to avoid wraparound data loss”. Their largest table had been bloated past 400 GB on disk while only holding about 18 GB of live rows. Autovacuum had been quietly losing the race for nine months and nobody had noticed until the database itself slammed the brakes.
After nineteen years of Rails I can tell you with confidence: Postgres autovacuum tuning is the single most underrated lever for production Rails performance. Most teams ship the defaults forever, those defaults were written for laptops in 2005, and one day a busy table eats the whole afternoon. This post is the playbook I wish that client had had.
What Postgres Autovacuum Actually Does
Postgres uses MVCC — multi-version concurrency control. When you UPDATE a row, Postgres does not modify it in place. It writes a new tuple and marks the old one as dead. When you DELETE a row, the row is just flagged. The space is not reclaimed. Read queries skip dead tuples, but they still pay the cost of walking past them.
Autovacuum is the background process that:
- Reclaims space from dead tuples so the table does not grow unbounded.
- Updates the planner statistics that drive query plans (ANALYZE).
- Freezes very old transaction IDs to prevent transaction ID wraparound — Postgres’s apocalyptic 32-bit counter that, if it overflows, takes your database read-only until you fix it.
When autovacuum is healthy you never think about it. When it falls behind you get three flavors of pain: tables that grow several times larger than their live data (bloat), query plans that go sideways because statistics are stale, and the wraparound emergency mode that took down my client.
Why Postgres Autovacuum Tuning Matters for Rails
Rails apps are particularly good at confusing the default autovacuum settings, for three reasons.
First, Active Record loves UPDATE. A typical User.find(1).update(last_seen_at: Time.current) writes a whole new row version and leaves the old one behind as a dead tuple, even if only one column changed. Multiply by a session-tracking middleware and you have a hot table generating millions of dead tuples per day.
Second, Rails apps tend to have a few enormous tables (events, audits, jobs, sessions) and many small ones. Default autovacuum thresholds are scaled by percentage of table size, which means very large tables get vacuumed almost never, while tiny tables get vacuumed constantly.
Third, background job systems amplify this. If you are running Solid Queue or Sidekiq, the jobs table churns constantly — every enqueue is an INSERT, every completed job is a DELETE. Without tuning, the jobs table is the first thing to bloat.
Postgres autovacuum tuning for Rails is not optional past a certain scale. It just looks optional until the night it isn’t.
How to See If Autovacuum Is Behind
Before tuning anything, measure. The two views that matter live in the system catalog. Drop these into a Rake task or a Rails runner script.
# lib/tasks/postgres_health.rake
namespace :db do
  desc "Show table bloat and last vacuum stats"
  task vacuum_status: :environment do
    sql = <<~SQL
      SELECT
        schemaname,
        relname AS table_name,
        n_live_tup AS live_rows,
        n_dead_tup AS dead_rows,
        ROUND(n_dead_tup::numeric / NULLIF(n_live_tup, 0), 2) AS dead_ratio,
        last_autovacuum,
        last_autoanalyze
      FROM pg_stat_user_tables
      WHERE n_live_tup > 1000
      ORDER BY n_dead_tup DESC
      LIMIT 20;
    SQL

    ActiveRecord::Base.connection.execute(sql).each do |row|
      puts row.inspect
    end
  end
end
A healthy table has a dead_ratio under about 0.2 and a last_autovacuum within the last day or two. If you see a 2 GB table with a dead ratio of 5 and last_autovacuum two months ago, you have found your problem.
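If you want the Rake task to flag problems instead of just printing rows, a small classifier works. A sketch in plain Ruby; the 0.2 dead-ratio and two-day staleness cutoffs are my rules of thumb from above, not Postgres constants:

```ruby
# Classify one pg_stat_user_tables row. The 0.2 dead-ratio and two-day
# staleness cutoffs are rules of thumb, not Postgres limits.
def vacuum_health(dead_rows:, live_rows:, last_autovacuum:, now: Time.now)
  ratio = live_rows.zero? ? Float::INFINITY : dead_rows.to_f / live_rows
  stale = last_autovacuum.nil? || (now - last_autovacuum) > 2 * 86_400

  return :critical if ratio > 1.0
  return :warn     if ratio > 0.2 || stale
  :ok
end

vacuum_health(dead_rows: 500, live_rows: 10_000, last_autovacuum: Time.now - 3600)
# => :ok
vacuum_health(dead_rows: 50_000, live_rows: 10_000, last_autovacuum: nil)
# => :critical
```

Feed it the rows from the query above and page someone on `:critical`.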
For wraparound risk specifically, watch the age of the oldest unfrozen transaction:
ActiveRecord::Base.connection.execute(<<~SQL).to_a
  SELECT
    relname,
    age(relfrozenxid) AS xid_age,
    pg_size_pretty(pg_table_size(oid)) AS size
  FROM pg_class
  WHERE relkind = 'r'
  ORDER BY age(relfrozenxid) DESC
  LIMIT 10;
SQL
Postgres triggers anti-wraparound autovacuum at autovacuum_freeze_max_age (default 200 million). The hard stop arrives just short of 2^31, about 2.1 billion, when Postgres stops accepting commands. If your largest table is showing an xid_age north of a billion you are in the danger zone — schedule a manual VACUUM FREEZE during low traffic and start tuning aggressively.
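Those numbers translate into a simple traffic-light check. A sketch in plain Ruby; the 200 million trigger and the ~2^31 shutdown are Postgres's numbers, while the one-billion danger cutoff is my rule of thumb, not a Postgres constant:

```ruby
# Wraparound risk from age(relfrozenxid). FREEZE_MAX_AGE is the default
# autovacuum_freeze_max_age. Postgres actually refuses commands a few
# million XIDs short of 2^31, so HARD_STOP is approximate.
FREEZE_MAX_AGE = 200_000_000
HARD_STOP      = 2**31

def xid_risk(xid_age)
  return :emergency if xid_age >= HARD_STOP
  return :danger    if xid_age >= 1_000_000_000 # rule of thumb, not a Postgres limit
  return :watch     if xid_age >= FREEZE_MAX_AGE
  :ok
end

xid_risk(150_000_000)   # => :ok
xid_risk(1_200_000_000) # => :danger
```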
The Defaults That Will Hurt You
Postgres ships with these autovacuum thresholds:
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_cost_limit = -1 (uses vacuum_cost_limit = 200)
autovacuum_vacuum_cost_delay = 2ms (Postgres 12+)
autovacuum_max_workers = 3
The dangerous one is autovacuum_vacuum_scale_factor = 0.2. That means autovacuum will not even consider vacuuming a table until 20% of it is dead. On a 10 GB table that is 2 GB of dead tuples before anything happens. On a 100 GB table that is 20 GB of dead tuples — at which point the vacuum itself takes hours, blocks other autovacuum workers, and probably gets cancelled by your deploy pipeline.
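The trigger condition is documented as dead tuples > autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, so you can compute the trigger point for any of your tables. A quick sketch:

```ruby
# Vacuum triggers when n_dead_tup exceeds: threshold + scale_factor * reltuples.
# Defaults mirror autovacuum_vacuum_threshold (50) and
# autovacuum_vacuum_scale_factor (0.2).
def vacuum_trigger_point(reltuples, threshold: 50, scale_factor: 0.2)
  (threshold + scale_factor * reltuples).ceil
end

# Defaults on a table with 10 million live rows:
vacuum_trigger_point(10_000_000)                      # => 2_000_050 dead tuples
# Same table with a per-table 1% scale factor:
vacuum_trigger_point(10_000_000, scale_factor: 0.01)  # => 100_050 dead tuples
```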
autovacuum_vacuum_cost_limit = 200 is also too conservative for any modern SSD. It throttles vacuum throughput so aggressively that on a busy table autovacuum can never catch up.
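To see why, it helps to model the cost throttle: vacuum sleeps for cost_delay every time it accumulates cost_limit points, and a dirtied page costs vacuum_cost_page_dirty points (20 by default). A rough upper-bound sketch that ignores the time the work itself takes:

```ruby
# Upper bound on vacuum throughput implied by the cost throttle alone.
# Assumes every page is dirtied (cost 20, the worst case); pages found
# clean in shared buffers cost far less, so real numbers land above this.
def vacuum_mb_per_sec(cost_limit:, cost_delay_ms:, cost_per_page: 20, page_kb: 8)
  pages_per_round = cost_limit.to_f / cost_per_page
  rounds_per_sec  = 1000.0 / cost_delay_ms
  pages_per_round * rounds_per_sec * page_kb / 1024
end

vacuum_mb_per_sec(cost_limit: 200,  cost_delay_ms: 2)  # ≈ 39 MB/s of dirty pages
vacuum_mb_per_sec(cost_limit: 2000, cost_delay_ms: 2)  # ≈ 390 MB/s
```

Tens of MB/s sounds fine until the table churns faster than that, at which point the backlog only ever grows.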
A Sane Baseline for Rails Production
Here is the configuration I drop into almost every Rails Postgres instance over 50 GB. Put it in postgresql.conf or your managed-database parameter group:
# Run more workers, in parallel
autovacuum_max_workers = 6
# Wake up more often
autovacuum_naptime = 30s
# Way more aggressive throughput on SSD
autovacuum_vacuum_cost_limit = 2000
autovacuum_vacuum_cost_delay = 2ms
# Lower the scale factor globally so big tables vacuum sooner
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.02
# Freeze proactively to avoid emergency anti-wraparound work
autovacuum_freeze_max_age = 400000000
vacuum_freeze_min_age = 50000000
This is a starting point, not a destination. The real wins come from tuning per table.
Per-Table Autovacuum Tuning From Rails Migrations
The mistake I see most often is teams trying to tune everything globally. The right move is to identify your three or four hottest tables and give them their own autovacuum policy. You can do this from a regular Rails migration — no need to leave Active Record.
class TuneAutovacuumForJobs < ActiveRecord::Migration[8.0]
  def up
    execute <<~SQL
      ALTER TABLE solid_queue_jobs SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_analyze_scale_factor = 0.01,
        autovacuum_vacuum_cost_limit = 4000,
        autovacuum_vacuum_cost_delay = 5
      );
    SQL
  end

  def down
    execute <<~SQL
      ALTER TABLE solid_queue_jobs RESET (
        autovacuum_vacuum_scale_factor,
        autovacuum_analyze_scale_factor,
        autovacuum_vacuum_cost_limit,
        autovacuum_vacuum_cost_delay
      );
    SQL
  end
end
A 1% scale factor on a high-churn jobs table means autovacuum kicks in much earlier and finishes much faster. Each individual run does less work, holds locks for shorter periods, and stays out of the way of foreground traffic.
For append-mostly tables (events, audit logs), the analyze threshold matters more than the vacuum threshold. New rows are constantly being added, the planner needs fresh statistics, but there are very few dead tuples to reclaim:
class TuneAutovacuumForEvents < ActiveRecord::Migration[8.0]
  def up
    execute <<~SQL
      ALTER TABLE events SET (
        autovacuum_analyze_scale_factor = 0.005,
        autovacuum_vacuum_scale_factor = 0.1,
        autovacuum_freeze_max_age = 200000000
      );
    SQL
  end

  def down
    execute <<~SQL
      ALTER TABLE events RESET (
        autovacuum_analyze_scale_factor,
        autovacuum_vacuum_scale_factor,
        autovacuum_freeze_max_age
      );
    SQL
  end
end
For tables you literally only ever insert into and never update or delete, you can effectively disable bloat-driven vacuum and let only freeze-driven vacuum run.
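One way to express that policy: raise the vacuum scale factor so dead-tuple vacuuming rarely fires, keep analyze aggressive, and let autovacuum_freeze_max_age drive the rest (on Postgres 13+, insert-driven autovacuum via autovacuum_vacuum_insert_scale_factor also helps here). The helper and table name below are illustrative, not a Postgres API — it just builds the ALTER TABLE statement you would run from a migration:

```ruby
# Build a per-table autovacuum policy statement. The helper and table name
# are illustrative; the storage parameters are real Postgres settings.
def autovacuum_policy_sql(table, settings)
  pairs = settings.map { |name, value| "  #{name} = #{value}" }.join(",\n")
  "ALTER TABLE #{table} SET (\n#{pairs}\n);"
end

APPEND_ONLY = {
  "autovacuum_vacuum_scale_factor"  => 0.5,   # tolerate dead tuples: there are few
  "autovacuum_analyze_scale_factor" => 0.005  # but keep planner statistics fresh
}.freeze

puts autovacuum_policy_sql("audit_logs", APPEND_ONLY)
```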
Watching for Vacuum That Cannot Make Progress
The most insidious failure mode is autovacuum that runs constantly but never reclaims space. This happens when there is a long-running transaction holding a snapshot. Postgres cannot reclaim a dead tuple if any open transaction might still need to see it.
The usual culprits in a Rails app:
- A Sidekiq job that opened a transaction and got stuck on an external API call.
- A Rails console session somebody forgot about (this is the classic).
- A long-running analytics query on a read replica with hot_standby_feedback enabled (otherwise replica queries do not hold back vacuum on the primary).
- A pg_dump running against the primary instead of a replica.
Find them with:
SELECT
  pid,
  now() - xact_start AS xact_age,
  state,
  query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start ASC
LIMIT 10;
Anything older than a few minutes deserves a look. Anything older than an hour is almost always wrong. I add a Prometheus alert that fires if there is an open transaction over 15 minutes, because that single signal catches more incidents than any other Postgres metric I monitor. Pair this with the observability patterns I covered for Rails 8 and you will catch bloat before it catches you.
Manual VACUUM and Why You Sometimes Need It
Autovacuum is intentionally polite. It uses cost-based throttling, yields to other queries, and never holds the locks needed to actually shrink files on disk. For genuine recovery from a bloat incident you need manual intervention.
For statistics only:
ANALYZE VERBOSE users;
For dead-tuple reclamation without locking out writes:
VACUUM (VERBOSE, ANALYZE) users;
For shrinking the file on disk (this requires an exclusive lock):
VACUUM FULL users;
VACUUM FULL is the nuclear option — it rewrites the entire table and blocks all reads and writes for the duration. On a 200 GB table this means hours of downtime. Do not run VACUUM FULL on a live production table without a maintenance window.
The modern answer is pg_repack, which does the same physical compaction concurrently with normal traffic. Install the extension, and from a Rake task:
namespace :db do
  desc "Repack a bloated table without downtime"
  task :repack, [:table] => :environment do |_, args|
    abort "Usage: rake db:repack[table_name]" if args[:table].blank?

    # Pass arguments as an array so the table name never hits shell interpolation.
    ok = system("pg_repack", "--no-superuser-check",
                "--table", args[:table], "--dbname", "myapp_production")
    abort "pg_repack failed for #{args[:table]}" unless ok
  end
end
Run it from a worker box, not your application servers. It will use roughly the size of the table in extra disk space while it works, so make sure you have headroom.
Putting It Together: A Quarterly Postgres Health Ritual
The teams that never have an autovacuum incident are not lucky. They have a ritual. Mine looks like this:
- Weekly: check the bloat query above. Anything with a dead ratio over 1.0 gets investigated.
- Monthly: review pg_stat_user_tables for tables where last_autovacuum is older than the table’s typical churn would suggest.
- Quarterly: review per-table autovacuum settings against actual workload. Promote any newly-hot tables to per-table tuning.
- On every deploy: monitor for transactions held open by background jobs longer than expected.
Combined with sane Postgres connection pooling via PgBouncer and disciplined database indexing strategies, Postgres autovacuum tuning is what keeps a Rails app running on the same database for a decade. Skip it and you eventually get the Sunday morning phone call.
Frequently Asked Questions
How often should autovacuum run on a Rails Postgres database?
It depends on table churn, but for a high-write table like a jobs queue or sessions table you want autovacuum running every few minutes. For mostly-read tables once a day is fine. The right metric is not frequency but lag — if n_dead_tup keeps growing between runs you are behind.
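That lag signal is easy to mechanize: sample n_dead_tup on a schedule and flag sustained growth. A sketch with the sampling and alerting wiring left out:

```ruby
# Given successive n_dead_tup samples for one table (e.g. one per hour),
# report whether autovacuum is keeping up. A healthy table shows a sawtooth
# (vacuum resets the count); monotonic growth across the window means lag.
def vacuum_falling_behind?(dead_tup_samples, min_samples: 4)
  return false if dead_tup_samples.size < min_samples

  deltas = dead_tup_samples.each_cons(2).map { |a, b| b - a }
  deltas.all?(&:positive?)
end

vacuum_falling_behind?([10_000, 9_000, 14_000, 8_000])    # => false (sawtooth: healthy)
vacuum_falling_behind?([10_000, 40_000, 90_000, 150_000]) # => true  (never reclaimed)
```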
What is the difference between VACUUM and VACUUM FULL?
Regular VACUUM reclaims space inside the existing table file so future inserts can reuse it, but does not return disk space to the operating system. VACUUM FULL rewrites the table from scratch, returns disk space, and requires an exclusive lock that blocks all reads and writes. Use pg_repack instead of VACUUM FULL in production.
Can I disable autovacuum on a high-traffic table?
Technically yes, practically never. Disabling autovacuum means you take responsibility for vacuuming and freezing manually, and missing a freeze leads to wraparound shutdown. The right move is per-table tuning to make autovacuum more aggressive, not turning it off.
Why does Postgres autovacuum keep getting cancelled?
Autovacuum yields to any operation that needs a conflicting lock — typically ALTER TABLE, CREATE INDEX, or DROP TABLE. Rails migrations are the usual cause. If you see autovacuum cancellations clustering around deploy times, schedule heavy schema changes in a separate window from autovacuum-critical tables, or use CONCURRENTLY variants where possible.
Need help diagnosing or fixing Postgres performance in production Rails? TTB Software specializes in Rails infrastructure, database tuning and fractional CTO engagements. We’ve been doing this for nineteen years.
About the Author
Roger Heykoop is a senior Ruby on Rails developer with 19+ years of Rails experience and 35+ years in software development. He specializes in Rails modernization, performance optimization, and AI-assisted development.