Rails pg_stat_statements: Find Slow Queries in Production Before Users Do
Rails pg_stat_statements setup, query, and analysis guide: find the slow queries actually hurting production, normalize fingerprints, and prioritize fixes by total time.
The page was slow. Not catastrophically slow — just slow enough that the product manager kept saying things like “feels sluggish” and the on-call engineer kept replying “it’s fine on my machine.” APM said the median request was 180 milliseconds. The p95 was 2.4 seconds. Nobody could agree on which query was guilty, and the discussion had been going in circles for three weeks.
I asked one question. “Is pg_stat_statements turned on?” Nobody knew. We turned it on, waited four hours, and ran one query against pg_stat_statements. The answer was obvious within sixty seconds: a single ActiveRecord scope used on the dashboard was responsible for forty-one percent of all database time across the entire fleet. Not the slowest query in absolute terms. Not the most frequent. But the one whose product of frequency and per-call cost was destroying the cluster.
After nineteen years of Rails, pg_stat_statements is the single most underused Postgres extension in production Rails deployments. It is also the cheapest, fastest path from “the app feels slow” to “here is the exact query to fix.” If you are running Rails on Postgres and you have not enabled it, you are flying blind.
What Rails pg_stat_statements Actually Does
pg_stat_statements is a Postgres extension that fingerprints every SQL statement executed against the database, then aggregates execution statistics per fingerprint. A fingerprint is the query text with literal values replaced by placeholders, so these two queries collapse into the same row:
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM users WHERE email = 'bob@example.com';
Both become:
SELECT * FROM users WHERE email = $1;
For each fingerprint the extension tracks total executions, total time, mean time, min, max, standard deviation, rows returned, shared buffer hits, blocks read from disk, and several other counters. The output is a single view — pg_stat_statements — that gives you a flat ranking of where your database actually spends its time.
This matters because slow query logs lie. Slow query logs surface individual queries that crossed a threshold. They miss the query that runs ten thousand times an hour at 50ms each. That query is invisible to log_min_duration_statement = 500 but represents 500 seconds of database time per hour. pg_stat_statements catches it on the first sort.
Enabling Rails pg_stat_statements in Production
The extension ships with Postgres but is not loaded by default. You need three things: a shared_preload_libraries entry, a configuration block, and a CREATE EXTENSION call inside the database.
In postgresql.conf (or your managed provider’s parameter group):
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = top
pg_stat_statements.track_utility = off
pg_stat_statements.save = on
Restart Postgres. This is not optional — shared_preload_libraries only takes effect at startup. On RDS or Cloud SQL, modify the parameter group and reboot the instance.
Then enable the extension in the actual database. Add a Rails migration:
class EnablePgStatStatements < ActiveRecord::Migration[8.0]
def up
execute "CREATE EXTENSION IF NOT EXISTS pg_stat_statements"
end
def down
execute "DROP EXTENSION IF EXISTS pg_stat_statements"
end
end
Run the migration. Verify it worked:
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';
The first query will probably return one row almost immediately. From now on every statement Rails sends gets aggregated into a fingerprint counter.
The overhead is negligible. On every Postgres deployment I have measured it on, pg_stat_statements costs less than one percent of CPU and a few megabytes of shared memory. The cost of leaving it off is enormous; the cost of leaving it on is invisible.
The One Query That Matters Most
Newcomers to pg_stat_statements open the view, see twenty-eight columns, and freeze. Ignore most of them. The one query you will run ninety percent of the time is this:
SELECT
query,
calls,
ROUND(total_exec_time::numeric / 1000, 2) AS total_seconds,
ROUND(mean_exec_time::numeric, 2) AS mean_ms,
ROUND((100 * total_exec_time
/ SUM(total_exec_time) OVER ())::numeric, 2) AS pct_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
This ranks every query fingerprint by total wall-clock time spent in the database. The top of this list is where your database time goes. Period.
Not mean time. Mean time tells you which individual query was slow. Total time tells you which fingerprint is costing you the most money and slowing down the most users. A query that runs once and takes ten seconds matters less than a query that runs a million times at 10ms.
Run that query against production after the extension has been collecting for at least a few hours. The first few rows are almost always surprising — not because anyone is incompetent, but because production traffic has a different shape than developer intuition does. The top three queries on most Rails apps I audit are some combination of: a session lookup, a permissions check on every request, and a “recent activity” feed on the dashboard.
Reading the Output Like an Engineer, Not a Tourist
mean_exec_time is mean execution time per call. Fast queries with low mean time can still dominate total_exec_time if they run constantly. Slow queries with high mean time may not matter if they run once a day.
calls is the call count since the extension last reset. Pay attention to this. A query that runs ten million times in an hour but takes 0.5ms is a candidate for caching, not query optimization — you cannot make it much faster, you can only make it run less often. A query that runs ten times at five seconds each is a candidate for an index or a rewrite.
stddev_exec_time tells you how consistent the query is. High standard deviation means the query is fast most of the time but occasionally catastrophic — usually a sign of a query plan that flips between an index scan and a sequential scan based on parameter values. These are the queries that pass code review and then page someone at 3am.
shared_blks_hit versus shared_blks_read tells you cache behavior. Hits are pages found in Postgres’ shared buffer. Reads are pages fetched from disk (or the OS page cache, which is still faster than disk). A query with high shared_blks_read relative to shared_blks_hit is doing I/O. Either it is touching too much data or your shared_buffers is too small.
rows divided by calls tells you how much data each call returns on average. A query that returns three rows per call but reads a million blocks is reading and filtering most of a table. That is your invitation to add an index.
Rails-Specific Patterns That Appear at the Top
Three patterns dominate the top of pg_stat_statements on almost every Rails app I have ever audited. Each has a known fix.
The N+1 you missed. A query like SELECT * FROM comments WHERE post_id = $1 with thirty million calls and zero mean time per call is an N+1. The fix is an includes or preload somewhere — the trick is figuring out where. Add an APM trace or use strict loading in development to catch the call sites. pg_stat_statements tells you the N+1 exists; tracing tells you where it lives.
The “active scope” that scans the world. A query like SELECT * FROM orders WHERE status IN ($1,$2,$3) AND created_at > $4 with high mean time and high shared_blks_read is usually a missing composite index. Look at the WHERE clause, decide which columns are most selective, and add the index. Half the time the query also has an ORDER BY that the index can satisfy, turning a sort into a no-op.
The session lookup on every request. Devise users see a users lookup by id at the top of the list with a calls count equal to total request count. This is not a bug — it is every authenticated request — but it can dominate database time if the lookup is doing more work than necessary. Check that the users table is being looked up by primary key (it should be) and that you are not eager-loading a heavy association in your current-user lookup.
The pattern: scan the top twenty rows once a week. The fixes are usually small. The wins compound.
Normalizing Queries Across Environments
pg_stat_statements normalizes literal values but does not normalize structurally different queries. Two queries that differ only in their column list are different fingerprints. This bites Rails apps because ActiveRecord’s select(:id, :name) versus select(:id, :name, :email) produces two fingerprints, each with a fraction of the calls.
You cannot fix this from Postgres. You can fix it from Rails. Inspect the top of the list for fingerprints that are clearly the same logical query with different column lists. Decide on a canonical projection and use it everywhere. Most of the time the fix is removing a stray .pluck(:id, :name) or .select(...) that snuck into a serializer.
The same thing happens with IN lists of varying length. WHERE id IN ($1) and WHERE id IN ($1, $2) are different fingerprints in older Postgres versions. Postgres 14 and later normalize these into a single fingerprint with a generic array placeholder, which is one of the better reasons to be on a current Postgres version.
Resetting the Statistics for a Focused Investigation
pg_stat_statements accumulates counters since the last reset. For a focused investigation — “what is slow during the morning peak?” — reset the counters at the start of the window and read them at the end:
SELECT pg_stat_statements_reset();
Reset, wait an hour, run the top-20 query. You get a clean picture of what dominated database time during that specific window, uncontaminated by yesterday’s batch jobs and last week’s migrations.
Do this when you are chasing something specific. Do not do it routinely — the long-running aggregate is more useful for capacity planning and noticing slow drift over time.
Wiring pg_stat_statements Into a Weekly Cadence
The mistake teams make is treating pg_stat_statements as an incident tool. They turn it on during a fire, find one bad query, then forget about it. The right pattern is a weekly cadence.
I have my Rails clients run a small ActiveRecord-based report once a week against pg_stat_statements, dump the top twenty into a Slack channel, and review it in the team’s regular engineering sync. Total cost: ten minutes of human attention per week. Total benefit: every regression and every slow drift is caught within a week, and the team gradually learns what their database actually does.
A minimal version of the report:
class DatabaseHotspots
def self.top(limit: 20)
sql = <<~SQL
SELECT
query,
calls,
ROUND(total_exec_time::numeric / 1000, 2) AS total_seconds,
ROUND(mean_exec_time::numeric, 2) AS mean_ms,
ROUND((100 * total_exec_time
/ SUM(total_exec_time) OVER ())::numeric, 2) AS pct_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT $1
SQL
ActiveRecord::Base.connection.exec_query(sql, "DatabaseHotspots", [limit])
end
end
Wire that into a recurring Solid Queue job that runs every Monday morning, posts the result to Slack, and you have institutionalized database observability with about thirty lines of code.
When pg_stat_statements Is Not Enough
pg_stat_statements tells you which queries hurt. It does not tell you why they hurt. For that you need EXPLAIN ANALYZE on the actual query plan, and possibly auto_explain for capturing plans of slow queries in production without manual intervention.
It also does not tell you about lock waits. A query with high mean_exec_time and low shared_blks_read is often waiting on a lock rather than doing work. For lock investigation you want pg_locks and pg_stat_activity, not pg_stat_statements.
And it does not catch queries that fail. A query that times out and is rolled back contributes its execution time to the aggregate, but if you have a query that is killed by a statement timeout every time it runs, you will see it in pg_stat_statements as having high mean time and low call count. The information is there if you look for it.
The point is not that pg_stat_statements is the only tool. The point is that it is the cheapest, highest-signal tool you can add to a Rails Postgres deployment, and almost nobody has it on.
Frequently Asked Questions
How much overhead does Rails pg_stat_statements add in production?
In every Postgres deployment I have measured, pg_stat_statements adds less than one percent CPU overhead and a few megabytes of shared memory. The default pg_stat_statements.max = 5000 is fine for most Rails apps; I bump it to 10000 on apps with many distinct query fingerprints. The cost is negligible compared to the visibility you gain.
Should I sort by total_exec_time or mean_exec_time when finding Rails slow queries?
Sort by total_exec_time first. That ranks queries by total wall-clock time they cost your database, which is what actually slows down your application and racks up your cloud bill. mean_exec_time is useful as a secondary view — once you know which query dominates total time, mean time tells you whether to optimize the query itself or the call rate. A query at the top of total_exec_time with low mean time is a candidate for caching; one with high mean time is a candidate for query rewriting or an index.
Does Rails pg_stat_statements work with PgBouncer transaction pooling?
Yes. pg_stat_statements runs inside Postgres itself, so it sees every statement regardless of how the connection was pooled. The view aggregates by query fingerprint across all backends. If you are running PgBouncer transaction pooling, enable pg_stat_statements on the Postgres side, not on PgBouncer, and you will get a complete picture of every query your Rails app sends through the pooler.
How do I reset pg_stat_statements counters in production safely?
Call SELECT pg_stat_statements_reset(); from a session with sufficient privileges. This resets the aggregate counters but does not stop collection — new statements continue to accumulate immediately. Resetting is safe during business hours; the only “cost” is losing your historical aggregate. For a focused investigation, reset before the window of interest, wait, then query. For routine monitoring, leave the aggregate running for weeks at a time and only reset when you want a clean slate after a major deployment.
Need help wiring database observability into your Rails app, or chasing down the slow queries pg_stat_statements just surfaced? TTB Software specializes in Rails performance and Postgres production tuning. We have been doing this for nineteen years.
Related Articles
Rails 8 SQLite Production: WAL Mode, Litestream Backups, and When to Choose SQLite Over Postgres
Rails 8 SQLite production setup: WAL mode pragmas, Litestream continuous backups, Kamal deployment with volumes, and ...
Rails State Machine: AASM Patterns for Orders, Subscriptions, and Workflows in Production
Rails state machine with AASM: production patterns for orders, subscriptions, and workflows. Guards, callbacks, optim...
Rails insert_all and upsert_all: Bulk Database Operations That Skip the ORM Overhead
Rails insert_all and upsert_all for bulk database operations: skip callbacks, handle conflicts, returning IDs, and be...