RUBY ON RAILS · 16 MIN READ

Rails Postgres EXPLAIN ANALYZE: Reading Query Plans to Fix Slow Rails Queries

Rails Postgres EXPLAIN ANALYZE reveals where queries spend their time. Read plans, spot Seq Scans, fix N+1s, and tune indexes for production Rails apps.

A founder of a B2B SaaS pinged me on a Sunday afternoon last November with a screenshot. The dashboard page that loaded in 200ms in staging was taking 4.2 seconds in production. The data was the same shape. The Rails app was the same code. Skylight pointed at one ActiveRecord query, and the team had already added the obvious index. Nothing changed. I asked him to run one thing: EXPLAIN (ANALYZE, BUFFERS) <query> in production psql and paste me the output. Two minutes later he had his answer — Postgres was choosing a Sequential Scan on a 12 million row table because the planner’s row estimate was off by three orders of magnitude. The fix took eight minutes. The Sunday was saved by reading a query plan.

After nineteen years of Rails I can tell you with confidence: most production performance problems are not framework problems, they are query plan problems. Rails Postgres EXPLAIN ANALYZE is the single most valuable diagnostic tool I use, and most Rails developers never look at its output past the first line. This post is the working guide I hand to engineers when I drop into a slow-query firefight — what to actually read, what to ignore, and the patterns that show up over and over in Rails apps talking to Postgres.

Why Rails Postgres EXPLAIN ANALYZE Matters in Production

Adding an index does not always help. That sentence is heretical to newer Rails developers, who have been trained that every slow query gets an index. It is also true. Postgres uses a cost-based planner: it estimates how many rows each step will produce, then picks the strategy with the lowest estimated cost. When those estimates are wrong — because statistics are stale, because the column is correlated with another, because your data distribution is skewed — the planner picks the wrong strategy and your “indexed” query goes through a Sequential Scan anyway.

EXPLAIN ANALYZE is the only way to see this. EXPLAIN alone shows the plan the planner would choose. EXPLAIN ANALYZE actually executes the query and reports the real timing and real row counts alongside the estimates. The gap between estimated and actual is where almost every Rails-Postgres performance bug lives.

I tell every Rails team I work with: if you cannot read a query plan, you are guessing about your database. The N+1 detector in the Bullet gem and your APM dashboard tell you which query is slow. They do not tell you why. The plan tells you why, and the why is usually surprising.

Running EXPLAIN ANALYZE on a Rails Query

ActiveRecord makes this convenient. Any relation responds to .explain, and from Rails 7.1 onward you can pass options through:

User.joins(:orders)
    .where(orders: { status: "paid" })
    .where("orders.created_at > ?", 30.days.ago)
    .explain(:analyze, :buffers, :verbose)

That prints a real query plan to your Rails log or console, with actual timing and actual row counts. For production diagnostics I usually skip ActiveRecord and run the raw query in psql with \timing on, because I want full control over the analyze options and I want to see the planner’s behavior without Rails wrapping it.

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT TEXT)
SELECT users.* FROM users
INNER JOIN orders ON orders.user_id = users.id
WHERE orders.status = 'paid'
  AND orders.created_at > NOW() - INTERVAL '30 days';

A note of caution: EXPLAIN ANALYZE runs the query. On a SELECT that is fine. On an UPDATE or DELETE you must wrap it in a transaction and roll back, or you will silently mutate production data while diagnosing.

BEGIN;
EXPLAIN (ANALYZE, BUFFERS) DELETE FROM sessions WHERE expires_at < NOW();
ROLLBACK;

I have seen exactly one engineer in my career forget the ROLLBACK. They deleted six months of payment audit logs while trying to figure out why the cleanup job was slow. Always BEGIN and always ROLLBACK.

The Five Lines of a Query Plan That Actually Matter

A Postgres plan looks intimidating because it is dense. Ninety percent of the diagnostic value is in five things. Once you can find them, you can debug almost any slow Rails query.

1. The scan type at the bottom. Every plan ends with one or more scans against tables. The three you care about are Seq Scan (read every row), Index Scan (read selected rows via an index), and Bitmap Heap Scan (use the index to build a bitmap, then read pages). A Seq Scan on a large table is usually wrong. A Bitmap Heap Scan for queries returning thousands of rows is usually right. An Index Scan for queries returning hundreds of rows is usually optimal.

2. The Rows estimate vs Rows actual. Every node shows (rows=N) for the estimate and actual rows=M for the truth. If the estimate is off by 10x or more, the planner is making decisions based on wrong information. That is the most common root cause of “I added the index and nothing got faster.”

3. The Buffers numbers. With BUFFERS enabled, you see shared hit=X read=Y. hit means the page was in Postgres’s cache, read means it had to go to disk. A query with 200,000 buffer reads on a 16GB server is hammering disk, and no amount of index tuning will fix it without addressing the data shape.
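The buffer counts translate directly into data volume: each shared buffer is one Postgres page, 8 kB by default. A quick Ruby sanity check (assuming the default 8 kB block size; a nonstandard build can differ, so verify with SHOW block_size):

```ruby
# Convert an EXPLAIN (ANALYZE, BUFFERS) read count into megabytes.
# Assumes the Postgres default 8 kB page size.
BLOCK_SIZE_KB = 8

def buffers_to_mb(buffer_count)
  (buffer_count * BLOCK_SIZE_KB) / 1024.0
end

buffers_to_mb(200_000)  # => 1562.5, i.e. roughly 1.5 GB pulled from disk
```

That is why 200,000 reads on a 16 GB server is a data-shape problem, not an index-tuning problem: the query is physically moving gigabytes per execution.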

4. The Execution Time line at the bottom. This is the actual wall-clock time for the query itself. If it is small but your Rails request is slow, the bottleneck is not the database: it is the network round trip, Active Record object materialization, or view rendering.

5. Filter and Rows Removed by Filter. When a node says Rows Removed by Filter: 1200000, Postgres scanned 1.2 million rows and threw most of them away. That is the smoking gun for a missing partial index or a poorly written WHERE clause.

If you read those five things, you can diagnose nearly any slow query in a Rails app. Everything else in the plan is supporting detail.
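To make the checklist concrete, here is a minimal sketch of the triage I do mentally when reading a plan, expressed as a few illustrative regexes over the text output. This is not a real plan parser (the method name and thresholds are my own invention for the example), but it captures the three red flags above that matter most:

```ruby
# Illustrative sketch: scan EXPLAIN ANALYZE text output for the red flags
# described above. Regex-based and deliberately naive -- not a plan parser.
def plan_red_flags(plan_text)
  flags = []
  flags << "Seq Scan present" if plan_text =~ /Seq Scan/

  # Compare the planner's rows=N estimate to the actual rows=M on each node.
  plan_text.scan(/rows=(\d+) width=\d+\)\s+\(actual time=\S+ rows=(\d+)/) do |est, act|
    est, act = est.to_f, act.to_f
    next if est.zero? || act.zero?
    ratio = [est / act, act / est].max
    flags << format("row estimate off by %.0fx", ratio) if ratio >= 10
  end

  # Rows scanned and immediately discarded by a Filter node.
  removed = plan_text.scan(/Rows Removed by Filter: (\d+)/).flatten.sum(&:to_i)
  flags << "#{removed} rows scanned then discarded" if removed > 100_000

  flags
end
```

Run against the backwards-index-scan plan later in this post, it would report a 120x estimate miss and nearly 1.5 million discarded rows, which is exactly where a human reader's eye should land first.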

EXPLAIN Versus EXPLAIN ANALYZE in Rails Development

I use the two forms differently and on purpose.

EXPLAIN alone is fast and free. It shows the plan without executing. I use it constantly during development to sanity-check that the query I just wrote is going to hit the index I expect, before I write a test or push to CI.

EXPLAIN ANALYZE actually runs the query. It is the truth. I use it when the plan from EXPLAIN looks fine but the query is still slow, or when production behaves differently from staging. The execution overhead is small for fast queries and large for huge queries — there is a reason you only run it when you mean it.

EXPLAIN (ANALYZE, BUFFERS) is what I use in production diagnostics. The buffers data tells me whether the query is CPU-bound or IO-bound, which determines whether the fix is an index, a partial index, a covering index, or a config change.

There is also pg_stat_statements, which I treat as the first stop before EXPLAIN ANALYZE. It aggregates query stats over time and points you at which queries to investigate. EXPLAIN ANALYZE is the microscope you point at the queries pg_stat_statements identifies.
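A typical first query against pg_stat_statements looks something like this (column names are a common source of confusion: Postgres 13 renamed total_time to total_exec_time, so adjust for your version):

```sql
-- Top 10 queries by cumulative time, Postgres 13+.
-- On PG 12 and earlier, use total_time / mean_time instead.
SELECT calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       rows,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Whatever floats to the top of that list is what you point EXPLAIN ANALYZE at next.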

Common Patterns That Show Up in Rails Plans

After diagnosing slow Rails queries for nearly two decades, the same shapes keep appearing.

Sequential Scan on a soft-deleted column. Your where(deleted_at: nil) scope hits a table with 14 million rows, 50 of which are deleted. Postgres correctly decides that an index on deleted_at is useless because 99.9% of rows match — the index would be slower than the scan. The fix is a partial index: CREATE INDEX ... ON things(other_column) WHERE deleted_at IS NULL.

Bitmap Heap Scan with huge rechecks. You see Recheck Cond and Rows Removed by Index Recheck in the thousands. Your bitmap is too coarse, often because of low work_mem. Bump work_mem for the session, rerun, see if it switches to a clean Index Scan.

Nested Loop with bad row estimates. A Nested Loop join is fast when the outer side has very few rows. When the planner estimates 5 outer rows and the actual is 50,000, the inner index lookup runs 10,000 times more often than expected. The query takes 30 seconds. The fix is usually ANALYZE, sometimes CREATE STATISTICS on correlated columns.
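When plain ANALYZE is not enough because two columns move together, extended statistics tell the planner about the correlation. A sketch, with hypothetical table and column names:

```sql
-- Hypothetical example: status and created_at are correlated on orders
-- (recent orders are mostly 'paid'). Without this, the planner multiplies
-- the two selectivities as if they were independent.
CREATE STATISTICS orders_status_created_at (dependencies)
  ON status, created_at FROM orders;

-- Extended statistics are only populated by ANALYZE, so run it afterwards.
ANALYZE orders;
```

Rerun EXPLAIN ANALYZE after this and check whether the estimate on the Nested Loop's outer side has moved toward the actual count.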

Sort spilling to disk. A node with Sort Method: external merge Disk: 124000kB means Postgres ran out of work_mem and dumped the sort to disk. For analytic queries that ship to a dashboard, this is the silent killer — fine on a developer laptop, lethal on a constrained production replica.

Hash Join with massive memory. Hash nodes that show Buckets: 1024 Batches: 32 Memory Usage: 64MB mean the hash table did not fit in memory and got partitioned to disk. Same fix family as sort spills.
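Both spill patterns respond to the same session-local experiment: raise work_mem for just your connection, rerun the plan, and see whether the node stays in memory. A sketch of the psql workflow:

```sql
-- Session-local only: does not touch the server config, and resets on
-- disconnect. Watch for the Sort Method switching from "external merge"
-- to "quicksort", or the Hash node's Batches dropping to 1.
SET work_mem = '256MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT ... ;  -- your spilling query here

RESET work_mem;
```

If the higher setting fixes the plan, the permanent fix is a judgment call: a targeted SET in the one job that needs it is usually safer than raising work_mem globally, since the global value applies per sort node per connection.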

You will see these patterns over and over once you start reading plans. They form the bulk of Rails-Postgres performance work.

A Real Example: Turning a 3-Second Page Into 80ms

Let me make this concrete. A client’s admin dashboard listed recent paid orders from the last 30 days, joined to users, sorted by created_at desc, limit 50. The Rails query looked innocent:

class Admin::OrdersController < Admin::BaseController
  def index
    @orders = Order.where(status: "paid")
                   .where("created_at > ?", 30.days.ago)
                   .includes(:user, :line_items)
                   .order(created_at: :desc)
                   .limit(50)
  end
end

The page took 3.1 seconds. They had an index on orders(status) and an index on orders(created_at). EXPLAIN ANALYZE showed why it did not matter:

Limit  (cost=0.43..2412.18 rows=50)
       (actual time=2891.230..2891.245 rows=50 loops=1)
  Buffers: shared hit=18234 read=124518
  ->  Index Scan Backward using orders_created_at_idx on orders
        (cost=0.43..289412.83 rows=6000 width=237)
        (actual time=2891.228..2891.240 rows=50 loops=1)
        Filter: (status = 'paid'::text)
        Rows Removed by Filter: 1483291

The planner used the created_at index, walked it backwards, and then filtered by status, discarding almost 1.5 million rows in the process. Two single-column indexes did not help: Postgres can sometimes combine them with a BitmapAnd, but a bitmap scan cannot return rows in created_at order, so for this ORDER BY ... LIMIT query the planner had no better option than walking one index and filtering.

The fix was a composite index in the right column order:

class AddPaidOrdersCreatedAtIndex < ActiveRecord::Migration[8.0]
  disable_ddl_transaction!

  def change
    add_index :orders, [:status, :created_at],
              order: { created_at: :desc },
              algorithm: :concurrently,
              name: "index_orders_on_status_and_created_at"
  end
end

After the migration:

Limit  (cost=0.43..142.21 rows=50)
       (actual time=0.087..1.243 rows=50 loops=1)
  Buffers: shared hit=58
  ->  Index Scan using index_orders_on_status_and_created_at on orders
        (cost=0.43..17012.18 rows=6000 width=237)
        (actual time=0.085..1.236 rows=50 loops=1)
        Index Cond: (status = 'paid'::text)

Eighty milliseconds. Same data, same Rails code. The difference was reading the plan and seeing that two single-column indexes were never going to compose into the access pattern this query actually needed. This is the same principle behind good database indexing strategy — the index has to match the access path, not just the columns mentioned.

Tools and Workflow for EXPLAIN ANALYZE in Rails

Reading raw text plans is fine for small queries. For real ones, I lean on a few tools:

explain.dalibo.com — paste the plan, get a visual tree with hotspots highlighted. The Slow Node detection alone is worth the trip.

pev2 — the same engine you can self-host. I run this for clients who cannot paste production plans into a third-party site for compliance reasons.

pgMustard — paid, but excellent for engineers new to query plans. It explains why each node is the way it is, not just what it is.

ActiveRecord .explain — fine for development. I use it the way I use puts — quick, in-line, never committed.

pg_stat_statements — the first stop in production. It aggregates query stats so you know which queries to EXPLAIN ANALYZE in the first place. Pair it with your APM (Skylight, Scout, AppSignal) and you have a complete picture of where time is being spent. Combine with the discipline from opentelemetry production observability and you can correlate slow plans with the actual requests they ran inside.

Pitfalls Most Rails Developers Hit

A few things consistently trip people up.

Running EXPLAIN ANALYZE in development against a tiny dataset. Your dev database has 2,000 users. Production has 12 million. The planner picks completely different strategies. Always run EXPLAIN ANALYZE against a production-shaped dataset, ideally on a read replica.

Forgetting to ANALYZE after a bulk import. Postgres relies on statistics. If you just imported 5 million rows and never ran ANALYZE, the planner thinks the table is empty and will pick bad plans. Run ANALYZE table_name explicitly after bulk operations.

Comparing plan cost numbers across queries. The cost numbers are useful relative to each other within a single query. They are not comparable across different queries or different machines. Use actual time for cross-comparison.

Indexing every column the EXPLAIN output mentions. Every index slows down writes, takes disk space, and adds maintenance overhead. Only add indexes that fix a real plan problem you observed in EXPLAIN ANALYZE.

Trusting EXPLAIN without ANALYZE. Cost estimates can be wildly wrong. Actual time cannot. If a query is slow, you need EXPLAIN ANALYZE, not EXPLAIN.

FAQ

What’s the difference between EXPLAIN and EXPLAIN ANALYZE in Rails?

EXPLAIN shows the query plan Postgres would use, without running the query. It is fast and safe. EXPLAIN ANALYZE actually executes the query and reports real timing and row counts alongside the planner’s estimates. Use EXPLAIN for quick checks and EXPLAIN ANALYZE for real diagnostics, but never run EXPLAIN ANALYZE on an UPDATE or DELETE outside a transaction you intend to roll back.

How do I run EXPLAIN ANALYZE on an ActiveRecord query?

Call .explain(:analyze, :buffers) on any ActiveRecord relation in Rails 7.1+: User.where(active: true).explain(:analyze, :buffers). For more control, copy the SQL from .to_sql and run EXPLAIN (ANALYZE, BUFFERS) <query> directly in psql against your production or staging database.

Why does my query still use a Sequential Scan after I added an index?

Three common reasons. First, the table is small enough that Postgres correctly decides scanning is cheaper than indexing. Second, the column has low selectivity — most rows match the filter, so the index is not useful. Third, the planner’s statistics are stale and it is estimating row counts wrong. Run ANALYZE table_name and check the Rows estimate vs actual in your plan.

How do I read Buffers numbers in an EXPLAIN ANALYZE plan?

shared hit means the data page was in Postgres’s shared buffer cache (fast). shared read means it had to be fetched from disk (slow). High read numbers on a frequently-run query usually mean either insufficient cache memory, a missing index causing extra page scans, or both. Aim to keep frequently-accessed tables hot in shared buffers.

Need help fixing slow Rails queries before they wake your team at 3 AM? TTB Software tunes Rails-Postgres performance for startups and scale-ups, from query plans to indexes to schema design. We’ve been doing this for nineteen years.

#rails-postgres-explain-analyze #rails-slow-query-plan #postgres-query-optimization-rails #rails-database-performance #sequential-scan-rails #rails-index-tuning

Last section. Then please call.

It's a phone call. That's the worst it can get.

No discovery deck. No 45-minute "qualification" call. 30 minutes, your problem, my opinion. If we're a fit, you'll know by minute 12.

Direct line — answered by Roger
+31 6 5123 6132
Mon–Fri, 09:00–18:00 CET · Currently available

OR
info@ttb.software