Technical Due Diligence on a Rails Codebase: What I Actually Check
Six months ago a private equity firm called me about a SaaS acquisition. Mid-sized Rails app, about twelve years old, 40,000 paying customers, asking price in the eight figures. They’d done the financial due diligence. They wanted someone to look at the code.
I had a week. Here’s what I looked at.
The Gemfile First
Before I read a single model or controller, I open the Gemfile. What I’m looking for isn’t the list of gems — it’s the shape of the list.
A healthy Gemfile has a clear narrative. Production dependencies, test dependencies, development tools, clearly separated. The gem versions are pinned or have sensible constraints. There’s a recent Gemfile.lock committed.
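For contrast, the shape I look for is something like this. A schematic Gemfile I wrote for illustration, not the audited app's actual dependency list:

```ruby
# Illustrative Gemfile shape: production dependencies first, then grouped
# dev/test dependencies, each with a sensible version constraint
source "https://rubygems.org"

ruby "3.3.0"

gem "rails", "~> 7.1"
gem "pg", "~> 1.5"
gem "puma", "~> 6.4"

group :development, :test do
  gem "rspec-rails", "~> 6.1"
  gem "factory_bot_rails"
end

group :development do
  gem "rubocop", require: false
end
```

The point isn't the specific gems. It's that a reader can tell at a glance what runs in production and what doesn't.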
This particular app had 187 gems. Some had been commented out but not removed. Several duplicated functionality — two gems for pagination, three for authentication (none of them Devise), one for “file uploads” that was six years past its last commit. The Rails version was 6.0. Current is 8.0.
A stale Gemfile tells you about culture. Either the team doesn’t do regular maintenance passes, or they’re too scared of breaking things to remove dependencies. Both are problems, but different ones. The first is laziness. The second is fear — which usually means inadequate test coverage.
I also check bundle outdated and look at the tail: gems that haven’t released a version in two-plus years and are still in active use. Those are future security liabilities. Then I check for native extensions, because they complicate deployment and can cause subtle version conflicts that only surface at 2 AM on a Tuesday.
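Spotting native extensions doesn't have to be manual. Here's a minimal sketch that cross-references the lockfile against locally installed specs; the helper name is mine, and it assumes you run it from the app root with the bundle installed:

```ruby
# Sketch: list locked gems whose installed spec compiles native extensions.
# Bundler ships with Ruby as a default gem.
require "bundler"

def native_extension_gems(lockfile_text)
  Bundler::LockfileParser.new(lockfile_text).specs.filter_map do |spec|
    full = Gem::Specification.find_by_name(spec.name, spec.version.to_s)
    spec.name if full.extensions.any?
  rescue Gem::MissingSpecError
    nil # gem not installed locally, so we can't inspect it
  end
end

# Usage:
#   puts native_extension_gems(File.read("Gemfile.lock")).sort
```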
Migrations Tell the History
ls db/migrate | wc -l
This app had 847 migration files. I sampled every hundredth one. What I’m looking for: are they reversible? Do they use disable_ddl_transaction! appropriately? Are there change_column calls that would have caused table rewrites on a live database?
Migration hygiene reflects deployment discipline. If I find migrations that clearly would have caused production downtime — adding non-null columns without defaults on large tables, missing algorithm: :concurrently on indexes — I know the team either had scheduled maintenance windows, or they shipped fast and absorbed the incidents.
# The wrong way: blocks writes to a 50M-row orders table while the index builds
add_index :orders, :customer_id

# The right way
class AddCustomerIdIndexToOrders < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def change
    add_index :orders, :customer_id, algorithm: :concurrently
  end
end
Neither answer is automatically disqualifying. But I need to know which pattern the team followed, because it determines the risk profile of future schema work.
I also look at the overall schema. This app had 94 tables. That’s not outrageous for a twelve-year-old product, but I spotted 11 tables with fewer than 50 rows in production that looked like artifacts of features that shipped quietly and died quietly. Dead schema is harmless technically, but it’s a signal: nobody’s doing cleanup.
Test Coverage: The Number Is a Lie
The team told me they had 87% test coverage. That sounds good. It isn’t necessarily.
I ran the suite:
time bundle exec rails test
Four minutes twenty-three seconds. For a twelve-year-old app with 87% coverage, that’s either a very lean test suite or a very fast machine. Turned out it was lean — 1,100 tests, almost all unit tests, almost no integration tests. The coverage number came from exercising individual methods in isolation.
What was missing: tests that exercise the full stack. Controller tests that verify authorization, not just response codes. Integration tests that simulate what a customer actually does across multiple steps. Tests for the billing flow — because the billing flow is where bugs are most expensive.
High unit coverage with low integration coverage is worse than the inverse, in my opinion. Unit tests tell you your individual components work in isolation. They don’t tell you your system works. The system is what your customers experience.
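What one of the missing tests might look like, schematically. Every name here (the route helper, the sign_in_as helper, the subscription association) is a hypothetical stand-in, not code from the audited app:

```ruby
# Hypothetical billing-flow integration test; assumes a Rails app with
# a subscription model and an authentication test helper
require "test_helper"

class BillingUpgradeFlowTest < ActionDispatch::IntegrationTest
  test "customer upgrades plan and the subscription reflects it" do
    customer = customers(:on_basic_plan)
    sign_in_as customer # assumed helper

    post upgrade_subscription_path, params: { plan: "pro" } # assumed route
    follow_redirect!

    assert_response :success
    assert_equal "pro", customer.reload.subscription.plan
  end
end
```

Ten of these would tell me more about the health of the product than a thousand unit tests.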
I also check the CI configuration. If tests only run on the main branch, the team isn’t catching regressions before they merge. And if the suite has intermittently failing tests that are quietly silenced — a skip call guarded by a TODO that’s two years old — that’s technical debt with teeth.
Database Health in Four Queries
I ask for read-only access to the production database. If they won’t give it to me, I ask for the output of these queries instead:
-- Table bloat: has autovacuum kept up?
SELECT relname, n_dead_tup, n_live_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Slow queries, ranked by mean execution time
-- (requires the pg_stat_statements extension to be enabled)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Missing indexes: sequential scans on large tables
SELECT schemaname, relname, seq_scan, seq_tup_read,
       idx_scan, seq_tup_read / seq_scan AS avg_tuples_per_scan
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 10;

-- Long-running queries right now
SELECT pid, now() - query_start AS duration, query, state
FROM pg_stat_activity
WHERE (now() - query_start) > interval '30 seconds'
  AND state != 'idle';
This app had significant table bloat on the events table — 40% dead tuples. Autovacuum wasn’t keeping up, probably because the table was receiving constant writes. A bloated table degrades index performance over time. Not an emergency, but not free to fix either.
The slow query log showed seventeen queries averaging over 200ms. Three were classic N+1 patterns. Fourteen were legitimate slow queries — aggregations over large time ranges — that would need index tuning or query rewrites. That’s several weeks of database work that doesn’t show up in any financial model.
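For reference, the N+1 shape looks innocuous in the code and shows up as hundreds of near-identical single-row queries in the log. Model names here are illustrative, not the app's:

```ruby
# N+1: one query loads the orders, then one more query runs per order
Order.where(status: "open").each do |order|
  puts order.customer.name # SELECT ... FROM customers WHERE id = ? per row
end

# Fixed: eager-load the association, two queries total
Order.where(status: "open").includes(:customer).each do |order|
  puts order.customer.name # already loaded, no extra query
end
```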
Database health is the fastest signal of operational discipline. A tidy database means the team cares about performance. A bloated, slow database means they’re shipping features and ignoring the engine warning light.
The Deployment Story
How does code get from a developer’s laptop to production? I ask to see the deployment process, not the documentation — I watch someone actually do it, or I read the actual CI/CD configuration.
This app was deploying via a shell script that SSHed into three servers and ran git pull. No staging environment. No blue-green deployment. No health checks before traffic hit new code. Deployments happened during business hours.
That’s terrifying. One bad commit touches all three servers simultaneously. There’s no smoke test before customers see the new code. There’s no rollback strategy beyond git revert and hoping you can do it faster than the support tickets pile up.
What I want to see: automated deployments triggered by CI passing, at minimum a staging environment, some form of health check before traffic shifts, and a documented rollback procedure. Kamal, Capistrano, Heroku releases, GitHub Actions — the tool doesn’t matter. The discipline does.
The absence of a staging environment is the finding that most surprises non-technical acquirers. “But the developers must test locally before pushing,” they say. Yes. And local environments are not production. They never have been.
Security: The Non-Negotiables
I don’t do penetration testing in a week. But I check the obvious things.
Brakeman. Run it, read every finding:
bundle exec brakeman -q
This app had 12 high-severity findings and 31 medium-severity ones. The highs included two potential SQL injection vulnerabilities via string interpolation in raw queries. The kind of thing five minutes of review would catch. Someone had written:
# Dangerous — user input directly in SQL string
User.where("name = '#{params[:name]}'")
# Safe
User.where(name: params[:name])
That’s Rails 101. Finding it in a mature codebase tells you something about the review culture.
Credential storage. Are secrets in environment variables or in the codebase? I’ve seen API keys committed in git history. Once they’re there, they’re compromised — even if you delete them later, because git history is forever unless you rewrite it:
# Check git history for committed secrets
git log --all -S "password" --oneline | head -20
git log --all -S "secret_key" --oneline | head -20
Dependency vulnerabilities:
bundle exec bundler-audit check --update
This app had three gems with known CVEs. Two medium severity, one high. All three had patched versions available. Nobody had noticed in four months.
Authorization boundaries. Are admin routes protected at the controller level, not just the view level? Can I access /admin by typing the URL even if there’s no link to it? In this app: yes. Two admin endpoints were protected only by hiding the navigation link. Security through obscurity is not security.
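The fix costs a handful of lines. Here's a sketch of controller-level protection, assuming a current_user that responds to admin? (the names are mine, not the app's):

```ruby
# Every admin controller inherits the check, so typing the URL directly
# still hits the before_action even when no link points at the page
module Admin
  class BaseController < ApplicationController
    before_action :require_admin

    private

    def require_admin
      redirect_to root_path, alert: "Not authorized" unless current_user&.admin?
    end
  end
end
```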
The Bus Factor
The bus factor is the answer to: how many people need to leave before this codebase becomes unmaintainable?
I look at the git log:
git shortlog -sn --all | head -20
This app had 14 contributors over its lifetime. Two of them accounted for 73% of commits. One had left the company a year earlier. The other was the CTO being replaced as part of this acquisition.
I then look at which files have only one contributor:
git log --format="%ae" -- app/models/subscription.rb | sort -u
subscription.rb had been touched by exactly one person. That person was leaving. The subscription model is the most important file in a SaaS application.
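To sweep a whole directory instead of one file, a small helper over the same git output does the counting. This is my sketch, not tooling from the codebase:

```ruby
# Given the raw output of `git log --format=%ae -- path/to/file`,
# count the distinct contributor emails. Pure string handling, so it
# works on any captured log output.
def distinct_contributors(log_output)
  log_output.split("\n").map(&:strip).reject(&:empty?).uniq.size
end

# Usage, from the repo root:
#   Dir.glob("app/models/*.rb").each do |file|
#     n = distinct_contributors(`git log --format=%ae -- #{file}`)
#     puts "#{file}: #{n}" if n == 1
#   end
```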
Bus factor analysis isn’t about blame. It’s about risk. Concentrated knowledge that lives in someone’s head rather than in documentation, tests, or readable code is a liability that doesn’t appear on the balance sheet. It shows up three months post-acquisition when the key engineer is gone and nobody can explain why the proration logic works the way it does.
What the Code Can’t Tell You
The most important question isn’t in the codebase: what does it take to onboard a new developer?
Ask to set up the development environment from scratch, following only the documented instructions. If it takes more than two hours, or if the README has steps that no longer work, you’re looking at a team that hasn’t onboarded anyone recently. In a twelve-year-old app, that either means the team is extremely stable (a possible good sign) or nobody new has tried to join (a possible bad sign).
Talk to the developers who are staying. Not about the code — about what frustrates them. What do they dread shipping? What parts of the codebase do they avoid touching? The answers will locate your highest-risk areas faster than any automated analysis tool.
One question I always ask individually, not in a group: “If you could rewrite one part of this system, what would it be and why?” The consistency of answers across a team tells you where the real problem areas are. When three different engineers independently name the same subsystem, that’s where your post-acquisition technical risk lives.
What This Assessment Concluded
The PE firm got my report two days before the deadline. My summary: the codebase is acquirable but not at the asking price. The security findings required immediate remediation. The test coverage number was misleading — the actual coverage of critical paths was far lower. The deployment process needed a complete overhaul before the team could safely scale. The bus factor on the subscription model was a genuine business risk, not just a technical one.
They used the report to renegotiate. The deal closed at a lower price. I spent the following three months helping fix the things I’d identified.
After nineteen years of Rails, I’ve done this kind of assessment maybe thirty times. The specifics vary. The patterns don’t. Most problems in a mature Rails codebase aren’t surprising — they’re predictable accumulations of shortcuts that made sense in the moment. Understanding which shortcuts were taken, and why, tells you more about what the acquisition will actually cost than any line item in the financial model.
Frequently Asked Questions
How long does a thorough technical due diligence take?
For a serious pre-acquisition assessment, I budget five to seven days minimum: two days for automated analysis and code review, two days for database and infrastructure review, one to two days for developer interviews and writeup. Rushing it produces confident-sounding reports that miss important things. If you need a preliminary read in 48 hours, you can get it — but be clear that it’s preliminary.
Can you do due diligence without production database access?
You can do partial due diligence. Code review, static analysis, test suite, and CI configuration are all accessible without production access. What you lose is real performance data — actual table sizes, query performance under load, vacuum health. If the target company won’t provide read-only production access, ask for a pg_dump --schema-only plus the output of the diagnostic queries above. Their willingness to provide that output tells you something too.
What’s the most common finding that surprises acquirers?
Test coverage numbers that look good but cover the wrong things. An acquirer sees “87% test coverage” and assumes the codebase is solid. What that number usually means is that 87% of methods were invoked at least once during the test run. It says nothing about whether the tests verify correct behavior, catch regressions, or exercise the critical paths that actually matter to the business. Always ask to see the coverage breakdown by directory, and always ask what’s specifically not covered.
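If the suite uses SimpleCov, the most common Ruby coverage tool, that per-directory breakdown is a few lines of configuration. Group names here are illustrative:

```ruby
# test/test_helper.rb (assumes the simplecov gem); groups make the
# per-directory numbers visible in the HTML report
require "simplecov"
SimpleCov.start "rails" do
  add_group "Billing", "app/services/billing"
  add_group "Models", "app/models"
  add_group "Controllers", "app/controllers"
end
```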
Evaluating a Rails acquisition, or about to join a company as technical leadership? TTB Software does independent technical due diligence and fractional CTO engagements. Nineteen years of Rails, thirty-odd assessments, no surprises we haven’t seen before.
About the Author
Roger Heykoop is a senior Ruby on Rails developer with 19+ years of Rails experience and 35+ years in software development. He specializes in Rails modernization, performance optimization, and AI-assisted development.