DORA Metrics for Rails Teams: Tracking Deployment Frequency, Lead Time, and Change Failure Rate
A few months ago I was brought in to assess a forty-person product team that was proud of their process. Two Jira boards covered in green ticks. Closed sprints everywhere. “We ship every two weeks,” the CTO said, with the confidence of someone who had just come back from a retro that went well.
Then I pulled the data.
Average time from first commit on a branch to that commit reaching production: twenty-three days. Change failure rate for the previous quarter: 41 percent. Mean time to restore after an incident: four hours and twelve minutes. They thought they were high performers. By DORA metrics they were solidly in the Low band, borderline Medium at best. The sprints were green because engineers were closing Jira tickets, not because software was shipping. Those are not the same thing.
The DORA metrics (Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Time to Restore Service) come from the DORA research programme and were published by Nicole Forsgren, Jez Humble, and Gene Kim in “Accelerate” (2018). They are the most rigorously validated set of software delivery measurements we have. The research shows they correlate not only with delivery performance but with organisational outcomes: revenue growth, market share, and whether engineers stay at the company or leave for somewhere that ships. For every startup I work with as a fractional CTO, instrumenting these four numbers is one of the first things I do. Not because I plan to obsess over dashboards, but because you cannot have an honest conversation about an engineering team’s health until you know where it actually stands.
This post is the Rails implementation: the data model, the GitHub Actions wiring, the ActiveRecord queries for each metric, and the minimal admin dashboard that tells you in sixty seconds whether your team is improving or sliding.
What DORA Metrics Are (And Are Not)
DORA metrics are outcomes, not activities. They do not count commits, story points, lines of code, or test coverage percentage. They measure what the shipping machine produces: how often it delivers, how fast, how reliably, and how quickly it recovers. That distinction matters because activity metrics are easy to game. A team can produce a thousand commits a month and never ship anything a user sees. Outcome metrics describe what actually happened in production.
The four metrics and their approximate performance bands, based on the State of DevOps reports (the exact thresholds shift from year to year):
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | Multiple per day | Once per day to once per week | Once per week to once per month | Less than once per month |
| Lead Time for Changes | Under an hour | Between one day and one week | Between one week and one month | Between one month and six months |
| Change Failure Rate | Under 5% | 5–10% | 11–15% | Over 15% |
| Time to Restore Service | Under an hour | Under a day | Under a day | Between a day and a week |
Most early-stage Rails teams land in Medium on deployment frequency and Low on lead time. Change failure rate is the one nobody wants to look at because it requires admitting you have incidents. Start there anyway.
Deployment Frequency: Tracking Every Production Deploy
Deployment frequency is not “how often we merge pull requests.” It is how often code reaches the environment real users are on. On a Rails team using GitHub Actions for CI/CD, the cleanest approach is a webhook step at the end of your deploy workflow that notifies your own Rails app.
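# .github/workflows/deploy.yml
# The secret names below (DORA_APP_URL, DEPLOY_WEBHOOK_TOKEN) are placeholders;
# substitute whatever names your repository defines.
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... your build, test, and deploy steps ...
      - name: Record deployment
        if: success()
        run: |
          curl -s -X POST "${{ secrets.DORA_APP_URL }}/webhooks/deployments" \
            -H "Authorization: Bearer ${{ secrets.DEPLOY_WEBHOOK_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "sha": "${{ github.sha }}",
              "ref": "${{ github.ref_name }}",
              "actor": "${{ github.actor }}",
              "run_id": "${{ github.run_id }}"
            }'
      - name: Record failed deployment
        if: failure()
        run: |
          curl -s -X POST "${{ secrets.DORA_APP_URL }}/webhooks/deployments" \
            -H "Authorization: Bearer ${{ secrets.DEPLOY_WEBHOOK_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "sha": "${{ github.sha }}",
              "ref": "${{ github.ref_name }}",
              "actor": "${{ github.actor }}",
              "run_id": "${{ github.run_id }}",
              "status": "failed"
            }'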
On the Rails side, a Deployment model stores each event:
# db/migrate/20260630000001_create_deployments.rb
class CreateDeployments < ActiveRecord::Migration[8.0]
  def change
    create_table :deployments do |t|
      t.string :sha, null: false
      t.string :ref, null: false, default: "main"
      t.string :actor
      t.string :github_run_id
      t.string :status, null: false, default: "success" # success, failed, rollback
      t.integer :lead_time_seconds
      t.integer :commit_count
      t.jsonb :metadata, null: false, default: {}
      t.datetime :deployed_at, null: false
      t.timestamps
    end

    add_index :deployments, :deployed_at
    add_index :deployments, :sha, unique: true
    add_index :deployments, [:status, :deployed_at]
  end
end
The webhook controller authenticates with a bearer token from credentials and creates the record:
# app/controllers/webhooks/deployments_controller.rb
class Webhooks::DeploymentsController < ApplicationController
  skip_before_action :verify_authenticity_token
  before_action :authenticate_deploy_token!

  def create
    deployment = Deployment.create!(
      sha: params[:sha],
      ref: params[:ref],
      actor: params[:actor],
      github_run_id: params[:run_id],
      status: params[:status] || "success",
      deployed_at: Time.current
    )
    DeploymentEnricherJob.perform_later(deployment.id)
    render json: { id: deployment.id }, status: :created
  rescue ActiveRecord::RecordNotUnique
    head :ok # idempotent: same SHA already recorded
  end

  private

  def authenticate_deploy_token!
    token = request.headers["Authorization"].to_s.delete_prefix("Bearer ")
    head :unauthorized unless ActiveSupport::SecurityUtils.secure_compare(
      token, Rails.application.credentials.deploy_webhook_token!
    )
  end
end
Deployment frequency is then a single query:
# app/models/concerns/dora/deployment_frequency.rb
module Dora
  module DeploymentFrequency
    def self.deploys_per_day(period: 30.days.ago..Time.current)
      count = Deployment.where(deployed_at: period, status: "success").count
      days = (period.end - period.begin) / 1.day
      count.to_f / days.to_f
    end

    def self.band(deploys_per_day)
      case deploys_per_day
      when (1.0..) then "Elite"
      when (1.0 / 7..1.0) then "High"
      when (1.0 / 30..1.0 / 7) then "Medium"
      else "Low"
      end
    end
  end
end
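A single deploys-per-day figure hides whether the trend is up or down. As a rough sketch in plain Ruby (no Rails needed), the snippet below buckets deploy dates by ISO week. The `deploy_dates` array is made-up sample data standing in for something like `Deployment.where(status: "success").pluck(:deployed_at)`.

```ruby
require "date"

# Hypothetical sample data: one Date per successful production deploy.
deploy_dates = [
  Date.new(2026, 6, 1), Date.new(2026, 6, 3),
  Date.new(2026, 6, 9), Date.new(2026, 6, 10), Date.new(2026, 6, 12),
  Date.new(2026, 6, 18)
]

# Group by ISO year and ISO week number, then count deploys per bucket.
deploys_per_week = deploy_dates
  .group_by { |date| [date.cwyear, date.cweek] }
  .transform_values(&:size)
  .sort
  .to_h

# Print a small text histogram, one row per week.
deploys_per_week.each do |(year, week), count|
  puts format("%d-W%02d  %s (%d)", year, week, "#" * count, count)
end
```

Three or four weeks of these rows side by side usually say more in a team meeting than the 30-day average does.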
Lead Time for Changes: Where the Real Bottlenecks Hide
Lead Time for Changes measures from the first commit of a change to that commit reaching production. Not “how long does a sprint take.” The wall-clock time from “an engineer wrote a line of code” to “a user can see it.” Long lead times almost always mean batch-releasing, long-lived branches, or a review process nobody believes in anymore.
To measure it, you need to know which commits went out in each deploy. A background job fetches this from the GitHub API after each deploy is recorded:
# app/jobs/deployment_enricher_job.rb
class DeploymentEnricherJob < ApplicationJob
  queue_as :default

  def perform(deployment_id)
    deployment = Deployment.find(deployment_id)

    # The most recent successful deploy before this one is the comparison base.
    previous_sha = Deployment
      .where(deployed_at: ...deployment.deployed_at, status: "success")
      .order(deployed_at: :desc)
      .pick(:sha)
    return unless previous_sha

    commits = GithubClient.commits_between(
      base: previous_sha,
      head: deployment.sha
    )
    return if commits.empty?

    # Pick the earliest-authored commit so the recorded SHA and timestamp
    # always describe the same commit, whatever order the API returns.
    first_commit = commits
      .reject { |c| c[:authored_at].nil? }
      .min_by { |c| c[:authored_at] }
    return unless first_commit

    first_commit_at = first_commit[:authored_at]
    deployment.update!(
      lead_time_seconds: (deployment.deployed_at - first_commit_at).to_i,
      commit_count: commits.size,
      metadata: deployment.metadata.merge(
        "first_commit_sha" => first_commit[:sha],
        "first_commit_at" => first_commit_at.iso8601
      )
    )
  end
end
The GitHub compare API does the heavy lifting:
# app/services/github_client.rb
class GithubClient
  REPO = Rails.application.credentials.dig(:github, :repo)
  TOKEN = Rails.application.credentials.dig(:github, :token)

  def self.commits_between(base:, head:)
    conn = Faraday.new("https://api.github.com") do |f|
      f.response :raise_error
      f.response :json
    end

    response = conn.get(
      "/repos/#{REPO}/compare/#{base}...#{head}",
      {},
      {
        "Authorization" => "Bearer #{TOKEN}",
        "Accept" => "application/vnd.github+json",
        "X-GitHub-Api-Version" => "2022-11-28"
      }
    )

    response.body.fetch("commits", []).map do |c|
      {
        sha: c["sha"],
        authored_at: Time.zone.parse(c.dig("commit", "author", "date"))
      }
    end
  rescue Faraday::Error => e
    Rails.logger.error("GithubClient#commits_between failed: #{e.message}")
    []
  end
end
Lead time queries:
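module Dora
  module LeadTime
    def self.median_seconds(period: 30.days.ago..Time.current)
      times = Deployment
        .where(deployed_at: period, status: "success")
        .where.not(lead_time_seconds: nil)
        .order(:lead_time_seconds)
        .pluck(:lead_time_seconds)
      return nil if times.empty?

      # True median: average the two middle values when the count is even.
      mid = times.size / 2
      times.size.odd? ? times[mid] : ((times[mid - 1] + times[mid]) / 2.0).round
    end

    # Thresholds follow the bands table above. The table leaves a gap between
    # one hour and one day; this treats it as High.
    def self.band(seconds)
      return "Unknown" if seconds.nil?

      case seconds
      when (0...3_600) then "Elite"             # under an hour
      when (3_600...604_800) then "High"        # under a week
      when (604_800...2_592_000) then "Medium"  # under a month (30 days)
      else "Low"
      end
    end
  end
end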
Use the median, not the mean. One deploy that batched three weeks of commits will blow out your average and hide what a normal deploy actually looks like.
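To see how much one batched deploy distorts the average, here is a small plain-Ruby illustration. The lead times are invented sample values in hours, with the last entry standing in for a deploy that carried three weeks of commits.

```ruby
# Hypothetical lead times in hours for seven deploys; the last one batched
# three weeks (504 hours) of commits.
lead_times_hours = [4, 6, 5, 8, 7, 5, 504]

def mean(values)
  values.sum.to_f / values.size
end

def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  # Average the two middle values when the count is even.
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

puts "mean:   #{mean(lead_times_hours).round(1)} h"
puts "median: #{median(lead_times_hours)} h"
```

The mean lands at 77 hours, more than three days, while the median stays at 6 hours, which is what a typical deploy in this sample actually looks like.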
The breakdown I look at first is not the aggregate lead time but where time is spent: coding time (first commit to PR open), review time (PR open to merge), and queue time (merge to production). Most Rails teams have normal coding and queue time and grotesque review time — pull requests sitting open for three days because nobody has context, everyone is busy, and there is no culture of treating code review as real-time work. Fixing that costs nothing and typically halves your lead time within a month.
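The three-way split is simple subtraction once the timestamps are available. The sketch below uses a hypothetical `ChangeTimeline` struct and invented timestamps; in a real app the PR timestamps would have to come from the GitHub pull request API, which the code in this post does not fetch.

```ruby
# Hypothetical container for the four timestamps of one change.
ChangeTimeline = Struct.new(
  :first_commit_at, :pr_opened_at, :merged_at, :deployed_at,
  keyword_init: true
) do
  def coding_seconds
    pr_opened_at - first_commit_at
  end

  def review_seconds
    merged_at - pr_opened_at
  end

  def queue_seconds
    deployed_at - merged_at
  end

  def lead_time_seconds
    deployed_at - first_commit_at
  end
end

# Invented example: six hours of coding, three days in review, 30 minutes to ship.
change = ChangeTimeline.new(
  first_commit_at: Time.utc(2026, 6, 1, 9, 0),
  pr_opened_at:    Time.utc(2026, 6, 1, 15, 0),
  merged_at:       Time.utc(2026, 6, 4, 15, 0),
  deployed_at:     Time.utc(2026, 6, 4, 15, 30)
)

stages = {
  coding: change.coding_seconds,
  review: change.review_seconds,
  queue:  change.queue_seconds
}

stages.each do |stage, seconds|
  share = (seconds / change.lead_time_seconds * 100).round
  puts format("%-7s %5.1f h  (%d%%)", stage, seconds / 3600.0, share)
end
```

In this invented case review accounts for roughly nine tenths of the lead time, which is the pattern described above.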
Change Failure Rate: The Metric Nobody Wants to Look At
Change Failure Rate is the percentage of deployments that cause an incident, require a hotfix, or get rolled back. Teams undercount it systematically because they only log the big ones — the ones that wake someone up at 3 am. But the 3 am rollback is downstream of the change failure; the upstream event is a deploy where something quietly broke and a customer reported it the next morning. If it generated a hotfix PR, it counts.
Add a way to mark a deployment as failed after the fact. I do this via a separate webhook call from a rollback action or an incident Slack command:
# app/controllers/webhooks/deployment_failures_controller.rb
class Webhooks::DeploymentFailuresController < ApplicationController
  skip_before_action :verify_authenticity_token
  before_action :authenticate_deploy_token!

  def create
    deployment = Deployment.find_by!(sha: params[:sha])
    deployment.update!(
      status: "rollback",
      metadata: deployment.metadata.merge("failure_reason" => params[:reason])
    )
    Incident.create!(
      deployment: deployment,
      severity: params[:severity] || "p2",
      source: "deploy",
      detected_at: deployment.deployed_at
    )
    head :ok
  end

  private

  def authenticate_deploy_token!
    token = request.headers["Authorization"].to_s.delete_prefix("Bearer ")
    head :unauthorized unless ActiveSupport::SecurityUtils.secure_compare(
      token, Rails.application.credentials.deploy_webhook_token!
    )
  end
end
Change failure rate query:
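module Dora
  module ChangeFailureRate
    # Both statuses count as a change failure: "failed" is reported by the
    # deploy workflow, "rollback" is set after the fact by the failure webhook.
    FAILURE_STATUSES = %w[failed rollback].freeze

    def self.call(period: 30.days.ago..Time.current)
      scope = Deployment.where(deployed_at: period)
      total = scope.count
      return 0.0 if total.zero?

      failures = scope.where(status: FAILURE_STATUSES).count
      (failures.to_f / total * 100).round(1)
    end

    def self.band(rate)
      case rate
      when (0...5) then "Elite"
      when (5..10) then "High"     # 5-10% inclusive, as in the bands table
      when (10..15) then "Medium"  # reached only for rates above 10
      else "Low"
      end
    end
  end
end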
A 10% change failure rate sounds manageable until you do the maths. At weekly deploys (about 13 a quarter) that is one or two incidents per quarter, each requiring a hotfix cycle, post-mortem, and customer communication. At daily deploys (about 65 working days a quarter) it is six or seven, and at several deploys a day it becomes your full-time job. The counterintuitive result from the Accelerate research, which I have seen play out repeatedly on client teams, is that teams deploying more frequently have lower change failure rates, not higher. Smaller batches mean smaller blast radius. A two-file change is easier to review, easier to test, and faster to roll back than a 47-file feature that accumulated for three weeks.
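The arithmetic behind a 10% failure rate is easy to check. This is a back-of-envelope sketch in plain Ruby; the 13 weeks and 65 working days per quarter are rounding assumptions, not measured values.

```ruby
WEEKS_PER_QUARTER = 13
WORKING_DAYS_PER_QUARTER = 65 # assumes five working days a week

# Expected number of failed changes for a given deploy count and failure rate.
def expected_failures(deploys_per_quarter, change_failure_rate)
  (deploys_per_quarter * change_failure_rate).round(1)
end

rate = 0.10

weekly       = expected_failures(WEEKS_PER_QUARTER, rate)
daily        = expected_failures(WORKING_DAYS_PER_QUARTER, rate)
thrice_daily = expected_failures(WORKING_DAYS_PER_QUARTER * 3, rate)

puts "weekly deploys:      #{weekly} failures per quarter"
puts "daily deploys:       #{daily} failures per quarter"
puts "three deploys a day: #{thrice_daily} failures per quarter"
```

The same rate that costs one or two hotfix cycles a quarter at weekly cadence costs about twenty at three deploys a day, which is why the rate has to come down as frequency goes up.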
Time to Restore Service: Measuring Resilience
Time to Restore Service (TTRS) is how long it takes to get the system back to normal after a failure. For a Rails team this usually means the interval from “first alert fires or first customer reports a problem” to “the fix or rollback shipped and monitors are green.” Track it in a separate incidents table because not every incident is caused by a deploy and you want the flexibility to record both kinds:
# db/migrate/20260630000002_create_incidents.rb
class CreateIncidents < ActiveRecord::Migration[8.0]
  def change
    create_table :incidents do |t|
      t.references :deployment, foreign_key: true
      t.string :severity, null: false, default: "p2" # p1, p2, p3
      t.string :source, null: false, default: "manual" # deploy, alert, customer
      t.datetime :detected_at, null: false
      t.datetime :resolved_at
      t.integer :ttrs_seconds
      t.jsonb :metadata, null: false, default: {}
      t.timestamps
    end

    add_index :incidents, :detected_at
    add_index :incidents, :resolved_at
  end
end
A before_save callback calculates TTRS when the incident is closed:
# app/models/incident.rb
class Incident < ApplicationRecord
  belongs_to :deployment, optional: true

  before_save :calculate_ttrs, if: -> { resolved_at_changed? && resolved_at.present? }

  scope :p1, -> { where(severity: "p1") }
  scope :resolved, -> { where.not(resolved_at: nil) }

  private

  def calculate_ttrs
    self.ttrs_seconds = (resolved_at - detected_at).to_i
  end
end
TTRS query:
module Dora
  module TimeToRestore
    def self.median_seconds(period: 30.days.ago..Time.current, severity: nil)
      scope = Incident.where(detected_at: period).resolved
      scope = scope.where(severity: severity) if severity
      times = scope.order(:ttrs_seconds).pluck(:ttrs_seconds).compact
      return nil if times.empty?

      # True median: average the two middle values when the count is even.
      mid = times.size / 2
      times.size.odd? ? times[mid] : ((times[mid - 1] + times[mid]) / 2.0).round
    end

    def self.band(seconds)
      return "Unknown" if seconds.nil?

      case seconds
      when (0...3_600) then "Elite"      # under an hour
      when (3_600...86_400) then "High"  # under a day (same threshold as Medium in the table)
      else "Low"
      end
    end
  end
end
Track P1 incidents separately from P2 and P3. A four-hour TTRS on a P3 (minor visual bug) and a four-hour TTRS on a P1 (payments down) are very different stories. The DORA band applies to incidents that impair service for users; minor issues that get fixed in the next normal deploy do not count.
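To make the severity split concrete, here is a plain-Ruby sketch that computes the median restore time per severity. The incident list is invented sample data; with the models above, calling `Dora::TimeToRestore.median_seconds(severity: "p1")` should give the equivalent figure from the database.

```ruby
# Hypothetical resolved incidents as [severity, seconds to restore].
incidents = [
  ["p1", 2_700], ["p1", 5_400], ["p1", 15_120],
  ["p3", 14_400], ["p3", 86_400]
]

def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  # Average the two middle values when the count is even.
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Group by severity, then take the median of each group's restore times.
median_by_severity = incidents
  .group_by(&:first)
  .transform_values { |rows| median(rows.map(&:last)) }

median_by_severity.each do |severity, seconds|
  puts format("%s: %.1f h", severity, seconds / 3600.0)
end
```

Reported together, these five incidents would show a single blended median that describes neither the P1 response nor the P3 backlog.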
A Minimal DORA Dashboard in Rails
An admin controller that pulls all four metrics at once. That is four queries, each filtered on an indexed timestamp column, so the page stays fast as the tables grow:
# app/controllers/admin/dora_controller.rb
class Admin::DoraController < Admin::BaseController
  def show
    # Build the window on every request. A constant would be evaluated once
    # at class load and go stale in a long-running production process.
    period = 30.days.ago..Time.current

    df = Dora::DeploymentFrequency.deploys_per_day(period: period)
    lt = Dora::LeadTime.median_seconds(period: period)
    cfr = Dora::ChangeFailureRate.call(period: period)
    ttr = Dora::TimeToRestore.median_seconds(period: period)

    @metrics = [
      {
        name: "Deployment Frequency",
        value: "#{df.round(2)} / day",
        band: Dora::DeploymentFrequency.band(df)
      },
      {
        name: "Lead Time for Changes",
        value: lt ? ActiveSupport::Duration.build(lt).inspect : "n/a",
        band: Dora::LeadTime.band(lt)
      },
      {
        name: "Change Failure Rate",
        value: "#{cfr}%",
        band: Dora::ChangeFailureRate.band(cfr)
      },
      {
        name: "Time to Restore",
        value: ttr ? ActiveSupport::Duration.build(ttr).inspect : "n/a",
        band: Dora::TimeToRestore.band(ttr)
      }
    ]
  end
end
In the view, colour the band column: green for Elite, teal for High, amber for Medium, red for Low. A table you can read at a glance in the weekly engineering meeting beats a custom analytics dashboard you maintain forever. I wired this up alongside OpenTelemetry tracing for a client recently — OpenTelemetry answers questions about your running system (latency, error rate, throughput), DORA answers questions about your delivery process. Together they cover the loop.
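The colour mapping can live in a tiny helper. This is only a sketch: the `band_css_class` method and the CSS class names are hypothetical, so rename them to match your stylesheet.

```ruby
# Hypothetical mapping from DORA band to a CSS class name.
BAND_CSS_CLASSES = {
  "Elite"  => "band-green",
  "High"   => "band-teal",
  "Medium" => "band-amber",
  "Low"    => "band-red"
}.freeze

# Falls back to a neutral class for "Unknown" or any unexpected value.
def band_css_class(band)
  BAND_CSS_CLASSES.fetch(band, "band-grey")
end

%w[Elite High Medium Low Unknown].each do |band|
  puts "#{band.ljust(8)} #{band_css_class(band)}"
end
```

In a Rails app this would typically sit in a view helper module and be called once per row of `@metrics`.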
Benchmarks and What to Do With the Numbers
The four bands are a starting point, not a destination. Teams land in Low for different reasons and the fix is different each time.
A team that deploys rarely but also has a low change failure rate (few deploys, few failures) has usually over-engineered their approval gates. They are careful but slow, and “careful” is doing less safety work than they think because infrequent deploys concentrate risk. The first ten commits of a sprint might be fine; the forty-second commit, the one that ships two weeks later alongside everything else, is the one that breaks something nobody tested.
A team that deploys frequently but has a 25% change failure rate has the opposite problem. They ship fast, break things regularly, and their confidence is misplaced. The fix here is almost always test coverage for the specific pathways that keep failing, not slowing down the deploys.
The lead time breakdown is what I look at first after pulling the baseline. Separate your lead time into three buckets: time from first commit to PR open, time from PR open to merge, and time from merge to production. Most Rails teams have reasonable coding time, a 10–20 minute CI pipeline, and a 48–72 hour review window because everyone is busy and nobody treats code review as urgent. That last number is the easiest to fix — a team agreement that PRs under 200 lines get same-day review typically cuts total lead time in half without changing anything else.
Change failure rate analysis should go deeper than “we had X rollbacks.” Which deployments fail? If 80% of your failures touch one specific domain model or one legacy service, that is a different problem from failures distributed evenly across the codebase. The technical due diligence framework I use with clients includes pulling this correlation from deploy and incident data as part of the first session.
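One way to look for that concentration is to count how many failed deploys touched each file. The sketch below is plain Ruby over invented data; the SHAs and file lists are hypothetical, and in practice the file lists would need to be fetched per deploy (for example from the GitHub compare response) and stored alongside the deployment.

```ruby
# Hypothetical data: files touched by each failed deploy, keyed by short SHA.
failed_deploys = {
  "a1b2c3d" => ["app/models/invoice.rb", "app/services/billing/charge.rb"],
  "e4f5a6b" => ["app/models/invoice.rb", "app/views/invoices/show.html.erb"],
  "c7d8e9f" => ["app/models/user.rb"],
  "0a1b2c3" => ["app/models/invoice.rb", "db/migrate/20260101000000_add_tax_to_invoices.rb"],
  "4d5e6f7" => ["app/services/billing/charge.rb"]
}

# Count each file at most once per deploy, then rank by number of failed deploys.
failures_per_file = failed_deploys.values
  .flat_map(&:uniq)
  .tally
  .sort_by { |file, count| [-count, file] }

failures_per_file.first(3).each do |file, count|
  share = (count * 100.0 / failed_deploys.size).round
  puts "#{file}: #{count} of #{failed_deploys.size} failed deploys (#{share}%)"
end
```

If one file or directory shows up in most failed deploys, that is where the test coverage and review attention should go first.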
One finding from the Accelerate research still surprises teams when I show it to them: deploying more frequently improves change failure rate. The causality runs both ways. A culture of daily deploys forces teams to ship smaller pieces, and those changes are easier to review, easier to test, easier to roll back, and therefore less likely to fail. In the other direction, a team whose deploys rarely fail is willing to deploy more often. The shipping discipline is the safety mechanism.
Frequently Asked Questions About DORA Metrics for Rails Teams
How long does it take to get meaningful DORA data?
Thirty days gives you a baseline, ninety days gives you a trend. The first two weeks will be noisy — holidays, a big planned migration, a week where the team was all-hands-on-deck for a launch. Commit to measuring for a full quarter before drawing conclusions or setting targets.
Should we track DORA metrics per service or per team?
Both, but start with per team. If you have one Rails monolith and one cross-functional team, that is your unit. If you have multiple services or multiple sub-teams, track them separately and do not aggregate — a high failure rate in the payments service getting diluted by a low failure rate in the content service is exactly the kind of signal you want to see clearly.
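The dilution effect is easy to demonstrate with numbers. The service names and counts below are invented for illustration.

```ruby
# Hypothetical deploy and failure counts per service for one quarter.
services = {
  "payments" => { deploys: 20,  failures: 6 },
  "content"  => { deploys: 180, failures: 4 }
}

# Change failure rate as a percentage, rounded to one decimal place.
def failure_rate(deploys:, failures:)
  return 0.0 if deploys.zero?

  (failures.to_f / deploys * 100).round(1)
end

per_service = services.transform_values { |counts| failure_rate(**counts) }

aggregate = failure_rate(
  deploys:  services.values.sum { |counts| counts[:deploys] },
  failures: services.values.sum { |counts| counts[:failures] }
)

per_service.each { |name, rate| puts "#{name}: #{rate}%" }
puts "aggregate: #{aggregate}%"
```

The aggregate comes out at 5%, which looks healthy, while the payments service on its own is failing on 30% of its deploys.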
What counts as a deployment for DORA purposes?
Any change that reaches the environment real users are on. Hotfixes count. Config changes pushed through your normal deploy pipeline count. Feature flags that enable new code paths count if the flag change itself goes through a deploy. Zero-downtime database migrations that ship alongside application code count. What does not count: changes to staging only, CMS content updates that bypass your deploy pipeline, or infra changes that do not touch your application code.
How do I handle teams that deploy multiple times per day?
Great problem to have. At multiple-per-day frequency, lead time for changes becomes the most informative metric because deployment frequency is already Elite and the constraint is somewhere in your workflow upstream of the deploy button. Break lead time down by commit author, by PR size, and by time of day. The bottleneck is almost always in review latency or in a CI step that nobody has optimised since it was added two years ago.
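Breaking lead time down by PR size could look something like this plain-Ruby sketch. The sample data and the bucket boundaries are assumptions for illustration, not thresholds from the DORA research.

```ruby
# Hypothetical sample: [lines changed in the PR, lead time in hours].
changes = [[40, 5], [120, 9], [180, 7], [450, 60], [900, 130], [60, 4], [700, 96]]

# Assumed bucket boundaries; adjust to whatever sizes are typical for your team.
def size_bucket(lines_changed)
  case lines_changed
  when 0...200   then "small (<200 lines)"
  when 200...500 then "medium (200-499 lines)"
  else                "large (500+ lines)"
  end
end

def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  # Average the two middle values when the count is even.
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

median_lead_time_by_size = changes
  .group_by { |lines, _hours| size_bucket(lines) }
  .transform_values { |rows| median(rows.map(&:last)) }

median_lead_time_by_size.each { |bucket, hours| puts "#{bucket}: #{hours} h" }
```

The same grouping works for commit author or hour of day by swapping the block passed to `group_by`.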
That CTO I mentioned at the start — 23-day lead time, 41% change failure rate — took the numbers back to his team and had the most productive engineering retrospective they had ever run. Six months later, lead time was under five days and change failure rate was below 10%. Not Elite. Not even High on every metric. But a team that knew the truth about itself and was moving in the right direction, which is more than most.
Need someone to instrument your DORA metrics and tell you honestly what they mean? TTB Software specialises in engineering assessment and fractional technical leadership for Rails teams at growth-stage companies. We’ve been doing this for nineteen years.