FRACTIONAL CTO · 25 MIN READ

DORA Metrics for Rails Teams: Tracking Deployment Frequency, Lead Time, and Change Failure Rate

A few months ago I was brought in to assess a forty-person product team that was proud of their process. Two Jira boards covered in green ticks. Closed sprints everywhere. “We ship every two weeks,” the CTO said, with the confidence of someone who had just come back from a retro that went well.

Then I pulled the data.

Average time from first commit on a branch to that commit reaching production: twenty-three days. Change failure rate for the previous quarter: 41 percent. Mean time to restore after an incident: four hours and twelve minutes. They thought they were high performers. By DORA metrics they were solidly in the Low band, borderline Medium at best. The sprints were green because engineers were closing Jira tickets, not because software was shipping. Those are not the same thing.

The DORA metrics — Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Time to Restore Service — come from the DORA research programme and were published by Nicole Forsgren, Jez Humble, and Gene Kim in “Accelerate” (2018). They are the most rigorously validated set of software delivery measurements we have. The research shows they correlate not only with delivery performance but with organisational outcomes: revenue growth, market share, and whether engineers stay at the company or leave to somewhere that ships. For every startup I work with as a fractional CTO, instrumenting these four numbers is one of the first things I do. Not because I plan to obsess over dashboards, but because you cannot have an honest conversation about an engineering team’s health until you know where it actually stands.

This post is the Rails implementation: the data model, the GitHub Actions wiring, the ActiveRecord queries for each metric, and the minimal admin dashboard that tells you in sixty seconds whether your team is improving or sliding.

What DORA Metrics Are (And Are Not)

DORA metrics are outcomes, not activities. They do not count commits, story points, lines of code, or test coverage percentage. They measure what the shipping machine produces: how often it delivers, how fast, how reliably, and how quickly it recovers. That distinction matters because activity metrics are easy to game. A team can produce a thousand commits a month and never ship anything a user sees. Outcome metrics describe what actually happened in production.

The four metrics and their performance bands, from the 2024 State of DevOps report:

| Metric | Elite | High | Medium | Low |
| --- | --- | --- | --- | --- |
| Deployment Frequency | Multiple per day | Once per day to once per week | Once per week to once per month | Less than once per month |
| Lead Time for Changes | Under an hour | Between one day and one week | Between one week and one month | Between one month and six months |
| Change Failure Rate | Under 5% | 5–10% | 11–15% | Over 15% |
| Time to Restore Service | Under an hour | Under a day | Under a day | Between a day and a week |

Most early-stage Rails teams land in Medium on deployment frequency and Low on lead time. Change failure rate is the one nobody wants to look at because it requires admitting you have incidents. Start there anyway.

Deployment Frequency: Tracking Every Production Deploy

Deployment frequency is not “how often we merge pull requests.” It is how often code reaches the environment real users are on. On a Rails team using GitHub Actions for CI/CD, the cleanest approach is a webhook step at the end of your deploy workflow that notifies your own Rails app.

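# .github/workflows/deploy.yml
# APP_URL and DEPLOY_WEBHOOK_TOKEN are assumed secret names; substitute your own.
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... your build, test, and deploy steps ...

      - name: Record deployment
        if: success()
        run: |
          curl -s -X POST "${{ secrets.APP_URL }}/webhooks/deployments" \
            -H "Authorization: Bearer ${{ secrets.DEPLOY_WEBHOOK_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "sha":    "${{ github.sha }}",
              "ref":    "${{ github.ref_name }}",
              "actor":  "${{ github.actor }}",
              "run_id": "${{ github.run_id }}"
            }'

      # Runs only when an earlier step in this job failed.
      - name: Record failed deploy
        if: failure()
        run: |
          curl -s -X POST "${{ secrets.APP_URL }}/webhooks/deployments" \
            -H "Authorization: Bearer ${{ secrets.DEPLOY_WEBHOOK_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "sha":    "${{ github.sha }}",
              "ref":    "${{ github.ref_name }}",
              "actor":  "${{ github.actor }}",
              "run_id": "${{ github.run_id }}",
              "status": "failed"
            }'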

On the Rails side, a Deployment model stores each event:

# db/migrate/20260630000001_create_deployments.rb
class CreateDeployments < ActiveRecord::Migration[8.0]
  def change
    create_table :deployments do |t|
      t.string   :sha,           null: false
      t.string   :ref,           null: false, default: "main"
      t.string   :actor
      t.string   :github_run_id
      t.string   :status,        null: false, default: "success" # success, failed, rollback
      t.integer  :lead_time_seconds
      t.integer  :commit_count
      t.jsonb    :metadata,      null: false, default: {}
      t.datetime :deployed_at,   null: false
      t.timestamps
    end

    add_index :deployments, :deployed_at
    add_index :deployments, :sha, unique: true
    add_index :deployments, [:status, :deployed_at]
  end
end

The webhook controller authenticates with a bearer token from credentials and creates the record:

# app/controllers/webhooks/deployments_controller.rb
class Webhooks::DeploymentsController < ApplicationController
  skip_before_action :verify_authenticity_token
  before_action :authenticate_deploy_token!

  def create
    deployment = Deployment.create!(
      sha:           params[:sha],
      ref:           params[:ref],
      actor:         params[:actor],
      github_run_id: params[:run_id],
      status:        params[:status] || "success",
      deployed_at:   Time.current
    )

    DeploymentEnricherJob.perform_later(deployment.id)

    render json: { id: deployment.id }, status: :created
  rescue ActiveRecord::RecordNotUnique
    head :ok  # idempotent: same SHA already recorded
  end

  private

  def authenticate_deploy_token!
    token = request.headers["Authorization"].to_s.delete_prefix("Bearer ")
    head :unauthorized unless ActiveSupport::SecurityUtils.secure_compare(
      token, Rails.application.credentials.deploy_webhook_token!
    )
  end
end

Deployment frequency is then a single query:

# app/models/concerns/dora/deployment_frequency.rb
module Dora
  module DeploymentFrequency
    def self.deploys_per_day(period: 30.days.ago..Time.current)
      count = Deployment.where(deployed_at: period, status: "success").count
      days  = (period.end - period.begin) / 1.day
      count.to_f / days.to_f
    end

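    def self.band(deploys_per_day)
      # Thresholds follow the band table: Elite means more than one deploy per day.
      if    deploys_per_day > 1.0       then "Elite"  # multiple per day
      elsif deploys_per_day >= 1.0 / 7  then "High"   # once per day to once per week
      elsif deploys_per_day >= 1.0 / 30 then "Medium" # once per week to once per month
      else                                   "Low"
      end
    end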
  end
end
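
A single 30-day average can hide whether the team is speeding up or slowing down. As a minimal sketch in plain Ruby, assuming you have already plucked the `deployed_at` values of successful deploys into an array, weekly counts can be bucketed by ISO week. The `deploys_per_week` helper and the sample timestamps are illustrative and not part of the modules above.

```ruby
# Sketch: weekly deploy counts from plain Time values.
# `deploys_per_week` is a hypothetical helper, not part of the Dora modules.
def deploys_per_week(deploy_times)
  deploy_times
    .group_by { |t| t.getutc.strftime("%G-W%V") } # ISO year and ISO week number
    .transform_values(&:size)
    .sort
    .to_h
end

# Sample data standing in for plucked deployed_at values.
deploy_times = [
  Time.utc(2026, 6, 1, 10, 0),
  Time.utc(2026, 6, 3, 15, 30),
  Time.utc(2026, 6, 9, 11, 45)
]

weekly_counts = deploys_per_week(deploy_times)
weekly_counts.each { |week, count| puts "#{week}: #{count}" }
```

Weeks with no deploys do not appear in the result, so fill the gaps with zeros before charting if you want an unbroken series.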

Lead Time for Changes: Where the Real Bottlenecks Hide

Lead Time for Changes measures from the first commit of a change to that commit reaching production. Not “how long does a sprint take.” The wall-clock time from “an engineer wrote a line of code” to “a user can see it.” Long lead times almost always mean batch-releasing, long-lived branches, or a review process nobody believes in anymore.

To measure it, you need to know which commits went out in each deploy. A background job fetches this from the GitHub API after each deploy is recorded:

# app/jobs/deployment_enricher_job.rb
class DeploymentEnricherJob < ApplicationJob
  queue_as :default

  def perform(deployment_id)
    deployment   = Deployment.find(deployment_id)
    previous_sha = Deployment
      .where(deployed_at: ...deployment.deployed_at, status: "success")
      .order(deployed_at: :desc)
      .pick(:sha)

    return unless previous_sha

    commits = GithubClient.commits_between(
      base: previous_sha,
      head: deployment.sha
    )

    return if commits.empty?

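    # Pick the earliest-authored commit so the stored SHA and timestamp refer
    # to the same commit, whatever order the API returns them in.
    first_commit = commits.select { |c| c[:authored_at] }.min_by { |c| c[:authored_at] }
    return unless first_commit

    deployment.update!(
      lead_time_seconds: (deployment.deployed_at - first_commit[:authored_at]).to_i,
      commit_count:      commits.size,
      metadata:          deployment.metadata.merge(
                           "first_commit_sha" => first_commit[:sha],
                           "first_commit_at"  => first_commit[:authored_at].iso8601
                         )
    )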
  end
end

The GitHub compare API does the heavy lifting:

# app/services/github_client.rb
class GithubClient
  REPO = Rails.application.credentials.dig(:github, :repo)
  TOKEN = Rails.application.credentials.dig(:github, :token)

  def self.commits_between(base:, head:)
    conn = Faraday.new("https://api.github.com") do |f|
      f.response :raise_error
      f.response :json
    end

    response = conn.get(
      "/repos/#{REPO}/compare/#{base}...#{head}",
      {},
      {
        "Authorization" => "Bearer #{TOKEN}",
        "Accept"        => "application/vnd.github+json",
        "X-GitHub-Api-Version" => "2022-11-28"
      }
    )

    response.body.fetch("commits", []).map do |c|
      {
        sha:         c["sha"],
        authored_at: Time.zone.parse(c.dig("commit", "author", "date"))
      }
    end
  rescue Faraday::Error => e
    Rails.logger.error("GithubClient#commits_between failed: #{e.message}")
    []
  end
end

Lead time queries:

module Dora
  module LeadTime
    def self.median_seconds(period: 30.days.ago..Time.current)
      times = Deployment
        .where(deployed_at: period, status: "success")
        .where.not(lead_time_seconds: nil)
        .order(:lead_time_seconds)
        .pluck(:lead_time_seconds)

      return nil if times.empty?
      times[times.size / 2]
    end

    def self.band(seconds)
      return "Unknown" if seconds.nil?
      # Thresholds follow the band table above.
      case seconds
      when (0...3_600)         then "Elite"   # under an hour
      when (3_600...604_800)   then "High"    # under a week
      when (604_800...2_592_000) then "Medium" # under a month (30 days)
      else "Low"
      end
    end
  end
end

Use the median, not the mean. One deploy that batched three weeks of commits will blow out your average and hide what a normal deploy actually looks like.
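
To make the difference concrete, here is a small plain-Ruby sketch with made-up numbers: six ordinary deploys and one that batched three weeks of commits. The `mean` and `median` helpers are illustrative. Unlike `times[times.size / 2]`, which takes the upper-middle element, this `median` averages the two middle values when the list has an even length.

```ruby
HOUR = 3_600
DAY  = 24 * HOUR

# Hypothetical lead times in seconds: six normal deploys and one large batch.
lead_times = [2 * HOUR, 3 * HOUR, 4 * HOUR, 5 * HOUR, 6 * HOUR, 8 * HOUR, 21 * DAY]

def mean(values)
  values.sum.to_f / values.size
end

# Middle element for odd sizes, average of the two middle elements for even sizes.
def median(values)
  sorted = values.sort
  mid    = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

puts format("mean:   %.1f hours", mean(lead_times) / HOUR)   # mean:   76.0 hours
puts format("median: %.1f hours", median(lead_times) / HOUR) # median: 5.0 hours
```

The mean suggests a team that takes three days to ship. The median shows that a typical change goes out in five hours.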

The breakdown I look at first is not the aggregate lead time but where time is spent: coding time (first commit to PR open), review time (PR open to merge), and queue time (merge to production). Most Rails teams have normal coding and queue time and grotesque review time — pull requests sitting open for three days because nobody has context, everyone is busy, and there is no culture of treating code review as real-time work. Fixing that costs nothing and typically halves your lead time within a month.
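
The three buckets can be sketched in plain Ruby as below. This assumes you also capture when the pull request was opened and merged, which the `deployments` table above does not store. `ChangeTimeline` and its fields are hypothetical names used only for illustration.

```ruby
# Hypothetical value object holding the four timestamps for one change.
ChangeTimeline = Struct.new(
  :first_commit_at, :pr_opened_at, :merged_at, :deployed_at,
  keyword_init: true
) do
  def coding_seconds
    (pr_opened_at - first_commit_at).to_i
  end

  def review_seconds
    (merged_at - pr_opened_at).to_i
  end

  def queue_seconds
    (deployed_at - merged_at).to_i
  end

  def breakdown
    { coding: coding_seconds, review: review_seconds, queue: queue_seconds }
  end

  # The stage that consumed the most wall-clock time.
  def bottleneck
    breakdown.max_by { |_stage, seconds| seconds }.first
  end
end

timeline = ChangeTimeline.new(
  first_commit_at: Time.utc(2026, 6, 1, 9, 0),
  pr_opened_at:    Time.utc(2026, 6, 1, 15, 0),
  merged_at:       Time.utc(2026, 6, 4, 15, 0),
  deployed_at:     Time.utc(2026, 6, 4, 15, 20)
)

timeline.breakdown.each do |stage, seconds|
  puts format("%-7s %5.1f hours", stage, seconds / 3600.0)
end
puts "bottleneck: #{timeline.bottleneck}"
```

In this made-up example the change spent six hours in coding, seventy-two in review, and twenty minutes in the deploy queue, so review accounts for over 90% of the lead time.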

Change Failure Rate: The Metric Nobody Wants to Look At

Change Failure Rate is the percentage of deployments that cause an incident, require a hotfix, or get rolled back. Teams undercount it systematically because they only log the big ones — the ones that wake someone up at 3 am. But the 3 am rollback is downstream of the change failure; the upstream event is a deploy where something quietly broke and a customer reported it the next morning. If it generated a hotfix PR, it counts.

Add a way to mark a deployment as failed after the fact. I do this via a separate webhook call from a rollback action or an incident Slack command:

# app/controllers/webhooks/deployment_failures_controller.rb
class Webhooks::DeploymentFailuresController < ApplicationController
  skip_before_action :verify_authenticity_token
  before_action :authenticate_deploy_token!

  def create
    deployment = Deployment.find_by!(sha: params[:sha])
    deployment.update!(
      status:   "rollback",
      metadata: deployment.metadata.merge("failure_reason" => params[:reason])
    )

    Incident.create!(
      deployment:   deployment,
      severity:     params[:severity] || "p2",
      source:       "deploy",
      detected_at:  deployment.deployed_at
    )

    head :ok
  end

  private

  def authenticate_deploy_token!
    token = request.headers["Authorization"].to_s.delete_prefix("Bearer ")
    head :unauthorized unless ActiveSupport::SecurityUtils.secure_compare(
      token, Rails.application.credentials.deploy_webhook_token!
    )
  end
end

Change failure rate query:

module Dora
  module ChangeFailureRate
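    def self.call(period: 30.days.ago..Time.current)
      # Only deploys that reached production count towards the denominator;
      # "failed" pipeline runs never shipped a change.
      scope    = Deployment.where(deployed_at: period, status: %w[success rollback])
      total    = scope.count
      failures = scope.where(status: "rollback").count
      return 0.0 if total.zero?
      (failures.to_f / total * 100).round(1)
    end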

    def self.band(rate)
      case rate
      when (0...5)  then "Elite"   # under 5%
      when (5..10)  then "High"    # 5–10%, so exactly 10% is still High
      when (10..15) then "Medium"  # above 10%, up to 15%
      else "Low"
      end
    end
  end
end

A 10% change failure rate sounds manageable until you do the maths. At weekly deploys (13 per quarter) that is one or two incidents per quarter, each requiring a hotfix cycle, post-mortem, and customer communication. At daily deploys (around 65 per quarter) it is six or seven, roughly one every two weeks, and handling them starts to become your full-time job. The counterintuitive result from the Accelerate research, which I have seen play out repeatedly on client teams, is that teams deploying more frequently have lower change failure rates, not higher. Smaller batches mean smaller blast radius. A two-file change is easier to review, easier to test, and faster to roll back than a 47-file feature that accumulated for three weeks.
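
A quick way to sanity-check figures like these is to multiply deploys per quarter by the failure rate. The sketch below assumes a 13-week quarter with five working days per week. The cadences and the `expected_failures` helper are illustrative.

```ruby
# Expected failed changes per quarter = deploys per quarter x failure rate.
def expected_failures(deploys_per_quarter, failure_rate)
  deploys_per_quarter * failure_rate
end

# Assumption: 13 weeks per quarter, 5 working days per week.
CADENCES = {
  "weekly"        => 13,
  "daily"         => 65,
  "three per day" => 195
}.freeze

CADENCES.each do |cadence, deploys|
  puts format(
    "%-14s %4d deploys -> %.1f failed changes per quarter",
    cadence, deploys, expected_failures(deploys, 0.10)
  )
end
```

The same 10% rate means about one failed change per quarter at a weekly cadence and about one every three working days at three deploys per day, which is why the rate has to fall as frequency rises.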

Time to Restore Service: Measuring Resilience

Time to Restore Service (TTRS) is how long it takes to get the system back to normal after a failure. For a Rails team this usually means the interval from “first alert fires or first customer reports a problem” to “the fix or rollback shipped and monitors are green.” Track it in a separate incidents table because not every incident is caused by a deploy and you want the flexibility to record both kinds:

# db/migrate/20260630000002_create_incidents.rb
class CreateIncidents < ActiveRecord::Migration[8.0]
  def change
    create_table :incidents do |t|
      t.references :deployment, foreign_key: true
      t.string   :severity,    null: false, default: "p2"     # p1, p2, p3
      t.string   :source,      null: false, default: "manual" # deploy, alert, customer
      t.datetime :detected_at, null: false
      t.datetime :resolved_at
      t.integer  :ttrs_seconds
      t.jsonb    :metadata,    null: false, default: {}
      t.timestamps
    end

    add_index :incidents, :detected_at
    add_index :incidents, :resolved_at
  end
end

A before_save callback calculates TTRS when the incident is closed:

# app/models/incident.rb
class Incident < ApplicationRecord
  belongs_to :deployment, optional: true

  before_save :calculate_ttrs, if: -> { resolved_at_changed? && resolved_at.present? }

  scope :p1, -> { where(severity: "p1") }
  scope :resolved, -> { where.not(resolved_at: nil) }

  private

  def calculate_ttrs
    self.ttrs_seconds = (resolved_at - detected_at).to_i
  end
end

TTRS query:

module Dora
  module TimeToRestore
    def self.median_seconds(period: 30.days.ago..Time.current, severity: nil)
      scope = Incident.where(detected_at: period).resolved
      scope = scope.where(severity: severity) if severity

      times = scope.order(:ttrs_seconds).pluck(:ttrs_seconds).compact
      return nil if times.empty?
      times[times.size / 2]
    end

    def self.band(seconds)
      return "Unknown" if seconds.nil?
      case seconds
      when (0...3_600)  then "Elite"  # under an hour
      when (0...86_400) then "High"   # under a day
      else "Low"
      end
    end
  end
end

Track P1 incidents separately from P2 and P3. A four-hour TTRS on a P3 (minor visual bug) and a four-hour TTRS on a P1 (payments down) are very different stories. The DORA band applies to incidents that impair service for users; minor issues that get fixed in the next normal deploy do not count.
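
As a rough illustration of splitting by severity, here is a plain-Ruby sketch that groups incidents and takes the middle TTRS value for each group. `IncidentRow` and the sample data are hypothetical stand-ins for rows from the `incidents` table. The middle-element rule mirrors the `times[times.size / 2]` convention used in the queries above.

```ruby
# Hypothetical stand-in for rows loaded from the incidents table.
IncidentRow = Struct.new(:severity, :ttrs_seconds, keyword_init: true)

def median_ttrs_by_severity(incidents)
  # Unresolved incidents have no TTRS yet, so leave them out.
  grouped = incidents.reject { |i| i.ttrs_seconds.nil? }.group_by(&:severity)

  grouped.transform_values { |rows|
    sorted = rows.map(&:ttrs_seconds).sort
    sorted[sorted.size / 2] # upper-middle element for even-sized groups
  }.sort.to_h
end

incidents = [
  IncidentRow.new(severity: "p1", ttrs_seconds: 1_800),
  IncidentRow.new(severity: "p1", ttrs_seconds: 2_400),
  IncidentRow.new(severity: "p1", ttrs_seconds: 5_400),
  IncidentRow.new(severity: "p1", ttrs_seconds: nil),   # still open
  IncidentRow.new(severity: "p2", ttrs_seconds: 3_600),
  IncidentRow.new(severity: "p3", ttrs_seconds: 7_200),
  IncidentRow.new(severity: "p3", ttrs_seconds: 14_400),
  IncidentRow.new(severity: "p3", ttrs_seconds: 90_000)
]

by_severity = median_ttrs_by_severity(incidents)
by_severity.each { |severity, seconds| puts "#{severity}: #{seconds / 60} min" }
```

With these sample values the P1 median is 40 minutes while the P3 median is four hours, a gap that a single combined number would blur.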

A Minimal DORA Dashboard in Rails

An admin controller that pulls all four metrics at once — four queries, all covered by the indexes, all returning in under 100 ms on a table with millions of rows:

# app/controllers/admin/dora_controller.rb
class Admin::DoraController < Admin::BaseController
  def show
    # Build the window per request; a class-level constant would be evaluated
    # once at boot and go stale in a long-running process.
    period = 30.days.ago..Time.current

    df  = Dora::DeploymentFrequency.deploys_per_day(period: period)
    lt  = Dora::LeadTime.median_seconds(period: period)
    cfr = Dora::ChangeFailureRate.call(period: period)
    ttr = Dora::TimeToRestore.median_seconds(period: period)

    @metrics = [
      {
        name:  "Deployment Frequency",
        value: "#{df.round(2)} / day",
        band:  Dora::DeploymentFrequency.band(df)
      },
      {
        name:  "Lead Time for Changes",
        value: lt ? ActiveSupport::Duration.build(lt).inspect : "n/a",
        band:  Dora::LeadTime.band(lt)
      },
      {
        name:  "Change Failure Rate",
        value: "#{cfr}%",
        band:  Dora::ChangeFailureRate.band(cfr)
      },
      {
        name:  "Time to Restore",
        value: ttr ? ActiveSupport::Duration.build(ttr).inspect : "n/a",
        band:  Dora::TimeToRestore.band(ttr)
      }
    ]
  end
end

In the view, colour the band column: green for Elite, teal for High, amber for Medium, red for Low. A table you can read at a glance in the weekly engineering meeting beats a custom analytics dashboard you maintain forever. I wired this up alongside OpenTelemetry tracing for a client recently — OpenTelemetry answers questions about your running system (latency, error rate, throughput), DORA answers questions about your delivery process. Together they cover the loop.
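
One way to wire up the colours is a small lookup that maps each band to a CSS class, sketched below. The class names and the `band_css_class` helper are assumptions for illustration, so adapt them to whatever your stylesheet defines.

```ruby
# Hypothetical CSS class names; adapt to your own stylesheet.
BAND_CSS_CLASSES = {
  "Elite"  => "band-green",
  "High"   => "band-teal",
  "Medium" => "band-amber",
  "Low"    => "band-red"
}.freeze

# Falls back to a neutral class for "Unknown" or any unexpected band.
def band_css_class(band)
  BAND_CSS_CLASSES.fetch(band, "band-grey")
end

%w[Elite High Medium Low Unknown].each do |band|
  puts "#{band}: #{band_css_class(band)}"
end
```

In a Rails app this would typically live in a view helper module so the template only has to call it with the metric's band.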

Benchmarks and What to Do With the Numbers

The four bands are a starting point, not a destination. Teams land in Low for different reasons and the fix is different each time.

A team with Low deployment frequency and a low (that is, good) change failure rate has usually over-engineered their approval gates. They are careful but slow, and “careful” is doing less safety work than they think because infrequent deploys concentrate risk. The first ten commits of a sprint might be fine; the forty-second commit, the one that ships two weeks later alongside everything else, is the one that breaks something nobody tested.

A team with High deployment frequency and a 25% change failure rate has the opposite problem. They ship fast, break things regularly, and their confidence is misplaced. The fix here is almost always test coverage for the specific pathways that keep failing, not slowing down the deploys.

The lead time breakdown is what I look at first after pulling the baseline. Separate your lead time into three buckets: time from first commit to PR open, time from PR open to merge, and time from merge to production. Most Rails teams have reasonable coding time, a 10–20 minute CI pipeline, and a 48–72 hour review window because everyone is busy and nobody treats code review as urgent. That last number is the easiest to fix — a team agreement that PRs under 200 lines get same-day review typically cuts total lead time in half without changing anything else.

Change failure rate analysis should go deeper than “we had X rollbacks.” Which deployments fail? If 80% of your failures touch one specific domain model or one legacy service, that is a different problem from failures distributed evenly across the codebase. The technical due diligence framework I use with clients includes pulling this correlation from deploy and incident data as part of the first session.
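
A minimal sketch of that correlation in plain Ruby follows. It assumes you have stored the changed file paths for each failed deploy and that your code is namespaced by domain under `app/<layer>/<domain>/`. Both are assumptions, and `FailedDeploy`, `domain_for`, and the sample data are hypothetical.

```ruby
# Hypothetical record of a failed deploy and the files it changed.
FailedDeploy = Struct.new(:sha, :changed_paths, keyword_init: true)

# Assumed convention: app/<layer>/<domain>/file.rb. Anything else is "other".
def domain_for(path)
  segments = path.split("/")
  segments.first == "app" && segments.size >= 4 ? segments[2] : "other"
end

# Percentage of failed deploys that touched each domain at least once.
def failure_share_by_domain(failed_deploys)
  counts = Hash.new(0)
  failed_deploys.each do |deploy|
    deploy.changed_paths.map { |path| domain_for(path) }.uniq.each do |domain|
      counts[domain] += 1
    end
  end

  total = failed_deploys.size.to_f
  counts.transform_values { |n| (n / total * 100).round(1) }
        .sort_by { |_domain, share| -share }
        .to_h
end

failed_deploys = [
  FailedDeploy.new(sha: "a1", changed_paths: ["app/models/billing/invoice.rb", "app/services/billing/charge.rb"]),
  FailedDeploy.new(sha: "b2", changed_paths: ["app/models/billing/refund.rb"]),
  FailedDeploy.new(sha: "c3", changed_paths: ["app/services/billing/charge.rb", "config/routes.rb"]),
  FailedDeploy.new(sha: "d4", changed_paths: ["app/models/billing/invoice.rb"]),
  FailedDeploy.new(sha: "e5", changed_paths: ["app/controllers/content/posts_controller.rb"])
]

shares = failure_share_by_domain(failed_deploys)
shares.each { |domain, share| puts "#{domain}: #{share}%" }
```

Here four of the five failed deploys touched billing code, which points at one domain rather than at the codebase as a whole. Shares can sum to more than 100% because one deploy may touch several domains.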

One last thing the Accelerate research found, which still surprises teams when I show it to them: deploying more frequently improves change failure rate. The causality runs both ways. Teams that are forced to ship smaller pieces by a culture of daily deploys produce changes that are easier to review, easier to test, easier to roll back, and therefore less likely to fail. The shipping discipline is the safety mechanism.

Frequently Asked Questions About DORA Metrics for Rails Teams

How long does it take to get meaningful DORA data?

Thirty days gives you a baseline, ninety days gives you a trend. The first two weeks will be noisy — holidays, a big planned migration, a week where the team was all-hands-on-deck for a launch. Commit to measuring for a full quarter before drawing conclusions or setting targets.

Should we track DORA metrics per service or per team?

Both, but start with per team. If you have one Rails monolith and one cross-functional team, that is your unit. If you have multiple services or multiple sub-teams, track them separately and do not aggregate. A high failure rate in the payments service is exactly the kind of signal you want to see clearly, and averaging it with a low failure rate in the content service dilutes it.

What counts as a deployment for DORA purposes?

Any change that reaches the environment real users are on. Hotfixes count. Config changes pushed through your normal deploy pipeline count. Feature flags that enable new code paths count if the flag change itself goes through a deploy. Zero-downtime database migrations that ship alongside application code count. What does not count: changes to staging only, CMS content updates that bypass your deploy pipeline, or infra changes that do not touch your application code.

How do I handle teams that deploy multiple times per day?

Great problem to have. At multiple-per-day frequency, lead time for changes becomes the most informative metric because deployment frequency is already Elite and the constraint is somewhere in your workflow upstream of the deploy button. Break lead time down by commit author, by PR size, and by time of day. The bottleneck is almost always in review latency or in a CI step that nobody has optimised since it was added two years ago.

That CTO I mentioned at the start — 23-day lead time, 41% change failure rate — took the numbers back to his team and had the most productive engineering retrospective they had ever run. Six months later, lead time was under five days and change failure rate was below 10%. Not Elite. Not even High on every metric. But a team that knew the truth about itself and was moving in the right direction, which is more than most.

Need someone to instrument your DORA metrics and tell you honestly what they mean? TTB Software specialises in engineering assessment and fractional technical leadership for Rails teams at growth-stage companies. We’ve been doing this for nineteen years.
