The DORA Four Keys have become the definitive framework for measuring software delivery performance. But most VPs and Directors of Engineering face a common problem: they want to implement DORA metrics, but they lack access to comprehensive CI/CD telemetry, their deployment pipelines are fragmented across multiple tools, or their incident tracking is manual and inconsistent. Can you still measure the Four Keys accurately?
Yes. This guide shows you exactly how to implement DORA metrics using only GitHub data as your source of truth. You'll learn how each metric maps to observable Git activity, which proxies work (and which don't), and how to benchmark your performance against industry standards—all without instrumenting a single deployment pipeline.
What Are the DORA Four Keys (And Why Google Invested in Them)
The DORA (DevOps Research and Assessment) program began in 2014 as an academic research initiative to identify what separates high-performing engineering organizations from low performers. After analyzing data from over 36,000 engineering professionals across multiple years, the research team (later acquired by Google in 2018) identified four key metrics that reliably predict organizational performance.
Why Google Cares About DORA
Google's acquisition of the DORA team wasn't academic curiosity—it was strategic. The Four Keys directly correlate with business outcomes that matter to any engineering organization:
- 2x higher organizational performance: Teams in the elite DORA category are twice as likely to meet or exceed their organizational performance goals
- 50% less burnout: High-performing teams report significantly lower rates of burnout and fatigue
- Higher employee retention: Teams with strong DORA metrics have measurably better retention rates
- Faster innovation: Elite performers can experiment and iterate at significantly higher velocity
The Four Keys matter because they measure outcomes (how fast you deliver value, how reliable your delivery is) rather than outputs (how many commits you make, how many hours you work). This makes them uniquely valuable for engineering leadership.
The Four Key Metrics
The DORA framework measures software delivery performance across two dimensions:
- Throughput (Speed): How fast can you deliver changes?
  - Deployment Frequency: How often do you release to production?
  - Lead Time for Changes: How long from commit to production?
- Stability (Quality): How reliable are your changes?
  - Change Failure Rate: What percentage of changes cause incidents?
  - Time to Restore Service: How quickly can you recover from failures?
The genius of the framework is this balance: you can't optimize for speed alone (you'll ship broken code) or stability alone (you'll never ship anything). Elite teams excel at both simultaneously.
🎯 Why These Four Metrics?
The DORA researchers tested hundreds of potential metrics and found that these four had the strongest correlation with overall organizational performance. More importantly, they're predictive: improving these metrics leads to improved business outcomes, not just better engineering stats.
Other metrics you might track—like code coverage, story points, or lines of code—don't have this predictive power. The Four Keys do.
How Each Key Metric Maps to GitHub Data
The traditional DORA implementation requires comprehensive instrumentation: deployment webhooks, incident management integrations, CI/CD telemetry. But for teams using GitHub as their source of truth for development activity, you can measure all four metrics with surprising accuracy using only Git data.
Deployment Frequency from GitHub
DORA Definition: How often your organization successfully releases to production (or releases to end users for on-demand software).
GitHub Proxy: PRs merged to your main/production branch per working day.
This proxy works when your development workflow follows one of these patterns:
- Trunk-based development: Every merge to main triggers an automated deployment to production
- Continuous deployment: Merges to main deploy within minutes via automated pipelines
- Daily/weekly release trains: Merges to main deploy in batches on a predictable schedule
Accuracy caveat: If you merge to main but manually gate deployments (e.g., merge on Wednesday, deploy on Friday), this proxy will overestimate your deployment frequency. In that case, consider tracking release tags instead.
```bash
# Example: Calculating Deployment Frequency from Git
# Count merges to main in the last 30 days
git log --merges --first-parent main --since="30 days ago" --format="%ad" | wc -l

# Divide by working days (30 days ≈ 21 working days)
# If result = 42 merges → 42/21 = 2 deploys per working day (High Performer)
```
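If your merges are gated and production releases are cut as GitHub Releases (or tags), counting releases is usually the more honest deployment frequency proxy. Here is a minimal sketch against the GitHub Releases REST API; the repo name is a placeholder, the token comes from a GITHUB_TOKEN environment variable, and only the first page of results is fetched:

```python
import os
from datetime import datetime, timedelta, timezone

import requests

# Hypothetical repo; adjust to your environment.
REPO = "your-org/your-repo"
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

# First page only (per_page max is 100); paginate for busy repos.
resp = requests.get(
    f"https://api.github.com/repos/{REPO}/releases",
    headers=headers,
    params={"per_page": 100},
)
resp.raise_for_status()

cutoff = datetime.now(timezone.utc) - timedelta(days=30)
recent = [
    r for r in resp.json()
    if r.get("published_at")
    and datetime.fromisoformat(r["published_at"].replace("Z", "+00:00")) >= cutoff
]

# A 30-day window has roughly 21 working days
print(f"Release frequency ≈ {len(recent) / 21:.2f} per working day")
```

If you tag releases without creating GitHub Releases, the same idea works against your tag list (for example via git for-each-ref on refs/tags, sorted by creation date).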
Lead Time for Changes from GitHub
DORA Definition: The amount of time it takes a commit to get into production. Specifically: time from when code is committed to version control to when it's running in production.
GitHub Proxy: Time from first commit on a branch to PR merged to main. For a more actionable metric, use cycle time: PR opened to merged.
Why cycle time is often better than commit-to-merge lead time:
- First commit is often experimental: Developers may commit locally multiple times while exploring a solution before the "real" work begins
- PR creation signals readiness: When a developer opens a PR, they're saying "this is ready for review and potential merge"—that's your true start time
- Measurable bottlenecks: Cycle time lets you see exactly where delays happen (waiting for review, review duration, CI failures, merge conflicts)
CodePulse tracks both variants and breaks cycle time into four actionable components. For detailed analysis, see our Deployment Frequency and Lead Time Guide.
```python
# Example: Calculating Lead Time from the GitHub API
# For each merged PR:
first_commit_time = pr.commits[0].created_at
merge_time = pr.merged_at
lead_time = merge_time - first_commit_time

# Aggregate to the median (more robust than the mean for skewed distributions)
median_lead_time_hours = median(all_lead_times)

# Example result: median = 18 hours → High Performer
```
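For the cycle-time variant (PR opened to merged), both timestamps are on every pull request object returned by the GitHub REST API, so no per-commit lookups are needed. A rough, self-contained sketch; the repo name is a placeholder and only the first page of results is fetched:

```python
import os
import statistics
from datetime import datetime

import requests

REPO = "your-org/your-repo"  # placeholder
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

resp = requests.get(
    f"https://api.github.com/repos/{REPO}/pulls",
    headers=headers,
    params={"state": "closed", "base": "main", "per_page": 100},  # first page only
)
resp.raise_for_status()

def parse(ts):
    # GitHub returns ISO 8601 timestamps ending in "Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

cycle_times_hours = [
    (parse(pr["merged_at"]) - parse(pr["created_at"])).total_seconds() / 3600
    for pr in resp.json()
    if pr["merged_at"]  # closed-but-unmerged PRs have merged_at = null
]

if cycle_times_hours:
    print(f"Median cycle time: {statistics.median(cycle_times_hours):.1f} hours")
```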
Change Failure Rate from GitHub
DORA Definition: The percentage of changes to production that result in degraded service or require remediation (e.g., lead to service impairment or outage, require a hotfix, rollback, fix forward, or patch).
GitHub Proxy (Option 1): Revert commit ratio—count commits that revert previous changes as a proxy for production failures.
```bash
# Example: Detecting Reverts in Git
# Find revert commits (usually have "Revert" or "revert" in the message)
git log --grep="Revert" --grep="revert" -i --oneline --since="30 days ago"

# Calculate the ratio
# total_commits = 120
# revert_commits = 8
# change_failure_rate = (8 / 120) * 100 = 6.7% → Elite Performer
```
GitHub Proxy (Option 2): Hotfix branch frequency—if your team uses dedicated hotfix branches for production issues, count hotfix PRs merged as a proxy for failures.
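If you name hotfix branches consistently (a hotfix/ prefix is assumed here; adjust to your own convention), the ratio of hotfix merges to total merges is easy to compute once you have the PR list. A sketch that operates on PR dicts shaped like the GitHub API's pull request objects, demonstrated on a toy sample:

```python
# PR dicts shaped like GET /repos/{owner}/{repo}/pulls responses
def change_failure_rate(prs, hotfix_prefix="hotfix/"):
    merged = [pr for pr in prs if pr.get("merged_at")]
    hotfixes = [pr for pr in merged if pr["head"]["ref"].startswith(hotfix_prefix)]
    return 100 * len(hotfixes) / len(merged) if merged else 0.0

# Toy sample, not real data: two merged PRs, one of them a hotfix
sample = [
    {"merged_at": "2024-05-01T12:00:00Z", "head": {"ref": "feature/search"}},
    {"merged_at": "2024-05-02T09:30:00Z", "head": {"ref": "hotfix/login-500"}},
    {"merged_at": None, "head": {"ref": "feature/abandoned"}},
]
print(f"{change_failure_rate(sample):.1f}%")  # 50.0% on this toy sample
```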
GitHub Proxy (Option 3): PRs with failing status checks that were eventually merged—indicates code that passed review but failed automated testing, suggesting quality issues.
The hard truth: Change failure rate is the hardest DORA metric to measure accurately from Git alone. For true accuracy, you need incident management data (PagerDuty, Opsgenie, etc.) cross-referenced with deployments. If you track incidents in GitHub Issues with specific labels (incident, production-bug), you can correlate those with recent merges.
Time to Restore Service from GitHub
DORA Definition: How long it generally takes to restore service when a service incident occurs (e.g., unplanned outage, service impairment).
GitHub Proxy (Option 1): For teams that create GitHub Issues for incidents, measure time from issue creation (with incident label) to issue closure.
GitHub Proxy (Option 2): Time from revert commit to fix commit merged. This captures "we broke production, reverted it, and then fixed the underlying issue."
```python
# Example: Measuring MTTR from GitHub Issues
# For each incident issue:
incident_created = issue.created_at
incident_resolved = issue.closed_at
time_to_restore = incident_resolved - incident_created

# Aggregate to the median
median_mttr_hours = median(all_mttr_values)

# Example result: median = 2.5 hours → High Performer
```
The hard truth (again): Like change failure rate, Time to Restore Service is difficult to measure accurately from GitHub alone. Most engineering organizations need to integrate incident management tooling or at minimum maintain a rigorous practice of tracking incidents as GitHub Issues.
💡 Start with What You Can Measure
If you can't accurately measure all four keys immediately, start with the two throughput metrics (Deployment Frequency and Lead Time). These are straightforward to calculate from GitHub data and provide immediate value.
Add the two stability metrics (Change Failure Rate and Time to Restore) once you've established better incident tracking practices. Imperfect data for all four is better than perfect data for only two, but don't let perfect be the enemy of good.
Setting Up Measurement Without CI/CD Access
Many VPs and Directors of Engineering don't control their CI/CD infrastructure—DevOps or Platform teams do. Here's how to implement DORA metrics when you can't instrument your deployment pipelines.
Phase 1: Establish Your Development Workflow Patterns
Before you can choose the right proxies, you need to understand your actual workflow:
- Map the merge-to-deploy relationship: When a PR merges to main, how long until it's in production?
  - Immediate (automated CD) → Use merge timestamps directly
  - Within hours (manual trigger) → Add the average delay to merge timestamps
  - Batched weekly → Use release tag timestamps instead of merge timestamps
- Identify production branches: Does main represent production? Or do you have a separate production or release branch?
- Document release patterns: Do you deploy continuously, on a schedule (daily/weekly), or ad hoc?
This audit takes 1-2 hours and is essential for choosing accurate proxies. Interview 2-3 developers and your DevOps lead to understand the real workflow.
Phase 2: Connect GitHub as Your Single Source of Truth
Once you understand your workflow, configure your metrics tool to extract DORA data from GitHub:
- Deployment Frequency: Count PRs merged to your production branch per working day
- Lead Time: Calculate median time from PR creation to merge (cycle time)
- Change Failure Rate: Count revert commits or hotfix PRs as percentage of total merges
- Time to Restore: Track incident issues (if using GitHub Issues) or measure revert-to-fix time
Tools like CodePulse automatically extract these metrics from your GitHub data without requiring any CI/CD instrumentation. For manual tracking, you can query the GitHub API:
```text
# Example: GitHub API query for merged PRs (Deployment Frequency)
GET /repos/{owner}/{repo}/pulls?state=closed&base=main&per_page=100

# Filter results where merged_at is not null
# Count PRs merged in last 30 days
# Divide by working days (typically 21-22 per 30-day period)
```

Phase 3: Establish Baseline Performance
Before setting targets or making changes, collect 30-90 days of baseline data. This tells you where you currently stand:
DORA Baseline Performance (30-day snapshot)

This baseline becomes your "before" snapshot. All improvement initiatives will be measured against this baseline.
Phase 4: Create Your DORA Dashboard
Present your DORA metrics in a format that's accessible to both engineering teams and executives. Your dashboard should show:
- Current values: This week/month's performance for all four metrics
- Trend indicators: Are metrics improving or regressing vs. previous period?
- Performance classification: Elite, High, Medium, or Low according to DORA benchmarks
- Historical trends: 90-day rolling view showing improvement over time
- Team breakdowns: Compare metrics across different teams/repositories
For dashboard design best practices, see our Engineering Metrics Dashboard Guide.
Benchmarking Against Industry Standards
The DORA research team established performance benchmarks based on surveying tens of thousands of engineering teams. Here's how to interpret where your organization falls.
Complete DORA Performance Benchmark Table
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | On-demand (multiple deploys per day) | Between once per day and once per week | Between once per week and once per month | Fewer than once per month |
| Lead Time for Changes | Less than one hour | Between one day and one week | Between one week and one month | More than one month |
| Change Failure Rate | 0-15% | 16-30% | 16-30% | More than 30% |
| Time to Restore Service | Less than one hour | Less than one day | Between one day and one week | More than one week |
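For dashboards, the thresholds in this table translate directly into code. Below is a rough sketch for two of the metrics (deployment frequency as deploys per day, lead time in hours); the cutoffs come from the table above, and the boundary cases are bucketed by one reasonable reading rather than any official rule:

```python
def classify_deployment_frequency(deploys_per_day: float) -> str:
    # "On-demand (multiple deploys per day)" read as strictly more than one per day
    if deploys_per_day > 1:
        return "Elite"
    if deploys_per_day >= 1 / 7:    # between once per day and once per week
        return "High"
    if deploys_per_day >= 1 / 30:   # between once per week and once per month
        return "Medium"
    return "Low"

def classify_lead_time(hours: float) -> str:
    if hours < 1:                   # less than one hour
        return "Elite"
    if hours <= 7 * 24:             # up to one week
        return "High"
    if hours <= 30 * 24:            # up to one month
        return "Medium"
    return "Low"

print(classify_deployment_frequency(0.5))  # High (roughly every other day)
print(classify_lead_time(18))              # High (18 hours)
```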
Understanding Your Performance Profile
Most organizations don't fall cleanly into one category—you might be Elite at some metrics and Medium at others. Common patterns:
"Fast but fragile" teams: Elite deployment frequency and lead time, but Medium/Low change failure rate and time to restore. These teams ship quickly but break production frequently. Focus: Quality gates, testing practices, staged rollouts.
"Slow but stable" teams: Elite change failure rate and time to restore, but Medium/Low deployment frequency and lead time. These teams are careful but slow. Focus: Automation, reducing batch sizes, improving cycle time.
"Unbalanced" teams: High performers on throughput metrics but haven't invested in incident response. Elite deployment frequency and lead time, but no visibility into stability metrics. Focus: Implement incident tracking, establish on-call practices.
Comparing Against Industry Peers
DORA performance varies significantly by industry and company stage:
- Startups/Scale-ups: Typically achieve Elite/High throughput metrics (fast shipping) but Medium stability metrics (breaking things is acceptable when moving fast)
- Enterprise: Often Medium/High across all metrics—slower but more stable than startups
- Highly regulated (fintech, healthcare): Often Low/Medium throughput metrics due to compliance requirements, but Elite stability metrics (failures are not acceptable)
Don't compare your healthcare startup to a consumer SaaS company. Compare against companies with similar regulatory requirements and business constraints. For industry-specific guidance, see our guides on Engineering Metrics for Fintech and Engineering Metrics for Healthcare.
Common Implementation Pitfalls
After helping hundreds of engineering teams implement DORA metrics, we've seen the same mistakes repeatedly. Here's how to avoid them.
Pitfall 1: Measuring Without Context
The mistake: Tracking metrics in isolation without understanding what drives them or what trade-offs you're making.
Example: Your deployment frequency drops from 3/day to 1/day. Is that a problem? Maybe—or maybe you just finished a major migration that required larger, more careful changes.
The fix: Always annotate your metrics with context:
- Major releases or refactoring efforts
- Team size changes (people joining/leaving)
- Seasonal effects (holidays, end-of-year code freezes)
- Deliberate process changes (new review requirements, security audits)
Pitfall 2: Gaming the Metrics
The mistake: Using DORA metrics for individual performance evaluation, which incentivizes gaming behavior.
Example: If deployment frequency is tied to bonuses, developers will split every change into tiny PRs to inflate their merge count—even when larger PRs would be more appropriate.
The fix: Use DORA metrics only at the team/organization level. Never use them for individual performance reviews. Track individual contributions separately through qualitative feedback, peer reviews, and project impact.
Pitfall 3: Optimizing One Metric at the Expense of Others
The mistake: Focusing obsessively on one metric (usually deployment frequency) while ignoring the others.
Example: Pushing for Elite deployment frequency by removing code review requirements and skipping tests. Deployment frequency improves, but change failure rate skyrockets.
The fix: Track all four metrics together. Elite performers excel at both throughput and stability simultaneously. If improving one metric degrades another, you're not actually improving—you're just shifting where the problem shows up.
Pitfall 4: Comparing Teams Without Adjusting for Context
The mistake: Creating team leaderboards or directly comparing teams with different missions and constraints.
Example: Comparing your infrastructure team's deployment frequency (0.5/day) to your web frontend team's deployment frequency (5/day) and concluding the infrastructure team is "underperforming."
The fix: Compare teams against their own historical performance, not against each other. Platform teams, infrastructure teams, and security teams naturally have different DORA profiles than application teams. Judge each team by improvement, not absolute values.
Pitfall 5: Treating Elite Performance as the Only Goal
The mistake: Assuming every team must reach Elite status on all four metrics to be successful.
Example: A team moves from Low to Medium performance—a massive improvement—but leadership still sees them as "failing" because they're not Elite.
The fix: Celebrate improvement, not just absolute performance. The research shows that improving your metrics correlates with business outcomes—you don't need Elite status to benefit. A team that moves from Medium to High sees real benefits even if they never reach Elite.
Pitfall 6: Implementing Without Buy-In
The mistake: Rolling out DORA metrics as a top-down mandate without explaining why they matter or how they'll be used.
Example: Engineering teams see DORA metrics as "another management dashboard" and don't engage with them, or worse, actively try to game them.
The fix: Implement DORA metrics collaboratively:
- Explain why these specific metrics correlate with team success and reduced burnout
- Show how the metrics will be used (improvement tracking, not individual evaluation)
- Involve team leads in defining baselines and setting realistic targets
- Make metrics transparent—everyone should see the same dashboard you see
For a structured rollout approach, see our Engineering Metrics Rollout Playbook.
From Measurement to Improvement
Measuring DORA metrics is only valuable if you use that data to drive actual improvement. Here's a practical framework for turning metrics into better performance.
Step 1: Identify Your Biggest Constraint
Don't try to improve all four metrics simultaneously. Use your baseline data to identify which metric is your biggest bottleneck:
- If deployment frequency is Low/Medium: You're not shipping fast enough. Focus on reducing cycle time and batch sizes.
- If lead time is Low/Medium: You have process bottlenecks. Focus on identifying and removing delays (review time, CI time, approval processes).
- If change failure rate is Low/Medium: You're shipping broken code. Focus on quality gates (testing, review rigor, staged rollouts).
- If time to restore is Low/Medium: You can't recover from failures quickly. Focus on incident response (monitoring, runbooks, rollback procedures).
Step 2: Set a 90-Day Improvement Target
Choose realistic, achievable targets for your constraint metric:
- Realistic improvement rate: 20-30% improvement per quarter is achievable. 100% improvement requires fundamental process changes.
- Example targets:
- Deployment frequency: 1.2/day → 1.8/day (50% increase)
- Lead time: 48 hours → 32 hours (33% reduction)
- Change failure rate: 22% → 16% (approaching the Elite band)
- Time to restore: 6 hours → 4 hours (33% reduction)
Step 3: Identify Specific Interventions
For each target, identify 2-3 specific changes you'll make to drive improvement:
To improve Deployment Frequency:
- Reduce PR size (set a 400-line soft limit; a sketch for auditing PR sizes follows this list)
- Implement feature flags to allow shipping incomplete features
- Automate deployment triggers (remove manual gates)
- Encourage parallel work (multiple PRs in flight per developer)
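The 400-line soft limit above is easier to hold when oversized PRs are visible. Here is a sketch that flags large merged PRs via the GitHub REST API; note that the list endpoint omits size fields, so each PR is fetched individually (one extra request per PR), and the repo name and token are placeholders:

```python
import os

import requests

API = "https://api.github.com"
REPO = "your-org/your-repo"  # placeholder
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
SOFT_LIMIT = 400  # additions + deletions

# Recently closed PRs against main (first page only)
prs = requests.get(
    f"{API}/repos/{REPO}/pulls",
    headers=headers,
    params={"state": "closed", "base": "main", "per_page": 30},
).json()

for pr in prs:
    if not pr["merged_at"]:
        continue  # closed without merging
    # additions/deletions are only returned by the single-PR endpoint
    detail = requests.get(f"{API}/repos/{REPO}/pulls/{pr['number']}", headers=headers).json()
    size = detail["additions"] + detail["deletions"]
    if size > SOFT_LIMIT:
        print(f"#{pr['number']} ({size} lines changed): {pr['title'][:60]}")
```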
To improve Lead Time:
- Balance review load across more team members (no single bottleneck reviewer)
- Set SLA for first review (PRs reviewed within 4 business hours)
- Parallelize CI pipelines to reduce test wait time
- Enable auto-merge for approved PRs (remove manual merge step)
To improve Change Failure Rate:
- Increase test coverage for critical paths
- Implement staged rollouts (canary to 5% → 50% → 100%)
- Require two approvals for high-risk changes
- Add pre-merge integration testing in production-like environments
To improve Time to Restore:
- Implement automated rollback procedures
- Create incident runbooks for common failure modes
- Improve observability (add monitoring for key user journeys)
- Practice incident response with quarterly game days
Step 4: Measure Weekly, Adjust Monthly
Track your target metric weekly to see if your interventions are working:
- Week 1-2: Implement changes, expect metrics to dip slightly as team adjusts to new processes
- Week 3-6: Metrics should start improving as new practices take hold
- Week 7-12: Metrics should show sustained improvement; if not, adjust interventions
If you're not seeing improvement by week 8, hold a retrospective to understand why:
- Are people actually following the new process?
- Did we identify the right bottleneck?
- Are there hidden dependencies we didn't account for?
Step 5: Sustain and Expand
Once you've improved your constraint metric, celebrate the win and then move to the next constraint:
- Quarter 1: Improve deployment frequency from Low to Medium
- Quarter 2: Improve lead time from Medium to High
- Quarter 3: Improve change failure rate from Medium to High
- Quarter 4: Improve time to restore from Medium to Elite
This iterative approach prevents overwhelm and ensures each improvement becomes embedded before moving to the next challenge.
🚀 Track DORA Metrics in CodePulse
CodePulse automatically calculates all four DORA metrics from your GitHub data:
- Deployment Frequency: PRs merged to main per working day
- Lead Time: PR cycle time with component breakdown
- Change Failure Rate: Revert ratio and hotfix frequency
- Time to Restore: Incident issue duration
All metrics include trend analysis, performance classification, and team-level breakdowns—no CI/CD instrumentation required.
Reporting to Executives and Boards
When presenting DORA metrics to non-technical stakeholders, focus on business impact:
- Instead of: "We improved our deployment frequency from 1.2 to 2.1 per day"
- Say: "We've reduced time-to-market for customer-requested features by 40%, allowing us to respond to competitive threats within days instead of weeks"
- Instead of: "Our change failure rate decreased from 22% to 14%"
- Say: "We've reduced production incidents by 35%, improving customer experience and reducing on-call burden on the team"
For comprehensive guidance on board-ready reporting, see our Board-Ready Engineering Metrics Guide.
Final Thoughts: DORA Metrics as a Foundation
The Four Keys aren't the only metrics you should track, but they provide a solid foundation for understanding software delivery performance. Once you've implemented DORA metrics and established a baseline, you can expand to track additional dimensions:
- Team health metrics (review load distribution, burnout signals)
- Code quality metrics (test coverage, technical debt)
- Individual growth metrics (for mentoring, not performance evaluation)
- Business alignment metrics (feature adoption, customer impact)
But start with DORA. These four metrics, measured consistently over time, will give you the visibility and improvement framework your engineering organization needs to level up.
See these insights for your team
CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.
Free tier available. No credit card required.
Related Guides
DORA Metrics Are Being Weaponized. Here's the Fix
DORA metrics were designed for research, not management. Learn how to use them correctly as signals for improvement, not targets to game.
Measuring Deploy Frequency Without CI/CD (The Hack That Works)
Master DORA deployment frequency and lead time using GitHub data alone, without requiring CI/CD pipeline access.