The Slack Alert That Catches Stuck PRs Before Standup

How to configure alerts for stuck PRs, review SLA breaches, and key metric changes to stay informed without constant dashboard checking.

9 min read · Updated January 15, 2025 · By CodePulse Team

Dashboards are great for analysis, but they require you to look at them. Alerts bring the insights to you—notifying you when PRs are stuck, SLAs are breached, or metrics drift outside acceptable ranges. Done well, alerts let you stay informed without constantly checking dashboards.

This guide covers how to set up effective alerts for engineering metrics, including what to alert on, how to configure thresholds, how to avoid the dreaded alert fatigue, and how to build an alert system that actually drives action.

Our Take

Most engineering teams have more alerts than they need and fewer alerts that matter. If your team ignores alerts, you don't have an "alert fatigue" problem—you have a "bad alerts" problem. Every alert should pass this test: "If this fires at 3 AM, would I be glad we set it up?" If the answer is no, delete it.

Why Real-Time Alerts Beat Reports

The Problem with Dashboard-Only Visibility

Dashboards are valuable, but they have limitations:

  • Active checking required: If you don't look, you don't know
  • Delayed awareness: Issues fester until someone happens to check
  • Context switching: You have to stop what you're doing to look
  • Information overload: Dashboards show everything, making it hard to spot what matters

"The best alert is one that tells you something you didn't know, at a time when you can still do something about it."

The Value of Proactive Alerts

Alerts solve these problems by:

  • Pushing information to you: No need to remember to check
  • Catching issues early: Immediate notification when something goes wrong
  • Filtering signal from noise: Only alert on what exceeds thresholds
  • Enabling faster response: Minutes instead of hours or days

Alerts vs. Reports: When to Use Each

Use Real-Time Alerts for:
  - Threshold breaches (PR stuck > 24 hours)
  - Anomalies (cycle time spiked this week)
  - Process violations (PR merged without approval)
  - Time-sensitive issues (Friday deploy with no reviewer)
  - Individual PR states that need immediate attention

Use Daily Digests for:
  - Aggregate metrics (5 PRs are stuck today)
  - Trend summaries (cycle time up 15% this week)
  - Team health snapshots (review load imbalance)
  - Non-urgent patterns worth noting

Use Weekly Reports for:
  - Trends over time (cycle time month-over-month)
  - Comparative analysis (team A vs team B)
  - Executive summaries (weekly health scorecard)
  - Deep dives (why did cycle time increase?)

The Alert Fatigue Problem

How Alert Fatigue Kills Your System

Alert fatigue is the silent killer of engineering alerting systems. It happens gradually: you start with a few important alerts, add more over time, and suddenly your team ignores them all.

"An alert that's ignored is worse than no alert at all. It creates the illusion of monitoring while providing none of the benefit."

Signs your team has alert fatigue:

  • Alerts go unacknowledged for hours or days
  • Team members mute or filter alert channels
  • "It's probably nothing" attitude toward new alerts
  • Real issues missed because alerts are ignored
  • New team members ask "do we actually look at these?"
  • Alert channels have hundreds of unread messages

Our Take

If your team has more than 10 alerts firing per day on average, you have too many. At that volume, each alert gets a couple of minutes of attention at best before the next one lands. That's not alerting, it's noise generation. Cut ruthlessly until every alert feels important.

The Psychology of Alert Fatigue

Understanding why alert fatigue happens helps you prevent it:

  • Cry wolf effect: After enough false positives, the brain learns to dismiss all alerts from that source
  • Cognitive overload: More than 3-5 pieces of information at once overwhelms working memory
  • Decision fatigue: Each alert requires a decision; too many depletes willpower
  • Learned helplessness: If alerts can't be acted on, people stop trying

Alert Prioritization Framework

The P0-P3 Priority System

Not all alerts deserve the same response. Implement a clear priority system that everyone understands:

P0 - CRITICAL (Immediate Response Required)
├── Definition: Production impact or security risk
├── Response time: Minutes
├── Notification: Phone call, SMS, @here in dedicated channel
├── Engineering examples:
│   ├── Merge to main with failing security scan
│   ├── PR merged without any approval to protected branch
│   └── Suspected credential leak in committed code
└── Owner: On-call engineer

P1 - HIGH (Same Business Day)
├── Definition: Process violation or significant blocker
├── Response time: 2-4 hours
├── Notification: Direct Slack message + channel post
├── Engineering examples:
│   ├── PR stuck in review > 48 hours
│   ├── SLA breach on critical path work
│   └── Test failure rate spike > 50%
└── Owner: Team lead or PR author

P2 - MEDIUM (Next Business Day)
├── Definition: Needs attention but not urgent
├── Response time: 24-48 hours
├── Notification: Team channel only
├── Engineering examples:
│   ├── PR awaiting review > 24 hours
│   ├── Cycle time trending up week-over-week
│   └── Review coverage dropped below target
└── Owner: Team or assigned reviewer

P3 - LOW (Weekly Review)
├── Definition: Informational, track for trends
├── Response time: Within a week
├── Notification: Daily/weekly digest only
├── Engineering examples:
│   ├── PR size exceeds guideline (non-blocking)
│   ├── Test flakiness detected
│   └── Minor metric drift
└── Owner: Team during retrospective
Figure: alert priority pyramid (P0 Critical through P3 Low) with expected response times and recommended Slack channels
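
The priority system is easier to enforce when it lives in configuration rather than in people's heads. Here's a minimal sketch in Python; the channel names mirror the Slack patterns later in this guide and are placeholders to adapt:

# Sketch: the P0-P3 ladder above as routing configuration an alert script can read.
# Channel names are placeholders; adapt them to your workspace.
PRIORITY_ROUTING = {
    "P0": {"response": "minutes",       "channel": "#eng-alerts-critical", "mention": "@here"},
    "P1": {"response": "2-4 hours",     "channel": "#eng-alerts",          "mention": "DM + channel post"},
    "P2": {"response": "24-48 hours",   "channel": "#eng-alerts",          "mention": None},
    "P3": {"response": "within a week", "channel": "#eng-metrics-digest",  "mention": None},
}

def route(priority: str) -> dict:
    """Look up where an alert of a given priority should be delivered."""
    return PRIORITY_ROUTING[priority]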

Which Metrics Deserve Real-Time Alerts

The question isn't "can we alert on this?" but "should we?" Here's a framework for deciding:

REAL-TIME ALERTS (Within minutes)
┌─────────────────────────────────────────────────────────────┐
│ Criteria:                                                    │
│ - Actionable right now                                      │
│ - Getting worse by the minute                               │
│ - Clear owner who can respond                               │
├─────────────────────────────────────────────────────────────┤
│ Examples:                                                    │
│ ✓ PR merged without approval                                │
│ ✓ Critical PR stuck > 4 hours                               │
│ ✓ Failing checks overridden to merge                        │
│ ✓ SLA breach on customer-facing work                        │
│ ✗ Cycle time increased 10% this week (use digest)           │
│ ✗ PR is large (informational, not urgent)                   │
└─────────────────────────────────────────────────────────────┘

DAILY DIGEST (Once per day, morning)
┌─────────────────────────────────────────────────────────────┐
│ Criteria:                                                    │
│ - Needs attention today but not this minute                 │
│ - Multiple items that should be reviewed together           │
│ - Trend data that's meaningful in aggregate                 │
├─────────────────────────────────────────────────────────────┤
│ Examples:                                                    │
│ ✓ 5 PRs awaiting review > 24 hours                          │
│ ✓ Yesterday's velocity summary                              │
│ ✓ Review load imbalance across team                         │
│ ✓ PRs approved but not merged                               │
└─────────────────────────────────────────────────────────────┘

WEEKLY REPORT (Monday morning)
┌─────────────────────────────────────────────────────────────┐
│ Criteria:                                                    │
│ - Trend data over time                                      │
│ - Strategic, not tactical                                   │
│ - Requires analysis, not immediate action                   │
├─────────────────────────────────────────────────────────────┤
│ Examples:                                                    │
│ ✓ Week-over-week cycle time comparison                      │
│ ✓ Review coverage trends                                    │
│ ✓ Team contribution balance                                 │
│ ✓ SLA compliance rate                                       │
└─────────────────────────────────────────────────────────────┘

Essential Alerts to Set Up

Category 1: Stuck Work Alerts

Stuck PRs are one of the biggest sources of wasted time, and these alerts ensure nothing falls through the cracks. For the broader picture, see our guide on reducing PR cycle time. A sketch of how to automate these checks follows the list below.

PR awaiting review too long:

  • Threshold: PR open > 24 hours with no review
  • Action: Notify author and potential reviewers
  • Escalation: After 48 hours, notify team lead

PR stuck in review:

  • Threshold: Changes requested > 24 hours ago, no update
  • Action: Remind author to address feedback
  • Context: Include link to PR and pending comments

Approved but not merged:

  • Threshold: Approved > 8 hours ago, not merged
  • Action: Remind author to merge or explain delay
  • Context: Check for failing CI that might be blocking
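
If you want to script these checks yourself rather than rely on a tool, a minimal sketch using the GitHub REST API and a Slack incoming webhook might look like this (GITHUB_TOKEN, SLACK_WEBHOOK_URL, and the org/repo name are placeholders; pagination and business-hours handling are left out for brevity):

# Sketch: flag open PRs with no review after a threshold and post to Slack.
# GITHUB_TOKEN, SLACK_WEBHOOK_URL, and REPO are placeholders to replace.
import os
from datetime import datetime, timezone

import requests

GITHUB_API = "https://api.github.com"
REPO = "org/repo"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
REVIEW_THRESHOLD_HOURS = 24

def hours_since(iso_timestamp: str) -> float:
    opened = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    return (datetime.now(timezone.utc) - opened).total_seconds() / 3600

def stuck_prs():
    prs = requests.get(f"{GITHUB_API}/repos/{REPO}/pulls",
                       headers=HEADERS, params={"state": "open"}).json()
    for pr in prs:
        if pr.get("draft"):  # drafts are a common false positive, skip them
            continue
        reviews = requests.get(
            f"{GITHUB_API}/repos/{REPO}/pulls/{pr['number']}/reviews",
            headers=HEADERS).json()
        waiting = hours_since(pr["created_at"])
        if not reviews and waiting > REVIEW_THRESHOLD_HOURS:
            yield pr, waiting

def notify(pr, waiting):
    text = (f"🟡 P2: PR awaiting review > {REVIEW_THRESHOLD_HOURS}h\n"
            f"📋 {pr['title']} (#{pr['number']}) by @{pr['user']['login']}\n"
            f"⏱️ Waiting: {waiting:.0f} hours\n"
            f"🔗 {pr['html_url']}")
    requests.post(os.environ["SLACK_WEBHOOK_URL"], json={"text": text})

if __name__ == "__main__":
    for pr, waiting in stuck_prs():
        notify(pr, waiting)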

Category 2: SLA Breach Alerts

If you have review SLAs (and you should—see our PR SLA implementation guide), alert when they're breached:

Time to first review SLA:

  • Example SLA: First review within 4 hours during business hours
  • Alert: When PR passes 4-hour mark with no review
  • Route to: Assigned reviewers, then backup reviewers

Cycle time SLA:

  • Example SLA: PRs merged within 2 business days
  • Alert: When PR approaches or exceeds target
  • Route to: Author and manager

Category 3: Quality Risk Alerts

Alert when review quality might be compromised:

Rubber-stamp reviews:

  • Condition: Large PR approved in under 5 minutes
  • Alert: "PR #123 (450 lines) approved after 3 minutes—verify review quality"
  • Route to: Team lead or secondary reviewer

Merge without approval:

  • Condition: PR merged with zero approvals
  • Alert: Immediate notification to team lead
  • Context: Was this an emergency? Document justification

Failing checks merged:

  • Condition: CI checks failing but PR merged anyway
  • Alert: "PR #456 merged with failing tests—investigate"
  • Route to: Author and on-call
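
The merge-without-approval check is similarly scriptable. Here's a sketch that scans recently merged PRs for zero approving reviews (same placeholder token and repo as the stuck-PR sketch above):

# Sketch: find recently merged PRs that have zero approving reviews.
import os

import requests

GITHUB_API = "https://api.github.com"
REPO = "org/repo"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

def merged_without_approval(limit: int = 30):
    # Recently updated closed PRs; keep only the ones that were actually merged.
    prs = requests.get(
        f"{GITHUB_API}/repos/{REPO}/pulls",
        headers=HEADERS,
        params={"state": "closed", "sort": "updated",
                "direction": "desc", "per_page": limit},
    ).json()
    for pr in prs:
        if not pr.get("merged_at"):
            continue
        reviews = requests.get(
            f"{GITHUB_API}/repos/{REPO}/pulls/{pr['number']}/reviews",
            headers=HEADERS).json()
        if not any(r.get("state") == "APPROVED" for r in reviews):
            yield pr  # candidate for a P0 "merged without approval" alert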

Category 4: Trend Alerts

Alert when metrics trend in the wrong direction:

Cycle time increasing:

  • Condition: Week-over-week cycle time up > 20%
  • Alert: Weekly summary with trend data
  • Route to: Engineering manager

Review coverage dropping:

  • Condition: Percentage of PRs with reviews drops below threshold
  • Alert: "Only 85% of PRs were reviewed this week (target: 95%)"
  • Route to: Team lead
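
Trend alerts boil down to a percentage-change check. A small sketch, assuming you already have per-PR cycle times (in days) bucketed by week:

# Sketch: flag a >20% week-over-week increase in average cycle time.
from statistics import mean

def wow_change_pct(last_week: list[float], this_week: list[float]) -> float:
    """Percent change in mean cycle time versus last week."""
    baseline, current = mean(last_week), mean(this_week)
    return (current - baseline) / baseline * 100

def should_alert(last_week, this_week, threshold_pct: float = 20.0) -> bool:
    return wow_change_pct(last_week, this_week) > threshold_pct

# A 1.6-day average last week vs. 2.1 this week is a ~30% jump, so this prints True.
print(should_alert([1.5, 1.8, 1.6, 1.5], [2.0, 2.3, 2.1, 1.9]))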

🔔 How CodePulse Helps

CodePulse's Alerts page lets you create custom alert rules:

  • Set thresholds for any metric (cycle time, PR count, review coverage)
  • Choose comparison operators (greater than, less than, etc.)
  • Receive email notifications when alerts trigger
  • View active and historical alerts on your Dashboard

Slack Channel Organization Strategies

The Channel Structure That Works

Poor channel organization is a leading cause of alert fatigue. Here are three proven patterns:

Pattern 1: By Severity (Recommended for most teams)

#eng-alerts-critical
├── P0 alerts only
├── @here notifications enabled
├── Expected volume: < 5 per week
└── Must acknowledge within 15 minutes

#eng-alerts
├── P1 and P2 alerts
├── No @here, no @channel
├── Expected volume: 5-15 per day
└── Check at least twice daily

#eng-metrics-digest
├── Daily summaries, weekly reports
├── P3 and informational
├── Muting OK
└── Review during planning/retros

Pattern 2: By Team (For larger organizations)

#team-platform-alerts
├── All alerts for Platform team's repos
├── Team members only
└── Reduces noise for unrelated work

#team-frontend-alerts
├── All alerts for Frontend team's repos
├── Team members only
└── Context stays relevant

#eng-alerts-cross-team
├── Alerts that span multiple teams
├── Escalations
└── Organization-wide issues

Pattern 3: Hybrid (Severity + Team)

#eng-alerts-critical      (all teams, P0 only)
#team-platform-alerts     (platform team, P1-P3)
#team-frontend-alerts     (frontend team, P1-P3)
#eng-metrics-weekly       (all teams, weekly digest)

Our Take

The hybrid pattern works best for teams of 15-50 engineers. Smaller teams can use severity-only. Larger orgs need team-based routing. But whatever you choose, be consistent. The worst outcome is channels that exist but nobody knows which one to watch.

Example Alert Messages That Get Action

The Anatomy of an Effective Alert

Good alerts have five elements: severity indicator, clear problem, relevant context, suggested action, and a direct link. Bad alerts have only one or two.

Bad alert (no context, no action):

PR #1234 needs review

Good alert (complete information):

🟡 P2: PR Awaiting Review > 24 Hours

📋 PR: Add user authentication flow (#1234)
👤 Author: @alice
⏱️ Waiting since: 26 hours (opened Tuesday 2pm)
👥 Requested reviewers: @bob, @carol

📊 Context:
  • PR size: 245 lines (+180, -65)
  • Linked issue: AUTH-456 (high priority)
  • CI status: ✅ All checks passing

🎯 Action needed: Review or reassign
🔗 https://github.com/org/repo/pull/1234

Templates for Common Alerts

═══════════════════════════════════════════════════════════════
TEMPLATE: Stuck PR Alert
═══════════════════════════════════════════════════════════════
🟡 P2: PR Awaiting Review > {threshold}

📋 PR: {title} (#{number})
👤 Author: @{author}
⏱️ Waiting: {hours_waiting} hours (opened {date_opened})
👥 Reviewers: {reviewers}

📊 Size: {additions}+ / {deletions}- ({total_changes} lines)

🎯 Action: Please review or ping for reassignment
🔗 {pr_url}

═══════════════════════════════════════════════════════════════
TEMPLATE: SLA Breach Alert
═══════════════════════════════════════════════════════════════
🔴 P1: Review SLA Breached

📋 PR: {title} (#{number})
👤 Author: @{author}
⏱️ SLA target: {sla_hours}h | Actual: {actual_hours}h
👥 Reviewers: {reviewers}

📊 Impact: {linked_issue_priority} priority work blocked

🎯 Escalation: {team_lead} - please assign a reviewer immediately
🔗 {pr_url}

═══════════════════════════════════════════════════════════════
TEMPLATE: Quality Risk Alert
═══════════════════════════════════════════════════════════════
🔴 P1: Potential Rubber-Stamp Review

📋 PR: {title} (#{number})
📊 Size: {total_lines} lines changed
⏱️ Review time: {review_minutes} minutes
👤 Reviewer: @{reviewer}

⚠️ Concern: Large PR approved very quickly

🎯 Action: @{team_lead} - verify review quality
🔗 {pr_url}

═══════════════════════════════════════════════════════════════
TEMPLATE: Process Violation Alert
═══════════════════════════════════════════════════════════════
🔴 P0: PR Merged Without Approval

📋 PR: {title} (#{number})
👤 Author: @{author}
🕐 Merged at: {merge_time}
⚠️ Approvals: 0

🎯 Immediate action required:
  1. Was this an emergency? Document in PR
  2. If not: revert and get proper approval

🔗 {pr_url}
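
These templates map directly onto message-building code. Here's a sketch that fills the stuck-PR template's {placeholders} with Python's str.format and posts it to a Slack incoming webhook (the webhook URL is an assumed environment variable; the field values reuse the example PR from earlier):

# Sketch: render the stuck-PR template above and post it to a webhook.
import os

import requests

STUCK_PR_TEMPLATE = (
    "🟡 P2: PR Awaiting Review > {threshold}\n\n"
    "📋 PR: {title} (#{number})\n"
    "👤 Author: @{author}\n"
    "⏱️ Waiting: {hours_waiting} hours (opened {date_opened})\n"
    "👥 Reviewers: {reviewers}\n\n"
    "📊 Size: {additions}+ / {deletions}- ({total_changes} lines)\n\n"
    "🎯 Action: Please review or ping for reassignment\n"
    "🔗 {pr_url}"
)

def send_stuck_pr_alert(fields: dict) -> None:
    requests.post(os.environ["SLACK_WEBHOOK_URL"],
                  json={"text": STUCK_PR_TEMPLATE.format(**fields)})

send_stuck_pr_alert({
    "threshold": "24 hours",
    "title": "Add user authentication flow", "number": 1234, "author": "alice",
    "hours_waiting": 26, "date_opened": "Tuesday 2pm", "reviewers": "@bob, @carol",
    "additions": 180, "deletions": 65, "total_changes": 245,
    "pr_url": "https://github.com/org/repo/pull/1234",
})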

Configuring Alert Thresholds

Start with Baselines

Before setting thresholds, understand your current performance. For guidance on what metrics to baseline, see our engineering metrics dashboard guide.

  1. Measure current metrics over 30-90 days
  2. Calculate averages and standard deviations
  3. Identify natural variation vs. problems

Example baseline analysis:

PR cycle time (last 90 days):
  Mean: 1.8 days
  Std dev: 0.6 days
  90th percentile: 2.8 days

Threshold options:
  Conservative: Alert at > 3 days (mean + 2 std dev)
  Moderate: Alert at > 2.5 days (90th percentile)
  Aggressive: Alert at > 2 days (above average)

Recommendation: Start conservative, tighten over time
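
The three threshold options can be derived mechanically from your history. A sketch using Python's statistics module, assuming you have a list of per-PR cycle times in days:

# Sketch: derive the threshold options above from historical cycle times.
from statistics import mean, quantiles, stdev

def threshold_options(cycle_times_days: list[float]) -> dict:
    avg = mean(cycle_times_days)
    sd = stdev(cycle_times_days)
    p90 = quantiles(cycle_times_days, n=10)[-1]  # 90th percentile
    return {
        "conservative": avg + 2 * sd,  # alert only on clear outliers
        "moderate": p90,               # alert on the slowest ~10% of PRs
        "aggressive": avg,             # alert on anything above average
    }

# With a 1.8-day mean and 0.6-day std dev, "conservative" lands at 3.0 days,
# matching the example analysis above.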

Threshold Setting Guidelines

For stuck PR alerts:

  • Start with 24 hours for first review alert
  • Adjust based on team norms and timezone distribution
  • Consider business hours vs. calendar hours

For SLA alerts:

  • Set threshold at the SLA target, not above it
  • Consider warning alerts at 80% of SLA
  • Example: 4-hour SLA → warn at 3.2 hours, alert at 4 hours

For trend alerts:

  • Use percentage change, not absolute values
  • 20-30% week-over-week change is usually significant
  • Consider requiring consecutive weeks before alerting

Iterating on Thresholds

Thresholds should evolve over time:

  1. Start conservative: Better to miss some alerts than drown in noise
  2. Track false positives: If most alerts aren't actionable, raise threshold
  3. Track false negatives: If issues slip through, lower threshold
  4. Tighten as you improve: As team gets faster, lower cycle time thresholds

Slack Workflow Builder Integrations

Automating Alert Response

Slack Workflow Builder can automate common responses to alerts, reducing manual toil and ensuring consistent handling.

Workflow 1: Alert Acknowledgment

Trigger: Emoji reaction 👀 on alert message
Actions:
  1. Add thread reply: "@{user} is investigating"
  2. Update message with: "⏳ Being handled by @{user}"
  3. Set reminder for user: "Follow up on alert" in 2 hours

Result: Team knows who's handling it, owner has reminder

Workflow 2: Escalation Request

Trigger: Emoji reaction 🆘 on alert message
Actions:
  1. Send DM to team lead: "Alert escalation requested"
  2. Add thread reply: "Escalated to @{team_lead}"
  3. Post to #eng-alerts-critical if P2 or lower

Result: One-click escalation without context loss

Workflow 3: False Positive Tracking

Trigger: Emoji reaction ❌ on alert message
Actions:
  1. Add thread reply: "Marked as false positive by @{user}"
  2. Log to a spreadsheet: alert type, date, user, and reason (collected via a form)
  3. If 5+ false positives this week: notify alert admin

Result: Data for threshold tuning, trend visibility

Workflow 4: Daily Digest Summary

Trigger: Scheduled, 9am Monday-Friday
Actions:
  1. Collect unresolved alerts from past 24 hours
  2. Group by priority and type
  3. Post summary to #eng-metrics-digest:
     "📊 Daily Alert Summary
      🔴 P0/P1: {count} ({unresolved} unresolved)
      🟡 P2: {count} ({unresolved} unresolved)
      🟢 P3: {count}

      Top issues: {list of oldest unresolved}"

Result: Morning awareness without notification fatigue
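
If you outgrow Workflow Builder, the same acknowledgment behavior can be scripted with Slack's Bolt SDK. A minimal sketch of Workflow 1 using slack_bolt in Socket Mode (bot and app tokens are assumed to be configured with the usual reaction and chat scopes):

# Sketch: acknowledge an alert when someone reacts with 👀 (Workflow 1 above).
import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])

@app.event("reaction_added")
def handle_ack(event, client):
    if event["reaction"] != "eyes":  # the 👀 emoji
        return
    channel = event["item"]["channel"]
    ts = event["item"]["ts"]  # timestamp of the alert message
    # Thread reply so the channel can see who picked the alert up.
    client.chat_postMessage(
        channel=channel,
        thread_ts=ts,
        text=f"⏳ <@{event['user']}> is investigating this alert.",
    )

if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()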

Escalation Patterns When Alerts Are Ignored

Time-Based Escalation

When alerts go unacknowledged, automatic escalation prevents issues from festering. Here's a proven pattern:

ESCALATION LADDER: Stuck PR

T+0h:   PR opened
        └─ No alert (normal state)

T+4h:   First alert
        ├─ Recipients: Assigned reviewers only
        ├─ Channel: Thread in #eng-alerts
        └─ Message: "PR awaiting first review"

T+8h:   Second alert (if no activity)
        ├─ Recipients: Reviewers + PR author
        ├─ Channel: #eng-alerts (new message)
        └─ Message: "Still awaiting review - needs attention"

T+24h:  Team escalation
        ├─ Recipients: Add team channel
        ├─ Channel: #team-{name} + #eng-alerts
        └─ Message: "PR stuck 24h+ - team please help unblock"

T+48h:  Manager escalation
        ├─ Recipients: Add team lead DM
        ├─ Channel: Previous + DM to lead
        └─ Message: "Requires intervention - SLA significantly breached"

T+72h:  Skip-level escalation
        ├─ Recipients: Add engineering manager
        ├─ Channel: Previous + DM to EM
        └─ Message: "Chronic blocker - process review needed"

Our Take

Escalation should feel uncomfortable—for everyone. If your escalation ladder gets used frequently, you have a process problem, not an alerting problem. The goal is for escalation to happen rarely because people respond to the initial alert.

Ownership-Based Escalation

Sometimes the right escalation is lateral (to a different owner), not vertical (to management):

LATERAL ESCALATION: Reviewer Unavailable

If assigned reviewer:
  ├─ Is OOO (calendar check)
  ├─ Has >3 pending reviews (overloaded)
  └─ Hasn't responded in 8 hours

Then:
  1. Find backup reviewer (CODEOWNERS or rotation)
  2. Auto-assign backup
  3. Notify original reviewer: "Reassigned to @backup due to {reason}"
  4. Track for reviewer load balancing

The Quiet Hours Concept

Non-Urgent Alerts Don't Need 24/7 Delivery

Not every alert needs to interrupt dinner or wake someone up. Implementing quiet hours improves quality of life without sacrificing coverage for what truly matters.

QUIET HOURS CONFIGURATION

P0 - Critical:
  └─ Always delivered immediately (no quiet hours)

P1 - High:
  ├─ Quiet hours: 10pm - 7am local time
  └─ Held alerts delivered at 7am

P2 - Medium:
  ├─ Quiet hours: 8pm - 9am local time
  ├─ Weekends: Held until Monday 9am
  └─ Held alerts delivered in morning digest

P3 - Low:
  ├─ Quiet hours: 6pm - 10am local time
  ├─ Weekends: No delivery
  └─ Delivered in daily digest only

TIMEZONE HANDLING:
  ├─ Use each user's local timezone
  ├─ For team channels: Use team's primary timezone
  └─ Cross-timezone teams: Deliver during overlap hours

"An engineer who sleeps well reviews code better than one who was woken up by a P3 alert about a non-urgent PR."

Implementing Quiet Hours in Slack

# Approach 1: Slack's Built-in DND
Encourage team members to set personal notification schedules:
Settings > Notifications > Notification schedule

# Approach 2: Alert Queue System
Your alerting tool holds non-critical alerts:
1. Check alert priority
2. Check recipient's timezone
3. If P2+ and within quiet hours: queue
4. Deliver queued alerts at quiet hours end

# Approach 3: Digest Channels
Route P2/P3 to digest-only channels:
- Real-time: #eng-alerts-critical (P0/P1 only)
- Digest: #eng-alerts-daily (P2/P3, posted at 9am)
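
Approach 2 reduces to a small decision function. A sketch using the quiet-hour windows above (all times are the recipient's local time; P0 is never held):

# Sketch of Approach 2: hold non-critical alerts during quiet hours.
from datetime import datetime, time

QUIET_HOURS = {  # priority -> (start, end); each window spans midnight
    "P1": (time(22, 0), time(7, 0)),
    "P2": (time(20, 0), time(9, 0)),
    "P3": (time(18, 0), time(10, 0)),
}

def in_quiet_hours(priority: str, local_now: datetime) -> bool:
    window = QUIET_HOURS.get(priority)
    if window is None:  # P0: always deliver immediately
        return False
    start, end = window
    now = local_now.time()
    return now >= start or now < end  # true for, say, 23:00 or 06:00

def deliver_or_queue(priority: str, local_now: datetime) -> str:
    return "queue" if in_quiet_hours(priority, local_now) else "deliver"

# A P2 alert at 6:30am local time is held for the morning digest.
print(deliver_or_queue("P2", datetime(2025, 1, 15, 6, 30)))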

Measuring Alert Effectiveness

Meta-Metrics: Alerts About Your Alerts

Track these metrics to ensure your alerting system is healthy:

ALERT HEALTH DASHBOARD

Volume Metrics:
├── Alerts per day/week (by priority)
├── Trend: Is volume increasing or decreasing?
└── Target: <10 P1/P2 alerts per day for a 10-person team

Response Metrics:
├── Time to acknowledge (first reaction/reply)
├── Time to resolve (underlying issue fixed)
├── Acknowledgment rate (% alerts that get any response)
└── Targets:
    P0: Acknowledge <15min, Resolve <1hr
    P1: Acknowledge <1hr, Resolve <4hr
    P2: Acknowledge <4hr, Resolve <24hr

Quality Metrics:
├── False positive rate (alerts that needed no action)
├── False negative rate (issues missed by alerts)
├── Action rate (alerts that resulted in meaningful action)
└── Targets:
    False positive: <10%
    False negative: <5%
    Action rate: >80%

Engagement Metrics:
├── Click-through rate (% that clicked the link)
├── Thread participation (% with discussion)
├── Escalation rate (% that escalated)
└── Use for: Identifying poorly-formatted alerts
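
Most of these meta-metrics fall out of a simple alert log. A sketch, assuming your tooling records when each alert fired, when it was acknowledged, and whether it led to action or was a false positive:

# Sketch: compute a few of the health metrics above from an alert log.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class AlertRecord:
    fired_at: datetime
    acknowledged_at: Optional[datetime]  # None if never acknowledged
    action_taken: bool
    false_positive: bool

def alert_health(records: list[AlertRecord]) -> dict:
    if not records:
        return {}
    total = len(records)
    acked = [r for r in records if r.acknowledged_at]
    ack_minutes = [(r.acknowledged_at - r.fired_at).total_seconds() / 60
                   for r in acked]
    return {
        "volume": total,
        "ack_rate_pct": 100 * len(acked) / total,
        "mean_ack_minutes": sum(ack_minutes) / len(acked) if acked else None,
        "action_rate_pct": 100 * sum(r.action_taken for r in records) / total,
        "false_positive_pct": 100 * sum(r.false_positive for r in records) / total,
    }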

Monthly Alert Review Process

Schedule a monthly review to tune your alerting system:

  1. Volume review: Are we drowning in alerts? Cut low-value ones.
  2. False positive review: Which alerts fire but don't need action? Raise thresholds.
  3. False negative review: What slipped through? Add missing alerts.
  4. Response time review: Are alerts being acknowledged quickly enough?
  5. Threshold review: Have our metrics improved? Tighten thresholds.

MONTHLY ALERT REVIEW TEMPLATE

Date: {date}
Reviewer: {name}

VOLUME SUMMARY:
- Total alerts this month: {count}
- By priority: P0={x}, P1={y}, P2={z}, P3={w}
- Trend vs last month: {+/-x%}

TOP 5 NOISIEST ALERTS:
1. {alert_name}: {count} times, {action_rate}% action rate
2. ...

RECOMMENDED CHANGES:
- [ ] Raise threshold on {alert} from X to Y
- [ ] Disable {alert} - consistently ignored
- [ ] Add alert for {gap} - missed issues
- [ ] Change routing for {alert} from #channel to #other

UNRESOLVED FROM LAST MONTH:
- {item} - status: {status}

Alert Anti-Patterns to Avoid

Anti-Pattern 1: Too Many Alerts

❌ Problem: Alert on every PR state change
   "PR opened" "PR updated" "PR approved" "PR merged"

   Result: 50+ alerts per day, all ignored

✅ Solution: Alert on exceptions only
   "PR stuck" "SLA breach" "Merged without approval"

   Result: 5-10 meaningful alerts per day

Anti-Pattern 2: Unclear Alerts

❌ Problem: Vague alert messages
   "Metric threshold exceeded"
   "PR needs attention"

   Result: Recipients don't know what to do

✅ Solution: Specific, actionable alerts
   "Cycle time hit 4.2 days (threshold: 3 days).
    Top contributor: PR #1234 open 6 days.
    Action: Review PR or raise threshold."

   Result: Clear problem, clear action

Anti-Pattern 3: No Ownership

❌ Problem: Alerts go to large channels with no @mention
   Posted to #engineering (200 people)

   Result: "Someone else will handle it" → nobody handles it

✅ Solution: Clear routing and ownership
   Post to #team-platform (15 people)
   @mention: @platform-reviewers
   Escalation: @alice (team lead)

   Result: Named individuals feel responsible

Anti-Pattern 4: Alert and Forget

❌ Problem: Set up alerts, never review them
   Thresholds from 2 years ago
   Team has improved but alerts still fire constantly

   Result: Learned helplessness, ignore all alerts

✅ Solution: Monthly alert hygiene
   Review alert volume monthly
   Tighten thresholds as team improves
   Delete alerts that aren't actionable

   Result: Alerts stay relevant and respected

Anti-Pattern 5: Duplicate Alerts

❌ Problem: Same issue triggers multiple alert types
   "PR stuck" + "SLA warning" + "SLA breach" + "Cycle time high"
   All for the same PR

   Result: 4x noise, same information

✅ Solution: Deduplicate and consolidate
   One alert per issue, with escalation built in
   "PR stuck (24h)" → escalates to "SLA breach (48h)"

   Result: One notification stream per problem

Our Take

The #1 predictor of alert system success isn't the tool you use—it's whether someone owns alert hygiene. Assign an "Alert DRI" (directly responsible individual) who reviews alert health monthly. Without ownership, entropy wins and alert fatigue becomes inevitable.

Building an Alert Playbook

Document how to respond to each alert type. This is especially valuable for on-call rotations and new team members.

═══════════════════════════════════════════════════════════════
ALERT PLAYBOOK
═══════════════════════════════════════════════════════════════

ALERT: PR awaiting review > 24 hours
PRIORITY: P2 (Medium)
RECIPIENTS: Assigned reviewers → team channel → team lead
EXPECTED VOLUME: 2-5 per day

Response steps:
1. Check if reviewers are available (not OOO)
2. If reviewers busy, reassign to available reviewer
3. If no reviewers available, escalate to team lead
4. If PR is urgent, ping in team channel

Expected resolution: Within 4 hours of alert
Escalation: If unresolved after 4 hours, notify manager

Common false positives:
- Draft PRs (exclude from alerting)
- PRs marked "WIP" (exclude from alerting)
- Weekends/holidays (adjust for business hours)

───────────────────────────────────────────────────────────────

ALERT: Merge without approval
PRIORITY: P0 (Critical)
RECIPIENTS: Author, team lead, engineering manager
EXPECTED VOLUME: < 1 per week

Response steps:
1. Verify if this was an emergency hotfix
2. If emergency: Document justification in PR comments
3. If not emergency: Discuss with author, consider revert
4. If pattern: Address in 1:1, review branch protection

Expected resolution: Within 2 hours of alert
Escalation: Multiple occurrences → process review in retro

Common false positives:
- Admin overrides for migrations (document in advance)
- Bot commits (filter by author)

───────────────────────────────────────────────────────────────

ALERT: Cycle time exceeded 3 days
PRIORITY: P2 (Medium)
RECIPIENTS: PR author, team lead
EXPECTED VOLUME: 5-10 per week

Response steps:
1. Identify blocking stage (waiting? review? CI?)
2. If waiting for review: trigger stuck PR workflow
3. If waiting for author: ping author
4. If CI issues: escalate to platform team

Expected resolution: PR merged within 24 hours of alert
Escalation: If PR exceeds 5 days, manager involvement

Common false positives:
- Large refactors (expected longer cycle)
- Vacation/holiday periods
- Dependencies on external teams

Getting Started This Week

Week 1: Audit and Plan

  1. List all current alerts (if any)
  2. For each, check the action rate; keep only alerts acted on more than 50% of the time
  3. Identify top 3 issues that slip through (need new alerts)
  4. Define your channel structure

Week 2: Implement Core Alerts

  1. Set up stuck PR alert (24-hour threshold)
  2. Set up merge without approval alert
  3. Set up weekly metrics digest
  4. Configure quiet hours for P2/P3

Week 3: Add Workflows

  1. Build acknowledgment workflow (👀 reaction)
  2. Build escalation workflow (🆘 reaction)
  3. Build false positive tracking (❌ reaction)

Week 4: Measure and Iterate

  1. Review alert volume and action rate
  2. Adjust thresholds based on data
  3. Document playbook for each alert type
  4. Schedule monthly alert review

"The best alerting system is one your team actually trusts. Build that trust by ensuring every alert is worth their attention."


Conclusion

Well-configured alerts turn your engineering metrics from passive data into active intelligence—surfacing issues when they matter and keeping your team informed without overwhelming them.

The key principles to remember:

  • Quality over quantity: Fewer, better alerts beat more, ignored alerts
  • Clear ownership: Every alert needs someone responsible for responding
  • Continuous tuning: Alert systems need maintenance like any other system
  • Context matters: Good alerts tell you what's wrong and what to do about it
  • Respect attention: Your team's focus is precious—only interrupt when it matters

For related guidance, see our articles on implementing PR SLAs, reducing cycle time, and building effective metrics dashboards.
