The Alert Rules That Actually Get Action (Not Ignored)

A practical guide to configuring engineering metric alerts that catch problems early without causing alert fatigue.

10 min read · Updated January 15, 2025 · By CodePulse Team

Dashboards are great for understanding trends, but they require someone to actively check them. Alerts flip the model—they come to you when something needs attention. This guide shows you how to configure alerts that catch real problems without drowning you in noise.

Why Proactive Alerting Matters

The Problem with Dashboard-Only Monitoring

  • Requires active attention: Someone has to remember to check the dashboard regularly
  • Delayed detection: Problems may fester for days before anyone notices
  • Context loss: By the time you notice a trend, you've forgotten what changed
  • Inconsistent checking: Busy periods mean dashboards get neglected

What Good Alerting Provides

  • Early warning: Catch problems when they start, not when they're critical
  • Consistent monitoring: Alerts work 24/7, even when you're busy
  • Accountability: Clear notification that something needs attention
  • Historical record: Alert history shows when problems occurred

Anatomy of an Alert Rule

Every alert rule in CodePulse has these components:

1. Metric

What you're measuring. Common metrics for alerts include:

  • cycle_time_hours - Total PR cycle time
  • wait_for_review_hours - Time waiting for first review
  • test_failure_rate_percent - CI failure rate
  • review_coverage_percent - Percentage of PRs reviewed
  • merge_without_approval_rate_percent - Approval bypasses

2. Operator

How to compare the metric value to your threshold:

  • > - Greater than (for upper limits)
  • < - Less than (for lower limits)
  • >= - Greater than or equal
  • <= - Less than or equal
  • == - Equals (rarely used)

3. Threshold

The value that triggers the alert. Choose based on:

  • Your current baseline (alert on significant deviation)
  • Industry benchmarks (alert when below standard)
  • Team goals (alert when off-target)

4. Time Period

How often the metric is evaluated:

  • Daily: Most responsive, good for critical metrics
  • Weekly: Smooths out daily variance, good for trends
  • Monthly: Big-picture view, good for strategic metrics

5. Severity

How urgent is this alert?

  • Critical: Requires immediate attention. Use sparingly.
  • Warning: Should be addressed soon but not urgent.
  • Info: Good to know, investigate when convenient.
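
Taken together, the five components amount to a small record. The sketch below is purely illustrative Python, not how CodePulse stores rules internally; it just gives the components above concrete names:

from dataclasses import dataclass

@dataclass
class AlertRule:
    metric: str        # e.g. "cycle_time_hours"
    operator: str      # one of >, <, >=, <=, ==
    threshold: float   # value that triggers the alert
    period: str        # "daily", "weekly", or "monthly"
    severity: str      # "critical", "warning", or "info"
    description: str   # what to do when the alert fires

# "Alert when weekly average cycle time exceeds 48 hours":
rule = AlertRule("cycle_time_hours", ">", 48, "weekly", "warning",
                 "Check for review bottlenecks or blocked PRs.")
print(rule)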

Creating Rules in CodePulse

Navigate to Alerts → Alert Rules tab → Create Rule:

  • Fill in metric, operator, threshold, period, and severity
  • Add a description that explains what action to take
  • Rules can be enabled/disabled without deleting
  • Edit existing rules to tune thresholds over time

Essential Alerts to Configure

Velocity Alerts

Alert: Cycle Time Warning
  Metric: cycle_time_hours
  Operator: >
  Threshold: 48
  Period: weekly
  Severity: warning
  Description: "Weekly average cycle time exceeds 2 days.
    Check for review bottlenecks or blocked PRs."

Alert: Cycle Time Critical
  Metric: cycle_time_hours
  Operator: >
  Threshold: 72
  Period: weekly
  Severity: critical
  Description: "Weekly average cycle time exceeds 3 days.
    Immediate investigation needed - team is significantly blocked."

Review Health Alerts

Alert: Slow First Review
  Metric: wait_for_review_hours
  Operator: >
  Threshold: 8
  Period: daily
  Severity: warning
  Description: "PRs waiting over 8 hours for first review.
    Check reviewer availability and assignment."

Alert: Review Coverage Drop
  Metric: review_coverage_percent
  Operator: <
  Threshold: 95
  Period: weekly
  Severity: warning
  Description: "Review coverage below 95%.
    Ensure all PRs are getting reviewed before merge."

Alert: Approval Bypasses
  Metric: merge_without_approval_rate_percent
  Operator: >
  Threshold: 5
  Period: weekly
  Severity: critical
  Description: "More than 5% of merges bypassing approval.
    Check for branch protection issues or emergency merges."

Quality Alerts

Alert: Test Failure Rate High
  Metric: test_failure_rate_percent
  Operator: >
  Threshold: 15
  Period: weekly
  Severity: warning
  Description: "Test failure rate above 15%.
    Investigate flaky tests or CI issues."

Alert: Test Failure Rate Critical
  Metric: test_failure_rate_percent
  Operator: >
  Threshold: 25
  Period: weekly
  Severity: critical
  Description: "Test failure rate above 25%.
    CI pipeline is unreliable - prioritize fixing."
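
To make evaluation concrete, here is a minimal, self-contained Python sketch that checks a few of the rules above against one week of invented metric values. The numbers are illustrative, not CodePulse output:

import operator

OPS = {">": operator.gt, "<": operator.lt}

# Invented weekly averages, standing in for real metrics.
weekly_metrics = {
    "cycle_time_hours": 54.0,
    "test_failure_rate_percent": 18.0,
    "review_coverage_percent": 97.0,
}

# (metric, operator, threshold, severity) for four of the rules above.
rules = [
    ("cycle_time_hours", ">", 48, "warning"),
    ("cycle_time_hours", ">", 72, "critical"),
    ("test_failure_rate_percent", ">", 15, "warning"),
    ("review_coverage_percent", "<", 95, "warning"),
]

for metric, op, threshold, severity in rules:
    value = weekly_metrics[metric]
    if OPS[op](value, threshold):
        print(f"[{severity}] {metric} = {value} breaches {op} {threshold}")
# With these numbers, the 48h cycle time warning and the test failure
# warning fire; the 72h critical rule and the coverage rule stay quiet.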

Alert Anti-Patterns

1. Too Many Alerts (Alert Fatigue)

Problem: So many alerts that people start ignoring them.

Signs:

  • Alerts trigger daily but no one investigates
  • Team asks to disable alerts because "they're annoying"
  • Multiple alerts fire for the same underlying issue

Solutions:

  • Start with 3-5 essential alerts, not 20
  • Prefer weekly over daily to reduce noise
  • Use warning for early signals, critical only for true emergencies
  • Review and prune alerts that never lead to action

2. Thresholds Too Tight

Problem: Alerts fire for normal variation, not real problems.

Signs:

  • Alert fires, but investigation shows "everything is fine"
  • Small changes cause alerts (e.g., cycle time goes from 23h to 25h)
  • Thresholds based on ideal state, not realistic baseline

Solutions:

  • Set thresholds based on actual data, not aspirations
  • Add buffer: if average is 24h, alert at 36h not 25h
  • Use percentiles to avoid outlier sensitivity
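
A hedged way to apply the last two suggestions is to derive the threshold from recent history: add a buffer over the mean and compare it against a high percentile, so one slow PR does not trip the alert. A minimal Python sketch with invented sample data:

import statistics

# Recent weekly average cycle times in hours (invented sample data).
recent_cycle_times = [22, 25, 24, 30, 23, 26, 28, 24, 27, 31, 25, 29]

baseline = statistics.mean(recent_cycle_times)            # ~26h
p90 = statistics.quantiles(recent_cycle_times, n=10)[-1]  # 90th percentile

# Alert threshold: the larger of "baseline plus 50% buffer" and the p90,
# so normal week-to-week variation stays below it.
threshold = max(baseline * 1.5, p90)
print(round(threshold, 1))   # about 39h with this sample, not 27h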

3. Thresholds Too Loose

Problem: Alerts never fire because thresholds are too generous.

Signs:

  • Known problems don't trigger alerts
  • Team discovers issues from complaints, not alerts
  • Thresholds set at "worst case" levels that are never hit

Solutions:

  • Review thresholds quarterly and tighten as you improve
  • Set tiered alerts: warning at moderate level, critical at extreme
  • Check: "Would this alert have caught our last incident?"

4. No Escalation Path

Problem: Alert fires but it's unclear who should act.

Signs:

  • Alerts go to a channel that everyone ignores
  • No one feels responsible for investigating
  • Same alert fires repeatedly without resolution

Solutions:

  • Assign clear ownership for each alert type
  • Include action guidance in alert description
  • Review unacknowledged alerts in team meetings

Notification Configuration

Slack Integration

Configure Slack delivery in Alerts → Notifications tab:

  • Team channel: Good for visibility, but can get noisy
  • Dedicated alerts channel: Keeps alerts organized, but may be ignored
  • Direct messages: For critical alerts to responsible party

Email Options

  • Immediate: Each alert sends an email (can be noisy)
  • Daily digest: Summary of all alerts once per day
  • Critical only: Email only for severity=critical
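
One way to think through these combinations before clicking through the UI is to write the routing down explicitly. The sketch below is a hypothetical helper, not a CodePulse API; the channel and recipient names are placeholders:

# Hypothetical severity-based routing, mirroring the options above.
ROUTES = {
    "critical": {"slack": "@oncall-lead", "email": "immediate"},
    "warning":  {"slack": "#eng-alerts",  "email": "daily-digest"},
    "info":     {"slack": "#eng-alerts",  "email": "daily-digest"},
}

def route(severity: str) -> dict:
    """Return where an alert of this severity should be delivered."""
    return ROUTES.get(severity, ROUTES["info"])

print(route("critical"))   # {'slack': '@oncall-lead', 'email': 'immediate'}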

Quiet Hours

If your team shares working hours (co-located or in a single time zone), configure quiet hours to suppress non-critical alerts outside that window. This prevents:

  • Weekend notifications for issues that can wait until Monday
  • Middle-of-night alerts that no one will act on anyway
  • Notification fatigue from off-hours accumulation
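
The suppression logic behind quiet hours is straightforward. A minimal sketch, assuming business hours of 09:00-18:00 on weekdays in the team's local time zone (substitute your own window):

from datetime import datetime

def within_business_hours(now: datetime) -> bool:
    """Weekdays, 09:00-18:00 local time; everything else is quiet hours."""
    return now.weekday() < 5 and 9 <= now.hour < 18

def should_notify(severity: str, now: datetime) -> bool:
    # Critical alerts always go out; others wait for business hours.
    return severity == "critical" or within_business_hours(now)

print(should_notify("warning", datetime(2025, 1, 18, 23, 0)))   # Saturday night -> False
print(should_notify("critical", datetime(2025, 1, 18, 23, 0)))  # still delivered -> True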

Iterating on Alert Rules

Start Conservative

When first setting up alerts:

  1. Start with looser thresholds than you think you need
  2. Monitor for 2-4 weeks
  3. Tighten thresholds based on what you learn
  4. Repeat until alerts are meaningful but not overwhelming

Review Dismissed Alerts

When alerts are dismissed without action, ask:

  • Was this a false positive? (Tighten threshold or change metric)
  • Was it a real issue we chose to ignore? (Why? Should we care?)
  • Was the threshold wrong for our context? (Adjust)

Monthly Alert Health Check

Each month, review your alert configuration:

  1. Which alerts fired? Were they actionable?
  2. Which alerts never fired? Are thresholds too loose?
  3. Did any incidents happen that alerts should have caught?
  4. Are there new metrics we should be alerting on?
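
If you keep a simple log of which alerts fired and whether anyone acted on them, the first two questions reduce to counting. A minimal sketch over a hypothetical list of records (the data shape is an assumption, not a CodePulse export format):

from collections import Counter

# Hypothetical month of fired alerts: (rule name, was it acted on?).
history = [
    ("Cycle Time Warning", True),
    ("Slow First Review", False),
    ("Slow First Review", False),
    ("Test Failure Rate High", True),
]
configured_rules = {"Cycle Time Warning", "Cycle Time Critical",
                    "Slow First Review", "Test Failure Rate High"}

fired = Counter(name for name, _ in history)
ignored = Counter(name for name, acted in history if not acted)
never_fired = configured_rules - set(fired)

print(fired.most_common())     # which alerts fired, and how often
print(ignored.most_common())   # candidates for tuning or pruning
print(never_fired)             # thresholds that may be too loose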

Connecting Alerts to Action

Good alerts include guidance on what to do:

Alert Description Quality

Bad Description
  • "Cycle time is high"
  • No context about what triggered it
  • No guidance on what to do
  • No escalation path
Good Description
  • "Weekly average cycle time exceeds 48 hours"
  • Check Dashboard for cycle time breakdown
  • Look for PRs stuck in review (wait_for_review)
  • Check if specific repos are worse than others
  • Review reviewer workload distribution
  • Escalate to: #engineering-leads

For more on specific metrics to alert on, see our guides on Cycle Time Breakdown, Test Failure Rate, and Review Coverage.

See these insights for your team

CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.

Free tier available. No credit card required.