Alert-to-Incident Ratio: The Single Number That Diagnoses Alert Fatigue
Updated May 2026. Sources: Google SRE Book Chapter 6 (Practical Alerting), Catchpoint 2024 SRE Report, incident.io 2024 State of On-Call, PagerDuty 2023 State of Digital Operations.
The Most Useful Single Number
If you can only track one metric to diagnose whether your alerting practice is healthy or noisy, track the alert-to-incident ratio: the number of paging alerts received divided by the number of incidents declared in the same time window. A 5:1 ratio means 5 paging alerts produced 1 real incident, so 4 of those 5 alerts were noise. A 50:1 ratio means 50 paging alerts produced 1 real incident, so 49 of 50 were noise. The metric captures the precision of your paging alerts (how many correspond to real incidents, and therefore how many are noise) in a single number that anyone can interpret.
The ratio is diagnostic in a way that raw page count is not. A team paging 500 times per week can have an acceptable ratio (3:1, meaning roughly 167 real incidents) if it is genuinely operating at incident-heavy scale, or an unhealthy ratio (50:1, meaning 10 real incidents) if it is drowning in noise. The page count alone does not distinguish these two cases; the ratio does. A team paging 50 times per week with a 50:1 ratio has the same noise problem as a team paging 500 times per week with a 50:1 ratio, just at a different scale.
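The arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not a reference implementation: the function names are hypothetical, and the noise calculation assumes, as the examples in the text do, that each declared incident accounts for exactly one page.

```python
def alert_to_incident_ratio(pages: int, incidents: int) -> float:
    """Paging alerts per declared incident in the same window."""
    if incidents == 0:
        # Assumption: pages with no incidents are treated as an unbounded
        # ratio, and an empty window as 0.0 (nothing to diagnose).
        return float("inf") if pages > 0 else 0.0
    return pages / incidents


def noise_fraction(pages: int, incidents: int) -> float:
    """Share of pages that did not correspond to an incident.

    Assumes one page per incident, as in the 5:1 and 50:1 examples.
    """
    if pages == 0:
        return 0.0
    return max(0.0, (pages - incidents) / pages)


def implied_incidents(pages: int, ratio: float) -> int:
    """Incident count implied by a page count and a ratio, rounded."""
    return round(pages / ratio)


if __name__ == "__main__":
    print(alert_to_incident_ratio(5, 1), noise_fraction(5, 1))
    print(alert_to_incident_ratio(50, 1), noise_fraction(50, 1))
    print(implied_incidents(500, 3), implied_incidents(500, 50))
```

The last line reproduces the two 500-pages-per-week cases: the same page volume implies very different incident counts depending on the ratio.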
The metric also creates a clear improvement narrative. "We are going to halve our alert-to-incident ratio this quarter" is a target that engineers understand intuitively, that translates to specific tuning work, and that produces measurable results within a single quarter. Compare that with softer goals such as "we are going to reduce alert fatigue", which lack the metric anchor that makes progress trackable.
Computing It
Both PagerDuty and Opsgenie expose the necessary data via their analytics APIs. The math is intentionally simple: count paging alerts received in a window, count declared incidents in the same window, and divide the first by the second. Most teams compute the ratio weekly and monthly. PagerDuty's analytics dashboard does this calculation natively if you configure the report. Opsgenie's incident metric endpoints provide the same data; a simple dashboard widget computes the ratio. incident.io and Rootly both expose ratios in their default dashboards.
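If you prefer to compute the ratio yourself from exported data, the count-and-divide step might look like the sketch below. It deliberately does not call any vendor API: the inputs are assumed to be plain lists of ISO 8601 timestamps that you have already exported, and the function names `iso_week` and `weekly_ratios` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def iso_week(timestamp: str) -> str:
    """Map an ISO 8601 timestamp to an ISO week label such as '2026-W19'."""
    year, week, _ = datetime.fromisoformat(timestamp).isocalendar()
    return f"{year}-W{week:02d}"


def weekly_ratios(page_times, incident_times):
    """Return {week label: pages / incidents} for every week with activity.

    Weeks with pages but no declared incidents map to None, because the
    ratio is undefined there and deserves a manual look.
    """
    pages = Counter(iso_week(t) for t in page_times)
    incidents = Counter(iso_week(t) for t in incident_times)
    ratios = {}
    for week in sorted(set(pages) | set(incidents)):
        n_incidents = incidents[week]
        ratios[week] = pages[week] / n_incidents if n_incidents else None
    return ratios


if __name__ == "__main__":
    # Hypothetical export: three pages and one incident in one week,
    # one page and no incident in the next.
    page_times = [
        "2026-05-04T09:00:00",
        "2026-05-05T10:00:00",
        "2026-05-06T11:00:00",
        "2026-05-11T09:00:00",
    ]
    incident_times = ["2026-05-05T10:05:00"]
    for week, ratio in weekly_ratios(page_times, incident_times).items():
        print(week, ratio)
```

The same function works for a monthly grain if you swap `iso_week` for a helper that returns a year-month label.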
The non-trivial work is defining "incident" consistently across teams. If incident creation is automatic (every Sev-2-or-higher alert auto-creates an incident), the ratio looks artificially favourable because the count of incidents inflates. If incident creation requires manual declaration (an engineer must mark the page as a real incident), the ratio reflects reality more accurately but requires discipline to capture. Most mature operations require manual incident declaration with a low-friction Slack command (incident.io and Rootly both excel at this); auto-creation tends to corrupt the metric.
Compute the ratio at two grains: aggregate across the team for the headline diagnostic, and per-service for the action surface. The aggregate ratio answers "how noisy is our alerting practice overall"; the per-service ratio answers "which services should we tune first". A team with a 4:1 aggregate but a 30:1 outlier on the checkout service has a clear next move that the aggregate alone would not surface.
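The two grains can be sketched as follows, assuming each page and each incident has been tagged with a service name. The service names and counts are made up to reproduce the 4:1 aggregate with a 30:1 checkout outlier, and `per_service_ratios` is a hypothetical helper, not part of any pager tool.

```python
from collections import Counter


def per_service_ratios(page_services, incident_services):
    """Return (service, pages, incidents, ratio) rows, noisiest first.

    Services that paged but had no declared incident get an infinite
    ratio so they sort to the top. Services with incidents but no pages
    are omitted, since they are not an alert-noise problem.
    """
    pages = Counter(page_services)
    incidents = Counter(incident_services)
    rows = []
    for service, n_pages in pages.items():
        n_incidents = incidents[service]
        ratio = n_pages / n_incidents if n_incidents else float("inf")
        rows.append((service, n_pages, n_incidents, ratio))
    # Sort by ratio descending, breaking ties by page count descending.
    rows.sort(key=lambda row: (-row[3], -row[1]))
    return rows


if __name__ == "__main__":
    # Illustrative data only: one service label per page or incident.
    page_services = ["checkout"] * 30 + ["search"] * 30 + ["auth"] * 20
    incident_services = ["checkout"] * 1 + ["search"] * 9 + ["auth"] * 10

    aggregate = len(page_services) / len(incident_services)
    print(f"aggregate ratio: {aggregate:.1f}:1")
    for service, n_pages, n_incidents, ratio in per_service_ratios(
        page_services, incident_services
    ):
        print(f"{service}: {n_pages} pages, {n_incidents} incidents, {ratio:.1f}:1")
```

Reporting the absolute page count alongside the ratio keeps a low-volume service with a single noisy page from outranking a service that pages dozens of times.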
Target Ratios and What They Feel Like
| Ratio | State | What it feels like | Realistic time to fix |
|---|---|---|---|
| 1:1 to 2:1 | Healthy | Each page is real signal; engineer engages promptly without resentment; on-call week ends without fatigue residue | Maintain via quarterly audit |
| 3:1 to 5:1 | Acceptable | Most pages are real; some triage required; on-call is workload but not burnout | 1 to 2 quarters of focused tuning |
| 5:1 to 10:1 | Stressed | Engineer learns to triage quickly; real incidents emerge from noise; tolerable for a quarter, attrition risk over a year | 2 to 3 quarters of focused work |
| 10:1 to 20:1 | Painful | Dread before primary week; constant context-switching; real incidents recognisable but cognitive load is high | 3 to 6 quarters of structural intervention |
| 20:1 to 50:1 | Critical | Pattern-matching dominates investigation; missed signals; sustained sleep disruption; high attrition pressure | 6 to 12 months structural plus tooling intervention |
| 50:1+ | Crisis | Engineers cannot reliably distinguish signal from noise; missed Sev-1 incidents; team morale collapsing | Emergency response, suspend non-critical alerts, leadership escalation |
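To label a computed ratio with a state from the table, a small lookup is enough. The sketch below makes two assumptions the table does not spell out: a value that sits exactly on a shared boundary (for example 5:1) is assigned to the calmer band, and values between 2:1 and 3:1 are treated as Acceptable. Adjust the bounds if your team reads the bands differently.

```python
from bisect import bisect_left

# Upper bound of each band, inclusive (assumption: boundary values fall
# into the calmer band). Anything above the last bound is "Crisis".
UPPER_BOUNDS = [2, 5, 10, 20, 50]
STATES = ["Healthy", "Acceptable", "Stressed", "Painful", "Critical", "Crisis"]


def classify_ratio(ratio: float) -> str:
    """Map an alert-to-incident ratio (pages per incident) to a state."""
    if ratio < 0:
        raise ValueError("ratio must be non-negative")
    return STATES[bisect_left(UPPER_BOUNDS, ratio)]


if __name__ == "__main__":
    for ratio in (1.5, 4, 8, 15, 30, 75):
        print(f"{ratio}:1 -> {classify_ratio(ratio)}")
```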
Weekly Review Cadence
The alert-to-incident ratio drifts unless it is reviewed regularly. The cheapest and most reliable mechanism is a 30-minute weekly review with three standing items. Item one: aggregate ratio for the week, compared to the four-week rolling average and the target. A spike (this week 25 percent above the rolling average) triggers investigation. Item two: per-service ratios for the top-10 noisiest services. The noisiest service typically warrants an action item before the next review. Item three: any new alert rules added since the last review, evaluated for whether they are likely to maintain or degrade the ratio.
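The spike check in item one is simple enough to automate. This is a minimal sketch under stated assumptions: the rolling average is taken over the four weeks before the current one, "25 percent above" is read as "25 percent or more above", and the function name and parameters are hypothetical.

```python
from statistics import mean


def is_spike(previous_ratios, current_ratio, window=4, threshold=0.25):
    """Return True if this week's ratio is a spike worth investigating.

    previous_ratios: weekly ratios in chronological order, excluding the
    current week. Only the last `window` entries are used as the baseline.
    """
    recent = list(previous_ratios)[-window:]
    if len(recent) < window:
        # Not enough history for a rolling average; do not flag.
        return False
    baseline = mean(recent)
    return current_ratio >= baseline * (1 + threshold)


if __name__ == "__main__":
    history = [4.2, 3.8, 4.0, 4.0]  # illustrative ratios for the last four weeks
    for this_week in (4.5, 5.5):
        print(this_week, is_spike(history, this_week))
```

Returning False when fewer than four weeks of history exist avoids false alarms while the review is still being bootstrapped.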
The meeting needs a clear owner. At small scale (under 30 engineers) this can be the SRE lead or the most experienced senior engineer, rotating monthly. At larger scale, the alert review board (see /alert-fatigue-scale-up-50-engineers) takes ownership and the weekly review can be lighter. The key is consistency: a weekly review that happens for three months and then drifts produces worse outcomes than no review at all, because the team learns that the metric does not actually matter to leadership.
For the dashboard backing this review, the recommended layout is: a ratio chart over the last 12 weeks with a target line; the top-10 services by ratio, showing both ratio and absolute page count; a list of new alert rules added since the last review; and a list of action items from the previous review with their status. This is standard dashboard infrastructure that takes a day to build and pays back across many years of operational discipline. Most pager tools (PagerDuty Analytics, Opsgenie Reports, incident.io Insights) ship with many of these views out of the box; the work is in configuring them and committing to the review cadence.
Dashboard Sketch
A minimal but effective alert-to-incident dashboard contains four panels and is the only dashboard most teams need for alert hygiene. Panel one (top-left): aggregate weekly ratio over the last 12 weeks, with a horizontal line showing the team target (typically 3:1 or 5:1). Panel two (top-right): per-service ratio bar chart for the top-10 services, sorted by ratio descending. Panel three (bottom-left): list of new alert rules added in the last 30 days, with the per-rule ratio for each.
Panel four (bottom-right): list of high-ratio services with the most recent action items from the alert review and their status (open, in progress, closed). This panel is the closed loop: the dashboard does not just measure, it tracks what is being done about the measurement. Most teams underinvest in this panel and pay for it later with measurement-without-action drift. The bar to ship is low: a Google Sheet or a Linear list view linked from the dashboard is enough to start. The work is the discipline of using it.