Build SLO dashboards in a day
A lightweight recipe for rolling out error budget dashboards that engineers actually check.
SLOs lose their power if nobody sees them. We standardized a small set of dashboards that can be rolled out in a day for any new service.
Start with one golden signal
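Pick a single request path or queue that represents real user outcomes. Define what counts as a successful request (for example, a non-error response returned within a latency threshold) and calculate the rolling error rate over a 28-day window. Because 28 days is exactly four weeks, every window contains the same number of weekends, which smooths out weekend traffic swings.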
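As an illustration of the rolling error rate described above, here is a minimal sketch in Python. It assumes you already have one row of counts per day; the function name, the tuple layout, and the choice to report 0.0 for a window with no traffic are all hypothetical, not part of any particular metrics backend.

```python
from collections import deque

WINDOW_DAYS = 28  # four full weeks, so each window holds the same mix of weekdays


def rolling_error_rate(daily_counts, window_days=WINDOW_DAYS):
    """Yield (day, error_rate) for each day, computed over the trailing window.

    daily_counts: iterable of (day, total_requests, failed_requests) tuples,
    sorted by day with exactly one entry per day (an assumption of this sketch).
    """
    window = deque()
    total = 0
    failed = 0
    for day, day_total, day_failed in daily_counts:
        window.append((day_total, day_failed))
        total += day_total
        failed += day_failed
        # Drop the oldest day once the window holds more than window_days entries.
        if len(window) > window_days:
            old_total, old_failed = window.popleft()
            total -= old_total
            failed -= old_failed
        # A window with no traffic is reported as 0.0 rather than dividing by zero.
        rate = failed / total if total else 0.0
        yield day, rate
```

Weighting by request counts, rather than averaging daily percentages, keeps a quiet Sunday from counting as much as a busy Tuesday.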
Visualize budgets, not just uptime
Our default Grafana panel shows remaining error budget, burn rate, and a forecasted exhaustion date. A separate panel highlights the last five deploys so regressions line up with releases.
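The three numbers on that panel can be derived from two error rates and the SLO target. The sketch below is one common way to do the arithmetic, not necessarily what our Grafana queries compute: it assumes roughly uniform traffic, treats a burn rate of 1 as "the whole budget is spent in exactly one window", and projects exhaustion linearly from the recent burn rate. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BudgetStatus:
    remaining_fraction: float        # 1.0 = untouched budget, 0.0 or less = exhausted
    burn_rate: float                 # 1.0 = burning exactly at the sustainable rate
    exhaustion: Optional[datetime]   # None if the budget is not currently burning


def budget_status(slo_target, window_error_rate, recent_error_rate, now, window_days=28):
    """Compute remaining budget, burn rate, and a linear exhaustion forecast.

    slo_target: e.g. 0.999 for "99.9% of requests succeed".
    window_error_rate: error rate over the full SLO window.
    recent_error_rate: error rate over a short lookback (assumed, e.g. the last hour).
    """
    allowed = 1.0 - slo_target
    if allowed <= 0:
        raise ValueError("slo_target must be below 1.0 to leave any error budget")

    remaining = 1.0 - window_error_rate / allowed
    burn_rate = recent_error_rate / allowed

    if remaining <= 0:
        exhaustion = now           # budget is already spent
    elif burn_rate <= 0:
        exhaustion = None          # nothing is burning, so no forecast
    else:
        # At burn rate 1 the full budget lasts window_days; scale by what is left.
        exhaustion = now + timedelta(days=remaining * window_days / burn_rate)

    return BudgetStatus(remaining, burn_rate, exhaustion)
```

For example, with a 99.9% target, a 0.05% error rate over the window, and a 0.2% error rate over the recent lookback, half the budget remains, the burn rate is about 2, and the forecast lands roughly seven days out.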
Close the loop with alerts
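We wire two alerts: a slow-burn notification when the burn rate reaches 2x, sent only during business hours, and a hard page at 14x at any hour. A burn rate of 1x means the budget would be used up exactly at the end of the window. Both alerts include links to the dashboard and the relevant runbooks.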
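The routing logic for the two alerts is small enough to sketch in a few lines. This is an illustration only: the 2x and 14x thresholds come from the text, but the business-hours definition (weekdays, 09:00 to 17:00 local time), the "at or above" comparison, and the payload shape are assumptions, and a real setup would normally express this in the alerting system's own rule format.

```python
from datetime import datetime, time

SLOW_BURN_THRESHOLD = 2.0    # notify, business hours only
FAST_BURN_THRESHOLD = 14.0   # page, any time

# Assumed definition of business hours; adjust to the team's actual schedule.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def is_business_hours(now):
    """True on Monday to Friday between BUSINESS_START and BUSINESS_END."""
    return now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END


def choose_alert(burn_rate, now, dashboard_url, runbook_url):
    """Return an alert payload dict, or None if nothing should fire."""
    if burn_rate >= FAST_BURN_THRESHOLD:
        severity = "page"
    elif burn_rate >= SLOW_BURN_THRESHOLD and is_business_hours(now):
        severity = "notify"
    else:
        return None
    # Every alert carries the links a responder needs to start investigating.
    return {
        "severity": severity,
        "burn_rate": burn_rate,
        "dashboard": dashboard_url,
        "runbook": runbook_url,
    }
```

Checking the page threshold first means a 14x burn during business hours pages rather than merely notifying.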
With shared templates, teams can measure reliability quickly without reinventing the tooling for every service.