Service catalog SLO handbook
How we wire SLOs into the catalog so ownership, alerts, and error budgets stay consistent.
A catalog entry without SLOs is just a name. We treat the catalog as the source of truth for reliability targets and let tooling sync the rest.
Model SLOs as code
Each service owns an SLO manifest stored next to the code. The catalog ingests it and derives the labels (such as service and owner) that alerts carry.
service: billing-api
owner: team-invoices
slos:
  availability:
    objective: 99.9
    window: 30d
    threshold:
      good_statuses: [200, 201, 204]
      latency_ms_p95: 350
  latency:
    objective: 99.0
    window: 7d
    threshold:
      latency_ms_p99: 800

# publish to the catalog
catalogctl push slo ./manifests/billing-api.slo.yaml
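To make the targets concrete, the objective and window in a manifest translate directly into an error budget. The sketch below is illustrative only: `parseWindowDays` and `errorBudgetMinutes` are hypothetical helper names, not part of catalogctl, and it assumes windows are always written as a whole number of days (for example 30d). The budget is expressed as time for intuition; a request-based SLI would count requests instead.

```javascript
// Hypothetical helpers; not part of catalogctl.

// Parse a window such as "30d" into a number of days.
// Assumes windows are always whole days, as in the manifest above.
function parseWindowDays(window) {
  const match = /^(\d+)d$/.exec(window);
  if (!match) {
    throw new Error(`Unsupported window: ${window}`);
  }
  return Number(match[1]);
}

// Minutes of "bad" time allowed by an objective (in percent) over a window.
function errorBudgetMinutes(objectivePercent, window) {
  if (!(objectivePercent > 0 && objectivePercent < 100)) {
    throw new Error(`Objective must be between 0 and 100: ${objectivePercent}`);
  }
  const windowMinutes = parseWindowDays(window) * 24 * 60;
  const budgetFraction = (100 - objectivePercent) / 100;
  // Round to two decimals to hide floating-point noise.
  return Math.round(windowMinutes * budgetFraction * 100) / 100;
}

console.log(errorBudgetMinutes(99.9, "30d")); // availability SLO
console.log(errorBudgetMinutes(99.0, "7d"));  // latency SLO
```

Under these assumptions the availability target allows roughly 43 minutes of failure per 30 days, and the latency target allows roughly 101 minutes of slow responses per 7 days.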
Derive alerts from SLO math
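We avoid custom dashboards and hand-tuned thresholds per service. Instead, every alert is derived from the SLO objective using standard burn-rate windows: a fast-burn rule that pages and a slow-burn rule that opens a ticket.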
rule "billing-availability-burn" {
  expr = "sli_error_rate:ratio_rate5m > (1-0.999) * 14"
  for = "2m"
  labels = { severity = "page", service = "billing-api" }
  annotations = {
    summary = "Billing availability SLO burn (fast)",
    runbook = "https://runbooks.internal/billing-api/slo-burn"
  }
}

rule "billing-availability-burn-slow" {
  expr = "sli_error_rate:ratio_rate1h > (1-0.999) * 3"
  for = "10m"
  labels = { severity = "ticket", service = "billing-api" }
}
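The thresholds in the rules above follow from the objective and a burn-rate multiplier. As a rough sketch of that arithmetic (the function names are hypothetical; the multipliers 14 and 3 and the 30-day window come from the rules and manifest above):

```javascript
// Hypothetical helpers illustrating the burn-rate arithmetic.

// Error-rate threshold for an objective (in percent) and a burn-rate multiplier.
function burnThreshold(objectivePercent, burnRate) {
  const budgetFraction = (100 - objectivePercent) / 100;
  return budgetFraction * burnRate;
}

// Hours until the whole budget is spent if the burn rate is sustained.
function hoursToExhaustion(windowDays, burnRate) {
  return (windowDays * 24) / burnRate;
}

const fast = burnThreshold(99.9, 14); // about 0.014, i.e. 1.4% of requests failing
const slow = burnThreshold(99.9, 3);  // about 0.003, i.e. 0.3% of requests failing

console.log(fast.toFixed(4), hoursToExhaustion(30, 14).toFixed(1));
console.log(slow.toFixed(4), hoursToExhaustion(30, 3).toFixed(1));
```

Read this way, a sustained 14x burn would spend a 30-day budget in roughly 51 hours, which is why it pages, while a 3x burn takes about 10 days and can wait for a ticket.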
Wire ownership to rotation data
Every SLO alert resolves to a rotation maintained in the catalog. No more guessing who is on point.
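// Build the page payload for an SLO breach from catalog rotation data.
async function buildSloPage(catalog, service) {
  const { team, oncall } = await catalog.lookup(service);
  // Fail loudly rather than paging "undefined" when the rotation is empty.
  const responder = oncall?.current?.rotation?.[0];
  if (!responder) throw new Error(`No on-call responder for ${service}`);
  return {
    summary: `Page ${responder} for SLO breach`,
    labels: { team },
  };
}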
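
-- report services missing SLO metadata
-- (jsonb_array_length raises an error on JSON objects, so compare against
-- empty values instead; this works whether slos is stored as an object or array)
select service_name
from catalog_entries
where slos is null
   or slos in ('{}'::jsonb, '[]'::jsonb);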
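The same two checks, a responder for every alert and SLO metadata for every service, can be run together over catalog data. The following is a minimal sketch against an in-memory list: the entry shape mirrors the lookup result used above, but the sample services, the responder names, and the helper names are all hypothetical, and the real catalog schema may differ.

```javascript
// Hypothetical in-memory catalog entries; the real catalog schema may differ.
const entries = [
  {
    service: "billing-api",
    team: "team-invoices",
    slos: { availability: { objective: 99.9, window: "30d" } },
    oncall: { current: { rotation: ["alice", "bob"] } },
  },
  { service: "legacy-batch", team: "team-data", slos: null, oncall: null },
  {
    service: "search-api",
    team: "team-search",
    slos: {},
    oncall: { current: { rotation: [] } },
  },
];

// True when an entry declares at least one SLO (object or array form).
function hasSlos(entry) {
  const slos = entry.slos;
  if (slos == null) return false;
  return Array.isArray(slos) ? slos.length > 0 : Object.keys(slos).length > 0;
}

// First responder in the current rotation, or null if nobody is listed.
function currentResponder(entry) {
  return entry.oncall?.current?.rotation?.[0] ?? null;
}

// List services that cannot be paged correctly, with the reason.
function auditEntries(list) {
  const problems = [];
  for (const entry of list) {
    if (!hasSlos(entry)) {
      problems.push({ service: entry.service, issue: "missing SLOs" });
    }
    if (currentResponder(entry) === null) {
      problems.push({ service: entry.service, issue: "no on-call responder" });
    }
  }
  return problems;
}

console.log(auditEntries(entries));
```

With this sample data, billing-api passes both checks, while the other two services are each reported twice, once per missing piece.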
Keep budgets visible
We chart error budget burn in the same place teams find deployments and logs.
# generate a budget summary panel
catalogctl render slo-dashboard --service billing-api --window 30d > panels/billing.json

{
  "title": "Billing SLO burn",
  "panels": [
    { "type": "timeseries", "expr": "sli_error_budget_remaining" },
    { "type": "stat", "expr": "sli_availability:ratio_rate30d" }
  ]
}
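The handbook does not define how `sli_error_budget_remaining` is computed. One plausible definition, sketched here with a hypothetical `budgetRemaining` helper, is the fraction of allowed failures that has not yet been used over the SLO window:

```javascript
// Hypothetical helper: fraction of the error budget still unspent.
// goodCount and totalCount are request counts over the SLO window.
function budgetRemaining(objectivePercent, goodCount, totalCount) {
  if (totalCount === 0) return 1; // no traffic, nothing spent
  const allowedFailures = totalCount * ((100 - objectivePercent) / 100);
  const actualFailures = totalCount - goodCount;
  return 1 - actualFailures / allowedFailures;
}

// 1,000,000 requests with 400 failures against a 99.9% objective:
// about 1,000 failures are allowed, so roughly 60% of the budget remains.
const remaining = budgetRemaining(99.9, 999600, 1000000);
console.log(`${(remaining * 100).toFixed(1)}% of budget remaining`);
```

A negative result means the budget is overspent, which is the signal a budget panel should make hard to miss.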
The catalog becomes more than an index: it is the contract for how a service should behave and how quickly the team must respond when it drifts.