Symptom Based Monitoring

Monitoring is an integral part of running services in production. Without it, we are blind to what's going on, and thus unable to act in our own best interest.

Symptom Based Monitoring allows us to observe the user experience. A metric is Symptom based if it shows an actual symptom that is making our users happy or sad. We gather Symptom based metrics almost exclusively through Black Box Monitoring: looking at our system from the perspective of our users, without knowing anything about its internal state. Some examples of Symptom Metrics are:

  • Error rate
  • Request latency
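
As a rough illustration of the two metrics above, here is a minimal Python sketch that derives an error rate and a latency percentile from requests as a client would observe them. The `Request` type, the choice to count only 5xx responses as errors, and the nearest-rank percentile method are all assumptions made for the example, not something this note prescribes.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One request as seen from the outside (hypothetical record shape)."""
    status: int        # HTTP status code returned to the client
    latency_ms: float  # how long the client waited for the response


def error_rate(requests):
    """Fraction of requests that failed for the user (assumed here: 5xx)."""
    if not requests:
        return 0.0
    failed = sum(1 for r in requests if r.status >= 500)
    return failed / len(requests)


def latency_percentile(requests, pct):
    """Nearest-rank percentile of request latency, in milliseconds."""
    if not requests:
        raise ValueError("no requests observed")
    ordered = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


observed = [
    Request(200, 120.0),
    Request(200, 95.0),
    Request(503, 2000.0),
    Request(200, 110.0),
]
print(error_rate(observed))              # 0.25
print(latency_percentile(observed, 50))  # 110.0
```

Neither function needs to know anything about the service's internals; both work purely from what the user saw.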

Since these metrics directly represent our users' experience, they are a perfect match to be our SLIs and to implement Alerting on. Service Level Indicators (SLIs) are quantitative measures of the provided level of service, often aggregated into rates, averages, or percentiles. Common SLIs are availability, error rate, latency, throughput, ...

See What should I be Alerting on: when setting up [[Alerting]] for the first time, many people instinctively set alerts on [[Cause Based Monitoring]] metrics. If my service's CPU is ramped to 100%, of course I want to be alerted!
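
To make the contrast with a CPU alert concrete, here is a small Python sketch of an alert check that takes only Symptom based inputs. The names `SymptomThresholds` and `breached_symptoms` and both threshold values are hypothetical, chosen only for illustration. Real thresholds would come from whatever service level the team has agreed on.

```python
from dataclasses import dataclass


@dataclass
class SymptomThresholds:
    max_error_rate: float = 0.01       # hypothetical: at most 1% failing requests
    max_p95_latency_ms: float = 500.0  # hypothetical latency budget


def breached_symptoms(error_rate, p95_latency_ms, thresholds=None):
    """Return the user-facing symptoms that currently exceed their threshold."""
    thresholds = thresholds or SymptomThresholds()
    breaches = []
    if error_rate > thresholds.max_error_rate:
        breaches.append("error_rate")
    if p95_latency_ms > thresholds.max_p95_latency_ms:
        breaches.append("p95_latency")
    return breaches


# CPU usage is deliberately not an input: if users are still served quickly
# and correctly, this check stays quiet however busy the machine is.
print(breached_symptoms(error_rate=0.002, p95_latency_ms=180.0))  # []
print(breached_symptoms(error_rate=0.05, p95_latency_ms=180.0))   # ['error_rate']
```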

Status: #💡


  • Video - Practices for Creating Effective Customer SLOs

    Source: InfoQ: Stop Talking & Listen; Practices for Creating Effective Customer SLOs

    Status: #🛈/📹/✅

    SRE Workbook chapter 3 has case studies on implementing SLOs
    [[Cause Based Monitor...