SLO

Service Level Objectives (SLOs) are values (or ranges of values) within which SLIs are allowed to fall. SLIs (Service Level Indicators) are quantitative measures of the level of service provided, often aggregated into rates, averages, or percentiles; common SLIs are availability, error rate, latency, and throughput. For example, if the SLI is request latency, the SLO could be that request latency should be less than 100ms.
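
As a rough illustration of the latency example, the sketch below aggregates raw request latencies into a single SLI value and compares it to the 100ms SLO. The choice of the 95th percentile as the aggregation, and the names `latency_sli_p95` and `meets_slo`, are assumptions made for this sketch; the note itself only says that SLIs are often aggregated into rates, averages, or percentiles.

```python
from statistics import quantiles

# SLO from the example above: request latency should be less than 100ms.
SLO_LATENCY_MS = 100.0


def latency_sli_p95(samples_ms):
    """Aggregate raw request latencies (ms) into one SLI value.

    Assumption for this sketch: the SLI is the 95th percentile.
    Needs at least two samples.
    """
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(samples_ms, n=100, method="inclusive")[94]


def meets_slo(sli_value_ms, threshold_ms=SLO_LATENCY_MS):
    """Return True when the SLI value is strictly below the SLO threshold."""
    return sli_value_ms < threshold_ms


if __name__ == "__main__":
    # Hypothetical latency samples, mostly fast with one slow outlier.
    samples = [20, 30, 40, 50, 60, 70, 80, 90, 95, 250]
    sli = latency_sli_p95(samples)
    print(f"p95 latency: {sli:.2f}ms, SLO met: {meets_slo(sli)}")
```

With a percentile-based SLI, a single slow request in a small sample is enough to push the value past the threshold, which is one reason the aggregation has to be chosen deliberately.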

SLOs matter in several ways in the SRE (Site Reliability Engineering) philosophy, so it's important that Defining Service Level Objectives is done properly:

  • they are a main component of Error Budgets
  • they help drive our development decisions (should I work on new features or on improving latency?)
  • they set the right reliability expectations: it's harder to be shocked by the service being down if you know that it will be down 3.65 days every year (a 99% Service Availability Target), and it helps developers using the service take this into consideration
  • they help manage performance expectations: they put a stop to the "app seems slow" conversation. Since you know exactly what "slow" means, there is no more guesswork
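
The 3.65 days figure in the list above follows from simple arithmetic: it is 1% of a 365-day year, i.e. what a 99% availability target allows. The sketch below shows that calculation. It also assumes the common reading of an error budget as the allowed unreliability (1 minus the target); the function names are hypothetical and only serve the illustration.

```python
DAYS_PER_YEAR = 365


def allowed_downtime_days(availability_target, period_days=DAYS_PER_YEAR):
    """Days of downtime per period permitted by an availability target.

    availability_target is a fraction between 0 and 1, e.g. 0.99 for 99%.
    """
    if not 0.0 <= availability_target <= 1.0:
        raise ValueError("availability_target must be between 0 and 1")
    return (1.0 - availability_target) * period_days


def remaining_budget_days(availability_target, downtime_so_far_days,
                          period_days=DAYS_PER_YEAR):
    """Downtime still available in the period (negative means the SLO is missed).

    Assumption for this sketch: the error budget equals the allowed downtime.
    """
    return allowed_downtime_days(availability_target, period_days) - downtime_so_far_days


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        days = allowed_downtime_days(target)
        print(f"{target:.2%} availability allows about {days:.3f} days down per year")
    # After one full day of outages, a 99% target still leaves about 2.65 days.
    print(round(remaining_budget_days(0.99, 1.0), 2))
```

Seeing the budget as a concrete number of days makes the trade-off in the second bullet easier to reason about: while budget remains there is room for riskier feature work, and once it is spent, reliability work takes priority.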

While meeting SLOs is important, over-achieving them is a bad idea: people adapt to the higher level of service and come to treat it as the new standard. If you have a flawless quarter, it might not be a bad idea to introduce a planned outage to bring the service closer to its actual SLO.


Status: #💡

References: