Measuring Request Latency

Latency is the time taken to serve a request, and is one of The Four Golden Signals of Monitoring.

The most common metric looked at here is usually the mean latency, but this can easily become misleading: serving 99 requests in 10ms each and one in 20000ms gives you an average latency of ~210ms, which is not a good representation of what's actually happening.
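As a quick check of the arithmetic above, here is a minimal sketch that compares the mean with the median for the same 100 requests. The variable names are only illustrative.

```python
from statistics import mean, median

# 99 fast requests and one very slow one, in milliseconds (figures from the text).
latencies_ms = [10] * 99 + [20000]

mean_ms = mean(latencies_ms)      # pulled up by the single outlier
median_ms = median(latencies_ms)  # what a typical request actually saw

print(f"mean={mean_ms:.1f}ms median={median_ms}ms")
```

The mean lands at roughly 210ms even though 99 out of 100 requests took 10ms, which is exactly the distortion described above.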

A cheap and easy alternative is to use a Histogram metric (e.g. in Prometheus) and store requests in buckets based on their latency (requests that completed in 0-10ms, 10-30ms, 30-100ms, 100ms+). With the same set of requests, this gives us a much clearer idea of what's going on (99 requests in 0-10ms and 1 request in 100ms+). The buckets can be configured around our SLOs so that we can easily track how we are doing against them.
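To make the bucket idea concrete, here is a minimal pure-Python sketch of counting requests per latency bucket. It is not how a Prometheus client stores histograms (Prometheus buckets are cumulative); the bounds, labels and the `bucket_counts` helper are assumptions chosen to mirror the ranges in this note.

```python
from bisect import bisect_left

# Upper bounds (ms) of the finite buckets, matching the ranges in the text;
# anything above the last bound falls into the overflow bucket.
BUCKET_BOUNDS_MS = [10, 30, 100]
BUCKET_LABELS = ["0-10ms", "10-30ms", "30-100ms", "100ms+"]


def bucket_counts(latencies_ms):
    """Count how many requests landed in each latency bucket."""
    counts = [0] * len(BUCKET_LABELS)
    for latency in latencies_ms:
        # bisect_left treats each upper bound as inclusive (latency <= bound).
        counts[bisect_left(BUCKET_BOUNDS_MS, latency)] += 1
    return dict(zip(BUCKET_LABELS, counts))


counts = bucket_counts([10] * 99 + [20000])
print(counts)
```

For the example requests, this puts 99 observations in the first bucket and one in the overflow bucket, so the slow outlier stays visible instead of being averaged away.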

Taking the bucket idea a step further, we can use the same data to calculate Percentiles of response times. Be careful, however, to understand the following:

  • When using Histograms, response times in percentiles are estimated: the value is guaranteed to come from the correct bucket, but the value itself is calculated via Linear Interpolation
  • The more buckets you have and the smaller they are, the more precise the percentile value will be
  • This, however, is costly, as more buckets == higher metric cardinality, which we want to avoid if we want a monitoring system that's actually usable
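The estimation described in the list above can be sketched roughly as follows. This is a simplified illustration in the spirit of Prometheus's `histogram_quantile`, not its actual implementation; `estimate_percentile` and its parameters are hypothetical names, and the lowest bucket is assumed to start at 0.

```python
def estimate_percentile(q, bounds, counts):
    """Estimate the q-th quantile (0 < q <= 1) from per-bucket counts.

    bounds: upper bounds of the finite buckets, ascending.
    counts: one count per finite bucket plus a final overflow bucket.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    rank = q * total  # how many observations lie at or below the quantile
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= rank:
            if i == len(bounds):
                # The overflow bucket has no upper bound to interpolate
                # towards, so report the highest finite bound instead.
                return float(bounds[-1])
            lower = bounds[i - 1] if i > 0 else 0.0
            upper = bounds[i]
            # Linear interpolation: assume observations are spread evenly
            # across the bucket.
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
    return float(bounds[-1])


bounds_ms = [10, 30, 100]
counts = [99, 0, 0, 1]  # 99 requests in 0-10ms, one above 100ms

p50 = estimate_percentile(0.5, bounds_ms, counts)
p999 = estimate_percentile(0.999, bounds_ms, counts)
print(p50, p999)
```

With these wide buckets the median is estimated at about 5ms although every fast request actually took 10ms: the estimate falls in the right bucket, but its exact value is only as good as the bucket is narrow.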

In addition to tracking your response times, it's valuable to distinguish between the latency of failed and successful requests, as errors can skew the picture of what our response times look like. Imagine having a lot of requests that fail early on: they make your response times look much faster than they normally are.

On the other hand, you should also track error latency itself to see how slow your error responses are, so you can make sure you are Failing Fast.
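A small sketch of why the split matters, using made-up sample data. The request log, the `median_latency_by_outcome` helper and the choice to treat status codes of 500 and above as failures are all assumptions for illustration; in a metrics system this would more likely be a label on the latency metric.

```python
from statistics import median

# Hypothetical request log: (HTTP status code, latency in ms).
request_log = [(200, 120), (200, 140), (200, 130), (500, 5), (500, 4), (500, 6)]


def median_latency_by_outcome(entries):
    """Return median latency overall and separately for successes and errors."""
    groups = {"all": [], "success": [], "error": []}
    for status, latency_ms in entries:
        groups["all"].append(latency_ms)
        # Assumption: 5xx responses count as failed requests.
        outcome = "error" if status >= 500 else "success"
        groups[outcome].append(latency_ms)
    # Skip empty groups, since median() raises on an empty list.
    return {name: median(values) for name, values in groups.items() if values}


medians = median_latency_by_outcome(request_log)
print(medians)
```

Here the combined median looks far better than what successful requests experience, while the error series on its own shows whether failures really are fast.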

Status: #💡