What should I Monitor

When setting up [[Monitoring]] for a system that is able to output A LOT of metrics, it's often hard to determine which of these we should be monitoring.

If not sure where to start, covering [[The Four Golden Signals of Monitoring]] (latency, traffic, error rate, and saturation) is typically a good place to start.
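
As a rough illustration, the four signals can be derived from a window of request records. This is only a sketch: the `Request` record, the `golden_signals` helper, the nearest-rank 95th percentile, and the definition of saturation as in-flight requests divided by capacity are all assumptions made for the example, not something this note prescribes.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one served request."""
    duration_ms: float
    ok: bool


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarise one observation window into the four golden signals.

    Saturation is approximated here as in_flight / capacity, which is an
    assumption; use whatever "how full is the service" measure fits yours.
    """
    durations = sorted(r.duration_ms for r in requests)
    n = len(durations)
    if n:
        mean = sum(durations) / n
        # Nearest-rank percentile: rank = ceil(0.95 * n), in integer arithmetic.
        rank = (95 * n + 99) // 100
        p95 = durations[rank - 1]
        error_rate = sum(1 for r in requests if not r.ok) / n
    else:
        mean = p95 = None
        error_rate = 0.0
    return {
        "latency_mean_ms": mean,
        "latency_p95_ms": p95,
        "traffic_rps": n / window_s,
        "error_rate": error_rate,
        "saturation": in_flight / capacity,
    }


if __name__ == "__main__":
    window = [Request(10, True), Request(20, True),
              Request(30, False), Request(400, True)]
    print(golden_signals(window, window_s=2, in_flight=5, capacity=20))
```

In practice a metrics library or monitoring agent would usually compute these for you; the point is only to show what each signal boils down to.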

Some general tips:

  • Don't shy away from recording "the same metric" in different places – if you don't track both how slow your [[Database]] server is and how slow your application perceives it to be, you will not be able to tell a database issue from a network issue. See [[Where to Collect Metrics]].
  • [[Measuring Request Latency]] and similar metrics in buckets (e.g., 10-30ms, 30-100ms) is a good way to easily see that metric's distribution.
  • Some metrics need higher resolution than others – when looking at a fast-changing metric like CPU usage, it's useful to have higher resolution, while slow-changing things like hard disk fullness can be collected at a much lower resolution.
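
To make the first two tips concrete, here is a minimal sketch that buckets the same query duration as seen from two places. The bucket bounds, the sample values, and the helper names (`bucket_label`, `histogram`) are made up for illustration; real setups would normally rely on the histogram type of their metrics library.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bucket boundaries in milliseconds; choose ones that fit your service.
BUCKET_BOUNDS_MS = [10, 30, 100, 300, 1000]


def bucket_label(latency_ms):
    """Return the label of the bucket a sample falls into (lower bound inclusive)."""
    i = bisect_right(BUCKET_BOUNDS_MS, latency_ms)
    if i == 0:
        return f"<{BUCKET_BOUNDS_MS[0]}ms"
    if i == len(BUCKET_BOUNDS_MS):
        return f">={BUCKET_BOUNDS_MS[-1]}ms"
    return f"{BUCKET_BOUNDS_MS[i - 1]}-{BUCKET_BOUNDS_MS[i]}ms"


def histogram(samples_ms):
    """Count how many samples land in each bucket."""
    return Counter(bucket_label(s) for s in samples_ms)


if __name__ == "__main__":
    # The "same metric" recorded in two places (illustrative numbers only).
    db_reported_ms = [4, 6, 12, 8]        # query duration as reported by the database
    app_perceived_ms = [25, 40, 31, 120]  # the same queries as timed by the application
    print("database:   ", dict(histogram(db_reported_ms)))
    print("application:", dict(histogram(app_perceived_ms)))
```

If the database's distribution stays in the low buckets while the application's shifts to higher ones, the extra time is being spent somewhere between the two, for example on the network.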

Status: #💡

References: