The Four Sins of Kubernetes Observability

By Observe Team, February 4, 2021

It’s easy to fall into various traps — you might even call them sins — in your quest to make Kubernetes observability work. We want to be sure you avoid the common pitfalls and stick to the straight and narrow way on your journey toward K8s observability salvation.

1. Don’t just aggregate logs

It can be tempting to attempt to solve Kubernetes observability challenges by collecting all of the log data you can — from your master nodes, worker nodes, containers, and the underlying physical infrastructure — and then aggregating all of that data in the mistaken belief that analyzing it will give you the holistic visibility you need.

The problem with this approach is that every component in your cluster logs different types of information at different rates. As a result, if you look at aggregated log data from a specific point in time — say, the moment that a pod crashed — you are unlikely to gain the complete context you need to understand what happened. The events that caused the pod to crash may have occurred in different components at an earlier time, but you probably won’t see that by looking just at aggregated log data based on a single event.
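As a minimal sketch of that point, consider widening the search to a lookback window across components instead of inspecting the single crash event. The log records and field names below are purely illustrative, not a real Kubernetes log schema:

```python
from datetime import datetime, timedelta

# Hypothetical log records from different cluster components
# (components, timestamps, and messages are illustrative only).
logs = [
    {"component": "kubelet",   "time": datetime(2021, 2, 4, 12, 0, 5),  "msg": "image pull backoff"},
    {"component": "scheduler", "time": datetime(2021, 2, 4, 12, 0, 20), "msg": "pod rescheduled"},
    {"component": "app-pod",   "time": datetime(2021, 2, 4, 12, 1, 0),  "msg": "pod crashed"},
]

def context_window(logs, crash_time, lookback_seconds=120):
    """Return log lines from *any* component within the lookback
    window before the crash, not just the crash event itself."""
    start = crash_time - timedelta(seconds=lookback_seconds)
    return [e for e in logs if start <= e["time"] <= crash_time]

crash = next(e for e in logs if e["msg"] == "pod crashed")
window = context_window(logs, crash["time"])

# The earlier kubelet and scheduler events that preceded the crash
# now appear alongside it, instead of being filtered out.
for e in window:
    print(e["component"], "-", e["msg"])
```

A query scoped to the crash timestamp alone would return only the last record; the windowed view surfaces the earlier events in other components that actually explain it.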

2. Don’t focus on metrics alone

Collecting metrics data from the Kubernetes metrics API is another tempting way to attempt to gain across-the-board visibility into your cluster. After all, the metrics API covers the entire cluster, and it exposes critical data like CPU and memory usage.

Those are useful sources of visibility, and they should be part of any Kubernetes observability strategy. On their own, however, they are hardly enough to understand the state of your cluster. Focusing just on cluster-level metrics would be like trying to monitor a virtual machine based solely on the CPU and memory metrics of the physical server that hosts it: It would give you some clue as to what is happening inside the virtual machine, but not the level of detail necessary to gain true observability.

Instead, you need context — which depends on the correlation of data of multiple types from across your cluster — to understand what is happening.
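To sketch what that correlation buys you (with synthetic numbers and hypothetical pod names): a cluster-wide CPU figure tells you *that* usage spiked, but only a finer-grained source, such as per-pod samples, tells you *where*.

```python
# Synthetic data. The cluster-level figure alone can't attribute
# the spike; joining it with per-pod samples can.
cluster_cpu_pct = 92  # aggregate utilization across the cluster

per_pod_cpu_millicores = {  # hypothetical pod names and readings
    "checkout-7d9f": 250,
    "payments-5c2a": 1800,  # the actual culprit
    "frontend-9b1e": 300,
}

# Rank pods by CPU to attribute the cluster-level spike.
culprit, usage = max(per_pod_cpu_millicores.items(), key=lambda kv: kv[1])
print(f"cluster CPU at {cluster_cpu_pct}%; top consumer: {culprit} ({usage}m)")
```

The pattern, not the toy data, is the point: each layer of metrics only becomes actionable when you can pivot from it into data at the next level of detail.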

3. Don’t focus just on applications

On the opposite end of the spectrum, you might decide to ignore cluster-level metrics and focus just on the logs, traces, and metrics you can get from applications running in Kubernetes. That data is straightforward to collect if you use a so-called sidecar container to stream application data to an external monitoring tool.

The fallacy in this approach is obvious enough: If you look only at application-level observability data, you can’t know how changes in the cluster — such as the failure of a node or the exhaustion of storage volume capacity — impact your applications. It’s only by contextualizing application data with cluster data, and vice versa, that you can begin to understand what is actually happening at all layers of your environment.
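One way to picture that contextualization, as a sketch with hypothetical records and field names: join application-level errors with cluster-level events on a shared key such as the node name, so an application symptom is shown next to its likely cluster-level cause.

```python
# Hypothetical data: application errors on one side, cluster
# events on the other, joined on the node each pod runs on.
app_errors = [
    {"pod": "api-6f4b", "node": "node-2", "error": "timeout talking to db"},
    {"pod": "web-1a9c", "node": "node-1", "error": "500 on /checkout"},
]
cluster_events = [
    {"node": "node-2", "event": "NodeNotReady"},
]

events_by_node = {e["node"]: e["event"] for e in cluster_events}

for err in app_errors:
    cause = events_by_node.get(err["node"], "no cluster event")
    print(f'{err["pod"]}: {err["error"]} <- {cause}')
```

Viewed in isolation, the `api-6f4b` timeout looks like an application bug; joined with cluster data, it lines up with a node failure.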

You shouldn’t stop with those data sources, by the way. Complete observability means bringing data that is external to your cluster and applications — things like CI/CD pipeline metrics — into the picture, too.

4. Don’t rely on your managed Kubernetes service

If you run Kubernetes on a managed platform, such as Amazon Elastic Kubernetes Service or Azure Kubernetes Service, you may believe that you don’t need a sophisticated observability strategy at all because your Kubernetes service will send you alerts when something goes wrong. After all, the vendor probably promises that its managed K8s platform is pain-free, so you don’t need to worry about observability for it, right?

Not quite. The reality is that, although managed Kubernetes services typically offer basic alerting and monitoring functionality as built-in platform features, they focus mainly on notifying users about critical disruptions, not overall performance management. If you want a more nuanced level of observability, such as understanding how a new application deployment performs relative to a previous version, you’ll need to collect, correlate and analyze the necessary data yourself.

This is an excerpt from The Definitive Guide to Observability in Kubernetes.