10 Lessons for Log Aggregation: What Practitioners and Managers Need to Know
Let’s be frank: log aggregation is a topic that’s hard to get very excited about. Unless you are preternaturally disposed to love logs and log management, chances are that you think of log aggregation as a pretty mundane and tedious process.
Indeed, some folks might even thumb their noses at log aggregation. They believe that logs are too difficult to work with given their lack of standard structure, and that other observability sources, like metrics and traces, are better resources than logs.
Yet the fact is that virtually everything generates logs, which makes log aggregation an essential part of any modern observability strategy. In that respect, learning to aggregate logs efficiently is something worth getting excited about. It’s as important as choosing the right cloud architecture or deciding whether or not to containerize your app.
In the “lessons” that follow, you’ll find everything you need to know about how log aggregation works, how to aggregate logs in public clouds like AWS, how to choose a log aggregation tool, and more. Whether you work with logs day-to-day or manage engineers who do, this guide will help you perfect the art and science of log aggregation, and give you perspective on why aggregating logs the right way is more important than it may seem at first glance.
Lesson 1: Defining log aggregation
Log aggregation is the process of consolidating logs from multiple sources into a single, central location that is easy to search and analyze. Log aggregation saves engineers from having to interpret logs individually to gain visibility and observability into the systems they manage. Just as important, log aggregation makes it easier to identify correlations between events or patterns that are recorded in discrete log files.
Lesson 2: How log aggregation works
Although it’s common to talk about log aggregation as a single process, it actually comprises a set of discrete steps (sketched in code after this list):
- Collection: Log aggregation starts with the collection of each log that you want to aggregate. These logs may be spread across a variety of locations: some on local servers, others in containers, others in a public cloud, and so on.
- Standardization: The data inside logs can be structured and formatted in a variety of ways. To aggregate logs, you must either standardize the data so it can be analyzed consistently, or use a strategy like schema-on-read to run consistent analytics across differently structured data sources. The data standardization approach is usually more convenient when it’s possible to implement, but because not all data sources can be standardized, it’s useful to leverage schema-on-read as an alternative strategy.
- Verification and quality control: Some of your logs may contain errors, missing data, or other problems. A good log aggregation process will include a step to verify the data in each log and address any data quality issues. For example, incomplete log entries could be auto-completed or removed from the log entirely.
- Consolidation: After log data has been collected, standardized, and controlled for quality, you can combine your various logs into a central, consolidated source. Usually, you do this by ingesting aggregated log data into a log management or observability tool, where you can analyze and search the data directly.
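To make those four steps concrete, here’s a minimal sketch in Python. It’s illustrative only: the source paths, the common schema, and the output file are hypothetical stand-ins for what a real pipeline would get from a dedicated collector and a log management backend.

```python
import json
import re

# Hypothetical sources: a plain-text access log and a JSON-formatted service log.
SOURCES = ["/var/log/app/access.log", "/var/log/app/service.json"]

# Assumed plain-text layout: "<timestamp> <level> <message>"
PLAIN_RE = re.compile(r"^(\S+) (\w+) (.*)$")

def collect(path):
    """Collection: read raw lines from one log source."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def standardize(line):
    """Standardization: map each raw line onto one common schema."""
    if line.startswith("{"):                     # JSON-formatted source
        try:
            return json.loads(line)
        except json.JSONDecodeError:
            return None
    match = PLAIN_RE.match(line)                 # plain-text source
    if not match:
        return None
    return {"timestamp": match.group(1),
            "level": match.group(2),
            "message": match.group(3)}

def verify(entry):
    """Verification: drop malformed entries or entries missing fields."""
    required = ("timestamp", "level", "message")
    return entry is not None and all(field in entry for field in required)

def consolidate(entries, out_path="aggregated.jsonl"):
    """Consolidation: write everything to one central, searchable file."""
    with open(out_path, "a") as out:
        for entry in entries:
            out.write(json.dumps(entry) + "\n")

clean = []
for source in SOURCES:
    for line in collect(source):
        entry = standardize(line)
        if verify(entry):
            clean.append(entry)
consolidate(clean)
```

In practice, each of these steps is handled by purpose-built tooling rather than a script, but the division of labor is the same.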
Lesson 3: Log collection vs. log aggregation
Occasionally, you may hear the terms “log aggregation” and “log collection” used almost interchangeably – but they shouldn’t be.
There are excellent open-source logging tools out there – such as Fluentd and Logstash – that are designed only to be log collectors. They can pull logs from their sources and move them to a different location. But they don’t perform all of the steps in log aggregation. To do that, the log collectors need to integrate with other tools. However, because these log collectors start the log aggregation pipeline, they can be easy to conflate with log aggregators.
Don’t make that mistake. Log collectors play a role in log aggregation, but they’re not log aggregation tools unto themselves. You need more than a log collector to perform log aggregation.
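To see the distinction in practice, consider what a bare-bones collector actually does. The hypothetical Python sketch below simply tails a file and forwards each raw line to an ingest endpoint (the URL is made up); standardization, verification, and consolidation are all left to downstream tooling.

```python
import time
import requests  # third-party HTTP client; pip install requests

INGEST_URL = "https://logs.example.com/ingest"  # hypothetical endpoint

def follow(path):
    """Tail a log file, yielding new lines as they are written."""
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file, like `tail -f`
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line.rstrip("\n")

# A collector just ships raw lines. It does not standardize,
# verify, or consolidate them; that work happens downstream.
for raw_line in follow("/var/log/app/access.log"):
    requests.post(INGEST_URL, json={"line": raw_line})
```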
Lesson 4: Why efficient log aggregation matters
Again, log aggregation may seem like a pretty mundane topic. Does it matter, after all, how you collect, standardize, verify and consolidate logs?
The answer is a resounding yes – especially in the context of modern, distributed systems, like Kubernetes or any public cloud service. The more complex your software environment, the more logs it probably produces, and the more you stand to gain by optimizing the way you aggregate those logs. At the same time, some log files are ephemeral in their original locations, meaning they disappear or are erased periodically unless you aggregate them to a different location.
To use Kubernetes as an example, a Kubernetes environment consists of more than a half-dozen different types of components: you have an API server, Nodes, Pods, a key-value store (etcd), a service mesh, and so on. Each of these components produces telemetry data, much of it in the form of logs.
Some of these components – like containers – generate logs that are ephemeral, and disappear when the container shuts down or restarts. Others store only limited log data before it is overwritten; most Kubernetes distributions limit container logs to 10 megabytes, for example. For this reason, it’s critical to aggregate the log data before it is lost.
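As an illustration, one way to snapshot container logs before a restart erases them is to pull them through the Kubernetes API. The sketch below uses the official Kubernetes Python client; the namespace and output paths are hypothetical, and in production you’d more likely run a node-level collector such as Fluentd as a DaemonSet.

```python
from kubernetes import client, config

config.load_kube_config()   # or load_incluster_config() when running in a Pod
v1 = client.CoreV1Api()

NAMESPACE = "production"    # hypothetical namespace

# Snapshot the current logs of every Pod in the namespace so nothing
# is lost if a container restarts and its log buffer is erased.
for pod in v1.list_namespaced_pod(NAMESPACE).items:
    log_text = v1.read_namespaced_pod_log(
        name=pod.metadata.name,
        namespace=NAMESPACE,
    )
    with open(f"{pod.metadata.name}.log", "w") as out:
        out.write(log_text)
```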
At the same time, aggregation makes it easier to compare logs and identify trends or anomalies that stretch across them. Looking at each log individually provides very limited insight into what’s happening inside your Kubernetes environment. By aggregating logs, you can analyze them collectively and gain a deep and expansive understanding of the health and performance of your environment. This insight can help you understand whether a performance problem within a Node relates to a performance issue in a Pod, or how decisions made by your service mesh impact the performance of your application.
This isn’t true just in Kubernetes. Virtually any modern application stack consists of multiple components – servers, hypervisors, databases, and the like – each with its own logs. The more efficiently you can aggregate those logs, the deeper the visibility you can gain into the environment.
Lesson 5: Log aggregation’s role in observability
Logging is often said to be one of the three so-called pillars of observability. The other two are metrics and distributed traces. By analyzing logs, metrics, and traces from multiple data sources in tandem, you can observe a system, which means understanding the internal state of the system based on the data it exposes externally.
Thus, log aggregation is one of the first steps toward observability, and any observability platform must support log aggregation.
However, the way observability tools manage log aggregation can vary. Some observability platforms are essentially just log aggregators integrated with other tools to perform data analytics.
Other, more flexible observability platforms allow users to ingest logs using log aggregators of their choice, rather than requiring a specific aggregator. These platforms may be able to ingest logs natively, but they can also perform log ingestion and aggregation using external, open-source log aggregators.
Having a choice of log aggregation tools within your observability platform makes it easier to integrate the platform into any type of environment, as well as to cater to the varying skill sets and preferences of engineers who need to use the platform.
Lesson 6: Essential log aggregator tool features
When choosing a log aggregator, look for tools that are:
- Lightweight: Collecting and consolidating logs at scale can consume significant compute and network resources. Good log aggregators operate efficiently to minimize this burden and leave more resources available for production applications.
- Compatible with many data types and formats: Log formats vary widely, so the best log aggregators support a broad range of log types and formats.
- Easy to operate: Deploying and running log aggregators can be a hassle, especially if you have to deploy the aggregator manually to each log source. Look for log aggregators that use an agentless approach.
- Easy to configure: Good log aggregators should be capable of ingesting most types of logs out-of-the-box, with minimal tweaking required to support a given log source. However, they should also enable configuration customizations for teams that want them.
Lesson 7: When, and when not, to use an open-source log aggregator
As we’ve hinted, some log aggregators are open source and others are proprietary.
The main benefits of open source log aggregation are obvious enough if you’re familiar with open source tools in general. Open source aggregators are free to use, and often highly customizable.
On the other hand, open-source log aggregators tend to be more challenging to deploy and manage. They don’t prioritize user-friendliness, and you’ll likely need to spend more time customizing them to get them working in your environment than you would with a proprietary log aggregator. For example, proprietary tools like AWS CloudWatch integrate natively and almost instantaneously with most major AWS services.
Another limitation of open source log aggregation – as noted above – is that some open source tools that are passed off as log aggregators are just log collectors. They lack native support for data standardization, validation, and quality control. You’ll need to perform those tasks separately.
| | Ease of Deployment | Ease of Configuration | Cost | Customizability |
|---|---|---|---|---|
| Open Source | Low | Low | Free or Low | High |
| Proprietary | High | High | Medium or High | Limited |
Fortunately, it’s possible to leverage the flexibility of open-source log aggregators without the drawbacks. If you use an observability platform that integrates with open source log aggregators, you can collect and consolidate logs using an open-source aggregation tool of your choice, while also benefiting from the enhanced usability, data management, and analytics features of the broader observability platform.
Lesson 8: Understanding cloud log aggregation
The points we’ve covered so far about log aggregation apply to any type of environment, cloud-based or not. If you do use the cloud, however, there are some special considerations to bear in mind.
First, understand that while most public cloud providers offer default log aggregation solutions, you don’t have to use them. For example, AWS CloudWatch is Amazon’s de facto logging tool, but you can also use an open-source log aggregator, like Fluentd or Logstash, to ingest logs from most AWS services.
If you use multiple clouds or have a hybrid cloud architecture, the ability to deploy your own log collector instead of relying on a cloud provider’s native aggregation tool becomes especially important. Third-party log aggregators can collect logs from all segments of a hybrid or multi-cloud environment, whereas cloud vendors’ aggregators usually cannot.
Remember, too, that you’ll pay a monthly fee for logs you leave in a public cloud, because log data counts toward your total cloud storage consumption. For that reason, it’s beneficial to use log aggregators that allow you to move logs from their source to an external location – ideally, one where you can take advantage of lower-cost storage.
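For instance, CloudWatch Logs can export a log group to S3, where lower-cost storage classes are available. Here’s a rough sketch using boto3; the log group and bucket names are hypothetical, and the destination bucket needs a policy that permits CloudWatch Logs to write to it.

```python
import time
import boto3

logs = boto3.client("logs", region_name="us-east-1")

now_ms = int(time.time() * 1000)
one_day_ms = 24 * 60 * 60 * 1000

# Export the last 24 hours of a log group to S3 (hypothetical names).
response = logs.create_export_task(
    logGroupName="/my-app/production",
    fromTime=now_ms - one_day_ms,
    to=now_ms,
    destination="my-archived-logs-bucket",
    destinationPrefix="cloudwatch-exports",
)
print("Export task ID:", response["taskId"])
```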
Lesson 9: The more logs you aggregate, the better
Maximizing observability hinges on collecting, correlating, and analyzing as much data as possible. You should strive to aggregate as many logs as you can, while also analyzing as many metrics and traces as possible.
However, it may be difficult to aggregate every log if you have to configure a log collector for each of the dozens of log sources in your environment. More logs also mean higher storage requirements, which becomes a problem if you lack access to low-cost storage. As a result, some teams choose to aggregate only key logs, or to strip less important data out of their logs to reduce log size and storage costs.
A better approach is to choose an observability platform that lets you aggregate all of your logs with minimal configuration and low storage costs. Otherwise, you’ll find yourself guessing which logs will prove most important, or wasting time configuring log indexing.
Lesson 10: Log aggregation is only one step in log management
On a parting note, it’s important not to conflate log aggregation with log management. The latter term refers to a broader set of processes, which include but are not limited to log aggregation.
Log management starts with log aggregation. After logs have been aggregated, you’ll typically want to analyze the logs, then generate alerts or reports based on relevant trends or anomalies within the log data. You’ll also have to decide how long to store your aggregated logs and when to delete logs that you no longer need in order to free up disk space (a process known as log retention).
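To illustrate the retention side, here’s a bare-bones sketch that deletes aggregated log files older than a configurable window. The directory and retention period are hypothetical; a real log management platform would handle this through retention policies rather than ad hoc scripts.

```python
import time
from pathlib import Path

LOG_DIR = Path("/var/log/aggregated")  # hypothetical archive directory
RETENTION_DAYS = 30                    # hypothetical retention window

cutoff = time.time() - RETENTION_DAYS * 24 * 60 * 60

# Delete aggregated log files that have outlived the retention window.
for log_file in LOG_DIR.glob("*.log"):
    if log_file.stat().st_mtime < cutoff:
        log_file.unlink()
        print(f"Deleted expired log: {log_file}")
```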
So, think of log aggregation as a critical part of effective log management, but not the only part. When you design log aggregation processes and choose log aggregation tools, make sure they support your broader log management strategy, rather than enabling log aggregation alone.