GCP Observability and Monitoring: The Ins and Outs
Google Cloud Platform (GCP) is similar in most respects to all of the other major public clouds, like Microsoft Azure and Amazon Web Services. By extension, you can monitor and observe GCP resources in the same basic way that you would monitor any public cloud environment.
Though similar, GCP isn’t identical to other public clouds. It has different cloud services, different levels of customization, and different monitoring tools. In addition, Google Cloud’s native tools aren’t very useful for managing private servers or data centers. For these reasons, organizations that deploy workloads on GCP need to adapt their monitoring and observability workflows to meet the unique requirements of GCP.
This article explains the fundamentals of GCP monitoring and observability by identifying the types of workloads you can monitor in GCP, available monitoring tools from both GCP and third parties, as well as what to consider when developing a GCP observability strategy.
Understanding Your GCP Monitoring Needs
Like other major public clouds, GCP offers a variety of cloud computing services, each designed to host different workloads or different parts of a workload. The most important of these services include:
- Virtual machine instances (Google Compute Engine)
- Databases (Google Cloud Databases)
- Object storage (Google Cloud Storage)
- Serverless functions (Google Cloud Functions)
- Containers (hosted in Google Kubernetes Engine)
GCP provides many other cloud computing services, but the services listed above are the bread-and-butter elements that most teams would use to construct a GCP environment.
Because each type of cloud service on GCP exposes different types of metrics, it typically makes the most sense to plan your monitoring strategy based on the types of cloud services you use. Different cloud services may require different monitoring and analytics techniques due to the unique metrics that each service generates.
The first step in planning a GCP observability strategy is determining which specific cloud services your workloads use in GCP. Then, you can read the Google Cloud metrics documentation to determine which types of metrics are available for those services. If you run workloads hosted in VMs using Compute Engine, you have an entirely different set of metrics to collect than you do for workloads running in Google Kubernetes Engine.
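As a rough illustration of planning by service, the sketch below maps each service in an inventory to a metric-type prefix and builds a Cloud Monitoring filter string for it. The `SERVICE_METRIC_PREFIXES` table, the `build_metric_filters` helper, and the service keys are illustrative assumptions (Cloud SQL stands in for the database category), so check the Google Cloud metrics documentation for the authoritative list of metric types.

```python
# Hypothetical mapping from the services listed earlier to the metric-type
# prefixes they publish in Cloud Monitoring. Treat it as a starting point and
# verify each prefix against the Google Cloud metrics documentation.
SERVICE_METRIC_PREFIXES = {
    "compute_engine": "compute.googleapis.com/",
    "cloud_sql": "cloudsql.googleapis.com/",
    "cloud_storage": "storage.googleapis.com/",
    "cloud_functions": "cloudfunctions.googleapis.com/",
    "gke": "kubernetes.io/",
}


def build_metric_filters(services_in_use):
    """Return one Monitoring filter string per service, keyed by service name."""
    filters = {}
    for service in services_in_use:
        prefix = SERVICE_METRIC_PREFIXES.get(service)
        if prefix is None:
            raise ValueError(f"No metric prefix recorded for service: {service}")
        filters[service] = f'metric.type = starts_with("{prefix}")'
    return filters


if __name__ == "__main__":
    # Example inventory (an assumption): a team running VMs and GKE clusters.
    for name, metric_filter in build_metric_filters(["compute_engine", "gke"]).items():
        print(f"{name}: {metric_filter}")
```

Keeping the inventory in one place makes it obvious when a new service enters your environment without a corresponding monitoring plan.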
Google Cloud Monitoring Tools
Like the other major public clouds, Google Cloud provides several native tools to help teams collect and analyze metrics from across its various cloud services. These tools are useful for basic metrics collection and interpretation. But GCP’s native monitoring solutions typically aren’t enough for complex monitoring needs.
Google Cloud Monitoring
The most important monitoring and observability tool in GCP is Cloud Monitoring. Cloud Monitoring is a SaaS product that lets you:
- Collect metrics from most Google Cloud services.
- Visualize service status and health using graphs and charts.
- Configure alarms to generate alerts when metrics cross certain thresholds.
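To make the idea of threshold-based alerting concrete, here is a minimal sketch of the underlying logic: fire only when a metric stays above a threshold for a sustained period. It illustrates the concept only and is not how Cloud Monitoring implements alerting; the `Sample` type and `threshold_breached` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: int  # seconds since epoch
    value: float    # e.g. CPU utilization as a fraction between 0 and 1


def threshold_breached(samples, threshold, duration_seconds):
    """Return True if values stay above `threshold` for at least `duration_seconds`."""
    breach_start = None
    for sample in sorted(samples, key=lambda s: s.timestamp):
        if sample.value > threshold:
            if breach_start is None:
                breach_start = sample.timestamp  # a new streak begins here
            if sample.timestamp - breach_start >= duration_seconds:
                return True
        else:
            breach_start = None  # the streak is broken; start over
    return False


if __name__ == "__main__":
    # Illustrative readings taken one minute apart.
    readings = [Sample(0, 0.90), Sample(60, 0.95), Sample(120, 0.92), Sample(180, 0.50)]
    print(threshold_breached(readings, threshold=0.8, duration_seconds=120))
```

Requiring a sustained breach rather than a single high reading is a common way to avoid alerts on short spikes.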
Cloud Monitoring is free if the metrics you collect fall within the category of what Google defines as “non-chargeable” metrics, which include most of the standard metrics you’d monitor for typical GCP workloads. However, if you define custom metrics, or you ingest metrics from services external to GCP, you have to pay extra for those.
Cloud Logging
Cloud Logging is the primary GCP tool for collecting and analyzing logs from those GCP services that expose logs. It lets you ingest logs into a log storage “bucket” and perform basic analytics operations on the log data. You can also configure automated alarms based on log data.
You have to pay to use GCP Cloud Logging, although you get 50 gigabytes of free log ingestion per month before the fees start. Google Cloud also charges log storage fees if you choose to retain logs for more than 30 days.
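The arithmetic behind a logging bill can be sketched in a few lines. The example below assumes the 50-gigabyte free allotment described above; the per-gigabyte prices passed in are placeholders rather than Google’s actual rates, and the function name and parameters are hypothetical. Check the current Google Cloud pricing page before budgeting.

```python
FREE_INGESTION_GB = 50  # monthly free ingestion allotment cited in this article


def estimate_monthly_logging_cost(ingested_gb, price_per_gb,
                                  retained_gb_beyond_30_days=0.0,
                                  retention_price_per_gb=0.0):
    """Estimate a monthly bill: billable ingestion plus long-term retention."""
    billable_gb = max(0.0, ingested_gb - FREE_INGESTION_GB)
    ingestion_cost = billable_gb * price_per_gb
    retention_cost = retained_gb_beyond_30_days * retention_price_per_gb
    return ingestion_cost + retention_cost


if __name__ == "__main__":
    # Placeholder prices (assumptions, not a quote): 0.50 per GB ingested and
    # 0.01 per GB per month for logs kept longer than 30 days.
    cost = estimate_monthly_logging_cost(
        ingested_gb=200,
        price_per_gb=0.50,
        retained_gb_beyond_30_days=1000,
        retention_price_per_gb=0.01,
    )
    print(f"Estimated monthly cost: {cost:.2f}")
```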
Cloud Audit Logs
If you want to monitor the actions that human and machine users perform in your GCP environment, you can use Cloud Audit Logs, a GCP service that tracks administrative activities. Cloud Audit Logs only works for GCP services that generate audit logs. Though this includes most services, not all types of actions are recorded for every service.
Cloud Audit Logs is subject to the same pricing terms as Cloud Logging.
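To show what “who did what” looks like in practice, the sketch below pulls the key fields out of an audit log entry represented as a Python dictionary. The field names follow the audit log format Google documents (`protoPayload`, `authenticationInfo.principalEmail`, `methodName`, and so on), but the sample entry is invented for illustration and `summarize_audit_entry` is a hypothetical helper.

```python
def summarize_audit_entry(entry):
    """Reduce an audit log entry to who did what, to which resource, and when."""
    payload = entry.get("protoPayload", {})
    return {
        "who": payload.get("authenticationInfo", {}).get("principalEmail", "unknown"),
        "what": payload.get("methodName", "unknown"),
        "service": payload.get("serviceName", "unknown"),
        "resource": payload.get("resourceName", "unknown"),
        "when": entry.get("timestamp", "unknown"),
    }


if __name__ == "__main__":
    # Invented sample entry; real entries carry many more fields.
    sample_entry = {
        "timestamp": "2024-01-15T10:30:00Z",
        "protoPayload": {
            "serviceName": "compute.googleapis.com",
            "methodName": "v1.compute.instances.delete",
            "resourceName": "projects/example-project/zones/us-central1-a/instances/example-vm",
            "authenticationInfo": {"principalEmail": "alice@example.com"},
        },
    }
    print(summarize_audit_entry(sample_entry))
```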
The Limitations of Google Cloud Monitoring Tools
Google Cloud’s native monitoring products are useful as the basis for building a simple GCP monitoring and observability strategy. However, they are not sufficient on their own for meeting complex monitoring and observability needs.
Lack of Multi-Cloud Support
Although some native GCP monitoring tools can ingest data from certain AWS resources, for the most part they only support workloads hosted on GCP. This makes GCP’s native tools a poor solution if you need to monitor or observe more than one cloud.
Limited Hybrid Cloud Infrastructure Metrics
GCP’s monitoring tools aren’t designed to help you observe any private infrastructure that you deploy as part of a GCP-based hybrid cloud environment using a platform like Anthos. Google Cloud’s native tools can monitor the status of services deployed on hybrid cloud environments, but they aren’t very useful for managing private servers or data centers outside of Google Cloud. You need separate monitoring tools for that purpose.
Cost at Scale
Although GCP’s native monitoring tools won’t cost you anything if you ingest and analyze small amounts of data, the costs can add up for larger-scale observability operations. Also, because the amount of data most IT environments generate gradually increases, observability toolchains that are free at first could become costly once you surpass GCP’s free data ingestion quotas. You may also end up paying a lot more if you need to retain log data beyond the initial 30-day retention period that GCP offers for free, and you most likely will want to retain log data longer than that.
For these reasons, relying on GCP’s native monitoring tools may not be ideal from a cost standpoint, especially for organizations that have a large amount of data to analyze or require long-term data retention. In that case, third-party tools that offer more cost-effective storage resources may provide a lower total cost of ownership.
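A quick projection shows how fast a free tier can run out as data volumes grow. The sketch below assumes steady month-over-month growth in log volume, which is a simplification, and uses the 50-gigabyte free ingestion quota mentioned earlier as its default; `months_until_quota_exceeded` is a hypothetical helper.

```python
def months_until_quota_exceeded(current_gb, monthly_growth_rate,
                                free_quota_gb=50, max_months=600):
    """Return how many months pass before monthly volume exceeds the free quota.

    Returns 0 if the quota is already exceeded, or None if it is not exceeded
    within `max_months` (for example, when volume is not growing).
    """
    volume = current_gb
    for month in range(max_months + 1):
        if volume > free_quota_gb:
            return month
        volume *= 1 + monthly_growth_rate  # compound growth, month over month
    return None


if __name__ == "__main__":
    # Illustrative numbers: 30 GB per month today, growing 10% each month.
    print(months_until_quota_exceeded(current_gb=30, monthly_growth_rate=0.10))
```

Running the same projection against your own growth rate gives a rough sense of when a free setup starts to generate a bill.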
Limited Kubernetes Observability
In addition to providing the managed Google Kubernetes Engine service, Google Cloud also uses Kubernetes as the foundation for Anthos, its hybrid cloud framework.
Despite Kubernetes being a key platform, Kubernetes monitoring in Google Cloud is relatively basic. You can track the status of Kubernetes Services or see how much CPU your clusters use. But if you need to pinpoint the reason a Pod failed to start, or determine why a node crashed, GCP’s native tooling isn’t useful.
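For a sense of what root-cause data for a failed Pod looks like, the sketch below reads the container statuses that Kubernetes itself reports (the same structure `kubectl get pod -o json` returns) and turns them into short findings. The sample Pod is invented, and `explain_pod_problems` is a hypothetical helper rather than part of any GCP or Kubernetes tool.

```python
def explain_pod_problems(pod):
    """Collect human-readable reasons why containers in a Pod are not running."""
    findings = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        name = status.get("name", "<unnamed>")
        # A container stuck in "waiting" reports a reason such as ImagePullBackOff.
        waiting = status.get("state", {}).get("waiting")
        if waiting:
            findings.append(f"{name}: waiting, reason={waiting.get('reason', 'unknown')}")
        # The previous run, if any, explains why the container last stopped.
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            findings.append(
                f"{name}: previously terminated, "
                f"reason={terminated.get('reason', 'unknown')}, "
                f"exit code={terminated.get('exitCode', 'unknown')}"
            )
    return findings


if __name__ == "__main__":
    # Invented sample: a container that keeps crashing after running out of memory.
    sample_pod = {
        "status": {
            "containerStatuses": [
                {
                    "name": "web",
                    "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                    "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
                }
            ]
        }
    }
    for finding in explain_pod_problems(sample_pod):
        print(finding)
```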
Disparate Metrics and Service Types
Although GCP monitoring and logging tools let you collect data from virtually any type of GCP service, they don’t do a good job of standardizing or correlating that data. In most cases, each service generates specific types of metrics and log data. That makes it difficult to compare performance between services or understand how the health of one service (a database) impacts another (an application hosted in a VM) that depends on it.
Thinking Beyond GCP’s Native Observability Tools
For the reasons just explained, a GCP monitoring and observability strategy that depends solely on GCP’s native monitoring products is not likely to meet the needs of most organizations. Although tools like Cloud Monitoring and Cloud Logging can help you gain a quick overview of the status of your GCP environment, you may want to deploy additional, third-party observability tools that provide deeper insights into GCP.
As you evaluate external GCP monitoring tools, look for features that fill the crucial gaps between GCP’s native monitoring functionality and the insights teams need to maintain a fully healthy GCP environment.
Data Correlation Across Services
Tracing the root cause of performance issues requires the ability to compare and correlate data from disparate sources. To that end, GCP observability solutions should be able to ingest, standardize, and correlate data from all types of GCP services. In addition, these solutions should be able to do the same for any workloads that you run in any other cloud environment – or on-premises – then analyze them collectively to deliver meaningful insights.
When you can do this, you step beyond an observability strategy that limits you to monitoring on a service-by-service basis. It becomes possible instead to understand the complex relationships between GCP services, as well as how your GCP resources affect the performance of your broader cloud architecture.
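The ingest, standardize, and correlate steps can be sketched as follows. This is a minimal illustration under stated assumptions: the two input record shapes, the `Event` schema, and the function names are all hypothetical, and real tools correlate on much richer keys than timestamps alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str     # which service produced the event
    timestamp: int  # seconds since epoch
    name: str       # metric or event name
    value: float


def normalize_db_record(record):
    # Hypothetical database record shape: {"ts": ..., "metric": ..., "val": ...}
    return Event("database", int(record["ts"]), record["metric"], float(record["val"]))


def normalize_vm_record(record):
    # Hypothetical VM record shape: {"time": ..., "name": ..., "reading": ...}
    return Event("vm", int(record["time"]), record["name"], float(record["reading"]))


def correlate(events_a, events_b, window_seconds):
    """Pair each event in events_a with events in events_b that occur close in time."""
    pairs = []
    for a in events_a:
        for b in events_b:
            if abs(a.timestamp - b.timestamp) <= window_seconds:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    db_events = [normalize_db_record({"ts": 100, "metric": "slow_queries", "val": 42})]
    vm_events = [
        normalize_vm_record({"time": 130, "name": "request_latency_ms", "reading": 900}),
        normalize_vm_record({"time": 500, "name": "request_latency_ms", "reading": 120}),
    ]
    for db_event, vm_event in correlate(db_events, vm_events, window_seconds=60):
        print(f"{db_event.name} at t={db_event.timestamp} coincides with "
              f"{vm_event.name} at t={vm_event.timestamp}")
```

Once every source shares one schema, the correlation step no longer needs to know which service a record came from.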
Understand Historical State
Although detecting performance issues or risks in real time may be your primary goal, it’s also important to understand what happened in the past so that you have critical context on problems you didn’t identify when they occurred. For example, you may have missed a container that failed to start and need to investigate the cause after the container has disappeared and taken its log files with it, since container logs don’t persist once the container has shut down.
That’s why your observability tools should allow you to reconstruct the historical state of your GCP environment so that you can investigate issues – and prevent them from recurring – even if you do not catch them in real time.
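One simple way to think about reconstructing historical state is to keep every state change as a timestamped event and replay the events up to the moment you care about. The sketch below illustrates that idea only; the event tuples and the `state_at` function are hypothetical and do not describe any particular product’s implementation.

```python
def state_at(events, when):
    """Replay state-change events up to `when`; return resource -> last known state.

    Each event is a (timestamp, resource, new_state) tuple.
    """
    state = {}
    for timestamp, resource, new_state in sorted(events):
        if timestamp > when:
            break  # everything after this point happened later than `when`
        state[resource] = new_state
    return state


if __name__ == "__main__":
    # Invented history: a container that failed shortly after starting.
    history = [
        (10, "container-a", "starting"),
        (12, "node-1", "ready"),
        (15, "container-a", "failed"),
        (20, "container-a", "deleted"),
    ]
    # What did the environment look like at t=16, after the failure
    # but before the container was removed?
    print(state_at(history, when=16))
```

Because the events are retained even after the container is gone, the failure remains visible long after the resource itself has disappeared.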
Simple Data Ingestion
One problem you may run into with some non-native GCP observability products is complex data ingestion. Specifically, you may need to spend a lot of time configuring the tools to ingest GCP metrics and logs, or even create a different setup approach for each type of service you want to monitor.
To simplify the ingestion process, look for observability tools that can collect data from any GCP resource with minimal setup. You should be able to get the metrics and log data you need just as easily as you could with a native GCP tool, while also gaining the features that GCP’s own tools lack.
Cost-effective Data Storage
Observability tools that force you to store data on expensive services limit your ability to retain data for as long as you need. For that reason, they tend to result in a higher total cost of ownership – even if their direct licensing costs are low.
Avoid this pitfall by choosing observability solutions that provide access to low-cost storage, so you can retain data as long as you want without breaking the bank.
Observe’s Approach to GCP Monitoring and Observability
The Observe platform provides the critical GCP monitoring and observability features you would find in Google Cloud, and more, without requiring you to adopt multiple tools.
With the Observe platform, you get these features:
- Simple data ingestion from any service: Whether you use one service or one hundred in GCP, you no longer have to spend time hunting down far-flung data sources just to stream them to multiple monitoring tools. Simply install the GCP App in Observe to start collecting data from your GCP environment(s) with a single click.
- Data shaping and correlation: Want to know how the performance of an app hosted in a Compute Engine VM was impacted by a serverless function? Because Observe automatically correlates seemingly disparate datasets, you can answer questions like this by drilling down into event data from across your GCP environment.
- Track state over time: Observe lets you “time travel” as it tracks the state of everything in your system including applications, infrastructure, services, and more over time. So you always know what happens, and when.
- Usage-based pricing: Cloud providers like GCP and their accompanying services generate mounds of data. Though that data is useful, many organizations opt not to collect all of it, or choose to sample it aggressively, due to cost concerns. With Observe, you can put down your crystal ball and stop guessing which data you may need: simply collect and store it all cheaply, and then analyze it when, or if, you want!
- Multi-cloud observability: Observe works across all other major public clouds and on-premises environments, making it an even more obvious choice for GCP observability. With Observe, you can consolidate your cloud monitoring tool set around a single platform, no matter which or how many clouds you need to observe.
- Native Kubernetes observability: Observability for distributed and ephemeral architectures like Kubernetes is tricky for most vendors due to the frequent state and status changes of infrastructure. But this has always been Observe’s bread and butter. Because Observe tracks state and status over time and automatically shows you how resources are related, it gives you the tools to drill down into more context. Observe is the obvious choice for Kubernetes observability.
Observe provides GCP observability that extends far beyond what is possible using GCP tooling alone. Instead of juggling multiple GCP tools, trying to link data sources together manually, and monitoring infrastructure and services separately, you can leverage Observe to gain deep visibility into your entire GCP environment, not to mention anything linked to it. You can track the performance, reliability, and cost of all of your resources. With Observe, your GCP services are part of your data universe, along with your other cloud services, and not just another data silo.
If you want to see how Observe can make observability easier with GCP, and anything else you’d want to observe, then click here to request trial access today!