What is The Data Lake?

By Observe Team, March 28, 2023

This post is one of many in a series that attempts to explain basic concepts and terminology around Observe, and “The Observability Cloud.” Topics will range from architecture to deeper technical dives into topics like Temporal Algebra, Schema-On-Demand, and more.

The Observability Cloud is based on an entirely new architecture that changes how you ingest, store, analyze, and visualize observability data. It comprises three components: the Data Lake to unify telemetry, the Data Graph to map and link relevant Datasets, and Data Apps to make observability more turnkey from your favorite services.

Let’s look more closely at the Data Lake and how it helps the Observability Cloud deliver on these promises.

All Of Your Data, In One Place

The Data Lake is the single destination for telemetry of all types and in any format. Metrics, logs, and traces from popular services like Kubernetes, AWS, and OpenTelemetry, as well as any other kind of event data, such as Salesforce account interactions or even Zendesk incidents, all go into the Data Lake.

We use Telegraf, Fluent Bit, and other familiar open-source collectors to send observability data to Observe, which means there’s no vendor lock-in. And with all your telemetry in one place, Observe creates Datasets, or “things” you care about, like servers, shopping carts, or even customers. From there, these Datasets are linked to show you relationships you didn’t know existed and to surface crucial context where and when you need it while troubleshooting.

The Data Lake collects logs, metrics, traces, and more into a single data store (based on Amazon S3) and then compresses it by 10x.

And because the Observability Cloud was built using Snowflake, we’re able to separate storage and compute by storing your data in the low-cost Amazon S3-based Data Lake. This lets us scale elastically and allows you to simply send all your telemetry to Observe, no matter its shape, and analyze it later. No more data silos, fixed schemas, or hours spent pre-processing (or tagging) your data before you can use it.
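To make the "send it now, analyze it later" idea concrete, here is a minimal Python sketch of applying structure at read time rather than at write time. It is purely illustrative and is not how Observe is implemented: the `raw_store` list is a stand-in for the Data Lake, and the `ingest` and `query_field` helpers are hypothetical names invented for this example.

```python
import json

# Hypothetical stand-in for the Data Lake: raw events kept exactly as received.
raw_store = []

def ingest(event_text):
    """Store the event as-is; no schema is enforced at write time."""
    raw_store.append(event_text)

def query_field(field):
    """Apply structure at read time: parse each event and pull out one field."""
    values = []
    for text in raw_store:
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            continue  # Non-JSON events stay stored; this query just skips them.
        if isinstance(record, dict) and field in record:
            values.append(record[field])
    return values

# Events of different shapes all land in the same store.
ingest('{"source": "kubernetes", "level": "error", "pod": "cart-7f9"}')
ingest('{"source": "zendesk", "ticket_id": 4812, "priority": "high"}')
ingest("plain text log line with no structure")

print(query_field("source"))
```

Nothing is rejected or reshaped on the way in, so a question nobody anticipated at ingest time (here, "which sources sent data?") can still be answered later.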

Cheap Long-Term Storage

A whopping 46% of organizations report that they have discarded telemetry data based solely on cost concerns. Whether due to sampling or tossing data outright, getting rid of telemetry is a dangerous practice that may leave you high and dry during an investigation. 

Thanks to the Data Lake, hard decisions around what data to keep, and how long, aren’t something you have to worry about. Key to enabling our usage-based pricing model, the Data Lake ensures that your observability practice can scale economically with your organization’s data growth.

Data that you store in the Data Lake accounts for approximately 10% of your total cost in Observe.

Observe makes this possible by compressing your data 10x before it comes to rest in the Data Lake. And because Observe doesn’t mark up storage costs, storage rarely exceeds 10% of your overall bill. The effective cost works out to roughly $0.0023 per GB of ingested data per month: Amazon S3 pricing of $23/TB per month is $0.023/GB, and 10x compression cuts that by a factor of ten. A customer’s bill also includes acceleration costs, which typically account for around 50% of the total and keep data processing fast and datasets up to date, while query-related expenses make up about 20%.
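The storage arithmetic can be checked with a few lines of Python. This is a rough sketch using only the figures quoted in this post; the function names are made up for illustration, and the 1,000 GB per TB conversion is an assumption about how the S3 price is being divided.

```python
S3_PRICE_PER_TB_MONTH = 23.0  # USD per TB per month, the S3 price quoted above
COMPRESSION_RATIO = 10        # 10x compression, as stated above
GB_PER_TB = 1000              # Assumption: decimal terabytes

def storage_cost_per_raw_gb():
    """Monthly storage cost in USD for one GB of data before compression."""
    price_per_stored_gb = S3_PRICE_PER_TB_MONTH / GB_PER_TB
    return price_per_stored_gb / COMPRESSION_RATIO

def monthly_storage_cost(raw_gb):
    """Monthly storage cost in USD for raw_gb gigabytes of ingested data."""
    return raw_gb * storage_cost_per_raw_gb()

print(f"Per raw GB per month: ${storage_cost_per_raw_gb():.4f}")
print(f"Per raw TB per month: ${monthly_storage_cost(GB_PER_TB):.2f}")
```

Under these assumptions, a full terabyte of ingested telemetry costs about $2.30 per month to keep at rest.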

Lastly, we store all of this data for a default of 13 months because the economics of the Data Lake make retention periods a non-issue. No more hard decisions about what data to keep and for how long. Simply pay for the data you store, and then for the queries you run against it.

Next Stop, the Data Graph

The Data Lake is crucial for allowing the Observability Cloud to deliver an order-of-magnitude improvement in both economics and troubleshooting speed, but there’s more that makes the Observability Cloud tick.

In our next post, we’ll head to the next stop for your telemetry — the Data Graph. We’ll specifically look at how it transforms mounds of machine data and reduces troubleshooting time by bringing you relevant context where, and when, you need it. 

If you’d like to learn more about the Observability Cloud then check out the full series here.

Or, if you’re ready to see how the Observability Cloud will change how you ingest, store, analyze, and visualize your observability data, click here to get access today!