The Economics of Observability
Old Architectures Don’t Solve New Problems
When surveying Observability vendors, users will encounter almost every pricing model ever invented. Each vendor promises to make their offering more affordable, more efficient, and more representative of the value it provides. Yet today, users of products designed a decade or more ago remain unhappy with the price they pay relative to the value they receive. Worse, as they project growth in data ingest volume, they realize that a 30-50% increase in data also means a 30-50% increase in cost.
The fundamental problem is product architecture. Products built a decade or more ago were not designed for either the volume or diversity of data that exists in most environments today. Vendors typically apply band-aids such as using multiple data stores, filtering out data they deem unnecessary, or offloading data to cheaper cold storage. But these “fixes” push more complexity and management burden onto the customer. Even worse, they both fragment the data and reduce its volume, making it much harder to actually “observe” the environment.
Observe: A Modern Cloud-Native Architecture
Old architectures don’t solve new problems, and Observability is a new “problem.” Imagine for a second what data volumes will be a decade from now… it’s quite possible that 10-20TB per day will be commonplace for smaller environments and 100TB to 1PB per day for large enterprises.
Observe has taken a different approach from every other vendor in the market. Observe believes that all event data — regardless of whether it’s a log, metric, or trace — should be in one central data store. Because Observe is one of the newest entrants in the Observability market it employs a modern, cloud-native, architecture. The benefit is that the cost of Observe does not increase in lockstep with the volume of data ingested. At scale, this modern architecture promises an order-of-magnitude improvement in cost.
How Is This Possible?
To explain why Observe is so much more cost-effective than incumbents requires a more detailed look at Observe’s architecture, how that architecture translates into Observe’s costs, and, most importantly, how those costs translate into what Observe bills for usage.
Observe Architecture: Part I – Separation Of Storage & Compute
A decade or more ago, to get acceptable performance when querying large volumes of data, the compute layer (the server on which the query engine ran) needed a high-bandwidth connection to high-performance storage. The constant battle was getting data off of disk into memory quickly enough to run the query and satisfy impatient users. If vendors managed to overcome this challenge, then they could move on to the next one: how to ensure the server was large enough to run all the queries. The answer was always to buy a bigger server… and attach more high-performance storage… via more high-bandwidth network connections. This was painful for on-premises deployments, where the user had to do all the heavy lifting on the infrastructure, but many early “cloud-architected” products were no different; they simply hid what happened behind the scenes.
Obviously, high-performance storage is expensive. And if data has to be kept for long periods, you need a lot of it. This explains why long retention periods are incredibly expensive for products with legacy architectures. The band-aid that most vendors throw at this problem is allowing the user to archive data into an Amazon S3 bucket (low-cost cloud storage) for long-term retention, pushing the complexity of managing that data onto the user. But what happens when the user wants to query that archived data? Then there’s a clunky, very manual, and expensive process to re-ingest the data before it can be queried.
Observe’s modern architecture separates storage and compute. We ingest all data, no matter the format, into low-cost cloud storage. We then compress the data an average of 10x, making the cost roughly $0.0023/GB per month (based on Amazon S3 pricing of $23/TB per month).
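As a quick back-of-the-envelope check of that storage figure (the rates are from the text; the arithmetic below is purely illustrative):

```python
# Back-of-the-envelope storage cost check. Rates are from the text:
# Amazon S3 at $23/TB per month and an average 10x compression ratio.
S3_COST_PER_GB_MONTH = 23 / 1000   # $23/TB-month -> $0.023/GB-month
COMPRESSION_RATIO = 10

# Effective monthly cost per GB of *raw* (uncompressed) ingested data:
effective_cost = S3_COST_PER_GB_MONTH / COMPRESSION_RATIO
print(f"${effective_cost:.4f}/GB per month")  # -> $0.0023/GB per month
```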
With this approach, you might expect performance to be terrible, but Observe utilizes the Snowflake Data Cloud, which drives massive throughput by loading data in parallel before querying it in appropriately sized compute clusters, after which the results are stored back in Amazon S3. More importantly, Observe administrators and users spend zero time managing, filtering, or archiving data.
Observe Architecture: Part II – Accelerate Only The Data You Use Most
When Observe ingests data, it is stored in a raw form inside an “Observability Data Lake.” The Data Lake contains all the user’s event data — that includes metrics, logs, and traces, but could also include any kind of event data such as Salesforce account interactions or even Zendesk incidents. Although the Data Lake contents are comprehensive, clearly not all data in the lake is interesting to the user all the time.
To make the more interesting data extremely fast to query, Observe “accelerates” data out of the Data Lake into a more structured form, called Datasets. Datasets can be loosely structured entities like “Container Logs” simply containing a timestamp and a log message, or they can be highly structured entities such as a ‘Customer’ Resource that contains fields like Name, Email Address, Company, etc.
Once accelerated, Observe links Datasets to form a Data Graph. For example, the “Customer” entity may be linked to a “User Sessions” entity, which may be linked to the “Container” entity. These links provide the user with immediate access to related context during an investigation, (e.g. “Show me logs for the sessions where customer XYZ saw an increase in 404 errors.”)
There are a couple of cost implications to this approach. First, and perhaps most obvious, about 30% of the ingested data is duplicated. In many other products this would be cause for alarm, because the user’s bill would go up by 30%. Not with Observe: thanks to our innovative approach to compression, storage typically accounts for only about 10% of a customer’s bill. Second, acceleration costs money: as event data streams into Observe, it is continually accelerated to keep Datasets fresh and up-to-date. On average, acceleration accounts for about 50% of a customer’s bill. Once accelerated, though, data is relatively inexpensive to query; in fact, query costs make up only about 20% of a typical customer’s bill.
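To see why 30% duplication does not translate into a 30% larger storage bill, here is a small worked example. The 30% duplication and 10x compression figures come from the text; the 1 TB/day ingest rate is an assumed number for illustration:

```python
# Worked example: duplication vs. compression. The 30% duplication and
# 10x compression figures are from the text; 1 TB/day is an assumption.
raw_ingest_gb = 1000                   # assume 1 TB/day of raw ingest
with_duplication = raw_ingest_gb * 1.30   # ~30% duplicated into Datasets
stored_gb = with_duplication / 10         # ~10x average compression

# Even with duplication, stored bytes are a small fraction of raw ingest.
print(f"stored: {stored_gb:.0f} GB/day "
      f"({stored_gb / raw_ingest_gb:.0%} of raw ingest)")
```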
To the casual observer (no pun intended), this approach of accelerating data may seem similar to legacy vendors archiving raw data to S3 and forcing a “re-hydration” to query it. Note, however, that not only is Observe’s Data Lake stored in S3, the Data Graph is too, as it does not require high-performance storage to deliver great query performance. In addition, the process of acceleration is completely transparent to the user, at most asking for confirmation before executing large queries that go beyond the accelerated time range.
Observe Architecture: Part III – Multi-Tenancy Drives Down Query Costs
As mentioned earlier, Observe uniquely relies on the Snowflake Data Cloud to query data. The good news is that Observe users do not know that Snowflake is being used — it is entirely transparent to them.
Under the hood, Observe provisions many “lanes” of Snowflake warehouses of varying sizes. Sizes range from Extra Small (XS), Small (S), and Medium (M) — right up to Extra-Extra-Extra-Extra-Large (4XL)! All users across all accounts share all lanes of Snowflake warehouses.
When a user submits a query, Observe calculates the optimal location to execute that query. For example, if it’s a simple Dashboard query, it could go to an XS warehouse. If it’s a needle-in-a-haystack search across millions of container logs, it could go to a 4XL. Because Observe is a multi-tenant system, the user submitting the query doesn’t bear the full cost of the warehouse; it’s shared across all users on the system at any point in time. Observe’s warehouse management and query scheduling are just some of the many things that make Observe much more efficient than the casual Snowflake user, who might ingest a few logs into Snowflake and try to query them using SQL.
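The routing idea can be sketched roughly as follows. This is purely illustrative: the scan-size heuristic, thresholds, and lane list are invented here, and Observe’s actual scheduler is certainly more sophisticated:

```python
# Illustrative sketch of query-to-warehouse routing. The thresholds and
# the scan-size heuristic are invented; only the lane sizes (XS..4XL)
# and the small-query/large-query idea come from the text.
WAREHOUSE_LANES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL"]

def pick_lane(estimated_scan_gb: float) -> str:
    """Route small dashboard queries to small warehouses and large
    needle-in-a-haystack scans to the biggest lanes."""
    thresholds = [1, 10, 100, 500, 2_000, 10_000, 50_000]  # GB scanned
    for lane, limit in zip(WAREHOUSE_LANES, thresholds):
        if estimated_scan_gb <= limit:
            return lane
    return WAREHOUSE_LANES[-1]  # anything larger lands on a 4XL

print(pick_lane(0.5))     # simple dashboard query -> XS
print(pick_lane(75_000))  # massive log search -> 4XL
```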
Finally, Snowflake is an elastic system providing granular, nearest-millisecond billing. Warehouses are easily scaled up (S, M, L, etc.) and scaled out (multiple S, M, L, etc.) for maximum throughput. When demand drops off, warehouses are shut down, meaning neither the user nor Observe incurs any cost. Observe customers don’t need to manage any of these lanes, queries, or budgets; behind the scenes, Observe continually optimizes the balance of cost and performance.
Observe Architecture: Part IV – Administrative Cost Controls
Unique to the observability market, Observe offers usage-based pricing. As users execute queries in Observe, the customer consumes Observe Compute Credits (OCCs) and is billed for usage rounded to the nearest millisecond.
A common objection to this approach is that rogue users can blow through annual budgets in days, or that a user may run out of credits while troubleshooting an incident. These objections become moot when usage-based pricing is accompanied by administrative controls. Most of Observe’s controls are passive and inform the user ahead of time if their action might cause a large credit burn — giving them the option to abort or proceed. Observe also provides detailed system usage Dashboards that provide insight into which users accessed what data, and how many credits they have consumed.
Other controls, such as the Credit Manager, allow Observe administrators to set a target for both acceleration and query credit burn. This is a more invasive control, but it has the benefit of ensuring that costs for a particular period do not exceed a specific amount. If the Credit Manager intervenes at an inopportune time, such as during an incident, any user can snooze it for an hour or for the rest of the day.
The Credit Manager drives down costs by altering the freshness of certain Datasets. This means that the acceleration process will run less often, consuming fewer credits. In addition, if credit burn is acute, Observe will run queries on smaller warehouses, gracefully degrading performance before rejecting them altogether.
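A hypothetical sketch of that graceful degradation, with all thresholds, names, and numbers invented for illustration (the real Credit Manager's policy is not public):

```python
# Hypothetical sketch of graceful degradation under a credit target:
# stretch Dataset freshness intervals first, then route queries to a
# smaller warehouse, before ever rejecting work outright. All numbers
# and names are invented for illustration.
def degrade(burn_ratio: float, freshness_s: int, lane: str) -> tuple:
    """burn_ratio = credits consumed so far / target for the period."""
    lanes = ["XS", "S", "M", "L", "XL"]
    if burn_ratio < 0.8:                 # comfortably under budget
        return freshness_s, lane
    if burn_ratio < 1.0:                 # approaching the target:
        return freshness_s * 2, lane     #   accelerate half as often
    smaller = lanes[max(lanes.index(lane) - 1, 0)]
    return freshness_s * 4, smaller      # over target: downsize queries too

print(degrade(0.5, 60, "L"))   # -> (60, 'L')
print(degrade(0.9, 60, "L"))   # -> (120, 'L')
print(degrade(1.1, 60, "L"))   # -> (240, 'M')
```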
And because admins are wary of automated controls taking action on their behalf, Observe provides insight into exactly what actions the Credit Manager is taking via the Acceleration Manager.
The Bottom Line
Dated architectures found in incumbent tooling simply cannot deal with the data volumes required for modern Observability. Band-aids try to limit data ingest volume and/or archive data for long-term retention, but this simply pushes more complexity onto users and makes it harder for them to observe their applications and infrastructure.
Observe has taken a unique approach based on a modern, cloud-native, architecture. Observe ingests all data into an Amazon S3-based Data Lake, compresses it 10x, and then stores it for 13 months by default. Observe then accelerates frequently accessed data into a graph of connected Datasets called the Data Graph. Queries are executed efficiently via Observe’s multi-tenant implementation on top of the Snowflake Data Cloud. All queries across all customers, in all companies, share the same Snowflake infrastructure driving an order-of-magnitude improvement in efficiency.
Finally, Observe is an elastic system that offers usage-based pricing. Concerns about runaway usage or rogue users are eased by both passive and active cost controls: Observe prompts users to confirm expensive queries, and admins can set credit limits to manage usage to an annual budget number.
If you’re not an Observe customer, but want to take advantage of an observability tool that scales with your budget, then click here to get started!