The End of RegEx (Sort of…)

By Knox Lively, July 24, 2023

With 011y Extract, gaining crucial insights from your logs is as easy as clicking a button.

Pain in the /[A]{1}[s]{2}/

There’s a running joke in the development world that goes something like, “If you have a problem and you decide to use a regular expression to solve it, well, now you have two problems.” Indeed, regexes can be time-consuming to create, time-consuming to troubleshoot, and a headache to maintain. On top of that, most of the popular log analytics platforms rely heavily on regexes to make sense of your data, which means that unless you and your team spend countless hours developing them up front, you won’t get a drop of insight from your data.


Well, imagine you had access to a tool that let you harness the power of regexes without having to write them by hand. Spoiler alert: there is one, and it involves the latest in LLM technology.

But first, let’s look at a bit of background on how regexes are used in observability and log analytics platforms.

Handwriting Regular Expressions is so 2022… 

When it comes to observability tools, and especially more traditional log analytics tools, regular expressions are widely used to help users extract meaning from their observability data. One of the most common uses of regexes in observability is parsing logs. Log files often contain a wealth of data, but usually in a raw, unstructured format. To extract valuable information, users often have to spend a considerable amount of time writing regular expressions, without ever knowing whether they’ll see any real value from their efforts. Consider this line from an HTTP access log:

- - [04/Jan/2015:18:12:06 +0000] 807840 "GET /inventoryService/inventory/purchaseItem?userId=5233471&itemId=494300 HTTP/1.1" 500 15 "-" "Apache-HttpClient/4.1.6 (java 1.8)"

Using this typical HTTP access log as an example, it’s relatively easy to discover which fields might be valuable to a user. In this log line, we can see an IP address, timestamp, method, URL, and so on. An experienced regex user may write an expression in just a few minutes, but what if you’re not experienced? Here’s what the resulting regex looks like, and it’s a pain to write no matter your experience level:

/^(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - \[(?P<timestamp>.+?)\] \d+ "(?P<method>.+?) (?P<url>.+?) (?P<protocol>.+?)" (?P<status_code>\d{3}) (?P<request_size>\d+) "-" "(?P<user_agent>.+)"/
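To see what that expression actually produces, here is a minimal Python sketch that applies the same pattern (written as a raw string, without the slash delimiters) to the sample line using the standard re module. One assumption to flag: the sample line above is missing its leading client IP, so the sketch substitutes 192.0.2.10, a hypothetical address from the range reserved for documentation. Without some IP there, the pattern’s leading ip group would not match at all.

```python
import re

# The access-log pattern from the article, split across raw strings for readability.
ACCESS_LOG = re.compile(
    r'^(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - \[(?P<timestamp>.+?)\] \d+ '
    r'"(?P<method>.+?) (?P<url>.+?) (?P<protocol>.+?)" (?P<status_code>\d{3}) '
    r'(?P<request_size>\d+) "-" "(?P<user_agent>.+)"'
)

# Sample line from the article. The client IP is a hypothetical placeholder
# (192.0.2.0/24 is reserved for documentation) because the original sample omits it.
LINE = (
    '192.0.2.10 - - [04/Jan/2015:18:12:06 +0000] 807840 '
    '"GET /inventoryService/inventory/purchaseItem?userId=5233471&itemId=494300 HTTP/1.1" '
    '500 15 "-" "Apache-HttpClient/4.1.6 (java 1.8)"'
)

match = ACCESS_LOG.match(LINE)
# groupdict() maps each named group to the text it captured.
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name:>13}: {value}")
```

Every named group becomes one key in the resulting dictionary, which is exactly the “field” a log tool would expose after extraction.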

Either way, consider that this is just one of possibly dozens, if not hundreds, of log patterns you’ll encounter when digging through your log data. The time spent developing a regex for each one quickly adds up, ultimately distracting you from more important tasks. We felt that the less time our users spend manually turning their data into insights, the better, so we created a new feature to cut that time down.

Introducing 011y Extract

Our latest feature, 011y Extract, helps you get the most from your logs and makes those countless hours of regex-related frustration obsolete. Using GPT, it lets you effortlessly generate regexes to parse and understand your logs with the click of a button. It also intelligently generates field names (or “named capturing groups,” for you reg-heads) based on the context of the data, making the entire regex-creation process more streamlined and less time-consuming.
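For readers less familiar with the term, a named capturing group simply attaches a label to the part of a match it captures. The short Python sketch below (standard re module, hypothetical input) contrasts numbered and named groups; the group names are what end up serving as field names. It illustrates the regex concept only, not how 011y Extract chooses its names.

```python
import re

# A hypothetical fragment of an access-log line: status code and size.
text = "500 15"

# Numbered groups: captures are only reachable by position.
numbered = re.compile(r"(\d{3}) (\d+)")

# Named groups: the same captures, each labeled with a field name.
named = re.compile(r"(?P<status_code>\d{3}) (?P<request_size>\d+)")

print(numbered.match(text).groups())   # ('500', '15')
print(named.match(text).groupdict())   # {'status_code': '500', 'request_size': '15'}
print(list(named.groupindex))          # ['status_code', 'request_size']
```

Because the names travel with the pattern itself (groupindex), a tool can turn them directly into column or field names without any extra mapping.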

Let’s look at an example of just how quickly an SRE or DevOps Engineer can extract various fields from their logs, and then apply filters to easily find that proverbial needle in the haystack.

Let’s assume I’ve just launched a new online store that sells socks, and early user feedback says the website isn’t as snappy as they’d like. My shop is hosted on Kubernetes, so the most obvious place to look for clues is my Container Logs. I begin by opening them in a Worksheet, an easy way to explore your data in Observe.

Next, I right-click the “log” field and select the Extract from string option, since this field is a large string that isn’t easily read by humans. From there, I click the Generate from GPT button in the sidebar on the right. Within seconds, a regex appears in the sidebar’s Expression field. I quickly inspect it for errors, review the suggested field names, and click Apply. After a few moments, my newly split log line and nearly a dozen new fields appear!
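The “inspect it for errors” step is worth doing with any generated regex. One way to make that check repeatable is sketched below in Python: compile the pattern, run it against a handful of sample lines, and report the field names plus any lines it fails to match. The check_regex function, the RegexReport class, the sample lines, and the pattern are all hypothetical illustrations, not part of Observe.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class RegexReport:
    """Summary of how a candidate pattern behaves on sample log lines."""
    valid: bool
    error: str = ""
    field_names: list[str] = field(default_factory=list)
    matched: int = 0
    unmatched: list[str] = field(default_factory=list)


def check_regex(pattern: str, samples: list[str]) -> RegexReport:
    """Hypothetical pre-flight check for a generated regex (not an Observe API)."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # Syntax errors are caught here, before the pattern is applied anywhere.
        return RegexReport(valid=False, error=str(exc))

    # Named groups are the candidate field names.
    report = RegexReport(valid=True, field_names=list(compiled.groupindex))
    for line in samples:
        if compiled.search(line):
            report.matched += 1
        else:
            report.unmatched.append(line)  # lines the pattern would leave unparsed
    return report


# Hypothetical sample lines and a hypothetical generated pattern.
samples = [
    "level=info service=carts latency_ms=42",
    "level=warn service=orders latency_ms=1310",
    "healthcheck ok",
]
report = check_regex(r"service=(?P<service>\w+) latency_ms=(?P<response_time>\d+)", samples)
print(report.field_names)                # ['service', 'response_time']
print(report.matched, report.unmatched)  # 2 ['healthcheck ok']
print(check_regex(r"(?P<service>\w+", samples).valid)  # False
```

A low match count or an unexpected field name is a quick signal to regenerate or hand-tweak the expression before clicking Apply.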

Now I can quickly troubleshoot latency issues in my sock shop by navigating to the response_time field and sorting from longest to shortest to see which service and endpoint may be causing the slowdown.
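Sorting by an extracted field has one common pitfall: regex captures are text, and as text “38” compares greater than “1204”. The minimal Python sketch below uses hypothetical rows and field names modeled on the sock shop example to show why values should be compared as numbers. It is a generic illustration and makes no assumption about how Observe types extracted fields.

```python
# Hypothetical rows as they might look after extraction. Regex captures are
# strings, so response_time has to be converted before a numeric sort.
rows = [
    {"service": "catalogue", "endpoint": "/catalogue", "response_time": "38"},
    {"service": "orders", "endpoint": "/orders", "response_time": "1204"},
    {"service": "carts", "endpoint": "/carts/items", "response_time": "311"},
]

# Longest to shortest, comparing numbers rather than text.
slowest_first = sorted(rows, key=lambda row: int(row["response_time"]), reverse=True)

# For contrast: sorting the raw strings compares them character by character.
text_order = sorted(rows, key=lambda row: row["response_time"], reverse=True)

for row in slowest_first:
    print(f'{row["response_time"]:>5}  {row["service"]:<10} {row["endpoint"]}')
```

With the numeric key, the orders endpoint rises to the top; with the text key, the fastest request would wrongly appear first.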

But Wait…There’s More!

011y Extract makes gleaning crucial insights from your logs on the fly as easy as clicking a button, but we realize some users may want a more permanent way to format their log data. Thankfully, that’s only one more click in Observe.


Once your Container Logs have been formatted to your liking, click the Publish New Dataset button in the sidebar. This creates a new Dataset based on your Container Logs, so you won’t have to reinvent the wheel each time you dig through your log data. Just open the Dataset you created and filter on any field, all while reducing your dependence on regex!
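The benefit of publishing is that the extraction no longer has to be redone for every investigation. As a rough local analogy (not a description of how Observe stores Datasets), the Python sketch below parses hypothetical raw lines once into structured rows and then runs several filters over the stored rows. The log format, the pattern, and the where helper are all hypothetical.

```python
import re

# Hypothetical container-log lines and a hypothetical extraction pattern.
RAW_LOGS = [
    "service=carts status=200 response_time=35",
    "service=orders status=500 response_time=1890",
    "service=catalogue status=200 response_time=12",
    "service=orders status=200 response_time=240",
]
PATTERN = re.compile(
    r"service=(?P<service>\w+) status=(?P<status>\d{3}) response_time=(?P<response_time>\d+)"
)

# Parse once and keep the structured rows: a rough stand-in for a published dataset.
dataset = [m.groupdict() for line in RAW_LOGS if (m := PATTERN.search(line))]


def where(rows, **criteria):
    """Return rows whose fields equal every given criterion (hypothetical helper)."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]


# Later questions reuse the parsed rows; no regex runs again.
print(where(dataset, service="orders"))
print(where(dataset, status="500"))
```

The regex runs once at parse time; every subsequent filter works on named fields rather than on raw strings.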

A Regex-Free Future (Sort of…)

With 011y Extract, gaining crucial insights from your logs is as easy as clicking a button. But beyond allowing your team to spend less time developing, maintaining, and troubleshooting regexes, this feature also opens the door for more junior team members to help out. Not everyone has the time or patience to learn regexes — especially during an outage. 

And thanks to The Observability Cloud’s unique architecture, you don’t need to worry about extracting fields and shaping your data before ingesting it. Unlike other vendors, Observe supports ‘schema on demand,’ which means data can be shaped as needed based on the questions users want to ask.
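Schema on demand is closely related to what is often called schema-on-read: raw data is stored as-is, and structure is applied only when a question is asked. The generic Python sketch below illustrates that idea under stated assumptions. An in-memory list stands in for stored logs, query is a hypothetical helper, and the log lines are invented. It is not a description of Observe’s internals. Two different questions apply two different regexes to the same raw lines without re-ingesting anything.

```python
import re
from typing import Callable, Iterator

# Raw lines are kept exactly as ingested; no fields are declared up front.
# These sample lines are hypothetical.
RAW_STORE = [
    "2023-07-24T10:00:01Z carts GET /carts/items 200 35ms",
    "2023-07-24T10:00:02Z orders POST /orders 500 1890ms",
    "2023-07-24T10:00:03Z catalogue GET /catalogue 200 12ms",
]


def query(pattern: str, predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Apply a schema (a regex with named groups) at read time, then filter rows."""
    compiled = re.compile(pattern)
    for line in RAW_STORE:
        match = compiled.search(line)
        if match:
            row = match.groupdict()
            if predicate(row):
                yield row


# Question 1: which requests failed? Only service and status are extracted.
failures = list(query(
    r"^\S+ (?P<service>\S+) \S+ \S+ (?P<status>\d{3})",
    lambda row: int(row["status"]) >= 500,
))

# Question 2: which requests were slow? A different schema over the same raw lines.
slow = list(query(
    r"^\S+ (?P<service>\S+) \S+ \S+ \d{3} (?P<response_time_ms>\d+)ms$",
    lambda row: int(row["response_time_ms"]) > 1000,
))

print(failures)  # [{'service': 'orders', 'status': '500'}]
print(slow)      # [{'service': 'orders', 'response_time_ms': '1890'}]
```

The trade-off is that parsing cost moves to query time, in exchange for not having to predict every field before the data arrives.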

Get Observe and start using our GPT integrations today!