Observability Hero: Wayne Watson from Saviynt

By Knox Lively, November 27, 2022

Introduction

Though observability is in many ways the new kid on the block, most companies – over 70% according to our 2022 State of Observability Report – have some sort of observability practice in production. But the hopes and dreams for observability vary widely. Some companies are simply looking for a place to store all of their logs, metrics, and tracing data, while others hope it will bring them Minority Report-like visibility into their systems and self-heal issues before they bubble up to the human level.

Wayne Watson and his team at Saviynt are focused on what they can do now, by getting the most from their logs, easing troubleshooting, and reducing their overall MTTR. But that’s not to say they don’t have big plans for observability in the future.

Keep reading to see what’s important to Wayne and his team when it comes to observability, and how Observe gets them closer to their o11y goals.


Tell us a bit about yourself and how you came to Saviynt.

My name is Wayne Watson, and I am the VP of Cloud Operations at Saviynt. My team is responsible for building, delivering, and managing the infrastructure that we use to host all of our customer environments, as well as providing infrastructure services to our internal teams.

I have been working in IT with startups for more than 20 years. I have been largely focused on the delivery of services using automation, and on an approach toward holistic management. This involves building out a solution that is both agnostic and flexible – meaning we can deliver across multiple cloud providers and even on-prem if we need to – and building out a team to support it.

What are your top concerns/priorities/goals as VP of Cloud Operations?

At Saviynt, the function of our product is to provide identity management and governance capabilities to our customers. This means we operate a highly visible service offering several critical functions that our customers have to deliver to their users. Ultimately, the things that we worry about are the security of our platform and its reliability and stability.

To me, observability plays a key role in that. As we build out the application, we want to think about what needs to be observed, how we can achieve that, and so on. Our journey with observability specifically started with the question, “How can we do more with logs?” because the tools we were trying to use for log management weren’t fitting the needs we had.

We are continuing to work with Observe to grow that, and to better manage our Kubernetes-based environments and the information we get from our applications, so we can see what’s going on in our environments.

Tell us about a recent problem that Observe has helped your team solve.

As a company, we are constantly looking to improve and optimize our services by adopting best-of-breed technologies. That means I’ve got an entire operational team that has to continuously adapt to a changing landscape of technology, so the more I can shortcut their process to troubleshoot, the better. 

We leverage Kubernetes as a core part of our application services; however, it can become a double-edged sword. Yes, it helps you automate application deployments and abstracts away some “ugliness” in your environment, but at the same time you have to pay close attention to events; otherwise, you’ll miss something crucial.

We have an issue that is popping up right now in our environment where we’ve got many Pod restarts occurring.

As a result, my team is working on building out new alerts in Observe for Pod restarts so that we can quickly react. It’s not something unique to Observe, but the ability to quickly take simple events like that, provide context around them, and deliver those out to the teams for a quick response is incredibly important.
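To make the Pod restart example concrete, here is a minimal Python sketch of the kind of check such an alert might perform. It is not Observe's alerting mechanism or Saviynt's implementation: the `PodSample` record, its field names, and the `pods_to_alert` function are all hypothetical, and the sketch simply assumes you already have periodic samples of each Pod's cumulative restart counter.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PodSample:
    """One observation of a Pod's cumulative restart counter (hypothetical shape)."""
    namespace: str
    pod: str
    timestamp: float    # seconds since epoch
    restart_count: int  # cumulative restarts reported for the Pod


def pods_to_alert(
    samples: List[PodSample], window_seconds: float, threshold: int
) -> List[Tuple[str, str, int]]:
    """Return (namespace, pod, restarts) for Pods whose restart counter grew by
    at least `threshold` within the window ending at the newest sample."""
    if not samples:
        return []
    newest = max(s.timestamp for s in samples)
    cutoff = newest - window_seconds

    # Collect the in-window restart counts for each Pod.
    by_pod: Dict[Tuple[str, str], List[int]] = {}
    for s in samples:
        if s.timestamp >= cutoff:
            by_pod.setdefault((s.namespace, s.pod), []).append(s.restart_count)

    alerts = []
    for (namespace, pod), counts in by_pod.items():
        restarts = max(counts) - min(counts)  # growth of the counter in the window
        if restarts >= threshold:
            alerts.append((namespace, pod, restarts))
    return sorted(alerts)


samples = [
    PodSample("prod", "api", 0.0, 0),
    PodSample("prod", "api", 100.0, 1),
    PodSample("prod", "api", 200.0, 4),
    PodSample("prod", "worker", 0.0, 2),
    PodSample("prod", "worker", 200.0, 2),
]
alerts = pods_to_alert(samples, window_seconds=300, threshold=3)
```

Attaching the namespace and restart count to each result is one simple way to give the receiving team some context along with the event itself.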

What is unique about how Observe approaches the “problem” of observability?

There are a couple of pieces to me that are very interesting about Observe. One is the way that you guys approach the consumption and even the costing model, which is less focused on the amount of data that you collect and much more focused on how you query and access that data.

This promotes the idea of “pushing” more data to Observe. I am not a huge proponent of “give me ALL data”, because all data can end up masking or hiding issues in your environment, but you don’t ever want to have to question whether or not you should send that data, right?

The idea is that I get the data in there, and determine whether the data is valuable or not later. You can always decide later not to consume that data anymore – but you want that flexibility.

On top of that, the onus is on you to think about the way you are querying the data, right? Because that is your cost component now. I’m a huge proponent of things that are going to drive action. If you can understand what the code is doing, what it’s supposed to be doing, and what it’s not doing, then you can immediately start taking action.

Observe offers a lot of flexibility in both the types of data that we can consume and in the ability to coalesce it together into a consistent view. For example, Observe allows us to combine logs with metric data and performance behavior across our environment and then segment that across our large customer base. Doing that at scale is beneficial because we can continue to narrow down to the level we want to focus on, drive action, build our processes around it, and then repeat.
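As a rough illustration of combining logs with metric data and segmenting the result by customer, the Python sketch below buckets both by customer and time window. It does not use Observe's query language or API; the `correlate` function, the tuple layouts, and the customer names are assumptions made purely for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

LogRecord = Tuple[float, str, str]       # (timestamp, customer, level), hypothetical
MetricRecord = Tuple[float, str, float]  # (timestamp, customer, latency_ms), hypothetical


def correlate(
    logs: Iterable[LogRecord],
    metrics: Iterable[MetricRecord],
    bucket_seconds: int = 60,
) -> Dict[Tuple[str, int], Dict[str, Optional[float]]]:
    """Join error counts from logs with average latency from metrics,
    keyed by (customer, start of time bucket)."""
    def bucket(ts: float) -> int:
        return int(ts // bucket_seconds) * bucket_seconds

    errors = defaultdict(int)
    latencies = defaultdict(list)
    for ts, customer, level in logs:
        if level == "ERROR":
            errors[(customer, bucket(ts))] += 1
    for ts, customer, latency_ms in metrics:
        latencies[(customer, bucket(ts))].append(latency_ms)

    result = {}
    for key in sorted(set(errors) | set(latencies)):
        values = latencies.get(key, [])
        result[key] = {
            "errors": errors.get(key, 0),
            # None marks buckets that have log data but no metric samples.
            "avg_latency_ms": sum(values) / len(values) if values else None,
        }
    return result


logs = [(5, "acme", "ERROR"), (10, "acme", "INFO"), (70, "acme", "ERROR"), (8, "globex", "ERROR")]
metrics = [(3, "acme", 100.0), (30, "acme", 300.0), (65, "globex", 50.0)]
view = correlate(logs, metrics)
```

Keying everything on the same (customer, time bucket) pair is what lets the two data types line up in one view, which can then be narrowed to a single customer or window.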

On top of that, the ability to programmatically access that data and then tie it back into other processes we have is invaluable. 

Can you talk about how you and your team have adapted to our usage-based pricing model?

The idea behind usage-based pricing, where I’m being charged based on the information I’m using as opposed to the information I’m consuming, is, I believe, the right approach.

In terms of how to get the most from usage-based pricing, I think educating your team is a good start. I’m a very results-oriented person, so I typically start asking questions like, “What problem am I trying to solve?”, “How am I going to solve it?”, and “What do I need to do it right?”

Still talking about education, I believe there are two types of people when it comes to troubleshooting. On one hand, some people understand at a conceptual level what they’re looking for and have a strong hunch where the problem is. On the other hand, you have people who don’t necessarily know what they’re looking for and use what I call a “needle in the haystack” approach. These people start really wide in terms of their scope and then continually refine it until they find the problem.

I think these are the types of people who you have to educate or re-educate, in terms of troubleshooting fundamentals. It’s not that you want to prevent that type of behavior, because it can be beneficial at times, but you want to help focus their efforts more.

What are some “moonshot” observability goals that you guys hope Observe can help you solve?

One of the big ones for me would be more granular application-level traceability.

A lot of people seem to think they know exactly what’s causing the problem, but a lot of the time what is happening is what I call the “noisy neighbor” syndrome. For example, maybe the slowness you’re experiencing isn’t coming from the actual API showing signs of “slowness”, but rather from another API call that is either waiting for another function or trying to consume the same resources.

How you truly drive traceability is something that continues to be developed. I think the pattern around this, especially from an analytics perspective, is a combination of logs with metric behavior and tracing behavior. 

The more you can bring these together, the more I can start automating detection, to then drive automated resolution. Even if that means I have to call a person to help resolve an issue, my MTTR still drops dramatically. At the end of the day, that’s what I care about.


Wrapping Up

A big thanks to Wayne for taking some time out of his busy schedule to speak with me about how he and his team at Saviynt use Observe, as well as their future observability goals.

If you and your team are interested in connecting and correlating all of your observability data, reducing your MTTR, and not having to make hard decisions about what data to keep, then click here to see how Observe can help today!