Security Observability and the Mystery of Detection as Code

By Jack Coates, February 14, 2024

Security Observability borrows concepts from operational observability, enabling teams to understand risks and incidents in a more holistic way than the traditional “rapidly growing pile of notable events”. Security professionals use close observation of system behavior to detect, understand, and stop new and unknown attacks. So what’s that got to do with Detection as Code? Stay tuned and we’ll unpack what detection as code means and how it intersects with Security Observability.

Detection, The Old Way

Once upon a time, security operations analysts would put their security detection and block rules directly into the gear. The software ancestors of SIEM and EDR were management servers and firewall appliances with WIN32 consoles, enriched with a little Java or Shockwave. This was frustratingly slow and error-prone, and we dreamed of code and APIs. And so, as systems and teams grew more mature, the rules would be checked into source code control systems like Perforce or Subversion, and pushed into production via fanout SSH or Perl scripts wrapped around SQL interfaces. Still, every organization was on its own; this sort of automation was non-standard.

In a sense, it’s like the difference between refining fuel and building engines. Detecting problems is an engine (built once, rarely updated), but describing how to detect the problems is fuel that you put into the engine (continual investment, refining data to information). Anton Chuvakin brought that fuel content under one phrase three years ago in Detection as Code. That blog post asks organizations to define the entire security operations rule lifecycle as a software development problem.

Treating security content as software deliverables is actually a very old rodeo, but it’s always been hidden inside of vendors, working to meet each of Anton’s bullet points so they don’t ship painful support ticket generators. We were doing test-harnessed detection-as-code vulnerability definition in LANDesk from the early 2000s and really formalized our efforts in 2005 when we opened our Beijing development center. LANDesk and BigFix’s licenses were even structured this way: You bought the engine, then you subscribed to the fuel. As Anton calls out, it’s not just about your own quality; if you do detection as code well you can enable partners to sell content streams. Partners ranging from small boutiques and startups to the Global Systems Integrators have always been excited about selling WORA (Write Once, Run Anywhere) rule packs for vendors’ engines.

What’s Different Now

So, what’s different now? The obvious answer is Infrastructure as Code, exemplified by HashiCorp’s Terraform or Amazon’s CloudFormation. Those systems allow DevOps teams to describe their configuration and infrastructure requirements in a language that is portable across employers. Not only is that nice for individual DevOps engineers, it also helps IT teams and their vendors to minimize effort on “under the hood” needs and put more focus on their respective “day jobs”. So, wouldn’t it be neat if you could do the same thing with all the rules that go into firing alerts for a Security Operations Center to review? On that note, let’s look at each section of the Detection as Code idea:


Versioning

Versioning your package of code lets you identify it, update it, guess at the size of changes, and maybe even downgrade. This is not a new thing for rules, but it gets more interesting when you think about Indicators of Compromise or Vulnerability Detections as code. Can your SIEM or VMS look at the time of detection in context with other events? If the device was attacked yesterday but shows that the vuln was mitigated yesterday, was the mitigation already in place at the time of the attack, or did the attack succeed? You have to be able to understand the play of resource states through time, a thing that traditionally only humans are really good at. That is precisely where Observe’s temporal resource definitions come into play. Observe can answer this question, at scale, by considering resource states in time as the background to event observations.
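To make the "attacked yesterday, mitigated yesterday" question concrete, here is a minimal Python sketch of a point-in-time state lookup. It is not how Observe implements temporal resources; the StateChange record, the state_at helper, and the sample timestamps are all hypothetical, and the only assumption is that each resource keeps a time-ordered history of state changes.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class StateChange:
    """One recorded change to a device's mitigation state (hypothetical shape)."""
    at: datetime
    mitigated: bool


def state_at(history: List[StateChange], when: datetime) -> Optional[bool]:
    """Return the mitigation state in effect at `when`, or None if unknown.

    `history` must be sorted by time, oldest first.
    """
    times = [change.at for change in history]
    idx = bisect_right(times, when)  # count of changes at or before `when`
    if idx == 0:
        return None  # nothing was recorded before this moment
    return history[idx - 1].mitigated


# Hypothetical history: vuln seen on the 12th, patched on the 13th at 15:30.
history = [
    StateChange(datetime(2024, 2, 12, 9, 0), mitigated=False),
    StateChange(datetime(2024, 2, 13, 15, 30), mitigated=True),
]

attack_before_patch = datetime(2024, 2, 13, 11, 0)
attack_after_patch = datetime(2024, 2, 13, 18, 0)

print(state_at(history, attack_before_patch))  # state when the first attack landed
print(state_at(history, attack_after_patch))   # state when the second attack landed
```

Both attacks and the patch happened "yesterday", but only the lookup against the state history tells you which attack hit an unmitigated device.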

Quality Assurance

Testing that rules still match the data they target is hardly new. I’ve worked on teams where practically every regular expression had an event generator and unit test associated. Testing for newly introduced false positives or false negatives, though: what exactly is that supposed to mean? Is there truly any such thing outside of real life production systems? It is theoretically possible to direct a sample of real world data into an offline processor; it simply costs money. It is theoretically possible to feed the results of that processor through cognitive computing infrastructure to perform anomaly detection; it just takes more money. And it’s certainly possible to have knowledgeable humans review that output and determine if it’s valid; again, it only takes some more money. QA is a thing that costs money, in a way that scales linearly without shortcuts. This isn’t directly relevant to Detection as Code.
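For readers who haven’t seen the "every regular expression has an event generator and a unit test" pattern, a minimal Python sketch follows. The rule, the generate_event helper, and the sample log lines are hypothetical illustrations modeled on OpenSSH-style messages, not content from any particular product.

```python
import re

# Hypothetical rule: failed SSH password logins in an OpenSSH-style log line.
FAILED_SSH_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}) port \d+"
)


def generate_event(user: str, src_ip: str, invalid: bool = False) -> str:
    """Event generator: build a synthetic log line the rule is meant to match."""
    prefix = "invalid user " if invalid else ""
    return (
        "Feb 14 10:15:02 host1 sshd[4242]: Failed password for "
        f"{prefix}{user} from {src_ip} port 50222 ssh2"
    )


def test_rule_matches_generated_events() -> None:
    """The rule must match both variants and extract the right fields."""
    for invalid in (False, True):
        match = FAILED_SSH_LOGIN.search(generate_event("alice", "203.0.113.7", invalid))
        assert match is not None
        assert match.group("user") == "alice"
        assert match.group("src_ip") == "203.0.113.7"


def test_rule_ignores_successful_login() -> None:
    """A successful login must not trigger the rule."""
    line = (
        "Feb 14 10:15:09 host1 sshd[4242]: Accepted password for "
        "alice from 203.0.113.7 port 50222 ssh2"
    )
    assert FAILED_SSH_LOGIN.search(line) is None


if __name__ == "__main__":
    test_rule_matches_generated_events()
    test_rule_ignores_successful_login()
    print("rule tests passed")
```

Tests like these prove the rule still matches the data it targets. They say nothing about false positives against real production traffic, which is the expensive part.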


Modularity

A modular approach to detection content is how you spot the vendors who’ve run into a need to optimize. By establishing predictable interfaces and exit codes, modularity lets you structure your rules into execution trees.

  • Only perform this test if these other ones returned false. 
  • Perform this test after this dataset acceleration completes and not before. 
  • If this test returns true, stop further testing, take off, and nuke the site from orbit. 

Modularity is the virtue that people are talking about when they say shift-left, when they say test-driven architecture, when they say Detection As Code. The system should use each atomic rule as part of a greater whole so that everything is more efficient and a clearer vision of the true state is produced at the end of the massive tree of rules. This could be an area where security teams can learn from compliance: findings have impact on the state of compliance, they’re organized into classes of problem (or controls being violated), and a given control has impact on N number of standards. Similarly, a security rule finding might impact risk scoring vectors, might inform a security framework like ATT&CK or Kill-Chain or STRIDE, might open or close avenues for more rule testing.
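The execution tree idea can be sketched in a few lines of Python. Everything here is hypothetical (the Rule shape, the only_if_false and halt_on_match fields, the sample rules and hash value); it covers the "only if these returned false" and "stop further testing" cases from the list above and leaves out scheduling around dataset acceleration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

BAD_HASHES = {"deadbeef"}        # hypothetical threat intel
ALLOWLIST = {"/usr/bin/ls"}      # hypothetical known-good paths


@dataclass
class Rule:
    """An atomic rule with a predictable interface: event in, bool out."""
    name: str
    check: Callable[[dict], bool]
    only_if_false: List[str] = field(default_factory=list)  # prerequisite rules
    halt_on_match: bool = False                              # stop the tree on a hit


def evaluate(rules: List[Rule], event: dict) -> Dict[str, bool]:
    """Run rules in order, honoring prerequisites and halting conditions."""
    results: Dict[str, bool] = {}
    for rule in rules:
        # Skip if any prerequisite matched, or was never evaluated at all.
        if any(results.get(dep, True) for dep in rule.only_if_false):
            continue
        results[rule.name] = rule.check(event)
        if results[rule.name] and rule.halt_on_match:
            break  # nothing further is worth testing
    return results


rules = [
    Rule("known_bad_hash", lambda e: e.get("sha256") in BAD_HASHES, halt_on_match=True),
    Rule("allowlisted_binary", lambda e: e.get("path") in ALLOWLIST),
    Rule("unsigned_binary", lambda e: not e.get("signed", False),
         only_if_false=["allowlisted_binary"]),
]

print(evaluate(rules, {"sha256": "deadbeef", "path": "/tmp/x", "signed": False}))
print(evaluate(rules, {"sha256": "0a1b", "path": "/usr/bin/ls", "signed": False}))
print(evaluate(rules, {"sha256": "0a1b", "path": "/tmp/x", "signed": False}))
```

The returned dictionary is the "clearer vision of the true state": which rules ran, which matched, and by omission which were skipped, ready to feed a risk score or a framework mapping.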

Cross vendor content

Now this is where things get a little wild. In the world of Microsoft’s GitHub, it’s not hard to share a repository between multiple organizations. What’s still very hard is sharing responsibility between vendor and customer, much less two vendors and a customer. As a customer security manager, a person might really like to pin versions so they can prevent any change during a sensitive time. As a vendor to that customer, allowing the customer to version-control and pin definitions is very unsavory, because they will not be using the latest detection knowledge. The vendor knows they’ll get flaming hot support calls when the customer can’t detect the bad guys (or worse, experiences a performance-impacting bug). That’s assuming we can get past some basic stuff like “what format is the log in” or “are your container base images more Debian-flavored or Red Hat-flavored”. Does this detection still work if I change my Windows Event collector from Fluent Bit to Telegraf or the OpenTelemetry Collector? What if Windows Events were already aggregated in another system and then forwarded to us? Cross-vendor content is an attractive dream, but still just a dream. There is one place where that is potentially not true, and that is the fuzzy border between “engines” and default “fuel”. Open source, cross-vendor technology stacks like OpenTelemetry do work, and there’s a level of content that comes with them. Say that the OTel collector ships with Elastic Common Schema support, and now it’s opinionated about the shape and labeling of log data… that’s a slow-changing, high-leverage place where vendors can sensibly cooperate without causing customer pain.

Cross tool content

There are a bunch of open source projects and even a couple of SIEM vendors suggesting the value add is that you can write your detections in cross-tool languages. This is the same old stuff, only a little worse: writing detections in a non-native WORA language like Python (or Java before it) means that you have to bring the engine along and match it to the environment it runs in, whether that’s your CI/CD pipeline or your production K8s or your distributed devices. These vendors pitch reusability of detection-as-code, and yet none of them reuse anything, and almost none of them support converting definitions into their language. RootA, Sigma, Panther, or YARA-L: why aren’t they leaning into the massive existing corpus of material that’s written in Java instead of starting from scratch in a new language? Sure, maybe that old content doesn’t yet support new tech… but the old tech didn’t go away, and still needs to be supported and audited and scanned, so that means running the “cross-tool” stuff alongside existing infrastructure, not with it. Even if the idea is to only support new things and wait for the old to go away, being the fifteenth standard seems more likely. A cynical person might suggest that “cross-tool” is a very tough design goal to hit in real life and we’ll be looking at field engineers with laptops full of conversion scripts for a long time.

Metrics tracking

Metrics from detection as code are an attractive idea for both vendor and customer… but looking at the ones from Splunk’s blog:

  • Detection coverage
  • False positive/negative rates
  • The effectiveness of detection with specific rules, signatures and definitions

Only one of those is even automation-friendly; the rest require human intervention (and therefore are constructed from incomplete data). Metrics test code can determine how many matches each rule has and whether a rule’s unit test is passing, but it takes a human to state if a detection is inaccurate, if the incident has been detected, and if it’s been adequately resolved. Metrics that can’t be built as shared content that is generally valid across many customers aren’t a product. Instead, they are a custom content endeavor, tailored to a single customer’s reality, which takes us right back to the old days. In other words, the metrics value to a customer of detection as code is not really a thing. There is certainly some value that can be extracted by a vendor, though: as an MSSP, it’s possible to see if a given detection is firing or not across all your customers. Thing is, that’s always been true for an MSSP.
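The gap between the automatable metric and the human-dependent one shows up as soon as you try to compute them. In this minimal Python sketch, the alert records, rule names, and verdict labels are all hypothetical; the assumption is only that each alert carries an optional analyst verdict.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical alert records; "verdict" is None until a human triages the alert.
alerts = [
    {"rule": "failed_ssh_login", "verdict": "false_positive"},
    {"rule": "failed_ssh_login", "verdict": None},
    {"rule": "failed_ssh_login", "verdict": "true_positive"},
    {"rule": "unsigned_binary", "verdict": None},
]


def match_counts(alerts: List[dict]) -> Counter:
    """Fully automatable: how many times did each rule fire?"""
    return Counter(alert["rule"] for alert in alerts)


def false_positive_rate(alerts: List[dict], rule: str) -> Optional[float]:
    """Needs human input: computed only over alerts an analyst has triaged."""
    verdicts = [
        alert["verdict"]
        for alert in alerts
        if alert["rule"] == rule and alert["verdict"] is not None
    ]
    if not verdicts:
        return None  # nobody has looked yet, so there is no rate to report
    return verdicts.count("false_positive") / len(verdicts)


print(match_counts(alerts))
print(false_positive_rate(alerts, "failed_ssh_login"))
print(false_positive_rate(alerts, "unsigned_binary"))
```

The match count is complete the moment the rule runs. The false positive rate only covers whatever fraction of alerts someone had time to review, and it has no way at all to count the false negatives that never fired.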

CI/CD pipeline

Continuous Integration and Deployment (CI/CD) automation can help improve the speed of QA, but pure Detection as Code does not change many fundamentals. The thing that would make a difference is to allow CI/CD to automate security and compliance systems which are currently operated on more of an “infrequent maintenance windows with strict change controls” basis. That’s not a technology problem, it’s a people problem.

What About AI?

Cognitive computing approaches have a long history in security and remain relevant in Security Observability. Anomaly detection, cohort comparison, linear projections, and more technologies are building blocks for risk modeling. The Bayesian feedback recursion loop is extremely useful for empowering a SOC to reprioritize rules. The new kids on the block are LLM based chat bots, and they show a lot of interesting promise for DaC. If a bot can be trained on many security rule language features, it could be a lot easier to translate between languages, or even from natural language requests to an intermediate language. This is an area where we are all dreaming big, because there’s a lot of interest in doing tasks like SPL conversion.

It’s Dangerous to Go Alone, Take This!

There’s a lot of smoke around redefining the same old stuff, and the fire is another attempt to make fuel production a more open, cross-vendor activity, but that’s really hard to do sustainably because the economics don’t work out. It could be the right path for an ISAC (Information Sharing and Analysis Center) and it’s great for use between departments within an organization, but it seems pretty unlikely to work across vendor products. That means the economics of classic Enterprise SIEM aren’t likely to be changed by Detection as Code either, because those products are totally driven by the vendors that sell them (or sell managed services wrapped around them). Security Observability could be a different story though: if we presume that security is handled as a shared task by the entire IT team, a shared language for describing actions becomes more attractive. Stating the search in Sigma or YARA-L instead of a vendor’s DSL doesn’t do anything to help with its performance or correctness, but it does remove a linguistic barrier for teams that don’t speak the same language. We’d love to hear from organizations that are exploring or using Detection as Code principles. What do you think? What tools and processes are genuinely helpful? Come take a look at the Observability Cloud and let’s talk.
