In Cribl Stream and Cribl Edge, you can operate on your observability event data in flight, all the way down to the field level. Instead of writing complex regex to wrangle JSON and other structured formats, use Cribl’s built-in functions and extensibility to get the results you want. You’ll see formerly complex situations become easier to address and manage over the long term.
In this blog, we’ll cover two troublesome use cases: masking sensitive data and trimming noisy, oversized events.
In my previous life as a Splunk admin, several hours every week were dedicated to just reporting on and attempting to remediate sensitive data in log events. Often this meant working in configuration files, restarting, pushing production data through, then validating that it worked… rinse and repeat when it didn’t. Formatting constraints made building the regex that much more difficult. I needed to account for the data around the match, as well as maintain the overall “shape” of the event.
Let’s see how this looks in Cribl.
Case 1: Masking fields with names matching *pattern* (wildcard match).
Note the {} next to _raw: this means the data has been parsed into a JSON object; it’s not just a string.
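To make the wildcard idea concrete outside the Cribl UI, here is a minimal sketch in plain JavaScript. It is not how the Mask Function is implemented; `wildcardToRegExp` and `maskFieldsByName` are hypothetical helper names, and the sample event is invented for illustration.

```javascript
// Hypothetical helper: turn a wildcard pattern such as "*card*" into a RegExp.
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters, then let "*" stand for "any run of characters".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$', 'i');
}

// Mask every field whose name matches one of the wildcard patterns.
// Nested objects are walked recursively; arrays are treated as plain values.
function maskFieldsByName(obj, patterns, replacement = 'REDACTED') {
  const matchers = patterns.map(wildcardToRegExp);
  for (const [name, value] of Object.entries(obj)) {
    if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
      maskFieldsByName(value, patterns, replacement);
    } else if (matchers.some((re) => re.test(name))) {
      obj[name] = replacement;
    }
  }
  return obj;
}

// Made-up sample: _raw is already a parsed object, not a string.
const event = {
  _raw: {
    user: 'alice',
    cardNumber: '4111111111111111',
    billing: { card_expiry: '12/29', city: 'Oslo' },
  },
};

maskFieldsByName(event._raw, ['*card*']);
console.log(JSON.stringify(event._raw));
```

Because the match is on the field name, the regex never has to care where the field sits in the event or what surrounds it.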
Key points:
Case 2: Masking data within a blob of text.
Note the α symbol next to _raw: this means that the field’s value is a string.
In this case we’re back to operating on the event as a whole, so we do have to be careful with formatting and position. But we still get the benefit of seeing immediately what impact we’re having on the data.
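As a rough illustration of being careful with formatting and position, the sketch below masks one value inside a string while writing the surrounding text back unchanged. It is plain JavaScript rather than a Mask Function configuration, and the sample event and the `password=` key are assumptions made up for the example.

```javascript
// Made-up sample event: _raw is a plain string, not a parsed object.
const raw = '2024-01-01T00:00:00Z user=alice password=hunter2 action=login';

// Capture the key so it can be written back unchanged; only the value is replaced.
// \S+ stops at the next whitespace, so the rest of the event keeps its shape.
const masked = raw.replace(/(password=)\S+/g, '$1XXXX');

console.log(masked);
// 2024-01-01T00:00:00Z user=alice password=XXXX action=login
```

The capture group is what preserves the event's shape: everything outside the matched value is either untouched or explicitly put back.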
From the preview pane:
Key points:
Cleaning up noisy logs accounted for around 25% of my time. I mean, it could have been 100%, but I had other work to do as well. Like painting the Golden Gate Bridge, it was a never-ending cycle.
One particularly egregious case involved JSON data sent from Kubernetes clusters. At some point the team decided it would be a grand idea to include the full HTTP conversation in the logs… entire PDF files and all. Average event size ballooned to 5 MB. Fields often changed both location and name, making it very difficult to write a general case regex that would fix it up safely. (Not to mention that in Splunk, HEC events don’t pass through the same pipeline as normal events, so SEDCMD and friends weren’t even available.)
In Cribl, after using the Parser Function to extract all the discrete fields from the event, a simple Mask Function can trim any field with more than, say, 30 characters. REDACT will replace the rest:
In this example, I applied the Mask Function to all fields in the _raw object. If I knew which field names were likely to be in play, I could specify patterns like *file*, *enclos*, or *reallydumbcontent*. Cribl will march through all fields in the event, first comparing the field name, and then, if it matches, applying the regex and replacement for that field’s value.
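The same trim-and-replace idea can be sketched in plain JavaScript. This is only an approximation of the behavior described above, not the Mask Function's actual configuration; `truncateFields`, the exact regex, and the sample object are assumptions.

```javascript
const LIMIT = 30; // threshold taken from the text; adjust as needed

// Keep the first LIMIT characters and replace everything after them.
// The "s" flag lets "." match newlines, so multi-line values are handled too.
const tooLong = new RegExp(`^(.{${LIMIT}}).+$`, 's');

// Apply the regex to every top-level string field of a parsed object.
function truncateFields(obj) {
  for (const [name, value] of Object.entries(obj)) {
    if (typeof value === 'string') {
      obj[name] = value.replace(tooLong, '$1REDACT');
    }
  }
  return obj;
}

// Made-up sample: one short field, one oversized field.
const parsed = { path: '/upload', body: 'A'.repeat(5000) };
truncateFields(parsed);

console.log(parsed.path);        // /upload
console.log(parsed.body.length); // 36
```

Values of 30 characters or fewer never match, because the pattern requires at least one character beyond the captured prefix.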
Key points:
Maybe your replacement requirements are leaning toward edge-case territory, the built-in commands in Cribl aren’t catching your fancy, or you just want to make something more specialized. We goat you. You can use the Code Function to really get your geek on. Here, you interact with the event in pure JavaScript, which gives you far more control.
In the sample Code Function below, we replicate the truncation task in the Mask Function above, but without using regex. Simple load testing revealed this regex-free version to be about twice as fast!
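For readers without the original sample at hand, here is a hedged sketch of what a regex-free truncation could look like. The names `matcher()`, `replacer()`, and `f_value` come from the text; `walk()`, `MAX_LEN`, and the stand-in `sample` object are assumptions, and how the event is exposed inside a Code Function is deliberately left out.

```javascript
const MAX_LEN = 30; // same threshold as the Mask Function example

// Decide whether a field needs attention. f_name is unused here, but it
// leaves room for name-based qualifiers.
function matcher(f_name, f_value) {
  return typeof f_value === 'string' && f_value.length > MAX_LEN;
}

// Produce the new value: keep the head, flag the rest as removed.
function replacer(f_value) {
  return f_value.slice(0, MAX_LEN) + 'REDACT';
}

// Walk a parsed object, descending into nested objects and arrays.
function walk(obj) {
  for (const f_name of Object.keys(obj)) {
    const f_value = obj[f_name];
    if (f_value !== null && typeof f_value === 'object') {
      walk(f_value);
    } else if (matcher(f_name, f_value)) {
      obj[f_name] = replacer(f_value);
    }
  }
  return obj;
}

// Made-up stand-in for a parsed event.
const sample = { request: { method: 'POST', payload: 'x'.repeat(100) } };
walk(sample);

console.log(sample.request.method);         // POST
console.log(sample.request.payload.length); // 36
```

A length check plus `slice()` avoids running a regex engine over multi-megabyte values, which is one plausible reason a regex-free version would be faster.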
It takes extra work, but it’s also much more flexible. You could include other field qualifiers or functions as needed in matcher() and replacer(). For example, instead of checking f_value.length, check against this SSN regex:
/^(?!000)(?!666)([0-6]\d{2}|7(?:[0-6]\d|7[012]))-?(?!00)(\d\d)-?(?!0000)(\d{4})$/.test(f_value)
This could be the basis of a universal social security number masker. Can you make it work? What other use case can you see? Let me know in Cribl Community Slack!
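One possible starting point, sketched in plain JavaScript: swap the length check for the SSN regex quoted above. `ssnMatcher`, `ssnReplacer`, `maskSsnFields`, and the choice to keep the last four digits are assumptions for illustration, not the author's solution.

```javascript
// The SSN regex from the text. It is anchored with ^ and $, so it only
// matches fields whose entire value is an SSN.
const SSN = /^(?!000)(?!666)([0-6]\d{2}|7(?:[0-6]\d|7[012]))-?(?!00)(\d\d)-?(?!0000)(\d{4})$/;

// Hypothetical matcher: true when the whole value looks like an SSN.
function ssnMatcher(f_value) {
  return typeof f_value === 'string' && SSN.test(f_value);
}

// Hypothetical replacer: keeping the last four digits is a design choice.
function ssnReplacer(f_value) {
  return 'XXX-XX-' + f_value.slice(-4);
}

// Apply matcher and replacer to every field, descending into nested objects.
function maskSsnFields(obj) {
  for (const f_name of Object.keys(obj)) {
    const f_value = obj[f_name];
    if (f_value !== null && typeof f_value === 'object') {
      maskSsnFields(f_value);
    } else if (ssnMatcher(f_value)) {
      obj[f_name] = ssnReplacer(f_value);
    }
  }
  return obj;
}

// Made-up sample data.
const record = {
  ssn: '123-45-6789',
  alt: '123456789',
  bogus: '000-12-3456',
  note: 'call 123-45-6789 back',
};
maskSsnFields(record);

console.log(record.ssn); // XXX-XX-6789
console.log(record.alt); // XXX-XX-6789
```

Because the regex is anchored, an SSN embedded in a longer string (like `note` above) is left alone. Catching those would need an unanchored pattern and a `replace()` over the text, which is part of what makes a truly universal masker an interesting exercise.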
Masking data is a major use case for logging teams. So much time is spent crafting and modifying complex regex. And many times you’ll want more than just ‘xxx’ over some chunks of data. Cribl’s Mask Function gives you the power to handle these situations easily. If you need still more options, roll your own with the Code Function and some simple JavaScript.
Cribl’s guiding principle is to get all of your observability data onboarded, optimized, and transformed into whatever format you need. With its extensive toolset, Cribl enables you to do more with less effort than ever before.
Free. Your. Data.