As I write this blog post, I reflect on my years as a system administrator and the task of ensuring that no sensitive data made its way past me. What a daunting task, right? The idea that sensitive data can make its way through our systems, tools, and reports is terrifying, not to mention the potential financial and contractual problems it can cause.
Identifying complex patterns in logs, metrics, and traces… filtering through long regex statements to catch every possible pattern that data can show up in… datasets that constantly change… new datasets… What a mess! Help!
So how can Cribl Stream help? We help IT and security professionals manage their data with an easy-to-use web interface, letting you preview potential changes to your data and giving you built-in functions to work through whatever transformations, reductions, and routing you need. You get full control of your data: what shape or format it needs to be in, and which destinations it needs to go to. There's no need to make changes at the data source, which can be nearly impossible or take a long time. Instead, you can change that data in-stream, on the way to its destination. Pretty cool, right?
So, let’s talk about masking with some sample data I already have streaming through Stream, and walk you through a few different ways to mask that sensitive data using Stream as an observability pipeline. Ready? Let’s go!
You know the patterns of the sensitive data in your logs and can easily identify the one or more strings to match. If they are present, let's md5-hash or replace those values.
The MASK function, using regex statements for one or more known patterns. You can hash, redact, remove, or replace values with the text of your choice.
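To make the idea concrete outside of the UI, here's a minimal Python sketch of the same approach: match a known pattern with a simple regex and md5-hash the value in place. The sample event and the SSN-shaped pattern are just assumptions for illustration; this is not Cribl configuration syntax.

```python
# Illustration only -- not Cribl syntax. A known pattern is matched with a
# simple regex and its value is replaced with an md5 hash.
import hashlib
import re

event = "user=jdoe social=123-45-6789 action=login"  # hypothetical sample event

masked = re.sub(
    r"\b\d{3}-\d{2}-\d{4}\b",                                # SSN-shaped value
    lambda m: hashlib.md5(m.group(0).encode()).hexdigest(),  # md5 hash of the match
    event,
)
print(masked)
```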
While onboarding a new dataset, the business teams have identified several fields that need to be addressed if present, and what to do with that data when it is. And we see this sensitive data in clear text in our live capture sample. Uh oh! Let's fix this immediately!
| SENSITIVE FIELD LIST | FIELD LABEL | ACTION IF FOUND |
|---|---|---|
| Social Security Number | “social” | Hash |
| Electronic Serial Number | “esn” | Redact with (12) X characters |
| Card Number | “cardNumber” | Mark as “Removed” |
In Stream, you simply add a MASK function to your pipeline to handle the masking needs. This is a handy function for replacing values with simple regex matches. A key benefit here is that you can add multiple simple regex patterns rather than relying on one large, complex pattern that may need more maintenance over time. Find the values you are looking for and replace them with the values the business team requested.
Mask Function:
Preview the proposed changes in the UI before pushing out the configurations to affect the live data stream. Pretty easy, right?
Results:
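If it helps to see the table above expressed as code, here's a rough Python stand-in for what the MASK function is doing in this pipeline: several small, independent regex rules, one per sensitive field. The key=value event format and the field patterns are assumptions for illustration; in Stream you'd configure these as masking rules in the UI rather than writing code.

```python
# Illustration only -- each rule is one small regex, mirroring the table above.
import hashlib
import re

def md5(value: str) -> str:
    return hashlib.md5(value.encode()).hexdigest()

rules = [
    # Social Security Number: hash the value.
    (re.compile(r"(social=)(\S+)"), lambda m: m.group(1) + md5(m.group(2))),
    # Electronic Serial Number: redact with (12) X characters.
    (re.compile(r"(esn=)(\S+)"), lambda m: m.group(1) + "X" * 12),
    # Card Number: mark as "Removed".
    (re.compile(r"(cardNumber=)(\S+)"), lambda m: m.group(1) + "Removed"),
]

raw = "social=123-45-6789 esn=35896704123456 cardNumber=4111111111111111 status=ok"
for pattern, replace in rules:
    raw = pattern.sub(replace, raw)
print(raw)
```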
You know the patterns of the sensitive data, but you’re not so strong at regex syntax. Maybe you need to look for the field names in the data and reference them in simple English. If they are present, let’s do an md5 hash of those values.
Let's look at the PARSER and EVAL functions and how they can automatically identify key/value pairs in your data and give you the option to keep or remove fields as desired. Then, when we're done, we'll use the SERIALIZE function to reassemble those key/value pairs back into _raw. Ready? Let's go!
| SENSITIVE FIELD LIST | FIELD LABEL | ACTION IF FOUND |
|---|---|---|
| Social Security Number | “social” | Hash |
| Electronic Serial Number | “esn” | Hash |
| Card Number | “cardNumber” | Hash |
This time, let's go about things a little differently, shall we? If regex is not your strong suit, we can always use the PARSER function to extract the key/value pairs, make any changes needed for sensitive data, and then put the new key/value pairs back together in whatever format we need.
Extract key/value pairs using the PARSER function:
Create the EVAL function for sensitive data fields and md5 hash the values (if they exist):
Serialize key/value pairs back to _raw while dropping unnecessary fields:
Drop non-essential fields from events:
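If you'd like to see the whole flow end to end as code, here's a rough Python stand-in for the PARSER → EVAL → SERIALIZE steps above. The sample event, the key=value format, and the dropped debugInfo field are assumptions for illustration; in Stream, each step is a built-in function you configure in the pipeline, not code you write.

```python
# Illustration only -- approximates the Parser -> Eval -> Serialize flow.
import hashlib

SENSITIVE = {"social", "esn", "cardNumber"}  # field labels from the table above
DROP = {"debugInfo"}                         # hypothetical non-essential field

raw = "social=123-45-6789 esn=35896704123456 cardNumber=4111111111111111 debugInfo=trace user=jdoe"

# "Parser": extract key/value pairs from _raw.
fields = dict(pair.split("=", 1) for pair in raw.split())

# "Eval": md5-hash the sensitive values only when the field exists.
for key in SENSITIVE & fields.keys():
    fields[key] = hashlib.md5(fields[key].encode()).hexdigest()

# Drop non-essential fields, then "Serialize" back into _raw as key=value pairs.
new_raw = " ".join(f"{k}={v}" for k, v in fields.items() if k not in DROP)
print(new_raw)
```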
So hopefully this shows you just a few of the ways Stream's built-in functions can help you manage your sensitive data, simplify your workflows, and save you precious time! If you want to learn more about built-in functions, visit our docs site.
The fastest way to get started with Cribl Stream is to sign up at Cribl.Cloud. You can process up to 1 TB of throughput per day at no cost. Sign up and start using Stream within a few minutes.
Experience a full version of Cribl Stream and Cribl Edge in the cloud with pre-made sources and destinations.