To Mask, or Not to Mask? That Is the Question

Written by Joseph Eustaquio

December 21, 2021

While writing this blog post, I reflect on my years as a system administrator and the task of ensuring that no sensitive data made its way past me. What a daunting task, right? The idea that sensitive data can make its way through our systems, tools, and reports is terrifying, not to mention the potential financial and contractual problems it can cause.

This Is Hard to Manage

Identifying complex patterns in logs, metrics, and traces… filtering through long regex statements to catch every possible pattern that data can show up in… datasets that constantly change… new datasets… What a mess! Help!

Enter Cribl Stream: The Observability Pipeline

So how can Cribl Stream help? We help IT and Security professionals manage their data with an easy-to-use web interface, letting you preview potential changes to your data and use built-in functions to handle whatever transformations, reductions, and routing are needed. You get full control over your data, what shape or format it needs to be in, and which destinations it needs to go to. There's no need to make changes at the data source, which can be nearly impossible or take a long time; instead, you can change that data in-stream on the way to its destination. Pretty cool, right?

So, let’s talk about masking with some sample data I already have streaming through Stream, and walk you through a few different ways to mask that sensitive data using Stream as an observability pipeline. Ready? Let’s go!

Sensitive Data Scenario 1:

You know the patterns of the sensitive data in your logs and can easily identify one or more of these strings. If they are present, let’s do an md5 hash or replacement of those values.

Recommended Solution:

Use the MASK function with regex statements for one or more known patterns. You can hash, redact, remove, replace with the text of your choice, etc.

While onboarding a new dataset, the business teams have identified several fields that need to be addressed if present, along with what to do with that data when it is. And we see this sensitive data in clear text in our live capture sample. Uh oh! Let’s fix this immediately!

SENSITIVE FIELD LIST       FIELD LABEL     ACTION IF FOUND
Social Security Number     “social”        Hash
Electronic Serial Number   “esn”           Redact with (12) X characters
Card Number                “cardNumber”    Mark as “Removed”

In Stream, simply add a MASK function to your pipeline to handle the masking needs. This is a handy function for replacing values with simple regex matches. A key benefit is that you can add multiple simple regex patterns instead of relying on one large, complex regex pattern that may need more maintenance over time. Find the values you are looking for and replace them with the values the business team requested.
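Inside Stream, the MASK function itself is configured in the UI with match regexes and replace expressions. Purely to illustrate the logic of the three rules in the table above (not Stream's actual implementation), here is a Python sketch. The sample event and the exact field formats (SSN as digits with dashes, a word-character ESN, a 13–19 digit card number) are assumptions for the example.

```python
import hashlib
import re

# Illustrative sketch of the masking rules from the table above,
# NOT Cribl Stream's implementation. Field formats are assumptions.

def md5_hex(value: str) -> str:
    """Return the MD5 hex digest of a value, as a hash-style mask."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

# One simple pattern per field, rather than one giant regex.
RULES = [
    # social=123-45-6789 -> hash the value
    (re.compile(r"(social=)(\d{3}-\d{2}-\d{4})"),
     lambda m: m.group(1) + md5_hex(m.group(2))),
    # esn=<serial> -> redact with (12) X characters
    (re.compile(r"(esn=)(\w+)"),
     lambda m: m.group(1) + "X" * 12),
    # cardNumber=<13-19 digits> -> mark as "Removed"
    (re.compile(r"(cardNumber=)(\d{13,19})"),
     lambda m: m.group(1) + "Removed"),
]

def mask(raw: str) -> str:
    """Apply each masking rule to the raw event string."""
    for pattern, action in RULES:
        raw = pattern.sub(action, raw)
    return raw

event = ("user=jdoe social=123-45-6789 "
         "esn=A1B2C3D4E5F6 cardNumber=4111111111111111")
print(mask(event))
```

Each rule stays small and independently testable, which mirrors the benefit of stacking multiple simple patterns in the MASK function instead of maintaining one complex regex.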

Mask Function:

Preview the proposed changes in the UI before pushing out the configurations to affect the live data stream. Pretty easy, right?

Results:

Sensitive Data Scenario 2:

You know the patterns of the sensitive data, but you’re not so strong at regex syntax. Maybe you need to look for the field names in the data and reference them in simple English. If they are present, let’s do an md5 hash of those values.

Recommended Solution:

Let’s look at the PARSER and EVAL functions and how they can identify key/value pairs automatically in your data and give you the option to remove or keep fields as desired. Then, when we’re done, we’ll use the SERIALIZE function to reassemble those key/value pairs back into _raw. Ready? Let’s go!

SENSITIVE FIELD LIST       FIELD LABEL     ACTION IF FOUND
Social Security Number     “social”        Hash
Electronic Serial Number   “esn”           Hash
Card Number                “cardNumber”    Hash

This time let’s go about things a little differently, shall we? If regex is not your strong suit, we can always use the PARSER function to extract the key value pairs, make any changes needed for sensitive data, and then put the new key value pairs back together in whatever format we need.

Extract key value pairs using the PARSER function:

Create the EVAL function for sensitive data fields and md5 hash the values (if they exist):

Serialize key/value pairs back to _raw while dropping unnecessary fields:

Drop non-essential fields from events:
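The PARSER → EVAL → SERIALIZE flow above can be illustrated outside of Stream with a short Python sketch (again, not Stream's implementation). The key=value event format, the sample event, and the `debugInfo` "non-essential" field are assumptions for the example.

```python
import hashlib

# Illustrative sketch of the Parser -> Eval -> Serialize flow,
# NOT Cribl Stream's implementation. Sample event and the
# "debugInfo" non-essential field are hypothetical.

SENSITIVE_FIELDS = {"social", "esn", "cardNumber"}
DROP_FIELDS = {"debugInfo"}

def parse_kv(raw: str) -> dict:
    """Extract key=value pairs from the event, like the PARSER function."""
    return dict(pair.split("=", 1) for pair in raw.split())

def hash_sensitive(fields: dict) -> dict:
    """md5-hash sensitive values if present, like the EVAL function."""
    return {
        k: hashlib.md5(v.encode("utf-8")).hexdigest()
           if k in SENSITIVE_FIELDS else v
        for k, v in fields.items()
    }

def serialize_kv(fields: dict) -> str:
    """Reassemble key=value pairs into _raw, dropping non-essential fields."""
    return " ".join(f"{k}={v}" for k, v in fields.items()
                    if k not in DROP_FIELDS)

raw = "user=jdoe social=123-45-6789 esn=A1B2C3D4E5F6 debugInfo=verbose"
print(serialize_kv(hash_sensitive(parse_kv(raw))))
```

Note that no regex is involved: the fields are referenced by name in plain English, which is exactly the appeal of this approach when regex isn't your strong suit.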

Conclusion

So hopefully, this shows you just a few ways to use Stream to help manage your sensitive data needs, use built-in functions, simplify your workflows, and save you precious time! If you want to learn more about built-in functions, visit our docs site.

The fastest way to get started with Cribl Stream is to sign up at Cribl.Cloud. You can process up to 1 TB of throughput per day at no cost. Sign up and start using Stream within a few minutes.


Questions about our technology? We’d love to chat with you.