
Masking and Truncating Fields in Cribl Stream

September 15, 2022

In Cribl Stream and Cribl Edge, you can operate on your observability event data in flight, all the way down to the field level. Instead of writing complex regex to wrangle JSON and other structured formats, use Cribl’s built-in functions and extensibility to get the results you want. You’ll see formerly complex situations become easier to address and manage over the long term.

In this blog, we’ll cover two troublesome use cases:

  • Masking sensitive data at the field level.
  • Truncating excessively long fields globally.

Stop Being So Sensitive: Masking in Cribl Stream

In my previous life as a Splunk admin, I dedicated several hours every week just to reporting on and attempting to remediate sensitive data in log events. Often this meant working in configuration files, restarting, pushing production data through, then validating that it worked… rinse and repeat when it didn’t. Formatting constraints made building the regex that much more difficult. I needed to account for the data around the match, as well as maintain the overall “shape” of the event.

Let’s see how this looks in Cribl.

Case 1: Masking fields with names matching *pattern* (wildcard match).

Note the {} next to _raw: this means the data has been parsed into a JSON object; it’s not just a string.

[Image: Data Masking in Stream]

Key points:

  • Because Cribl Stream understands structured data, we can operate at the field level.
  • We use Apply to Fields to specify the fields we want to target with wildcard matching (in this case, *socsec*).
  • We can validate the results directly in the product with actual, live-captured, sample data before we deploy.
  • The replacement is an expression, giving you super-powers in your replacement options: you’re no longer limited to ‘xxx’ for redacting data.
  • Built-in interactive regex editor (the pencil icon to the right of the expression) – think regex101.com-lite.
  • You don’t need to be the resident SME to grok what’s going on in this configuration.
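To make the field-level idea concrete, here is a rough JavaScript sketch of what a wildcard field-name match plus an expression-based replacement does to a parsed event. It is not Cribl’s implementation: the helper names (wildcardToRegExp, maskFields, keepLastFour), the sample field names, and the keep-the-last-four-digits replacement are all hypothetical choices for illustration.

```javascript
// Hypothetical sketch: mimic wildcard "Apply to Fields" matching plus an
// expression-style replacement on a parsed JSON event. Not Cribl's code.

// Convert a simple wildcard pattern (only * is special) into an anchored RegExp.
function wildcardToRegExp(pattern) {
  const body = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");
  return new RegExp(`^${body}$`);
}

// Check each top-level field name (nested objects skipped for brevity); if it
// matches, rewrite the string value with the replacement function.
function maskFields(event, namePattern, replacer) {
  const nameRe = wildcardToRegExp(namePattern);
  const out = { ...event }; // shallow copy so the input event is left untouched
  for (const [name, value] of Object.entries(out)) {
    if (nameRe.test(name) && typeof value === "string") {
      out[name] = replacer(value);
    }
  }
  return out;
}

// Example replacement: X out every digit that still has four digits after it,
// so only the last four survive (instead of a static "xxx").
const keepLastFour = (v) => v.replace(/\d(?=(?:\D*\d){4})/g, "X");

const event = { user: "alice", user_socsec_num: "123-45-6789", socsec_backup: "987-65-4321" };
console.log(maskFields(event, "*socsec*", keepLastFour));
```

Because the replacement is a function rather than a fixed string, swapping keepLastFour for a hash or a lookup changes the output without touching the matching logic.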

Case 2: Masking data within a blob of text.

Note the α symbol next to _raw: this means that the field’s value is a string.

In this case we’re back to operating on the event as a whole, so we do have to be careful with formatting and position. But we still get the benefit of seeing immediately what impact we’re having on the data.

[Image: Data Masking in Stream]

From the preview pane:

[Image: Data Masking in Stream]

Key points:

  • If regex is more appropriate, you still have that option.
  • Sample pane to verify your work.
  • Flexible replacement expression not limited to simple static strings.
  • Built-in interactive regex editor (the pencil icon to the right of the expression).
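For the string case, the same pattern applies to the whole text: a global regex finds each match and a replacement function decides what to emit. This JavaScript sketch masks SSN-shaped substrings in a log line while keeping the last four digits; the pattern, the sample line, and the maskInRaw name are assumptions, not the configuration from this post.

```javascript
// Hypothetical sketch: mask SSN-shaped substrings inside an unparsed _raw string.
// The \b word boundaries keep a match from starting or ending mid-number.
const SSN_IN_TEXT = /\b(\d{3})-(\d{2})-(\d{4})\b/g;

function maskInRaw(raw) {
  // The replacement is a function, so it can compute output from the match
  // (here: keep the last four digits) instead of inserting a fixed string.
  return raw.replace(SSN_IN_TEXT, (_match, _area, _group, serial) => `XXX-XX-${serial}`);
}

const raw = "2022-09-15T10:00:00Z user=alice ssn=123-45-6789 action=login";
console.log(maskInRaw(raw));
// → 2022-09-15T10:00:00Z user=alice ssn=XXX-XX-6789 action=login
```

The word boundaries are what stop the pattern from grabbing part of a longer number (note the timestamp is left alone), which is the kind of surrounding-context care a string-level regex always needs.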

The Long and Winding String: Truncating Long Fields

Cleaning up noisy logs accounted for around 25% of my time. I mean, it could have been 100%, but I had other work to do as well. Like painting the Golden Gate Bridge, it was a never-ending cycle.

One particularly egregious case involved JSON data sent from Kubernetes clusters. At some point the team decided it would be a grand idea to include the full HTTP conversation in the logs… entire PDF files and all. Average event size ballooned to 5 MB. Fields often changed both location and name, making it very difficult to write a general-purpose regex that would fix it up safely. (Not to mention that in Splunk, HEC events don’t pass through the same pipeline as normal events, so SEDCMD and friends weren’t even available.)

In Cribl, after using the Parser Function to extract all the discrete fields from the event, a simple Mask Function can truncate any field value longer than, say, 30 characters, replacing the remainder with REDACT:

[Image: Data Masking in Stream]

In this example, I applied the Mask Function to all fields in the _raw object. If I knew which field names were likely to be in play, I could specify patterns like *file*, *enclos*, or *reallydumbcontent*. Cribl marches through all fields in the event, first comparing the field name and then, if it matches, applying the regex and replacement to that field’s value.
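Under the hood, that truncation rule amounts to a regex that captures the first 30 characters of a value and a replacement that appends REDACT. The JavaScript sketch below applies that logic to every string field of a parsed event, checking the field name before touching the value. The exact regex from the screenshot isn’t spelled out in the text, so ^(.{30}).+$ with the replacement $1REDACT is an assumption, as are the hypothetical truncateFields helper and its nameMatches callback.

```javascript
// Hypothetical sketch of the truncation rule, applied to every string field of
// a parsed event (nested objects and arrays included). Not Cribl's code.
const MAX_LEN = 30;
// Capture the first MAX_LEN characters; the "s" flag lets "." match newlines.
const TOO_LONG = new RegExp(`^(.{${MAX_LEN}}).+$`, "s");

function truncateFields(value, nameMatches = () => true, name = "") {
  if (typeof value === "string") {
    // Compare the field name first, then apply the regex to the value.
    return nameMatches(name) ? value.replace(TOO_LONG, "$1REDACT") : value;
  }
  if (Array.isArray(value)) {
    // Array items inherit the name of the field that holds the array.
    return value.map((item) => truncateFields(item, nameMatches, name));
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = truncateFields(child, nameMatches, key);
    }
    return out;
  }
  return value; // numbers, booleans, and null pass through unchanged
}

const event = {
  status: 200,
  headers: { host: "example.com" },
  body: "%PDF-1.4 ".repeat(1000), // stand-in for an embedded file
};
const truncated = truncateFields(event);
console.log(truncated.body.length); // → 36 (30 kept characters + "REDACT")
console.log(truncated.headers.host); // → example.com
```

Passing a callback such as (name) => name.includes("file") as nameMatches narrows the rule the same way a *file* pattern would.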

Key points:

  • Sample pane to verify your work.
  • Reading/managing configs is easy, with field-level control.
  • Flexible replacement expressions, not limited to simple static strings.
  • Built-in interactive regex editor (the pencil icon to the right of the expression).

Runnin’ Down the Code, Tryin’ to Lighten My Load

Maybe your replacement requirements lean toward edge-case territory, the built-in Functions in Cribl aren’t catching your fancy, or you just want to build something more specialized. We goat you. With the Code Function, you can really get your geek on: you interact with the event in pure JavaScript, which gives you far more control.

In the sample Code Function below, we replicate the truncation performed by the Mask Function above, but without using regex. In simple load testing, this regex-free version ran about twice as fast!

[Image: Data Masking]
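The original code appears only as an image, so what follows is just a sketch of the same idea in plain JavaScript: walk the event and truncate long strings with a length check and slice() instead of a regex. The matcher(), replacer(), and f_value names echo the ones mentioned in the next paragraph, but their signatures here are assumptions, and so is passing the event in as a plain object; check the Code Function documentation for how your version exposes the event.

```javascript
// Hypothetical regex-free truncation in the spirit of the post's Code Function.
// The matcher()/replacer() signatures are assumptions, not the original code.
const MAX_LEN = 30;
const MARKER = "REDACT";

// Decide whether a field should be rewritten. Other qualifiers (field-name
// checks, type checks, a validation regex, ...) could be added here.
function matcher(f_name, f_value) {
  return typeof f_value === "string" && f_value.length > MAX_LEN;
}

// Build the replacement without a regex: keep the first MAX_LEN characters.
function replacer(f_name, f_value) {
  return f_value.slice(0, MAX_LEN) + MARKER;
}

// Walk the object in place, recursing into nested objects and arrays.
function walk(obj) {
  for (const [f_name, f_value] of Object.entries(obj)) {
    if (f_value !== null && typeof f_value === "object") {
      walk(f_value);
    } else if (matcher(f_name, f_value)) {
      obj[f_name] = replacer(f_name, f_value);
    }
  }
  return obj;
}

const event = { msg: "ok", payload: { pdf: "%PDF-1.4 ".repeat(1000) } };
walk(event);
console.log(event.payload.pdf.length); // → 36
console.log(event.msg); // → ok
```

Each value costs a type check, a length check, and at most one slice(), which is one plausible reason a regex-free approach can be cheaper; the speedup quoted above comes from the author’s own testing, not from this sketch.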

It takes extra work, but it’s also much more flexible. You could include other field qualifiers or functions as needed in matcher() and replacer(). For example, instead of checking f_value.length, check against this SSN regex:

/^(?!000)(?!666)([0-6]\d{2}|7(?:[0-6]\d|7[012]))-?(?!00)(\d\d)-?(?!0000)(\d{4})$/.test(f_value)

This could be the basis of a universal social security number masker. Can you make it work? What other use case can you see? Let me know in Cribl Community Slack!
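As one possible starting point, the sketch below keeps the same walk-the-fields shape but swaps the length check for the SSN regex above, masking everything except the last four digits. isSsn, maskSsn, and maskSsnFields are hypothetical names, and the XXX-XX-1234 output format is an assumption.

```javascript
// Hypothetical "universal SSN masker": any string field whose whole value
// looks like an SSN is masked, regardless of the field's name.
const SSN_RE = /^(?!000)(?!666)([0-6]\d{2}|7(?:[0-6]\d|7[012]))-?(?!00)(\d\d)-?(?!0000)(\d{4})$/;

function isSsn(f_value) {
  return typeof f_value === "string" && SSN_RE.test(f_value);
}

// Keep the last four digits; capture group 3 of SSN_RE holds them.
function maskSsn(f_value) {
  return f_value.replace(SSN_RE, (_m, _area, _group, serial) => `XXX-XX-${serial}`);
}

// Walk the object in place, recursing into nested objects and arrays.
function maskSsnFields(obj) {
  for (const [f_name, f_value] of Object.entries(obj)) {
    if (f_value !== null && typeof f_value === "object") {
      maskSsnFields(f_value);
    } else if (isSsn(f_value)) {
      obj[f_name] = maskSsn(f_value);
    }
  }
  return obj;
}

const event = { user: "alice", tax_id: "123-45-6789", notes: { id: "123456789" }, bad: "666-12-3456" };
maskSsnFields(event);
console.log(event.tax_id, event.notes.id, event.bad);
// → XXX-XX-6789 XXX-XX-6789 666-12-3456
```

Because the regex is anchored with ^ and $, it only matches fields whose entire value is an SSN; it won’t find one embedded in free text, and any in-range nine-digit value (an order ID, say) will also match, so pairing it with a field-name check is a sensible guard.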

Summary

Masking data is a major use case for logging teams, and a lot of time goes into crafting and modifying complex regex. Often you’ll want more than just ‘xxx’ over chunks of data. Cribl’s Mask Function gives you the power to handle these situations easily. If you need still more options, roll your own with the Code Function and some simple JavaScript.

Cribl’s guiding principle is to get all of your observability data onboarded, optimized, and transformed into whatever format you need. With its extensive toolset, Cribl enables you to do more with less effort than ever before.

Free. Your. Data.

The fastest way to get started with Cribl Stream and Cribl Edge is to try the Free Cloud Sandboxes.
