In a past life, at our regularly scheduled IT team meeting, one of my engineers mentioned he had detected some cleartext Social Security Numbers (SSNs) in one of our logs. You’d have thought I brought a bowl of Gaegogi soup to a PETA conference. All I wanted to do was obscure some Social Security numbers in my logs!
The security guys started screaming about HIPAA, GDPR, PCI, and the Privacy Act of 1974. 1974! We didn’t even have computers then. I know; I was there. Then our security consultant started talking about the need to be compliant with the NIST publication. Blah, blah, blah. Finally, a member of the NOC team mentioned they needed most of the data but couldn’t care less about the SSNs. I was just about to say delete it all, but before I got the words out, the legal team rep – you know he just came for the free lunch – stated we needed to retain the complete data set for compliance purposes. Urgh, we were quickly going off the rails, but thank God, at that very moment, the food arrived, and everyone became more focused on getting the right dipping sauce.
This gave me time to do a quick Google search, and I easily found lots of information on personally identifiable information (PII) and what was required to protect it. In another stroke of luck, after lunch we were getting a briefing on the value of 5G, Wi-Fi 6, or Wi-Fi 7. So, while they were learning about the latest wireless technology of tomorrow, I could dig into the issue and figure out how to address everyone’s needs and concerns about protecting PII.
I figured a search on the topic of log security would be a good starting point. Bad idea! It resulted in ~3,060,000,000 results (0.64 seconds). A quick scan down the page, and I saw the word observability several times by different vendors. There were so many results to choose from, and I started to think there should be an “unobservability.” That might provide some actual direction. So it wasn’t a clean fit, but it did finally get me to pages that appeared to try and help me, but only if I had an advanced EE degree from MIT. The pages started with headings like data masking, obfuscation, and anonymization.
So far, so good, but midway through the second paragraph, I thought I was reading a foreign language. They called it Regex (e.g., `{\.(?\w+-\w+\d+)-`), but I knew it was really Klingon (`tlhIngan Hol Dajatlhʼaʼ?`). I didn’t understand it and knew no one on my team was fluent in it either. But I kept reading anyway. Soon I started to understand a little and eventually came across something called Match Regex and Replace Expression as it applied to masking data. This made sense, but I still needed to implement this without calling HR about a new headcount to configure all of this.
Enter Stream
Luckily (again), I came across a Cribl blog showing how to learn SSN masking in less than 10 minutes and for FREE. It sounded too good to be true, but I grabbed a new cup of coffee and settled in to watch anyway.
Wow, the video not only actually made sense, but it appeared to be something we could do internally, with the existing team, and at no additional cost. There was even a link to a deeper dive training for those implementing it. With Stream, you can mask data using various techniques by applying the Masking Function on any event that matches any arbitrary condition.
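For the curious, here is roughly what the "match regex, replace expression" idea boils down to, written as a small Python sketch. To be clear, this is not Stream's Masking Function or its configuration; it is only an illustration of the concept. The SSN pattern (three digits, two digits, four digits, separated by hyphens), the `_raw` and `sourcetype` field names, and the helper names `mask_ssn` and `mask_event` are all assumptions made up for this example.

```python
import re

# Assumed SSN format: NNN-NN-NNNN. Real logs may use other layouts
# (no hyphens, spaces, etc.), which this pattern will not catch.
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def mask_ssn(text: str) -> str:
    """Mask the first five digits of each SSN-like token, keeping the last four."""
    # The "replace expression": \3 refers to the third capture group (last 4 digits).
    return SSN_PATTERN.sub(r"XXX-XX-\3", text)


def mask_event(event: dict, condition=lambda e: True) -> dict:
    """Return a masked copy of the event if it matches the condition.

    `event` is a hypothetical log record with its text in the `_raw` field.
    Events that do not match the condition are returned unchanged.
    """
    if not condition(event):
        return event
    masked = dict(event)  # shallow copy so the original event is untouched
    masked["_raw"] = mask_ssn(event.get("_raw", ""))
    return masked


if __name__ == "__main__":
    event = {"sourcetype": "hr", "_raw": "user=bob ssn=123-45-6789 status=ok"}
    # Only mask events coming from the (hypothetical) "hr" source type.
    print(mask_event(event, lambda e: e.get("sourcetype") == "hr")["_raw"])
```

Keeping the last four digits is a design choice for this sketch: the NOC folks still get a value they can eyeball, and the full number never lands in the downstream log. If legal needs the complete data set retained, the unmasked copy would have to go somewhere separate and locked down, which this sketch does not attempt.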
Well, I think I have the SSN crisis of 2021 resolved. Next, I need to find a way to reduce the volume of logs I am storing. Maybe Stream can help here too? That’s a story for another day, though.
The fastest way to get started with Cribl Stream is to sign up at Cribl.Cloud. You can process up to 1 TB of throughput per day at no cost. Sign up and start using Stream within a few minutes.