
The Seinfeld Data Trilogy

October 21, 2020

Seinfeld taught us a lot of valuable and hilarious lessons about life, but little did we know that it was actually talking about data pipelines.

The Data Yada Yada

“The Yada Yada” was one of the most memorable Seinfeld episodes I have ever watched. If you’re not familiar, the gist of the story is that George’s new girlfriend likes to say “yada yada yada” to shorten her stories. Unfortunately for George, she sometimes “shortens” some of the most important parts of the story.

Having been in pre-sales, I’m guilty of “yada yada-ing” past some of the important parts of a story to get to what I wanted to focus on. One example is “Here are all these thousands of data sources from cloud to legacy systems…yada yada yada…check out these awesome insights and dashboards from all that data!”

To quote Jerry Seinfeld: “but you just yada yada yada’ed over the best part!” The funny part is that almost everyone wants to get to the cool dashboards and all that awesome insight; however, in the real world, we can’t just “yada yada yada” over getting the data into our analytics platform. While there is no glamour in getting data into your analytics system of choice, it is quite satisfying when you can do it at scale on a purpose-built solution for logs, metrics, and traces like Cribl LogStream. We can take any data, from any source, and route, transform, encrypt, or compress it and send it to any destination. If you want, you can try it out yourself in one of our self-paced sandbox environments.

Once you get the hang of it, you can download and use our software in your environment and see how easily you can get it to work. Then, next time you’re showing off your data, you don’t have to yada yada yada over the best part.

No Data For You!

Have you ever Slacked one of your co-workers in your SOC and asked, “Hey, can you share the application logs we’re sending to the SIEM?” I’m pretty sure the immediate response is something like “No Data For You!” Asking your security team for any logs, metrics, or traces that might contain PII or confidential information is no easy task. But trying to work an outage or a failing application is just as hard when you only have access to half of the story.

 

That’s where Cribl LogStream can help keep your application developers informed, while your SOC can rest assured that the data they are sharing has been cleaned and is clear of any harmful information. 

First, Cribl LogStream can route all the full fidelity events into an inexpensive object storage solution like Amazon S3, or into compatible on-premises solutions (e.g., MinIO). 

Next, once the data is safe at rest, the security team can replay that data into the Application team’s analytics solution. The log data can be transformed into metrics, or the data within the logs can be masked or redacted so that no PII or confidential data makes its way into another system of record.  
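To make the masking step concrete, here is a small Python sketch of the kind of regex-based redaction a pipeline might apply before the data reaches another system of record. The log line and field names are invented for illustration; in LogStream you would configure this in a pipeline rather than write Python.

```python
import re

# Hypothetical raw log line containing PII (illustrative only)
event = "user=jdoe ssn=123-45-6789 card=4111-1111-1111-1111 status=200"

# Redact Social Security numbers entirely
event = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "REDACTED", event)

# Mask card numbers, keeping only the last four digits
event = re.sub(r"\b(?:\d{4}-){3}(\d{4})\b", r"****-****-****-\1", event)

print(event)
# user=jdoe ssn=REDACTED card=****-****-****-1111 status=200
```

The same idea extends to hashing values instead of masking them, which keeps fields joinable across events without exposing the original data.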

Finally, if you want to request access to a real time feed of the data, Cribl LogStream can route the data into the Application team’s system of record with only the fields and metrics necessary for them to do their job. So yes, you can now be at peace with the Security team and enjoy your soup with bread too.

Data Shrinkage Where It Counts

Shrinkage: in some situations it’s not a good thing, but in the data world it comes in different forms. Hardware shrinkage can occur when you route your data to inexpensive block storage, while the system of record, where analytics are run, receives the valuable high fidelity events. This reduces the need for expensive storage arrays to retain analytics data for long periods of time, while increasing the efficiency of the hardware that’s processing and indexing the data.

These efficiencies are not limited to on-premises solutions, but can also be leveraged in the cloud. Instead of sending all your data directly into your analytics platform, send the data in a format it understands and can process very quickly (e.g., Common Information Model format for Splunk). The added benefit is that you can route data to inexpensive storage as a buffer prior to sending it out to the cloud, so if you have any failures, you don’t lose any of your data. Additionally, data can be masked, encrypted, or reduced prior to being sent into the cloud service. If you want to restore that data into your cloud offering, simply replay it with Cribl LogStream, and evict it once you’re done with your analysis. So go ahead and take a dip in the Data Lake; unlike George, I think you’ll enjoy the data shrinkage experience.
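As a rough sketch of what “reducing” an event means in practice, here is a Python illustration of trimming a verbose event down to only the fields the analytics platform actually queries. The event shape and the keep-list are invented for this example; LogStream does this with pipeline functions, not Python.

```python
import json

# Hypothetical verbose web-access event (field names are illustrative)
raw = {
    "timestamp": "2020-10-21T12:00:00Z",
    "clientip": "10.0.0.5",
    "status": 200,
    "bytes": 5120,
    "useragent": "Mozilla/5.0 (X11; Linux x86_64)",
    "internal_trace_blob": "x" * 2000,  # verbose field the dashboards never use
}

# Keep only the fields the downstream dashboards actually need
KEEP = {"timestamp", "clientip", "status", "bytes"}
reduced = {k: v for k, v in raw.items() if k in KEEP}

before, after = len(json.dumps(raw)), len(json.dumps(reduced))
print(f"shrunk event from {before} to {after} bytes")
```

Because the full-fidelity copy is already sitting in cheap storage, dropping fields at the edge costs you nothing: you can always replay the originals later if an investigation needs them.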

These are just a few of our use cases for these data lessons. If you want to be a true master of your domain, be sure to take a tour of LogStream in our interactive sandbox, then download LogStream and process up to 5 TB of data a day – totally free.
