October 23, 2022
We’ve reached the point where our ability to collect data has actually exceeded our ability to process it. Nowadays, it’s commonplace for organizations to have terabytes or even petabytes worth of data sitting in storage, waiting patiently for well-intentioned systems admins to eventually analyze it. Cribl’s suite of solutions gives you a way to get value from that zombie data and sets you free from the burden of storing all of it (just in case you might need some of it at some time).
There are a few different definitions for zombie data floating around the internet. Some people use the term to describe multiple copies of redundant data that are distributed to more than one place. Others use it to refer to miscellaneous data that gets stored — like all the Windows files on PCs that sit idle for eternity. But the zombie data we’re talking about is all those logs, metrics, traces, and other observability data that organizations are collecting.
Since cloud storage became the norm, the ability to store massive amounts of data has increased dramatically while costs have gone down. As a result, the quantity of data being stored has gone up too. It’s kind of like moving from a house with a small garage to a house with a bigger one: you somehow always end up with enough junk to fill all the new space.
All of this extra zombie data is bad for a couple of reasons. The first is that data ages more like milk than a fine wine. If you’re trying to monitor what’s going on in your network and you save data that you don’t have time to process right away, looking at it a week later has little to no value.
The other problem with zombie data is the licensing costs associated with storing it. Most organizations have so many devices generating so much data and sending it to some type of analysis system that it can be hard to know how much is actually being generated, collected, processed, stored, or dumped.
When the quantity of data hits a certain level, one of two things usually happens. The data either gets put into storage to be run back through at a later time — which usually doesn’t ever get done — or it gets dumped because the chance of it bringing value weeks or months later is slim.
One way to tackle zombie data is to look at it the same way miners look for gold. On average they have to sift through 87 tons of dirt to find one ounce of gold, so they start with exploration holes, sampling different areas to see whether it would be worth digging deeper.
You can do the same with your stored data by using Cribl Search to query it wherever it’s stored without having to move or process it first. Then take what you have in storage and rinse it through Cribl Stream to find the areas you might be interested in. You can then run it through a system of analysis and keep it in a high-priority data lake. If there’s some data with a little bit less value, run it back through and sit it in cold storage to save money. You can also identify the data that has no value and get rid of it once and for all.
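To make that sample-then-sort idea concrete, here is a minimal Python sketch. It does not use Cribl Search or Cribl Stream and is not how those products work internally. The tier names, the `level` field, and the classification rules are all assumptions made up for illustration; your own rules would depend on what your data looks like.

```python
import random

# Hypothetical storage tiers; these names are illustrative, not Cribl concepts.
HOT, COLD, DROP = "hot", "cold", "drop"


def classify(event: dict) -> str:
    """Assign an event to a tier using made-up rules based on a 'level' field."""
    level = str(event.get("level", "")).lower()
    if level in {"error", "critical"}:
        return HOT   # worth analyzing right away
    if level in {"warn", "info"}:
        return COLD  # lower value, keep cheaply
    return DROP      # no known value


def sample(events: list, fraction: float, seed: int = 0) -> list:
    """Take a random 'exploration hole' sample before committing to a full pass."""
    if not events:
        return []
    k = min(len(events), max(1, int(len(events) * fraction)))
    return random.Random(seed).sample(events, k)


def triage(events: list) -> dict:
    """Sort every event into its tier and return the three buckets."""
    buckets = {HOT: [], COLD: [], DROP: []}
    for event in events:
        buckets[classify(event)].append(event)
    return buckets
```

You might run `triage(sample(stored_events, 0.01))` first to see how a small slice breaks down across the tiers, and only run the full set through once the mix looks worth the effort.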
Better yet: avoid collecting too much data in the first place by using Search and Stream to reshape your observability solution. Instead of ingesting all of the data being generated by your systems and applications, you can search for the threat, term, value pair, or whatever type of data you’re looking for. Use Search to query the end stations or data stores where you’ve already collected data and discover the data with potential value to analyze in greater depth.
You can then use Stream as a single manifold to sift through the data from your five, ten, twenty, or more bespoke agents, then send it to the appropriate destination or copy it and send it to another system of analysis. You can even take a full-fidelity copy of the data and store it cheaply in cold storage just in case. If you ever need it again, use Cribl Replay to rerun it through your pipeline.
Ghosts are roaming the streets… goblins are lurking in your backyard… and you can hear the cackle of witches as Halloween approaches. But did you know that undead zombie data is lurking in the shadows of your data lake? It’s data whose value is unknown, so it sits in storage, waiting to be called back to life (all the while draining finances, impacting network reliability, and maybe even compromising network security).
Join us for an on-demand webinar where we’ll explain:
By the end of the webinar, you’ll have a better idea of how you can easily and cost-effectively shape, reduce, and analyze your existing data lakes and maybe even save a bunch on storage charges at the same time.