Psst, hey pal, would you like to buy a time machine?
I am not talking about some H.G. Wells monstrosity where you somehow end up being chased by dinosaurs or become your own grandparent. I am talking about a time machine for your observability data.
License costs and tool performance often keep organizations from ingesting all their data, or force them to limit how long they retain it. Security incidents are frequently discovered long after those retention windows expire, or require data that was never ingested in the first place, leaving teams without the full story.
A massive advantage of Cribl Stream is the ability to store full-fidelity data in low-cost object storage like Amazon S3 and Azure Blob Storage for long-term retention. That’s great for compliance, but what if you need that data later for an investigation? That’s where our Replay solution shines. With Replay in Cribl Stream, you can efficiently collect data from object storage and “replay” it through a pipeline and into your destinations, giving you an affordable way to retain more data for longer periods while still keeping it accessible for investigations.
A multinational conglomerate that has been innovating across industries since 1892 needed this exact functionality. Their observability engineers use Amazon S3, which stores petabytes of data, as their first destination before sending data on to Splunk for security and operations teams. They often get requests to Replay data for investigations, and with the sheer amount of data they store, that could be like finding a needle in a haystack.
By using path segmentation in Amazon S3, they target exactly the data they need—usually a date range, anywhere from a few days to 13 months—and replay it through Cribl Stream into Splunk, SIEM tools, or any destination of their choosing. This lets end users quickly resolve breaches and potential threats, while still giving the organization an affordable way to store data for years, or even indefinitely.
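To make the path-segmentation idea concrete, here is a minimal sketch of how a date-partitioned key layout narrows a replay to just the objects you care about. It assumes a hypothetical `logs/YYYY/MM/DD/` prefix scheme (the actual layout is whatever your S3 destination writes); Cribl Stream’s collectors express the same idea with path filter expressions, but the principle is simply “expand the date range into prefixes, then list only those prefixes” instead of scanning the whole bucket.

```python
from datetime import date, timedelta

def date_prefixes(base: str, start: date, end: date) -> list[str]:
    """Expand an inclusive date range into S3 key prefixes like
    'logs/2024/03/15/'. Listing only these prefixes (e.g. with
    boto3's list_objects_v2 Prefix parameter) avoids touching the
    rest of the bucket."""
    prefixes = []
    d = start
    while d <= end:
        prefixes.append(f"{base}/{d.year:04d}/{d.month:02d}/{d.day:02d}/")
        d += timedelta(days=1)
    return prefixes

# Example: a three-day investigation window
for p in date_prefixes("logs", date(2024, 3, 14), date(2024, 3, 16)):
    print(p)
```

With a 13-month range this expands to roughly 400 prefixes, which is still a tiny, targeted listing compared with enumerating petabytes of objects.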
From an architecture perspective, they have a leader node for each environment and a microserver architecture with between 200 and 300 worker nodes.
“The leader node is able to distribute the workload across workers very nicely,” said a senior cyber security software engineer. “When we send a spike of data, like in this case when we had to Replay 1.335 PB of data, the workers can handle it. It’s economical too, because we use several small instances vs. big instances, and it doesn’t require much memory.”
It took about 2 hours to collect the metadata and an additional 4 hours to process the data. With Cribl Stream, they reduced the 1.335 PB collected from S3 down to just 3.69 GB of output to Splunk. (That’s a 99.99972% reduction.)
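The reduction figure is easy to sanity-check. A quick back-of-the-envelope calculation (assuming decimal units, where 1 PB = 1,000,000 GB) shows how 1.335 PB in and 3.69 GB out works out to a roughly 99.99972% reduction:

```python
# Back-of-the-envelope check of the Replay reduction ratio.
# Assumes decimal units: 1 PB = 1_000_000 GB.
replayed_gb = 1.335e6   # 1.335 PB collected from S3
delivered_gb = 3.69     # output delivered to Splunk

reduction_pct = (1 - delivered_gb / replayed_gb) * 100
print(f"{reduction_pct:.5f}%")  # → 99.99972%
```

Put another way, for every gigabyte that reached Splunk, more than 360,000 GB stayed in low-cost storage.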