Replay data from object storage for long-term incident investigations

By Chris Breshears

Last edited: April 3, 2026

With Cribl Stream, you can retain more data for extended periods without breaking the bank. Our Replay solution lets you efficiently collect data from object storage and replay it through a pipeline into your destinations.

Psst, hey pal, would you like to buy a time machine?

I’m not talking about some H.G. Wells monstrosity where you somehow end up being chased by dinosaurs or become your own grandparent. I mean a time machine for your telemetry.

A time machine is what you need when you’ve been treating your SIEM as the only place that matters, and now you suddenly have to go back further than your hot retention or primary destination allows.

License costs and tool performance often prevent organizations from ingesting all their data or require them to limit data retention time in their primary SIEM or analytics platform. Security incidents are often discovered long after these retention windows expire, or require data that was never ingested in the first place, leaving teams without the full story.

A massive advantage of Cribl Stream is the ability to store full-fidelity data in low-cost storage solutions such as Amazon S3, Azure Blob, or Cribl Lake for long-term retention. This is great for compliance, but what if you need that data later for an investigation?

That’s where our Replay solution shines.

With Replay in Cribl Stream, you can efficiently collect data from a single low-cost destination (like object storage) and “replay” it through a pipeline and into your destinations. Whether your long-term repository is S3 or Blob storage, Replay gives you an affordable way to retain more data for longer periods while keeping it accessible when you need to investigate an incident, chase down an audit trail, or train new analytics and AI workflows.

Instead of forcing all your data through a single SIEM, you keep long-term history in a destination designed for cost-efficient retention and pull back only what you need when you need it.
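To make "pull back only what you need" concrete, here is a minimal sketch of the selection step in plain Python. It is not Cribl Stream's API: the `select_events` helper, the sample events, and the `host` field are hypothetical, and the sketch assumes each archived event carries its timestamp as epoch seconds in a `_time` field.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator


def select_events(
    events: Iterable[dict],
    start: datetime,
    end: datetime,
    **match: str,
) -> Iterator[dict]:
    """Yield events whose timestamp is in [start, end) and whose fields match."""
    for event in events:
        # Assumption: "_time" holds epoch seconds in UTC.
        ts = datetime.fromtimestamp(event["_time"], tz=timezone.utc)
        if not (start <= ts < end):
            continue
        if all(event.get(key) == value for key, value in match.items()):
            yield event


def utc(year: int, month: int, day: int) -> datetime:
    """Build a timezone-aware UTC midnight timestamp."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical archived events standing in for data read from object storage.
archived = [
    {"_time": utc(2025, 1, 1).timestamp(), "host": "web-1", "msg": "login ok"},
    {"_time": utc(2025, 1, 2).timestamp(), "host": "web-2", "msg": "login ok"},
    {"_time": utc(2025, 2, 1).timestamp(), "host": "web-1", "msg": "login failed"},
]

# Keep only web-1 events from the first two weeks of January.
selected = list(select_events(archived, utc(2025, 1, 1), utc(2025, 1, 15), host="web-1"))
print(len(selected), selected[0]["msg"])
```

Everything else stays in low-cost storage. Only the matching events would move on to a pipeline and a destination.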

Why Send Telemetry Data to Object Storage?

Most teams can’t afford to keep every event in their SIEM or analytics tools. License costs, scale limits, and performance constraints force them to drop data or dramatically shorten retention.

By first landing telemetry data in low-cost object storage, you can:

Store essentially everything for years without blowing up license or infrastructure costs.
Keep only the most valuable data hot in tools like Splunk or your SIEM, while still having a complete history available when investigations, audits, or AI projects demand it.
Decouple storage from tools, so you can add, switch, or re-balance downstream analytics platforms without re-collecting or re-ingesting telemetry.

Replay in Cribl Stream is what turns that object storage into a time machine for your telemetry — letting you hydrate the right tools with just the data you need, exactly when you need it.

How One Global Enterprise Uses Replay

A multinational conglomerate that has been innovating across industries since 1892 needed this exact functionality. Their observability engineers use Amazon S3 to store petabytes of data as their first destination before sending a curated subset to Splunk for security and operations teams.

They often get requests to Replay data for use in investigations. With the sheer amount of data they are storing, this could easily become a “needle in a haystack” problem.

By using path segmentation in Amazon S3, they can precisely target the data they need (usually a date range from a few days to 13 months) and Replay it through Cribl Stream and into Splunk, SIEM tools, or any destination of their choosing. The result: end users can quickly resolve breaches or potential threats, while the organization keeps an affordable way to store data for years, or indefinitely if it chooses.
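The path segmentation described above could look something like the following sketch. It assumes a hypothetical key layout of `<base>/YYYY/MM/DD/`, since the customer's actual layout is not given. The `daily_prefixes` helper and the `logs/firewall` base path are illustrative names, not part of Cribl Stream or the S3 API. The point is that a date range maps to a small, explicit set of prefixes, so a collection job only has to list objects under those prefixes rather than scanning the whole bucket.

```python
from datetime import date, timedelta


def daily_prefixes(base: str, start: date, end: date) -> list[str]:
    """Return one key prefix per day in [start, end], inclusive.

    Assumes objects are stored under <base>/YYYY/MM/DD/ (hypothetical layout).
    """
    if end < start:
        raise ValueError("end must not be before start")
    day_count = (end - start).days + 1
    return [
        f"{base}/{start + timedelta(days=offset):%Y/%m/%d}/"
        for offset in range(day_count)
    ]


# A four-day investigation window that crosses a month boundary.
prefixes = daily_prefixes("logs/firewall", date(2025, 2, 27), date(2025, 3, 2))
for prefix in prefixes:
    print(prefix)
```

A 13-month window under this layout would expand to roughly 400 daily prefixes, which is still a bounded, targeted listing compared with petabytes of unsegmented objects.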

From an architecture perspective, they have a leader node for each environment and use a microserver architecture with 200–300 worker nodes. “The leader node is able to distribute the workload across workers very nicely,” said a Senior Cyber Security Software Engineer. “When we send a spike of data, like in this case when we had to Replay 1.335 PB of data, the workers can handle it. It’s economical too, because we use several small instances vs. big instances, and it doesn’t require much memory.”

In this particular investigation, it took about 2 hours to collect the metadata, followed by an additional 4 hours to process the data. With Cribl Stream, they reduced the 1.335 PB replayed from S3 to just 3.69 GB delivered to Splunk, a reduction of roughly 99.9997%, without losing the fidelity they needed to understand what happened.
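As a sanity check, the reduction can be recomputed from the two sizes quoted above. This sketch assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes). With binary units the result differs only in the fifth decimal place. The `reduction_percent` helper is an illustrative name.

```python
PB = 10**15  # decimal petabyte, in bytes (assumption)
GB = 10**9   # decimal gigabyte, in bytes (assumption)

replayed_bytes = 1.335 * PB   # data replayed from S3, per the quote above
delivered_bytes = 3.69 * GB   # data delivered to Splunk


def reduction_percent(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (1 - after / before) * 100


pct = reduction_percent(replayed_bytes, delivered_bytes)
print(f"{pct:.4f}%")  # prints 99.9997%
```

Put another way, about 2.8 bytes reached Splunk for every million bytes replayed.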

Where Cribl Lake Fits In

While many customers start with Amazon S3 or Azure Blob as their object storage of choice, they’re increasingly turning to Cribl Lake as a turnkey, observability-optimized data lake within the same architecture.

Cribl Lake gives you:

A managed, cloud-native data lake purpose-built for telemetry.
Native integration with Cribl Stream Replay, so you can hydrate downstream tools on demand using the same familiar workflows you use with S3 and Blob storage.
The flexibility to centralize your data in Lake while still routing only what you need — in the right shape — to SIEMs, data warehouses, and AI systems.

In practice, this means you can land everything in your object storage tier, keep only the subset you need hot in your tools of choice, and then use Replay to go “back in time” whenever a long-term security or operations investigation demands it — without blowing up license or infrastructure costs.

Try Replay for Yourself

See the power of Replay yourself and try your hand at time travel in our Replay Sandbox or with a free Cribl.Cloud account… then start thinking about how a single low-cost destination plus Replay could fit into your own architecture.

Chris Breshears


