The Evolution of Data Archiving: How to Get Immediate Access to Archived Data

June 12, 2024
Written by Perry Correll

Perry Correll, Principal Technical Content Manager at Cribl, is passionate about the power of observability and how, when done right, it can deliver operational insights into network performance. He has 30+ years of networking experience, from early Ethernet to today's observability, and has held positions from SE to product management with leading organizations.

Categories: Cribl Lake, Cribl Search

Maybe there is something to be said for having your head (brains) in the clouds!

Data storage has come a long way. It’s hard to imagine having to search racks of tape reels for specific datasets, and the same shift is now happening for archival storage. This type of storage is very low cost, but the tradeoff is that the data isn’t readily available, often requiring 24 hours or more to be thawed and converted into a usable format. But what if you could have your cake and eat it, too? Low-cost archival storage AND instant access to your data? Yes, it is possible, and easier than you may think. And let me tell you: that cake is delicious.

Why Is It So Difficult to Get Immediate Access to Archived Data?

  • Data volumes are growing at an exponential rate
  • Budgets aren’t keeping up
  • Tough decisions need to be made about what data is valuable enough to pay to ingest into your system of analysis (SOA)
  • Formatting difficulties arise when data generated by multiple sources in different formats doesn’t fit cleanly into a common format
  • Predefining schemas to conform to specific SOA requirements pigeonholes the data, which must be individually explored and ETL’d before it is routed to your analytics systems

What if there was an easier way? What if you could collect all your data, in any format, in your own data lake, and it was fully managed (think turnkey)? On top of that, what if you could search through all this data with surgical precision and then route only the specific datasets required to the analytics system or any destination of your choice? As you route the data, you automagically transform it into the format required…almost like a schema-on-need approach.

Christmas has come early, because now you can! Cribl just released its new Cribl Lake product, which rounds out Cribl’s portfolio and seamlessly integrates with the rest of Cribl’s products to deliver the Data Engine for IT and Security.

Let’s examine the individual components.

Cribl’s suite of products is composed of:

  • Cribl Stream: a vendor-agnostic observability pipeline that gives you the flexibility to collect, reduce, enrich, normalize, and route data from any source to any destination within your existing data infrastructure.
  • Cribl Search: a federated, search-in-place processing engine that accesses data from any source or storage medium in any format using a single, intuitive query interface. Users can explore data anywhere and work with other products to forward it in the right format, a functionality called schema-on-need.
  • Cribl Edge: a unified collection engine that enables centralized data collection, intelligent filtering, and cost-effective endpoint processing.
  • Cribl Lake: a simplified storage solution that optimizes where data is stored based on value and future accessibility requirements.

While each product plays a crucial role in the Data Engine, Lake and Search are the primary two that will help you better manage your data lifecycle, making for fast and easy access to archived data.

Cribl Lake to Store Data in Open Formats

Cribl Lake is a turnkey data lake solution that takes just minutes to get up and running, with no data expertise needed. Cribl Lake streamlines workflows and policy enforcement, eliminates vendor lock-in with open formats, and works with Search to unify large-scale queries and rapid analysis. This gives organizations a cost-effective place to store their data while keeping that data readily usable and valuable to the teams and tools that need it. Cribl Lake allows users to:

  • Store any type of IT and security data — raw, structured, unstructured, semistructured, or in various formats.
  • Optimize future value through schema-on-need: store data in any format and reformat it when needed.
  • Easily access data through open formats, ensuring future replay operations.
  • Maintain security and compliance with unified security features, retention policies, authentication, and access controls.
  • Reduce storage costs with comprehensive tiered storage based on data value.
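
As a rough illustration of what value-based tiering combined with retention policies can look like, here is a minimal Python sketch. It is not Cribl Lake's configuration model: the tier names, the value labels, the retention periods, and the `placement` helper are all assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Policy:
    tier: str            # hypothetical tier name
    retention_days: int  # hypothetical retention period


# Assumed mapping from a dataset's business value to a storage policy.
POLICIES = {
    "high": Policy(tier="hot", retention_days=30),
    "medium": Policy(tier="warm", retention_days=90),
    "low": Policy(tier="archive", retention_days=365),
}


def placement(value: str, written: date, today: date) -> str:
    """Return the tier a dataset belongs in, or 'expired' once past retention."""
    policy = POLICIES[value]
    if today - written > timedelta(days=policy.retention_days):
        return "expired"
    return policy.tier


recent = placement("high", date(2024, 6, 1), date(2024, 6, 12))
stale = placement("high", date(2024, 1, 1), date(2024, 6, 12))
compliance = placement("low", date(2024, 1, 1), date(2024, 6, 12))
```

The point of the sketch is that the decision depends only on the data's value and age, so one policy table can be applied uniformly across datasets.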

Cribl Search to Find and Access Data Regardless of Where it is and What Format it’s in

Cribl Search has reshaped the data search paradigm, empowering users to query data, in any format, directly at its source. Now you can effortlessly sift through volumes of data in Cribl Lake, or object stores like Amazon S3, Amazon Security Lake, Azure Blob, and Google Cloud Storage. Search is an innovative new approach to finding and accessing data regardless of where it has landed and in any format.

Cribl Search allows users to:

  • Embrace tiered data strategies and use a federated solution to separate the query engine from a storage medium.
  • Use a unified query interface in a familiar language that reaches into existing object stores filled with messy, unstructured, or structured datasets, and retrieves data without having to move or index it first.
  • Discover and forward only the critical subset of raw data to your systems of analysis, avoiding the cost of expensive storage.
  • Save significantly on costs by targeting specific datasets to store inside a system of analysis, increasing users’ scope of analysis without needing to ship, ingest, and store the data first.
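
To make the search-in-place idea concrete, here is a small Python sketch of one query running across data that stays where it is, in whatever format it arrived. This is not Cribl Search's query language or API. The `STORES` dictionary, the object names, and the `search` helper are hypothetical stand-ins for objects sitting in different stores.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator

Record = Dict[str, str]


def parse_json_lines(text: str) -> Iterator[Record]:
    for line in text.splitlines():
        if line.strip():
            yield {key: str(value) for key, value in json.loads(line).items()}


def parse_csv(text: str) -> Iterator[Record]:
    for row in csv.DictReader(io.StringIO(text)):
        yield dict(row)


def parse_key_value(text: str) -> Iterator[Record]:
    for line in text.splitlines():
        if line.strip():
            yield dict(pair.split("=", 1) for pair in line.split())


PARSERS: Dict[str, Callable[[str], Iterator[Record]]] = {
    "json": parse_json_lines,
    "csv": parse_csv,
    "kv": parse_key_value,
}

# Hypothetical objects in three different stores, each in its own format.
STORES = {
    "lake/firewall.json": (
        "json",
        '{"host": "fw1", "action": "deny", "src": "10.0.0.5"}\n'
        '{"host": "fw1", "action": "allow", "src": "10.0.0.9"}',
    ),
    "s3/proxy.csv": (
        "csv",
        "host,action,src\nproxy1,deny,10.0.0.5\nproxy1,allow,10.0.0.7",
    ),
    "blob/app.log": ("kv", "host=app1 action=deny src=10.0.0.8"),
}


def search(predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Scan every store lazily and yield matching records tagged with their source."""
    for name, (fmt, text) in STORES.items():
        for record in PARSERS[fmt](text):
            if predicate(record):
                yield {"_source": name, **record}


hits = list(search(lambda r: r.get("action") == "deny" and r.get("src") == "10.0.0.5"))
```

Nothing is copied into a central index first. Each object is parsed only at query time, and only the two matching records (one from the JSON object, one from the CSV object) come back, ready to be forwarded to a system of analysis.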

Cribl Search & Lake Deliver Immediate Access to All Data

Your enterprise is generating enormous volumes of IT and security data. It is either being collected for analysis, archived for compliance, or deemed to be of little value and thus ignored. As our CEO constantly stresses, significant datasets have little value until they become the most important data, typically when an indicator of compromise (IOC) surfaces.

But by then, where has that data gone? It’s essentially long gone. You have no visibility or access to it. So are you screwed if things go wrong and you suddenly need that data? Not necessarily. There is a solution that lets you access any data you have generated. The best part? That data has been stored away cost-effectively the whole time.

Historically, data stores were complex, expensive, and time-consuming to manage. The data needs to be stored in specific formats. Data collectors and replay capabilities need to be properly configured. Managing the storage lifecycle policies can be challenging without the right mix of data and cloud skills. Misconfiguring object stores and storing too much data for too long can be costly, and putting ‘aged’ or excessive data into cold storage results in loss of visibility and time delays to retrieve.

The influx of data into these lakes often arrives in messy, disparate formats. It can quickly turn a data lake into a chaotic data swamp, a mere dumping ground that makes it difficult to extract value from the data. On top of that, most analytics solutions today cannot query a data lake in place; the data has to be moved into them first.

You can flow your data into Cribl Lake, which acts as a staging ground and storage area. Query data in place, define policies and route some of it to system A, other datasets to system B, and so on. As the data is routed from Lake to the different systems of analysis, the data is shaped and transformed, in-flight (schema-on-need), into the format required by the destination system. Oh, and a full-fidelity copy of the data can always be retained as Cribl Lake is low-cost storage, but never ‘frozen’, so you always have immediate visibility and access when those IOCs or compliance requests pop up.
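
The routing flow described above can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions rather than how Cribl implements it: the `Route` type, the `system_a` and `system_b` destinations, the field names, and the sample events are all hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple

Event = Dict[str, str]


class Route(NamedTuple):
    destination: str                              # hypothetical system of analysis
    matches: Callable[[Event], bool]              # which events this route selects
    shape: Callable[[Event], Dict[str, object]]   # in-flight transform for the destination


def to_system_a(event: Event) -> Dict[str, object]:
    # Assumed schema for a security-oriented destination.
    return {
        "timestamp": event["ts"],
        "source_ip": event["src"],
        "event_type": event["type"],
    }


def to_system_b(event: Event) -> Dict[str, object]:
    # Assumed schema for a metrics-oriented destination.
    return {"name": "latency_ms", "value": int(event["latency_ms"]), "host": event["host"]}


ROUTES = [
    Route("system_a", lambda e: e["type"] == "auth_failure", to_system_a),
    Route("system_b", lambda e: "latency_ms" in e, to_system_b),
]


def route(events: List[Event]) -> Dict[str, List[Dict[str, object]]]:
    """Send each matching event to its destination, reshaped on the way out."""
    routed: Dict[str, List[Dict[str, object]]] = {r.destination: [] for r in ROUTES}
    for event in events:
        for r in ROUTES:
            if r.matches(event):
                routed[r.destination].append(r.shape(event))
    return routed


# The full-fidelity copy stays in the lake; routing never modifies it.
lake = [
    {"ts": "2024-06-12T10:00:00Z", "type": "auth_failure", "src": "10.0.0.5", "host": "web1"},
    {"ts": "2024-06-12T10:00:01Z", "type": "request", "host": "web1", "latency_ms": "42"},
    {"ts": "2024-06-12T10:00:02Z", "type": "heartbeat", "host": "web2"},
]
routed = route(lake)
```

The schema is applied only at the moment an event leaves for a destination, which is the schema-on-need idea in miniature. The heartbeat event matches no route, so it goes nowhere, yet it is still sitting in `lake` if someone needs it later.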

It’s about taking the complexity out of managing data — no special cloud or data skills required. Just immediate access when needed, without having to depend on other teams. Schema-on-need lets you deliver the format you need when you need it. Unified retention, security, and access control policies across object stores and clouds mean you have just one easy platform to manage. Cribl Lake seamlessly scales as your data needs grow and allows you to query data natively in place, regardless of where it is.

It’s about a single view for federated data, with no need to rack up costs storing all your data in one location. Cribl Lake and Search offer a unified approach to managing data so you can seamlessly access and query it regardless of where it resides.

It’s about being able to send data to the teams and tools that need it, when they need it, and in the specific format required for their systems.

It’s about peace of mind when storing, securing, and accessing data. Cribl offers a strong enterprise security posture with flexible role-based access controls that enable efficiency and prevent unauthorized access. It also has powerful capabilities for tracking who has accessed what data.


Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl’s suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.

We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.
