‘Play it Again Sam’
Organizations today produce more data than ever, and all of it needs to be collected, stored, analyzed, and retained, though not necessarily in that order. Historically, most vendors’ analysis tools doubled as the retention point for that data. While this may at first appear to be the best option for performance, it quickly creates significant problems. First, those systems were never designed for the scale of today’s data volumes, which are currently growing at a 28% CAGR. Second, analysis systems are priced by the volume of ingested data; the costs are already prohibitive and will continue to climb.
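To put the 28% CAGR figure in perspective, here is a small back-of-the-envelope sketch. The starting volume of 1 TB/day and the function names are illustrative assumptions, not figures from any Cribl customer; only the 28% rate comes from the text above.

```python
import math


def projected_volume(current_tb: float, cagr: float, years: int) -> float:
    """Compound a starting volume forward by `years` at annual rate `cagr`."""
    return current_tb * (1 + cagr) ** years


def doubling_time_years(cagr: float) -> float:
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2) / math.log(1 + cagr)


if __name__ == "__main__":
    # Hypothetical starting point: 1 TB/day ingested today.
    for years in (1, 3, 5):
        print(f"after {years} year(s): {projected_volume(1.0, 0.28, years):.2f} TB/day")
    print(f"doubling time: {doubling_time_years(0.28):.1f} years")
```

At a constant 28% rate, volume roughly doubles in under three years, so a license priced on ingest volume grows at about the same pace unless less data is sent to the analysis system.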
The simple answer is to separate your retention system from your analysis system. Put your data in a separate, cost-effective repository (like Amazon S3) and optimize the transfer of only specific datasets from storage into your analysis system instead of dumping everything into analysis. This is where Cribl comes in.
Cribl built its reputation on providing more innovative ways to manage observability and security data. We engineered Cribl Stream, a vendor-agnostic platform that gives customers the flexibility to route, shape, restructure, and enrich data from any source to any destination in the format required. An additional Replay capability lets customers “replay” only the essential data they need from low-cost storage into existing analytics tools.
Then, last year, we added Cribl Search, which lets you perform federated “search-in-place” queries on any data in any format at any location, eliminating the complexity and costs associated with first shipping, ingesting, and storing the data before being able to search it.
Both Cribl Stream and Search enable administrators to retain their data in their chosen data stores and then retrieve only the specific datasets required for the task (investigative query, etc.). We pride ourselves on giving customers the flexibility to leverage whatever Cribl tools work best with their stack, so you may wonder, which do I use and when? Let’s dive into the answer!
Data analysis requires collecting the data and routing it through some processes to glean specific information. There are multiple ways to collect data for analysis. The traditional method has been to ‘collect it all,’ like a fishing trawler that nets its target catch along with a lot of unneeded bycatch (image 1), and then store and process the data in your analysis system (e.g., a SIEM) to sort it out. This is highly effective and has been the cornerstone of data analysis for a long time. However, data volume growth requires a better option, and Cribl offers two new ways to collect data for analysis: Cribl Stream Replay (image 2), which targets specific subsets of data, and Cribl Search (image 3), which targets data with surgical precision. Here’s where they differ:
Cribl Stream is a universal receiver designed to collect from almost any machine data source, whether by streaming or scheduled batch collection. As data transits Cribl Stream, it is ‘shaped’ (reduced, enriched, reformatted, summarized, aggregated, etc.) before being forwarded to its final destination(s). This helps control costs by eliminating digital noise and, in turn, helps retain more valuable data for a longer time without blowing out budgets. At the same time, Cribl Stream enables customers to route a full-fidelity copy of raw data to low-cost storage, such as Amazon S3, Google Cloud Storage, Azure Blob, and other compatible systems, for long-term retention for compliance, audit, and investigative purposes. With Replay, you can efficiently retrieve data from that object storage and then “replay” it through a pipeline and into your destinations, giving you an affordable way to retain more data for extended periods while still having it accessible for investigations.
A typical use case would be an organization using Amazon S3 to store data, even petabytes of it, before sending it to analysis tools for its security and operations teams. When those teams get requests to review data for an investigation, they are no longer overwhelmed by the volume, because they can retrieve just the specific datasets they need.
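The shape-and-route idea can be sketched in a few lines of Python. This is an illustration of the concept only, not Cribl Stream’s pipeline configuration or API; the field names (`level`, `debug_info`, `trace_blob`, `site`) and the function names are hypothetical.

```python
from typing import Optional

# Hypothetical names, chosen only for this illustration.
NOISY_FIELDS = {"debug_info", "trace_blob"}
DROP_LEVELS = {"debug"}


def shape(event: dict, site: str) -> Optional[dict]:
    """Return a reduced, enriched copy of the event, or None to drop it."""
    if str(event.get("level", "")).lower() in DROP_LEVELS:
        return None  # reduction: drop low-value events entirely
    shaped = {k: v for k, v in event.items() if k not in NOISY_FIELDS}
    shaped["site"] = site  # enrichment: add context the source lacked
    return shaped


def route(events, site: str):
    """Send a raw copy of every event to the archive and a shaped copy to analytics."""
    archive, analytics = [], []
    for event in events:
        archive.append(event)  # full-fidelity copy, untouched
        shaped = shape(event, site)
        if shaped is not None:
            analytics.append(shaped)
    return archive, analytics
```

The point of the split is that nothing is lost: the archive keeps every raw event for later replay, while the analytics destination only receives the events and fields worth paying to ingest.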
Cribl Stream’s Replay option for AWS S3 offers organizations fundamentally new ways to manage data by providing an easy way to selectively ingest and re-ingest data into systems of analysis. Let’s walk through how to use this feature at a high level.
Replay allows you to retrieve only a subset of the dataset based on the partitioning scheme you defined, improving the quality and speed of your analytics environment.
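To see why the partitioning scheme matters, consider a minimal sketch that assumes objects were written under a `year/month/day/hour` path layout. That layout, the `logs` base path, and the function names are assumptions for illustration; your own partitioning expression may differ, and this is not how Cribl Stream is implemented internally.

```python
from datetime import datetime, timedelta


def hourly_prefixes(base: str, start: datetime, end: datetime) -> list:
    """List the year/month/day/hour prefixes that overlap the window [start, end)."""
    prefixes = []
    cursor = start.replace(minute=0, second=0, microsecond=0)
    while cursor < end:
        prefixes.append(f"{base}/{cursor:%Y/%m/%d/%H}/")
        cursor += timedelta(hours=1)
    return prefixes


def select_keys(keys, prefixes):
    """Keep only the object keys that fall under one of the wanted prefixes."""
    wanted = tuple(prefixes)
    return [key for key in keys if key.startswith(wanted)]


if __name__ == "__main__":
    window = hourly_prefixes(
        "logs", datetime(2024, 1, 1, 22, 30), datetime(2024, 1, 2, 0, 15)
    )
    print(window)
```

Because the time window maps directly onto a handful of path prefixes, only the objects under those prefixes need to be listed and read; everything else in the bucket is never touched.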
Think about it: when (not if) there is a data breach, you’ll need access to the raw data so you can analyze it in ways you haven’t before (otherwise, you would have caught the breach sooner). Also, if you need to prove compliance with certain security standards back to a specific date, you will want your raw data on hand.
Cribl Search is reshaping the data search paradigm, empowering users to query data directly at its source. Effortlessly sift through data in major object stores like AWS S3, Amazon Security Lake, Azure Blob, and Google Cloud Storage, and enrich your insights by querying dozens of live API endpoints from various SaaS providers. The power of Cribl Search lies in its strategic approach: discover and forward only the critical data to your systems of analysis, thus avoiding the cost of expensive storage.
Like Cribl Stream, Cribl Search allows administrators to identify and forward only a subset of the raw data for analysis, but with surgical precision to target specific data. Once a query is defined, simply add the ‘send’ operator, which leverages the native integration with Cribl Stream to forward the results to the appropriate destinations.
Cribl Search gives administrators a single search tool to query all their observability data without first collecting it: search for any terms, patterns, or key/value pairs; search any data type; search anywhere you can reach; and forward the results to the analysis system.
dataset="cribl_internal_logs" status=200 response_time>2 | limit 1000
This query returns up to 1,000 events where status is 200 and response_time is greater than 2. Adding the send operator forwards those results to Cribl Stream:
dataset="cribl_internal_logs" status=200 response_time>2 | limit 1000 | send
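For readers who think in code, the filter, limit, and send stages of the queries above behave roughly like the following Python sketch. It only mirrors the semantics described in this post; `run_query` and the `send` callback are hypothetical names, not part of Cribl Search.

```python
from itertools import islice


def run_query(events, limit=1000, send=None):
    """Filter events, cap the result count, and optionally forward each result."""
    matches = (
        e for e in events
        if e.get("status") == 200 and e.get("response_time", 0) > 2
    )
    results = list(islice(matches, limit))  # the "limit 1000" stage
    if send is not None:
        for event in results:
            send(event)  # the "send" stage: hand results to a downstream destination
    return results
```

Calling `run_query(events)` corresponds to the first query, while passing a callback, for example `run_query(events, send=forwarded.append)`, corresponds to the second: the same results are returned and also handed off downstream.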
While Cribl Stream Replay retrieves a subset of the data, Search lets you be even more surgical, locating and retrieving very specific datasets, such as results from a single IP or from a single user or range of users. Of key importance is the ability to query data from multiple data stores simultaneously. Additionally, if Cribl Stream was used to write the data into the S3 bucket (or other store), its robust partitioning capabilities make retrieving specific datasets much easier; where the data was not pre-partitioned, Cribl Search provides the more granular retrieval capability.
Finally, Cribl Search enables administrators to query and collect logs, metrics, application data, etc., directly from a host (via Cribl Edge) and forward to Cribl Stream for additional shaping and routing. Once again, Cribl’s products improve the quality and speed of your analytics.
As data volumes grow, the percentage of data being analyzed will continue to drop due to licensing costs. There are only two options to address this: get a bigger budget or be smarter about processing data before ingesting it into the analysis system. The Cribl Stream Replay and Search Send features are game changers; you can now effortlessly collect specific datasets and forward them to different systems for advanced analysis, audit, and compliance—a tremendous value for anyone managing digital exhaust data at scale. Suppose you always had a full-fidelity copy of your logs, metrics, and traces in S3. Ask yourself this: Would you still bring every event into your analytics systems? Would you truly need to keep terabytes of noisy, verbose, hard-to-search logs in your expensive analysis tools daily?
Want to know more about Cribl Stream, Cribl Search, and Cribl Edge? Check out Cribl University, where training is always free.
Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl’s suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.
We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.