Cribl Stream's Replay vs Cribl Search's Send: Understanding the Differences

Last edited: January 10, 2024

‘Play it Again Sam’

In today’s landscape, organizations produce more data than ever, and that data needs to be collected, stored, analyzed, and retained, though not necessarily in that order. Historically, most vendors’ analysis tools were also the retention point for that data. While this may at first appear to be the best option for performance, we have quickly seen that it creates significant problems. First, those systems were never designed for the scale of today’s data volumes, which are growing at a 28% compound annual growth rate (CAGR). Second, analysis systems are priced on the volume of ingested data, so costs are already prohibitive and will continue to climb.

The simple answer is to separate your retention system from your analysis system. Put your data in a separate, cost-effective repository (like Amazon S3), and move only the specific datasets you need from storage into your analysis system instead of dumping everything into it. This is where Cribl comes in.

Cribl built its reputation on providing more innovative ways to manage observability and security data. We engineered Cribl Stream, a vendor-agnostic platform that gives customers the flexibility to route, shape, restructure, and enrich data from any source to any destination, in the format required. Its additional Replay capability lets customers “replay” only the essential data they need from low-cost storage into their existing analytics tools.

Then, last year, we added Cribl Search, which lets you run federated “search-in-place” queries on any data, in any format, at any location, eliminating the complexity and cost of first shipping, ingesting, and storing the data before you are able to search it.

Both Cribl Stream and Cribl Search let administrators retain their data in their chosen data stores and then retrieve only the specific datasets a task requires (an investigative query, for example). We pride ourselves on giving customers the flexibility to use whichever Cribl tools work best with their stack, so you may wonder: which do I use, and when? Let’s dive into the answer!

Data Collection Strategies


Data analysis requires collecting data and routing it through processes that extract specific information. There are multiple ways to collect data for analysis. The traditional method has been to ‘collect it all,’ like a fishing trawl that hauls in the target catch along with a lot of unneeded bycatch (image 1), and then store and process everything in your analysis system (i.e., a SIEM) to sort it out. This approach is highly effective and has long been the cornerstone of data analysis. However, data volume growth demands a better option, and Cribl offers two new ways to collect data for analysis: Cribl Stream Replay (image 2), which targets specific subsets of data, and Cribl Search (image 3), which targets data with surgical precision. Here’s how they differ:

Cribl Stream Replay

Cribl Stream is a universal receiver designed to collect data from almost any machine data source, via streaming or scheduled batch collection. As data transits Cribl Stream, it is ‘shaped’ (reduced, enriched, reformatted, summarized, aggregated, etc.) before being forwarded to its final destination(s). This helps control costs by eliminating digital noise and, in turn, lets you retain more valuable data for longer without blowing out budgets. At the same time, Cribl Stream lets customers route a full-fidelity copy of raw data to low-cost storage, such as Amazon S3, Google Cloud Storage, Azure Blob Storage, and other compatible systems, for long-term retention for compliance, audit, or investigative purposes, and then “replay” it to analytics tools when required. With Replay, you can efficiently retrieve data from object storage and “replay” it through a pipeline and into your destinations, giving you an affordable way to retain more data for extended periods while keeping it accessible for investigations.

A typical use case is an organization that uses Amazon S3 to store data, even petabytes of it, before sending it on for analysis by its security and operations teams. When those teams get requests to review data for an investigation, they are no longer overwhelmed by its volume, because they can retrieve just the specific datasets they need.

How It Works

Cribl Stream’s Replay option for AWS S3 offers organizations fundamentally new ways to manage data by making it easy to selectively ingest, and re-ingest, data into systems of analysis. Let’s walk through how to use this feature at a high level.

Your analytics systems continue to do their thing, as usual; no changes are required.
Meanwhile, as raw data flows through Cribl Stream, a copy of that data (all those critical logs, metrics, and traces) is routed to less expensive object storage destination(s).
1. The Destination’s settings let administrators define how the uploaded files are partitioned, for example by host, year, month, day, time, index, sourcetype, …
Whenever required by investigations, audits, or security events, Stream can re-ingest the relevant data right back into your analytics system(s).
1. When replaying the data, partitions make your replay searches faster: segments of the path map back to variables, including time, that you can use to zero in on the exact logs you need.
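The partition-to-variable mapping described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Cribl Stream's configuration syntax: the `YYYY/MM/DD/host/sourcetype/` layout and the `partition_prefix` and `parse_partition` helpers are hypothetical. Writing builds an object-key prefix from event fields; at replay time, the same path segments are parsed back into variables you can filter on.

```python
import re
from datetime import datetime, timezone

# Hypothetical key layout: year/month/day/host/sourcetype/ (not a Cribl default).
PARTITION_TEMPLATE = "{year:04d}/{month:02d}/{day:02d}/{host}/{sourcetype}/"

# At replay time, each path segment maps back to a named variable.
PARTITION_PATTERN = re.compile(
    r"^(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/"
    r"(?P<host>[^/]+)/(?P<sourcetype>[^/]+)/"
)


def partition_prefix(event: dict) -> str:
    """Build the key prefix an event would be written under (time in UTC)."""
    ts = datetime.fromtimestamp(event["_time"], tz=timezone.utc)
    return PARTITION_TEMPLATE.format(
        year=ts.year,
        month=ts.month,
        day=ts.day,
        host=event["host"],
        sourcetype=event["sourcetype"],
    )


def parse_partition(key: str) -> dict:
    """Recover partition variables from an object key; {} if it doesn't match."""
    match = PARTITION_PATTERN.match(key)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    event = {"_time": 1704844800, "host": "web01", "sourcetype": "access_combined"}
    prefix = partition_prefix(event)
    print(prefix)  # 2024/01/10/web01/access_combined/
    print(parse_partition(prefix + "events.json.gz"))
```

Keeping the write template and the replay pattern side by side is the design point: whatever fields you partition on when writing are exactly the fields you can cheaply select on when replaying.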

Replay allows you to retrieve only a subset of the dataset based on the partitioning scheme you defined, improving the quality and speed of your analytics environment.
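To make the “subset based on the partitioning scheme” idea concrete, here is a minimal Python sketch that prunes a list of object keys by date range and host before any object is opened. The `select_keys` helper and the `YYYY/MM/DD/host/...` key layout are hypothetical assumptions for illustration, not Cribl Stream's collector logic.

```python
from datetime import date
from typing import Iterable, List, Optional


def select_keys(
    keys: Iterable[str],
    start: date,
    end: date,
    host: Optional[str] = None,
) -> List[str]:
    """Keep only keys whose date partition is within [start, end] and,
    optionally, whose host segment matches.
    Assumes a hypothetical YYYY/MM/DD/host/... key layout."""
    selected = []
    for key in keys:
        parts = key.split("/")
        if len(parts) < 5:
            continue  # too short to be a partitioned data object
        try:
            day = date(int(parts[0]), int(parts[1]), int(parts[2]))
        except ValueError:
            continue  # leading segments are not a valid date
        if start <= day <= end and (host is None or parts[3] == host):
            selected.append(key)
    return selected


if __name__ == "__main__":
    keys = [
        "2024/01/08/web01/access/a.json.gz",
        "2024/01/09/web01/access/b.json.gz",
        "2024/01/09/db01/mysql/c.json.gz",
        "2024/01/11/web01/access/d.json.gz",
    ]
    print(select_keys(keys, date(2024, 1, 9), date(2024, 1, 10), host="web01"))
    # ['2024/01/09/web01/access/b.json.gz']
```

Because the filter looks only at key names, objects outside the window are never read, which is the general reason a well-chosen partitioning scheme makes replay faster.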

Think about it: when (not if) there is a data breach, you’ll need access to the raw data to analyze it in a way you haven’t before (otherwise, you would have caught the breach sooner). Likewise, if you need to prove compliance with certain security standards back to a specific date, you will want your raw data as evidence.

Cribl Search

Cribl Search is reshaping the data search paradigm, empowering users to query data directly at its source. Effortlessly sift through data in major object stores like AWS S3, Amazon Security Lake, Azure Blob, and Google Cloud Storage, and enrich your insights by querying dozens of live API endpoints from various SaaS providers. The power of Cribl Search lies in its strategic approach: discover and forward only the critical data to your systems of analysis, avoiding the cost of ingesting and storing everything in them.

Like Cribl Stream, Cribl Search lets administrators identify and forward only a subset of the raw data for analysis, but with the surgical precision to target specific data. Once a query is defined, you simply append the ‘send’ operator, which uses the native integration with Cribl Stream to deliver the results to the appropriate destinations.

How It Works

Cribl Search gives administrators a single tool to query all their observability data without first collecting it: search for any terms, patterns, or key/value pairs; search any data type; search anywhere you can reach; and forward the results to your analysis system.

Write your search query. For example:
1. dataset="cribl_internal_logs" status=200 response_time>2 | limit 1000
2. The query returns up to 1,000 events where status is 200 and response_time is greater than 2.
Append ‘send’ to your query:
1. dataset="cribl_internal_logs" status=200 response_time>2 | limit 1000 | send
2. The query results are forwarded to your Cribl Stream cloud instance.
3. Note: results can go either to Stream only, or to Stream and the local results display.
Results received in Stream can be shaped and/or routed to one or more destinations.
1. The query results are automatically routed to your Cribl Stream instance. Without any additional configuration, they go to your default group, or you can configure a specific group. The received data can then be routed to any Destination you choose (see the Cribl Stream documentation for details on how to do this).
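For readers who think in code, the filter-and-limit semantics of the example query can be mirrored in a few lines of Python. This is only an analogy under stated assumptions: `run_query` and `send` are hypothetical helpers, `events` stands in for the contents of the `cribl_internal_logs` dataset, and the `forward` callback is a placeholder for the built-in transport that Cribl Search uses to reach Stream.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, List


def run_query(events: Iterable[Dict], limit: int = 1000) -> List[Dict]:
    """Plain-Python analogue of:
    dataset="cribl_internal_logs" status=200 response_time>2 | limit 1000"""
    matches = (
        e for e in events
        if e.get("status") == 200 and e.get("response_time", 0) > 2
    )
    # islice stops pulling events once `limit` matches have been found.
    return list(islice(matches, limit))


def send(results: List[Dict], forward: Callable[[Dict], None]) -> int:
    """Hypothetical stand-in for `| send`: hand each result to `forward`
    and return how many events were forwarded."""
    for event in results:
        forward(event)
    return len(results)


if __name__ == "__main__":
    events = [
        {"status": 200, "response_time": 3.1},
        {"status": 500, "response_time": 5.0},
        {"status": 200, "response_time": 1.2},
        {"status": 200, "response_time": 2.5},
    ]
    received: List[Dict] = []
    print(send(run_query(events), received.append))  # 2
```

The point of the analogy is that `| send` does not change what the query matches; it only changes where the matching events go.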

While Cribl Stream lets you retrieve a subset of the data, Search lets you be even more surgical: you can locate and retrieve very specific datasets, such as results from a single IP address or from a single user or range of users. Just as importantly, Search can query multiple data stores simultaneously. Additionally, if Cribl Stream was used to write the data into the S3 bucket (or other store), its robust partitioning capabilities make retrieving specific datasets much easier; without that pre-partitioning, Cribl Search provides the more granular retrieval capability.
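The “single IP across multiple data stores” idea can be sketched as one predicate applied to several stores at once. In this minimal Python illustration, in-memory lists stand in for real object stores, and `federated_search`, the `_store` tag, and the `src_ip`/`user` fields are hypothetical names rather than Cribl Search internals. Each store is scanned in a worker thread, which suits I/O-bound remote queries.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple


def federated_search(
    stores: Dict[str, Iterable[dict]],
    predicate: Callable[[dict], bool],
) -> List[dict]:
    """Scan several stores concurrently with one predicate and tag each hit
    with the store it came from. Results follow the stores' insertion order."""

    def scan(item: Tuple[str, Iterable[dict]]) -> List[dict]:
        name, events = item
        return [{**event, "_store": name} for event in events if predicate(event)]

    with ThreadPoolExecutor() as pool:
        # pool.map runs scan() per store in worker threads, yielding in input order.
        return [hit for hits in pool.map(scan, stores.items()) for hit in hits]


if __name__ == "__main__":
    stores = {
        "s3_archive": [
            {"src_ip": "10.0.0.5", "user": "alice"},
            {"src_ip": "10.0.0.9", "user": "bob"},
        ],
        "azure_blob": [{"src_ip": "10.0.0.5", "user": "carol"}],
    }
    hits = federated_search(stores, lambda e: e.get("src_ip") == "10.0.0.5")
    print([(h["_store"], h["user"]) for h in hits])
    # [('s3_archive', 'alice'), ('azure_blob', 'carol')]
```

In a real federated engine the per-store scan would run as a remote query close to the data; here it is just a list comprehension so the example stays self-contained.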

Finally, Cribl Search enables administrators to query and collect logs, metrics, application data, etc., directly from a host (via Cribl Edge) and forward them to Cribl Stream for additional shaping and routing. Once again, Cribl’s products improve the quality and speed of your analytics.

Stream/Replay & Search/Send Use Cases:

Incident Analysis: Store a full copy of event log data from various sources in AWS S3, with the ability to send specific data to a SIEM for further analysis.
Aging Data: Search for events that have aged out of your system of analysis.
Compliance Audit: Find and report which devices are accurately sending logs, and identify any devices that may be missing.
Handling Massive Cardinality: When dealing with high-cardinality data, use Cribl Search to decide what data is useful, and then move it to an analytics tool.
AWS Security Logs Processing and Forwarding: Store AWS logs in S3 and send a focused subset to another tool for further analysis.
AWS S3 to S3 Hairpin: Find specific data in one S3 bucket, then write the distilled search results back to S3 as a file (or set of files).
Cost Savings: Optimize the data you ingest to free up license capacity for more critical data.
Optimize Ingest: Bring original data back from S3 object storage when needed, so you don’t have to suppress or drop specific data entirely.
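The Compliance Audit use case boils down to comparing a device inventory with the hosts that actually appear in search results. Here is a minimal Python sketch of that comparison; the `audit_devices` helper, the `host` field name, the `min_events` threshold, and the inventory list are all assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def audit_devices(
    expected: Iterable[str],
    events: Iterable[dict],
    min_events: int = 1,
) -> Tuple[List[str], List[str]]:
    """Split an expected device inventory into devices that sent at least
    `min_events` events and devices that appear to be missing."""
    counts = Counter(e["host"] for e in events if "host" in e)
    expected_set = set(expected)
    reporting = {h for h in expected_set if counts[h] >= min_events}
    missing = expected_set - reporting
    return sorted(reporting), sorted(missing)


if __name__ == "__main__":
    inventory = ["fw01", "fw02", "web01"]
    events = [{"host": "fw01"}, {"host": "web01"}, {"host": "web01"}, {"host": "unknown"}]
    print(audit_devices(inventory, events))  # (['fw01', 'web01'], ['fw02'])
```

Raising `min_events` turns the same check into a rough “is this device sending enough?” test rather than a simple presence test.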

Conclusion

As data volumes grow, the percentage of data being analyzed will continue to drop due to licensing costs. There are only two options to address this: get a bigger budget or be smarter about processing data before ingesting it into the analysis system. The Cribl Stream Replay and Search Send features are game changers; you can now effortlessly collect specific datasets and forward them to different systems for advanced analysis, audit, and compliance—a tremendous value for anyone managing digital exhaust data at scale. Suppose you always had a full-fidelity copy of your logs, metrics, and traces in S3. Ask yourself this: Would you still bring every event into your analytics systems? Would you truly need to keep terabytes of noisy, verbose, hard-to-search logs in your expensive analysis tools daily?

TL;DR: Cribl Solution Benefits

Enable separation of the system of analysis from the system of retention
- Store raw data in low-cost data stores, not in expensive analysis systems (lower cost)
- Query data in place (in the data stores), then route only the relevant data (separating the wheat from the chaff)
- Forward just the relevant data to the system of analysis (lower ingest license costs)
- Improve the quality and speed of your analytics environment by storing older data elsewhere
Use Cribl Search to front-end and complement your existing analysis tooling
Keep more data for longer retention periods and pay a lot less
Replay data to any analytics tools for unexpected investigations

Want to know more about Cribl Stream, Cribl Search, and Cribl Edge? Check out Cribl University, where training is always free.

Cribl, the AI Platform for Telemetry, empowers enterprises to manage and analyze telemetry for both humans and agents with no lock-in, no data loss, no compromises. Trusted by organizations worldwide, including half of the Fortune 100, Cribl gives customers the choice, control, and flexibility to build what’s next.

We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.
