In the era of big data, data lakes have emerged as a popular way to store and process massive amounts of data. With Cribl Search and Cribl Stream, you can create a Data Loop that optimizes the use of your data lake by saving Search results as part of an investigation. Our four-part video series explains how to set up Cribl Search and Cribl Stream to establish a Data Loop using the Amazon S3 Data Lake Destination and the in_cribl_http Source in Cribl Stream. This Data Loop serves as a temporary storage location for Cribl Search results obtained from an initial investigation. For example, if an IOC triggers an alert in your SIEM (which stores only 30 days of data), you can expand your investigation of the IP addresses in that alert by querying 12 months of data held in cost-effective Amazon S3 storage via the Data Loop. If you’d like to follow along, create a free Cribl.Cloud account to get started!
In the first video of the series, we introduce the concept of a Data Loop and explain why Cribl Search and Cribl Stream are a good fit for optimizing the use of a data lake. The video gives a concise overview of Cribl Search and Cribl Stream and how they work in tandem to create a Data Loop. We then provide step-by-step instructions for configuring the Cribl Stream Amazon S3 Data Lake Destination to transfer data from Stream to an S3 bucket laid out specifically for efficient Cribl Search access. Finally, we send sample data to the S3 bucket and present a before-and-after view of the bucket to show the impact of the test data.
The second video covers the nuts and bolts of configuring Cribl Search to access the data we’ve stored in the S3 bucket. It guides you step by step through configuring the Search S3 Dataset Provider, using the Stream Data Lake Destination as a model for the authentication details. From there, we walk through creating a Dataset that uses the Provider we’ve just established. To wrap up, we search through the test data previously stored in the S3 bucket.
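Once the Provider and Dataset exist, a first query against the bucket is a one-liner in Cribl Search’s Kusto-style language. The Dataset ID s3_datalake below is a placeholder for whatever name you assigned when creating the Dataset:

```
dataset="s3_datalake"
| limit 10
```

Scoping by `dataset=` tells Search which Provider credentials and bucket layout to use, and `limit` keeps the exploratory query cheap while you confirm connectivity.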
The third video of our series focuses on using Cribl Stream to receive the data. The presenter walks through configuring the Cribl Stream in_cribl_http Source in tandem with the Cribl Search send operator to collect data, and we watch live results arrive in Stream as they are sent from Search.
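The Search side of that handoff is the send operator, which forwards query results to Stream’s in_cribl_http Source. A minimal sketch, assuming a placeholder Dataset ID and a hypothetical status field:

```
dataset="s3_datalake"
| where status >= 500
| send
```

Everything before `| send` is an ordinary search; appending the operator is what redirects the matching events into Stream instead of only displaying them.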
Afterward, we demonstrate creating a Route in Stream to direct the incoming data from the in_cribl_http Source to the data lake via the Amazon S3 Data Lake Destination. This step employs a passthru Pipeline to ensure that the data is not altered in transit.
Finally, we add a layer of enrichment by appending a tag to the outbound data and modifying the Data Lake Destination to adjust the S3 bucket hierarchy, optimizing it for later Search access.
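One way to sketch both changes is an added field in the Pipeline plus a partitioning expression on the Destination. The tag name and prefix layout below are illustrative assumptions, not the exact values from the video; check your Destination’s settings for the actual expression field:

```
// Eval function (Pipeline): add a tag to every outbound event
tag = 'ioc_investigation'

// Partitioning expression (Destination): group objects by tag, then date,
// so Search can skip irrelevant prefixes when scanning the bucket
`${tag}/${C.Time.strftime(_time ? _time : Date.now() / 1000, '%Y/%m/%d')}`
```

Putting the tag first in the object path means a later search scoped to one investigation only has to list that prefix, rather than the whole bucket.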
The final video of the series shows the Data Loop in action with a real-world dataset: the public-domain “Boss of the SOC v3” (BOTSv3) dataset, which is readily available on GitHub. First, we use Cribl Search to sift through the BOTSv3 data stored in an S3 bucket and locate some specific data.
Following that, we construct a fictitious scenario in which we flag a particular set of data as suspicious (in this instance, mysql traffic from a set of wire data analysis). We pinpoint the two IP addresses involved in the conversation, expand the investigation to all events containing at least one of those addresses, and save the results to the Data Loop, making them accessible in Search.
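The expansion step can be sketched as a single Cribl Search query. The dataset name, field names, and IP addresses here are placeholders, not the actual BOTSv3 values:

```
dataset="botsv3_wire_data"
| where src_ip in ("10.0.1.5", "10.0.1.9")
    or dest_ip in ("10.0.1.5", "10.0.1.9")
| send
```

The `where` clause captures any event touching either address, and `send` ships the matches back through Stream into the Data Loop for follow-up searches.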
Finally, we illustrate how to use the Data Lake source in Search to query the subset of data identified in the fictitious scenario in an optimized manner. We also demonstrate the speed advantage of the Data Lake source over searching the original dataset, which amounts to a 5x improvement!
To sum up: Cribl Search and Cribl Stream can be combined to establish a Data Loop that optimizes the use of an Amazon S3 bucket as intermediate storage. By collecting data in Search and looping it back to an S3 data lake Destination, you can store and process large quantities of data with ease. You can also filter and transform the data before it reaches the data lake, giving you complete control over what is stored.
Cribl Search and Cribl Stream work in unison to provide a powerful tool that enhances an existing SIEM or analytics tool by expanding and deepening forensic or audit investigations. This tool is highly valuable for businesses and organizations that rely heavily on data to make informed decisions. If you’d like to try it on your own, you can create a free Cribl.Cloud account today!
Experience a full version of Cribl Stream and Cribl Edge in the cloud with pre-made sources and destinations.