January 5, 2023
Are you just getting started with Cribl Stream? Or maybe you’re well on your way to becoming a certified admin through our Cribl Certified Observability Engineer certification offered by Cribl University. Either way, sending data from one source to many destinations is something you’ll want to try. So if you’re ready, read on! The goal of this post is to show how simple it is to take a single source (logs) and split it into two separate datasets sent to two unique destinations. You can repeat the same process for additional destinations if you have more than two. Let’s roll!
1. Sending the same data sources to multiple places (in the same format).
In this case, you can imagine yourself in a situation where the NOC and SOC teams need the same data in the same format, but in different tools.
2. Sending full or partial data to multiple places (in the same format, but selective events only).
In this case, you can imagine yourself in a situation where the NOC and SOC teams need the data in the same format, but in some cases, the NOC doesn’t need ALL the events for their analysis.
3. Sending a different version of the data to multiple places (different contents, formats, etc.).
In this case, you can imagine yourself in a situation where the data consumers want full fidelity data in an object store, but want to optimize their ingested data to optimize their searches for speed and efficiency in the destination platform.
In some cases, you may only need the output router functionality; in others, you’ll need a custom pipeline plus an output router. Your solution depends on the complexity of your data management requirements.
My middleware team wants me to ingest their Apache logs into Cribl Stream, and eventually out to Splunk. Pretty common use case, right? But there’s a caveat… They want all of the logs archived to a special S3 bucket already set up for long-term retention, and they want the data optimized and masked for sensitive values before it’s sent to Splunk. Oh boy, let’s get started!
So let’s do this with a custom pipeline, using the clone function, and create multiple sets of data from a single log source, then route that data to a custom output router destination that allows us to filter what gets sent where… S3 and Splunk. Sound fun? Let’s go do this!
In my case, this apache log data is already included in the Splunk TCP source that is set up for all my Splunk universal forwarders. Apache logs come in from the UFs running on those middleware hosts. All I need to know now is the source type(s) to filter on in my routes & pipelines in the upcoming steps.
(Splunk Source Example)
Step 1: Add Clone function
I chose to create a new field named __cloned with a value of ‘true’ on the cloned records. This is helpful so I can later distinguish the original record from the cloned one.
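To make the idea concrete, here is a minimal JavaScript sketch of what the Clone function effectively does per event. This is an illustration, not Stream’s actual implementation; note that fields prefixed with a double underscore are internal in Stream and aren’t forwarded to destinations by default.

```javascript
// Illustrative sketch: emit the original event unchanged, plus a
// copy carrying the internal __cloned marker we set in the function.
function cloneEvent(event) {
  const copy = { ...event, __cloned: 'true' };
  return [event, copy]; // original first, clone second
}
```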
Step 2: Add Mask function (This only applies to the cloned events)
My requirements were to eliminate the JSESSIONID value, as it is considered sensitive data by my customer, and to remove the trailing browser information and other metadata that the users in Splunk don’t really need.
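The Mask function applies regex replacements to the event text. The sketch below shows the general shape of that transformation; the exact JSESSIONID pattern and the trailing user-agent trim are assumptions based on typical Apache combined-format logs, not my customer’s actual config.

```javascript
// Illustrative mask step: redact the session token, then drop the
// quoted user-agent metadata at the end of the line. Both regexes
// are assumptions for a typical Apache combined log line.
function maskEvent(raw) {
  return raw
    .replace(/JSESSIONID=[^;\s"]+/g, 'JSESSIONID=#####') // redact token value
    .replace(/\s+"[^"]*"$/, '');                         // trim trailing browser info
}
```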
This saved approx 44% of their Splunk licensing and reduced the storage needed to write all these events to disk in Splunk.
Step 3: Add Eval function (This only applies to the cloned events)
I changed some fields to set the index value and adjust the sourcetype value so the events land in the right place in Splunk.
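Conceptually, the Eval step just overwrites routing fields on the cloned events, as in this sketch. The index and sourcetype values here are made-up examples, not the actual values from this deployment.

```javascript
// Illustrative eval step: rewrite routing fields on cloned events
// only. 'middleware_prod' and 'apache:access' are assumed values.
function evalFields(event) {
  if (event.__cloned === 'true') {
    event.index = 'middleware_prod';
    event.sourcetype = 'apache:access';
  }
  return event;
}
```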
OK, we have data being modified in the pipelines, now we need to set up the destination(s) where this data is going. In this case, we will need the standard destinations for Splunk and AWS S3 as well as a custom Output Router destination that allows us to filter events before sending them to their final destination.
Step 1: Splunk Destination
Step 2: AWS S3 Destination
Step 3: Output Router Destination
It is worth talking about this more to really understand what is happening here. We are sending ALL THE DATA from the pipeline to this custom destination. That data includes the original Apache events (unchanged) and the newly created cloned events that we’ve made changes to. This destination allows you to do a last-minute filter and select what observability data goes where. Pretty cool, right? The order of these filters also matters… Stream logic flows top-down… so pay close attention to this.
First, we peel out the cloned events (marked with the internal field we created, named __cloned). These events were filtered and modified, so let’s make sure to send those ONLY to our Splunk prod destination. Note: Setting the Final flag ensures those events go only to the destination listed and do not fall through to the rules below.
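The top-down, first-final-match-wins behavior can be sketched in a few lines of JavaScript. The rule filters mirror this setup, but the destination names (`splunk_prod`, `s3_archive`) are example labels, not required Stream identifiers.

```javascript
// Simplified model of Output Router logic: rules are evaluated
// top-down, and a matching rule marked final stops evaluation.
const rules = [
  { filter: e => e.__cloned === 'true', output: 'splunk_prod', final: true },
  { filter: e => true,                  output: 's3_archive',  final: true },
];

function route(event) {
  const outputs = [];
  for (const rule of rules) {
    if (rule.filter(event)) {
      outputs.push(rule.output);
      if (rule.final) break; // stop at the first final match
    }
  }
  return outputs;
}
```

Note how reordering these rules would change the outcome: if the catch-all came first with Final set, nothing would ever reach Splunk. That is why the top-down order deserves close attention.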
OK, we’ve made it this far, so let’s tie it all together!
You can connect the Source > Pipeline > Destination using either Routes or QuickConnect. Both achieve the same result, so use whichever method you prefer. The example below uses Routes. I ensure that only the incoming Apache logs are sent through this pipeline and get bifurcated accordingly by the output router.
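A Route’s filter is just a JavaScript expression evaluated against each event. Assuming the Apache logs arrive tagged with the common Splunk sourcetype `access_combined` (an assumption here; use whatever sourcetype your forwarders actually send), the filter might look like:

```javascript
// Illustrative Route filter: match only the Apache access logs.
// 'access_combined' is an assumed sourcetype for this example.
const apacheOnly = e => e.sourcetype === 'access_combined';
```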
So what did we learn? Data engineering teams can take a single source of any data and split it across multiple destinations using an output router meta-destination. You can quickly and easily add more destinations to that output router to bifurcate data wherever you need it.
Also, we can clone a copy of that dataset and do something completely different with it while maintaining data reliability, essentially creating a new dataset on the fly and routing that version of the data to whatever destination(s) you need. Cribl Stream gives you ultimate control of your data. Send what you want, where you want, in whatever shape, format, or volume you choose. Pretty amazing, right? Welcome to Cribl Stream.
The fastest way to get started with Cribl Stream, Edge, and Search is to try the Free Cloud Sandboxes.