The process of adding new data to operations and security analytics tools is familiar to admins. New data onboarding can be a tiresome process that takes up too much time and delays getting value from the new data.
The process typically begins with the admin engaging the data source owner, getting the wrong data sample, and then having to try again. Once the correct data sample is provided, the admin estimates the impact on compute and license resources, writes the parser, tests it, and finally deploys it into production. This process is frankly a pain. My goal is to discuss why an onboarding process is necessary and how to reduce friction and make it more tolerable, even enjoyable.
Your team has now fully deployed Cribl Stream and is eager to begin solving problems. Instead of running to build code, now is the time to pause and put your processes in place to make your efforts sustainable.
Start with the log onboarding process. In a great blog post, Jon Rust wrote about how to build event breakers for new log sources in Cribl. For anyone who has had to build event breakers/parsers at the command line, his post is eye-opening in showing how easy it is to build event breakers using Cribl Stream’s UI. Even better, you can validate that events are breaking properly using the UI. For more details, see the embedded video in the blog. This is one of the most useful posts you will see.
I want to focus on why the onboarding process is so important and how to materially lower the friction of the normal process to something manageable so your engineers can spend more time on more business-critical tasks.
Cribl Stream’s default event breaker is attached to the data source, which can make it easy to miss if you’re not looking for it. Sometimes admins assume Cribl Stream is just passing events through without parsing, and that can lead to problems. Although the default event breaker is very flexible, it might not work correctly for custom events, especially application data. That is why it is critical to validate every log source against the default breaker and build a custom breaker if required. In addition, make sure you validate the timestamp and define the timezone. Nothing is as “fun” as finding a device’s events are exactly 9 and a half hours off current time because the device is in IST, you are in EDT, and someone forgot to account for the timezone offset.
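To make the timezone problem concrete, here is a minimal Python sketch of what happens when a timestamp with no offset is read in the wrong zone. It is not how Cribl Stream handles timestamps; the function names and the sample timestamp are my own illustrations, and the zones are modeled as fixed offsets (IST at UTC+5:30, US Eastern at UTC-5 in standard time and UTC-4 in daylight time).

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used for illustration only (no daylight saving rules applied).
IST = timezone(timedelta(hours=5, minutes=30), "IST")
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def parse_with_zone(raw: str, tz: timezone) -> datetime:
    """Parse a timestamp that carries no offset and attach the given zone."""
    naive = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=tz)


def skew_hours(raw: str, device_tz: timezone, assumed_tz: timezone) -> float:
    """Hours of error introduced by reading the event in assumed_tz
    when the device actually logged it in device_tz."""
    correct = parse_with_zone(raw, device_tz)
    wrong = parse_with_zone(raw, assumed_tz)
    return (wrong - correct).total_seconds() / 3600


if __name__ == "__main__":
    raw = "2024-01-15 09:00:00"  # hypothetical sample with no offset in the event
    print("IST read as EST:", skew_hours(raw, IST, EST), "hours off")
    print("IST read as EDT:", skew_hours(raw, IST, EDT), "hours off")
```

The gap is 10.5 hours against Eastern standard time and 9.5 hours against Eastern daylight time, which is why it pays to record the source's timezone explicitly during onboarding rather than rely on a default.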
Use the onboarding process to document your data as well. Documenting your data matters as you try to understand what you are logging and what it means to your business. This is a big help when you need to quantify the value of your data. Another big benefit is having these docs for when the audit/records team appears from a puff of smoke when you are really busy and do not have time to answer questions.
Cribl’s Jordan Perks has a great format for documenting each data source.
Use this form to also update your license, storage, and compute forecasts so you can manage your capacity and budget accordingly. It is so important to stay on top of your capacity forecast so you don’t get caught short.
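A rough forecast does not need much. The sketch below shows one way to turn a few numbers from a data sample into daily volume and retained storage. The field names, the sample sources, and their rates are all hypothetical; plug in whatever your own onboarding form captures.

```python
from dataclasses import dataclass


@dataclass
class SourceEstimate:
    """Hypothetical per-source numbers gathered during onboarding."""
    name: str
    events_per_second: float
    avg_event_bytes: float
    retention_days: int

    def daily_gb(self) -> float:
        # bytes/second * seconds/day, expressed in decimal gigabytes
        return self.events_per_second * self.avg_event_bytes * 86_400 / 1e9

    def retained_gb(self) -> float:
        # storage needed to hold the full retention window, before compression
        return self.daily_gb() * self.retention_days


def total_daily_gb(sources: list[SourceEstimate]) -> float:
    """Sum of daily volume across all onboarded sources."""
    return sum(s.daily_gb() for s in sources)


if __name__ == "__main__":
    sources = [
        SourceEstimate("firewall", events_per_second=500, avg_event_bytes=400, retention_days=90),
        SourceEstimate("app-logs", events_per_second=50, avg_event_bytes=1200, retention_days=30),
    ]
    for s in sources:
        print(f"{s.name}: {s.daily_gb():.2f} GB/day, {s.retained_gb():.1f} GB retained")
    print(f"total: {total_daily_gb(sources):.2f} GB/day")
```

These are raw, uncompressed figures, so treat them as an upper bound and adjust for whatever compression and licensing model your destination uses.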
Finally, onboarding your data source by source is a great way to start thinking about how you can make the data better. This includes dropping, sampling, enriching, and transforming formats. The best part is you can transform your data to make it both better and smaller. Have your cake and eat it too. Here is a great video about the Cribl Windows pack, which makes it easy to make Windows data 33% smaller without losing any fields or data. The ability to transform data from ugly to useful with minimal effort is part of what makes Cribl Stream unique.
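If you want a feel for where the savings come from, here is a small Python sketch that drops empty fields from an event, samples a stream, and measures the size difference. It is only an illustration of the idea, not how Cribl Stream or the Windows pack works; the sample event, the list of "empty" values, and the helper names are all made up for the example.

```python
import json

# Values treated as "empty" for this illustration (an assumption, adjust to your data).
EMPTY_VALUES = (None, "", "-")


def drop_empty_fields(event: dict) -> dict:
    """Return a copy of the event without fields that carry no information."""
    return {k: v for k, v in event.items() if v not in EMPTY_VALUES}


def sample(events: list, keep_one_in: int) -> list:
    """Keep every Nth event, starting with the first."""
    return [e for i, e in enumerate(events) if i % keep_one_in == 0]


def size_bytes(event: dict) -> int:
    """Size of the event when serialized as compact JSON."""
    return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))


def reduction_pct(before: int, after: int) -> float:
    """Percentage by which the size shrank."""
    return 100 * (1 - after / before)


if __name__ == "__main__":
    event = {  # hypothetical event with padding fields
        "host": "web01", "user": "alice", "action": "login",
        "proxy": "-", "referrer": "", "session_tag": None, "status": 200,
    }
    slim = drop_empty_fields(event)
    pct = reduction_pct(size_bytes(event), size_bytes(slim))
    print(f"{size_bytes(event)} -> {size_bytes(slim)} bytes ({pct:.0f}% smaller)")
    print(len(sample([event] * 10, keep_one_in=5)), "of 10 events kept by sampling")
```

Measuring on a real sample before and after is the useful habit here, since the savings depend entirely on how much padding your events carry.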
I do not recommend making these changes at this point. You want to onboard the data and get it into your systems first and then come back to make changes. Small steps will get you to value faster and give you the necessary foundation to make the right decisions. Doing too much creates delay and slows down value. Schedule the next steps in your security and observability data
Here’s another good video with examples to start optimizing your data.
A consistent data onboarding process is key to long-term success.
I am going to discuss steps 4-6 in my next couple of 100 days with Cribl blog posts. I am particularly excited to talk about how to automate log onboarding while still retaining control of your data and preventing a free-for-all.
I’d love to hear your feedback on getting started with Cribl tools. Feedback is a gift, and I want to know if something doesn’t make sense or if I’m not covering something. Connect with me on LinkedIn or join our community Slack, and let’s talk about your experience deploying Cribl Stream.