AdobeStock_585069309-2

Best Practices for Using Git in Your Cribl Workflows

Last edited: November 6, 2023

Git and Get Going

In this conversation, Sanjay Shrestha, Principal Detection Engineer at Bayer, and Raanan Dagan, Principal Sales Engineer from Cribl, talk about the integration of Git in Cribl Stream. They discuss how to manage configuration files and pipelines as code, simplifying their deployment. They also share a demo and give best practices for optimizing your GitOps workflow.

In the 10+ years that Bayer has worked with Splunk, they’ve gone from processing just 80 GB/day to more than 13 TB/day. An increased data ingestion workload after moving to Splunk Cloud has also caused a considerable surge in operational costs and DDAA (Dynamic Data Active Archive) costs.

Since introducing Cribl Stream one year ago, Bayer has reduced those operational costs by 30% and DDAA costs by 70%. They leverage Cribl-to-Cribl integration to avoid double licensing costs, and take advantage of the vendor-agnostic nature of Cribl Stream to avoid SIEM vendor lock-in — routing logs to both Splunk and Exabeam. Bayer is also leveraging the integration of Git in Cribl to manage the configurations and data pipelines in their architecture.

Cribl Architecture at Bayer

Bayer’s architecture consists of two distinct worker groups deployed across several AWS regions — one for the US and the other for the EMEA region. They are further categorized into distinct public and private groups. The private worker group ingests logs from their internal network, and the public worker group is open to third-party vendor’s audit logs.

word-image-52667-1

The diagram above also shows a third-party vendor in the US sending logs to Bayer’s Cribl instance, and illustrates the Cribl-to-Cribl integration. Logs are sent to Bayer via the data center’s Cribl Stream instance and then from Bayer’s Cribl instance on AWS to Splunk Cloud.

Cribl and GitOps Integration at Bayer

The continuous, efficient flow of Bayer’s data is made possible by the integration between Cribl and Git. The GitOps operational model encompasses methodologies like version control, CI/CD, monitoring, and infrastructure automation — Git capabilities in Cribl Stream allow for the continuous addition of pipelines, Cribl Packs, sources, and destinations.

The integration provides an environment where you can continuously evolve the development-to-production cycle of your data pipelines and leverage a remote Git repository. Adjusting the Git settings in Cribl Stream will simplify and synchronize the dev → prod cycle, and create an environment where only approved changes make it into production.

Switch the GitOps workflow for your prod environment to push, and it will become read-only — meaning that changes will only be permitted in the dev environment. They’ll get committed to a dev branch of your repository, and then changes approved by production move to a prod branch to be merged and deployed.

You can also incorporate the Git methodology into specific sources, destinations, and routes. An option in Cribl Stream allows you to designate environments as dev or prod, so everything in those environments behaves accordingly. If a source, destination, or route within an environment doesn’t match the dev or prod bucket it’s supposed to, Cribl will automatically make it inactive. You can reactivate them by simply switching the designation.

Best Practices for Cribl Stream and Git

Here are some of the strategies and techniques Bayer has used to make the most of the Git Ops methodology within Cribl Stream.

  • Create a private repository. There will be some proprietary information that you won’t want to be public.

  • Grant access to users as per-need basis. You want to keep full control over who can approve pipelines and packs to move from dev to prod.

  • Examine the git.ignore file carefully. Know which default values are considered worthy of ignoring versus adding when Cribl pushes configurations from a local to remote Git repository.

  • Use Git as a backup to restore an environment. Store critical configurations so you can revert back to a previous environment if you run into any issues.

  • Use declarative comments. Details will help with understanding configuration changes and troubleshooting problems effectively.

  • Choose branches over repositories. This provides a more efficient and cohesive development environment and enables a pull request feature.

  • Worker/Global Commit and Deploy. When you make changes and commit on a global scope, you should push changes across multiple Cribl workers.

  • Continuous Push and Commit. This contributes to a more agile and efficient development process by giving rapid feedback and improving collaboration.

Be sure to watch the whole presentation for a demo of Cribl Stream and Git in action, and to learn more about how they can be used together to simplify the deployment and management of data pipelines.

Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl’s suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.

We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

More from the blog

get started

Choose how to get started

See

Cribl

See demos by use case, by yourself or with one of our team.

Try

Cribl

Get hands-on with a Sandbox or guided Cloud Trial.

Free

Cribl

Process up to 1TB/day, no license required.