Building Efficient Pipelines in Cribl Stream

June 6, 2022

Categories: Cribl Stream, Engineering

An old colleague of mine once said to me, “It doesn’t matter how inefficiently something DOESN’T work.” This was a joke used to make a point, so it stuck with me. It also made me consider that it does matter how efficiently something DOES work. Sometimes, when tools like Cribl Stream make routing, reducing, and transforming data so easy, we can forget that there might be a more efficient way to do it. When we are getting something working, we are usually testing against a sample set of data and not a full and steady stream. It’s important to think ahead and plan for production levels of data, potential growth, seasonal increases, etc.

Having said all this, let’s get into some areas to focus on when building pipelines so that you can build something that will meet your needs and go beyond just getting it working.

Measure Twice (Or at Least Once), Cut Once

In Stream, a Pipeline is a sequence of functions that are applied to the data. The order in which your Pipeline processes data is important, so it’s helpful to plan out the processing before building the Pipeline. Just like with coding or writing a document, sure, you could just start typing, but it’s better if you have some idea of what you are trying to do before starting. This helps you determine the order in which your functions should run. For example, you would probably want to apply functions to as small a data set as possible, so it would make sense to filter out or drop unnecessary data first. A little planning never hurt anyone.
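To make the ordering point concrete, here is a minimal sketch in plain JavaScript. It is not Cribl code: keepErrors and expensiveParse are hypothetical stand-ins for a cheap filter and a costlier function, and the events are made up. It only shows that filtering first reduces how often the costly step runs.

```javascript
// Hypothetical stand-ins for pipeline functions; these are not Cribl APIs.
let expensiveCalls = 0;

// Cheap check: keep only events at the "error" level.
const keepErrors = (event) => event.level === "error";

// Stand-in for a costlier step, such as parsing the raw message.
const expensiveParse = (event) => {
  expensiveCalls += 1;
  return { ...event, words: event.message.split(" ").length };
};

// Made-up sample events.
const events = [
  { level: "info", message: "user logged in" },
  { level: "error", message: "disk is full" },
  { level: "debug", message: "cache warm" },
  { level: "error", message: "connection refused by peer" },
];

// Filter first, then parse: the costly step only sees the events we keep.
const processed = events.filter(keepErrors).map(expensiveParse);

console.log(`kept ${processed.length} events, costly step ran ${expensiveCalls} times`);
```

Reversing the order (parse, then filter) would produce the same result here, but the costly step would run once for every incoming event instead of once per kept event.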

“Pre-pare” Your Data for Processing

Sometimes there may be a case where you can prepare your data to be more efficiently processed. Stream has a type of pipeline called a pre-processing pipeline that is attached to a Source and allows you to condition / normalize the data before it is delivered to a processing pipeline. Normalized data will likely be easier to process. An example of a pre-processing pipeline can be found in the Cribl Pack for Syslog Input (cribl-syslog-input) available on the Cribl Pack Dispensary.

Less Is More…Efficient

When working with large amounts of streaming data, it’s best to get rid of any unnecessary data as early as possible. I have plastic bins in my basement, full of things that I will probably never need. This makes trying to find something very time-consuming and inefficient. The same goes for data. So, tap into your inner Marie Kondo, and get rid of any data that doesn’t “spark joy in your heart”. OK, it’s unlikely that your data is going to “spark joy in your heart”, so just get rid of data you don’t need as early as possible in your pipeline.

There are a few places to do this. Which one you choose will depend on how easily you can identify the data that can be dropped. We can look at reducing our dataset in two ways: what events to drop, and what fields to remove from the remaining events. This section is not necessarily about overall reduction of data, but about getting down to the minimal set of data to work with before applying other functions that might require a bit more processing power.

Routes

After the pre-processing pipelines, your next line of defense against processing unnecessary data would be Routes. Here you can create a filter that will only send the set of data you need to the pipeline. You can filter on any fields that exist at the point the data comes into Stream. For example, if your source is a Splunk Universal Forwarder, you probably have sourcetype available to you. If you are using a pre-processing pipeline, you may have additional fields available for filtering.
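As a rough illustration, a Route filter is a JavaScript expression evaluated against each event, along the lines of sourcetype=='access_combined'. The sketch below simulates that idea in plain JavaScript; the routeFilter function, the sourcetype values, and the sample events are all assumptions made for the example.

```javascript
// Simulated Route filter. In Stream you would type only the expression;
// wrapping it in a function here lets us run it outside of Stream.
// The sourcetype value is a made-up example.
const routeFilter = (event) => event.sourcetype === "access_combined";

// Made-up incoming events, as they might look when forwarded from Splunk.
const incoming = [
  { sourcetype: "access_combined", _raw: "GET /index.html 200" },
  { sourcetype: "syslog", _raw: "kernel: link up" },
  { sourcetype: "access_combined", _raw: "GET /missing 404" },
];

// Only events matching the filter would be handed to the pipeline.
const sentToPipeline = incoming.filter(routeFilter);

console.log(`${sentToPipeline.length} of ${incoming.length} events reach the pipeline`);
```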

Pipeline – Eval Function

The Eval function can be used to provide further filtering if necessary. Filtering at both the Route and the Pipeline (in an Eval) limits the overall set of data. Eval also lets you list fields to keep or fields to remove, whichever list is shorter. This is helpful in controlling the size of events.

Pipeline – Parser Function

The Parser function can be used to extract fields from your events. Also, just like the Eval function, it lets you list fields to keep or fields to remove. The Parser function also allows for a Fields Filter Expression. This can be used to drop fields when they are equal to a specified value, such as null.
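The idea of dropping fields by value can be sketched in plain JavaScript. This is only a simulation: it assumes a per-field check that sees each field’s name and value, and the keepField and dropFields helpers, as well as the sample event, are hypothetical. Check the Parser function documentation for the exact variables the real Fields Filter Expression exposes.

```javascript
// Hypothetical per-field check: keep a field unless its value is null
// or the literal string "null".
const keepField = (name, value) => value !== null && value !== "null";

// Hypothetical helper that applies the check to every field of an event.
const dropFields = (event, keep) =>
  Object.fromEntries(
    Object.entries(event).filter(([name, value]) => keep(name, value))
  );

// Made-up event with two empty fields after extraction.
const parsed = { host: "web01", user: null, status: "200", referer: "null" };

const trimmed = dropFields(parsed, keepField);

console.log(Object.keys(trimmed));
```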

Pipeline – Drop Function

The Drop Function allows you to drop more events. This can also be done after you have performed field extractions, allowing you more specificity in what gets dropped.

Working Smart, Not Hard

When adding functions to a pipeline, it’s best to use as few functions as possible to accomplish your goals. The more functions you use in your pipeline, the more invocations of code will happen. More functions also mean more time and resources will be required to process events.

This goes back to planning out your pipeline as I mentioned earlier. If you can have a single function do multiple things, you will be better off. For example, if you have to rename several fields, it’s possible to do this in one invocation of the Rename function rather than one invocation per field.
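Here is a small sketch of the “one invocation, many renames” idea in plain JavaScript. It does not show the Rename function’s actual configuration; the renames map, the renameFields helper, and the field names are all made up for illustration.

```javascript
// Hypothetical old-to-new field name mapping.
const renames = new Map([
  ["src", "src_ip"],
  ["dst", "dest_ip"],
  ["usr", "user"],
]);

// Apply every rename in a single pass over the event's fields.
// Fields that are not in the mapping keep their original names.
const renameFields = (event, mapping) =>
  Object.fromEntries(
    Object.entries(event).map(([name, value]) => [mapping.get(name) ?? name, value])
  );

// Made-up event.
const original = { src: "10.0.0.1", dst: "10.0.0.2", usr: "alice", bytes: 512 };

const renamed = renameFields(original, renames);

console.log(renamed);
```

The event is walked once no matter how many renames are listed, which is the point of grouping them.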

Don’t Be Greedy

Let’s talk about regular expressions for a bit. They are important when it comes to building efficient pipelines. A good understanding of some of the things that make a regex inefficient can make all the difference in your pipelines. Think about the fact that a regex is going to be applied to every event that a pipeline must process. Even if we are talking about sub-second differences, you need to multiply that time difference by the number of events being processed.

This is not going to be a tutorial on how to write efficient regex. There are plenty of sources for that out there. Let’s call out a few things that can help improve regex performance.

If you are familiar with regex, you have probably heard the term “greedy”. Remember earlier when I talked about just getting things working vs. making things as efficient as possible? Well, we are all probably guilty of this. It’s easy and quick to write greedy regular expressions. But understanding a little bit about how they work will make you want to find a better way. Consider the following text:

Field_1: 1024 Field_2: 5.1 Field_3: 256 Field_4: .5

and the following regular expression:

.* Field_2: (\d+\.\d+) .*

This is a greedy regex. It’s greedy because it uses .*, which will consume everything up to the end of the string, then it will backtrack until it finds Field_2.

An alternative would be to use a lazy regex. I never thought I would say it’s better to be lazy than greedy. Both options sound like bad character traits to me, but in the case of regex, lazy is better. Here is the lazy version of the regex above:

.*? Field_2: (\d+\.\d+) .*

In this version, the leading .*? starts at the beginning of the string and consumes one character at a time until it reaches Field_2. The rest of the pattern then continues to match the remainder of the string.
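If you want to try the two expressions side by side, the sketch below runs both against the sample text in plain JavaScript. Both capture the same value; the difference is in how much work the engine does to get there. The timeMs helper and the run count are assumptions for a rough comparison only, and results will vary with your data and runtime.

```javascript
const text = "Field_1: 1024 Field_2: 5.1 Field_3: 256 Field_4: .5";

// The greedy and lazy expressions from the examples above.
const greedy = /.* Field_2: (\d+\.\d+) .*/;
const lazy = /.*? Field_2: (\d+\.\d+) .*/;

// Both expressions capture the same value from the sample text.
const greedyValue = greedy.exec(text)[1];
const lazyValue = lazy.exec(text)[1];
console.log(greedyValue, lazyValue);

// Hypothetical helper for a rough timing comparison over many runs.
const timeMs = (regex, runs) => {
  const start = Date.now();
  for (let i = 0; i < runs; i += 1) {
    regex.exec(text);
  }
  return Date.now() - start;
};

console.log("greedy ms:", timeMs(greedy, 100000));
console.log("lazy ms:", timeMs(lazy, 100000));
```

On a string this short the gap may be tiny. It tends to matter more on long events where the field you want sits near the beginning.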

Could You Be More Specific

Another way to make your regex more efficient is to be a little more specific. One way to do this is to use character classes instead of our greedy friend “.*”. A character class lets you specify what characters to look for or NOT look for. Some examples would be:

[abc] – This will accept any one of these characters in the square brackets.
[0-9] – This will accept any one digit in the range 0 to 9.
[^abc] – This will accept one character not in this list.
[0-9]+ – This will accept one or more consecutive digits in the range 0-9.
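As a sketch of how character classes can replace “.*”, the expressions below pull values out of the earlier sample text without any wildcard at all. The pattern names are mine, and the patterns are only one possible way to write them.

```javascript
const text = "Field_1: 1024 Field_2: 5.1 Field_3: 256 Field_4: .5";

// Digits, a literal dot, then digits: no ".*" needed on either side.
const field2Pattern = /Field_2: ([0-9]+\.[0-9]+)/;

// A negated class: capture everything up to the next space.
const field3Pattern = /Field_3: ([^ ]+)/;

const field2 = field2Pattern.exec(text)[1];
const field3 = field3Pattern.exec(text)[1];

console.log(field2, field3);
```

Because each pattern says exactly which characters are allowed, the engine can stop as soon as it sees one that does not fit, instead of consuming the whole string and backtracking.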

Anchors Away

Let’s end with one more quick tip. Using anchors like ^ and $ allows you to indicate where the cursor should be within a string. ^ indicates the beginning of a line and $ indicates the end of a line. For example, I can use the following regex to match strings that begin with an IP address:

^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}

would match:

192.168.1.10 10/22/2022 11:57:13 blah blah blah blah

And would NOT match:

10/22/2022 11:57:13 192.168.1.10 blah blah blah blah
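The two cases above can be checked with a few lines of plain JavaScript. The unanchored variant is included only for contrast, and the variable names are my own.

```javascript
// Anchored: the IP address must appear at the very start of the string.
const startsWithIp = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;

// Unanchored, for contrast: the IP address may appear anywhere.
const containsIp = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/;

const ipFirst = "192.168.1.10 10/22/2022 11:57:13 blah blah blah blah";
const dateFirst = "10/22/2022 11:57:13 192.168.1.10 blah blah blah blah";

const anchoredIpFirst = startsWithIp.test(ipFirst);
const anchoredDateFirst = startsWithIp.test(dateFirst);
const unanchoredDateFirst = containsIp.test(dateFirst);

console.log(anchoredIpFirst, anchoredDateFirst, unanchoredDateFirst);
```

With the anchor, the engine can give up on the second string after checking only its first position, rather than retrying the pattern at every position.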

Time Will Tell

I just spent this entire blog telling you about how to make your pipelines more efficient. It’s only fair that I provide a way to measure how long it takes your pipeline to process events. A fairly easy way to get a rough idea of pipeline performance is to add an Eval function at the beginning of your pipeline and set __starttime to the current time. We prefix our field name with __ because this indicates an internal field, which won’t get passed on to the destination.

This will allow us to calculate an elapsed time at the end of the pipeline. We do this by adding an Eval as the last function in your pipeline that subtracts __starttime from the current time.
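In the two Eval functions, the value expressions could be something like Date.now() for __starttime and Date.now() - __starttime for the elapsed time. The sketch below simulates that flow in plain JavaScript. The markStart, markElapsed, and stripInternal helpers are hypothetical stand-ins, and elapsed_ms is a field name I made up for the example.

```javascript
// Stand-in for the first Eval: record the start time in an internal field.
const markStart = (event) => ({ ...event, __starttime: Date.now() });

// Stand-in for the functions in between that do the real work.
const doWork = (event) => ({ ...event, message: event.message.toUpperCase() });

// Stand-in for the last Eval: subtract the start time from the current time.
// elapsed_ms is a hypothetical field name.
const markElapsed = (event) => ({
  ...event,
  elapsed_ms: Date.now() - event.__starttime,
});

// Simulates fields prefixed with __ not being passed on to the destination.
const stripInternal = (event) =>
  Object.fromEntries(
    Object.entries(event).filter(([name]) => !name.startsWith("__"))
  );

const delivered = stripInternal(markElapsed(doWork(markStart({ message: "hello" }))));

console.log(delivered);
```

Note that elapsed_ms has no __ prefix, so in this sketch it is delivered with the event. Remove the timing functions once you are done measuring.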

Getting Started

So, if all of this sounds great, but you don’t have a way to make use of this information, we have a few ways for you to get started with Cribl Stream. Try it out yourself at no cost.

Also, check out the following related blog: High Performance Javascript in Stream

The fastest way to get started with Cribl Stream and Cribl Edge is to try the Free Cloud Sandboxes.
