Better Practices for Getting Data in from Splunk Universal Forwarders

December 19, 2023

Categories: Cribl Stream, Engineering

Why Do My Universal Forwarders (UFs) Need Tuning to Work With Cribl Stream?

While tuning isn’t strictly required, Cribl Support frequently encounters users who are having trouble getting data into Stream from Splunk forwarders. More often than not, this is a performance issue that results in the forwarders getting blocked by Stream. When they encounter this situation, customers often ask: How do I get data into Stream from my Splunk forwarders as efficiently as possible? The answer is proper tuning!

There are many settings to consider on the UF side when it comes to tuning. I’ll discuss each of them, and how they impact Stream, later. But you might be wondering why you need to change anything on your UFs when the forwarders can successfully send the same data set to your Splunk indexers without tuning.

The short answer is that Stream and Splunk are architected differently.

Stream uses many lightweight processes that can’t do any work unless it is handed to them over network sockets, because the processes don’t talk to each other. The work that Stream does with events is also quite different from what indexer processes do, so it’s effectively an apples-to-oranges comparison. Stream sends events to indexers as cooked but not compressed. Stream must do significantly more processing on events than indexers do. The default settings for Splunk forwarders work well for most Splunk environments (as you might expect, given they are the default values). However, because they’re built for Splunk, they won’t necessarily apply as-is for optimal transmission to Stream or for fully utilizing Stream resources. The goal of this article is to teach you how to fix that!

Note that while this article is focused on how to tune Splunk Universal Forwarders, the sizing calculations carry over to any input. In fact, I wrote a similar article for Microsoft Event Hubs because that service is such a beast to deal with! The sizing calculations for this article only come into play when calculating the value for the parallelIngestionPipelines setting. A future article will cover the same topic from a Heavy Forwarder (HF) perspective.

Which Knobs Do I Turn?

Your goal is to tune your Splunk forwarders to maximize throughput without crushing your individual worker processes with too much data. To that end, here is the list of settings to tune in your Splunk forwarders:

autoLBFrequency (outputs.conf) – Use this setting to help prevent excessive TCP pinning (as Cribl calls it), which you may know as sticky sessions. The default value is 30 seconds, and you want to ensure your setting is no higher than the default. While there is no magic number here, the higher the setting, the worse the performance will be. Why? Because high-volume connections can saturate a worker process if the process receives too much data over too long a period. “Too long” depends on your events and your Stream configuration. You should also know that the usefulness of this setting depends on another aspect: event breaking.
Event breaking (props.conf) – Event breaking converts your data stream to discrete events. Forwarders won’t switch to another output server if they’re sending unbroken events because they wait for an event boundary (or EOF) to close a TCP connection. Otherwise, you would get truncated events. So, if the forwarders don’t know where the event boundary is, they won’t close the connection, which renders the AutoLBFrequency setting irrelevant.

Note: Technically, the forceTimebasedAutoLB setting can be used in place of event boundaries or an EOF to switch output servers. However, this introduces a risk of truncated events. Because we have seen truncated events when customers use this setting, we explicitly note in our Splunk TCP docs that it should be turned off when sending to Stream. Splunk indexers can mitigate the truncation, but Stream does not have the same mechanism. Because the integrity of your events is important, use this setting at your own risk. Incidentally, with Splunk v6.5+, Splunk itself recommends using Event Breakers rather than this setting.

maxKBps (limits.conf) – Use this setting to adjust the forwarder throughput. While it defaults to 256, you don’t need to limit the throughput here as long as you have optimized all other settings. Go ahead and set this to 0. Because the setting is applied to each forwarder’s ingestion pipeline, not the tcpoutput processor, it doesn’t affect how fast data is sent to Stream but rather how fast the forwarder ingests the data.
pipelineSetSelectionPolicy (server.conf) – Use this setting to ensure optimal distribution of events across Splunk ingestion pipelines and, thus, across outgoing TCP connections. The default setting is round_robin, but we’ve seen first-hand that pipeline usage can become quite unbalanced using the default algorithm. Changing this setting to weighted_random yields better results.
parallelIngestionPipelines (server.conf) – This setting defaults to 1. We often ask users to modify this setting, and they usually ask why. Why do they ask? Well, Splunk documentation states that this setting should only be increased if Splunk Professional Services recommends it. However, in this case, your Cribl goats recommend it. Why? Doing so will help leverage more of the worker processes in your Stream environment. Each pipeline handles data from ingress to egress in a forwarder, so the number of pipelines dictates how many outbound TCP connections are used. The more connections over which data can be sent to Stream, the higher the throughput you’ll achieve in Stream, because more worker processes are employed. That said, there are some caveats:
- First, the benefits of this setting are heavily dependent on the quantity of inputs. Because each pipeline handles both ingress and egress, additional pipelines won’t be used by a given forwarder if there aren’t enough inputs configured to require them. For example, if you have only one input stanza in your inputs.conf, using a value of two for this setting won’t leverage the extra pipeline.

Many Cribl customers have syslog devices sending events to syslog-ng running on the same host as a UF. Syslog-ng is writing events to files, and a UF is monitoring these files. High-volume firewalls can generate multiple terabytes of daily data written to multiple files in time order. However, each file set is going to be treated as a single input by a UF, so a UF will only use one connection to send each file set’s data. This can quickly saturate a worker process for high-volume sources. Your best solution for this problem is to eliminate the UF from the path, at least for the high-volume syslog sources, so they can send their data directly to Stream (via a load balancer). An alternative solution is to ensure that the forwarder’s TCP connections to Stream are shut down frequently so that they are sent to a different Stream process when they reopen—event breakers aid in allowing these connections to rotate.

- Second, each additional pipeline requires additional CPU resources. If your forwarder host doesn’t have the CPU resources to add more pipelines, there is a risk that there will be too few connections to distribute the data volume. This increases the chances that the Stream processes will be overloaded. In this case, you may need to add more UFs or consider eliminating them altogether, as mentioned in the syslog example.

Astute readers will notice I didn’t mention a recommended value for the quantity of ingestion pipelines. That is indeed necessary information. But, before we dive into that, we need to take a detour to understand how we measure throughput in Stream. If you know our sizing information, skip the next section and proceed to Pipeline Engineering.

Throughput Under a Microscope

As documented in Cribl’s sizing documentation, there is a limit to the data volume that can be processed per worker process per unit of time. The exact real-world limits vary case by case, so we’ve created general guidelines to help ensure your worker processes aren’t pushed beyond their limits. The value of 400 GB/day (x64) in Cribl documentation is based on minimal data processing with a single destination. A single destination implies ingress and egress are split evenly at 200 GB/day each. The more copies on egress, the lower the throughput of each TCP stream, including the ingress stream, to keep the total around 400 GB/day. The 400 GB/day recommendation becomes slightly higher when using Graviton (ARM) processors, but to keep this discussion simple, I leave the math for Graviton (detailed below for x64) up to you. To simplify this analysis, take a look at the throughput introspection Splunk dashboard provided in our Splunk app hosted on GitHub. It provides throughput information per worker process as well as per input and output.

Although our sizing guidance is based on a per-day volume, it is imperative to focus on volumes at smaller time scales. For example, 400 GB/day per worker process is the daily limit, but that does not mean 400 GB can be received all within, say, one hour, followed by nothing for the other 23 hours. You must consider physical limitations because processing isn’t free. So, in reality, you need to be mindful of the per-second threshold to best ensure your processes aren’t overloaded at any given moment of the day. Of course, you can’t plan for every spike in traffic, but doing some level of planning helps mitigate the risk of processes displaying unexpected behavior because they are overwhelmed. In our example, 400 GB/day translates to 4.74 MB/s in total: 2.37 MB/s on ingress and 2.37 MB/s on egress. These are the numbers we’ll be referencing below.
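To make the unit conversion concrete, here is a minimal sketch of the arithmetic. It assumes binary units (1 GB = 1024 MB), which is an assumption on my part that reproduces the 4.74 MB/s figure above; the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
MB_PER_GB = 1024  # assumption: binary units, chosen because they reproduce 4.74 MB/s


def gb_per_day_to_mb_per_s(gb_per_day: float) -> float:
    """Convert a daily volume in GB into an average rate in MB/s."""
    return gb_per_day * MB_PER_GB / SECONDS_PER_DAY


total = gb_per_day_to_mb_per_s(400)
print(f"total per worker process: {total:.2f} MB/s")      # 4.74
print(f"ingress (single destination): {total / 2:.2f} MB/s")  # 2.37
print(f"egress (single destination): {total / 2:.2f} MB/s")   # 2.37

# The same 400 GB squeezed into one hour is a very different rate.
one_hour_burst = 400 * MB_PER_GB / 3600
print(f"400 GB in one hour: {one_hour_burst:.2f} MB/s")   # 113.78
```

The last line shows why the per-second view matters: a worker process sized for an average of 4.74 MB/s would be asked to absorb roughly 24 times that rate.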

Continuing with our example, we are striving to ensure the total throughput of a given worker process at any time does not exceed 4.74 MB/s, or, if it does, then we must strive not to let it exceed that threshold for too long. Remember, the 400 GB/day is a guideline rather than a hard limit. No processing at all (as you see with a default passthru Pipeline that has 0 functions in it) allows for a higher throughput ceiling. There is also the possibility that, in some environments, the processes won’t reach 400 GB/day, let alone exceed it. It just depends on what’s in your event processing configuration.

So, our throughput (in and out) rate is 4.74 MB/s. This is the aggregate ingress rate for all network sockets (TCP and UDP) from which a process is receiving data, plus the aggregate rate of all egress network sockets. In the simplest scenario, a process receives data over just one connection. That isn’t exactly realistic, but we’re going to simplify to make calculations easier in this exercise. We’ll assume one connection will constitute the entire ingress rate.

The other side of throughput is the egress rate. Users typically have at least some, if not all, events sent to multiple destinations. This type of configuration requires additional CPU and networking resources because each copy of data sent out to one or more destinations must be put on the wire separately in its own data stream along with the related overhead. As a result, although the overall throughput stays at 400 GB/day, the ratio of ingress vs egress traffic must be adjusted in our calculations. If we account for three outbound copies (and to simplify further, we’ll assume three full copies with event size on ingress equivalent to that of egress), that now drops the 200 GB ingress down to 100 GB/day so that we can have 300 GB/day on egress. This helps us stay within the 400 GB/day.
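The split between ingress and egress can be sketched as follows. This is only an illustration under the simplifications already stated (full copies, same event size on ingress and egress); the function name is hypothetical.

```python
def split_budget(total_gb_per_day: float, egress_copies: int) -> tuple[float, float]:
    """Split a worker process's daily budget into (ingress, egress) volumes.

    Assumes one ingress stream and `egress_copies` full copies on egress,
    each copy the same size as the ingress stream.
    """
    per_stream = total_gb_per_day / (1 + egress_copies)
    return per_stream, per_stream * egress_copies


print(split_budget(400, 1))  # (200.0, 200.0): one destination
print(split_budget(400, 3))  # (100.0, 300.0): three destinations
```

Every extra full copy on egress shrinks the share left for ingress, which is why the ingress figure falls from 200 GB/day to 100 GB/day in the three-copy case.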

Pipeline Engineering

Now that we have discussed how Stream throughput is calculated, we can return to the discussion of determining how many Splunk ingestion pipelines you need. Estimating this requires knowing the total data volume generated by a given forwarder combined with the 4.74 MB/s (derived from the 400 GB/day/process number used previously), and the number of separate data streams (that is, copies of events) when factoring in both ingress and egress.

In other words:

1. max_ingress_rate = 4.74 MB/s / number_of_tcp_streams
2. number_of_parallel_pipelines = forwarder’s_average_rate (its total daily volume converted to MB/s) / max_ingress_rate

The max throughput of each event stream copy is calculated with equation 1. To simplify all these calculations, our first assumption is that the entire ingress data volume is also being sent on egress. This is in contrast to a subset of those events being sent to some destinations while a full set is being sent to others. That mixed scenario is beyond the scope of this discussion. As a result of that assumption, the max throughput of the ingress stream can be considered the same as that of each egress stream. This greatly simplifies our calculations.

Secondly, these calculations assume the event size of each egress stream is the same as on ingress. Keep in mind that JSON Unroll, enrichment, reduction, and so forth can make egress volume differ significantly from ingress volume.

So, here is an example: one copy of data on ingress and three full copies on egress, for a total of four full copies.

1.185 MB/s per stream = 4.74 MB/s / 4.

Now, let’s assume a UF is sending 800 GB/day among many different inputs (to fully leverage multiple ingestion pipelines). This translates into an average of 9.48 MB/s.

8 pipelines = 9.48 MB/s / 1.185 MB/s.
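The two equations and the worked example can be combined into a small helper. This is a sketch, not an official Cribl or Splunk calculator: the function names are made up for illustration, binary units (1 GB = 1024 MB) are assumed, and rounding up to a whole pipeline is my assumption for cases where the division is not exact.

```python
import math

WORKER_BUDGET_GB_PER_DAY = 400  # x64 guideline quoted in this article
SECONDS_PER_DAY = 86_400
MB_PER_GB = 1024  # assumption: binary units


def to_mb_per_s(gb_per_day: float) -> float:
    """Average rate in MB/s for a given daily volume in GB."""
    return gb_per_day * MB_PER_GB / SECONDS_PER_DAY


def min_parallel_pipelines(
    forwarder_gb_per_day: float,
    tcp_streams: int,
    worker_budget_gb_per_day: float = WORKER_BUDGET_GB_PER_DAY,
) -> int:
    """Minimum ingestion pipelines for one forwarder (equations 1 and 2).

    `tcp_streams` counts the ingress stream plus every full egress copy.
    The GB/day to MB/s conversion cancels out of the ratio, so the daily
    volumes are divided directly; this avoids floating-point noise that
    could push an exact result such as 8.0 up to 9 when rounding up.
    """
    per_stream_budget = worker_budget_gb_per_day / tcp_streams  # equation 1
    return math.ceil(forwarder_gb_per_day / per_stream_budget)  # equation 2


streams = 1 + 3  # one ingress stream plus three full egress copies
print(f"per-stream budget: {to_mb_per_s(WORKER_BUDGET_GB_PER_DAY / streams):.3f} MB/s")  # 1.185
print(f"forwarder average: {to_mb_per_s(800):.2f} MB/s")  # 9.48
print(f"minimum pipelines: {min_parallel_pipelines(800, streams)}")  # 8
```

A forwarder group sending 500 GB/day with a single destination (two streams) would come out at 3 pipelines with the same helper, since 2.5 is rounded up.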

What does 8 pipelines mean? What we’ve calculated is the minimum number of ingestion pipelines that should be used on this particular UF to help reduce (but maybe not eliminate) the likelihood the Stream processes receiving data from this UF will be overwhelmed when accounting for all ingress and egress data streams related to the UF’s data. These calculations can’t guarantee anything because this is only an estimate. In reality, the pipelines aren’t used perfectly evenly within a UF (the aforementioned weighted_random will help address that). Another reason that there is no guarantee of zero processing lag is that we’re using the average of a forwarder’s daily volume. When a traffic spike occurs, there still could be some short-lived processing lag from Stream processes.

We’re also simplifying the configuration by assuming these worker processes are not processing events from any other sources. Remember that your UFs will process differing amounts of events because they collect different data sets; therefore, you’ll need to do this calculation for each forwarder group.

If your configuration involves sending subsets of the ingress event stream to one or more destinations, or your events on egress are larger or smaller than on ingress, these calculations will need to be modified accordingly. Ultimately, though, it’s difficult, nigh impossible, to account for all variables, so use this information as guidance for items to consider, and be prepared to encounter scenarios not yet accounted for and, therefore, future tuning.
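One possible way to adjust the math for partial or resized egress copies is to weight each egress stream by its volume relative to ingress. The sketch below is my own extension of the model used in this article, not something taken from Cribl documentation, and the function name and ratios are hypothetical.

```python
def max_ingress_gb_per_day(
    egress_ratios: list[float],
    worker_budget_gb_per_day: float = 400.0,
) -> float:
    """Ingress volume one worker process can accept within its daily budget.

    Each entry in `egress_ratios` is one destination's egress volume divided
    by the ingress volume: 1.0 for a full copy, 0.5 for half of the events
    (or events reduced to half their size), 2.0 for events that double in
    size after enrichment.
    """
    weighted_streams = 1.0 + sum(egress_ratios)  # the ingress stream counts as 1
    return worker_budget_gb_per_day / weighted_streams


print(max_ingress_gb_per_day([1.0, 1.0, 1.0]))  # 100.0, the three-full-copy case
print(max_ingress_gb_per_day([1.0, 0.5]))       # 160.0, one full copy plus a half-size copy
```

Dividing a forwarder's daily volume by this result, and rounding up, would then give a pipeline estimate for the mixed scenario, subject to the same caveats about averages and uneven pipeline usage.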

Conclusion

As many of you with Splunk experience know, Splunk forwarders are complicated beasts, but the same settings that make them complicated also make them as flexible as possible. Many users may not need to leverage those settings in a homogeneous Splunk environment, but these settings are heavily relied upon for optimizing performance with Cribl Stream. We encourage you to review your Splunk forwarder configurations if you experience slower performance when sending to Stream than when sending to Splunk indexers. Nine times out of 10, the solution requires only simple tuning of the forwarder configurations to interoperate with Cribl Stream.

Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl’s suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.

We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Written by

Brandon McCombs
