
Optimizing Throughput: Overcoming Syslog TCP Pinning with Cribl’s Load Balancing

June 24, 2024

In modern network systems, managing data flow efficiently is critical, especially when dealing with high volumes of log data. One common challenge for IT teams is the bottleneck caused by Syslog TCP pinning, where a limited number of persistent TCP connections lead to throughput inefficiencies. This blog explores the concept of TCP pinning in depth, discussing its implications on network performance and detailing strategies to alleviate these bottlenecks through innovative load balancing techniques. By integrating real-world examples and advanced solutions, we aim to provide a comprehensive guide to optimizing TCP connections for better resource utilization and throughput scaling.

Load Balancing and TCP Pinning

Whilst Cribl enables scaling data processing workloads across multiple Worker Nodes and Processes, network connections are typically load balanced across Worker Nodes and not the more granular Worker Processes running on said Nodes (have we said Worker enough?).

This means that when admins run Cribl software on huge, powerful compute systems, some of their data streams may be pinned to a single CPU core while the rest of the performance is left on the table. As Bruce Almighty said, “All this horsepower and no room to gallop.”
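To picture the pinning effect, here is a minimal Python sketch (not Cribl code) that models each TCP connection as stuck to one Worker Process for its whole lifetime. The function name `pinned_load`, the random assignment, and the sender rates are all assumptions made up for illustration.

```python
import random

def pinned_load(connection_rates, num_processes, seed=0):
    """Hypothetical model: every connection lands on one process and stays there.

    Returns the events/sec each process ends up handling.
    """
    rng = random.Random(seed)
    load = [0] * num_processes
    for rate in connection_rates:
        # The whole connection, and all of its traffic, goes to one process.
        load[rng.randrange(num_processes)] += rate
    return load

# Two busy syslog senders, eight Worker Processes (made-up numbers).
load = pinned_load([50_000, 40_000], num_processes=8)
busy = sum(1 for events in load if events > 0)
print(f"per-process load: {load}")
print(f"{busy} of {len(load)} processes are doing any work")
```

However the two connections land, at most two of the eight processes can ever be busy in this model. That is the horsepower with no room to gallop.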

Scaling Throughput with Better Resource Utilization

Fear not, lone (data) wanderer, there are solutions to all problems, ye only need ask! In this instance, your friends will be: utilizing multiple short-lived connections and/or enabling TCP load balancing.

Multiple Short-Lived Connections

When a network connection is established, its data is pinned to a single Worker Process. More connections mean more chances for other Worker Processes to take some of the load. Therefore, option one for your CPU under-utilization woes is this: use short-lived connections so that traffic continually rotates across your Worker Processes (CPU cores).
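As a rough sketch of what connection rotation could look like on the sending side, the Python below closes and reopens its TCP connection after a fixed number of messages. It is an illustration only: the function name `send_with_rotation`, the `max_per_conn` parameter, and the newline framing are assumptions, and real syslog senders expose their own settings for this.

```python
import socket

def send_with_rotation(messages, address, max_per_conn=100,
                       connect=socket.create_connection):
    """Send newline-delimited messages, opening a fresh TCP connection
    after every `max_per_conn` messages (hypothetical helper).

    Returns the number of connections opened.
    """
    opened = 0
    sent_on_conn = 0
    conn = None
    try:
        for msg in messages:
            if conn is None:
                conn = connect(address)  # new connection, new chance to land elsewhere
                opened += 1
                sent_on_conn = 0
            conn.sendall(msg.encode("utf-8") + b"\n")
            sent_on_conn += 1
            if sent_on_conn >= max_per_conn:
                conn.close()  # forces the next message onto a new connection
                conn = None
    finally:
        if conn is not None:
            conn.close()
    return opened

# Example call (host and port are placeholders, not real endpoints):
# send_with_rotation(log_lines, ("stream.example.internal", 9514), max_per_conn=500)
```

The `connect` parameter exists only so the helper can be exercised without a network. Note that every rotation pays for a new TCP handshake.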

Wait a little bit, though. There are some challenges associated with this approach:

  1. It’s hard to control the number of connections across myriad clients.
  2. Some clients may be busier than others. (Like how that one friend never answers your texts. Just me? Ok…)
  3. Ain’t nobody got time to _configure_ all those clients.
  4. Opening more connections, and re-establishing them over and over, adds overhead.

Enabling TCP Load Balancing

A challenger appears: load balancing traffic, not connections, across all Worker Processes!

This new feature (released in version 4.6.0) allows the load balancing of Syslog TCP traffic across all Worker Processes. When enabled, a single load balancer Worker Process is spawned to, well, process incoming connections. Just what the admin ordered.

Let’s dig a little deeper.

New Load Balancing Process Responsibilities

  • Handle connections.
  • Perform lightweight deserialization of the data.
    • For example, split chunks of events by newline. Other formats, such as octet-count-based frames, are also supported.
  • Round-robin chunks of data to Worker Processes for further processing.
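The responsibilities above can be sketched in a few lines of Python. This is not Cribl's implementation, just a minimal model under stated assumptions: `split_frames` and `round_robin` are hypothetical names, octet-counted frames are assumed to look like `<length> <message>` (the style described in RFC 6587), and malformed input is not handled.

```python
from itertools import cycle

def split_frames(buffer):
    """Split a receive buffer into complete syslog frames.

    Handles newline-delimited frames and octet-counted frames.
    Returns (frames, leftover_bytes); leftover waits for more data.
    """
    frames, pos = [], 0
    while pos < len(buffer):
        if buffer[pos:pos + 1].isdigit():
            # Octet counting: decimal length, one space, then that many bytes.
            space = buffer.find(b" ", pos)
            if space == -1:
                break  # length prefix not complete yet
            end = space + 1 + int(buffer[pos:space])  # raises on a malformed prefix
            if end > len(buffer):
                break  # message body not complete yet
            frames.append(buffer[space + 1:end])
            pos = end
        else:
            newline = buffer.find(b"\n", pos)
            if newline == -1:
                break  # no terminator yet
            if newline > pos:  # skip empty lines
                frames.append(buffer[pos:newline])
            pos = newline + 1
    return frames, buffer[pos:]

def round_robin(frames, num_workers, chunk_size=1):
    """Group frames into chunks and deal the chunks to workers in turn."""
    queues = [[] for _ in range(num_workers)]
    workers = cycle(range(num_workers))
    for start in range(0, len(frames), chunk_size):
        queues[next(workers)].append(frames[start:start + chunk_size])
    return queues

data = b"<13>a\n<13>b\n16 <13>octet framed<13>partial"
frames, leftover = split_frames(data)
queues = round_robin(frames, num_workers=2)
print(frames)
print(leftover)
print(queues)
```

The point of the design is that the splitter only looks for frame boundaries and never parses the message itself, which keeps the load balancer cheap and leaves the heavy lifting to the Worker Processes.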

Benefits? Yes, Please!

We can see the benefits of this approach in two main ways:

  • Distributing the workload evenly across all Worker Processes yields 100% resource utilization.
  • Throughput now scales linearly with the number of Worker Processes.

Limitations (For Now)

Currently, TCP Load Balancing is only supported for Syslog TCP. Syslog needs only lightweight deserialization, which allows the load balancer process to delegate the majority of the data processing workload to Worker Processes.

Also, we’ve been in the data engine business for a while now, so we acknowledge that a single load balancer process can be a bottleneck and point of failure. Don’t worry, it won’t be for long.
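A back-of-the-envelope way to think about that caveat: throughput grows with the number of Worker Processes until the single load balancer process becomes the limit. The tiny Python model below is an assumption for illustration, not a measurement. The function name and all rates are made up.

```python
def estimated_throughput(num_workers, per_worker_eps, lb_capacity_eps):
    """Hypothetical model: linear scaling, capped by the load balancer process."""
    return min(num_workers * per_worker_eps, lb_capacity_eps)

# Made-up rates in events per second, purely to show the shape of the curve.
for n in (1, 2, 4, 8, 16):
    eps = estimated_throughput(n, per_worker_eps=20_000, lb_capacity_eps=200_000)
    print(f"{n:>2} workers -> {eps:,} events/sec")
```

In this toy model the curve is a straight line up to the cap and flat afterwards, which is why a single load balancer process matters once the worker count gets large.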

Back It Up, With Pictures!

You can read all day (please do; our blogs and docs are fun), but sometimes you need to see and believe. The following is a sample demonstration of a data processing workload scaling linearly with the number of Worker Processes when TCP load balancing is enabled.

Our technical champion at a Fortune 50 Financial Services Organization says TCP Load Balancing has made the team’s life so much easier, and they went from a lag of 12 hours to almost zero for their security logs! And that’s no small amount of logs: 230,000 per second! TCP Load Balancing has already greatly impacted the team’s ability to quickly and confidently secure the business, reduce risk, and accelerate response times.

“9 am Wednesday, when we enabled load balancing on Syslog, we immediately saw the benefit…very happy about this…I want you to know this made a real impact… the team is no longer on my back.”

“We’re getting 80% reduction on Zscaler logs with Cribl after solving the TCP pinning with Elastic Filebeat for load balancing and optimizing the pipelines.”

This could be you!

Wrap Up

At Cribl, we’re customers first, always. We heard your feedback and know the bottlenecks around syslog TCP pinning have been a source of pain for a long time. We’re hopeful the capabilities we’ve delivered to address these challenges will help your team to improve resource utilization and throughput scaling, thus ensuring efficient data flow management in your IT and security infrastructures.

Get Cribl.Cloud here or read more about this feature in the docs on TCP load balancing.


 

