In modern network systems, managing data flow efficiently is critical, especially when dealing with high volumes of log data. One common challenge for IT teams is the bottleneck caused by Syslog TCP pinning, where a small number of persistent TCP connections limits throughput. This blog explores TCP pinning in depth, discusses its implications for network performance, and details strategies for alleviating these bottlenecks through load balancing. Using real-world examples, we aim to provide a practical guide to optimizing TCP connections for better resource utilization and throughput scaling.
Whilst Cribl enables scaling data processing workloads across multiple Worker Nodes and Processes, network connections are typically load balanced across Worker Nodes and not the more granular Worker Processes running on said Nodes (have we said Worker enough?).
This means that when admins have huge powerful compute systems running Cribl software, some of their data streams may be stuck to one CPU core while the rest of the performance is left on the table. As Bruce Almighty said, “All this horsepower and no room to gallop.”
Fear not, lone (data) wanderer, there are solutions to all problems, ye only need ask! In this instance, your friends will be: utilizing multiple short-lived connections and/or enabling TCP load balancing.
When a network connection is established, its data is pinned to a single Worker Process. More connections mean more chances for other Worker Processes to take some of the load. Therefore, option one for your CPU utilization woes is this: use short-lived connections so that traffic continually rotates across your Worker Processes (CPU cores).
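To make the short-lived connection idea concrete, here is a minimal sender-side sketch in Python. It is an illustration under stated assumptions, not Cribl code: the class name `RotatingSyslogSender`, the helper `tcp_connector`, the `max_messages_per_conn` threshold, and the newline framing are all hypothetical choices made for the example.

```python
import socket
from typing import Callable, Optional


def tcp_connector(host: str, port: int, timeout: float = 5.0) -> Callable[[], socket.socket]:
    """Return a factory that opens a fresh TCP connection (host/port are placeholders)."""
    return lambda: socket.create_connection((host, port), timeout=timeout)


class RotatingSyslogSender:
    """Sends newline-framed syslog messages, reconnecting every N messages.

    Each new connection is a new chance to land on a different Worker Process,
    so no single process stays pinned to this sender forever.
    """

    def __init__(self, connect: Callable[[], socket.socket], max_messages_per_conn: int = 1000):
        self._connect = connect
        self._max = max_messages_per_conn  # hypothetical rotation threshold
        self._sock: Optional[socket.socket] = None
        self._sent_on_conn = 0

    def _rotate(self) -> None:
        # Close the current connection (if any) and open a fresh one.
        self.close()
        self._sock = self._connect()
        self._sent_on_conn = 0

    def send(self, message: str) -> None:
        if self._sock is None or self._sent_on_conn >= self._max:
            self._rotate()
        # Assumption: the receiver expects one message per line.
        self._sock.sendall(message.encode("utf-8") + b"\n")
        self._sent_on_conn += 1

    def close(self) -> None:
        if self._sock is not None:
            self._sock.close()
            self._sock = None
```

A caller might build it as `RotatingSyslogSender(tcp_connector("stream.example.internal", 9514))`, where both the hostname and the port are placeholders. Rotating by message count is only one possible policy; rotating on a timer would work just as well for this purpose.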
Wait a little bit, though. There are some challenges associated with this approach:
A challenger appears: load balancing traffic, not connections, across all Worker Processes!
This new feature (released in version 4.6.0) allows the load balancing of Syslog TCP traffic across all Worker Processes. When enabled, a single load balancer Worker Process is spawned to, well, process incoming connections. Just what the admin ordered.
Let’s dig a little deeper.
We can see the benefits of this approach in two main ways:
Currently, TCP Load Balancing is only supported for Syslog TCP, since it requires only lightweight deserialization, which allows the load balancer process to delegate the majority of the data processing workload to Worker Processes.
Also, we’ve been in the data engine business for a while now, so we acknowledge that a single load balancer process can be a bottleneck and point of failure. Don’t worry, it won’t be for long.
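For a feel of what "lightweight deserialization" plus delegation can look like, here is a toy Python sketch. It is not Cribl's implementation: it assumes newline-delimited syslog messages, uses plain lists as stand-ins for the channels to Worker Processes, and picks round-robin dispatch purely for illustration. The names `StreamFramer` and `RoundRobinBalancer` are hypothetical.

```python
from itertools import cycle
from typing import List


class StreamFramer:
    """Splits a TCP byte stream into newline-delimited messages.

    This is the only 'deserialization' the balancer does: find message
    boundaries, nothing more. Parsing is left to the workers.
    """

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> List[bytes]:
        parts = (self._buffer + chunk).split(b"\n")
        # The last element is an incomplete message (or empty); keep it for later.
        self._buffer = parts.pop()
        return [p for p in parts if p]


class RoundRobinBalancer:
    """Hands complete messages from one connection to workers in turn."""

    def __init__(self, num_workers: int) -> None:
        # Lists stand in for whatever IPC a real system would use.
        self.queues: List[List[bytes]] = [[] for _ in range(num_workers)]
        self._next_worker = cycle(range(num_workers))
        self._framer = StreamFramer()  # a real balancer would keep one per connection

    def on_data(self, chunk: bytes) -> None:
        for message in self._framer.feed(chunk):
            self.queues[next(self._next_worker)].append(message)
```

Feeding it `b"<14>a\n<14>b\n<14"` followed by `b">c\n<14>d\n"` with three workers spreads the four messages across all three queues, even though everything arrived on one connection and one message was split across two reads. The point of the sketch is that the balancer's per-message cost stays small, which is what lets the heavier processing scale out to the workers.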
You can read all day (please do; our blogs and docs are fun), but sometimes you need to see and believe. The following is a sample demonstration of a data processing workload scaling linearly with the number of Worker Processes when TCP load balancing is enabled.
Our technical champion at a Fortune 50 Financial Services Organization says TCP Load Balancing has made the team’s life so much easier, and they went from a lag of 12 hours to almost zero for their security logs! And that’s no small amount of logs: 230,000 per second! TCP Load Balancing has already greatly impacted the team’s ability to quickly and confidently secure the business, reduce risk, and accelerate response times.
“9 am Wednesday, when we enabled load balancing on Syslog, we immediately saw the benefit…very happy about this…I want you to know this made a real impact… the team is no longer on my back.”
“We’re getting 80% reduction on Zscaler logs with Cribl after solving the TCP pinning with Elastic Filebeat for load balancing and optimizing the pipelines.”
This could be you!
At Cribl, we’re customers first, always. We heard your feedback and know the bottlenecks around syslog TCP pinning have been a source of pain for a long time. We’re hopeful the capabilities we’ve delivered to address these challenges will help your team to improve resource utilization and throughput scaling, thus ensuring efficient data flow management in your IT and security infrastructures.
Get Cribl.Cloud here or read more about this feature in the docs on TCP load balancing.