As a Cribl Pack author, I’m frequently asked why I implemented certain functionality in my Packs the way I did. A few lives ago, I worked for a Fortune 250 oil & gas company where I managed our SIEM environment. We didn’t have much in terms of system resources, so we needed to make everything run as efficiently as possible. (Maybe that’s where I get my love for performance from?)
Cribl Stream is written in JavaScript (technically TypeScript) and executed on the Node.js runtime. The V8 JavaScript engine embedded in Node.js is very powerful, and our customers need less hardware to run their observability platforms compared to their previously deployed solutions. Cribl customers typically see 3-4x returns on their investments, including that reduction in hardware!
But this JavaScript V8 engine is only as powerful as you make it. You wouldn’t put normal gasoline in a racing V8 engine, only the expensive high-octane stuff! The filters in your Cribl Stream Routes and Pipelines are like the gasoline in your car’s engine. The better the fuel, the better the engine runs. The better your Stream filters, the faster your Routes and Pipelines can process observability data.
Now consider running observability platforms at scale. Some of Cribl’s customers run Stream to process hundreds of terabytes to petabytes of data per day. Observability at this scale requires careful attention to minor performance details, such as the choice of function in your filter.
Disclaimers upfront: This blog is not meant as a comprehensive guide to every possible filter, nor is it a truly scientific experiment. We’ll keep things high-level here. For more in-depth engineering content, please go read some of the blogs that our amazing developers have published!
For my testing, I picked a variety of the most commonly used JavaScript string-matching functions: match, search, startsWith, includes, indexOf, and test. These functions all do a similar job, with slight differences between them. indexOf, includes, and startsWith take a string as their parameter, while match, search, and test use regular expressions.
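To make those differences concrete, here is a minimal sketch that calls each of the six functions on the sample log line used in the benchmark and shows what each one returns. The variable names are only for illustration.

```javascript
// Sample Cisco ASA log line, the same one used in the benchmark.
const line = "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834";

// Functions that take a string parameter
const viaIndexOf = line.indexOf("%ASA-");       // number: index of the first match, or -1
const viaIncludes = line.includes("%ASA-");     // boolean
const viaStartsWith = line.startsWith("%ASA-"); // boolean: only checks the start of the string

// Functions that use a regular expression
const viaTest = /%ASA-/.test(line);     // boolean
const viaSearch = line.search(/%ASA-/); // number: index of the first match, or -1
const viaMatch = line.match(/%ASA-/);   // array describing the match, or null

console.log({ viaIndexOf, viaIncludes, viaStartsWith, viaTest, viaSearch, matched: viaMatch !== null });
```

Note that indexOf and search return 0 for this line because the match sits at the very start. Since 0 is falsy in JavaScript, a filter built on either one has to compare the result against -1 rather than use it directly as a boolean.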
You can find JavaScript benchmarking sites online that will run these tests in your browser, but I wanted to test directly on a Stream instance. Fortunately, Stream allows you to run JavaScript files at the command line. I wrote a simple JavaScript file that runs the same benchmark across all six functions. Each function runs one million times under a timer. The timer reports the total elapsed time in milliseconds, and dividing that by one million iterations gives the average time per operation in nanoseconds (a total of N ms works out to N ns per operation).
Here’s the code I used for the benchmark:
const iterations = 1000000;

// Run `test` one million times and print the total elapsed time in milliseconds.
function bench(name, test) {
  console.time(name)
  for (var i = 0; i < iterations; i++) {
    test()
  }
  console.timeEnd(name)
}

// Regular-expression test
bench('test', () => /%ASA-/.test("%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834"))
// String-parameter functions
bench('indexOf', () => "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834".indexOf("%ASA-"))
bench('includes', () => "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834".includes("%ASA-"))
bench('startsWith', () => "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834".startsWith("%ASA-"))
// Remaining regular-expression functions
bench('match', () => "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834".match(/%ASA-/))
bench('search', () => "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834".search(/%ASA-/))
To execute this benchmark, I connected via SSH to the Leader node and ran the command below. This server is a Graviton2 c6g.2xlarge instance running on Amazon Web Services. Here’s the command:
[cribl@leader]$ /opt/cribl/bin/cribl node bench.js
This gave me an output of each benchmark (I’ve sorted the output based on performance, from fastest to slowest):
indexOf: 35.78 ns/op
includes: 36.11 ns/op
startsWith: 40.58 ns/op
test: 56.35 ns/op
search: 63.59 ns/op
match: 84.66 ns/op
As you can see, the indexOf function operates approximately 2.5-3 times faster than the match function (about 2.4x in the run shown above, at 35.78 vs. 84.66 ns/op). This was pretty consistent across the various trials I ran, and across instance types, too. Granted, you won’t be able to use the indexOf or includes functions for every comparison, but small optimizations can lead to huge performance gains systemwide.
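If you would rather have the script print ns/op directly, the sketch below is one way to do it. It is a variant for illustration, not the code that produced the numbers in this post: the helper name benchNsPerOp is made up, and it assumes a Node.js runtime that provides process.hrtime.bigint(). Expect the absolute numbers to differ from machine to machine.

```javascript
const iterations = 1000000;

// Hypothetical helper: calls `fn` `iterations` times and returns the
// average cost per call in nanoseconds.
function benchNsPerOp(name, fn) {
  const start = process.hrtime.bigint(); // monotonic clock, in nanoseconds
  for (let i = 0; i < iterations; i++) {
    fn();
  }
  const elapsedNs = process.hrtime.bigint() - start;
  const nsPerOp = Number(elapsedNs) / iterations;
  console.log(`${name}: ${nsPerOp.toFixed(2)} ns/op`);
  return nsPerOp;
}

const line = "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834";

const indexOfNs = benchNsPerOp('indexOf', () => line.indexOf("%ASA-"));
const matchNs = benchNsPerOp('match', () => line.match(/%ASA-/));

// Ratio of the two averages; a value above 1 means match was slower.
console.log(`match / indexOf: ${(matchNs / indexOfNs).toFixed(2)}x`);
```

Running each function a few times and comparing the ratios, rather than the raw numbers, is a reasonable way to check whether the gap holds on your own hardware.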
For a real-world application, this is why I decided to use the indexOf function in the Cribl Pack for Palo Alto Networks. Security appliances, such as firewalls, generate huge volumes of data that need to be processed quickly. I changed the Pack’s Route filters from the test function to indexOf. While it doesn’t look as pretty, the change allows the Pack to process data about 2x faster.
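To illustrate what that kind of swap looks like, here is a rough sketch using the ASA sample from the benchmark. These are not the Pack’s actual filters. The event objects and function names are hypothetical, and the sketch assumes the raw log line is available in a field named _raw.

```javascript
// Hypothetical events; `_raw` is assumed to hold the raw log line.
const asaEvent = { _raw: "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834" };
const otherEvent = { _raw: "Jan 1 00:00:00 host sshd[123]: session opened" };

// Regex-based filter: the "prettier" form.
const regexFilter = (e) => /%ASA-/.test(e._raw);

// indexOf-based filter: compare against -1, because a match at position 0 is falsy.
const indexOfFilter = (e) => e._raw.indexOf("%ASA-") > -1;

console.log(regexFilter(asaEvent), indexOfFilter(asaEvent));     // true true
console.log(regexFilter(otherEvent), indexOfFilter(otherEvent)); // false false
```

Both filters select the same events here. The indexOf form only works because the pattern is a fixed substring; a filter that needs alternation, character classes, or anchors still calls for a regular expression.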
I hope this helps you gain some more performance from your Stream deployment. Join us on the Cribl Community for discussions about all things Stream, sign up for your own free Cribl.Cloud instance and start making sense of your observability data today! Interested in joining Cribl? We’re hiring!