
High-Performance JavaScript in Stream – Why the Function in Your Filter Matters

Last edited: April 28, 2022

Being a Cribl Pack author, I frequently receive questions related to why I chose to implement a certain functionality inside my Packs the way I did. A few lives ago, I worked for a Fortune 250 oil & gas company where I managed our SIEM environment. We didn’t have much in terms of system resources, so we needed to make everything run as efficiently as possible. (Maybe that’s where I get my love for performance from?)

Cribl Stream is written in JavaScript (technically TypeScript) and runs on the Node.js runtime. The V8 JavaScript engine embedded in Node.js is very powerful, and our customers need less hardware to run their observability platforms than they did with their previously deployed solutions. Cribl customers typically see 3-4x returns on their investments, including that reduction in hardware!

But the V8 engine is only as powerful as you make it. You wouldn’t put regular gasoline in a racing V8 engine, only the expensive high-octane stuff! The filters in your Cribl Stream Routes and Pipelines are like the gasoline in your car’s engine. The better the fuel, the better the engine runs. The better your Stream filters, the faster your Routes and Pipelines can process observability data.

Now consider running observability platforms at scale. Some of Cribl’s customers run Stream deployments that process hundreds of terabytes to petabytes of data per day. Observability at this scale requires careful attention to small performance details, such as the choice of function in your filter.

Testing Performance (An Unscientific Approach)

Disclaimers upfront: This blog is not meant as a comprehensive guide to all possibilities of filters, nor is it a truly scientific experiment. We’ll keep this high-level for this blog. For more in-depth engineering content, please go read some of the blogs that our amazing developers have published!

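For my testing, I picked a variety of the most commonly used JavaScript string-matching functions: match, search, startsWith, includes, indexOf, and test. They all answer roughly the same question (does this string contain that pattern?) but differ in what they accept and what they return. indexOf, includes, and startsWith take a plain string as their parameter, while match, search, and test work with regular expressions. Note that test is a method of the regular expression and takes the string as its parameter, while the other five are methods of the string.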
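To make those differences concrete, here is a minimal sketch of what each of the six calls returns for the same input. The shortened sample message and the variable names are placeholders chosen for illustration.

```javascript
// Shortened sample message (placeholder, for illustration only)
const msg = "%ASA-6-305011: Built dynamic TCP translation";

const returned = {
  indexOf: msg.indexOf("%ASA-"),       // number: position of the first match, or -1
  includes: msg.includes("%ASA-"),     // boolean
  startsWith: msg.startsWith("%ASA-"), // boolean: only checks the start of the string
  test: /%ASA-/.test(msg),             // boolean
  search: msg.search(/%ASA-/),         // number: position of the first match, or -1
  match: msg.match(/%ASA-/),           // array describing the match, or null
};

console.log(returned);
```

Of the six, match is the only one that builds a result array describing the match, which may partly explain why it tends to be the most expensive.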

You can find JavaScript benchmarking sites online that will run these tests in your browser, but I wanted to test directly on a Stream instance. Fortunately, Stream allows you to run JavaScript files at the command line. I wrote a simple JavaScript file that runs the same benchmark across all six functions. Each function runs one million times under a timer, and the total is divided by the number of iterations to give an average time per operation in nanoseconds.

Bench.js

Here’s the code I used for the benchmark:

Code example
// Number of times each candidate function is called
const iterations = 1000000;

// Run test() `iterations` times and print the total elapsed time.
// console.timeEnd reports milliseconds; over 1,000,000 calls, a total of
// N ms is numerically the same as N ns per call.
function bench(name, test) {
  console.time(name)
  for (var i = 0; i < iterations; i++) {
    test()
  }
  console.timeEnd(name)
}

// Regular-expression method
bench('test', () => /%ASA-/.test("%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834"))

// String methods that take a plain string
bench('indexOf', () => "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834".indexOf("%ASA-"))
bench('includes', () => "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834".includes("%ASA-"))
bench('startsWith', () => "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834".startsWith("%ASA-"))

// String methods that take a regular expression
bench('match', () => "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834".match(/%ASA-/))
bench('search', () => "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834".search(/%ASA-/))
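The script above relies on console.time, which prints a total elapsed time per function. As a sketch of how a per-operation figure could be reported directly, the variant below uses Node’s process.hrtime.bigint() and divides by the iteration count. The benchNs helper, the msg constant, and the boolean wrappers around each call are hypothetical names for illustration, not part of the original script.

```javascript
const iterations = 1000000;
const msg = "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834";

// Hypothetical helper: times `iterations` calls of fn and reports ns/op.
function benchNs(name, fn) {
  let hits = 0; // count truthy results so each return value is consumed
  const start = process.hrtime.bigint(); // nanosecond-resolution timestamp
  for (let i = 0; i < iterations; i++) {
    if (fn()) hits++;
  }
  const elapsedNs = Number(process.hrtime.bigint() - start);
  const nsPerOp = elapsedNs / iterations;
  console.log(`${name}: ${nsPerOp.toFixed(2)} ns/op`);
  return { name, nsPerOp, hits };
}

// Each candidate is wrapped so that it returns a boolean
const indexOfResult = benchNs('indexOf', () => msg.indexOf("%ASA-") !== -1);
const testResult = benchNs('test', () => /%ASA-/.test(msg));
```

Numbers from a sketch like this will vary from run to run and from machine to machine, so treat them as relative comparisons rather than absolute measurements.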

To execute this benchmark, I connected via SSH to the Leader node, and ran my command. This server is a Graviton 2 c6g.2xlarge instance running on Amazon Web Services. Here’s the command:

[cribl@leader]$ /opt/cribl/bin/cribl node bench.js

This gave me an output of each benchmark (I’ve sorted the output based on performance, from fastest to slowest):

indexOf: 35.78 ns/op
includes: 36.11 ns/op
startsWith: 40.58 ns/op
test: 56.35 ns/op
search: 63.59 ns/op
match: 84.66 ns/op

Results

As you can see, indexOf is the fastest function and match is the slowest: in the run shown above, match takes about 2.4 times as long per operation (84.66 ns versus 35.78 ns). Across the various trials I ran, and across instance types, indexOf was consistently around 2.5-3 times faster than match. Granted, you won’t be able to use indexOf or includes for every comparison, but small optimizations can lead to huge performance gains systemwide.
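For reference, the relative cost of each function in the single run listed above can be worked out with a few lines of arithmetic. This sketch only uses the published numbers; the variable names are illustrative.

```javascript
// ns/op figures copied from the benchmark output above
const nsPerOp = {
  indexOf: 35.78,
  includes: 36.11,
  startsWith: 40.58,
  test: 56.35,
  search: 63.59,
  match: 84.66,
};

// Express each result as a multiple of the fastest function (indexOf)
const baseline = nsPerOp.indexOf;
const relative = {};
for (const [name, ns] of Object.entries(nsPerOp)) {
  relative[name] = Number((ns / baseline).toFixed(2));
}

console.log(relative);
// { indexOf: 1, includes: 1.01, startsWith: 1.13, test: 1.57, search: 1.78, match: 2.37 }
```

In this particular run, the three string-based functions sit within about 13% of each other, while the regular-expression functions cost roughly 1.6 to 2.4 times as much as indexOf.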

For a real-world application, this is why I decided to use the indexOf function in the Cribl Pack for Palo Alto Networks. Security appliances, such as firewalls, generate huge volumes of data that need to be processed quickly. I changed the Pack’s Routes from using the test function to using indexOf. While it doesn’t look as pretty, the change allows the Pack to process data about 2x faster.
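The Pack’s actual Route filters aren’t reproduced here, so the following is only an illustrative sketch of the kind of change described, reusing the %ASA- pattern from the benchmark. It assumes the raw event text lives in a field named _raw, and the three function names are hypothetical. In Stream the filter would be a single expression; it is wrapped in functions here so the sketch can run on its own.

```javascript
// Sample event; the _raw field name is an assumption for this sketch
const event = {
  _raw: "%ASA-6-305011: Built dynamic TCP translation from outside:10.123.3.42/4952 to outside:192.0.2.130/12834",
};

// Regular-expression version: reads nicely, returns a boolean directly
const regexFilter = (e) => /%ASA-/.test(e._raw);

// String version: indexOf returns a position, so compare it against -1
const indexOfFilter = (e) => e._raw.indexOf("%ASA-") > -1;

// Pitfall: using the position as a boolean fails when the match is at index 0
const naiveFilter = (e) => Boolean(e._raw.indexOf("%ASA-"));

console.log(regexFilter(event), indexOfFilter(event), naiveFilter(event));
// prints: true true false
```

The explicit comparison against -1 is the less pretty part, but it is what keeps the indexOf version equivalent to the regular-expression one when the pattern sits at the very start of the event.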

I hope this helps you gain some more performance from your Stream deployment. Join us on the Cribl Community for discussions about all things Stream, sign up for your own free Cribl.Cloud instance and start making sense of your observability data today! Interested in joining Cribl? We’re hiring!

