Leveraging V8 VM Contexts To Enable Cribl Packs Functionality

Last edited: February 23, 2022

TL;DR

We were able to leverage the Node.js VM module to execute arbitrary JavaScript code with its own set of globals, separate from the running process’ globals.

Problem

Cribl Stream can be thought of as a stream processing engine for machine data, using functions that are shipped as configuration in the form of index.js files. Stream loads the code in these files, compiles it, and sends events through the resulting functions to perform all of the manipulations on the machine data. Because we ship the functions as configuration files, anyone can write new, custom functions to meet their data processing needs. You can check out this blog post on how to write custom functions.

In the 3.0 release of Stream, we introduced Packs. Boiling the problem down, we now need to run these functions with varying global scopes, depending on the Pack context you’re running data through. And we need to make this possible without updating each individual index.js file we ship that references any globals.

Solution

The functions were initially loaded using Module._load(absoluteFilePath), to make the code within the file executable from within the context of the running process. However, this means you’re required to share the same global scope between the running process and the arbitrary function code being loaded.
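To make the shared-scope limitation concrete, here is a minimal sketch. It uses plain require() as a stand-in for Module._load(), and the temporary index.js file and the MyCustomGlobals name are hypothetical, chosen only for illustration. The loaded function resolves its globals against the process-wide global object, so changing that object changes what every loaded function sees.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical function file, written to a temp directory for this demo.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'shared-globals-'));
const file = path.join(dir, 'index.js');
fs.writeFileSync(file, 'exports.process = () => MyCustomGlobals.getTheAnswer();\n');

// The loaded module looks up MyCustomGlobals on the process-wide global object.
globalThis.MyCustomGlobals = { getTheAnswer: () => 42 };
const fn = require(file);
const first = fn.process();

// Replacing the global affects the already-loaded function too.
globalThis.MyCustomGlobals = { getTheAnswer: () => 7 };
const second = fn.process();

console.log(first, second); // 42 7
```

With a single shared global object, there is no way for two callers to see different values of MyCustomGlobals at the same time.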

We were able to decouple the global scope of the running process from the arbitrary function code by loading the content of the index.js file we want to run into memory, crafting an object that represents the global scope for the code we’re going to execute, and handing both of those things off to vm.runInNewContext().
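As a small illustration of that decoupling, the sketch below runs the same source text in two fresh contexts. The source string and the answer variable are made up for the example; the only API used is vm.runInNewContext(), whose second argument becomes the global object of the new context.

```javascript
const vm = require('vm');

// A global on the host process; code in a new context should not see it.
globalThis.answer = 'from the host process';

// Hypothetical source text standing in for the contents of an index.js file.
const source = 'result = typeof answer === "undefined" ? "no host global" : answer;';

// An empty sandbox: the code finds no `answer` at all.
const emptySandbox = {};
vm.runInNewContext(source, emptySandbox);

// A sandbox that supplies its own `answer`.
const customSandbox = { answer: 'from the sandbox' };
vm.runInNewContext(source, customSandbox);

console.log(emptySandbox.result);  // no host global
console.log(customSandbox.result); // from the sandbox
```

Assignments to undeclared variables inside the context land on the sandbox object, which is how the results are read back here.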

In our case, we wanted to hand off mostly the same set of global variables, except that we wanted to override an application-specific variable (see more info here) that we expose to enable developers of custom functions to leverage utility functions and APIs of the underlying Stream platform. We needed to override this variable to load it with the corresponding configuration files, based on the context in which the function is running.
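A rough sketch of that override follows, assuming two hypothetical contexts named packA and packB and a made-up MyCustomGlobals object. It is not Stream's actual implementation; it only shows the same unmodified source picking up a different application-specific global depending on the context it was loaded for.

```javascript
const vm = require('vm');

// Hypothetical per-Pack globals; names and values are illustrative only.
const globalsByPack = new Map([
  ['packA', { getTheAnswer: () => 'answer for packA' }],
  ['packB', { getTheAnswer: () => 'answer for packB' }],
]);

// The same function source is used for every Pack, unmodified.
const source = 'exports.process = () => MyCustomGlobals.getTheAnswer();';

function loadFor(pack) {
  const exportsObj = {};
  const shared = { console }; // globals that are identical across Packs
  vm.runInNewContext(source, {
    ...shared,
    exports: exportsObj,
    MyCustomGlobals: globalsByPack.get(pack), // the one overridden global
  });
  return exportsObj;
}

const a = loadFor('packA');
const b = loadFor('packB');
console.log(a.process()); // answer for packA
console.log(b.process()); // answer for packB
```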

Here’s an example of what the above would look like in code:

Code example
// Named Module because `module` is already bound in every CommonJS file;
// `const module = ...` at the top level would be a SyntaxError.
const Module = require('module');
const path = require('path');
const fs = require('fs');
const vm = require('vm');

const globalsMap = new Map([
  [ 'mycontext', { getTheAnswer: () => 42 } ]
]);

function crequire(absFile, context) {
  const mod = {};
  vm.runInNewContext(
    fs.readFileSync(absFile).toString(),
    {
      ...global,
      Promise,
      __filename: absFile,
      __dirname: path.dirname(absFile),
      exports: mod,
      // inject your custom global object here - allows code in absFile
      // to call MyCustomGlobals.getTheAnswer()
      MyCustomGlobals: globalsMap.get(context),
      require: Module.createRequire(absFile),
      process,
      Date
    },
    { filename: absFile });
  return mod;
}

const mod = crequire(path.join(process.cwd(), 'index.js'), 'mycontext');
// if the file loaded has exports.process = (...) => {...}, it should be added to the mod object:
mod.process({ foo: 'bar' });

What the Above Does

  • Asks crequire() to load and return a given file, using ‘mycontext’ global variables

  • Reads the contents of the JavaScript file

  • Uses vm.runInNewContext() to feed the content of the JavaScript file, and the desired globals, into a new V8 Virtual Machine context

  • Uses vm.runInNewContext() to compile and assign the exported values from within the JavaScript file into the mod variable

  • Returns the mod variable holding the exports object from crequire()

  • All exports are now accessible/callable from the returned object

Figuring out the right combination of globals to pass into the new context was a fun challenge, which ultimately required me to dig into the Node.js code itself. From the Node.js code, I learned about vm.runInNewContext() and module.createRequire(), and about how variables such as __filename and __dirname are not truly global and are actually scoped per module/file.
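The point about module-scoped names can be checked with a short sketch. The index.js path below is hypothetical (the file does not need to exist for this demo), and the sandbox shape is an assumption for illustration. A bare context has no __filename, __dirname, or require; they only exist once they are passed in, with module.createRequire() supplying a require function bound to the given file path.

```javascript
const vm = require('vm');
const path = require('path');
const { createRequire } = require('module');

// In a bare context, the "module globals" are simply not defined.
const bare = Array.from(
  vm.runInNewContext('[typeof __filename, typeof __dirname, typeof require]')
);
console.log(bare); // [ 'undefined', 'undefined', 'undefined' ]

// Hypothetical absolute path for the file being "loaded".
const absFile = path.join(process.cwd(), 'index.js');
const sandbox = {
  __filename: absFile,
  __dirname: path.dirname(absFile),
  require: createRequire(absFile), // resolves modules relative to absFile
};

// With the names supplied, code in the context can use them as usual.
const seen = vm.runInNewContext(
  '({ file: __filename, dir: __dirname, sep: require("path").sep })',
  sandbox
);
console.log(seen.file === absFile, seen.sep === path.sep); // true true
```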

The lesson here is that Node.js is an incredibly flexible runtime that provides a plethora of tools/modules for accomplishing most tasks at hand.

If you found this problem interesting, there are plenty more cool engineering problems where that came from! Come join us!
