Leveraging V8 VM Contexts To Enable Cribl Packs Functionality

February 23, 2022
Categories: Engineering

TL;DR

We used the Node.js vm module to execute arbitrary JavaScript code with its own set of globals, separate from the globals of the running process.

Problem

Cribl Stream can be thought of as a stream processing engine for machine data, using functions that are shipped as configuration in the form of index.js files. Stream loads the code in these files, compiles it, and sends events through the resulting functions to perform all of the manipulations on the machine data. Because we ship the functions as configuration files, anyone can write new, custom functions to meet their data processing needs. You can check out this blog post on how to write custom functions.

In the 3.0 release of Stream, we introduced Packs. Boiling the problem down, we needed to run these functions with different global scopes, depending on the Pack context the data is running through. And we needed to make this possible without updating each individual index.js file we ship that references any globals.

Solution

The functions were initially loaded using Module._load(absoluteFilePath), to make the code within the file executable from within the context of the running process. However, this means you’re required to share the same global scope between the running process and the arbitrary function code being loaded.
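To make the shared-scope limitation concrete, here is a minimal sketch. It uses the public require() as a stand-in for Module._load() (an assumption for illustration, since require() goes through the same loader), and the temp file and the MyCustomGlobals name are hypothetical:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical function file, written to a temp directory just for this demo.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'shared-globals-'));
const file = path.join(dir, 'index.js');
fs.writeFileSync(file, 'exports.process = () => MyCustomGlobals.getTheAnswer();');

// The loaded code resolves MyCustomGlobals on the process-wide global object.
globalThis.MyCustomGlobals = { getTheAnswer: () => 42 };
const fn = require(file);
const first = fn.process();

// Swapping the global for one caller swaps it for every loaded module.
globalThis.MyCustomGlobals = { getTheAnswer: () => 7 };
const second = fn.process();

console.log(first, second); // 42 7
```

There is only one global object to go around, so two Packs that need different values for the same global cannot coexist under this loading scheme.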

We were able to decouple the global scope of the running process from the arbitrary function code by loading the content of the index.js file we want to run into memory, crafting an object that represents the global scope for the code we’re going to execute, and handing both of those things off to vm.runInNewContext().
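As a rough illustration of that hand-off, the sketch below runs a source string (standing in for the contents of an index.js file) against a hand-built globals object. The variable names and the inline source are made up for the example:

```javascript
const vm = require('vm');

// Source text standing in for the contents of an index.js file.
const source = 'exports.answer = typeof process + ":" + greeting;';

// The second argument becomes the global object of the new context,
// so the code sees `exports` and `greeting` but not the host's `process`.
const sandbox = { exports: {}, greeting: 'hello' };
vm.runInNewContext(source, sandbox, { filename: 'inline-example.js' });

console.log(sandbox.exports.answer); // undefined:hello
```

Anything not placed on the object passed in (here, process) is simply absent from the new context, which is why the list of globals to pass along matters.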

In our case, we wanted to hand off mostly the same set of global variables, with one exception: we wanted to override an application-specific variable (see more info here) that we expose so that developers of custom functions can leverage utility functions and APIs of the underlying Stream platform. We needed to override this variable to load it with the corresponding configuration files, based on the context in which the function is running.
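A small sketch of that idea, assuming a hypothetical MyCustomGlobals variable and made-up context names: the same source runs twice, with the shared globals identical and only the one variable swapped per context.

```javascript
const vm = require('vm');

// The same function source is reused for every context.
const source = 'exports.describe = () => MyCustomGlobals.name + ":" + shared;';

// Globals that every context receives unchanged (illustrative only).
const baseGlobals = { shared: 'same-everywhere' };

// Run the source in a fresh context, overriding only MyCustomGlobals.
function load(contextName) {
  const exportsObj = {};
  vm.runInNewContext(source, {
    ...baseGlobals,
    exports: exportsObj,
    MyCustomGlobals: { name: contextName },
  });
  return exportsObj;
}

const a = load('pack-a');
const b = load('pack-b');
console.log(a.describe()); // pack-a:same-everywhere
console.log(b.describe()); // pack-b:same-everywhere
```

Each returned function stays bound to the context it was compiled in, so the two copies keep seeing their own MyCustomGlobals even when called later from the host process.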

Here’s an example of what the above would look like in code:

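// `module` is already bound inside every CommonJS file, so redeclaring it
// with const is a SyntaxError; import createRequire directly instead.
const { createRequire } = require('module');
const path = require('path');
const fs = require('fs');
const vm = require('vm');

// Custom globals to inject, keyed by context name
const globalsMap = new Map([
  [
    'mycontext',
    {
      getTheAnswer: () => 42
    }
  ]
]);

// Runs absFile in a new V8 context and returns the object its code exported to
function crequire(absFile, context) {
  const mod = {};
  vm.runInNewContext(
    fs.readFileSync(absFile, 'utf8'),
    {
      ...global,
      Promise,
      __filename: absFile,
      __dirname: path.dirname(absFile),
      exports: mod,
      MyCustomGlobals: globalsMap.get(context), // inject your custom global object here - allows code in absFile to call MyCustomGlobals.getTheAnswer()
      require: createRequire(absFile), // resolves relative requires from absFile's directory
      process,
      Date
    },
    {filename: absFile}); // filename is used in stack traces
  return mod;
}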

const mod = crequire(path.join(process.cwd(), 'index.js'), 'mycontext');

// if the file loaded has exports.process = (...) => {...}, it should be added to the mod object:
mod.process({foo: 'bar'});

What the Above Does

  • Asks crequire() to load and return a given file, using the 'mycontext' global variables
  • Reads the contents of the JavaScript file
  • Uses vm.runInNewContext() to feed the content of the JavaScript file, and the desired globals, into a new V8 Virtual Machine context
  • Compiles and runs the code in that context, so that everything the file assigns to exports lands on the mod object
  • Returns the mod object, which now holds the file's exports, from crequire()
  • All exports are now accessible/callable from the returned object

Figuring out the right combination of globals to pass into the new context was a fun challenge, which ultimately required me to dig into the Node.js code itself. From the Node.js code, I learned about vm.runInNewContext() and module.createRequire(), and about how some "global" variables (such as __filename and __dirname) are not truly global and are actually contextualized per module/file.
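Here is a minimal sketch of that last point, using hypothetical temp files: a bare context sees neither __filename nor require, and both only show up once they are injected by hand, with createRequire() supplying a require that resolves relative to the given file.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const vm = require('vm');
const { createRequire } = require('module');

// Hypothetical files created only for this demo.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'module-locals-'));
const absFile = path.join(dir, 'index.js');
fs.writeFileSync(absFile, '// placeholder so the path exists\n');
fs.writeFileSync(path.join(dir, 'helper.js'), 'module.exports = "from helper";');

// A bare context has neither value: they are per-module, not true globals.
const bare = vm.runInNewContext('[typeof __filename, typeof require].join(",")', {});

// Injecting them by hand mimics what the module loader provides for each file.
const injected = vm.runInNewContext(
  'require("./helper") + " via " + typeof __filename',
  { __filename: absFile, require: createRequire(absFile) }
);

console.log(bare);     // undefined,undefined
console.log(injected); // from helper via string
```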

The lesson here is that Node.js is an incredibly flexible runtime that provides a plethora of tools/modules for accomplishing most tasks at hand.

If you found this problem interesting, there are plenty more cool engineering problems where that came from! Come join us!
