
Leveraging V8 VM Contexts To Enable Cribl Packs Functionality

Written by Nick Romito

February 23, 2022

TL;DR

We were able to leverage the Node.js VM module to execute arbitrary JavaScript code with its own set of globals, separate from the running process’ globals.

Problem

Cribl Stream can be thought of as a stream-processing engine for machine data. It uses functions that are shipped as configuration, in the form of index.js files. Stream loads the code in these files, compiles it, and sends events through the resulting functions to perform all of the manipulations on the machine data. Because we ship the functions as configuration files, anyone can write new, custom functions to meet their data processing needs. You can check out this blog post on how to write custom functions.

In the 3.0 release of Stream, we introduced Packs. Boiling the problem down, we now need to run these functions with varying global scopes, depending on the Pack context you’re running data through. And we need to make this possible without updating each individual index.js file we ship that references any globals.

Solution

The functions were initially loaded using Module._load(absoluteFilePath), to make the code within the file executable from within the context of the running process. However, this means you’re required to share the same global scope between the running process and the arbitrary function code being loaded.
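To make the shared-scope issue concrete, here is a minimal sketch. It uses the public require() rather than the internal Module._load(), and the temp file and the hostOnlyValue variable are hypothetical names chosen for illustration. The point is only that code loaded into the running process sees whatever that process has put on its global object.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Write a throwaway module to a temp directory (hypothetical file, for illustration only).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'shared-globals-'));
const file = path.join(dir, 'index.js');
fs.writeFileSync(file, 'exports.seen = typeof hostOnlyValue;\n');

// A global set by the running process...
globalThis.hostOnlyValue = 'visible everywhere';

// ...is visible to any module loaded into the same process.
const loaded = require(file);
console.log(loaded.seen);
```

With this loading style there is no way to give two loaded files different values for the same global name.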

We were able to decouple the global scope of the running process from the arbitrary function code by loading the content of the index.js file we want to run into memory, crafting an object that represents the global scope for the code we’re going to execute, and handing both of those things off to vm.runInNewContext().
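As a minimal sketch of that idea, the snippet below runs a string of source code against a hand-built globals object. The source string stands in for the contents of an index.js file, and the variable names (greeting, sandboxGreeting) are assumptions made up for this example.

```javascript
const vm = require('vm');

// A variable on the host process's global object.
globalThis.greeting = 'hello from the host';

// Source text standing in for the contents of an index.js file (hypothetical).
const source = 'typeof greeting === "undefined" ? sandboxGreeting : greeting';

// The object passed as the second argument becomes the global scope of the new context.
const sandboxGlobals = { sandboxGreeting: 'hello from the sandbox' };

// The code cannot see the host's `greeting`, so it falls back to `sandboxGreeting`.
const result = vm.runInNewContext(source, sandboxGlobals);
console.log(result);
```

The code inside the new context only sees what is on the object it was handed, plus the JavaScript built-ins every context gets.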

In our case, we wanted to hand off mostly the same set of global variables, with one exception: an application-specific variable (see more info here) that we expose so that developers of custom functions can use the utility functions and APIs of the underlying Stream platform. We needed to override this variable so that it is loaded with the configuration files that correspond to the context in which the function is running.
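The override can be sketched in isolation as follows. Everything named here (AppGlobals, configByContext, buildGlobals, the pack names and paths) is hypothetical and is not Stream's actual variable or configuration layout; the sketch only shows one source string resolving the same global name differently per context.

```javascript
const vm = require('vm');

// Hypothetical per-context configuration; the names are illustrative only.
const configByContext = new Map([
  ['packA', { lookupDir: '/packs/a/lookups' }],
  ['packB', { lookupDir: '/packs/b/lookups' }],
]);

// Globals that every context receives unchanged.
const sharedGlobals = { console };

// Builds the globals for one context: shared values plus the one overridden variable.
function buildGlobals(contextName) {
  return { ...sharedGlobals, AppGlobals: configByContext.get(contextName) };
}

// Function source that references the global without knowing which context it runs in.
const source = 'AppGlobals.lookupDir';

const fromA = vm.runInNewContext(source, buildGlobals('packA'));
const fromB = vm.runInNewContext(source, buildGlobals('packB'));
console.log(fromA, fromB);
```

The source string is never edited; only the globals object changes. The sketch leaves out file loading and module resolution.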

Here’s an example of what the above would look like in code:

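// Destructure createRequire: declaring `const module` in a CommonJS file
// clashes with the `module` binding Node.js already provides there.
const { createRequire } = require('module');
const path = require('path');
const fs = require('fs');
const vm = require('vm');

// Per-context globals, keyed by context name.
const globalsMap = new Map([
  [
    'mycontext',
    {
      getTheAnswer: () => 42
    }
  ]
]);

// Runs absFile in a new V8 context and returns its exports object.
function crequire(absFile, context) {
  const mod = {};
  vm.runInNewContext(
    fs.readFileSync(absFile, 'utf8'),
    {
      ...global,
      Promise,
      __filename: absFile,
      __dirname: path.dirname(absFile),
      exports: mod,
      // inject your custom global object here - allows code in absFile to call MyCustomGlobals.getTheAnswer()
      MyCustomGlobals: globalsMap.get(context),
      // resolve require() calls relative to the loaded file
      require: createRequire(absFile),
      process,
      Date
    },
    { filename: absFile }
  );
  return mod;
}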

const mod = crequire(path.join(process.cwd(), 'index.js'), 'mycontext');

// if the file loaded has exports.process = (...) => {...}, it should be added to the mod object:
mod.process({foo: 'bar'});

What the Above Does

  • Asks crequire() to load and return a given file, using ‘mycontext’ global variables
  • Reads the contents of the JavaScript file
  • Uses vm.runInNewContext() to feed the content of the JavaScript file, and the desired globals, into a new V8 Virtual Machine context
  • Compiles and runs the file inside that context, where any value the file assigns to exports lands on the mod object
  • Returns the mod variable holding the exports object from crequire()
  • All exports are now accessible/callable from the returned object
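One detail in the globals object is worth a closer look: Promise and Date are listed explicitly next to ...global. The sketch below shows that object spread copies only the own, enumerable properties of global, so built-ins such as Promise and Date are not carried over by the spread. A plausible reason for listing them is to hand the host's versions to the loaded code, but that is an assumption on our part rather than something stated above.

```javascript
// Spread copies only the own, enumerable properties of `global`.
const copied = { ...global };

const has = (name) => Object.prototype.hasOwnProperty.call(copied, name);

const hasTimer = has('setTimeout'); // timers are enumerable on Node's global object
const hasPromise = has('Promise');  // built-in constructors are non-enumerable
const hasDate = has('Date');

console.log({ hasTimer, hasPromise, hasDate });
```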

Figuring out the right combination of globals to pass into the new context was a fun challenge, which ultimately required me to dig into the Node.js code itself. From the Node.js code, I learned about vm.runInNewContext() and module.createRequire(), and about how some apparently global variables (such as __filename and __dirname) are not truly global: they are scoped per module/file.
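A quick way to see this is to probe a context that was given an empty globals object. This is a small sketch, and the probe string is made up for the example. The module-scoped names are simply missing, while language built-ins such as Date are still there.

```javascript
const vm = require('vm');

// Report which names exist inside a context that received no globals at all.
const probe = `({
  filename: typeof __filename,
  dirname: typeof __dirname,
  require: typeof require,
  exports: typeof exports,
  date: typeof Date
})`;

const bare = vm.runInNewContext(probe, {});
console.log(bare);
```

This is why crequire() has to supply __filename, __dirname, exports, and require itself.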

The lesson here is that Node.js is an incredibly flexible runtime that provides a plethora of tools/modules for accomplishing most tasks at hand.

If you found this problem interesting, there are plenty more cool engineering problems where that came from! Come join us!
