Extending LogStream: Building Custom Functions

Written by Clint Sharp

November 29, 2018

One constant in log use cases is that you can’t plan for what you’re going to find at customers. Whether it’s multiple levels of encapsulation, like JSON-in-XML-in-Pipe-Separated (yes we’ve seen this), a need to radically transform the structure of events in a way we haven’t seen, or a need to reach out to an external system we’ve never worked with before, we knew going into this market we’d need to provide an easily extensible product. When we find ourselves in a place where a customer can’t chain our flexible out-of-the-box functions like Drop, Eval, or Mask, our customers or we can easily drop in a custom function which meets their needs.

One of the reasons that we chose JavaScript was its rich ecosystem of libraries and its emergence as a universal runtime with WebAssembly. Cribl allows you to easily drop in your own code, interpreted or compiled, and get full access to your log data in motion. Even before we had a UI or any out-of-the-box functions, we proved out all our use case ideas through custom functions. We provide a very simple API for working with data, requiring you to implement only two methods: init and process. Configuration for custom functions works the same way as with out-of-the-box functions: you provide a JSON Schema and a UI schema, as implemented via React JSON Schema Form. With this simple schema definition language, which you may know as the one behind Swagger, the UI automatically renders the forms, letting you expose even sophisticated configuration to your end users.

Lastly, one other advantage we get from JavaScript as a language is our ability to allow users to configure functions with JavaScript through the form of JavaScript Expressions. Along with our library of functions we ship for Masking and Encoding, powerful transformations and operations are possible through one-liner JavaScript expressions included in LogStream configurations.

With this post, we will walk you through how to build some custom functions with Cribl. We’ll start with some examples from the functions we ship, then conclude with a common use case: doing a DNS lookup against IP Addresses found in raw data.

Drop
Regex Filter
DNS Lookup

What is a function?

First, we should define what a function is in LogStream. In LogStream, functions are a combination of code, configuration, and data. Each function is a directory of files. Here is the regex_filter function that we ship with Cribl:

regex_filter
├── conf.schema.json
├── conf.ui-schema.json
└── index.js

index.js contains our JavaScript code. It can include any built-in Node modules or reference other JavaScript files in its directory. Support for npm modules is on the backlog.

conf.schema.json and conf.ui-schema.json are schema files for React JSON Schema Form, which will be covered in more detail below.

You can use Data Preview to store sample data with out-of-the-box or custom functions, for testing and validation.

To install a function, perhaps from our content repository, simply drop the function directory into $CRIBL_HOME/local/cribl/functions, and restart Cribl. After that, the function will be available in the UI.

Note: Prior to LogStream 1.7, this subdirectory was: $CRIBL_HOME/bin/functions.

Next, let’s look into the details of how a function is implemented.

Drop: The Simplest Function

Let’s examine a function which Cribl ships with: Drop. Drop is an incredibly simple function. If the Filter expression matches, we’ll drop the event. The Filter expression gets evaluated before the function itself gets called, so Drop is only executed for events which should be dropped.

Let’s look at the code for the function:

exports.name = 'Drop'; 
exports.version = '0.1'; 
exports.group = 'Standard'; 
exports.process = () => null;

Cribl functions are NodeJS modules, and we look for several module variables to be defined, the names of which should seem obvious. Name defines how the UI will display the function name, Version documents the function’s version, and Group is used by the UI to group like functions.

The process method is called for every event. It is passed the event, which is a JavaScript object that contains all the key/value pairs from our event. These key/value pairs are sent to our destination systems: in Splunk, they become index-time fields; in Elastic, they become the shape of the event; and for a FileSystem or S3 destination, they are serialized as JSON documents, one per line. The Drop function does not use the contents of the event, so the method simply returns null for every call. When Cribl receives a falsy return value, it drops the event.

Next Example: RegEx Filter

Now let’s introduce a slightly more sophisticated example. The next function, RegEx Filter, will drop an event if a given regular expression matches. This introduces some configuration into the function, allowing the user to input data. It implements both init and process, and ships conf.schema.json and conf.ui-schema.json for defining configurable variables.

First, let’s look at the biggest new item we’ve introduced, JSON Schema. If you’ve never heard of JSON Schema, check out their tutorial. We use React JSON Schema Form to render JSON Schema as forms. You can use their interactive playground to test forms and see what options are available. For RegEx Filter, we’ve introduced a simple schema which defines two config variables: regex which defines the RegEx we’ll execute against the data, and field which defines which field we’ll test for a RegEx match. Here’s the Schema JSON, contained in conf.schema.json:

{
 "type": "object",
 "title": "",
 "properties": {
    "regex": {
      "title": "Regex",
      "description": "Regex to test against",
      "type": "string",
      "regexp": true
    },
    "field": {
      "title": "Field",
      "description": "Name of the field to apply the regex on (defaults to _raw)",
      "type": "string",
      "default": "_raw"
    }
  }
}

This should seem straightforward. We are returning an object whose properties, regex and field, have various properties defined about them, including their title, description, type, and default values. Any JSON Schema will work here, including sophisticated examples we’ve seen in Swagger. For some more sophisticated examples in Cribl, look at the Mask or Lookup functions.

React JSON Schema Form also allows us to specify some information that isn’t covered by the data schema alone. The UI may need to differentiate a password field from a normal string field, for example. In this case, we’re defining the RegEx field to use a custom input type which will validate a Regular Expression, in conf.ui-schema.json:

{
  "regex": {
    "ui:widget": "RegexInput",
    "ui:placeholder": "Regular expression"
  }
}

The UI Schema matches a given field name, in this case regex, and tells the form to use a ui:widget of RegexInput. Now, let’s look at the code in index.js:

exports.name = 'Regex Filter';
exports.version = '0.1';
exports.group = 'Standard';

const { NamedGroupRegExp } = C.util; 

let regex;
let field = '_raw';
exports.init = (opts) => {
  const conf = opts.conf || {};
  regex = null;
  field = '_raw';

  if (conf.regex) {
    regex = new NamedGroupRegExp(conf.regex);
  }
  if (conf.field) {
    field = conf.field;
  }
};

exports.process = (event) => {
  if (regex) {
    regex.lastIndex = 0; // common trap of setting "global" flag
    return regex.test(event[field]) ? null : event;
  }
  return event;
};

The function is, again, quite simple. Most of the code is validating inputs to ensure the user has properly filled out regex and field. Let’s look at the new concepts. First, we declare module-level variables:

let regex;
let field = '_raw';

JavaScript is single-threaded, so we can safely declare state at the module level, where it persists across invocations of the Function’s process method. Next, we declare an init method, which is called with a single object. We name it opts; its conf property contains the key/value pairs configured by the user.

exports.init = (opts) => {
  const conf = opts.conf || {};
  regex = null;
  field = '_raw';

  if (conf.regex) {
    regex = new NamedGroupRegExp(conf.regex);
  }
  if (conf.field) {
    field = conf.field;
  }
};

React JSON Schema Form validates input provided by the UI, but users can configure via YAML or JSON configs, so we must also include validation in our functions to ensure we are not misconfigured. The majority of the code in init is validating that the user has inputted regex and field in the configuration. Now, let’s look at process:

exports.process = (event) => {
  if (regex) {
    regex.lastIndex = 0; // common trap of setting "global" flag
    return regex.test(event[field]) ? null : event;
  }
  return event;
};

Here again, we’re simply testing the value in field to see if it matches regex. If so, we return null; if not, we return the event unmodified.
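
Because a hand-edited YAML or JSON config can contain a pattern that does not compile, init may also want to guard the regex construction itself. The following is a sketch only. It uses JavaScript’s built-in RegExp in place of C.util.NamedGroupRegExp so that it runs outside LogStream, and the choice to pass events through untouched when the pattern is invalid is an assumption, not documented LogStream behavior.

```javascript
let regex = null;
let field = '_raw';

const init = (opts) => {
  const conf = (opts && opts.conf) || {};
  regex = null;
  // Fall back to _raw unless a non-empty string was configured.
  field = typeof conf.field === 'string' && conf.field ? conf.field : '_raw';
  if (typeof conf.regex === 'string' && conf.regex) {
    try {
      regex = new RegExp(conf.regex); // stand-in for NamedGroupRegExp
    } catch (err) {
      regex = null; // invalid pattern: leave the filter disabled
    }
  }
};

const filterEvent = (event) => {
  if (!regex) {
    return event; // nothing configured, or the pattern did not compile
  }
  regex.lastIndex = 0;
  return regex.test(event[field]) ? null : event;
};

exports.init = init;
exports.process = filterEvent;
```

With this guard, a config such as regex: "(" disables the filter instead of throwing during init.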

Reaching Out: Enriching Data using DNS

Lastly, let’s look at an example which shows a few more capabilities: asynchronous execution, reaching out to a third-party system, and modifying an event. This really shows the power of Cribl’s extensibility: custom user code can use information in an event, together with information fetched from elsewhere, to modify that event. Even though Cribl did not originally ship with this function, we can meaningfully extend LogStream to implement a use case that is currently difficult to do in all logging systems: do a DNS lookup at ingestion time instead of read time. This function is hosted in our content repo, under dns.

Note: Since first publishing this post, we’ve developed this example into the Reverse DNS (beta) out-of-the-box function that now ships with Cribl LogStream.

To keep it simple, this version of the function has no configuration; it simply enriches any IPv4 address it finds in the event’s _raw field. Our example function also does not support cache expiry, nor a few other features we’d likely implement for use beyond a demo. We’ve since enhanced it to make it more full-featured. But this original version shows how we enable users to extend LogStream with less full-featured implementations than Cribl would need in order to ship a generic version. Let’s look at the code:

exports.name = 'DNS Lookup';
exports.version = '0.1';
exports.group = 'Demo Functions';

const dns = require('dns');

const ipv4Regex = /(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/gm;
const cache = {};

function reverse(IP, midx) {
  if (!cache[IP]) {
    cache[IP] = {
      promise: new Promise((resolve, reject) => { // eslint-disable-line
        dns.reverse(IP, (err, hostnames) => {
          if (!err) {
            const value = [`dns${midx !== 1 ? midx.toString() : ''}`, hostnames.join(' ')]; // if idx is not 1, name field dns2, dns3, etc
            cache[IP].value = value;
            resolve(value);
          } else {
            resolve([]);
          }
        });
      }),
    };
    return cache[IP].promise;
  } else if (!cache[IP].value) {
    return cache[IP].promise;
  }
  return Promise.resolve(cache[IP].value);
}

exports.disabled = 0;
exports.asyncTimeout = 500; // ms
exports.process = (event) => {
  const promises = [];
  let matches;
  let matchIdx = 1;
  ipv4Regex.lastIndex = 0; // ensure this is properly reset
  while (matches = ipv4Regex.exec(event._raw)) {
    const midx = matchIdx;
    const IP = matches[0];
    promises.push(reverse(IP, midx));
    matchIdx++;
  }
  if (promises.length === 0) {
    return event;
  }
  return Promise.all(promises)
    .then((entries) => {
      // Failed lookups resolve to an empty array; skip them so we only set real fields
      entries.filter(e => e && e.length === 2).forEach(e => {
        event[e[0]] = e[1];
      });
      return event;
    })
    .catch(() => {
      return event;
    });
};

Our function sets up a few module-level variables: it imports Node’s dns module, creates a cache object, and defines a RegEx we will use for matching IPv4 addresses. Let’s look at our process implementation:

exports.process = (event) => {
  const promises = [];
  let matches;
  let matchIdx = 1;
  ipv4Regex.lastIndex = 0; // ensure this is properly reset
  while (matches = ipv4Regex.exec(event._raw)) {
    const midx = matchIdx;
    const IP = matches[0];
    promises.push(reverse(IP, midx));
    matchIdx++;
  }
  if (promises.length === 0) {
    return event;
  }
  return Promise.all(promises)
    .then((entries) => {
      // Failed lookups resolve to an empty array; skip them so we only set real fields
      entries.filter(e => e && e.length === 2).forEach(e => {
        event[e[0]] = e[1];
      });
      return event;
    })
    .catch(() => {
      return event;
    });
};

We first match all the instances of the IPv4 regex we find in the _raw field, which is hard-coded for this function. For each match, we add a promise to an array which we then pass to Promise.all. With Promise.all, our function will wait for all DNS resolutions to complete before calling our .then() implementation, which merges the DNS responses back into the event object itself before returning it. The meat of the function’s logic is in the reverse function we’ve implemented, which wraps Node’s dns.reverse in a promise:

function reverse(IP, midx) {
  if (!cache[IP]) {
    cache[IP] = {
      promise: new Promise((resolve, reject) => { // eslint-disable-line
        dns.reverse(IP, (err, hostnames) => {
          if (!err) {
            const value = [`dns${midx !== 1 ? midx.toString() : ''}`, hostnames.join(' ')]; // if idx is not 1, name field dns2, dns3, etc
            cache[IP].value = value;
            resolve(value);
          } else {
            resolve([]);
          }
        });
      }),
    };
    return cache[IP].promise;
  } else if (!cache[IP].value) {
    return cache[IP].promise;
  }
  return Promise.resolve(cache[IP].value);
}

This function first checks our module-level cache object, called cache. If the IP address is already there, the function returns a promise for the cached value (or the pending promise, if the lookup is still in flight). If not, it creates a new promise, which resolves when the asynchronous dns.reverse call returns. On success, the promise resolves to the field name and the hostnames; on error, it resolves to an empty array.
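
The demo omits cache expiry. One way expiry could be layered on is sketched below. cachedLookup, its ttlMs parameter, and the injectable now clock are hypothetical names introduced here; they are not part of LogStream or of the function above. The sketch keeps the same idea of caching the promise itself, so concurrent lookups for the same key share one request.

```javascript
const dns = require('dns');

// Hypothetical helper: memoize an async lookup with a time-to-live (TTL).
// `lookup` is any function returning a promise; `now` is injectable for testing.
function cachedLookup(lookup, ttlMs, now = Date.now) {
  const cache = new Map(); // key -> { promise, expires }
  return (key) => {
    const hit = cache.get(key);
    if (hit && hit.expires > now()) {
      return hit.promise; // in flight or completed: share the same promise
    }
    // The executor runs synchronously, so the lookup starts right away.
    // Failures (thrown or rejected) resolve to undefined, meaning "no answer".
    const promise = new Promise((resolve) => resolve(lookup(key)))
      .catch(() => undefined);
    cache.set(key, { promise, expires: now() + ttlMs });
    return promise;
  };
}

// Example wiring with Node's promise-based DNS API; nothing is resolved
// until reverseCached is actually called with an IP address.
const reverseCached = cachedLookup((ip) => dns.promises.reverse(ip), 5 * 60 * 1000);
```

An entry is replaced only when it is requested again after expiring, so a long-running process would still want some bound on the number of keys; that is left out of this sketch.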

As you can see, this is fairly straightforward. In fewer than 60 lines, we’ve implemented a meaningful extension to Cribl’s functionality.

Conclusion

There are hundreds of different use cases which can be easily implemented as Cribl functions. We don’t want to require everyone to invent their own implementations, so we’re launching a shared repo of functions that users have built to solve various use cases. In a version coming soon, you’ll be able to point Cribl at a URL for a repo on GitHub or BitBucket and import a function with a single click. For now, it’s simple to clone these repos and insert them into $CRIBL_HOME/local/cribl/functions, and the functions will show up in your UI upon restart.

Note: Prior to LogStream 1.7, this subdirectory was: $CRIBL_HOME/bin/functions.

What would you like Cribl to do, that it doesn’t do today? We’d love to collaborate on publishing a new extension to our content repo. We want everyone to be able to conceive of, and easily ship, their own ideas and share them with the community. We’d love to see your contributions, or file an issue and we’ll build you an implementation!
