Trimming Unnecessary Fields from Logs

April 23, 2019

The author of a log has very different motivations from the consumer of that same log. The author must conceive of all the use cases this data may serve and include information that will be relevant both now and in the future. To build the most relevant log possible, the author must ask a number of important questions:

Is this log going to be used for operations, security or both?

Should we be selective about what to log or simply serialize this whole object in memory?

Are there future use cases I should plan for now by including more data?

Should I make it more machine-friendly or more human-friendly to read?

The end result is often logs which are a least common denominator: they work for both ops and security, and include all potential information for current use cases and for any the author may have considered for the future. Vendors are often in the worst position here. Who knows what customers may want 2 or 3 years down the road? Changing formats is expensive, so it's best to include everything up front. Cisco eStreamer, for example, has 94 fields in every log entry.

As the consumer of log data, you often need only a fraction of the information for your given use case. You may be interested only in security data while the performance information is irrelevant to you, or vice versa. Fields your developers wrote out may be of interest only to them, not to your operational use case. Fixing overly verbose events has historically required going back to the developer to ask them to log less, or, in the case of vendor data, just ingesting it and being forced to live with it. Cribl puts the log administrator in control, giving you the ability to easily reshape data to meet your needs.

In this post, we’ll show how to use the Reserialize mode of Cribl’s Parser function to read structured data and write it back with less information, minimizing ingestion volume and avoiding storage and processing costs.

That’s a lot of JSON!

In today’s world, the most common example of over-verbosity is structured JSON logs. JSON logging provides a huge advantage in that we no longer have to guess at how to parse our log entries. It’s easy for developers to write, as nearly all languages have an equivalent of JavaScript’s JSON.stringify(). However, because serializing objects to JSON is so easy, developers tend to do little reshaping or trimming before that call to JSON.stringify(), leading to overly verbose logs. Here’s an example of a log entry I pulled at random from a log in S3 in our account, originally emitted by an open source application we were running:

{
  "resource": "/done",
  "path": "/done",
  "httpMethod": "POST",
  "headers": {
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "CloudFront-Forwarded-Proto": "https",
    "CloudFront-Is-Desktop-Viewer": "true",
    "CloudFront-Is-Mobile-Viewer": "false",
    "CloudFront-Is-SmartTV-Viewer": "false",
    "CloudFront-Is-Tablet-Viewer": "false",
    "CloudFront-Viewer-Country": "US",
    "content-type": "application/json",
    "Host": "jef18742ajw.execute-api.us-west-2.amazonaws.com",
    "origin": "https://cdn.foo.com",
    "Referer": "https://cdn.foo.com/bugger/index.html",
    "User-Agent": "Mozilla/5.0 (Linux; U; Android 2.3.5; zh-cn; MI-ONE Plus Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
    "Via": "2.0 nb7CCG2p7MZ5igcMI58VkqWRlxUQTrCap.cloudfront.net (CloudFront)",
    "X-Amz-Cf-Id": "pV3t3VVKegwwv5sRvVrfX9UgHsXiC9weFGmKSOOkWUCbUE9gbHepDjoV",
    "X-Amzn-Trace-Id": "Root=aD2sYM7atV811DKgTMxMmp6UZuZg7sky3v",
    "X-Forwarded-For": "",
    "X-Forwarded-Port": "443",
    "X-Forwarded-Proto": "https"
  },
  "multiValueHeaders": {
    "Accept": [
      "application/json, text/javascript, */*; q=0.01"
    ],
    "Accept-Encoding": [
      "gzip, deflate, br"
    ],
    "Accept-Language": [
      "en-US,en;q=0.9"
    ],
    "CloudFront-Forwarded-Proto": [
      "https"
    ],
    "CloudFront-Is-Desktop-Viewer": [
      "true"
    ],
    "CloudFront-Is-Mobile-Viewer": [
      "false"
    ],
    "CloudFront-Is-SmartTV-Viewer": [
      "false"
    ],
    "CloudFront-Is-Tablet-Viewer": [
      "false"
    ],
    "CloudFront-Viewer-Country": [
      "US"
    ],
    "content-type": [
      "application/json"
    ],
    "Host": [
      "jef18742ajw.execute-api.us-west-2.amazonaws.com"
    ],
    "origin": [
      "https://cdn.foo.com"
    ],
    "Referer": [
      "https://cdn.foo.com/bugger/index.html"
    ],
    "User-Agent": [
      "Mozilla/5.0 (Linux; U; Android 2.3.5; zh-cn; MI-ONE Plus Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"
    ],
    "Via": [
      "2.0 4H2N7SZPy76JchKSA42VLdO0Go5deev92.cloudfront.net (CloudFront)"
    ],
    "X-Amz-Cf-Id": [
      "pV3t3VVKegwwv5sRvVrfX9UgHsXiC9weFGmKSOOkWUCbUE9gbHepDjoV"
    ],
    "X-Amzn-Trace-Id": [
      "Root=aD2sYM7atV811DKgTMxMmp6UZuZg7sky3v"
    ],
    "X-Forwarded-For": [
      ""
    ],
    "X-Forwarded-Port": [
      "443"
    ],
    "X-Forwarded-Proto": [
      "https"
    ]
  },
  "queryStringParameters": null,
  "multiValueQueryStringParameters": null,
  "pathParameters": null,
  "stageVariables": null,
  "requestContext": {
    "resourceId": "mwPK97",
    "resourcePath": "/done",
    "httpMethod": "POST",
    "extendedRequestId": "uoYMpT7pQeiwpMTj",
    "requestTime": "22/Apr/2019:15:02:56 +0000",
    "path": "/prod/done",
    "accountId": "718298359019",
    "protocol": "HTTP/1.1",
    "stage": "prod",
    "domainPrefix": "jef18742ajw",
    "requestTimeEpoch": 1555970576,
    "requestId": "3070f015-8f93-4bc0-a06a-89ab501b0178",
    "identity": {
      "cognitoIdentityPoolId": null,
      "accountId": null,
      "cognitoIdentityId": null,
      "caller": null,
      "sourceIp": "",
      "accessKey": null,
      "cognitoAuthenticationType": null,
      "cognitoAuthenticationProvider": null,
      "userArn": null,
      "userAgent": "Mozilla/5.0 (Linux; U; Android 2.3.5; zh-cn; MI-ONE Plus Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
      "user": null
    },
    "domainName": "jef18742ajw.execute-api.us-west-2.amazonaws.com",
    "apiId": "jef18742ajw"
  },
  "body": "{\"done\":\"true\"}",
  "isBase64Encoded": false
}

This is far from the biggest JSON log message we see in the wild; in fact, I’d say it’s about average. This log message was not engineered to show verbosity, it’s just a normal message you’ll find just about anywhere. If you look through the whole message, you’ll see a number of areas where we can optimize:

  1. Headers are repeated as headers, an object, and multiValueHeaders, an object filled with arrays.
  2. There are numerous values set to null.
  3. Some fields are mutually exclusive, like the CloudFront-Is-* headers, of which only one can be true.

All of the above can be dropped safely. As long as the value is present in some event, Splunk’s fillnull command, or the equivalent in other systems, can fill missing values back in at search time with null, false, or another default. Let’s see how to configure Cribl to drop these unnecessary keys.
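The drop-and-restore round trip can be sketched in plain JavaScript (the helper names here are our own illustration, not a Cribl or Splunk API): null-valued keys are dropped at ingest, then filled back with a default at search time, recovering an equivalent event.

```javascript
// Drop keys whose value is null before indexing the event.
function dropNulls(obj) {
  return Object.fromEntries(
    Object.entries(obj).filter(([, v]) => v !== null)
  );
}

// At search time, restore dropped keys with a default,
// like Splunk's fillnull does.
function fillNull(obj, keys, defaultValue = null) {
  const out = { ...obj };
  for (const k of keys) {
    if (!(k in out)) out[k] = defaultValue;
  }
  return out;
}

const event = { sourceIp: "", caller: null, accessKey: null, user: null };
const trimmed = dropNulls(event);
// trimmed is { sourceIp: "" } -- only the non-null key survives
const restored = fillNull(trimmed, ["caller", "accessKey", "user"]);
// restored is equivalent to the original event
```

Note that the empty string for sourceIp survives the trim: only genuine nulls are dropped, so nothing that carries information is lost.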

Reserializing FTW

In 1.4, we added a new Parser function which has a mode called Reserialize. Reserialize takes structured data like CSV, JSON, or other structured formats and allows us to modify the contents of that data while keeping the structure. For example, in CSV, if we wanted to drop a field, we would keep the header row and the field positions the same while setting the value to empty. For JSON, we can drop key and value pairs.
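Conceptually, for the JSON case, reserializing looks like the following sketch (our own illustration of the idea, not Cribl's implementation): parse the raw event, remove fields, and write the same structure back out.

```javascript
// A minimal sketch of JSON reserialization: parse the raw event,
// drop unwanted top-level keys, and serialize the remainder back
// out so downstream consumers still receive valid JSON.
function reserialize(raw, fieldsToRemove) {
  const obj = JSON.parse(raw);
  for (const field of fieldsToRemove) {
    delete obj[field];
  }
  return JSON.stringify(obj);
}

const raw = '{"path":"/done","httpMethod":"POST","stageVariables":null}';
const out = reserialize(raw, ["stageVariables"]);
// out is '{"path":"/done","httpMethod":"POST"}'
```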

For this use case, let’s use Parser to drop the duplicate header field and drop fields which are set to null or false. Here’s an example of the Parser function fully configured. We’ll walk through the configuration in detail:

For the Filter, we’re filtering to just records with sourcetype=='lambda'. For Parser Mode, we’ve set it to Reserialize. For Fields To Remove, we’ve set it to multiValueHeaders.*. Note, we support a dotted notation for nested fields, and .* drops everything under the multiValueHeaders object.

The next configuration, Fields Filter Expression is a bit more complex. From the docs:

Fields Filter Expression: Expression evaluated against {index, name, value} context of each field. Return truthy to keep, falsy to remove field. Index is zero based.

What we want our expression to do is drop every field whose value is null, and drop any field whose name starts with CloudFront-Is when its value is 'false'. Since the expression must return truthy for all fields we want to keep, we invert those conditions. The full expression we’ve written:

!(name.startsWith('headers.CloudFront-Is') && value === 'false') && value !== null

This returns falsy, dropping the field, when the value is null, or when the name starts with headers.CloudFront-Is and the value is 'false'; it returns truthy for everything else. With Cribl’s interactive preview, it’s easy to see if we got this right:
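To check our reading of it, here's the same expression as a plain JavaScript predicate, evaluated against a few name/value pairs the way Parser evaluates it against each field (note the CloudFront header booleans arrive as the strings 'true' and 'false', hence the string comparison):

```javascript
// The filter expression from the Parser config, as a predicate:
// return truthy to keep the field, falsy to remove it.
const keep = (name, value) =>
  !(name.startsWith('headers.CloudFront-Is') && value === 'false') &&
  value !== null;

keep('headers.CloudFront-Is-Desktop-Viewer', 'true');  // true  -> kept
keep('headers.CloudFront-Is-Mobile-Viewer', 'false');  // false -> dropped
keep('requestContext.identity.caller', null);          // false -> dropped
keep('headers.Host', 'jef18742ajw.execute-api.us-west-2.amazonaws.com'); // true -> kept
```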

In addition to seeing the specific fields you’re dropping, Cribl’s Preview UI also gives you some statistics on the difference in Data Volume:

With very minimal effort, we’ve easily trimmed 47% from the raw event.
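The statistic itself is just a ratio of serialized sizes. A sketch (not how Cribl computes its Preview numbers):

```javascript
// Percent trimmed between the original raw event and the
// reserialized one, measured by string length.
function percentTrimmed(before, after) {
  return Math.round((1 - after.length / before.length) * 100);
}

// A 1000-byte event reserialized down to 530 bytes is a 47% trim.
percentTrimmed('x'.repeat(1000), 'x'.repeat(530)); // -> 47
```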

Just Scratching the Surface

This is just a simple example. There are other techniques, like suppressing duplicate values or using lookups, that we could use to eliminate verbose fields like User-Agent. Trimming high-cardinality fields like session or trace IDs, if they’re not being used, can have a dramatic impact on the size of data on disk.
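As a sketch of the lookup idea (a hypothetical table of our own, not a specific Cribl feature): a long User-Agent string can be replaced at ingest with a short ID, with the reverse table kept for expansion at search time.

```javascript
// Hypothetical lookup: map long User-Agent strings to short IDs at
// ingest; the table doubles as the reverse lookup for search time.
const uaTable = new Map();
let nextId = 0;

function compressUserAgent(ua) {
  if (!uaTable.has(ua)) uaTable.set(ua, `ua-${nextId++}`);
  return uaTable.get(ua);
}

const ua = "Mozilla/5.0 (Linux; U; Android 2.3.5; zh-cn; MI-ONE Plus Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1";
compressUserAgent(ua); // -> "ua-0", a few bytes instead of ~150
compressUserAgent(ua); // -> "ua-0" again: the duplicate is suppressed
```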

If you want to see a live example of this pipeline, check out our demo environment on GitHub. The instructions there make it easy to see this live and work with it yourself, on your laptop or wherever you’d like.
