Products
Product Portfolio

Cribl puts your IT and Security data at the center of your data management strategy and provides a one-stop shop for analyzing, collecting, processing, and routing it all at any scale. Try the Cribl suite of products and start building your data engine today!
Learn more ›

Evolving demands placed on IT and Security teams are driving a new architecture for how observability data is captured, curated, and queried. This new architecture provides flexibility and control while managing the costs of increasing data volumes.
Read white paper ›

Cribl Stream

Cribl Stream is a vendor-agnostic observability pipeline that gives you the flexibility to collect, reduce, enrich, normalize, and route data from any source to any destination within your existing data infrastructure.
Learn more ›

Vodafone Case Study

Vodafone Dials up Business Insights with Cribl Stream
Read Case Study ›

Cribl Edge

Cribl Edge provides an intelligent, highly scalable edge-based data collection system for logs, metrics, and application data.
Learn more ›

SpyCloud Edge Story

Listen to how SpyCloud uses Cribl Edge at scale.
Watch Video ›

Cribl Search

Cribl Search turns the traditional search process on its head, allowing users to search data in place without having to collect/store first.
Learn more ›

How Cribl Search Can Save You From Drowning in a Deluge of Data
Read Blog ›

Cribl Lake

Cribl Lake is a turnkey data lake solution that takes just minutes to get up and running — no data expertise needed. Leverage open formats, unified security with rich access controls, and central access to all IT and security data.
Learn more ›

Navigating the future of IT and Security Data management white paper
Read white paper ›

Cribl.Cloud

The Cribl.Cloud platform gets you up and running fast without the hassle of running infrastructure.
Learn more ›

Cribl.Cloud Solution Brief

The fastest and easiest way to realize the value of an observability ecosystem.
Read Solution Brief ›

Cribl Copilot

Cribl Copilot gets your deployments up and running in minutes, not weeks or months.
Learn more ›

Cribl Copilot

Your Trusted AI Advisor for Deploying, Configuring & Troubleshooting.
Read blog ›

AppScope

AppScope gives operators the visibility they need into application behavior, metrics and events with no configuration and no agent required.
Learn more ›

Sandbox

Launch an AppScope Sandbox today!
Launch Now ›
Solutions
Use Cases

Explore Cribl’s Solutions by Use Cases:

Supercharge Security Insights ›

Accelerate Cloud Migration ›

Avoid Vendor Lock-in ›

Agent Consolidation ›

Slash Storage Costs ›

Free Up Space for High-Value Data ›

Route From Any Source To Any Destination ›

Immediate Access to Archived Data ›

Replay Data from Low-Cost Storage ›

Reduce Log Volume & Pay Less for Infrastructure ›
Integration

Explore Cribl’s Solutions by Integrations:

Amazon ›

CrowdStrike ›

Elastic ›

Exabeam ›

Google ›

Microsoft ›

Splunk ›

Wiz ›

View All Integrations ›

Seamless Integrations for Your Observability Data
Learn More ›
Industries

Explore Cribl’s Solutions by Industry:

AIOps ›

Financial Services ›

Healthcare ›

Managed Security Services ›

Manufacturing and Logistics ›

Media and Entertainment ›

Public Sector ›

Retail ›
Resources
Resources

Resource Library ›

Documentation ›

Guides ›

AppScope Docs ›

Blog ›

Glossary ›

Podcasts ›

Telemetry 101

Understanding the Basics of Telemetry and Its Benefits
Learn More ›
Events & Webinars

Events ›

Webinars ›

CriblCon24
Watch On-Demand ›

July 31 | 10am PT / 1pm ET

Navigating the Data Current Report: Transforming IT & Security Operations in 2024
Register ›
Learning

Try the Sandboxes ›

Self Guided Trials ›

Cribl University ›

Cribl Community ›

Cribl Curious Forum ›

What is Observability? ›

Try Your Own Cribl Sandbox

Experience a full version of Cribl Stream and Cribl Edge in the cloud.
Launch Now ›
Tools & Pricing

Download Library ›

Past Releases ›

Pricing Plans ›

Stream ROI Calculator ›

Download Library

Download Cribl’s suite of products for free to get started.
Download ›
Customers
Customer Stories

Get inspired by how our customers are innovating IT, security and observability. They inspire us daily!
Read Customer Stories ›

Sally Beauty Holdings

Sally Beauty Swaps LogStash and Syslog-ng with Cribl.Cloud for a Resilient Security and Observability Pipeline
Read Case Study ›
Customer Experience

Support & Success ›

Professional Services ›

Service Delivery Partners ›

Documentation ›

AppScope Docs ›

Professional Services

Check out our new Professional Services offering.
Learn More ›
Learning

Try the Sandboxes ›

Self Guided Trials ›

Cribl University ›

Cribl Community ›

Cribl Curious Forum ›

Try Your Own Cribl Sandbox

Experience a full version of Cribl Stream and Cribl Edge in the cloud.
Launch Now ›
Company
About Cribl

Transform data management with Cribl, the Data Engine for IT and Security
Learn More ›

Cribl Corporate Overview

Cribl makes open observability a reality, giving you the freedom and flexibility to make choices instead of compromises.
Get the Guide ›

Cribl Newsroom

Stay up to date on all things Cribl and observability.
Visit the Newsroom ›

Press Releases

Read our most recent press releases.
Recent Press Releases ›

Leadership

Cribl’s leadership team has built and launched category-defining products for some of the most innovative companies in the technology sector, and is supported by the world’s most elite investors.
Meet our Leaders ›

Careers

Join the Cribl herd! The smartest, funniest, most passionate goats you’ll ever meet.
Learn More ›

Cribl Named to the Inc. 5000 List of Fastest Growing Private Companies
Learn More ›

Cribl for Startups

Whether you’re just getting started or scaling up, the Cribl for Startups program gives you the tools and resources your company needs to be successful at every stage.
Learn More ›

Contact Us

Want to learn more about Cribl from our sales experts? Send us your contact information and we’ll be in touch.
Talk to an Expert ›

Try Cribl Talk to an expert

Context is King: Turning Ugly Logs into Rich Structured Events

Written by Clint Sharp

March 4, 2019

Logs themselves often do not contain the necessary information in themselves to point an investigator in the right direction. Let’s say I’m troubleshooting a performance issue with my application. I may want to dig through all kinds of data sources, like proxy logs, web access logs or custom instrumentation. But, in these data sources, proxy logs show IP Address, web access logs use hostname and my custom instrumentation contains the process name. How do I analyze data by what’s being reported, which is a particular application slowness?

Different investigations require the analyst to go digging through other systems to get information to key into the search bar of their log analytics tool. Mapping hosts to applications and writing long lists of ORs by hand. Trying to understand which class of customers is having a problem may be impossible. Knowing a security group accidentally left a port open is a far cry from knowing whether any traffic tried to hit that port.

What’s missing, is context. Meaningful analysis can only be conducted if the analyst is given the ability to follow different lines of inquiry. Understanding the health of an application or its security posture requires examining data from many sources, which need to be able to be correlated and grouped together by the analyst. Logs are missing this context, but the logs often contain the information required to look up this information.

Cribl allows you to enrich events at ingestion time, no matter their destination system. In the Splunk, ecosystem this is a fundamentally new capability. In the Elastic ecosystem, Logstash provides this capability, but it’s so hard to do that few do. For the first time, Cribl provides a UX which allows you to interactively reshape and enrich the data to give you high confidence the data will look the way you want it after it is run through Cribl.

In this post, we’ll walk you through an interactive demo showing how to take an ugly old Web Access Log, structure it, and turn it into a rich business event stream through a Cribl pipeline which:

Parses the web access log
Looks up the username in Redis based on the session ID in the query string
Queries a REST API to lookup customer information from the username

If you prefer video, here’s the rest of the post in video form. Otherwise, keep reading.

Query Time Lookup

“Wait!” you say, Splunk has provided query time lookups for a long time. This is true, and Splunk’s lookup ability has taken it very far. Query time lookups have some advantages. They do not bloat the stored events, although that inefficiency can be optimized away with a smarter data store. You can update the values returned at any time, no reprocessing of data required.

However, there are downsides. Query time joins consume resources and require redoing the same work for every query. Query time joins can fail to work well at large scale, and they can require replicating around very large caches of state information to every node which services queries. Most importantly, this demo highlights why ingestion time enrichment is a critical capability. In this demo, we lookup a session ID in Redis to map from the session ID to the user who’s session it is. This is temporal operational state that’s maintained by the application. In order for query time joins to offer the same functionality, it is necessary to maintain a table of every session ID to every user for essentially all time … a massive challenge.

Get the Demo Environment

To prove out this use case, we’ve built out a demo environment which does lookups against running backend systems to be as realistic as possible. We’ve built these as custom functions in Cribl and they’re available in GitHub. You can run this demo easily using Docker by cloning the cribl-enrich-demo repo and running docker compose up -d. Open your web browser to http://localhost:9000/pipelines/parse-and-enrich and use username admin and password admin.

This environment is fully functional, but by default it’s setup to drop all data. This can easily be changed, and we’d encourage you to setup an output and send the data to one of our destinations, or even just to the filesystem to see how it works.

Parsing Weblogs

In this demo, we have a data generator container sending data to Cribl which in turn queries Redis and a custom REST API to access enrichment information. The data generator is generating 10 events per second of weblog entries, which in their raw form look like:

27.96.128.0 - - [18/Feb/2019:07:02:51 +0000] "GET /static/3700.jpg?&JSESSIONID=SD4528254410$SL6599261649FF1314707154ADFF1622727863 HTTP/1.1" 200 1827 "/category.screen?category_id=DISPLAYPROTECT" "Mozilla/5.0 (Linux; U; Android 2.3.3; en-us; Sprint APA9292KT Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1" 142

Log data has a ton of variety. Some logs are streams of human readable text which describe an event that happened. Some logs have a mix of performance information scattered amongst human readable text. Web Access Logs are of the variety that are largely purely structured information, like CSV or JSON, but serialized in a format, called Common Log Format, that at first glance looks like unstructured text. There are many other examples of these types of structured logs, like flow logs or firewall logs.

That raw log line has a ton of structure, but that structure is encapsulated in another structure which contains the metadata of the log event. In Cribl, we represent this as JSON, and it looks like:

{
  "_raw":"125.17.14.100 - - [Jan 03 07:40:42.820480] \"GET /oldlink?item_id=EST-12&JSESSIONID=SD5785900364$SL1443342661FF9130604439ADFF1842236881 HTTP 1.1\" 200 970 \"/category.screen?category_id=BLUETOOTH\" \"Mozilla/5.0 (iPad; CPU OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A405\" 167",
  "_time":"1546501240",
  "host":"web-01.bar.com",
  "index":"cribl",
  "source":"/opt/apache/log/access_combined.log",
  "sourcetype":"access_combined"
}

What we want to do first, is lift up the structure from the weblog into the structure of the event. For this, we’re going to use Cribl’s Parser function, which ships with support for Apache Access Combined format. In your copy of Cribl, you’ll see the first two functions in our pipeline are Parser functions. Let’s look at the first in detail.

There’s a lot of options here, which may look intimidating, but to configure web log parsing from scratch, the following steps were all I needed:

Click Add Function -> Standard -> Parser
Enter a Description
From Parser Library, select “Apache Combined Log Format”
Since my log has one additional field, I added “timetaken” to the end of the List of Fields

Now, we’re going to use Cribl’s Preview functionality to see if Parsing is actually working the way we want. As the way we ship the demo, all functions except the first are disabled by default so you can turn them on one by one and see their effect. On the right hand side, we’re going to start a capture to grab some events running through our pipeline and see the effects of our Parser function. On the right hand side of the UI, click Preview > Start a Capture.

This will pop open a dialog to run a capture. There are a number of options in this dialog, but we don’t need to concern ourselves with any of them right now. Click “Start a capture”, which will run for a few seconds, and then show you the events it captured from the stream. Once you see events, click “Save”.

Now we have events in Cribl we can interactively work with. On the right hand side, you’ll see events as they come into the pipeline. They’ll have a simple structure, _raw, _time, and a few other pieces of metadata. On the top of the Preview sidebar, click “Out” to see what events are looking like after the pipeline has modified them. You should see a screen like this.

The UI will show you a diff of what’s changed or added to a particular event. We can see the parser has added a number of fields (highlighted in green) it’s extracted from the web log like status, timetaken, useragent, and details about the request. There’s a few additional things we want to do with this data. First, we want to get our Session ID out of the request_uri_query and into a field to use as the key for a lookup. Secondly, there are a number of numerical fields in the data which are currently strings, and we need to convert those to numbers so that the downstream systems can interpret the data correctly. Enable the next two functions in the pipeline, Parser & Numerify, and click “Save” on the left hand side. Your data should now look like this.

If you expand the second Parser function, you’ll see we’re running our Key=Value parser against the request_uri_queryfield which adds JSESSIONID to our event. Our bytes, timetaken and status are now properly numeric fields as well. Now, let’s enrich our events.

Enriching Events

In our hypothetical running application, there is a Redis server which contains mappings of session IDs to users to allow the web servers to be stateless and retrieve that information from Redis. In our log processing pipeline, we’re going to use Redis to do that same mapping to get from a session ID to a username. Once we have the username, we’re going to hit an internal REST API which returns customer information based on a given username.

There are a few things to consider before doing this in production. For every log line you’re going to be reaching out to Redis and hitting a REST API. Given each web session generates dozens to hundreds of web access log entries, you could easily DoS Redis or the API. In the demo functions I wrote, I do no caching, no retries, error handling, etc, so I wouldn’t necessarily call them production ready. That being said, they are a good base to build off of. But, I have personally implemented this exact strategy in production for business event type of data, using multiple summary indexing steps, back in my days as a Splunk customer, so it is a feasible approach and will work at scale with the right planning.

In the Cribl UI, enable the Redis Lookup function and hit save. You’ll notice in the preview, you should now see a userName field which is being retrieved from Redis. Let’s look at how the Redis Lookup function is configured.

This function really shows the power of Cribl’s expandability. You can look at the source code and configuration of the function if you like to see how it was implemented. conf.schema.json defines how we want to be able to configure our function. The UI for configuration is automatically generated from the schema. index.js implements the lookup. In this case, we’re connecting a host named redis which is also what our Redis docker container is named. We’re going to lookup the value of JSESSIONID in redis and store the results in userName.

Now that we have the username, we’re going to use that to look up customer information from our REST API. Enable the REST Lookup on the left and hit save. Now you will see we have a new field in our event which is a nested object, called customer.

That’s a huge amount of context! We can now see where our customers are located, gender, and age. With this context, we can now group our access log information by a much richer set of information. To see this work in action, I’m going to load this data into an analytics tool. One of my favorites for looking at performance information is Honeycomb.

Sending and Analyzing Rich Events

In our hypothetical environment, my friend Michael Wilde called me to say my service was running slow. I haven’t heard about this from anywhere else, and my dashboards are all green. I already have a free Honeycomb account, and I want to send my enriched logs into my enrich_demo dataset. In Cribl, I go to Destinations > Honeycomb > Add New. I enter my dataset and team name. Also, under advanced, I select “honeycomb-shape” for my conditioning pipeline. This pipeline will flatten the customer object to work better in Honeycomb and drop the customer and _raw fields. After saving, the UI prompts me to make it my default destination, and I select OK to start sending the data out to Honeycomb.

I give it about an hour to get some data going in Honeycomb. To start with, I
add P95(timetaken)and HEATMAP(timetaken) calculations to get a look at my performance data.

The P95 looks normal, but the heatmap shows a few outliers. I want to get some ideas of what dimensions might be correlated to the slow performance, so I use their new BubbleUp feature to get a view on a those high timetaken entries split by a number dimensions.

Looks like all of them are in South Lake Tahoe. At their helpful suggestion I break down by customerCity, customerState, and userName. Username is a very high cardinality field, one of Honeycomb’s differentiators. Honeycomb brings me back data sorted by P95(timetaken) in descending order, and sure enough right there in the table is my friend Michael Wilde. I hear the weather up in Tahoe is terrible the last few days, maybe they’re all stuck indoors watching Netflix and chewing up all the bandwidth?

Wrapping Up

Logs are hugely valuable data source, but they’re often in need of additional context that can be found elsewhere in other data sources. Cribl allows you to bring structure to your unstructured logs and use that information to build rich structured events. This context can be hugely valuable for troubleshooting, make it easy to look across your data in ways you hadn’t originally anticipated. There are a number of other data sources we could use enrichment to allow us to look at the data differently.

Analyze AWS logs by security group
DNS Logs by threat level
Analyze web logs by customer type
Access logs for a department

In all of those examples, we need to join our original log to some other data source to be able to split the data by these fields. Once enriched, these logs enable fundamental new use cases which have been difficult if not impossible previously.

We’d love to hear from you on your ideas! Join us in Slack #cribl, tweet at us @cribl_io, or contact us via hello@cribl.io.

Return to Cribl Blog

Additional Reading

Preventing Friction With an Impactful Security Champions Program

Liam McGovern Jul 24, 2024

Securing the Foundation of Cribl Copilot

Zach Rayburn Jul 22, 2024

Product Portfolio

Cribl Stream

Cribl Edge

Cribl Search

Cribl Lake

Cribl.Cloud

Cribl Copilot

AppScope

Use Cases

Integration

Industries

Resources

Events & Webinars

Learning

Tools & Pricing

Download Library

Customer Stories

Customer Experience

Learning

Try Your Own Cribl Sandbox

About Cribl

Cribl Newsroom

Leadership

Careers

Cribl for Startups

Contact Us