…Like a Multi-Tool For Your Observability Pipeline

April 14, 2020

Categories: Engineering, Learn

In my last post, I focused on a specific use case for routing observability data: separating retention from analysis. That’s just one of the many tools that become available to you by inserting a routing mechanism into your observability pipeline, and in this post, I’m going to take a look at a number of other capabilities that processing log data “in the stream” can provide.

Supporting Multiple Analysis Tools

The IT team uses one tool for log analysis and another one for metrics. The security team uses yet another tool for Security Information and Event Management (SIEM), and the development teams have additional tooling for product logs, errors, and metrics. Unfortunately, each of these tools has its own mechanism for ingesting data, and they’re isolated from each other. The result is multiple “agents” installed on systems just to feed the tools, each with its own overhead.

Imagine reducing that agent count to one or two and feeding *all* of those analysis tools from a single pipeline, transforming data as appropriate for each tool. The same pipeline can now feed network device data to the developers’ tooling, allowing the developers to correlate app errors with the server’s switch port flapping, leading to quick resolution. And where two tools were once reporting different values for the same metric, they now show the same value, simply because each tool is getting the same data.
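
To make the idea concrete, here is a minimal Python sketch of one event being fanned out to several tools, each with its own filter and output format. The destination names, the `source` field, and the two formatter functions are all hypothetical, made up for illustration; this is not how any particular product implements routing.

```python
import json

def to_json_line(event):
    """Serialize the event as a single JSON line."""
    return json.dumps(event, sort_keys=True)

def to_key_value(event):
    """Serialize the event as space-separated key=value pairs."""
    return " ".join(f"{k}={v}" for k, v in sorted(event.items()))

# Hypothetical routing table: each destination decides which events it
# accepts and how it wants them formatted.
ROUTES = [
    {"name": "it_log_analysis", "accepts": lambda e: True, "format": to_json_line},
    {"name": "developer_tooling", "accepts": lambda e: e["source"] in {"app", "network"}, "format": to_key_value},
    {"name": "siem", "accepts": lambda e: e["source"] in {"network", "auth"}, "format": to_json_line},
]

def route(event, routes=ROUTES):
    """Return a mapping of destination name to the formatted event."""
    return {r["name"]: r["format"](event) for r in routes if r["accepts"](event)}

port_flap = {"source": "network", "device": "sw-core-01", "message": "port flapping"}
deliveries = route(port_flap)
for name, payload in deliveries.items():
    print(name, "<-", payload)
```

Because every destination is fed from the same event, the developers' tooling sees the same port flap the IT and security tools see, just in its own format.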

Data Enrichment

Context is King. A lot of the data we get, in the form of logs, is barely useful without context. Take the port flap mentioned above. A flapping port doesn’t matter unless it’s connected to something important, and the log entry for that port flap is not going to tell you what it’s connected to. What if you could add data from your CMDB to that line, like the server that port’s connected to, the application that it runs, and the business process that it supports? Now you’ve got the context you need to understand the impact and respond accordingly.
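
As a rough illustration, the CMDB enrichment could look something like the Python sketch below. The `CMDB` dictionary, its keys, and the event field names are assumptions invented for this example; a real CMDB would be queried through whatever export or API it offers.

```python
# Hypothetical CMDB extract keyed by (switch, port); all values are made up.
CMDB = {
    ("sw-core-01", "Gi1/0/24"): {
        "server": "app-db-03",
        "application": "order-processing",
        "business_process": "online checkout",
    },
}

def enrich_with_cmdb(event, cmdb=CMDB):
    """Return a copy of the event with CMDB context added, if any is found."""
    context = cmdb.get((event.get("switch"), event.get("port")), {})
    return {**event, **context}

port_flap = {"switch": "sw-core-01", "port": "Gi1/0/24", "message": "link flap detected"}
enriched = enrich_with_cmdb(port_flap)
print(enriched)
```

Events with no matching CMDB entry pass through unchanged, so the lookup never drops data.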

Or say you have a huge amount of one kind of log data, but you only care about a subset of it, defined by external information, and ingesting all of it into your analysis system is prohibitively expensive. This is exactly the situation one of our customers found themselves in: they had too much DNS log data to ingest, but they really only cared about the records that didn’t match “trusted” domains. So they enriched the data with a list of trusted domains and filtered out the records from those domains, ingesting only the log data they needed for analysis. This reduced their ingestion requirement by orders of magnitude, making it an affordable approach for them.
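
A minimal Python sketch of that enrich-then-filter pattern might look like this. The trusted list, the record layout, and the `query` field name are hypothetical, and the customer's actual implementation is not described in this post.

```python
TRUSTED_DOMAINS = {"example.com", "example.org"}  # hypothetical trusted list

def is_trusted(query_name, trusted=TRUSTED_DOMAINS):
    """True if the queried name is a trusted domain or a subdomain of one."""
    name = query_name.rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in trusted)

def enrich_and_filter(records):
    """Tag each record with a 'trusted' flag and keep only untrusted ones."""
    for record in records:
        enriched = {**record, "trusted": is_trusted(record["query"])}
        if not enriched["trusted"]:
            yield enriched

dns_records = [
    {"client": "10.0.0.5", "query": "www.example.com."},
    {"client": "10.0.0.7", "query": "login.suspicious.test"},
    {"client": "10.0.0.9", "query": "notexample.com"},
]
kept = list(enrich_and_filter(dns_records))
print(kept)
```

Matching on the full domain or a dot-prefixed suffix keeps a lookalike such as `notexample.com` from being treated as trusted.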

Another great use case is adding GeoIP information to the data as it comes in. Sure, you can do that at search time in Splunk, but if you have multiple tools, you have to figure out how to do that in all of them. If you do that lookup before sending it to the downstream systems, it only has to be done once, and all downstream systems benefit from it. Less maintenance and consistent results across the board.

Metric Generation

Often, log files contain incredibly valuable information, but it needs to be extracted from the log entry and aggregated to be useful. Weblog entries, for example, are rarely individually valuable. While what someone is looking for might vary, it’s usually the metrics about access that matter, not the individual accesses. For example, let’s say you have 1000 lines of weblog data, similar to this:

128.241.220.82 - - [03/Apr/2020:20:30:05 +0000] "GET /static/jquery.js?&JSESSIONID=SD2581716739$SL2122330098FF8932042391ADFF3720110694 HTTP/1.1" 200 2484 "/cart.do?action=view&itemId=EST-16&product_id=MC-SANDISK-MICROSD16GB" "Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3"
64.66.0.20 - - [03/Apr/2020:20:30:05 +0000] "GET /product.screen?product_id=HS-MONST-NERGY&JSESSIONID=SD1837132548$SL4493168124FF7003251314ADFF2222394401 HTTP/1.1" 404 3818 "/product.screen?product_id=BT-HS-JAWB-ICONTHD" "BlackBerry9300/5.0.0.955 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/102" 
12.130.60.4 - - [03/Apr/2020:20:30:03 +0000] "POST /category.screen?category_id=ACCESSORIES&JSESSIONID=SD8687719920$SL6155682857FF6085796020ADFF1246778254 HTTP/1.1" 400 2967 "/product.screen?product_id=CC-T11-ZAGG-FOLIO" "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5" 
130.253.37.97 - - [03/Apr/2020:20:30:05 +0000] "POST /product.screen?product_id=BT-SP-JAWB-JAMBOXBIG&JSESSIONID=SD8401052943$SL2691867954FF9065133477ADFF6965824981 HTTP/1.1" 404 722 "/cart.do?action=addtocart&itemId=EST-12&product_id=BA-HTC-REZOUND" "Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; de-de) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5"
194.8.74.23 - - [03/Apr/2020:20:30:05 +0000] "GET /cart.do?action=changequantity&itemId=EST-18&product_id=BA-MOPHIE-JUICEPACKPLUS&JSESSIONID=SD8190965089$SL7522258463FF7229085117ADFF6846367911 HTTP/1.1" 200 3758 "/category.screen?category_id=MEMORYCARDS" "Mozilla/5.0 (iPad; U; CPU OS 4_3_5 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"
125.17.14.100 - - [03/Apr/2020:20:30:05 +0000] "GET /static/6051.jpg?&JSESSIONID=SD1073290485$SL6642531837FF5469045339ADFF7796274172 HTTP/1.1" 200 846 "/category.screen?category_id=CHARGERS" "Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; T-Mobile G2 Build/GRJ22) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"
130.253.37.97 - - [03/Apr/2020:20:30:04 +0000] "GET /product.screen?product_id=AC-MOTO-HOTSPOT4G&JSESSIONID=SD8576365728$SL9394190596FF4303629878ADFF2344394698 HTTP/1.1" 200 2410 "/category.screen?category_id=BLUETOOTH" "Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; DROID3 Build/5.5.1_84_D3G-55) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"
27.175.11.11 - - [03/Apr/2020:20:30:02 +0000] "GET /static/9403.jpg?&JSESSIONID=SD7103245756$SL6669302782FF9250881909ADFF9216942956 HTTP/1.1" 200 3346 "/category.screen?category_id=BLUETOOTH" "Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3"
195.69.160.22 - - [03/Apr/2020:20:30:03 +0000] "GET /category.screen?category_id=CASES&JSESSIONID=SD5212008800$SL9669846961FF8508958439ADFF3355227402 HTTP/1.1" 200 3399 "/category.screen?category_id=BATTERIES" "Mozilla/5.0 (Linux; U; Android 2.3.4; en-us; DROID3 Build/5.5.1_84_D3G-55) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"
86.9.190.90 - - [03/Apr/2020:20:30:05 +0000] "GET /category.screen?category_id=BATTERIES&JSESSIONID=SD5330660580$SL5721426140FF1739646253ADFF1837268871 HTTP/1.1" 503 3561 "/category.screen?category_id=CASES" "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_5 like Mac OS X; en-gb) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8L1 Safari/6533.18.5"

If you’re really just interested in how many times each page was hit, you can summarize a count of the hits, grouped by the page URI (minus URI query strings), filtering out images that are embedded in the pages:

/category.screen	220
/product.screen	246
/static/jquery.js	38
/cart.do	30
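
Here is a minimal Python sketch of that summary, assuming the access-log layout shown above. The sample lines are shortened versions of the ones in this post, and the list of image extensions is my own assumption.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the quoted request field, e.g. "GET /cart.do?action=view HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")  # assumed image types

def count_page_hits(lines):
    """Count hits per URI path, ignoring query strings and image requests."""
    hits = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path = urlsplit(match.group("target")).path
        if path.lower().endswith(IMAGE_EXTENSIONS):
            continue
        hits[path] += 1
    return hits

sample_lines = [
    '128.241.220.82 - - [03/Apr/2020:20:30:05 +0000] "GET /static/jquery.js?&JSESSIONID=SD2581716739 HTTP/1.1" 200 2484',
    '64.66.0.20 - - [03/Apr/2020:20:30:05 +0000] "GET /product.screen?product_id=HS-MONST-NERGY HTTP/1.1" 404 3818',
    '125.17.14.100 - - [03/Apr/2020:20:30:05 +0000] "GET /static/6051.jpg?&JSESSIONID=SD1073290485 HTTP/1.1" 200 846',
    '130.253.37.97 - - [03/Apr/2020:20:30:04 +0000] "GET /product.screen?product_id=AC-MOTO-HOTSPOT4G HTTP/1.1" 200 2410',
]
page_hits = count_page_hits(sample_lines)
for path, count in page_hits.most_common():
    print(f"{path}\t{count}")
```

On these four lines, the image request is dropped and the two `/product.screen` requests collapse into a single row.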

Say you also wanted to get a breakdown of the same accesses by the location of the requestor. You can use a GeoIP tool, like the MaxMind GeoIP database, to look up the location of each requestor’s IP address, enrich the data with the result, and then summarize a count of hits grouped by the requestor’s country:

US	403
UK	234
Korea (South)	112
India	213
Spain	18
Bahamas	20
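
The enrich-then-group step can be sketched in Python as follows. To keep the example self-contained, it does not use MaxMind at all: `GEO_TABLE` is a hypothetical stand-in that maps documentation-only networks to arbitrary country labels, and the `client_ip` field name is likewise an assumption.

```python
import ipaddress
from collections import Counter

# Hypothetical stand-in for a GeoIP database: documentation-only networks
# (RFC 5737) mapped to arbitrary country labels for illustration.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "US"),
    (ipaddress.ip_network("198.51.100.0/24"), "UK"),
    (ipaddress.ip_network("203.0.113.0/24"), "India"),
]

def lookup_country(ip, table=GEO_TABLE):
    """Return the country label for an IP, or 'Unknown' if no network matches."""
    address = ipaddress.ip_address(ip)
    for network, country in table:
        if address in network:
            return country
    return "Unknown"

def enrich_with_country(event):
    """Return a copy of the event with a 'country' field added."""
    return {**event, "country": lookup_country(event["client_ip"])}

events = [
    {"client_ip": "192.0.2.10"},
    {"client_ip": "192.0.2.77"},
    {"client_ip": "198.51.100.4"},
    {"client_ip": "10.1.2.3"},
]
enriched_events = [enrich_with_country(e) for e in events]
country_counts = Counter(e["country"] for e in enriched_events)
for country, count in country_counts.most_common():
    print(f"{country}\t{count}")
```

Swapping the stand-in table for a real GeoIP lookup would only change `lookup_country`; the enrichment and the grouping stay the same.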

By extracting the values and creating the aggregates “in the stream”, the needed metrics are readily available. As a result, you can just send the aggregated metrics to the analysis/reporting system instead of the full logs. Need the metrics data in multiple tools? The data can be delivered to each one in the format it expects, like Splunk metrics or statsd formats.
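
As one example of delivering aggregates in a tool's expected format, this Python sketch renders the page counts from above as statsd counter lines (`name:value|c`). The `weblog.hits` prefix and the way paths are turned into metric names are my own assumptions, not a convention from any product.

```python
import re

def to_statsd_lines(counts, prefix="weblog.hits"):
    """Render a mapping of name -> count as statsd counter lines."""
    lines = []
    for key, value in sorted(counts.items()):
        # Collapse anything that is not a letter or digit into underscores
        # so the URI path becomes a safe metric name segment.
        safe_key = re.sub(r"[^A-Za-z0-9]+", "_", key).strip("_")
        lines.append(f"{prefix}.{safe_key}:{value}|c")
    return lines

page_counts = {
    "/category.screen": 220,
    "/product.screen": 246,
    "/static/jquery.js": 38,
    "/cart.do": 30,
}
statsd_lines = to_statsd_lines(page_counts)
print("\n".join(statsd_lines))
```

Four short lines like these replace the hundreds of raw log entries they summarize.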

Data Cleansing/Reduction

No two ways about it, logs are noisy. In the application development world, it’s far more expensive to have to go back into the code to add new elements to logging than to simply log everything up front. Unfortunately, that means you end up with a lot of info in the logs you don’t want. For example, look at the following excerpt from an AWS API Gateway log entry:

{
  "resource": "/done",
  "path": "/done",
  "httpMethod": "POST",
  "queryStringParameters": null,
  "multiValueQueryStringParameters": null,
  "pathParameters": null,
  "stageVariables": null,
  "requestContext": {
    "resourcePath": "/done",
    "httpMethod": "POST",
    "identity": {
      "user": null
    }
  }
}

Several of these fields hold null values, which provide marginal, if any, value in analysis. Removing those null-value fields reduces the data ingested into the analysis system(s). While it may not seem like much, as you scale up, it adds up very quickly. If retention has been separated from analysis, you’ll also have the freedom to cut out any fields you don’t think are valuable to your analysis, or even whole records. Of course, you’ll want to be careful, since you may find use for the removed fields later. Since you have the raw logs, though, you can always re-ingest the data.
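
Stripping null fields is simple to sketch in Python. The function below is a generic illustration rather than any product's implementation, and it runs against a well-formed copy of the excerpt above.

```python
import json

def drop_nulls(value):
    """Recursively remove keys and list items whose value is null (None)."""
    if isinstance(value, dict):
        return {k: drop_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_nulls(v) for v in value if v is not None]
    return value

raw = """
{
  "resource": "/done",
  "path": "/done",
  "httpMethod": "POST",
  "queryStringParameters": null,
  "multiValueQueryStringParameters": null,
  "pathParameters": null,
  "stageVariables": null,
  "requestContext": {
    "resourcePath": "/done",
    "httpMethod": "POST",
    "identity": {
      "user": null
    }
  }
}
"""
cleaned = drop_nulls(json.loads(raw))
print(json.dumps(cleaned, indent=2))
print(len(json.dumps(cleaned)), "bytes vs", len(json.dumps(json.loads(raw))), "bytes")
```

Note that `identity` is left behind as an empty object here; whether to drop empty containers as well is a separate decision that depends on what the downstream tools expect.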

Scratching the Surface

Though there are some great use cases in here, it’s just scratching the surface. I’m sure each of you reading this has a unique need that these kinds of capabilities can help solve. Our product, Cribl LogStream, provides these capabilities, and I encourage you to take a drive through our interactive sandbox environment to see how LogStream could help with those needs.
