
Revolutionizing Data Strategy: Achieving 99.94% Cost Savings and Accelerated Performance with Cribl Search

October 10, 2023

Imagine sending logs to cost-effective storage, converting them into efficient metrics, and forwarding only the essential data for analysis. This change can slash ingest and long-term storage expenses by an order of magnitude! Enter Cribl Search, an ingenious solution that searches data where it sits in storage, transforms logs into actionable metrics, and channels the vital data to your analysis systems. The result? A volume reduction of more than 99.94%, improved efficiency, and substantial cost savings. In the example below, I send 132.54MB of logs per hour to an S3 bucket. A scheduled Cribl Search runs every hour, generates 70.96KB of metrics from those logs, and sends them to a Prometheus endpoint, in this case in Grafana Cloud. You can choose your own metrics and data lake and achieve similar results.


That’s not just a space-saving trick; it’s a jaw-dropping reduction of more than 99.94%! In practical terms, logs that once took up a substantial chunk of storage shrink to a fraction of a fraction of their original size. That’s the power of converting logs into efficient metrics: data management becomes not just efficient but downright magical.
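The reduction figure is easy to check from the hourly volumes quoted above. The post doesn't say whether it uses decimal (1 KB = 1,000 bytes) or binary (1 KB = 1,024 bytes) units, so this small Python sketch computes both; either way the result lands just above 99.94%.

```python
def volume_reduction_pct(before_bytes: float, after_bytes: float) -> float:
    """Percentage by which the data shrank, e.g. 99.9 means 0.1% remains."""
    return (1 - after_bytes / before_bytes) * 100


# Hourly volumes from the example: 132.54 MB of logs in, 70.96 KB of metrics out.
decimal_pct = volume_reduction_pct(132.54 * 1000**2, 70.96 * 1000)  # 1 KB = 1,000 bytes
binary_pct = volume_reduction_pct(132.54 * 1024**2, 70.96 * 1024)   # 1 KB = 1,024 bytes

print(f"decimal units: {decimal_pct:.3f}%")
print(f"binary units:  {binary_pct:.3f}%")
```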

The Details

Sending approximately 132.54MB of Apache web logs per hour to an S3 bucket, consistently over a month, adds up to roughly 94.29GB of uncompressed data, or a lean 11.79GB when compressed with gzip, Cribl Stream’s default compression method.
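These figures imply that gzip shrinks the logs by about 8:1 (94.29GB / 11.79GB ≈ 8.0). Compression ratios depend heavily on the data, so one quick way to estimate your own is to gzip a sample with Python’s standard `gzip` module, as in this sketch. The synthetic lines are an assumption standing in for real logs and are loosely modeled on the sample events below; because they are highly repetitive, the sketch may report a higher ratio than real traffic would.

```python
import gzip


def gzip_ratio(data: bytes) -> float:
    """Return uncompressed size divided by gzip-compressed size."""
    return len(data) / len(gzip.compress(data))


# Illustrative only: synthesize Apache-style lines loosely modeled on the sample
# events. Real logs vary more, so measure your own data instead, e.g.
# open("access.log", "rb").read()  (hypothetical path).
TEMPLATE = (
    '192.168.0.1 - - [07/Jul/2023:12:{m:02d}:{s:02d} +0000] '
    '"GET /product?product_id=XYZ{n:03d} HTTP/1.1" 200 {size} '
    '"https://techboss.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" {ms}\n'
)
lines = [
    TEMPLATE.format(m=i // 60 % 60, s=i % 60, n=i % 1000,
                    size=512 * (1 + i % 8), ms=100 + i % 150)
    for i in range(10_000)
]
data = "".join(lines).encode()
print(f"{len(data):,} bytes uncompressed, gzip ratio {gzip_ratio(data):.1f}:1")

# The ratio implied by the post's own monthly figures.
print(f"implied ratio from the post: {94.29 / 11.79:.1f}:1")
```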

Example Events: Apache Logs

192.168.0.1 - - [07/Jul/2023:12:00:01 +0000] "GET /search?query=laptop&session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&user_id=jsmith@gmail.com HTTP/1.1" 200 2048 "https://techboss.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 156 
192.168.0.1 - - [07/Jul/2023:12:01:02 +0000] "GET /product?product_id=XYZ789&product_name=laptop&session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&user_id=jsmith@gmail.com HTTP/1.1" 200 4096 "https://techboss.com/search?query=laptop" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 241 
192.168.0.1 - - [07/Jul/2023:12:02:03 +0000] "GET /reviews?product_id=XYZ789&product_name=laptop&session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&user_id=jsmith@gmail.com HTTP/1.1" 200 3072 "https://techboss.com/product?product_id=XYZ789" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 189 
192.168.0.1 - - [07/Jul/2023:12:03:04 +0000] "POST /cart/add?product_id=XYZ789&product_name=laptop&price=1979.97&quantity=1&session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&user_id=jsmith@gmail.com HTTP/1.1" 302 512 "https://techboss.com/product?product_id=XYZ789" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 178 
192.168.0.1 - - [07/Jul/2023:12:04:05 +0000] "GET /cart?session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&user_id=jsmith@gmail.com&product_id=XYZ789&product_name=laptop&price=1979.97 HTTP/1.1" 200 3072 "https://techboss.com/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 214 
192.168.0.1 - - [07/Jul/2023:12:05:06 +0000] "POST /checkout/start?session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&user_id=jsmith@gmail.com&product_id=XYZ789&product_name=laptop&price=1979.97 HTTP/1.1" 200 1024 "https://techboss.com/reviews?product_id=XYZ789" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 210 
192.168.0.1 - - [07/Jul/2023:12:06:07 +0000] "POST /checkout/address?session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&product_id=XYZ789&product_name=laptop&price=1979.97&user_id=jsmith@gmail.com&address=123 Main Street&city=New York&state=NY&postal_code=10001&country=USA&phone=(636) 123-4567 HTTP/1.1" 200 512 "https://techboss.com/checkout/start" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 189
192.168.0.1 - - [07/Jul/2023:12:07:09 +0000] "POST /checkout/payment?session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&product_id=XYZ789&product_name=laptop&price=1979.97&user_id=jsmith@gmail.com HTTP/1.1" 200 512 "https://techboss.com/checkout/address" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 221 
192.168.0.1 - - [07/Jul/2023:12:08:09 +0000] "POST /checkout/confirm?session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&product_id=XYZ789&product_name=laptop&price=1979.97&user_id=jsmith@gmail.com HTTP/1.1" 302 256 "https://techboss.com/checkout/payment" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 195 
192.168.0.1 - - [07/Jul/2023:12:09:10 +0000] "GET /checkout/success?session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&product_id=XYZ789&product_name=laptop&price=1979.97&user_id=jsmith@gmail.com HTTP/1.1" 200 1024 "https://techboss.com/checkout/confirm" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 234 
192.168.0.1 - - [07/Jul/2023:12:10:11 +0000] "POST /email/send?session_id=7f1c5d8b-2a32-4b85-9df7-094a8e1ef19c&product_id=XYZ789&product_name=laptop&price=1979.97&user_id=jsmith@gmail.com&email=jsmith@gmail.com&subject=Purchase+Confirmation HTTP/1.1" 200 128 "https://techboss.com/checkout/success" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36" 

Costs to Consider: S3 vs. Premium Data Lakes

Storing this compressed data in an S3 bucket for a month costs a mere $0.27, an impressively frugal choice. Over a year, that works out to approximately $3.24. Beyond these savings, S3 provides high availability and data durability, and both Cribl Replay and Cribl Search support it natively. In contrast, many premium search vendors charge well over $600.00 per year for similar data volumes, making them more than 185 times as expensive. Now picture the potential financial impact with terabytes or even petabytes of data!
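The cost math is easy to reproduce. This sketch assumes an S3 Standard rate of $0.023 per GB-month (an assumption here: it is the rate that yields the $0.27 figure, but prices vary by region and storage class) and ignores request and retrieval charges. Like the calculation above, it keeps one month’s worth of compressed data (11.79GB) in storage, so the annual figure is twelve monthly bills; a bucket that retains every month’s logs for the full year would grow and cost more.

```python
# Assumption: S3 Standard list price in USD per GB-month; check current AWS pricing.
S3_STANDARD_USD_PER_GB_MONTH = 0.023

compressed_gb_per_month = 11.79  # one month of gzip-compressed logs, from the example
premium_usd_per_year = 600.00    # the premium-vendor figure quoted above

monthly_cost = round(compressed_gb_per_month * S3_STANDARD_USD_PER_GB_MONTH, 2)
annual_cost = round(monthly_cost * 12, 2)  # same volume stored for each of 12 months
cost_multiple = premium_usd_per_year / annual_cost

print(f"monthly ${monthly_cost:.2f}, annual ${annual_cost:.2f}, "
      f"premium vendor is about {cost_multiple:.0f}x")
```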

Cribl Search: Logs to Metrics

Sending logs to cost-effective object storage is a strategic move that can yield substantial benefits for data management. Object storage not only saves on storage costs but also gives you the flexibility to schedule Cribl Search queries that generate a wide array of metrics over any timeframe you choose. That agility lets you tune your data pipeline to reduce costs, meet audit requirements, and improve operational intelligence. In short, it’s a win-win that empowers your organization to make data-driven decisions efficiently and economically. Let’s look at a Cribl Search example that converts logs to metrics.

Top 10 URLs by Site

The following Cribl Search query analyzes the dataset of Apache server logs over a one-hour window (from 65 minutes ago to 5 minutes ago, with both bounds snapped to the start of the minute by @m), counts requests for each combination of request_uri_path and site, and returns the ten combinations with the highest counts.

Cribl Search

dataset="Apache_Logs_to_Metrics_Dataset" earliest=-65m@m latest=-5m@m | summarize endpoints=count() by request_uri_path,site | top 10 by endpoints
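To make the pipeline concrete, here is a rough Python equivalent of what the query computes. It is not how Cribl Search executes queries, and it rests on several assumptions: request_uri_path is taken to be the request target without its query string; because the sample events carry no site field, the sketch hypothetically uses the referrer’s host as the site; and the window includes earliest but excludes latest.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone
from urllib.parse import urlsplit

# Start of an Apache combined-format line. The request target is matched lazily
# up to " HTTP/x.y" because some sample targets contain spaces.
LINE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>.+?) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)"'
)


def search_window(now: datetime) -> tuple[datetime, datetime]:
    """earliest=-65m@m, latest=-5m@m: offset from now, snapped to the minute."""
    snapped = now.replace(second=0, microsecond=0)
    return snapped - timedelta(minutes=65), snapped - timedelta(minutes=5)


def top_endpoints(lines, now: datetime, n: int = 10):
    """Count events per (request_uri_path, site) in the window; return the top n."""
    earliest, latest = search_window(now)
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't parse
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        if not earliest <= ts < latest:
            continue  # outside the search window
        path = urlsplit(m["target"]).path  # assumed request_uri_path: no query string
        site = urlsplit(m["referrer"]).hostname or "unknown"  # hypothetical "site"
        counts[(path, site)] += 1
    return counts.most_common(n)


sample = [
    '192.168.0.1 - - [07/Jul/2023:12:00:01 +0000] "GET /search?query=laptop HTTP/1.1" 200 2048 "https://techboss.com/" "Mozilla/5.0" 156',
    '192.168.0.1 - - [07/Jul/2023:12:01:02 +0000] "GET /product?product_id=XYZ789 HTTP/1.1" 200 4096 "https://techboss.com/search?query=laptop" "Mozilla/5.0" 241',
    '192.168.0.1 - - [07/Jul/2023:12:30:00 +0000] "GET /search?query=tablet HTTP/1.1" 200 2048 "https://techboss.com/" "Mozilla/5.0" 160',
    # After latest (13:00), so excluded from the window below.
    '192.168.0.1 - - [07/Jul/2023:13:02:00 +0000] "GET /cart HTTP/1.1" 200 3072 "https://techboss.com/" "Mozilla/5.0" 214',
]
now = datetime(2023, 7, 7, 13, 5, 30, tzinfo=timezone.utc)
print(top_endpoints(sample, now))
```

With now at 13:05:30, the window runs from 12:00 to 13:00, so the two /search requests are counted together and the 13:02 request is left out.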


Sending Metrics is as Easy as Pipe

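To send these metrics to your preferred data lake, just append | send. The query below also uses extend to add a _destination field set to "endpoints" and to set _time to the time the search runs (now()). It omits the top 10 step, so the counts for every endpoint and site are sent. After it runs, Cribl Search shows a summary of how many metric events were sent back through Cribl Stream and on to the destination of your choice.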

dataset="Apache_Logs_to_Metrics_Dataset" earliest=-65m@m latest=-5m@m | summarize endpoints=count() by request_uri_path,site | extend _destination="endpoints", _time=now() | send


Tee for True

If you append tee=true to the send command, you’ll also see the exact metrics that are sent. For example:

dataset="Apache_Logs_to_Metrics_Dataset" earliest=-65m@m latest=-5m@m | summarize endpoints=count() by request_uri_path,site | extend _destination="endpoints", _time=now() | send tee=true

Logs In, Value Out

Bring the bloated logs in, use Cribl Stream Routes to send them to S3, and use Cribl Search to send the metrics you need to your preferred destination. In this example, I’m sending them to a Prometheus endpoint in Grafana Cloud.

Cribl Stream Routes


Charting Grafana Prometheus Metrics

In Grafana, a quick PromQL query that displays the top 10 endpoints by site looks similar to this:

topk(10, sum(endpoints) by (request_uri_path, site))

This query sums the endpoints metric for each combination of request_uri_path and site and returns the 10 combinations with the highest totals, showing you which paths and sites receive the most traffic.
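The sum() is there because Prometheus can hold several series for the same path and site that differ only in other labels; sum by (request_uri_path, site) collapses them into one total per pair before topk ranks the totals. This Python sketch mimics that for a single evaluation instant. The sample values and the instance label are hypothetical, not what the Cribl Prometheus destination necessarily produces.

```python
import heapq
from collections import defaultdict


def topk_sum_by(samples, k, by):
    """Mimic PromQL topk(k, sum(metric) by (labels)) at one evaluation instant."""
    sums = defaultdict(float)
    for labels, value in samples:
        key = tuple(labels[name] for name in by)  # keep only the grouping labels
        sums[key] += value
    return heapq.nlargest(k, sums.items(), key=lambda item: item[1])


# Hypothetical series: two /search series differ only by an "instance" label.
samples = [
    ({"request_uri_path": "/search", "site": "techboss.com", "instance": "a"}, 40),
    ({"request_uri_path": "/search", "site": "techboss.com", "instance": "b"}, 25),
    ({"request_uri_path": "/cart", "site": "techboss.com", "instance": "a"}, 50),
    ({"request_uri_path": "/product", "site": "techboss.com", "instance": "b"}, 10),
]
print(topk_sum_by(samples, k=2, by=("request_uri_path", "site")))
```

Without the sum, the two /search series (40 and 25) would rank separately and /cart (50) would come out on top; summed, /search leads with 65.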

Wrap up

The transition from logs to metrics is a game-changer in data management. It can reduce data volume by more than 99.94%, yielding substantial cost savings and operational efficiency. Storing logs in object storage adds high availability, data durability, and easy access through features like Replay in Cribl Stream and Cribl Search. With Cribl Search, running tailored searches and generating metrics is a breeze, making data optimization more accessible than ever. Embrace the shift from logs to metrics with Cribl Search, and unlock new possibilities in data management and cost efficiency.

Ready to get started? Head over to Cribl.Cloud to sign up for a free account and gain instant access to all of our products!



 

Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl’s suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.

We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.
