
Scalable Data Collection from Azure Blob Storage

April 7, 2021
Categories: Engineering

Data collection from Amazon S3, first introduced in Cribl LogStream 2.0, has been an overnight success with most of our AWS customers. In 2.4.4, we've added a similar capability to read data at scale from Azure Blob Storage, where many other customers store massive amounts of observability data: logs, metrics, events, etc. In this post, we'll take a look at how it works and how to configure it.

If you're new to Cribl LogStream, you may want to take our sandboxes for a spin before reading further.

How Does It Work?

Reading data from Azure Blob Storage can be accomplished directly via its API. This is pretty straightforward when the number of blobs to be read is small. However, at large scale, the following problems start to manifest (the naive list-and-read approach is sketched after this list):

  • Just listing blobs across multiple containers with tens or hundreds of thousands of objects can become expensive and time-consuming. 
  • Tracking which blob is currently being read, and which Worker Node is reading it, can become a real issue in a distributed environment. 
  • Resiliency is left entirely for the reader to implement.
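
For context, here's what that naive list-and-read approach looks like with the Azure SDK for Python. This is a minimal sketch, assuming the azure-storage-blob package and placeholder container names; it's exactly the pattern that gets expensive as blob counts grow.

```python
from azure.storage.blob import ContainerClient

CONNECTION_STRING = "<storage account connection string>"  # placeholder

# Naive approach: enumerate every blob in every container of interest.
# Each results page is a separate List Blobs call, so containers with
# hundreds of thousands of blobs make repeated scans slow and costly.
for container_name in ["logs-2021-04", "metrics-2021-04"]:  # hypothetical containers
    container = ContainerClient.from_connection_string(CONNECTION_STRING, container_name)
    for blob in container.list_blobs(name_starts_with="app/"):
        print(blob.name, blob.size)
```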

One way to address these issues is to use Azure Blob Storage event notifications, delivered to Queue Storage via Azure Event Grid, together with a LogStream distributed deployment. Conceptually, the mechanism works this way (a minimal consumer sketch follows the steps):

  1. A new blob of data lands on Azure Blob Storage.
  2. A Blob Created notification is sent to a Queue Storage queue, via Azure Event Grid. 
  3. LogStream Worker Processes are configured as Queue consumers. Each Process reads notifications/messages, and the Queue marks retrieved messages invisible to other consumers. This ensures that no two Worker Processes read the same message.
  4. Each Worker Process extracts the blob's path from each message, then reads that blob from Blob Storage.
  5. When the read is complete, the Worker Process deletes the corresponding message from the Queue.
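
To make the flow concrete, here's a minimal consumer sketch using the azure-storage-queue and azure-storage-blob Python packages. The connection string, queue name, and Base64 handling are illustrative assumptions; LogStream implements this loop internally, so you don't write this yourself.

```python
import base64
import json
from urllib.parse import urlparse

from azure.storage.blob import BlobClient
from azure.storage.queue import QueueClient

CONNECTION_STRING = "<storage account connection string>"  # assumed: queue and blobs share one account
QUEUE_NAME = "blob-created-events"                         # hypothetical queue name


def parse_event(raw: str) -> dict:
    # Event Grid typically Base64-encodes the event JSON when delivering to a
    # storage queue; fall back to plain text if it isn't encoded.
    try:
        return json.loads(base64.b64decode(raw, validate=True))
    except ValueError:
        return json.loads(raw)


queue = QueueClient.from_connection_string(CONNECTION_STRING, QUEUE_NAME)

# Pull a batch of notifications; they stay invisible to other consumers for
# the visibility timeout while this process works on them.
for msg in queue.receive_messages(messages_per_page=10, visibility_timeout=600):
    event = parse_event(msg.content)

    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        queue.delete_message(msg)
        continue

    # data.url looks like https://<account>.blob.core.windows.net/<container>/<path/to/blob>
    path = urlparse(event["data"]["url"]).path.lstrip("/")
    container, _, blob_name = path.partition("/")

    blob = BlobClient.from_connection_string(CONNECTION_STRING, container, blob_name)
    payload = blob.download_blob().readall()
    print(f"read {len(payload)} bytes from {container}/{blob_name}")

    # Only delete the notification once the blob has been read successfully.
    queue.delete_message(msg)
```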

Benefits

  • Scalability – the more Worker Processes you have consuming messages from the Queue, the higher the read throughput.
  • Resiliency – If one Worker Process becomes unavailable, the Queue will make its messages visible to others. 
  • Timely delivery – acting on notifications tends to be faster than scanning and listing all blobs. 

Azure Configuration 

There are multiple ways to configure Azure to achieve this. If you're new to it, follow these three steps: 

1. Set Up System Topic in Event Grid

  • Navigate to Event Grid System Topics. Create a new topic by clicking + Create, and then set the Topic Type to Storage Account (Blob).

  • Select the desired Subscription and Resource Group. In Resource, select the storage account of interest, and assign the topic a name of your choice.
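
If you prefer to script this step, something along these lines should work with the azure-mgmt-eventgrid Python package. This is a sketch under assumptions: the resource group, region, topic name, and storage account resource ID are placeholders, and exact model/parameter names can vary between SDK versions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventgrid import EventGridManagementClient
from azure.mgmt.eventgrid.models import SystemTopic

SUBSCRIPTION_ID = "<azure subscription id>"
STORAGE_ACCOUNT_ID = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>"
    "/providers/Microsoft.Storage/storageAccounts/<storage-account>"
)

client = EventGridManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# System topic of type Storage Account (Blob), sourced from the storage account.
client.system_topics.begin_create_or_update(
    resource_group_name="<rg>",
    system_topic_name="blob-events-topic",  # hypothetical topic name
    system_topic_info=SystemTopic(
        location="<region>",
        source=STORAGE_ACCOUNT_ID,
        topic_type="Microsoft.Storage.StorageAccounts",
    ),
).result()
```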

2. Set Up a Queue

  • Navigate to your Storage Account and create a Queue.

  • Select the Storage Account of interest, and then in the submenu, select Queue Service > Queues.

  • Click + Queue to create a queue.
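
If you'd rather create the queue programmatically, here's a minimal sketch with the azure-storage-queue Python package. The queue name is a placeholder; reuse whatever name you pick in the LogStream configuration below.

```python
from azure.storage.queue import QueueClient

CONNECTION_STRING = "<storage account connection string>"  # Storage Account > Access Keys

# Create the queue that will receive Blob Created notifications.
queue = QueueClient.from_connection_string(CONNECTION_STRING, "blob-created-events")  # hypothetical name
queue.create_queue()
```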

3. Configure Storage Account to Send Notifications

  • From the Storage Accounts menu, select Events.

  • Then click + Event Subscription to configure notifications:
    • Enter a Name for the subscription.
    • In System Topic Name, enter the name of the system topic created above.
    • In Event Types, select Blob Created, and deselect Blob Deleted.
    • As the Endpoint Type, select Storage Queues.

  • Click Select an endpoint, and click the subscription to use (Pay-As-You-Go).

  • Next, select the storage account on which to add the subscription.

  • Select the queue you created above, and Save.

  • Click Create to complete the process.
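
The same subscription can also be created in code. Here's a hedged sketch using azure-mgmt-eventgrid, reusing the client and storage account ID from the system-topic sketch above; names are placeholders, and model names may differ slightly between SDK versions.

```python
from azure.mgmt.eventgrid.models import (
    EventSubscription,
    EventSubscriptionFilter,
    StorageQueueEventSubscriptionDestination,
)

# Route Blob Created events from the system topic into the storage queue.
client.system_topic_event_subscriptions.begin_create_or_update(
    resource_group_name="<rg>",
    system_topic_name="blob-events-topic",            # topic from step 1
    event_subscription_name="blob-created-to-queue",  # hypothetical name
    event_subscription_info=EventSubscription(
        destination=StorageQueueEventSubscriptionDestination(
            resource_id=STORAGE_ACCOUNT_ID,           # storage account hosting the queue
            queue_name="blob-created-events",          # queue from step 2
        ),
        filter=EventSubscriptionFilter(
            included_event_types=["Microsoft.Storage.BlobCreated"],
        ),
    ),
).result()
```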

LogStream Configuration 

Before we start configuring LogStream, let’s make sure we have the following config settings/strings available: 

  • The storage account Queue name (from above)
  • The storage account Connection String (found under Storage Account > Name > Access Keys)

Next: 

  1. Navigate to Sources > Azure > Blob Storage and click + Add New.
  2. Enter the Queue value from above.
  3. Enter the Connection String value from above. (Alternatively, you can use `env.AZURE_STORAGE_CONNECTION_STRING`.) A quick way to validate both values is sketched after this list. 
  4. Ideally, use a Filename Filter that identifies blobs of interest.
  5. Optionally, modify Event Breakers, Fields (Metadata), and Pre-processing, as necessary.
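
Before saving the Source, it can be worth confirming that the queue name and connection string actually line up. A small sketch, using the same azure-storage-queue package and placeholder names as above; peeking does not hide messages from LogStream:

```python
from azure.storage.queue import QueueClient

CONNECTION_STRING = "<storage account connection string>"
QUEUE_NAME = "blob-created-events"  # the queue configured above

queue = QueueClient.from_connection_string(CONNECTION_STRING, QUEUE_NAME)
print("approximate message count:", queue.get_queue_properties().approximate_message_count)

# Peek at a few pending Blob Created notifications without consuming them.
for msg in queue.peek_messages(max_messages=5):
    print(msg.content[:120])
```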

Then, in Advanced Settings, change the following only if necessary (these settings roughly map onto queue polling parameters, as sketched after this list):

  1. Max messages – number of messages each receiver (below) can get on each poll.
  2. Visibility timeout (secs) – duration (in seconds) over which the received messages are hidden from subsequent retrieve requests, after being retrieved by a Worker Process. Practically speaking, this is the time each Worker Process is given to fetch and process a blob. 
  3. Num receivers – number of pollers to run per Worker Process.
  4. Skip file on error – skip files that trigger a processing error.
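
For intuition only, here is a non-authoritative sketch of how the first three settings correspond to the underlying Queue Storage receive call. LogStream handles this internally; the parameter mapping shown is an assumption for illustration.

```python
from azure.storage.queue import QueueClient

MAX_MESSAGES = 1          # "Max messages": messages each receiver asks for per poll
VISIBILITY_TIMEOUT = 600  # "Visibility timeout (secs)": how long retrieved messages stay hidden
NUM_RECEIVERS = 3         # "Num receivers": pollers like this running per Worker Process

queue = QueueClient.from_connection_string("<connection string>", "<queue name>")

def poll_once():
    # Each of the NUM_RECEIVERS pollers issues calls like this in a loop.
    return list(queue.receive_messages(
        messages_per_page=MAX_MESSAGES,
        visibility_timeout=VISIBILITY_TIMEOUT,
    ))
```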

Best Practices 

  • Use the Filename Filter as aggressively as possible, using Preview to test the expression as you build it (a quick offline test is sketched after this list). Filtering ensures that only files of interest are ingested by LogStream, thus improving latency, throughput, and data quality.
  • If higher throughput is needed, increase the Source’s Advanced Settings > Number of Receivers. However, note that this setting is per Worker Process. E.g., a Worker Node with 21 Worker Processes, and Number of Receivers set to 2, will have a total of 42 pollers.
  • While the Source’s default Visibility timeout (secs) value of 600s works well in most cases, when ingesting large files, tune up this value or consider using smaller blobs.
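
One low-effort way to iterate on a candidate filter before pasting it into the Source is to test the pattern against a few representative blob names. The pattern and names below are purely hypothetical, and this assumes the Filename Filter accepts a regular expression.

```python
import re

# Hypothetical filter: only gzipped JSON logs under app-logs/
FILENAME_FILTER = re.compile(r"^app-logs/.*\.json\.gz$")

samples = [
    "app-logs/2021/04/07/web-0001.json.gz",  # should be ingested
    "app-logs/2021/04/07/web-0001.json",     # should be skipped
    "backups/2021/04/07/db.bak",             # should be skipped
]

for name in samples:
    verdict = "ingest" if FILENAME_FILTER.search(name) else "skip"
    print(f"{verdict:6} {name}")
```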
