Products
Product Portfolio

Cribl puts your IT and Security data at the center of your data management strategy and provides a one-stop shop for analyzing, collecting, processing, and routing it all at any scale. Try the Cribl suite of products and start building your data engine today!
Learn more ›

Evolving demands placed on IT and Security teams are driving a new architecture for how observability data is captured, curated, and queried. This new architecture provides flexibility and control while managing the costs of increasing data volumes.
Read white paper ›

Cribl Stream

Cribl Stream is a vendor-agnostic observability pipeline that gives you the flexibility to collect, reduce, enrich, normalize, and route data from any source to any destination within your existing data infrastructure.
Learn more ›

Vodafone Case Study

Vodafone Dials up Business Insights with Cribl Stream
Read Case Study ›

Cribl Edge

Cribl Edge provides an intelligent, highly scalable edge-based data collection system for logs, metrics, and application data.
Learn more ›

SpyCloud Edge Story

Listen to how SpyCloud uses Cribl Edge at scale.
Watch Video ›

Cribl Search

Cribl Search turns the traditional search process on its head, allowing users to search data in place without having to collect/store first.
Learn more ›

How Cribl Search Can Save You From Drowning in a Deluge of Data
Read Blog ›

Cribl Lake

Cribl Lake is a turnkey data lake solution that takes just minutes to get up and running — no data expertise needed. Leverage open formats, unified security with rich access controls, and central access to all IT and security data.
Learn more ›

Navigating the future of IT and Security Data management white paper
Read white paper ›

Cribl.Cloud

The Cribl.Cloud platform gets you up and running fast without the hassle of running infrastructure.
Learn more ›

Cribl.Cloud Solution Brief

The fastest and easiest way to realize the value of an observability ecosystem.
Read Solution Brief ›

Cribl Copilot

Cribl Copilot gets your deployments up and running in minutes, not weeks or months.
Learn more ›

Cribl Copilot

Your Trusted AI Advisor for Deploying, Configuring & Troubleshooting.
Read blog ›

AppScope

AppScope gives operators the visibility they need into application behavior, metrics and events with no configuration and no agent required.
Learn more ›

Sandbox

Launch an AppScope Sandbox today!
Launch Now ›
Solutions
Use Cases

Explore Cribl’s Solutions by Use Cases:

Supercharge Security Insights ›

Accelerate Cloud Migration ›

Avoid Vendor Lock-in ›

Agent Consolidation ›

Slash Storage Costs ›

Free Up Space for High-Value Data ›

Route From Any Source To Any Destination ›

Immediate Access to Archived Data ›

Replay Data from Low-Cost Storage ›

Reduce Log Volume & Pay Less for Infrastructure ›
Integration

Explore Cribl’s Solutions by Integrations:

Amazon ›

CrowdStrike ›

Elastic ›

Exabeam ›

Google ›

Microsoft ›

Splunk ›

Wiz ›

View All Integrations ›

Seamless Integrations for Your Observability Data
Learn More ›
Industries

Explore Cribl’s Solutions by Industry:

AIOps ›

Financial Services ›

Healthcare ›

Managed Security Services ›

Manufacturing and Logistics ›

Media and Entertainment ›

Public Sector ›

Retail ›
Resources
Resources

Resource Library ›

Documentation ›

Guides ›

AppScope Docs ›

Blog ›

Glossary ›

Podcasts ›

Telemetry 101

Understanding the Basics of Telemetry and Its Benefits
Learn More ›
Events & Webinars

Events ›

Webinars ›

CriblCon24
Watch On-Demand ›

July 31 | 10am PT / 1pm ET

Navigating the Data Current Report: Transforming IT & Security Operations in 2024
Register ›
Learning

Try the Sandboxes ›

Self Guided Trials ›

Cribl University ›

Cribl Community ›

Cribl Curious Forum ›

What is Observability? ›

Try Your Own Cribl Sandbox

Experience a full version of Cribl Stream and Cribl Edge in the cloud.
Launch Now ›
Tools & Pricing

Download Library ›

Past Releases ›

Pricing Plans ›

Stream ROI Calculator ›

Download Library

Download Cribl’s suite of products for free to get started.
Download ›
Customers
Customer Stories

Get inspired by how our customers are innovating IT, security and observability. They inspire us daily!
Read Customer Stories ›

Sally Beauty Holdings

Sally Beauty Swaps LogStash and Syslog-ng with Cribl.Cloud for a Resilient Security and Observability Pipeline
Read Case Study ›
Customer Experience

Support & Success ›

Professional Services ›

Service Delivery Partners ›

Documentation ›

AppScope Docs ›

Professional Services

Check out our new Professional Services offering.
Learn More ›
Learning

Try the Sandboxes ›

Self Guided Trials ›

Cribl University ›

Cribl Community ›

Cribl Curious Forum ›

Try Your Own Cribl Sandbox

Experience a full version of Cribl Stream and Cribl Edge in the cloud.
Launch Now ›
Company
About Cribl

Transform data management with Cribl, the Data Engine for IT and Security
Learn More ›

Cribl Corporate Overview

Cribl makes open observability a reality, giving you the freedom and flexibility to make choices instead of compromises.
Get the Guide ›

Cribl Newsroom

Stay up to date on all things Cribl and observability.
Visit the Newsroom ›

Press Releases

Read our most recent press releases.
Recent Press Releases ›

Leadership

Cribl’s leadership team has built and launched category-defining products for some of the most innovative companies in the technology sector, and is supported by the world’s most elite investors.
Meet our Leaders ›

Careers

Join the Cribl herd! The smartest, funniest, most passionate goats you’ll ever meet.
Learn More ›

Cribl Named to the Inc. 5000 List of Fastest Growing Private Companies
Learn More ›

Cribl for Startups

Whether you’re just getting started or scaling up, the Cribl for Startups program gives you the tools and resources your company needs to be successful at every stage.
Learn More ›

Contact Us

Want to learn more about Cribl from our sales experts? Send us your contact information and we’ll be in touch.
Talk to an Expert ›

Try Cribl Talk to an expert

Connect and Federate Searches Across Your Cloud Data Lakes with Cribl Search

October 19, 2023

Written by

Categories: Cribl Search, Engineering

Back To Blogs

The way we handle massive volumes of data from multiple sources is about to change fundamentally. The traditional data processing systems don’t always fit into our budget (unless you have some pretty deep pockets). Our wallets constantly need to expand to keep up with the changing data veracity and volume, which isn’t always feasible. Yet we keep doing it because data is a commodity.

To complement our traditional processing systems, data lakes are becoming more popular. With data lakes, we can store any amount of data in real-time as it flows. More importantly, data lakes allow us to tier our data, essentially sending our high-value data to our analytics system and our low-value data to our retention system. Ultimately, this makes all our data accessible.

A data lake also allows us to store data in multiple cloud environments without incurring high egress costs. Moving data is expensive! But you might be asking, “ok, so cool, I have all this data in my data lake but I’m going to have to move it eventually!” Well, not necessarily. Investigative and search workflows are changing. What if you could search your data without moving it out of your data lake? And what if you didn’t have to index it in your analysis system, reducing licensing costs? How about running a unifying search across multiple data lakes and datasets at once? Well, YOU CAN!

Cribl Search combines federated, centralized search with decentralized data storage to offer a search-in-place solution. You are free to use ALL the cloud data lakes your business requires or your heart desires!

This blog will take a quick look at how to federate a search across all three major Cloud data lakes (AWS S3, Azure Blob, and Google GCS). Why, you might ask? Here are some reasons why federated cloud searches are important:

Investigate the same data across multiple cloud providers more quickly (e.g., flow logs across all three main cloud providers).
Ensure conformance across all data lakes – Eventually, you’re going to have to search data across multiple data lakes, but how can you make sure they all follow the same standard? Use Cribl Search to run conformance-based searches.
Give users access to the data – analytics systems can be expensive. What if those infrequent or ad hoc users could access that data through Cribl Search without incurring additional licensing costs downstream or cloud egress costs.
Analyze historical data before replaying to your analysis system – We often need to review historical data for investigations, capacity exercises, or other reasons. Instead of replaying or rehydrating all the data for a given time, Cribl Search allows you to find the relevant data PRIOR to rehydrating.
And more…!

So now that you know federated cloud searches can be useful, let’s connect your data lakes and start searching.

Connect Amazon S3 to Cribl Search

First, we’ll tackle Amazon S3. There are two ways to authenticate:

Use an access key/secret key combination
Assume a role within your organization’s account.

In this blog, we are going to be covering the second option with AssumeRole.

In the diagram below, we show how Cribl Search is hosted in Cribl.Cloud works with your AWS AssumeRole permissions to access your S3 resources.

Here’s how it works:

The Cribl Search Compute assumes a role in your AWS Account. Your account’s Trust Policy allows this role to assume another role.
By using these credentials, Cribl Search Compute can access resources in your AWS Account that are limited by the resource/permission policies.

Let’s get started by setting up two items within your AWS Account. In Cribl Search, you’ll find both the Trust and Permission policies under Dataset Providers. Let’s get started by navigating there.

In Cribl Search, navigate to Data > Dataset Providers > Create Provider
Set Dataset Provider Type to Amazon S3.
Copy the Trust Policy and Permission Policy and paste them into a temporary notepad. We will need these once we get to the AWS Console.

In the AWS Console, navigate to AWS Console > IAM. Create a Role with a custom Trust Policy.
Paste in the Trust Policy you copied in Step #3 from Cribl Search Dataset provider. You will need to update the ExternalId field in your Trust Policy if you are using one.

In the AWS console, under Step 2 of the Role Creation wizard, you can add the Permissions policy copied from Step #3 here to allow access to the S3 bucket you plan to search in the AWS console. You will need to update the “Resource” in the policy itself to reflect the name of your bucket.

When the role is created in the AWS console, copy the ARN and use it in the Cribl Search Dataset provider set up page for S3. Include any ExternalId you used in your Trust Policy. Click Save.

In Cribl Search, navigate to Data > Datasets > Add Dataset. Set the Dataset Provider to the Amazon S3 dataset provider (its ID) you created in the earlier step. Specify the bucket prefixes to populate the Bucket path. If you want to tokenize parts of that prefix, you can use the following syntax:

"mybucket/${fieldA}/${fieldB}"

Under “Processing”, add any Datatype Rulesets to make searching your data easier and fields referenceable in your search queries.

Connect Google Cloud Storage (GCS) to Cribl Search

On to Google Cloud Storage (GCS) hosted within GCP. To access resources in your GCP account, Cribl Search uses a service account on the GCP side with service account credentials. Let’s go over your storage bucket access controls and get your service account up and running.

In the GCP Console, if you haven’t already created your storage bucket, go ahead and do so. When you get to “Access Controls”, make sure to select “Fine Grained” Access control. If your bucket already exists, make these modifications “Bucket Details” > “Permissions” section.

Navigate to IAM & Admin > Service Accounts > Create Service Account in the GCP Console to create a new service account for Cribl Search access. Give the account an ID.

Once the service account is created, copy the address of the service account. You’ll need it in Step 4. Then navigate to the Keys tab and click Add Key > JSON to generate Service Account credentials for Cribl Search.

Navigate back to the bucket, on the Permissions tab, click Grant Access under the Principals section. Add the Service Account address you copied from before to the Principals field and assign it the role of Storage Admin for your bucket. Click Save.

In Cribl Search, navigate to Data> Dataset Providers > Create Provider. Select Google Cloud Storage. Add the Service Account credentials you created in Step #3.

Navigate to Data > Datasets > Add Dataset. Set the Dataset Provider to the Google Cloud Storage dataset provider (its ID) you created in the earlier step. Specify the bucket prefixes to populate the path. . If you want to tokenize parts of that prefix, you can use the following syntax:

"mybucket/${fieldA}/${fieldB}"

Connect Azure Blob to Cribl Search

Finally, let’s connect Azure Blob to Cribl Search. For authentication, Azure Blob offers two options:

Shared Access Signatures (SAS)
Connection Strings.

This blog covers the second method, Connection Strings.

In your Azure Portal, navigate to Storage Account > and select the Storage Account you want Cribl Search to access, then go to the Access Keys. You will see a handful of keys already configured for the storage account. You can Rotate key if necessary. Copy the Connection String of the key you’d like to use.

In Cribl Search, navigate to Data> Dataset Providers > Create Provider. Select Azure Blob Storage. Add the Connection String you copied from Step #1 along with the location of your Storage Account.

Navigate to Data > Datasets > Add Dataset. Set the Dataset Provider to the Azure Blob dataset provider (its ID) you created in the earlier step. Populate the Container name with the storage container and prefixes you have in your Azure Storage Account. If you want to tokenize parts of that prefix, you can use the following syntax:

"mybucket/${fieldA}/${fieldB}"

Running a Federated Search Across All 3 Cloud Providers

We now have all three cloud providers configured within Cribl Search. It’s time to federate some searches.

In this example, each of the three cloud providers has flow logs in their object storage that we have configured Dataset Providers and Datasets to access. To give all three datasets a uniform naming convention, each dataset is appended with “_flowlogs”. This naming convention allows us to run the first search to combine data from all three datasets:

dataset="*_flowlogs" | limit 1000

We can see all three datasets from GCS, Azure Blob, and Amazon S3 returning results here. With Cribl Search, you can easily gather all your flowlogs from your multi-cloud management environments for analysis.

Let’s take that search and turn it into a bar chart with counts of each dataset’s flowlog. By visualizing each provider’s flowlog volume, I can see if there are any anomalies in terms of the volume I expect from each and add this visualization to my dashboard.

dataset="*_flowlogs" | summarize flowcount=count() by dataset

It is likely that the three providers will also share fields. However, they can’t make it THAT easy! For our AWS flowlogs, we already have a “dstport”; however, Azure organizes flowlogs into tuples from which we must extract the destination port.

As part of the next search, we try to accomplish just that and then go one step further by using the coalesce operator to create a normalized field, “Destination_Port”, across my Azure and AWS flows. It is also possible to create datatypes with normalized fields for each dataset as well. (More details here: https://docs.cribl.io/search/datatypes/). Now that I have normalized fields, I can analyze data across all three cloud providers agnostically.

dataset="*_flowlogs" | limit 1000 | extract type=regex regex=@"\d+,([^,]*,){3}(?<dst_port>\d+)" | extend Destination_Port= coalesce(dst_port, dstport)

Wrap up

The popularity of multi-cloud environments and data lakes is growing. Cribl Search allows you to bring all of those datasets together, normalize them, and gain insights across all platforms at once rather than keeping them all separate during analysis. Federated searches here we come! No more need for a swivel chair I mean, swivel cloud analysis! Visit us over at Cribl Community Slack to share any cool multi-cloud searches you’re running today!

Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl’s suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.

We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Blog

Preventing Friction With an Impactful Security Champions Program

Blog

From Necessity to Opportunity: The Customer Push for SIEM Options

Blog

Securing the Foundation of Cribl Copilot

Try Your Own Cribl Sandbox

Experience a full version of Cribl Stream and Cribl Edge in the cloud with pre-made sources and destinations.

Launch Now

Product Portfolio

Cribl Stream

Cribl Edge

Cribl Search

Cribl Lake

Cribl.Cloud

Cribl Copilot

AppScope

Use Cases

Integration

Industries

Resources

Events & Webinars

Learning

Tools & Pricing

Download Library

Customer Stories

Customer Experience

Learning

Try Your Own Cribl Sandbox

About Cribl

Cribl Newsroom

Leadership

Careers

Cribl for Startups

Contact Us

Connect and Federate Searches Across Your Cloud Data Lakes with Cribl Search

Written by

Yasmin Hovakeemian

Connect Amazon S3 to Cribl Search

Connect Google Cloud Storage (GCS) to Cribl Search

Connect Azure Blob to Cribl Search

Running a Federated Search Across All 3 Cloud Providers

Wrap up

Blog

Preventing Friction With an Impactful Security Champions Program

Blog

From Necessity to Opportunity: The Customer Push for SIEM Options

Blog

Securing the Foundation of Cribl Copilot

Try Your Own Cribl Sandbox

So you're rockin' Internet Explorer!