
Data Field Discovery and Exploration in Cribl Search

August 14, 2023
Written by Raanan Dagan

During his many years at Cribl, Splunk, Cloudera, and Oracle, Raanan Dagan was part of multiple implementations of security, analytics, cloud, open-source, and IT use cases, as well as big data and data lake projects in complex environments. Raanan is a global resource with 30 years of experience building large data clusters. He has helped thousands of customers, including some who ingest several hundred terabytes per day and store multiple petabytes of data.

Categories: Cribl Search, Engineering

If you’ve ever found yourself pondering the hidden treasures tucked away within thousands of files in Amazon S3, this is the perfect guide for you. In this blog post, we’re going to look at how you can use the Cribl Search fields feature to catalog and explore the fields in petabytes of data stored in Object Stores.

In the Fields Tab within Cribl Search, all returned fields are categorized according to five different dimensions. Without writing a single query you can answer questions like:

  • What are the top values for each field, and how do they compare with other fields?
  • Which fields are rare?
  • Which fields are very common?

Using Cribl Search to Explore and Discover the Available Fields

Our first step is to set up a search so that we can explore the Object Store and discover the fields available to us.

  1. Set up the Dataset Provider and Dataset as described in their corresponding docs.
  2. Search the dataset and select the Fields Tab
    • For example, dataset=="cribl_search_sample" | limit 1000
    • Select the Fields Tab and look at the five dimensions Cribl Search provides:
      • Type: Field type
      • Uniques: Number of unique values
      • Nulls: Number of null values
      • Top Value Distribution: How often values occur using the standard cumulative beta distribution
      • Presence: Percentage of results that contain the field
  3. Keep iterating on different searches and use the Cribl Search UI to explore.
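
To make the dimensions above concrete, here is a minimal Python sketch that computes comparable statistics over a handful of in-memory events. It is not how Cribl Search computes them: the function `summarize_fields`, the sample events, and the 50% cutoff for "rare" are all hypothetical, the sketch counts a field as present even when its value is null, and it reports plain top-value counts instead of the distribution shown in the UI.

```python
from collections import Counter


def summarize_fields(events, top_n=3):
    """Compute per-field summary statistics for a list of event dicts.

    Illustrative only: this mirrors the idea behind the Fields Tab
    dimensions, not Cribl Search's actual implementation.
    """
    total = len(events)

    # Collect every value seen for each field name.
    values_by_field = {}
    for event in events:
        for name, value in event.items():
            values_by_field.setdefault(name, []).append(value)

    summary = {}
    for name, values in values_by_field.items():
        non_null = [v for v in values if v is not None]
        counts = Counter(str(v) for v in non_null)
        types = sorted({type(v).__name__ for v in non_null})
        summary[name] = {
            "type": "/".join(types) or "null",
            "uniques": len(counts),
            "nulls": len(values) - len(non_null),
            "top_values": counts.most_common(top_n),
            # Assumption: a field with a null value still counts as present.
            "presence": round(100.0 * len(values) / total, 1),
        }
    return summary


# Hypothetical sample events standing in for search results.
sample_events = [
    {"host": "web-1", "status": 200, "user": "alice"},
    {"host": "web-1", "status": 404, "user": None},
    {"host": "web-2", "status": 200},
    {"host": "web-2", "status": 200, "trace_id": "abc123"},
]

fields = summarize_fields(sample_events)

# "Rare" and "very common" thresholds are arbitrary choices for this sketch.
rare = [name for name, stats in fields.items() if stats["presence"] < 50]
common = [name for name, stats in fields.items() if stats["presence"] == 100]
```

With these sample events, `trace_id` is the only field present in fewer than half of the results, while `host` and `status` appear in all of them.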

Cribl Search API or UI to Catalog Your Fields

In the second step, we send the fields discovered in files from the Object Store to a place where we can document and catalog them. To do this, I’ll show you how to use the Cribl Search API endpoint /search/jobs/{id}/field-summaries.

You can call the Cribl Search API from the command line with curl or directly from the Cribl UI.

Using CLI with the curl command to run the Cribl Search API

To run the Cribl Search API we need the Job_Id, Bearer Token, and your Cribl.Cloud Instance name.

  • Job_Id: The easiest way to locate the Job_Id is to run the search from the UI, click on the details link, and copy the search Id. For details, see Search Details.

  • Bearer Token: You can find the Bearer Token using the UI or via API. For details, see Bearer Token.
  • Cribl.Cloud Instance name: This is the Leader hostname/IP (URI)

Now let’s use the curl command to retrieve the results and all of the field data, which we can then send to a location of our choosing:

curl -X GET "https://Cloud_Instance.cribl.cloud/api/v1/m/default_search/search/jobs/Job_Id/field-summaries" -H "accept: application/json" -H "Authorization: Bearer Bearer_Token"

You can upload the response from this curl command to your catalog and documentation.
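
If you would rather script this step, the same request can be sketched in Python using only the standard library. The URL pattern and headers are copied from the curl command above. The function names are hypothetical, and `Cloud_Instance`, `Job_Id`, and `Bearer_Token` are placeholders for your own values.

```python
import json
import urllib.request


def build_field_summaries_request(instance, job_id, token):
    """Build the same GET request as the curl command in this post."""
    url = (
        f"https://{instance}.cribl.cloud/api/v1/m/default_search"
        f"/search/jobs/{job_id}/field-summaries"
    )
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_field_summaries(instance, job_id, token, timeout=30):
    """Send the request and return the parsed JSON response."""
    request = build_field_summaries_request(instance, job_id, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def save_field_summaries(payload, path):
    """Write the parsed response to a local JSON file for your catalog."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)


# Example usage (requires network access and real values):
#   payload = fetch_field_summaries("Cloud_Instance", "Job_Id", "Bearer_Token")
#   save_field_summaries(payload, "field-summaries.json")
```

Keeping the request builder separate from the network call makes it easy to check the URL and headers before sending anything.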

Using the Cribl UI to Run the Cribl Search API

  • To run the Cribl Search API via the Cribl UI we only need the Cribl Search Job_Id. The Cribl UI provides the Bearer Token and the Cribl.Cloud instance name.
  • Job_Id: The easiest way to locate the Job Id is to run the search from the UI, click on the Details link, and copy the Search ID. For details, see Search Details.

  • After copying the Job_Id, navigate from Cribl Search to Cribl Stream on your Cribl.Cloud instance. Once you are in Cribl Stream, go to Settings -> Global Settings -> API Reference.

  1. In the API Reference, change the Servers to https://Cloud_Instance.cribl.cloud/api/v1/m/default_search
  2. Find the Search API section -> /search/jobs/{id}/field-summaries -> Try it out -> paste the Job_Id, then execute the request
  3. Click on the Download option on the right side of the 200 Response code

The Download option gives you the JSON response as a file that you can upload to your catalog and documentation.
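
If your catalog prefers tabular input, the JSON response can be flattened into CSV. The sketch below does not assume a specific response schema, since this post does not document one. It looks for the first list of objects anywhere in the payload and writes one row per object. The helper names and the example payload are hypothetical, so inspect your own response and adapt accordingly.

```python
import csv
import io
import json


def find_records(payload):
    """Return the first list of dicts found in the payload (breadth-first).

    Assumption: the per-field entries arrive as a list of JSON objects
    somewhere in the response. Returns [] if no such list exists.
    """
    queue = [payload]
    while queue:
        node = queue.pop(0)
        if isinstance(node, list):
            if node and all(isinstance(item, dict) for item in node):
                return node
            queue.extend(node)
        elif isinstance(node, dict):
            queue.extend(node.values())
    return []


def records_to_csv(records):
    """Render records as CSV text, using the union of their keys as columns."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Nested values are kept as JSON strings so each record stays on one row.
        writer.writerow({
            key: json.dumps(value) if isinstance(value, (list, dict)) else value
            for key, value in record.items()
        })
    return buffer.getvalue()


# Hypothetical payload; the real response shape may differ.
example_payload = {
    "items": [
        {"name": "host", "uniques": 2},
        {"name": "status", "uniques": 2, "nulls": 0},
    ]
}

catalog_csv = records_to_csv(find_records(example_payload))
```

To use it on a downloaded file, load it with `json.load` and pass the result to `find_records`.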

Wrap Up

In conclusion, the Cribl Search Fields feature provides a powerful solution for cataloging and exploring vast amounts of data stored in Object Stores. With this tool, you can easily navigate through petabytes of information and gain valuable insights into the contents of thousands of files in S3. By utilizing this feature, you can uncover hidden patterns, identify relevant data fields, and streamline your data analysis process.

Whether you’re a data scientist, analyst, or IT professional, Cribl Search empowers you to unlock the potential of your data and make informed decisions with confidence. So, next time you find yourself wondering what treasures lie within your Object Stores, turn to Cribl Search and embark on a transformative journey of exploration and discovery. Ready to get started? Set up a Cribl.Cloud account today!

 


 

