If you’ve ever found yourself pondering the hidden treasures tucked away within thousands of files in Amazon S3, this is the perfect guide for you. In this blog post, we’re going to look at how you can use the Cribl Search fields feature to catalog and explore the fields in petabytes of data stored in Object Stores.
In the Fields Tab within Cribl Search, all returned fields are categorized according to five different dimensions. Without writing a single query you can answer questions like:
What are the top values for each field, and how do they compare with other fields?
Which fields are rare?
Which fields are very common?
Using Cribl Search to Explore and Discover the Available Fields
Our first step is to set up a search so that we can explore the Object Store and discover the fields available to us.
Set up the Dataset Provider and Dataset as described in their corresponding docs:
AWS S3: Set up S3
Azure Blob Storage: Set up Azure Blob Storage
Google Cloud Storage: Set up Google Cloud Storage
Search the dataset and select the Fields Tab
For example,
dataset=="cribl_search_sample" | limit 1000
Select the Fields Tab and look at the five dimensions Cribl Search provides:
Type: Field type
Uniques: Number of unique values
Nulls: Number of null values
Top Value Distribution: How often values occur using the standard cumulative beta distribution
Presence: Percentage of results that contain the field
Keep iterating on your searches and use the Cribl Search UI to explore the results.
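To make the Uniques, Nulls, and Presence dimensions concrete, here is a minimal Python sketch that computes comparable numbers over a small list of events. It is an illustration only and not how Cribl Search calculates them internally; the function name `summarize_fields`, the output keys, and the sample events are all hypothetical.

```python
from collections import Counter


def summarize_fields(events):
    """Compute per-field presence, null count, unique count, and top value.

    `events` is a list of dicts. This is an illustrative approximation of the
    Fields Tab dimensions, not Cribl Search's own implementation.
    """
    total = len(events)
    stats = {}
    for event in events:
        for name, value in event.items():
            entry = stats.setdefault(
                name, {"present": 0, "nulls": 0, "values": Counter()}
            )
            entry["present"] += 1
            if value is None:
                entry["nulls"] += 1
            else:
                # Stringify so unhashable values (lists, dicts) can be counted.
                entry["values"][str(value)] += 1

    summary = {}
    for name, entry in stats.items():
        top = entry["values"].most_common(1)
        summary[name] = {
            "presence_pct": round(100 * entry["present"] / total, 1),
            "nulls": entry["nulls"],
            "uniques": len(entry["values"]),
            "top_value": top[0][0] if top else None,
        }
    return summary


# Hypothetical sample events, standing in for search results.
sample_events = [
    {"host": "a", "status": 200},
    {"host": "b", "status": 200},
    {"host": "a", "status": None},
    {"host": "a"},
]
sample_summary = summarize_fields(sample_events)
```

In this sample, `status` appears in three of four events, so it would count as less common than `host`, which appears in every event.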
Cribl Search API or UI to Catalog Your Fields
In the second step, we send the fields discovered in the Object Store files to a place where we can document and catalog them. To do this, I’ll show you how to use the Cribl Search API endpoint /search/jobs/{id}/field-summaries.
You can run the Cribl Search API using the CLI with curl or directly on the Cribl UI.
Using CLI with the curl command to run the Cribl Search API
To run the Cribl Search API, we need the Job_Id, the Bearer Token, and your Cribl.Cloud Instance name.
Job_Id: The easiest way to locate the Job_Id is to run the search from the UI, click on the details link, and copy the search Id. For details, see Search Details.
Bearer Token: You can find the Bearer Token using the UI or via API. For details, see Bearer Token.
Cribl.Cloud Instance name: This is the Leader hostname/IP (URI)
Now let’s use the curl command to retrieve the field summaries for the search job, so that we can send them to a location of our choosing:
curl -X GET "https://Cloud_Instance.cribl.cloud/api/v1/m/default_search/search/jobs/Job_Id/field-summaries" -H "accept: application/json" -H "Authorization: Bearer Bearer_Token"
You can upload the response from this curl command to your catalog and documentation.
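If you would rather script this step than run curl by hand, the same request can be sketched in Python with only the standard library. The URL pattern and headers mirror the curl command above. The function names, the output path, and the placeholder values are hypothetical, and the sketch assumes the endpoint returns JSON without assuming anything about its structure.

```python
import json
import urllib.request


def build_field_summaries_request(instance, job_id, token):
    """Build (but do not send) the GET request for a job's field summaries."""
    url = (
        f"https://{instance}.cribl.cloud/api/v1/m/default_search"
        f"/search/jobs/{job_id}/field-summaries"
    )
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def save_field_summaries(instance, job_id, token, path):
    """Send the request and write the JSON response to `path` unchanged."""
    request = build_field_summaries_request(instance, job_id, token)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)  # assumes a JSON response body
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(body, handle, indent=2)
    return path


# Placeholders only; substitute your own instance name, Job_Id, and token.
example_request = build_field_summaries_request(
    "Cloud_Instance", "Job_Id", "Bearer_Token"
)
```

Calling `save_field_summaries("Cloud_Instance", "Job_Id", "Bearer_Token", "field-summaries.json")` with real values would write the response to a local file that you can then upload to your catalog.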
Using the Cribl UI to Run the Cribl Search API
To run the Cribl Search API via the Cribl UI we only need the Cribl Search Job_Id. The Cribl UI provides the Bearer Token and the Cribl.Cloud instance name.
Job_Id: The easiest way to locate the Job Id is to run the search from the UI, click on the Details link, and copy the Search ID. For details, see Search Details.
After copying the Job_Id, navigate from Cribl Search to Cribl Stream on your Cribl.Cloud instance. Once you are in Cribl Stream, go to Settings -> Global Settings -> API Reference
In the API Reference, change the Servers to https://Cloud_Instance.cribl.cloud/api/v1/m/default_search
Find the Search API section -> /search/jobs/{id}/field-summaries -> Try it out -> paste the Job_Id -> Execute
Click on the Download option on the right side of the 200 Response code
The Download option provides you with the response JSON file content that you can upload to your catalog and documentation.
Wrap Up
In conclusion, the Cribl Search Fields feature provides a powerful solution for cataloging and exploring vast amounts of data stored in Object Stores. With this tool, you can easily navigate through petabytes of information and gain valuable insights into the contents of thousands of files in S3. By utilizing this feature, you can uncover hidden patterns, identify relevant data fields, and streamline your data analysis process.
Whether you’re a data scientist, analyst, or IT professional, Cribl Search empowers you to unlock the potential of your data and make informed decisions with confidence. So, next time you find yourself wondering what treasures lie within your Object Stores, turn to Cribl Search and embark on a transformative journey of exploration and discovery. Ready to get started? Set up a Cribl.Cloud account today!