Building a Scripted Event Collector With Cribl Stream

September 6, 2023

Categories: Cribl Stream, Engineering

Cribl Stream provides a robust HTTP REST collector, with many features and options. Still, there are endless combinations that vendors can provide in their API endpoints. Sometimes you may need to take more extreme measures to unlock data stashed behind the API entry point. No worries! Cribl also allows you to run a script to collect that data, and can even help you scale it. In this blog post, we’ll cover how I completed this task for a recent interaction using Qualys.

Step 1: Credentials

Set up a test account in the API authentication system. Preferably make it read-only, with access to only the resources you will be monitoring. We want to minimize the potential blast radius of compromised credentials.

Step 2: Plagiarism Is a Compliment

Find prior art, if available. In the case of Qualys, the client pointed me to an old Perl script that does what we need, mostly. In testing it out, I noticed it was retrieving all host detections based on an optional time filter. With tens of thousands of hosts, even a very short interval proved to be a huge task, taking quite a long time to run.

Step 3: Docs Docs Docs (And Phone-a-friend)

Check the vendor for API docs. For Qualys, I found some API docs, and a blog post that helped me get on the right track to optimizing.

And don’t forget about my new bestie, Chad Gippity. Be warned that the answers are not always perfectly accurate, but most times I get a good starting point. I believe Chad helped me shorten the dev cycle quite a bit.

“Write a python script that collects detections from the qualys api”
Chad: Roger!

Step 4: Set Phase to Discover

The primary way we’re going to improve the API pull is by breaking up the list of targets. The best practices blog linked above says we should first request a list of target systems, then call for the detections list in batches of 5000 (ish). The Perl script doesn’t do that. By multithreading the requests, we can parallelize the heck out of this beauty.

And this is one of the fantastic features of the Cribl Collector sources. Multithreading a script was never this easy.

There are two possible phases in each collector: Discover, which works out what needs to be done, and Collect, which does the work. For Qualys, we want to get the list of host IDs that recently got scanned, then make multiple independent requests for subsets of that list. Rather than one big request for all 25000 IDs, we'll make 5 requests with 5000 each. Or 25 requests with 1000 each. The more worker processes in our worker group, the more threads we can fire up.
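To make the batching arithmetic concrete, here is a minimal Python sketch of splitting a long ID list into fixed-size chunks. The `chunked` helper and the stand-in `host_ids` list are hypothetical names for illustration, not taken from the actual collector script.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Stand-in host IDs (an assumption); a real run would get these from the vendor API.
host_ids = [str(n) for n in range(25000)]

batches_of_5000 = list(chunked(host_ids, 5000))
batches_of_1000 = list(chunked(host_ids, 1000))
print(len(batches_of_5000), len(batches_of_1000))  # prints "5 25"
```

Each chunk can then become one independent Collect task, so smaller chunks mean more tasks to spread across worker processes.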

Step 5: Design

We now have an idea of what we’re going to do:

Get logged in
Request the (long) list of targets in a timeframe
Break the list into smaller chunks and print it on stdout, 1 chunk per line
Get the detections for each batch
Print the detections to stdout

Step 6: Dev and Test

Well, this part may not be so exciting. I rolled up my sleeves and got to work on authenticating, requesting the list of hosts, and printing out the list in chunks of 1000. Each line of output had 1000 entries separated by spaces. Then I added a separate mode to the script: If it's called with the list of IDs, run the detection piece.

By checking for the $CRIBL_COLLECT_ARG environment variable I can determine if we’re in Discover mode (the variable doesn’t exist), or Collect mode (it’s there). I also added a few more environment variables: time, max size of threads, username, and password.
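A rough sketch of that mode switch might look like the following. Only the `CRIBL_COLLECT_ARG` variable comes from the description above; the `run` function, its parameters, and the placeholder callables are assumptions made so the example runs without touching a real API.

```python
import os
import sys
from typing import Callable, Iterable, List, Mapping, TextIO


def run(
    env: Mapping[str, str],
    list_hosts: Callable[[], List[str]],
    fetch_detections: Callable[[List[str]], Iterable[str]],
    out: TextIO = sys.stdout,
    batch_size: int = 1000,
) -> str:
    """Run one phase and return its name ("discover" or "collect")."""
    arg = env.get("CRIBL_COLLECT_ARG")
    if arg is None:
        # Discover mode: emit one space-separated batch of IDs per line.
        ids = list_hosts()
        for start in range(0, len(ids), batch_size):
            out.write(" ".join(ids[start:start + batch_size]) + "\n")
        return "discover"
    # Collect mode: the argument is one line produced by the Discover phase.
    for record in fetch_detections(arg.split()):
        out.write(record + "\n")
    return "collect"


if __name__ == "__main__":
    # Placeholder callables (hypothetical) standing in for the real API calls.
    def fake_hosts() -> List[str]:
        return [str(n) for n in range(1, 2501)]

    def fake_detections(ids: List[str]) -> Iterable[str]:
        return (f"detection for host {i}" for i in ids)

    run(os.environ, fake_hosts, fake_detections)
```

Passing the environment and the fetch functions in as arguments keeps both modes testable from the command line, which matches how the script was exercised before it went into Cribl.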

I’m no Python expert, so it took me a few hours to sort it all out. Eventually, I could run the script in list mode and have it output a segmented list. Then I could take a sample from that list and run in detect mode to get the detections for that sample list. I used the Python requests module, so that will need to be available on each worker node. So far, all the testing was done on the command line.

Step 7: Deploy in Cribl

We’ve got a working script. We’ve validated its 2 run modes, Discover and Collect. Let’s get it into service.

Go into the Leader, make a new Scripted Collector, and fill in its settings.

The Discover and Collect Script entries are identical copies of the entire python script, and the value of the CRIBL_earliest environment variable is:

C.Time.strftime(Date.now()/1000 - 300,"%Y-%m-%dT%H:%M:00")

Meaning, 5 minutes ago, in the format Qualys expects.
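For testing the script on the command line, outside of Cribl, the same value can be approximated in Python. This is a sketch only: the `earliest` helper is a hypothetical name, and the use of UTC is an assumption that should be checked against what the API expects.

```python
from datetime import datetime, timedelta, timezone


def earliest(now: datetime, lookback_seconds: int = 300) -> str:
    """Format `now - lookback_seconds` with the seconds field forced to 00."""
    start = now - timedelta(seconds=lookback_seconds)
    return start.strftime("%Y-%m-%dT%H:%M:00")


# A fixed timestamp keeps the example reproducible; UTC is an assumption.
fixed = datetime(2023, 9, 6, 12, 7, 42, tzinfo=timezone.utc)
print(earliest(fixed))  # prints "2023-09-06T12:02:00"
```

In a live run, `datetime.now(timezone.utc)` would take the place of the fixed timestamp.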

Finally, I scheduled it to run every 5 minutes.

Step 8: Wrap It Up

Using samples captured from test runs, I created the Event Breaker to unroll the events into manageable sizes, and a pipeline to transform XML into JSON. You can see the final product in the Cribl Dispensary.

Conclusion

Cribl’s built-in sources are flexible, but sometimes there are edge cases that just don’t fit. By offering features like Scripted Collectors, HTTP, raw TCP, and raw UDP, Cribl gives you the flexibility to get almost any data source ingested into your event pipeline. What have you done “outside of the box” with Cribl? Join us in our Slack community and share! Ready to get started? Head over to Cribl.Cloud to create an account and get 1 TB/day for free!

Written by

Jon Rust
