Simple Threat Intelligence with Cribl Stream

November 9, 2021

Categories: Engineering

Enriching data with lookups has proven its value to many in the Cybersecurity industry, solving many challenges by simply having a list of specific parts of a database (or a whole flat-file database) loaded as a CSV file. Some lookup tables are simple and easy to apply to a project. Others are larger or more complex and require a robust solution that can handle the reads/writes, relationships, arrays, etc. But most lookup tables work best when they can be cached in memory, which performs more efficiently than reading a very large file from a disk. For these cases, Redis and other in-memory/cache data solutions are recommended.

In this blog, I would like to look at that particular challenge from a different and simpler perspective. Cribl Stream’s native Lookup Function offers the same value and performance for smaller data sets, using CSV files. Spreadsheet programs such as Excel cap a sheet at 1,048,576 rows, but the CSV format itself has no row limit; the practical ceiling depends on the program reading the file.

In our case, Stream’s native Lookup Function will perform well reading from these files, but we should consider the size of each CSV file being transferred among Workers, and how often we update them. Let’s assume that you have files larger than 10 GB each. You could reduce the size of each file and/or the number of files being used, but you need the larger data sets. In that case, a more scalable solution such as Redis (which is also fully integrated with Stream via the Redis Function) will better distribute your data, without your having to worry about performance hits in your Stream pipelines.

The goal of this blog is to ensure a simple solution for SOC analysts who do not yet have their entire infrastructure matured to a level where they can use more complex tools in their operations. My intention is to have your growing environment ready to send data into your SOC enriched, accurate, compliant, and ready to be used by your system of analysis.

In this example, I will be using Splunk Core, and no other tool will be necessary (no need for Splunk ES, Splunk Security Essentials, etc.): just a plain-vanilla version of Splunk, benefiting from one single Stream Function.

These tools are extremely important for your SOC operations, but even if your organization doesn’t have them yet, I want to have your data ready to be used now, and compatible for when you onboard such tools in your SOC – making your data more valuable, reliable, and your single source of truth.

We’ll cover three simple steps to getting Threat Intelligence into your environment:

1. Ingest your Threat Intel from your source of choice, by creating scheduled REST API script collectors.
2. Store these indicators in a local lookup file in Stream (Leader and Workers).
3. Create a pipeline that will correlate your incoming traffic with these indicators, and then add fields to your data.

The Threat Intelligence tool I’ve chosen for this exercise is MISP (Malware Information Sharing Platform), an open-source Threat Intelligence aggregator that integrates with TheHive Project and shares information among several organizations around the world. But you may use whichever Threat Intel aggregator you prefer – just adjust the formatting for your tool by converting its output from JSON, Common Log Format, or text to CSV. This conversion is done with SED/AWK commands in a script for each Threat source you want or need to observe.

The Environment

1 Stream Leader
2 Stream Workers
1 MISP Server (or any Threat Intelligence source)
1 Splunk Core

For this simple example, we will deploy our Stream environment and create script collectors that will GET a list of threats specifically tailored for Bro/Zeek logs (Threat Export).

Deployment of MISP is beyond the scope of this document; see the MISP documentation.

Once your Threat feeds have been selected, you may choose which ones to apply to your solution.
The use case I’m trying to demonstrate is to download a list of indicators exported from Bro/Zeek logs. MISP will give you several lists that can be imported from Stream Collectors and placed as a CSV file in your local lookups directory, or integrated with Redis for larger data volumes and/or faster results.

Along with MISP, Cortex is the perfect companion for TheHive. TheHive lets you analyze tens or hundreds of observables in a few clicks, by leveraging one or several Cortex instances, depending on your OPSEC needs and performance requirements. Moreover, TheHive comes with a report template engine that allows you to adjust the output of Cortex analyzers to your taste, instead of having to create your own JSON parsers for Cortex output.

Cortex, open-source and free software, has been created by TheHive Project for this very purpose. Observables – such as IP and email addresses, URLs, domain names, files or hashes, etc. – can be analyzed one by one, or in bulk mode, using a Web interface.

Note that this exercise won’t check live against a list of providers such as MISP. To do that on every observable would create thousands of API calls to these providers, and degrade performance. If you want to validate a specific observable, then Cortex would be a good starting point for your SOC.

The Problem

Security Analysts working in SOCs nowadays rely on several tools to perform their threat hunting and vulnerability investigations. Some of their sources are received through a list aggregator and others via a simple REST API call. The challenge is that making an API call for each element that needs to be analyzed (IP addresses, URLs, files, file hashes, emails, etc.) creates a lot of network traffic, plus a delay while each element is queried against an external Threat List provider.

We can simply solve that problem by creating a local list (or database) containing all – or at least some – known threats available for each segment of security we want to validate against, and check our network traffic observables against this “database” via a local lookup table.
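As a rough, offline illustration of the idea (this is not how Stream implements its lookups), checking observables against a local list is a single pass over local files. The file contents below are hypothetical and use documentation IP ranges:

```shell
# Hypothetical indicator list and observed destination IPs (documentation addresses).
threats="$(mktemp)"
observed="$(mktemp)"
printf '203.0.113.7\n198.51.100.23\n' > "$threats"
printf '192.0.2.10\n203.0.113.7\n192.0.2.11\n' > "$observed"

# -F: fixed strings, -x: whole-line match, -f: read the patterns from the threat list.
# One local pass over the data stands in for one external API call per observable.
matches="$(grep -Fxf "$threats" "$observed")"
echo "$matches"

rm -f "$threats" "$observed"
```

Only the one observed address that also appears in the list is printed; nothing leaves the host.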

Note: Local lookups are great for small data sources. For larger environments, consider the use of an external database provider like Redis, which will provide a much more reliable result than a simple local lookup using CSV files.

The Solution

Cribl Stream can help you to provide data enrichment and add value to your SOC by analyzing the elements on your data sources, comparing them with a local lookup table, and adding any threats found to your log before it gets to your system of analysis.

In this example, we have a small Splunk deployment without any security applications dedicated to correlating indexed data, collecting Threat Intel, etc. This environment has only a single Splunk Core deployment with no Prime applications deployed (i.e., no Splunk Enterprise Security).

Following our three steps mentioned above, let’s start collecting data from our Threat Intelligence source via a REST Collector using a script:

There are two methods we can use:

REST Collector Script (bash), copying the results to each Worker, and to the Leader of the group hosting the lookups.
Using Git to simply Commit & Deploy the scripts, letting Stream’s Git integration update the workers.
Both can be scheduled in Stream. However, a Commit & Deploy in a worker group might cause some loss of work from other Stream users who have not yet committed their changes.

REST Collector Scripts:

The lookup files – in this case, the CSV files gathered from our MISP source via a REST API call – are stored on all Workers in the following directory:

/opt/cribl/data/lookups

And on the Leader, the lookup files are stored on the Worker Group containing the assigned Workers:

/opt/cribl/groups/GROUPNAME/data/lookups

Note: For this example, our Worker Group’s name is “Security.”

We need to create a directory on your Stream Leader and Workers to host the temporary files collected via the REST collector, convert them into CSV files, and place them in the proper lookup location(s).
You can clean these files manually, or via a local cron job, but they will be overwritten every time the REST collector script runs.

Create a directory named ‘tmp’ in /opt/cribl/:

/opt/cribl/tmp

Assign the proper permissions for the cribl user (or whichever user owns the cribl directory and runs the collector scripts):

chown -R cribl:cribl /opt/cribl/tmp

The script will collect the Threat Intel, convert it to a CSV format based on this source (text / headers), and copy the files both to Workers and to the Leader at the above locations: /opt/cribl/data/lookups for the Workers and /opt/cribl/groups/GROUPNAME/data/lookups/ for the Leader.

REST Script Collector and conversion to CSV format

From our Threat Intel source, we will use ready-to-consume “exports” tailored for Bro/Zeek. If other sources were chosen, you’d need to configure a different Event Breaker, parsing, and conversion accordingly.
There are multiple Threat Intel options for Bro/Zeek in MISP, but for this example, we will use only these:

https://10.0.104.105/attributes/bro/download/ip
https://10.0.104.105/attributes/bro/download/url
https://10.0.104.105/attributes/bro/download/filehash

Script for IP Intel:

# -k skips TLS certificate verification; use it only with a self-signed lab certificate
curl -k \
-H "Authorization: uxZ4XViSjSC7n3qRqJT68npE6HLXV9z0zeX7PDDq" \
-H "Accept: application/json" \
-H "Content-type: application/json" \
https://10.0.104.105/attributes/bro/download/ip > /opt/cribl/tmp/MISP_Threats_IP.txt

# List results on the collector
cat /opt/cribl/tmp/MISP_Threats_IP.txt

# Clean up the file and convert to CSV format (absolute paths, so the working directory does not matter)
sed -e 's/#fields\t//1;s/\t/,/g' /opt/cribl/tmp/MISP_Threats_IP.txt > /opt/cribl/tmp/MISP_Threats_IP.csv

# Copy the new CSV to the Workers
sshpass -f .sshpasswd scp /opt/cribl/tmp/MISP_Threats_IP.csv cribl@10.0.109.237:/opt/cribl/data/lookups/
sshpass -f .sshpasswd scp /opt/cribl/tmp/MISP_Threats_IP.csv cribl@10.0.109.248:/opt/cribl/data/lookups/

# Move the new CSV file to the Leader
mv /opt/cribl/tmp/MISP_Threats_IP.csv /opt/cribl/groups/Security/data/lookups/

Note: I’m using sshpass to copy the CSV files to the Workers without typing the password, with the destination host password stored in an .sshpasswd file. SSH keys work just as well for this step, and they avoid keeping a password on disk.
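To see what the conversion step does to an export, here is a minimal offline sketch of the same transformation. The sample rows are made up for illustration (documentation IP ranges and an assumed three-column Bro intel header), and I use awk rather than sed so the tab handling does not depend on GNU sed:

```shell
# Hypothetical sample of a Bro/Zeek intel export: tab-separated, "#fields" header.
# The indicator values are documentation addresses, not real threat data.
sample_export() {
  printf '#fields\tindicator\tindicator_type\tmeta.source\n'
  printf '203.0.113.7\tIntel::ADDR\tMISP\n'
  printf '198.51.100.23\tIntel::ADDR\tMISP\n'
}

# Drop the leading "#fields" marker on the header line, then turn tabs into commas.
bro_intel_to_csv() {
  awk -F'\t' 'BEGIN { OFS = "," }
    NR == 1 && $1 == "#fields" { $1 = ""; sub(/^,/, ""); print; next }
    { $1 = $1; print }'
}

sample_export | bro_intel_to_csv
# prints:
# indicator,indicator_type,meta.source
# 203.0.113.7,Intel::ADDR,MISP
# 198.51.100.23,Intel::ADDR,MISP
```

The first output line becomes the CSV header, which is where a column name such as indicator comes from when the lookup is matched later on.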

You may schedule the script to collect new Threat Intelligence in intervals defined by your security organization.

Git with Script Collectors

The process for using Git is the same as above, except that instead of copying the files to each host, the script places the CSV on the Leader and lets a commit/deploy distribute it. The caution here is that a Git commit/deploy applies to every Worker in the group, so if other users are creating or editing their work in Stream within the same Worker Group when your scheduled script runs, they could lose that work.

Script for Intel – Using Git

# -k skips TLS certificate verification; use it only with a self-signed lab certificate
curl -k \
-H "Authorization: uxZ4XViSjSC7n3qRqJT68npE6HLXV9z0zeX7PDDq" \
-H "Accept: application/json" \
-H "Content-type: application/json" \
https://10.0.104.105/attributes/bro/download/ip > /opt/cribl/tmp/MISP_Threats_IP.txt

# List results on the collector
cat /opt/cribl/tmp/MISP_Threats_IP.txt

# Clean up the file and convert to CSV format (straight quotes; curly quotes break the sed expression)
sed -e 's/#fields\t//1;s/\t/,/g' /opt/cribl/tmp/MISP_Threats_IP.txt > /opt/cribl/tmp/MISP_Threats_IP.csv

# Move the new CSV file to the Leader
mv /opt/cribl/tmp/MISP_Threats_IP.csv /opt/cribl/groups/GROUPNAME/data/lookups/

# Commit/Deploy using Git
/opt/cribl/bin/cribl auth login -u admin -p LEADERTOKEN
/opt/cribl/bin/cribl git commit-deploy -g GROUPNAME

Now all workers will have the same version of your latest CSV from a single command line.
Make sure to create a separate CSV file for each type of Threat Intel you are getting from MISP and follow the same steps mentioned above.
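One way to keep the per-type files straight is to drive the script from a small mapping. This is only a sketch: the base URL is the example server from this post, while the URL and file-hash file names are hypothetical names I chose to follow the MISP_Threats_IP pattern. It prints a download plan instead of calling the server:

```shell
# Assumption: MISP_BASE is the example server used in this post.
MISP_BASE="https://10.0.104.105/attributes/bro/download"

# Map a feed type (ip, url, filehash) to the lookup file name used for it.
# The URL and FileHash names are hypothetical.
lookup_name_for() {
  case "$1" in
    ip)       echo "MISP_Threats_IP.csv" ;;
    url)      echo "MISP_Threats_URL.csv" ;;
    filehash) echo "MISP_Threats_FileHash.csv" ;;
    *)        echo "unknown feed type: $1" >&2; return 1 ;;
  esac
}

# Print the plan only; replace the echo with the curl and conversion steps
# from the script when running against a real server.
for feed in ip url filehash; do
  echo "$MISP_BASE/$feed -> $(lookup_name_for "$feed")"
done
```

Rejecting unknown feed types up front avoids silently writing a lookup file that no pipeline references.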

Lookup files are present on Leader and all Workers

We now have the Threat Intelligence lists available for the Lookups, being dynamically updated depending on your scheduled scripts.

Dynamically created lookup files ready for consumption

The next step is to bring the data you want to analyze for Threats into Stream using the normal process (Source, Event Breaker, etc.)

Here, our Source will monitor a Filesystem containing all .log files from Bro/Zeek.

Create a simple Event Breaker rule for the source we are onboarding (headers), and extract its fields for easy manipulation by our selected system of analysis (Splunk):

Now that we have our data source configured with its Event Breaker rule, we can capture a sample file and start working on our Threat Intel Pipeline for Bro/Zeek traffic.

The Pipeline

We will create one pipeline:

Threat_Intel_Bro

Threat_Intel_Bro is our main pipeline. It will match any threats on our Threat List that appear in our source (Bro/Zeek logs). We are working only with the conn, http, and files log types, checked against the exports from our Threat List aggregator (MISP). When we find a match (using our Lookup Function), we Eval a new field, bringing the value of the IOC (Indicator of Compromise) found for the observable we are analyzing – IP address, URLs, File Hashes – into our data source.

In our Pipeline, the Functions are simple. The first one is “Log Type,” which is a Stream Eval Function responsible for “tagging” all traffic coming into Stream from several log files (conn.log, http.log, files.log, etc.).

There are several ways to filter this data in Stream. For this task, I chose a JavaScript expression that derives the log type from the source file name and adds it to each event as a field (logtype):

__source.substring(__source.lastIndexOf("/")+1,__source.indexOf(".log"))
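If you want to sanity-check what this expression yields for a given path before wiring it into the Eval Function, the same slicing can be mimicked in the shell. This is an analogue, not the Stream expression itself: it assumes that “.log” first appears in the file name rather than in a directory name, and the sample paths are hypothetical.

```shell
# Shell analogue of the Eval expression: keep the text between the last "/"
# and the first ".log" in the file name.
logtype_from_source() {
  name="${1##*/}"                  # drop everything up to the last "/"
  printf '%s\n' "${name%%.log*}"   # drop from the first ".log" onward
}

logtype_from_source "/opt/zeek/logs/conn.log"   # prints: conn
logtype_from_source "/opt/zeek/logs/http.log"   # prints: http
```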

Once we segregate our log types, it’s easy to start performing our lookups and enriching the data with valid Threat Intelligence on the fly.

I organized each group by Threat type. Remember you may have as many Threat types as you need. You might also have different queries not just on the attributes of the Bro/Zeek export, but on all events based on different criteria and capabilities offered by your Threat Intelligence of choice.

Inside each Threat type, we match with our lookups for the values we have in our Bro/Zeek logs, filtering them on each lookup function by their log type:

The results start to appear in our Preview, using the sample file previously captured from our data source (Bro/Zeek logs).

Applying a filter on our sample window, we can concentrate only on the IP Intel. Because we’ve created the logtype field in our first Eval function in this Pipeline, we can easily use this expression in our Filter box:

logtype=='conn'

Our lookup will match id_resp_h (destination IP) to a field named indicator from our lookup table MISP_Threats_IP.csv. For this example, we will create three fields: Threat_found, Threat_source, and indicator_type.
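Here is a rough shell sketch of what this lookup does to each event. In Stream this is configured in the Lookup Function, not scripted; the event layout, the sample rows, and my reading of Threat_found as the matched indicator value are all assumptions made for illustration.

```shell
lookup="$(mktemp)"
events="$(mktemp)"
# Hypothetical lookup rows: indicator,indicator_type,meta.source
printf 'indicator,indicator_type,meta.source\n203.0.113.7,Intel::ADDR,MISP\n' > "$lookup"
# Hypothetical conn events: id_orig_h,id_resp_h
printf '10.0.0.5,192.0.2.10\n10.0.0.6,203.0.113.7\n' > "$events"

# First file: load the lookup into memory keyed by indicator (header skipped).
# Second file: append the three enrichment fields when id_resp_h matches.
enriched="$(awk -F, 'BEGIN { OFS = "," }
  NR == FNR && FNR > 1 { itype[$1] = $2; isrc[$1] = $3 }
  NR == FNR { next }
  ($2 in itype) { print $0, "Threat_found=" $2, "Threat_source=" isrc[$2], "indicator_type=" itype[$2]; next }
  { print }' "$lookup" "$events")"
echo "$enriched"

rm -f "$lookup" "$events"
```

Events with no match pass through unchanged, which mirrors the goal of enriching only the traffic that hits an indicator.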

These fields will provide value to a SOC analyst receiving this data, without having to perform any extra correlation. It will also help enhance the security posture of this organization with very little effort.

Note: This example is very simple. Threat Intel can bring in any fields with deeper information about each indicator, and many formats such as JSON, CSV, Common Log Format, etc.

Now we simply need to repeat these steps for each threat type we need to check.

The next threat type is URL, using the http.log file (from Bro/Zeek logs). We want to add a field referencing the Threat Found, if available from the Publisher (in this case OSINT, CIRCL, or MISP). This information helps the security analyst understand the threat, validate its risk with the international security community, and learn how to mitigate and/or avoid it. Where applicable, this may include an analysis against frameworks such as MITRE ATT&CK, or against another Threat Intel provider (e.g., VirusTotal).

Threat Intel for URL with external reference to the IOC

The File Hashes threat type requires one extra step, because file hashes may use different hashing algorithms (MD5, SHA1, SHA256, or others). In our case, we have only MD5 and SHA1, so I’ve created a lookup filter for each field based on the log type (files).
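If an indicator list mixes algorithms, the digest length is a quick way to tell them apart before deciding which field to match. A minimal sketch, assuming hex-encoded digests (the two sample values are the well-known MD5 and SHA1 digests of empty input):

```shell
# Classify a hex digest by its length: 32 chars = MD5, 40 = SHA1, 64 = SHA256.
hash_type() {
  case "$1" in
    ""|*[!0-9a-fA-F]*) echo "unknown"; return ;;   # empty or not hex
  esac
  case "${#1}" in
    32) echo "md5" ;;
    40) echo "sha1" ;;
    64) echo "sha256" ;;
    *)  echo "unknown" ;;
  esac
}

hash_type "d41d8cd98f00b204e9800998ecf8427e"           # prints: md5
hash_type "da39a3ee5e6b4b0d3255bfef95601890afd80709"   # prints: sha1
```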

Now we have matches for two hash types, and all data from Bro/Zeek logs with log type files is checked in the Pipeline.

What Have We Achieved?

We were able to get Threat Intel for IP, URL, and File Hashes; create a REST Collector to bring this data into Stream; convert it to CSV format; and place a copy on each Worker’s …/lookups/ directory (we used two methods for that).

We also created a Pipeline matching each event on our Bro/Zeek logs with the Threat Intel we have dynamically available. We enriched the traffic and made it ready for Splunk consumption.

The Splunk Component

Now that we have our data properly enriched, we should think about its destination. In this example, our environment has a single instance of Splunk Core – no additional applications are installed, just pure simple Splunk out of the box.

Our Pipeline has sent the proper data and extracted the proper fields for Splunk, making them “tstats”-ready!
No props.conf or transforms.conf were used in this example, just good old Cribl Stream saving the day.

We haven’t used a Splunk Prime app, such as Splunk Enterprise Security, although we may need this in the future.

In order for Splunk Enterprise Security to work properly, we need to make its data CIM (Common Information Model) compliant, by following rules found on the Splunk Data Models definition.

For this example, we will send our “bro logs” data, properly parsed and enriched with Threat Intelligence acquired from the MISP framework, to the Splunk index named “bro”, and this index will be ready to be added to the Network Traffic Data Model in the future.

This data will be ready when this fictitious Company decides to implement Splunk ES, and their SOC will be ready to react to Notable Events when properly generated.

In order to achieve that, we need to validate the data source in Stream by adding another Pipeline. Here, an optional Eval function will translate the field names to those properly defined in Splunk Network Traffic Data Model. It will also add the necessary tags for this Data Model to work properly (network, communicate).

Note: The Splunk CIM compliance process needs more than just field renaming to operate properly. The tag field represents different data sets within each data model, and in order to be implemented correctly, Event Types (Splunk searches) are created to classify these events according to each data model data set.

In this example, we are sending data only to the Network Traffic data model, and its tags can be either network or communicate. If we want to send data to another relevant data model, such as Intrusion Detection, we would have different tags (ids, attack). If we also decide to send this data to the Network Sessions data model, the tags would be start, end, dhcp, and vpn. We would then need to create more segregating Functions within the CIM Pipeline to filter the data for each specific data set within our source, and to apply the correct tag before sending it to Splunk.

Covering the Splunk CIM for all data models on specific sources is outside the scope of this blog post. For more details, see the Splunk CIM documentation.

By replacing the field names with the ones specified on the Splunk Network Traffic Data Model, the data is ready for consumption by Splunk ES upon its arrival. All dashboards created with this data should function properly.
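To make the renaming concrete, here is a small offline sketch of the idea. The mapping (id_orig_h to src_ip, id_resp_h to dest_ip) is my assumption for illustration and is not taken from the Pipeline in this post; in Stream the rename happens in the Eval Function, not in sed.

```shell
# Assumed mapping, for illustration only: id_orig_h -> src_ip, id_resp_h -> dest_ip.
# Renaming the keys in place means the old field names are not kept alongside the new ones.
rename_to_cim() {
  sed -e 's/"id_orig_h":/"src_ip":/' -e 's/"id_resp_h":/"dest_ip":/'
}

# Hypothetical JSON event with Bro/Zeek-style field names.
printf '%s\n' '{"id_orig_h":"10.0.0.6","id_resp_h":"203.0.113.7","proto":"tcp"}' | rename_to_cim
# prints: {"src_ip":"10.0.0.6","dest_ip":"203.0.113.7","proto":"tcp"}
```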

Note: In Splunk, you can create field aliases for these old field names, if you need to access historical data.

In our Bro/Zeek files.log, the source and destination IP addresses are named differently. So we need to create a second Eval function filtering by this log type, rename these fields properly, and then remove the old field names (which avoids data duplication and provides data reduction in Splunk).

Let’s create a simple dashboard in Splunk with the enriched data provided by Stream:

Splunk dashboard with real Threat Intel from Cribl Stream

This was a very simple way to demonstrate how flexible Cribl Stream can be, and how quickly and easily you can have a SOC functioning and providing valuable Threat Intel to security analysts who have not yet deployed Splunk ES or Splunk Security Essentials.

This solution does not replace these tools, but it gives you a kick-start in your Network Security operations.

There are many other solutions to explore with Stream, and this data can be easily routed to your own system of analysis if you are not using Splunk Core today.

Final Accomplishments

We received index-ready fields extracted in Stream and made them “tstats”-ready in Splunk. We also created a CIM-compliant Pipeline following the Network Traffic Data Model specifications from Splunk. Finally, we were able to reduce our ingestion by 39%.

Note: CIM compliance requires attention on the Splunk side: tags may be overridden by Splunk if the Splunk CIM compliance App is installed.

Summary

I kind of fooled you. You can accomplish this without Redis, but now you really should consider using Stream’s fabulous Redis Function and expand your Stream environment knowing Cribl Stream is capable of making your life better.

The fastest way to get started with Cribl Stream is to sign up at Cribl.Cloud. You can process up to 1 TB of throughput per day at no cost. Sign up and start using Stream within a few minutes.
