Products
Product Portfolio

Cribl puts your IT and Security data at the center of your data management strategy and provides a one-stop shop for analyzing, collecting, processing, and routing it all at any scale. Try the Cribl suite of products and start building your data engine today!
Learn more ›

Evolving demands placed on IT and Security teams are driving a new architecture for how observability data is captured, curated, and queried. This new architecture provides flexibility and control while managing the costs of increasing data volumes.
Read white paper ›

Cribl Stream

Cribl Stream is a vendor-agnostic observability pipeline that gives you the flexibility to collect, reduce, enrich, normalize, and route data from any source to any destination within your existing data infrastructure.
Learn more ›

Vodafone Case Study

Vodafone Dials up Business Insights with Cribl Stream
Read Case Study ›

Cribl Edge

Cribl Edge provides an intelligent, highly scalable edge-based data collection system for logs, metrics, and application data.
Learn more ›

SpyCloud Edge Story

Listen to how SpyCloud uses Cribl Edge at scale.
Watch Video ›

Cribl Search

Cribl Search turns the traditional search process on its head, allowing users to search data in place without having to collect/store first.
Learn more ›

How Cribl Search Can Save You From Drowning in a Deluge of Data
Read Blog ›

Cribl Lake

Cribl Lake is a turnkey data lake solution that takes just minutes to get up and running — no data expertise needed. Leverage open formats, unified security with rich access controls, and central access to all IT and security data.
Learn more ›

Navigating the future of IT and Security Data management white paper
Read white paper ›

Cribl.Cloud

The Cribl.Cloud platform gets you up and running fast without the hassle of running infrastructure.
Learn more ›

Cribl.Cloud Solution Brief

The fastest and easiest way to realize the value of an observability ecosystem.
Read Solution Brief ›

Cribl Copilot

Cribl Copilot gets your deployments up and running in minutes, not weeks or months.
Learn more ›

Cribl Copilot

Your Trusted AI Advisor for Deploying, Configuring & Troubleshooting.
Read blog ›

AppScope

AppScope gives operators the visibility they need into application behavior, metrics and events with no configuration and no agent required.
Learn more ›

Sandbox

Launch an AppScope Sandbox today!
Launch Now ›
Solutions
Use Cases

Explore Cribl’s Solutions by Use Cases:

Supercharge Security Insights ›

Accelerate Cloud Migration ›

Avoid Vendor Lock-in ›

Agent Consolidation ›

Slash Storage Costs ›

Free Up Space for High-Value Data ›

Route From Any Source To Any Destination ›

Immediate Access to Archived Data ›

Replay Data from Low-Cost Storage ›

Reduce Log Volume & Pay Less for Infrastructure ›
Integration

Explore Cribl’s Solutions by Integrations:

Amazon ›

CrowdStrike ›

Elastic ›

Exabeam ›

Google ›

Microsoft ›

Splunk ›

Wiz ›

View All Integrations ›

Seamless Integrations for Your Observability Data
Learn More ›
Industries

Explore Cribl’s Solutions by Industry:

AIOps ›

Financial Services ›

Healthcare ›

Managed Security Services ›

Manufacturing and Logistics ›

Media and Entertainment ›

Public Sector ›

Retail ›
Resources
Resources

Resource Library ›

Documentation ›

Guides ›

AppScope Docs ›

Blog ›

Glossary ›

Podcasts ›

Telemetry 101

Understanding the Basics of Telemetry and Its Benefits
Learn More ›
Events & Webinars

Events ›

Webinars ›

CriblCon24
Watch On-Demand ›

July 31 | 10am PT / 1pm ET

Navigating the Data Current Report: Transforming IT & Security Operations in 2024
Register ›
Learning

Try the Sandboxes ›

Self Guided Trials ›

Cribl University ›

Cribl Community ›

Cribl Curious Forum ›

What is Observability? ›

Try Your Own Cribl Sandbox

Experience a full version of Cribl Stream and Cribl Edge in the cloud.
Launch Now ›
Tools & Pricing

Download Library ›

Past Releases ›

Pricing Plans ›

Stream ROI Calculator ›

Download Library

Download Cribl’s suite of products for free to get started.
Download ›
Customers
Customer Stories

Get inspired by how our customers are innovating IT, security and observability. They inspire us daily!
Read Customer Stories ›

Sally Beauty Holdings

Sally Beauty Swaps LogStash and Syslog-ng with Cribl.Cloud for a Resilient Security and Observability Pipeline
Read Case Study ›
Customer Experience

Support & Success ›

Professional Services ›

Service Delivery Partners ›

Documentation ›

AppScope Docs ›

Professional Services

Check out our new Professional Services offering.
Learn More ›
Learning

Try the Sandboxes ›

Self Guided Trials ›

Cribl University ›

Cribl Community ›

Cribl Curious Forum ›

Try Your Own Cribl Sandbox

Experience a full version of Cribl Stream and Cribl Edge in the cloud.
Launch Now ›
Company
About Cribl

Transform data management with Cribl, the Data Engine for IT and Security
Learn More ›

Cribl Corporate Overview

Cribl makes open observability a reality, giving you the freedom and flexibility to make choices instead of compromises.
Get the Guide ›

Cribl Newsroom

Stay up to date on all things Cribl and observability.
Visit the Newsroom ›

Press Releases

Read our most recent press releases.
Recent Press Releases ›

Leadership

Cribl’s leadership team has built and launched category-defining products for some of the most innovative companies in the technology sector, and is supported by the world’s most elite investors.
Meet our Leaders ›

Careers

Join the Cribl herd! The smartest, funniest, most passionate goats you’ll ever meet.
Learn More ›

Cribl Named to the Inc. 5000 List of Fastest Growing Private Companies
Learn More ›

Cribl for Startups

Whether you’re just getting started or scaling up, the Cribl for Startups program gives you the tools and resources your company needs to be successful at every stage.
Learn More ›

Contact Us

Want to learn more about Cribl from our sales experts? Send us your contact information and we’ll be in touch.
Talk to an Expert ›

Try Cribl Talk to an expert

Exporting Splunk Data at Scale: See a Need, Fill a Need

Written by Jim Apger

July 27, 2022

The Core Splunk platform is rightfully recognized as having sparked the log analytics revolution when viewed through the lenses of ingest, search speed, scale, and usability. Their original approach leveraged a MapReduce approach, and it still stores the ingested data on disk in a collection of flat files organized as “buckets.” These immutable buckets are not human-readable and largely consist of the original raw data, indexes (.tsidx files), and a bit of metadata. Read on as we dive into exporting Splunk data at scale, and how it’s made easy by Cribl Stream and Cribl.Cloud.

As Splunk continues to make progress migrating their customers from on-prem or self-hosted platforms to their Splunk Cloud offering, the big question is “what happens to all of the previously ingested data?”. The current Splunk-supported options for exporting data do not address the portability or reusability of this historical data from a scale perspective, even when migrating to Splunk Cloud. Do you keep licensing in place for your historical data until your investigation, hunting, threat intelligence, and regulatory/compliance retention timelines have been satisfied? Someone needs to step up and build a solution for moving that historical data into Splunk Cloud or any other platform that best fits your needs.

I think back on the 2005 movie Robots from time to time and the recurring “See a Need, Fill a Need” theme. The older robots are faced with either expensive upgrades or termination for reasons I don’t want to spoil in this blog. An inventive hero named Rodney Copperbottom stands up against the monopolistic Bigweld Industries corporation to save the day by being the single brave sole armed with his curiosity and persistence.

Splunk customers need a solution to address the portability of this previously-ingested data that is neither expensive nor leaves the data for extinction. It needs to scale, it needs to leverage native Splunk functionality, and it needs to provide the flexibility to allow you, the customer, to format, transit, and use the data exactly as you need. Choice. The solution outlined here involves no proprietary information and you are free to make use of it, with or without Cribl, to migrate your Splunk ingested data into Splunk Cloud, object storage, or any other destination.

The Solution

Splunk includes a command-line switch called “exporttool” which provides the ability to retrieve the original raw events as they were originally indexed by Splunk. As a very important bonus, exporttool also provides metadata such as the original source name, sourcetype, and index time that allow us to organize, route, optimize, and reuse or even replay that data perpetually or on-demand. Exporttool is very fast and can either write data to disk or stream the data directly from the indexer using the CLI. Examples of how to use this command when exporting Splunk data are detailed below.

Export to stdout:

/opt/splunk/bin/splunk cmd exporttool /opt/splunk/var/lib/splunk/bots/db/db_1564739504_1564732800_2394 /dev/stdout -csv

Export to a local csv file:

/opt/splunk/bin/splunk cmd exporttool /opt/splunk/var/lib/splunk/bots/db/db_1564739504_1564732800_2394 /exports/bots/db_1564739504_1564732800_2394.csv -csv

Writing to disk is generally not optimal considering that the uncompressed raw data will likely fill your disks and you are still left with decisions for how to best transit, organize, route, and optimize this data to suit your needs. The streaming option of exporttool bypasses storage and disk IO concerns but is still single-threaded which is something we will solve.

Allow me to introduce you to a python script that streams data from Splunk Indexers into Cribl using exporttool via a python script we call scribl.py. This script provides the parallelization of both the exporttool functionality and the transport of data off of each indexer. You point the script at an index of your choosing from your indexer CLI, tell it how many CPUs to use, then assign your Cribl Stream destination IP and TCP port. There are a couple of other options, such as transport via TLS or which time ranges you would like to export, but that’s it. Nothing fancy or overly complicated. Again, send directly to Splunk Cloud for licensed ingest, to object storage, to another platform, or to Cribl for organization, routing, optimization, replay, etc., before sending to any of the aforementioned destinations.

Here is a subtle but important point to consider while exporting from Splunk Indexers: Regardless of your use case, a significant percentage of your ingested data may be overly verbose given the purpose-built needs of many of your destination platforms. If you are moving your previously indexed Splunk data to Splunk Cloud, this is your chance to hit the “reset” button and have Cribl Stream reformat data “in-flight” to ensure you free up ingest volume for additional data sources, reduce storage requirements related to retention, decrease your search concurrency (CPU count), and improve cost/performance across the board.

Any gotchas? The scribl script running on your indexers and the Cribl Stream workers are built to scale and will not be your bottleneck. My scale testing shows Scribl throughput on a single Splunk indexer writing results to local /dev/null/ (no network involved) of more than 19 Gb/sec when assigning 30 CPUs. The same config pushes more than 11 Gb/sec when writing to a single Cribl worker over the network within the same AWS availability zone.

Your bottlenecks will almost certainly be bandwidth constraints between your indexers and your final destination. Depending on where you deploy your Cribl Stream workers, that bandwidth bottleneck might exist between the indexers and Cribl workers or between your Cribl workers and the final destination. If you happen to have unlimited bandwidth, you might find your bottleneck to be the ingest rate at your destination platform.

Exporting Splunk Data: The Details

With exporting Splunk Data, you will need:

CLI access to each Linux indexer with the index/buckets that need to be exported which means this process only applies to on-prem or non-SplunkCloud deployments.
To install netcat on each indexer to act as the transport mechanism.
To make sure outbound communication from each indexer to the Cribl Worker TCP port is open.

Scribl.py is a python script that parallelizes native Splunk exporttool functionality available in the Core Splunk platform as long as you have CLI access to your indexers and can install the OG netcat utility on each of your indexers. Since python is embedded in every Splunk install, all you need to do is copy/paste the script from the repo, install netcat if not already installed, and follow the repo instructions for configuring Cribl Stream.

Scribl needs to be run on each of your indexers independently which is where we achieve an even higher degree of scale by bypassing any bottlenecks related to aggregation components like a search head. As detailed below, scribl will reach into the index that you want to export, build a list of all buckets that need to be exported, then balance the exporting and transiting of the data in each bucket across the number of CPUs that you dedicate to the export process.

The data will be csv formatted into the following fields when delivered to Cribl Stream:

_time
Source
Host
Sourcetype
_raw
_meta

Cribl Stream performs the following while exporting Splunk:

Load-balancing across multiple Stream Worker nodes for scale.
Event breaking to handle multi-line events with the most common example being Windows Logs.
Provide the ability to filter by original sourcetype for granular optimization (reduction, dedup, aggregation, etc) and routing of data to its final destination.
Ensure the _time value represents the original event time in the _meta field.
Remove the _meta field after it has been referenced for _time.
Perform some formatting to clean up the exporttool output.

The scribl github repo provides you with the scribl.py script, instructions for usage, and for configuring Cribl Stream. Don’t forget, you can use a Cribl.Cloud instance to stream your exported Splunk data to for testing in a matter of minutes. If you want to take scribl for a spin in your lab environment, one of the best ways to do this is to use one of Splunk’s open-sourced Boss of the SOC datasets. They contain pre-indexed data (many sourcetypes) in a single index which you can have scribl export and stream to your Cribl Stream worker node(s) then on to Splunk Cloud or any other platform.

Additional resources you may find useful during your testing of exporting Splunk data include Cribl Sandbox, or the Cribl Docs. For more information on Cribl’s solutions, visit our Podcast, LinkedIn, Twitter, or Slack community.

The fastest way to get started with Cribl Stream and Cribl Edge is to try the Free Cloud Sandboxes.

Return to Cribl Blog

Additional Reading

Optimizing Throughput: Overcoming Syslog TCP Pinning with Cribl’s Load Balancing

Kunal Deshpande Jun 24, 2024

Why More Choices Matter With Observability Tools

Jag Dhillon Jun 6, 2024

Product Portfolio

Cribl Stream

Cribl Edge

Cribl Search

Cribl Lake

Cribl.Cloud

Cribl Copilot

AppScope

Use Cases

Integration

Industries

Resources

Events & Webinars

Learning

Tools & Pricing

Download Library

Customer Stories

Customer Experience

Learning

Try Your Own Cribl Sandbox

About Cribl

Cribl Newsroom

Leadership

Careers

Cribl for Startups

Contact Us