
Data Collection – Listening To Our Customers

Written by Clint Sharp

September 17, 2020

From the very start, the Cribl founding team came in with some strong assumptions, which you can even see baked into the name of our first product: LogStream. The founders have been in the logging ecosystem for 30+ combined years, both as customers and working with customers. We knew organizations wanted to work with log data in motion to route the right data to the right store, in the right format. We knew logging use cases demanded their own fit-for-purpose solution, with a data-centric experience that makes it easy to work with gritty log data. We knew that to solve customers’ pain, we needed to meet customers where they were and support all their existing agents and collectors.

Organizations are really struggling to get data into logging tools, time series databases, and data lakes in the right shape, structured properly for each data store. Solving this problem isn’t just about data in motion. Over the last year, we’ve gotten numerous requests to collect data, not just receive it. Requests usually come in the form of a question, like: “how can I collect back what I put to rest in cheap storage?” We fulfilled that request in our 2.2 release with Ad-Hoc Data Collection tools. Now we can easily replay data from cheap storage to any destination.

But we also received a lot of questions like: “how do I get data from Office 365 APIs?” or “how can I collect from a REST API on an interval?” There’s no category name for this kind of problem, but working with our customers, it’s become clear that reliably and scalably collecting data from APIs is a huge pain point. Today, they’re forced to run dozens of different collectors, each with its own unique configuration and administration. Many of these collectors are custom scripts written by in-house engineers, because no vendor has built a collector for that type of data.

Keeping all these little collectors running is simply operations toil. Most collectors leave scaling to the administrator: hand-configured individual nodes, each handling a slice of the workload. Each has to be documented and operationalized. Each needs monitoring. Each collector comes with its own failure modes, and when one fails, someone has to diagnose and resolve the issue. If one node fails, you lose data.

LogStream 2.3 takes away this toil. Now, as part of the same platform that has consolidated receiving from all of your deployed agents, you can easily collect data from anywhere, on an arbitrary interval. Run your custom scripts. Collect from REST APIs. The system handles sharding and scaling, transparently to the administrator. You can reuse the same infrastructure you have for receiving, or you can create different worker groups that might have different IAM/security roles to make authentication and authorization easier and more secure.
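
To make "the system handles sharding" a little more concrete, here is a minimal sketch of the general idea: cut one scheduled collection run into small tasks and spread them across whatever workers are available. This is not LogStream's implementation. The `CollectionTask`, `split_into_tasks`, and `assign_round_robin` names, the time-window slicing, and the worker names are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CollectionTask:
    """Hypothetical unit of work: a half-open time window [start, end)."""
    start: datetime
    end: datetime


def split_into_tasks(start, end, window):
    """Cut [start, end) into consecutive windows; the last one may be shorter."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    tasks = []
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)
        tasks.append(CollectionTask(cursor, upper))
        cursor = upper
    return tasks


def assign_round_robin(tasks, workers):
    """Spread tasks evenly across the workers that are currently available."""
    if not workers:
        raise ValueError("at least one worker is required")
    plan = {worker: [] for worker in workers}
    for index, task in enumerate(tasks):
        plan[workers[index % len(workers)]].append(task)
    return plan


# Hypothetical run: collect one hour of data in 15-minute windows.
run_start = datetime(2020, 9, 17, 12, 0)
tasks = split_into_tasks(run_start, run_start + timedelta(hours=1), timedelta(minutes=15))
plan = assign_round_robin(tasks, ["worker-1", "worker-2", "worker-3"])

# If a worker drops out, the same tasks are planned again over the survivors,
# so no window is tied to a single hand-configured node.
replan = assign_round_robin(tasks, ["worker-1", "worker-3"])
```

The point of the sketch is the contrast with hand-configured nodes: because the slices are computed rather than pinned to a machine, adding or losing a worker only changes the plan, not the configuration.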

Scheduled Data Collection is a great example of how Cribl’s innovation is driven by our customer-first culture. An Observability Pipeline demands stream processing, as well as batch and mini-batch processing, in the same engine. It blurs lines typically drawn in data processing. Cribl meets you where you are, and we solve problems others are ignoring.

There’s so much more coming. What’s your biggest pain point we’re not solving? Our product is inspired by our customers, and we want nothing more than to solve problems for you. Hit me up in our community Slack. I’d love to hear from you!
