From the very start, the Cribl founding team came in with strong assumptions, which you can even see baked into the name of our first product: LogStream. The founders have been in the logging ecosystem for 30+ years, having worked as customers and with customers. We knew organizations wanted to work with log data in motion to route the right data to the right store, in the right format. Logging use cases demanded their own fit-for-purpose solution, with a data-centric experience that makes it easy to work with gritty log data. To solve customers’ pain, we needed to meet them where they were and support all their existing agents and collectors.
Organizations struggle to get data into logging tools, time series databases, and data lakes in the right shapes, structured properly for the data store. Solving this problem isn’t just about data in motion. Over the last year, we’ve received numerous requests to collect data, often phrased as questions like: “How can I collect back what I put to rest in cheap storage?” We fulfilled that request in our 2.2 release with Ad-Hoc Data Collection tools, which let you easily replay data from cheap storage to any destination.
But we also received questions like: “How do I get data from Office 365 APIs?” or “How can I collect from a REST API on an interval?” There’s no category name for this type of problem, but through working with our customers it has become clear that reliably and scalably collecting data from APIs is a huge pain point. Today, organizations are forced to run dozens of different collectors, each with its own unique configuration and administration. Many of these collectors are custom scripts written by in-house engineers because no vendor had built a collector for that type of data.
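The hand-rolled scripts described above tend to share the same skeleton. Here’s a minimal sketch of one (the endpoint and fetch logic are hypothetical; a real version would also need authentication, checkpointing, retries, and monitoring):

```python
import time

def poll_api(fetch, interval_s, max_polls):
    """Poll an API on a fixed interval, yielding each batch of events.

    `fetch` stands in for the real HTTP call (e.g. a GET against some
    hypothetical /api/v1/events endpoint); injecting it keeps the loop
    testable without a network.
    """
    for _ in range(max_polls):
        try:
            events = fetch()
        except Exception:
            # Every hand-rolled collector invents its own failure handling.
            # Here we simply drop the batch, which is exactly the kind of
            # silent data loss the paragraph above warns about.
            events = []
        yield events
        time.sleep(interval_s)
```

Multiply this by dozens of APIs, each with its own auth scheme, pagination rules, and failure modes, and the operational burden adds up quickly.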
Keeping all these collectors running is operational toil. Most collectors leave scaling to the administrator: hand-configured individual nodes, each handling a slice of the workload. Each has to be documented and operationalized. Each needs monitoring. Each comes with its own failure modes, and when it fails, someone has to diagnose and resolve the issue. If one node fails, you lose data.
LogStream 2.3 eliminates this toil. Now, as part of the same platform that has consolidated receiving from all your deployed agents, you can collect data from anywhere, on any interval. Run your custom scripts and collect from REST APIs. The system handles sharding and scaling transparently to the administrator. You can reuse the same infrastructure you have for receiving, or create different worker groups with different IAM/security roles to make authentication and authorization easier and more secure.
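LogStream’s actual scheduler isn’t shown here, but the general idea behind transparent sharding can be sketched in a few lines: deterministically assign each collection task to exactly one worker, so changing the worker count rebalances the load without any per-node hand-configuration. The function and task names below are illustrative, not Cribl’s implementation:

```python
import hashlib

def assigned_tasks(tasks, worker_id, worker_count):
    """Return the subset of collection tasks this worker should run.

    Hashing each task name gives a stable, roughly even spread across
    workers: every task lands on exactly one worker, and no node needs
    its own hand-maintained slice of the workload.
    """
    def shard(task_name):
        digest = hashlib.sha256(task_name.encode("utf-8")).hexdigest()
        return int(digest, 16) % worker_count
    return [t for t in tasks if shard(t) == worker_id]
```

Because the assignment is a pure function of the task name and the worker count, every worker can compute its own share independently, with no central bookkeeping to keep in sync.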
Scheduled Data Collection is a great example of how Cribl’s innovation is driven by our customer-first culture. An Observability Pipeline demands stream processing, as well as batch and mini-batch processing, in the same engine. It blurs lines typically drawn in data processing. Cribl meets customers where they are and solves problems others ignore.
There’s so much more coming. What’s your biggest pain point that we’re not solving? Our product is inspired by our customers, and we want nothing more than to solve problems for you. Hit me up in our community Slack; I’d love to hear from you!