October 15, 2020
Today I’m pleased to announce Cribl has closed its Series B round of funding, raising $35m from Sequoia Capital. CRV also participated, bringing our total capital raised to date to $46m. The investment will be used to bolster go-to-market initiatives, drive data routing and pipeline innovation, and support an aggressive hiring strategy. As part of this investment, Pat Grady, partner at Sequoia, will join our board alongside Ledion Bitincka, Max Gazor from CRV, and me.
First and foremost, this investment is a validation of the value we’re delivering to customers like Autodesk, TransUnion, Blue Voyant, and others. In 2019, we were pleased to partner with CRV and Max Gazor for our Seed round and again in February of this year for our Series A. Max has been a phenomenal partner to build our business with, and now we’re beyond excited to welcome Pat Grady to the team. Pat’s experience from Snowflake, Zoom, Okta, and more is already proving invaluable, and we’re looking forward to building a legendary company together.
I’ve never written publicly about how Cribl the company and LogStream the product came into being. I’ve certainly told the story many times, especially to key hires who have been curious about how we got to where we are today. But, with this fundraise, and our move into offering LogStream as a service, it seemed like a good time to pause and reflect a bit.
No origin story of Cribl can start anywhere but at Splunk. I became a Splunk customer in 2009. We were able to put in simple logs and turn them into dashboards, without any databases, schemas, or planning upfront. It changed how we did everything in Application Operations. What we would now call observability became an essential capability of operating a complex modern software stack. Of course, it made it easy for my team to troubleshoot and gave developers and operators an easy mechanism for searching and interrogating our logs. More importantly, Splunk allowed me to easily extend our visibility up into not just application performance, but understanding business transactions.
By being a reference customer, I got to know the Splunk team and I was fortunate enough to go to work there in 2012. In 2013, Ledion and I teamed up to build a product called Hunk, which paired Splunk’s user interface with analysis of raw logs at rest in Hadoop. We learned a number of key lessons from that experience that we carried with us in our thinking about Cribl LogStream. First, Hunk’s adoption was severely limited by the adoption of Hadoop, which was far slower than we anticipated due to the complexity of Hadoop. We see the same mistakes today with vendors trying to ship open-source-heavy microservices architectures on-prem: they require professional services resources to spend weeks or months to get a deployment operational with even a basic use case implemented. More importantly, Hunk proved at multi-petabyte scale that customers very much wanted a cheap way to store large volumes of machine data, even if it came at a cost to performance.
I met my other co-founder Dritan at Splunk as well. Dritan was a founding member of Splunk’s Professional Services organization. Dritan was a legend in Splunk circles for his intimate knowledge of Splunk’s product and his no-bullshit attitude of holding people accountable and making sure they did their homework. I tried to hire Dritan on more than one occasion to come work for me in product management, but I never could convince him to leave the field and come into the product. When I started to seriously consider leaving to start a company, there was no question it would be with Ledion and D.
As Ledion, Dritan and I worked with dozens of enterprises through 2017 and into early 2018, it became clear that logging was still an incredible pain point. Once we got on the outside of Splunk, a clearer view of the market began to emerge. Customers valued their existing tooling. Some had hundreds of trained users who were logging in regularly to troubleshoot in Splunk or Kibana, and many had dozens of distinct departmental workspaces that had rich role-based access to many internal dashboards. Customers had deep investments in their existing tools.
But every customer we spoke to was capacity constrained. Most were spending millions of dollars a year already, and they had pent-up demand from other departments for millions more. Yet, they knew the data they were onboarding wasn’t all of equal value. Some of the data just needed to be kept in cheap storage, and other data would be better aggregated or sampled and stored in a more cost-effective destination like a time-series database.
Logging is a crowded market. Yet, even with dozens of vendors, customers were not finding relief from their struggles with cost and capacity. Before Cribl, we found every vendor in the space approaching the problem from a storage-first perspective. The way they provide value to their customers is by putting all data in their data store, some with better economics. Customers looking to solve cost and capacity constraints were forced to add an additional full solution. Customers simply weren’t enthusiastic about having multiple systems that would require multiple agents on the endpoints and create almost certain confusion about where to find data.
And thus the idea for Cribl LogStream was born. We envisioned a product that would serve as a pipeline for machine data, routing data to its optimal destination while allowing reshaping of the data in flight for security, enrichment, or cost control. This product would be targeted at customers of systems like Splunk and Elasticsearch, giving them a rich user experience for working with gritty, ugly log data and making it simple to build regular expression based field extractions or easy enrichment via lookups. Most critically, it would support the protocols of their deployed agents like the Splunk Forwarder or industry protocols like Syslog while simply being a bump in the wire when delivered to their existing destinations.
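To make the idea concrete, here is a minimal sketch in Python of reshaping an event in flight and then picking its destination. This is an illustration only, not LogStream's implementation or configuration format: the log format, the `HOST_OWNERS` lookup table, the masking pattern, and the destination names `analytics` and `archive` are all assumptions made up for the example.

```python
import re

# Hypothetical lookup table used for enrichment (host -> owning team).
HOST_OWNERS = {"web-01": "storefront", "db-01": "payments"}

# Assumed log format for this sketch: "<host> <LEVEL> <message>".
LINE_RE = re.compile(r"^(?P<host>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")

# Assumed masking rule: hide all but the last 4 digits of a 16-digit number.
CARD_RE = re.compile(r"\b\d{12}(\d{4})\b")


def reshape(raw):
    """Extract fields with a regex, enrich via lookup, and mask card-like numbers."""
    match = LINE_RE.match(raw)
    if match is None:
        # Unparseable lines pass through untouched rather than being lost.
        return {"_raw": raw}
    event = match.groupdict()
    event["team"] = HOST_OWNERS.get(event["host"], "unknown")
    event["message"] = CARD_RE.sub(r"************\1", event["message"])
    return event


def destination_for(event):
    """Send errors to the analytics tool and everything else to cheap storage."""
    return "analytics" if event.get("level") == "ERROR" else "archive"


event = reshape("db-01 ERROR card 4111111111111111 declined")
print(event["team"], destination_for(event), event["message"])
```

The point of the sketch is the ordering: the event is parsed, enriched, and masked before any destination sees it, so sensitive values never land in downstream storage.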
When we looked at how people were trying to solve the problem, we found limited success. Using vendor-provided tools like the Heavy Forwarder or Logstash to process data in flight required struggling with esoteric configuration languages and limited processing capabilities. Often, customers also looked to open source, where a few successfully stitched together Fluentd, Apache Kafka, Apache Flink, or perhaps Apache NiFi. Even with the in-house engineering skillset, organizations found themselves having to deploy a new agent to thousands of endpoints and build their own deployment, management, and monitoring to successfully run the system in production. Once in production, customers struggled to enhance their systems: developers who could build on Flink or NiFi were hard to come by, and the hardware required to run Kafka was as large as the data storage nodes they were trying to offload.
Cribl LogStream was built from the ground up, from first principles, to solve this problem for logs and metrics. With combined decades of deep experience in logs and metrics, we knew these problems demanded a fit-for-purpose engine, designed to be performant, writing to disk only when necessary, and using a fraction of the hardware of competing solutions. We designed our user experience for IT and Security professionals, using a simple rule-based paradigm commonly found in firewalls and access control lists. We built a package that is simple to download and run on a laptop but scales all the way into petabytes of daily ingestion.
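For readers who haven't worked with firewall rules, the paradigm is an ordered list evaluated top to bottom, where the first matching rule wins. A rough Python sketch of that idea follows. The `Rule` type, the rule names, the field names `level` and `sourcetype`, and the destinations are hypothetical and do not reflect LogStream's actual rule syntax.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over the event's fields
    destination: Optional[str]       # None means "drop the event"


# Order matters, just as in a firewall rule table: first match wins.
RULES = [
    Rule("drop-debug", lambda e: e.get("level") == "DEBUG", None),
    Rule("auth-to-siem", lambda e: e.get("sourcetype") == "auth", "siem"),
    Rule("default", lambda e: True, "archive"),  # catch-all at the bottom
]


def route(event, rules=RULES):
    """Return (rule name, destination) for the first rule matching the event."""
    for rule in rules:
        if rule.matches(event):
            return rule.name, rule.destination
    return "unmatched", None


print(route({"level": "DEBUG", "sourcetype": "auth"}))
print(route({"level": "INFO", "sourcetype": "auth"}))
print(route({"level": "INFO", "sourcetype": "web"}))
```

Because evaluation stops at the first match, a DEBUG auth event is dropped by the first rule and never reaches the SIEM rule, which is the same mental model an operator already applies when reading an access control list.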
LogStream from day one has been built to give customers choice over how to process their data and where best to store it. Cheap storage like AWS S3 or an NFS filer has always been our most popular second destination. Over the last few years since we released our 1.0 product, we’ve moved into continually supporting more protocols and systems and adding new functions and ways of processing data. With our 2.0 release, we added support for a single management console to manage thousands of nodes. With our 2.2 release, we’ve added support for collecting data at rest in cheap storage, making it easy to replay data to any destination we support.
And now, LogStream is available as a service! We’ve built LogStream since the beginning to be easy to deploy and manage, and now we’re automating all the infrastructure as well. LogStream Cloud is priced on a cloud-native consumption model. Whether running large batch jobs for a few hours at a time or steadily processing a few hundred gigabytes a day, LogStream Cloud charges just for what’s processed. LogStream Cloud is the easiest way to get data into log and metric tools.
As we tell the story of Cribl LogStream the product, it’s important also to tell the story of Cribl the company. Now, partnering with Sequoia along with our long time partners CRV, we’re well-positioned to grow as quickly as possible. Our customers like Autodesk, TransUnion, and Blue Voyant absolutely love our product, and we’re the core of their data ingestion, helping them control costs and connect all their systems. One of Cribl’s core values is “Customers First, Always.” We exist to help our customers maximize the value of all of their machine data. With our new investment, we’ll be helping to raise awareness and hire tons of great new customer-facing resources to help our customers get value from Cribl while continuing to quickly grow our phenomenal product organization to further accelerate our pace of innovation.
If you’re interested in trying LogStream, please try our online Sandbox or Download the product. If you’re interested in LogStream Cloud, please sign up for our beta. If you have any questions or feedback, please join our community Slack and drop us a line!