Getting Started With Lakehouse: Not Even White Lotus Can Match the Hospitality of Cribl’s Lakehouse

Last edited: May 13, 2025

Cribl recently introduced Lakehouse, a powerful new feature within Cribl Lake that enables fast queries on the freshest data. But it’s so much more than just speedy searches. Lakehouse redefines how organizations collect, store, manage, and analyze telemetry data at scale, ensuring a future-proofed, cost-efficient, and flexible approach to data management. By combining the performance of a high-speed query engine with the cost savings of object storage, Cribl’s new Lakehouse empowers teams to gain instant insights without vendor lock-in or unnecessary storage costs.

Purpose-built for the dynamic, unpredictable nature of telemetry data, Cribl’s Lakehouse is unlike traditional solutions, which are designed for structured enterprise data. Lakehouse eliminates schema management complexity and offers elastic scalability, automated tiered storage to optimize both cost and performance, and a unified, composable data management experience. With just a few clicks, you can run federated queries across diverse datasets and geographies, so more teams get easier access to the data they really need.

Lakehouse’s next-generation architecture unlocks the full potential of telemetry data at scale. Store massive volumes of ever-changing telemetry data while enabling real-time, high-performance dashboards and analytics.

Coming Out of Stealth Mode

Ok, we fibbed, but just a little. For most users, Lakehouse WAS introduced just last month. But a lucky dozen got their paws on it a few months earlier, while it was still in stealth mode (sounds more exciting than beta testing). These organizations had the chance to turn the knobs, see what they could break, and share where they saw value in Lakehouse, using it in their own environments with their own data.

We selected users from across the industry: SecOps and incident response teams, large tech companies and smaller IT teams, teams just getting started with a data lake strategy and established, mature companies looking for ways to cut SIEM costs. Then we held our breath and waited for the feedback. Some came after initial spin-up, some after a week of hands-on use, and more after weeks of use in production networks. What we heard was music to our ears:

  • "1 second results…so fast! Super fast! Amazing"

  • "Can I sign up now?"

  • "I've stopped using 'XYZ' for searching this dataset; now it's Cribl for what doesn't fit in my SIEM."

  • "The biggest thing I've noticed is there's more data and it's cost-effectively searchable. I often thought I would miss data points when we needed to sample due to costs."

Ok, truth be told, it wasn’t always rainbows and butterflies:

  • “I could only put 10TB a day in my Lakehouse.” This led us to enable multiple Lakehouses.

  • “Lakehouse slowed down a little when I oversubscribed my tier by 4x.” We made it possible to upsize subscriptions with one click.

  • “Now you’re encouraging me to generate more reports, urgh.” Sorry, not sorry!

Taking Advantage of What We Learned From Early Adopters to Deliver Real Benefits

There’s an old saying: the early bird gets the worm, but the second mouse gets the cheese 🐭🧀. Ready to be the second mouse and feast on delicious cheese? Head over to your Cribl.Cloud account and just go for it. We promise it’s just a few easy steps:

  1. Ingest Data: Data flows into a Cribl Lake dataset.

  2. Set up Acceleration: Associate a dataset with Lakehouse. ALL data will continue to be stored in Lake simultaneously.

  3. Short-term Retention: Data stays in Lakehouse for up to 30 days.

  4. Cribl Search: Queries automatically target the fastest option, whether Lakehouse or Cribl Lake.

  5. Transparent to Users: Search runs without extra steps, always choosing the best source based on query time range.

  6. Rehydrate as Needed: Full-fidelity data is always available via Replay.
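To make steps 3 through 5 concrete, here is a minimal Python sketch of the routing idea: a query whose time range fits inside the Lakehouse retention window goes to Lakehouse, and anything older falls back to Cribl Lake. This is an illustration only, not Cribl Search's actual logic. The function name `choose_source`, the `accelerated` flag, and the `"lakehouse"`/`"lake"` labels are hypothetical, and the 30-day window is simply the figure from step 3.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 30-day Lakehouse retention window mentioned in step 3.
LAKEHOUSE_RETENTION = timedelta(days=30)


def choose_source(earliest: datetime, now: datetime, accelerated: bool = True) -> str:
    """Pick a hypothetical source label for a query.

    earliest    -- start of the query's time range
    now         -- the moment the query runs
    accelerated -- whether the dataset is associated with a Lakehouse

    Illustrative only: not Cribl Search's real routing implementation.
    """
    # Data younger than the retention window is assumed to still be in Lakehouse.
    if accelerated and now - earliest <= LAKEHOUSE_RETENTION:
        return "lakehouse"
    # Everything else is served from the full copy in Cribl Lake.
    return "lake"


now = datetime(2025, 5, 13, tzinfo=timezone.utc)
print(choose_source(now - timedelta(days=7), now))   # lakehouse
print(choose_source(now - timedelta(days=90), now))  # lake
```

The point of the sketch is that the user never picks a source. The query's time range alone decides it, which is what "transparent to users" means in practice.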

The Bottom Line

Cribl breaks the traditional storage-analysis lock-in with open formats, flexible performance tiers, and seamless integrations. Lakehouse future-proofs your infrastructure, so you can work smarter with your data, keep costs predictable, and keep your teams ahead, all without data engineering expertise, complex schema management, or vendor lock-in.

Ready to Take a Dip in the Lakehouse?
