
Square Pegs, Round Holes: The Challenge of Integrating MELT Data into Traditional Data Warehouses

May 6, 2024
Written by
Nick Heudecker

Nick Heudecker leads market strategy and competitive intelligence at Cribl. Prior to joining Cribl, he spent over seven years as an industry analyst at Gartner, covering the data and analytics market. With over twenty years of experience, he has led engineering and product teams across multiple successful startups in the media and advertising industries.

Categories: Learn

This is the first in a series of blog posts about the disconnect between modern IT and security teams and the vendors they’re forced to work with. If you’re looking for the second and third posts, you can find them here and here.

Imagine this scenario: You’re grappling with the ever-escalating costs of your legacy solutions. What’s the logical next step? For many, it’s exploring the new wave of emerging tools, such as data warehouses. While these new tools initially sound promising, they’re often built on traditional data management infrastructure. As you may have guessed, sticking a new user experience on a legacy data platform presents its own challenges for IT and security teams.

Data warehouses are great for handling massive amounts of transactional data, but they hit a snag when confronted with the diverse array of metrics, events, logs, and traces (MELT) that IT and security operations teams contend with daily. Here’s the rub: data warehouses need structure. They thrive on predefined formats for storing and processing data. But operational telemetry data? It’s anything but predictable. It’s fluid, it’s diverse, and integrating it into the rigid framework of a warehouse is akin to fitting a square peg into a round hole.
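
To make the mismatch concrete, here’s a minimal sketch, with made-up field names and events, of what happens when differently shaped telemetry meets one fixed schema:

```python
# Illustrative only: three made-up telemetry records, each shaped differently.
metric = {"ts": "2024-05-06T12:00:00Z", "host": "web-01", "name": "cpu.util", "value": 0.87}
log_event = {"ts": "2024-05-06T12:00:01Z", "host": "web-01",
             "raw": "sshd[4211]: Failed password for root from 203.0.113.7"}
trace_span = {"trace_id": "a1b2c3", "span_id": "d4e5", "service": "checkout",
              "duration_ms": 412, "tags": {"http.status": 503, "retry": True}}

# A warehouse table, by contrast, expects every row to share one fixed set of columns.
warehouse_columns = {"ts", "host", "name", "value"}

for event in (metric, log_event, trace_span):
    missing = warehouse_columns - event.keys()
    extra = event.keys() - warehouse_columns
    print(f"missing={sorted(missing)} extra={sorted(extra)}")
# Only the metric fits cleanly; the log and the span need schema changes or lossy flattening.
```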

The need for predefined structure in data warehouses means someone has to shape that data and load it into the warehouse. This might not be a problem when you’re locked into a vendor’s agent shoveling data in from a single source. But what if you want to onboard multiple data sources? When I speak with Cribl customers and prospects, it’s not uncommon for them to have dozens of different sources they want to onboard. Getting all of that data in is manual, time-consuming work, and the effort multiplies with every additional source.

Let’s talk about that timing in a different way. Data warehouses aren’t real-time platforms. They fundamentally rely on batch processing to get data in, making them unsuitable for streaming data sources and real-time operational demands. While you may want to act on data now, your new tool backed by a legacy data warehouse may not support your SLAs, whether that’s MTTR for DevOps and IT Operations teams or time to detect if you’re in the SOC.
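
For a rough sense of what batch loading costs you, here’s a back-of-the-envelope sketch; the interval and load times below are assumptions, not measurements of any particular product:

```python
# Rough, illustrative math only; both numbers are assumptions, not benchmarks.
batch_interval_min = 15   # how often a batch load job runs
load_and_index_min = 5    # time for the batch to land and become queryable

worst_case_delay = batch_interval_min + load_and_index_min
print(f"An event can sit up to {worst_case_delay} minutes before a query can even see it.")
# Any MTTR or time-to-detect SLA starts with that delay already on the clock.
```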

Additionally, centralizing all of your data in a single place poses a formidable challenge. Data warehouses thrive on consolidating data into a single, uniform location (and format, as mentioned above). However, operational telemetry data sprawls across an organization’s infrastructure, spanning endpoints, data centers, and the cloud. If you have tens of thousands of servers, can you afford to move – let alone store – all of that data in a single location? For many teams, it’s simply not viable, financially or practically.
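
To put some illustrative numbers on it (every figure below is an assumption, not a benchmark):

```python
# Back-of-the-envelope volume math; all inputs are assumptions for illustration.
servers = 20_000
gb_per_server_per_day = 2          # logs + metrics + traces per server
retention_days = 90

daily_tb = servers * gb_per_server_per_day / 1_000
stored_pb = daily_tb * retention_days / 1_000
print(f"~{daily_tb:.0f} TB/day to move, ~{stored_pb:.1f} PB retained at {retention_days} days")
# Before a single query runs, that's the volume you'd be shipping to, and paying to store in,
# one central location.
```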

Finally, the skill set required to navigate data warehouses differs significantly from that of IT and security operations teams. While SQL reigns supreme in the data realm, it might as well be a foreign language to many on the operations side. Instead, we rely on intuitive, search-based interfaces to delve into data and extract actionable insights.
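
To see the gap, here’s the same question expressed both ways; both queries use generic, illustrative syntax rather than any specific product’s:

```python
# The same question, asked two ways. Both snippets are generic, illustrative syntax.
question = "Which hosts had the most failed SSH logins in the last 15 minutes?"

warehouse_sql = """
SELECT host, COUNT(*) AS failures
FROM auth_events
WHERE event_type = 'ssh_login_failure'
  AND event_time >= NOW() - INTERVAL '15 minutes'
GROUP BY host
ORDER BY failures DESC;
"""

ops_search = 'event_type="ssh_login_failure" earliest=-15m | stats count by host | sort -count'

# One assumes you modeled auth_events up front; the other assumes you can search whatever landed.
```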

Where does this leave us? While data warehouses excel in the realm of business intelligence and analytics, they fall short in meeting the intricate demands of IT and security operations. And the new observability and security products riding on the coattails of these technologies? Despite their shiny veneer, they often fail to grasp the complexities of our data landscape and the nuances of our operational needs.

What’s the remedy? Well, we’ll have more to say about that in the near future. For now, let’s acknowledge that bridging the gap between old and new presents a formidable challenge—one that demands thoughtful consideration and innovative solutions.

If you want to see those innovative solutions in action, check out the next blog in this series and join us at CriblCon on June 10th in Las Vegas. We’re announcing new products and features, and it’s a chance to connect with the brightest minds in cybersecurity and observability. I hope to see you there!


 

Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl’s suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs.

We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.
