We’ve all been there: staring at an outrageous bill from an observability or security vendor, watching our data volumes balloon, and knowing, deep down, that most of what we’re paying to store and index is junk. It’s the digital equivalent of hoarding, justified by the terrifying specter of the one time five years ago someone got yelled at for not having that particular piece of data. And so, the mantra becomes, "We save everything, just in case."
That is not a business plan. It's a costly, unsustainable reflex.
While not nearly so stark, this was similar to the environment Simon Overby found himself in at Getty Images back in 2022. They were ingesting several terabytes a day into their primary logging system, and their data growth was projected to climb a whopping 30% year-over-year. At that rate, the team was looking down the barrel of millions of dollars in additional expenditures. More than just the astronomical cost, the environment was painful. Filtering was obtuse, configurations were scattered, and there was no way to preview data before ingestion, turning every new source into guesswork.
I recently sat down with Simon Overby and Lovepreet Singh, the engineering manager and systems engineer (respectively) at Getty Images, to talk about their experiences implementing Cribl. You can watch that conversation here:
After getting a rundown of the pre-Cribl environment (described above), I asked to jump straight to the end: the net benefits. If the "before" was a terrifying tidal wave of cost and complexity, what did the "after" look like?
The short version? They gained control.
By leveraging Cribl to treat their telemetry data in flight—cleaning it, filtering it, restructuring it, and routing it—Getty Images was able to cut their ingestion volume nearly in half. Crucially, they’ve managed to keep that ingest volume flat over the last few years, even as the business continues to grow.
This isn't just a technical win; it's a massive financial one. The net savings were substantial, with the largest savings coming from reducing the indexer cluster size alone.
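To make the shape of that win concrete, here is a minimal back-of-the-envelope sketch. The 30% growth rate comes from the projection above; the 4 TB/day starting volume is a made-up stand-in for "several terabytes a day," and the "controlled" line simplifies "nearly in half, then flat" to exactly half with zero growth. None of these are Getty Images' actual figures.

```python
def project_daily_volume(start_tb_per_day: float, annual_growth: float, years: int) -> list[float]:
    """Compound a daily ingest volume forward; index 0 is today, index N is year N."""
    return [start_tb_per_day * (1 + annual_growth) ** year for year in range(years + 1)]


# Hypothetical starting point, NOT a figure from the interview.
START_TB_PER_DAY = 4.0

# "Save everything": volume keeps compounding at the projected 30% a year.
unchecked = project_daily_volume(START_TB_PER_DAY, 0.30, 3)

# "In control": volume cut roughly in half, then held flat (0% growth).
controlled = project_daily_volume(START_TB_PER_DAY / 2, 0.0, 3)

for year, (before, after) in enumerate(zip(unchecked, controlled)):
    print(f"year {year}: unchecked {before:.2f} TB/day, controlled {after:.2f} TB/day")
```

Under these assumptions the unchecked volume more than doubles in three years, while the controlled line never moves. That widening gap is where the savings come from.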
But as Simon noted, the real victory goes beyond the license wrangling and cost savings. The true win is the "visibility and flexibility" that came from treating their data pipeline as an opportunity, not just a necessary evil.
Cribl became something Simon called the "unlock tool." It’s the platform that lets them truly own their data and put it where it needs to be, when it needs to be there, and in the format it needs to be in. This paradigm shift allowed them to stop treating telemetry data like a black hole and to make it usable for teams like security, often without ever hitting the main SIEM.
That's the kind of architectural agility that shifts an operations team from being a cost center to being a true internal solutions provider.
The real challenge: Culture and scaffolding
It’s easy to focus on the flashy technology, but the most insightful part of the conversation was about the process: the non-technical scaffolding necessary to make a project like this succeed. As Simon said, "The technical problems are usually easy. It's the buy-in, getting stakeholders to buy in, especially when you're introducing a new technology or a new tool, that’s the hard part."
He frames the role of the enterprise monitoring team in a powerful way: they want to be "boring and uninteresting." Monitoring should just work. As with sanitation engineering, you only notice the team when the trash isn’t being picked up and everything starts to stink. To achieve that boring, reliable functionality, you have to be deliberate and disciplined.
Part of that discipline included implementing a strict internal process: the "observability ingest form." Every request to bring in new data, whether from application teams, network, or security, has to be vetted through this form. It’s designed to collect, up front, the “twenty questions”-style details an observability engineer would otherwise have to glean from hurried and repetitive Slack exchanges. Questions like:
What data are you bringing in?
How much volume is expected?
What is your required retention period?
What security policies are around this data?
Does it need restricted access or visibility?
This step forces internal customers to quantify and qualify their data up front. It shifts the burden of proof from the monitoring team to the data owner, making sure they’ve thought through why they need the data and what its value is before it starts costing the company money.
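If you want to picture what such a form captures, here is a minimal sketch of it as a data structure with a basic completeness check. The class, field names, and validation rules are all hypothetical; the interview did not describe how Getty Images actually built its form. The fields simply mirror the questions above, plus a requesting team and a justification for the data's value.

```python
from dataclasses import dataclass


@dataclass
class IngestRequest:
    """One submission of a hypothetical observability ingest form."""
    requesting_team: str          # assumed field: who owns the data
    data_description: str         # What data are you bringing in?
    expected_gb_per_day: float    # How much volume is expected?
    retention_days: int           # What is your required retention period?
    security_policy: str          # What security policies are around this data?
    restricted_access: bool       # Does it need restricted access or visibility?
    business_justification: str = ""  # assumed field: why the data is worth paying for


def review_problems(request: IngestRequest) -> list[str]:
    """Return what the requester still has to answer; an empty list means ready for review."""
    problems = []
    if not request.data_description.strip():
        problems.append("describe the data being onboarded")
    if request.expected_gb_per_day <= 0:
        problems.append("estimate a daily volume greater than zero")
    if request.retention_days <= 0:
        problems.append("state a retention period in days")
    if not request.security_policy.strip():
        problems.append("name the security policy that applies")
    if not request.business_justification.strip():
        problems.append("explain why the data is worth its cost")
    return problems


# Example: a request that skipped the security and justification questions.
request = IngestRequest(
    requesting_team="payments",
    data_description="API gateway access logs",
    expected_gb_per_day=25.0,
    retention_days=90,
    security_policy="",
    restricted_access=False,
)
print(review_problems(request))
```

The design choice worth noticing is that the check returns a list of open questions instead of a simple pass or fail. The request goes back to the data owner with exactly what is missing, which keeps the burden of proof on their side.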
Adopting a "crawl, walk, run" approach
When introducing a new tool, the key is evangelism that focuses on pain, not features. I found Simon’s advice to be spot on: “Don’t talk about the tooling. Ask ‘How can we help you? What are you trying to solve for?’” Partnering with teams on their pain points moves the conversation away from the natural resistance to vendor-specific pitches and toward an immediate, concrete solution.
To demonstrate how this control translates into real-world wins, the team shared several compelling use cases where flexibility was the difference between success and a total meltdown.
The atomic habit of engineering discipline
The conversation wasn't just about big-picture strategy; it was also full of technical lessons and best practices that any monitoring team can and should adopt. These are the nitty-gritty details that make the "boring" system reliable.
1. Naming conventions and metadata: Be deliberate about naming conventions for sources, routes, and pipelines. Additionally, Lovepreet and Simon emphasized adding metadata fields to the data payload itself. These fields track elements like the "originating source," which is critical for troubleshooting and tracking data provenance later on.
2. Partitioning and isolation: This piece of advice is super-specific to the Cribl platform: Partition worker groups (in Cribl Stream) by team or use case. This isolates compute and configuration changes, ensuring that a misconfiguration or a burst of activity from one team doesn’t impact the stability or performance of another.
3. Dev null and load testing: Also specific to Cribl Stream: monitor your devnull pipeline to catch errors and ensure you aren’t accidentally dropping valuable logs. The team also uses one of Cribl’s lesser-known capabilities, datagens, to generate anonymous sample data (JSON, Apache, Syslog) for pre-production load testing and proofs of concept. This means you don't have to risk sensitive production data to test a new architecture.
4. The power of knowledge and AI: Lovepreet reinforced a common but still important technical reminder: RTFM (Read The FRIENDLY Documentation). He recounted how reading the pack documentation for Palo Alto firewall logs helped them immediately resolve complex issues like time-stamping, which otherwise would have taken hours of fruitless troubleshooting.
5. No blog is complete without a mention of AI: Beyond the foundational docs, the team is already leveraging AI with tools like Cribl Copilot. This tool helps quickly build complex data processing pipelines, generates sample data, and enables natural language searches that write the KQL queries for you. As Lovepreet put it, when you are learning a new tool, AI integrated into your solution becomes a necessary aid that helps you "find a needle in the haystack."
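Two of the habits above, provenance metadata (tip 1) and anonymous sample data (tip 3), are easy to picture in a few lines of code. What follows is a minimal, product-agnostic Python sketch. It is not Cribl's datagen feature or any Cribl API, and the field names (`origin_source`, `origin_pipeline`) and the source and pipeline labels are invented for illustration.

```python
import json
import random
from datetime import datetime, timezone


def make_sample_event(rng: random.Random) -> dict:
    """Build one synthetic, non-sensitive JSON-style access-log event."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": rng.choice([200, 200, 200, 404, 500]),  # weighted toward success
        "path": rng.choice(["/", "/search", "/download"]),
        "bytes": rng.randint(200, 5000),
    }


def tag_provenance(event: dict, source: str, pipeline: str) -> dict:
    """Return a copy of the event with hypothetical provenance fields added."""
    tagged = dict(event)  # copy so the original event is left untouched
    tagged["origin_source"] = source
    tagged["origin_pipeline"] = pipeline
    return tagged


rng = random.Random(42)  # seeded so the random choices are repeatable
events = [
    tag_provenance(make_sample_event(rng), "datagen:web-access", "web_access_cleanup")
    for _ in range(3)
]

for event in events:
    print(json.dumps(event))
```

Because every event carries its origin in the payload itself, whoever troubleshoots it downstream can tell where it came from and which pipeline touched it without reverse-engineering the routing.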
The lightning round: Simon and Lovepreet’s final take-aways
If you only remember one thing from the Getty Images journey, let it be this: The technical problems are solvable; people problems are hard.
Simon's closing advice should be a mission statement for every operations, observability, or platform engineering team: Don't get stuck defending your shiny new tool. Instead, go where the pain is.
Is security struggling with compliance retention? Is a dev team spending too much time debugging without the right logs? Is finance screaming about license overages? Address those specific, visceral needs, and focus on solving the problem, not implementing a tool.
That’s the kind of value that transcends technology. That’s the real win.







