
The Value of Agnostic Design

October 21, 2021
Categories: Engineering

I’ve spent most of my time in the observability space in APM, Application Security, and DevSecOps. Much of my job was talking to customers about “the future,” and those customers ranged from practitioners and architects to junior leaders and senior VPs.

And for all the times that OTEL (OpenTelemetry) came up – and it comes up very often in observability – none of them could ever tell me about technical features of OTEL that mattered. One thing stood out from my years in the space: Companies don’t want OpenTelemetry.

Well, they do, just not for the reasons they think they do. I heard about how:

  • OpenTelemetry will make it easier to instrument applications.
  • OpenTelemetry will remove vendor lock-in and allow easier transitioning between tools.
  • Open source is better.
  • OpenTelemetry will solve organizational silos, making it easier for development, infrastructure, and application teams to work with each other.

I can’t even count how many times I’ve been asked, “What’s your OTEL strategy?” Collectively, this has led me to the realization that organizations don’t actually want OTEL. It just happens to be the closest thing to vendor unlock that currently exists in the observability space.

To defend my opinion that we need agnostic tools, we should first take a highly abbreviated jog down memory lane. Historically, ops tools have been very vendor-specific. You lived within whichever ecosystem you bought into. In the early days, it was all about event management. Tools like IBM Netcool evolved as the de facto collection engines. They allowed various sources to come together, and you could route incidents, or rather events, to various teams for triaging. The remnants of this architecture still exist today, and I could spend way too much time kvetching about a certain big automotive company that wouldn’t let me send performance events directly to SNOW. (We had to send them to Netcool first.)

One giant leap forward later, we started to get into threshold-based alerting and, eventually, into insanely deep performance metrics. I think of SolarWinds as the first highly successful monitoring tool. Its infrastructure monitoring is still in place in many organizations today, and it has a nearly 100% penetration of the infrastructure for enterprises that use it. Unfortunately for SolarWinds, it was only really effective at infrastructure and network performance monitoring, which left the door open for APM solutions like Dynatrace and AppDynamics. While these provide much higher fidelity, they are far more difficult to deploy and ludicrously expensive. So much so that they really only see adoption in the “Tier 1” applications, or roughly 10% organizationally. Even so, companies have constantly pushed the APM titans for more visibility: database and infrastructure to replace SolarWinds; business correlation; and all of this correlated and collated. I believe that push comes from the lack of integration between tools and, ultimately, a need for that legendary “single pane of glass” we’ve all heard so much about.

That brings us to today, give or take, and maybe tomorrow. Observability is now a key part of every organization’s strategy. Everyone has heard of, if not actively used, tools like Datadog and Splunk. The amount of generated data is constantly growing, while companies are simultaneously trying to reduce their costs and simplify their tool landscape. For example, skunkworks teams are building their next generation of architecture on the cloud, with pressure from executives not to get locked into another multi-year, multi-million-dollar contract. Then, with almost perfect timing, OpenTelemetry pops into the picture, and suddenly everyone has an OTEL strategy.

OpenTelemetry

OpenTelemetry is now 5 years in the making, yet still not up to feature parity with existing enterprise tools in use today.

As an Apache 2.0–licensed solution, the core implementation is squarely in the “free elephant” category of tools. While it potentially allows you to unlock from Dynatrace dashboards and switch to New Relic’s, that ability comes at a high cost. There is no enterprise support and no metrics visibility, and you’ll need a software engineering group willing to implement tracer code (or more of the same difficult profiler API instrumentation that makes APM a headache today). OTEL also continues the two-tier model of getter <-> analyzer, agent & controller, collector & indexer. That is to say, you can send the OTEL data from the tracer to only one analysis system. How free are you really to switch to another tool if you have to have a hard cutover?

What do enterprises actually need, then? As much as we like to believe that committees are productive, you often need someone with an opinion. Open source is great, and vendor lock-in is a real problem, but the success of Kubernetes stands as an outlier among many open-source enterprise projects. While it is compelling, my experience watching “admins” manage K8s deployments has told me a proper enterprise support model is a must – look at the market for OpenShift and EKS. The technology in question must be easy to deploy and not rely on too many teams working in coordination to work successfully. Finally, it must follow a 3-tier model of collect, route, and analyze.

This is where Cribl’s AppScope and Stream come into the picture. At Cribl, we build around protocols first and then around specific vendors as needed. By focusing on standards, we allow companies to bring any source into the equation and send that data to any analysis system or multiple destinations at a time. Thus, we are agnostic about the origin of the data and where the data is going. And, of course, we have proper enterprise support to back up our offerings.
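To make the collect, route, and analyze idea concrete, here is a minimal sketch in Python. It is not Cribl Stream's API or configuration. Every name in it (`Route`, `Router`, `collect`, and the destination labels `apm_tool_a`, `apm_tool_b`, and `archive`) is hypothetical and exists only for illustration. The point is the routing tier in the middle: one event can fan out to several analysis systems at once, so trying a second tool does not require a hard cutover from the first.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An event is just a dictionary here; real pipelines carry richer structures.
Event = Dict[str, object]


@dataclass
class Route:
    """Hypothetical routing rule: a filter plus one or more destinations."""
    name: str
    matches: Callable[[Event], bool]
    destinations: List[str]


@dataclass
class Router:
    """Middle tier: decides where each collected event goes."""
    routes: List[Route]
    # Stand-in for real analysis systems: destination name -> received events.
    sinks: Dict[str, List[Event]] = field(default_factory=dict)

    def send(self, event: Event) -> List[str]:
        """Deliver the event to every destination of every matching route."""
        delivered: List[str] = []
        for route in self.routes:
            if not route.matches(event):
                continue
            for dest in route.destinations:
                if dest not in delivered:  # avoid duplicate delivery
                    self.sinks.setdefault(dest, []).append(event)
                    delivered.append(dest)
        return delivered


def collect(source: str, payload: Event) -> Event:
    """Collection tier: tag the payload with the source it came from."""
    return {"source": source, **payload}


router = Router(routes=[
    # Traces go to two (hypothetical) APM tools side by side.
    Route("all-traces", lambda e: e.get("type") == "trace",
          ["apm_tool_a", "apm_tool_b"]),
    # Everything, traces included, is also kept in an archive.
    Route("everything", lambda e: True, ["archive"]),
])

trace_dests = router.send(collect("otel", {"type": "trace", "name": "GET /cart"}))
log_dests = router.send(collect("syslog", {"type": "log", "message": "disk full"}))

print(trace_dests)  # ['apm_tool_a', 'apm_tool_b', 'archive']
print(log_dests)    # ['archive']
```

In this sketch the OTEL trace is treated exactly like the syslog event: it is one more source feeding the router, and the destinations neither know nor care where the data originated.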

As we continue to develop AppScope, we’ll further the goal of collecting data from anything, anywhere, at the appropriate fidelity, and then send it to any systems of analysis you might be using or exploring. We’re solving the actual problems in the observability space today. We’ll help you vendor-unlock your tools strategy by allowing tools to coexist. We’ll bridge gaps by bringing multiple sources together. And we’ll make it easier to collect data from all the things.

All of that said, I want to be clear: Of course, we support OTEL. In fact, we support all things Metrics, Events, Logs, & Traces (MELT). I’m not saying OTEL isn’t great. I think it will be, but it’s not a magic bullet, and in many ways, it’s a step backward from where current enterprise tools are today.

The industry doesn’t need another OSS tool; it needs an opinionated view, with an agnostic perspective on how observability data is collected, moved, and analyzed. Organizations will benefit greatly from more flexibility in monitoring tools, so long as they don’t lose the forest for the trees and end up in a landscape of rolling their own OSS. So if you ask me what our OTEL strategy is, I’ll tell you it’s just another source to collect from.

AppScope is the easiest way to instrument everything, giving you better visibility into any application. Learn more or get started with AppScope free today. If you have questions, join our Slack community to connect with other AppScope users leveraging this new approach to black-box instrumentation.

The fastest way to get started with Cribl Stream is to sign up at Cribl.Cloud. You can process up to 1 TB of throughput per day at no cost. Sign up and start using Stream within a few minutes.
