With its market size reaching more than $2 billion in 2020, you’d think that a universal definition of the term observability would have emerged by now. But it turns out that a clear definition of a term or industry isn’t necessarily a prerequisite for the rapid growth of its market size — just ask everyone at your next dinner party to define blockchain for you and see how many different answers you get! In this blog post, let’s take a look at the history of observability and where the concept came from.
A recent survey found that 64% of organizations are looking to deploy observability capabilities, but 48% aren’t even sure how to explain it. Most companies sense that they need something more than they currently have, but they’re just not sure how to define it.
If you did a quick Google search, you’d find a million different definitions of observability. One common characterization is that it allows you to understand the behavior of applications and infrastructure from the traffic they produce. Gartner describes observability as the characteristics of software and systems that allow them to be “seen” and allow questions to be asked about their behavior.
Here at Cribl, we believe that observability allows you to ask and answer questions of complex systems. This includes questions that have not been planned for; you can evaluate and gain insights from a system even though it wasn’t necessarily instrumented to provide that information.
Maybe that definition is enough. But where did the concept come from in the first place?
There’s some debate, but most people trace the term back to Rudolf Kálmán and his work on system theory and control theory in 1960. In that work, a system is observable if you can infer its internal state just by watching its outputs: monitor what comes out, and you can figure out what’s happening inside and why it behaves the way it does. Just like hospital patients are hooked up to all kinds of equipment to monitor their condition and gain insights into their health, the IT and security world is now doing the same for its systems.
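For the mathematically inclined, Kálmán’s idea can be made concrete: a linear system is observable when its internal state can be reconstructed purely from its measured outputs, which is checked with his well-known rank condition. Here’s a minimal sketch in Python — the system matrices below are made up purely for illustration, not taken from any real application:

```python
import numpy as np

# Hypothetical two-state linear system: x' = A @ x, with measured output y = C @ x.
# Only the first state is measured directly.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])

# Kalman's observability matrix stacks C, C@A, C@A^2, ..., C@A^(n-1).
n = A.shape[0]
O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

# Full rank means every internal state can be inferred from the outputs alone,
# which is the same promise modern observability tooling makes about IT systems.
print("observable" if np.linalg.matrix_rank(O) == n else "not observable")
```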
While this whole concept was born in 1960, I believe observability as we know it was actually defined 20 years later, following the first major crash of the internet’s predecessor, ARPANET, back on Oct. 27, 1980. During that outage, experts caught a glimpse of the future, and it wasn’t pretty. Imagine if it happened today: Hours-long interruptions in social media scrolling, shopping on Amazon, or watching cat videos… Could we even survive as a species? For an interesting, nerdy read, check out the summary and analysis of the incident documented in RFC 789. It discusses the vulnerabilities of network control protocols and emphasizes the importance of internal monitoring systems.
RFC 789 made it clear: You need more data to monitor system operations. As the 1980s went on, things like SGMP and Syslog data entered the conversation. SGMP soon evolved into SNMP, a formal, standardized way to capture management data, and the ISO’s network management framework added the Fault, Configuration, Accounting, Performance, and Security (FCAPS) model to define exactly which kinds of data should be collected.
Some of us with less hair left than others might remember when we started to get all these new network operating systems like Novell NetWare and Banyan VINES. Both have since gone by the wayside, but vendors kept joining the party. Splunk did an amazing job of capturing data, and the more it captured, the more people wanted. In security, SIEM and UEBA tools emerged, and AWS services quickly followed as we needed more storage capacity than was available, or affordable, locally.
All of the improvements made over the last 40 years are pretty great, but the problem is that many of today’s applications are ephemeral, so their data might only be around for days, hours, or even minutes, and then *poof* it’s gone. Capturing it has to happen in real time, through a carefully planned observability architecture.
The value of observability is also climbing because the quantity of enterprise and machine data generated by applications, systems, routers, and bridges is growing at an incredible rate of 23% year over year, and a few related trends make that data matter more than ever to organizations. First, there are a lot more systems generating data; even the smallest system out there is now a smart system. Things like your thermostat and your doorbell are now WiFi-enabled devices that generate data. Second, we’ve got more teams that want to see that data: if something is generating data, you might be able to do something with it, so it’s a good idea to capture it. And finally, so much of this data needs to be retained for compliance or ‘just in case’ reasons.
Yet because of this tremendous amount of data, logging systems are almost at capacity. If data volumes are going to keep increasing, you have to decide what you’re going to do with all that data and where you’re going to store it. Questions abound: Can you afford to capture it? How will you analyze it? Part 2 of this series will cover why observability isn’t something you can buy, but something you have to build.
We’ll answer these and more in our upcoming posts in this Observability History series, and don’t miss our on-demand webinar: Observability: Everything You’ve Heard is WRONG (Almost).
Experience a full version of Cribl Stream and Cribl Edge in the cloud with pre-made sources and destinations.