January 25, 2021
In this episode, Abby is joined once again by Nick Heudecker to talk about their data & observability predictions for 2021. It’s all centered around security budgets and breaches, customer satisfaction, and revenue and infrastructure costs.
Abby: Hi, everybody, and welcome to The Stream Life, our Cribl podcast, discussing all things machine data and streaming. Today, I’m joined by Nick Heudecker, whom you may remember from Episode 3. Nick is a former VP analyst at Gartner who joined us to run market strategy here at Cribl. Nick, I am so glad you’re back on The Stream Life, and I can’t wait to hear these predictions about observability that you’ve been preparing.
Nick: Thanks for having me.
Abby: So let’s jump right in. As we see from many companies right now, it’s definitely the time of year for offering predictions on market trends. Why do you think folks have this fascination with predictions?
Nick: I think people are always interested in looking into the future, even if it’s just 12 months or 24 months out, and it’s really hard to do, right? How do you come up with a good prediction? And at the end of the year, it’s always fun to look back and see how right was I, how wrong was I, and odds are in your favor. The more predictions that you make, the better chance you have of being right. And fortunately, most people only remember the ones that you get right, they don’t remember the stack of them that you get wrong or I get wrong over the course of a year or a career.
Abby: I feel like I just got a little insight into what it’s like to be an analyst there.
Nick: It’s about volume. It’s about volume.
Abby: But observability as a term, it’s obviously a relatively new market trend itself. So I think it’s interesting that there are predictions associated with it that aren’t “observability is going to become a more widely used term in 2021.”
Nick: Well I think it’s definitely going to be a more widely used term. I think if there’s a shortcoming, it’s in how people are using and accessing this data and how they’re sharing it across not just IT ops and security ops, but DevOps more and more, and your broader infrastructure concerns across the organization. People really need to think about collecting a lot of data and then figuring out what to do with it once they have it, once they realize a problem, rather than figuring all that stuff out in advance.
Abby: That’s awesome. So what are the themes for Cribl’s 2021 observability predictions?
Nick: So there’s three. The first is around security budgets and how that impacts the ability to detect breaches, application complexity, and how that relates to customer satisfaction and potential revenue growth, as well as infrastructure costs and how you might be optimizing or not optimizing around that.
Abby: So I know I was just making fun of you a couple of minutes ago, but how do you go about developing these predictions so you can make many of them?
Nick: Yeah. So for Cribl, I spent some time talking with customers, looking at current events in the market, what new patterns of work we’re seeing in application development. And then you take that information, you look for gaps, right? You start to figure out what second and third order impacts might develop over the next 12, 14, maybe 18 months, and then write those up and share them here on the podcast and in other places.
Abby: I’m excited, so let’s get to this. What is your first prediction for observability in 2021, Nick?
Nick: Yeah. So the first one is that three quarters, 75%, of container based deployments are going to exceed their infrastructure budgets by about 200%, because they’re going to have what I’m calling container blindness. And what’s really causing that is businesses, and a lot of pandemic issues triggered this, businesses are re-architecting pretty heavily. They want to be more adaptable, they want to be more resilient. Their workforces are increasingly remote, and that’s not going to change. I’ll talk about that later. And so instead of crafting or continuing to build big monolith applications on legacy hardware, they’re really looking at modular apps built on flexible cloud-based infrastructure, and containers are a core part of that, right? They provide consistent portable runtime environments and the usage of containers has exploded, right?
Nick: According to CNCF, the production use of containers went from 23% in 2016 to 84% in 2019. And about a fifth of those survey respondents report deploying over 5,000 containers in production. The challenge for infrastructure and operations leaders, IT ops teams, what are all these things doing? Right? We’ve got over a hundred container management tools all over the place, all aggregating logs and data in their own way, and it’s impossible to get a holistic view over this entire environment. And a lot of these containers only exist for 24 to 48 hours, and then they’re gone. And it’s difficult, if not impossible, if you don’t already have that data collected in some place where you can get to it, to figure out what’s happening, to do a post-mortem. And then you’ve got cost management issues, right? Another survey conducted by Datadog, nearly half of the containers that are deployed use less than 30% of the requested CPU, and 45% of containers are using less than 30% of the requested memory.
Nick: Unfortunately, you’re not just paying for the parts you’re using, right? If you’re allocating X amount of memory, but you’re only using 30% of that, you’re paying for X. And so there’s a tremendous amount of waste that’s happening because I&O leaders cannot figure out what’s really going on in these container environments. So they’re spending a lot of money, right? They might be trying to use traditional performance monitoring systems for some better insight, but that can be incredibly expensive, it’s challenging to deploy those agents all over the place. You may be looking at price pressures that force many enterprises to only install APM on a small fraction of applications, making it really unworkable in today’s incredibly dynamic application environment. And so this lack of visibility, this lack of insight into your container based deployments, is what I’m calling container blindness. And all of those reasons are why I think you’re going to see a doubling of infrastructure budgets throughout the year.
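To put rough numbers on the over-provisioning Nick describes, here is a minimal Python sketch. The function name, the 8 GiB request, and the $5 per GiB-month price are all hypothetical values chosen for illustration; only the 30% utilization figure comes from the conversation.

```python
def wasted_spend(requested_units: float, utilization: float, price_per_unit: float) -> float:
    """Cost of the requested-but-unused share of a resource.

    You are billed for everything you request, so the waste is the
    unused fraction (1 - utilization) of the full requested amount.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return requested_units * (1.0 - utilization) * price_per_unit


# Hypothetical container: 8 GiB of memory requested, 30% actually used,
# at an assumed price of $5 per GiB-month.
requested_gib = 8
price = 5.0
billed = requested_gib * price
unused = wasted_spend(requested_gib, 0.30, price)
print(f"billed: ${billed:.2f}, paid for unused memory: ${unused:.2f}")
```

Under these assumed prices, most of the bill for that one container buys capacity that never gets used, and the effect multiplies across thousands of short-lived containers.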
Abby: It’s interesting that you say that, and I’m thinking back to the earlier podcast that you did with us a couple of weeks ago, and thinking through how you had said it’s impossible to know how your containers are performing without having an observability pipeline. And certainly here at Cribl, one of the things that we really focus on with our technology is helping with that cost management that you were just describing. And so obviously, this is one of the core focal points here at Cribl, and I’d really love to just spend a couple more minutes on this, because as I start to think about everything that you just said, in order to get the data to and from these containers, being able to monitor whether or not the performance or what that performance looks like especially with these dynamic destinations, and then controlling costs, do you think everybody has to have an observability pipeline? Or even more importantly, do they already have an observability pipeline and are just calling it something else? Or is there a step that’s often missed as folks are spinning up these container based environments where they should have an observability pipeline?
Nick: I think a lot of companies are using maybe something like Kafka for eventing, right? But not necessarily looking at 100% of the logs that their infrastructure is generating, and handling that in an intelligent, or I like to say an opinionated way. A lot of open source infrastructure, DIY, perhaps you go as far as to call them an observability pipeline. They don’t know the data that they’re dealing with, so they can’t do incredibly intelligent things with it unless you write that code yourself, which turns into its own legacy nightmare of maintenance going forward. So a lot of companies, I don’t think they have opinionated or purpose-built observability pipelines today. They might be trying to build these things or streaming data into a cloud-based object store and hoping for the best when it comes to querying and interrogating that information. But right now, I’d say it’s a fairly small level of adoption as we’ve seen so far.
Abby: It sounds like a topic that we should probably dive into further in a future episode.
Nick: Yeah, definitely.
Abby: All right. So let’s hear that second prediction now, please.
Nick: All right. So the second one is more around security and breaches, and the way that I’m phrasing this is log anxiety, right? Log anxiety is going to result in about 40%, is my estimate, of security related breaches going undetected as teams are basically disabling log collection to stay within budget. What’s driving this? Well the attack surface is so much bigger than it has been before. There was a survey from Stanford, 42% of U.S. workers are now working from home full-time because of the pandemic. And I don’t know what the numbers are internationally, but I imagine they’re similar. In the U.S., that 42% represents more than two thirds of all economic activity. And it’s been wildly successful. According to Gartner, 74% of CFOs plan to make this shift to remote work permanent. But this really increased the challenges for cybersecurity and security ops teams, right? As the corporate data center extends into the home, a lot more risks are possible and becoming more and more evident for workers as they become their own IT team.
Nick: There was a survey from the Center for Strategic and International Studies and McAfee. They estimate that losses from cyber crime in 2020 are projected to be just under a billion dollars. Foreign criminal enterprises are targeting these industries, whereas individual scams are targeting the workers. And so as people work from home, as I said, they really are becoming their own IT support, their own security support, and it’s difficult to scale that stuff. McKinsey recommends that companies, they expand their web facing threat intelligence, security information and event management programs to compensate for all of this work from home risk. And that’s great advice. But the reality is that the challenges are often more practical, right? You want to collect all this data from firewalls, log information from VPNs, multi-factor authentication. And that’s great and you should be collecting all of that data when it makes sense. But the problem is that it often busts your budget, right?
Nick: So if CFOs are looking at 10X increases in the size of their logging analytics budgets, they’re likely going to accept a higher risk just to stay within that budget. And that can be a tough trade-off to make, right? IBM released a survey on cyber security and data breaches. The average cost of a data breach for U.S. companies in 2020 was $8.64 million. It’s a huge amount of money. And CFOs are always focused on the bottom line and may think, “Well all right, that risk is worth it, right? There’s only a small chance I’m going to get hit with a data breach.” But that same survey from IBM reports that the average chance a company will experience a data breach is at 27, nearly 28%. So as you start to approach 30%, if you aren’t getting hit and your friend’s not getting hit at his own company, the third person is. So it’s only a matter of time that you’re going to experience a massive data breach, and you do need to be able to do a post-mortem on all of that data. Can you pay for it? That’s really the question.
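The trade-off Nick describes can be framed as a simple expected-loss calculation. The sketch below is illustrative only: it uses the $8.64 million average cost and the roughly 28% likelihood quoted above, and it assumes (as a simplification not stated in the conversation) that the two figures can be multiplied directly. The function name is hypothetical.

```python
def expected_breach_loss(probability: float, average_cost: float) -> float:
    """Expected loss = chance of a breach times the average cost of one."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * average_cost


# Figures quoted in the conversation: about 28% likelihood, $8.64M average cost.
loss = expected_breach_loss(0.28, 8_640_000)
print(f"expected breach loss: ${loss:,.0f}")
```

A CFO weighing a larger logging budget against "only a small chance" of a breach is implicitly comparing that budget to a number like this one.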
Abby: It’s interesting that you say all this. Even as I’m listening to you, I’m thinking back to some of the case studies that we’ve recently published. And I know BlueVoyant, who’s one of our leading cybersecurity services providers and one of our customers, were telling us in their case study that they’ve seen a thousand fold increase since March 2020 in the levels of domain impersonations and other attack types taking advantage of businesses. And most of it comes with the move to remote working. And I know we talked to another partner of ours who was saying that they saw about 400% increase in their firewall log volume over pre-pandemic levels, given all of the traffic from remote employees that now goes through the firewall in and out of the mostly unused corporate offices and resources, how many of those networks were designed assuming that folks were in one central location and then backhaul to wherever their resources reside. And now that everybody’s coming from everywhere, everything’s going in and out of that firewall at a time.
Abby: And so while I understand that that growth is unprecedented and unexpected given the events of 2020, it definitely contributes to the anxiety that you mentioned earlier. So wow. It’s interesting to hear all the-
Nick: It can definitely be a challenge. And you’ve got to start to think about where do I put the intelligence for this data? It often can’t be in the log analytics platform because you might want to use one, another team wants to use another, they still need the same data, but they need different facets of it, or they need a different view over it for the tools that they’re most comfortable with. So you have to push that logic earlier in the stream.
Abby: Absolutely. It’s really great to hear the data backing this up and also the opportunity for using a product like we have with LogStream to help take control of some of this data.
Abby: All right. So let’s get over to this third prediction of yours.
Nick: Yeah. This one’s a little longer, and it links in a way back to the first one around containers. And the prediction is enterprises implementing an end-to-end observability pipeline will lower infrastructure costs by 30% and resolve issues four times faster than their competitors, which results in better customer satisfaction and increased spend by about 15%. And the reason that this is longer is modern applications are really complex, right? We talked about monoliths versus microservices earlier. Today’s applications are hundreds or thousands of services. They’re built by independent teams. They may have their own databases that are run by those teams or a separate team entirely. And the software may be automatically deployed, right? You might have deployments that occur dozens or hundreds of times a day across all kinds of production environments, right? And so how do you test these things? Right? Containers are short-lived, you’ve got dynamic scaling going on, you’ve got all kinds of cloud regions.
Nick: Honestly, the only time companies really test their applications today is when they get deployed to the customer. So your customers, the people you’re trying to get money from or engage with, are now unpaid acceptance testers. And that’s incredibly frustrating, right? Where do they turn? They turn to Twitter. They turn to social media hoping for a resolution and definitely to complain. Traditionally, infrastructure and operations teams would deploy monitoring for visibility into these environments. That really hasn’t kept pace, right? We’ve talked about the exorbitant costs of storing all that stuff. They’re forced into decisions about what logs, metrics and traces to keep to stay within budgets. And they just can’t keep everything to observe their environment. Prebuilt dashboards and alerts, figuring out what questions you want to ask before you know what you want to ask is one approach. I equate it more like the data warehousing world of I have reports I want to generate every month and they don’t typically change. I see that as traditional monitoring, right?
Nick: But today, you need to collect a lot more and offer the opportunity for ad hoc query. You build your questions as you go down this path. Monitoring is also a point solution, right? Typically targeting a single application or a service. A failure in one service will cascade to others, and traditional monitoring is not capable of dealing with those complex relationships in application components and infrastructure components. So what we’re seeing is that successful IT ops and security ops teams are evolving past monitoring, this passive story into more of an observability, right? And this is a characteristic of software and systems that allows them to be seen and answer questions about their behavior, right? Which is unlike monitoring, which is primarily a static view. Implementing observability or a pipeline requires you to come up with a way to collect and integrate data from all kinds of different systems, and the pipeline decouples the sources of data from their destinations and it gives you a high degree of control over what goes where, what shape it might need to take, what data gets redacted, and so on.
Nick: And so by putting a lot of context around these logs and metrics, an observability pipeline makes debugging faster, because you can ask all kinds of what if questions about the environment rather than relying on the pre-calculated views that are prevalent in today’s monitoring solutions. Faster debugging and root cause analysis means fewer customers getting errors in production, which increases customer satisfaction, driving up sales. This also lets you rationalize your infrastructure costs, right? Often, the team that’s deploying infrastructure is not the team paying for it, right? And they’re going to over-provision, right? We saw this earlier in the Datadog survey. Collecting all that performance data, even for transient infrastructure like containers, gives IT ops teams visibility into how many resources they’re actually consuming and what optimizations are possible.
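The idea of a pipeline that decouples sources from destinations, and controls what goes where and what gets redacted, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Cribl's implementation or API: the `Pipeline` class, the `redact` function, the SSN-style pattern, and the in-memory lists standing in for destinations are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]

# Hypothetical redaction rule: mask anything shaped like a US Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(event: Event) -> Event:
    """Return a copy of the event with SSN-like strings masked in the message."""
    cleaned = dict(event)
    cleaned["message"] = SSN_PATTERN.sub("***-**-****", event.get("message", ""))
    return cleaned


class Pipeline:
    """Sits between sources and destinations so neither needs to know about the other."""

    def __init__(self) -> None:
        self.routes: List[Tuple[Callable[[Event], bool], List[Event]]] = []

    def add_route(self, predicate: Callable[[Event], bool], destination: List[Event]) -> None:
        """Send every event matching the predicate to the given destination."""
        self.routes.append((predicate, destination))

    def process(self, event: Event) -> None:
        """Redact once, then deliver to every destination whose predicate matches."""
        cleaned = redact(event)
        for predicate, destination in self.routes:
            if predicate(cleaned):
                destination.append(cleaned)


# In-memory lists stand in for real destinations (an analytics tool, an archive).
security_tool: List[Event] = []
archive: List[Event] = []

pipeline = Pipeline()
pipeline.add_route(lambda e: e.get("source") == "firewall", security_tool)
pipeline.add_route(lambda e: True, archive)  # keep a full copy of everything

pipeline.process({"source": "firewall", "message": "denied login for 123-45-6789"})
pipeline.process({"source": "app", "message": "checkout completed"})
```

Because routing and redaction live in the pipeline rather than in any one analytics platform, a second team can add its own route and get its own view of the same data without touching the sources.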
Abby: Wow. That was a longer one, but it’s so interesting to hear you speak about this because it is so parallel with many of the things that we’re doing here at Cribl. And I just get so excited when I hear all these statistics.
Nick: Well nothing happens in a vacuum anymore, right? No one team can say, “This is what’s happening in the world,” whether it’s developers or infrastructure, or someone in the middle. So it really does come down to how do I better scale and collaborate across these bigger groups? And that doesn’t happen in one platform.
Abby: Right. And one of the things that I love about that is that it just means that LogStream is so perfectly suited to help address some of these potential negative outcomes of your predictions. And they really help customers move into the next generation and make that move from monitoring to observability, and deliver the necessary control and visibility and cost reductions that they need in order to scale with this massive growth in data. I don’t know. I just love hearing all of this. Well I’m so grateful that you came on the podcast again today and were willing to talk through some of these. Is there anything else that you want to share or leave with the listeners before we sign off today?
Nick: Well hold me to these. Over the year, if you are experiencing these problems or you’re not experiencing these problems, you can find me on pretty much any social media platform under NHeudecker, or email me at Cribl, and I’m happy to be proven right or wrong, either way.
Abby: Mostly right though.
Nick: Hopefully right. Hopefully right.
Abby: Well thank you again so much, Nick, for taking the time to join us on The Stream Life. And thank you, everybody, for listening. I hope you have a great day.