
The Stream Life Episode 006: Detox Your Data Ops Workflow (and other New Year’s Resolutions)

Written by Abby Strong

February 9, 2021

In this episode of the Stream Life Podcast, we welcome Rachel Perkins and Steve Litras from Cribl to talk about how much the role of the logging and data wrangler has changed over the years, and how Cribl LogStream helps modern Data Ops/Engineering teams meet these challenges head-on: giving them control over the data they’re sending to their analytics tools, and opening up access to the information needed to drive the business.

If you want to get every episode of the Stream Life podcast automatically, you can subscribe on Apple Podcasts, Spotify, Pocket Casts, Overcast, Castro, RSS, or wherever you get your podcasts.

 


Podcast Transcript

Abby:

Hi, everybody. Welcome back to The Stream Life, our Cribl podcast discussing all things machine data and streaming. Today I’m excited to be joined by Rachel Perkins, aka piebob, or just Pie for short (one of my favorite things), and Steve Litras. Rachel runs our community and customer advocacy program here at Cribl. Steve leads our technical marketing team. Both have extensive experience in the observability space, and I’m super excited to turn this podcast over to the two of them as they talk through what a modern data engineering and operations workflow might look like.

Rachel:

Well, hello everyone. Thanks for inviting Steve and me to the show, Abby.

Abby:

Yeah, thanks for joining us.

Rachel:

Now, you may have noticed, those of you who are listening, that Steve and I both have titles that reflect our current roles in marketing, but that wasn’t always the case. Steve, you spent many years in data engineering and operations before coming to Cribl to spread the gospel of turbocharged data pipelines, and I wanted to ask you about that journey. What did your world look like back in the day? It sounded like things were pretty sweet for a while.

Steve:

Well, first of all, we didn’t really call it data engineering back then. I was just an IT guy looking for answers.

Rachel:

Everyone was the guy, yes.

Steve:

Exactly. At first, when I became a sysadmin, it was great, because when problems came up I’d go look through logs, use grep and whatnot, and I’d usually find my answers. Things got a little more complicated as we started dealing with more and more systems, and eventually tools like Splunk came along. Splunk was a great thing for me. I implemented it in 2006, when I was running a lab for an enterprise applications team supporting about 300 people, and I was getting just pelted with problems on a regular basis: hey, this isn’t working, that isn’t working. My systems were keeping everybody from being able to do their job.

Steve:

I brought in Splunk, fed all my logs into it, and just started coming in every morning and typing “error”. Simple stuff: just search for errors, then go find them and fix them before anybody else sees them. I went from being the person getting beat up all the time to the superstar almost immediately, and everything started working a lot better. Those were the good old days, and that developed into a kind of dashboarding, pulling metrics out of this stuff, being able to show the current state and continue to show the current state. But yeah, that’s how it was. It was great. It was detective work that was pretty simple to do.

Rachel:

You were basically using Splunk as a super-grep.

Steve:

Exactly.

Rachel:

I remember them talking about being a superhero and going home early, that kind of thing. That was the expectation that you had.

Steve:

I never got to go home early, they just found more work for me to do.

Rachel:

At that point, you became a victim of your own success to some degree, right?

Steve:

Yeah. Yeah. As you start getting reliant on those dashboards, all of a sudden the other people who are using them want more stuff in them, and it’s usually not the stuff that you’re already tracking. Then you start ingesting more data, you start pulling more logs, logs from different systems. All of a sudden things kind of bloat out: you’re dealing with license costs, you’re dealing with an infrastructure that keeps growing. And then, when people start seeing the value of this, you get people like your security team who look at it and go, “Hey, there’s a lot of really valuable data in there. We need to add retention requirements. We need more of that data.” And pretty soon you’re pulling in logs that you’re really not analyzing. You’re just retaining them.

Steve:

And you’re paying a premium for that, because these systems are built for searching and analysis. They’re not really built for long-term storage of data. It becomes a real challenge. We had retention requirements of around 16 months at first; that’s changed over the years. But when you start looking at an entire data center’s worth of log data… and that was just at the start. The cloud came later, and all that data grew exponentially at that point. It was a huge amount of data.

Rachel:

And to begin with, most of what you needed to know you could glean from the infrastructure logs: you’re running out of disk space, you’re running out of memory, that kind of stuff. You didn’t really have a lot of visibility into other aspects of the business. That’s when the security team, and even the marketing team, began thinking, huh, there’s data in what I do as well. And that scales things up a great deal. What did you end up having to do at that point?

Steve:

Well, it just became kind of a constant: let’s add more license, let’s add more disk, let’s keep growing this. And I think when you do that ad hoc, you tend to forget about the problems you end up with as those things grow, if you don’t pay attention. We ended up with a lot of problems where we were either out of license or out of disk, and the very things we were trying to use the tool to watch for in the rest of the environment were showing up in the logging environment itself. We’d run out of disk space, et cetera. It became a pretty big challenge.

Steve:

We would start trimming down logs, really by just cutting certain sources off. We’d say, “Okay, what’s the chattiest source here?” And it might be Windows audit logs. Those are things you actually really want, but you’d turn them off. And moreover, applications became more complex. When I was just looking at the infrastructure, am I out of disk space, is the switch up, do I have a problem on the network side? That was one thing; it was a pretty limited domain. But when you start adding application logs into it, start looking at interactivity between applications, that data balloons rather quickly. And not only does it balloon, it also becomes much harder to interpret, because now you’re trying to piece things together from really different perspectives, and it’s hard to build that context.

Rachel:

Right, and as the kinds of data you’re collecting creep out into other organizations, you may not necessarily have the expertise to analyze what’s coming in, so you need to reach out to people from other organizations in your company, that kind of stuff. Did you end up having to grow relationships and deal with it that way? How did you deal with that kind of creep?

Steve:

Well, number one, I’d say that being successful in IT means growing relationships anyway, because if you are the faceless person just fixing things, you’re going to get beat up on. You’re not necessarily going to get really good interaction with your customers. But if you build those relationships, you’re going to actually uncover the things that they’re really looking for. Absolutely, I built those relationships, and I realized fairly early on that what people were asking me for was not necessarily what they wanted. It might’ve been the outcome of what they wanted, or it might also have been what they think is the symptom that needs to change. You really have to have these conversations with people to dig into: what are the questions you’re really asking here? Let’s get away from the data you think you need; let’s talk about the problem you’re trying to solve. That was always a big challenge for me. That was always a big challenge in IT in general, though.

Rachel:

Yeah. It’s about so much more than just the technology, the IT part. I completely agree. So then at that point, you need to serve these other parts of the business, you need to build relationships with them, and they’re not necessarily technically able to process data in a way that makes it easy to onboard into the one analysis platform that you’re going to use. What happened then?

Steve:

Yeah. That’s when you start building teams. You start building logging teams, and it’s hard to find people who have that ability. I come from an operations background, and just like it’s hard to find networking people who can really talk to the applications people, and vice versa, it’s the same thing when it comes to logging and the individual app folks, or the people who own the apps but maybe don’t have the technical savvy about how the app works. I think this is where data engineering really starts, because it takes a multidisciplinary approach: you actually understand what’s going on in the application, you understand the business processes that those applications support, and you work your way up so you understand the bigger picture.

Steve:

I think you become a much more valuable IT person when you do that, when you really understand not only, okay, here’s the technology I’m working with and here are the people I’m working with, but also here are the business problems they’re solving, and here’s how I’m trying to support them in solving those problems. I was one guy for a long time. I couldn’t meet everybody’s needs, and that became very frustrating, both for me and the people around me. We grew, and additional people came in and started working on this stuff, but for a while there it was kind of touch and go. But I’d also say that companies like the ones I came from have started moving into a more distributed development landscape, where they’re trying to empower development teams to do whatever they need to do to get a service out.

Steve:

You start talking about things like domain-driven design, where you’re breaking monoliths up into smaller pieces and setting contracts between those domains. All of a sudden you have a lot of people choosing tools, and now you’ve got to figure out, okay, how does group A get the data that group B has? If they’re in a different tool, do I go copy that data over? There’s a whole swath of problems around how, with disparate technologies, you get the same data that everybody needs to see across all of them. That was a big challenge for us. We did a lot of copying data around and re-ingesting data, and as a general rule it just didn’t work all that well.

Rachel:

Yeah. You ended up with more data because more complexity was introduced into the stack: apps are talking to each other, storage is moving stuff around, all kinds of different container-type data as well. And then you have different parts of the business realizing they need some of the same data, just like you were saying. At what point did you solve this problem? Obviously, you achieved some goal that allowed you to meet these people’s needs.

Steve:

I had to solve the problem 15 different times, and it was always one problem or another: either group A is not getting the data they need, or, hey, our license is too expensive, or our infrastructure is too expensive. I tried a lot of different things. I brought in different tools. I always said I was trying to give us different knobs to turn to solve the problem. But what that led to was three or four logging tools out there, each getting a portion of the same data, or copies of the same data. A lot of that data you want to enrich with context, but when you’re enriching in those individual logging systems, those individual target systems, you run the risk of having completely different context in each one.

Steve:

If I end up doing GeoIP enrichment on data in one system and I don’t realize I’m using a different version of the GeoIP database in another, all of a sudden I’ve got divergent data, and people can’t trust the data at that point. At that point, I was running an enterprise architecture team, and we developed this idea of building demarcation points for data.
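
To make that divergence concrete, here is a minimal sketch of our own (illustrative Python, not anything from the podcast or Cribl’s API). The two dictionaries stand in for two versions of a GeoIP database; the same event comes out with conflicting context unless enrichment happens once, upstream of every target system.

# Two snapshots of a hypothetical GeoIP-style lookup table. The IP block
# was reassigned between versions, so the same block resolves differently.
GEOIP_V1 = {"203.0.113.0/24": "US"}
GEOIP_V2 = {"203.0.113.0/24": "DE"}

def enrich(event, geoip_db):
    """Attach a country field based on the event's source IP block."""
    return {**event, "country": geoip_db.get(event["src_ip_block"], "unknown")}

event = {"src_ip_block": "203.0.113.0/24", "msg": "login failed"}

# Each downstream system enriching on its own, with whatever DB version it
# happens to have, produces divergent context for the very same event:
print(enrich(event, GEOIP_V1)["country"])  # US
print(enrich(event, GEOIP_V2)["country"])  # DE -- which one do you trust?

# Enriching once, in a single pipeline upstream of all the target systems,
# guarantees every consumer receives identical context:
enriched = enrich(event, GEOIP_V2)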

Rachel:

What does that mean?

Steve:

Well, it comes from the network world. If you’re in a data center and you order a network circuit, the carrier brings it into the building and drops it at their demarcation point, and your provider may have another demarcation point that they run it to. That is really just the place where their responsibility ends and mine begins. And I think in a lot of cases that wasn’t very clear until we started building this idea of saying, “We’re going to define these endpoints, and that’s where we’ll guarantee the data to. That’s our demarc.” If we’re enriching data, we’re doing it in one place, and when you consume that data at that point, it’s guaranteed to be correct.

Steve:

What you do with it afterward, I have no control over, and therefore it’s outside of my demarcation point. That’s where you start doing things like acceptable use policies and whatnot. We actually started this process on the BI side of the house, the business intelligence side, but it was something I was driving toward on the logging side as well. And that’s where I got introduced to Cribl, as a mechanism for doing exactly that: for having a single pipeline where I could send data in varying forms to all of these different systems, where I could do the enrichment in one place and make sure it’s consistent when it gets anywhere downstream.

Steve:

That was a huge eye-opener for me. On the BI side, we had done that with data virtualization tools; Denodo is a good example. But we really didn’t have anything like that on the log analytics side, and it was a total eye-opener when I saw Cribl and what it could do on that front.

Rachel:

And before that, in order to achieve those kinds of goals, you would have had to install a bunch of different agents, with a lot of overhead for managing them and keeping them all up to date.

Steve:

Yep. We had Beats agents out there, we had Splunk agents, we had other agents for other systems; the footprint of all of the agents on our infrastructure was getting kind of ridiculous. And then you add things like application performance management, which came along later. Get another agent.

Rachel:

Right, and all the people who have to do that work to track everything. Many Steves.

Steve:

Well, and the expertise that’s needed is kind of hard to fathom.

Rachel:

Yeah, expensive also, if you want to hire it.

Steve:

Absolutely.

Rachel:

Yeah, so then you were able to try out LogStream, and what happened at that point?

Steve:

I loved it so much I joined the company. No, we actually were a prospective customer when I decided to move on, and that company has since become a customer. It’s made a dramatic difference in their workflow, in how they’re ingesting data. They’re able to ingest data a lot more quickly, with far fewer problems. Some of the pre-LogStream tools out there are very hard to configure, and they don’t really give you an indication of whether they’re doing something right or not. You kind of have to put them in place and hope that the data gets to the other side. Then you look at it on the other side: not quite right. Okay, got to go back and fix it again. One thing that really impressed me when I started looking at LogStream is that I can model all that stuff within the interface, and then when I deploy it, I know I’m deploying what I intended to. That was a pretty significant change for them. But I did join the company right after that, and I’ve been really happy with that decision.

Rachel:

We’re pretty happy with it too. Yeah, it sounds like in this particular case, and in many cases, LogStream helps those teams, data operations and engineering teams now, not just IT guys, meet those challenges head-on by giving them the control they need over the data they send to their analytics tools, and that opens up this huge landscape of access to the information needed to drive the business.

Steve:

For sure. For sure. There’s another element to that. In my case, we had an IT team and a security team that each had their own agenda, and they also had different requirements for tools. Having those two groups able to unify what they’re doing, while they still use their own tools, still use different tools, can still try out new tools, all of a sudden it became a whole lot less painful. I was running an infrastructure team at the time, and I didn’t have control of my own logs, because they’d get handed off to security. Security said the logs had to be in one tool, but not everybody on my team had access to that tool. How can I have infrastructure operations managing a system where they can’t even really look at the logs? Just having a unified mechanism to route that data where you want, obscure the stuff that you need to obscure in an operational log, PII kind of stuff, but make it available to the people who need it to do their jobs. It was a huge game-changer for us.
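
As a rough illustration of the kind of masking Steve is describing, here is a sketch of our own in plain Python (not Cribl’s configuration or API; the field names and patterns are made up for the example) showing PII being redacted from an operational log event before a copy is routed to another team:

import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def mask_event(event, sensitive_fields=("user_email", "ssn")):
    """Return a copy of the event with sensitive fields redacted and any
    email addresses in the message text replaced with a placeholder."""
    masked = dict(event)
    for field in sensitive_fields:
        if field in masked:
            masked[field] = "<redacted>"
    masked["message"] = EMAIL_RE.sub("<email>", masked.get("message", ""))
    return masked

event = {
    "message": "password reset requested by jane.doe@example.com",
    "user_email": "jane.doe@example.com",
    "level": "info",
}

# Security's copy keeps the full event; the ops copy is masked but still
# carries everything operations needs to do its job.
security_copy = event
ops_copy = mask_event(event)
print(ops_copy["message"])     # password reset requested by <email>
print(ops_copy["user_email"])  # <redacted>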

Rachel:

Yeah. We are starting to call that data independence. Everyone in the organization can have the data they need to do the best work they can for their org, and at the same time you’re able to deliver that functionality to them without having to worry about the details too much.

Steve:

For sure. It’s better than the data co-dependence that we were living under before.

Rachel:

Yes.

Abby:

Wow. That makes it sound like data engineering is really the backbone of any organization. Man, I loved listening to that conversation. Thank you so much for having it on this podcast today. I was so caught up in it, I almost forgot to speak, not going to lie. Thank you, Rachel and Steve, for joining us, and thank you to everybody for listening in. If you want to find either of them, they’re both available in the Cribl community; you can join and ask Rachel and Steve questions about what they spoke about today. Until then, keep that machine data flowing, and remember that Cribl helps you take control of and shape all of your data. Thanks, everybody. Have a great day.

 
