How SpyCloud Architected Its Cribl Stream Deployment

Last edited: November 21, 2023

In this livestream, I talked to Ryan Saunders, Manager of Security Operations at SpyCloud, about how he used the Cribl Reference Architecture to build a scalable deployment. He explained how this approach let the deployment grow alongside SpyCloud’s evolving needs without requiring significant rework. The reference architecture also facilitated a repeatable data-onboarding process, reducing administrative time and allowing the team to focus on critical security and data analysis tasks.

SpyCloud is a cloud-native organization that generates enormous amounts of data — from hosted email and EDR, sales solutions, and the rest of their sprawling cloud architecture. Before implementing Cribl Stream, they had too many sources and too little time to figure out how to integrate all of them.

Saving Valuable Engineering Time

Traditional on-prem environments can have many sources, but those sources generally sit in a single area, which makes it possible to capture them with a single set of agents. Because of their sprawling cloud architecture, Ryan and his team didn’t have that luxury.

During our conversation, Ryan pointed out that engineers come to SpyCloud to work in security, not to become data butlers. They don’t necessarily know how to architect large data pipelines; they just pull the data in and go to work on it. To that end, the first problem they solved with Cribl Stream was streamlining the process of bringing sources into their detection analytics platform. Data now flows in natively from a source like AWS instead of via a technology add-on (TA) or other inefficient, incomplete method.

Flexibility in Scaling Security Architecture

SpyCloud can’t afford to have data held up in processing. Once data comes in, it needs to be processed immediately so their security detections fire in real time. Cribl’s Reference Architecture played an important role in onboarding their sources and getting things to operate seamlessly.

There are times when Ryan and his team get little to no advance notice of a new product or customer, so there may not be much time to add to their logging pipeline. Without Cribl Stream, planning and execution may take weeks or months. But the right tools and a properly designed architecture allow them to scale up in minutes, if not automatically.

Splitting Up Worker Groups

SpyCloud separates worker groups based on data volume and workflow, and as a way to mitigate risk. Instead of having one large worker group, they keep the internet-facing one, with its open ports, separate from the rest, so they’re able to fail small and manage their blast radius. It’s good practice to split up your worker groups not only by load, but also by connection type and according to your security needs.
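To make that splitting rule concrete, here is a minimal sketch of how sources might be sorted into worker groups by exposure first and volume second. It is plain Python for illustration, not Cribl configuration. The group names, the `Source` fields, the example sources, and the 500 GB/day threshold are all assumptions invented for the example, not details from the livestream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    internet_facing: bool  # listens on ports open to the internet
    daily_gb: float        # rough daily volume


# Hypothetical threshold: sources above it get a dedicated high-volume group.
HIGH_VOLUME_GB = 500.0


def pick_worker_group(source: Source) -> str:
    """Return a (hypothetical) worker group name for a source."""
    # Exposure is checked first so internet-facing listeners stay isolated,
    # which keeps the blast radius small if that group runs into trouble.
    if source.internet_facing:
        return "wg-internet-edge"
    if source.daily_gb >= HIGH_VOLUME_GB:
        return "wg-high-volume"
    return "wg-internal-default"


# Made-up sources, purely for illustration.
sources = [
    Source("webhook-receiver", internet_facing=True, daily_gb=20),
    Source("cloud-audit-logs", internet_facing=False, daily_gb=900),
    Source("edr-events", internet_facing=False, daily_gb=50),
]
assignments = {s.name: pick_worker_group(s) for s in sources}
```

Checking exposure before volume is a deliberate choice in this sketch: a busy internet-facing source still lands in the isolated group rather than alongside internal high-volume traffic.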

When I asked Ryan if he was concerned about the management overhead of having a bunch of worker groups, he compared the experience to his days as a Splunk admin. Setting up different indexer clusters was a nightmare because maintenance effort grew linearly with every cluster added. With worker groups, there’s one interface to manage everything. Ryan can copy settings by cloning a worker group, or add and remove pipelines from different worker groups, without leaving that interface.

He sums it up quite nicely:

“The biggest win for us with Cribl Stream is that we can upgrade everything from one single pane of glass. I don’t have to go out and plan a 12-hour overnight weekend upgrade of my indexers. I just click upgrade in that worker group, and it happens.” – Ryan Saunders, Manager of Security Operations at SpyCloud

Taking Advantage of Cribl Edge

Ryan and the team at SpyCloud also have Cribl Edge deployed as a log collection agent on all their servers. They have a dozen Edge fleets collecting data that’s sent back to Cribl Stream for processing. Managing fleets in Cribl Edge is just as easy as managing worker groups in Cribl Stream. They have the flexibility to control separate configurations for Windows, Linux, production tests, and other products within the same interface.

SpyCloud also uses Cribl Edge to consolidate logging agents within the organization because it’s easier for them to have one agent that multiple teams can control. Ryan’s team sends the data they need for security to their own tools, and their DevOps teams can extract the operations data they need as well. Everyone can control and manage their data however they see fit, so it’s a win for everybody.

Best Practices for a Scalable Cribl Stream Deployment

Ryan has many years of experience using Cribl’s tools within different organizations and environments, so he has learned some valuable lessons along the way. His first deployment involved trying to run Kubernetes in a large environment with one giant worker group, so he quickly learned the importance of splitting worker groups up.

You want to be able to do this easily, especially in highly regulated environments. Multinational organizations may not be able to commingle data or send it across national borders. Companies processing healthcare data have strict requirements for handling PII. Even if you don’t fall into either of these categories today, business growth or regulatory requirements might change that, so you’ll need to be able to adjust quickly to split certain data out.
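One way to picture this kind of split is a lookup that fails closed: data is only handed to a worker group explicitly approved for its region and data class, and anything unmapped is rejected rather than sent to a shared group. The sketch below is illustrative Python, not Cribl configuration, and the regions, data classes, and group names are hypothetical.

```python
# Hypothetical mapping from (region, data class) to a dedicated worker group.
APPROVED_GROUPS = {
    ("eu", "general"): "wg-eu",
    ("us", "general"): "wg-us",
    ("us", "healthcare"): "wg-us-healthcare",
}


def group_for(region: str, data_class: str) -> str:
    """Pick the worker group allowed to process this data."""
    try:
        return APPROVED_GROUPS[(region, data_class)]
    except KeyError:
        # Fail closed: never fall back to a group in another region or class.
        raise LookupError(
            f"no worker group approved for {data_class} data in {region}"
        ) from None
```

When a new requirement shows up, the change in this sketch is a single new entry in the mapping, which is the kind of quick adjustment the paragraph above describes.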

Auto-scaling has also proven beneficial for Ryan, and everyone can take advantage of it. Just don’t forget to set limits. You want to avoid scaling up until an AWS region explodes, so you don’t wake up one night and find 1,000 Kubernetes nodes running because something went sideways. Explaining that bill won’t be much fun the next day.
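The point about limits can be shown with a small sketch of proportional scaling plus a hard ceiling. The proportional rule here resembles the one documented for the Kubernetes Horizontal Pod Autoscaler, but this is a generic Python illustration, not Cribl, AWS, or Kubernetes code. The function name, parameters, and numbers are assumptions chosen for the example.

```python
import math


def desired_replicas(current: int, current_load: float, target_load: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Proportional scaling with a hard floor and ceiling."""
    if target_load <= 0:
        raise ValueError("target_load must be positive")
    # Proportional rule: scale the count by how far load is from the target.
    raw = math.ceil(current * (current_load / target_load))
    # The clamp is the important part: a runaway metric cannot push
    # the count past max_replicas.
    return max(min_replicas, min(raw, max_replicas))


normal = desired_replicas(10, 90.0, 60.0, min_replicas=2, max_replicas=50)
runaway = desired_replicas(40, 5000.0, 60.0, min_replicas=2, max_replicas=50)
```

With these made-up numbers, the normal case scales from 10 to 15, while the runaway case would ask for thousands of replicas and is capped at 50 instead.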

Watch the full livestream to see more on how SpyCloud uses Cribl Stream and Cribl Edge to streamline the onboarding process and get more visibility and insights from their business data. You’ll also learn how to use the Cribl Reference Architectures as a starting point for a scalable deployment so you can reduce administrative time and free up your team to focus on critical security and data analysis tasks.