The Critical Role of Data in Cybersecurity: Why Incomplete Data Weakens Your Overall Program

By

Last edited: April 5, 2023

Learn about the foundational issue many security programs face with incomplete or poor-quality data, and the strategies to solve them

In this live stream, CDW’s Brenden Morgenthaler and I discuss a foundational issue with many security programs — having the right data to detect issues and make fast decisions. Data drives every facet of security, so bad or incomplete data weakens your overall program. Watch the video or continue reading below to learn about these issues and the strategies we use to solve security’s data problem.

As the amount of data, tools, systems, and clouds continue to increase, the threat to enterprises’ security posture has risen as well. It simply doesn’t matter what kind of SIEM you have anymore — even if it’s as good as Splunk or its alternatives. If you don’t have the right data, you’ll run into problems.

The Problem with Dropping Data Sources Due to Budget Constraints

Budgets can no longer keep up with the amount of data that needs to be processed, so organizations are forced to get by without collecting and analyzing everything they should. As a result, security teams are forced to turn off data sources that could provide them valuable insights into credible threats.

One client that Brenden and the team at CDW worked with got a firsthand look at the effects this has during a pen test they performed. They tested some common detections and were surprised to find that their red team engineer was able to completely compromise the domain and gain full control — simply because they had turned off all audit events on Kerberos.

Situations like this are much too common and are just the tip of the iceberg —which is why it’s so critical to have visibility into all areas of your network. You also need someone who knows all the different attack vectors so they can help you set up your infrastructure to avoid them.

Poorly Formatted but Crucial Data Sources Eat Up Licensing Costs

Data sources like Powershell, Sysmon, and Windows DNS debug logs are generally more difficult to work with. In the past, you’d have to rely on the heavy forwarder on the Splunk side or a ton of manual fine-tuning of things on the source side to handle the flood of data coming in from all these different systems and formats.

This is where a tool like Cribl Stream can help — you can turn on a data source, send it to Stream, and then route to null by default. Then you can pull out specific streams and send them to your other tools as necessary. Other data won’t need to be processed but will need to be kept for regulatory compliance issues, so you can keep it offline in raw, unmodified form in a data lake or send it to an object storage like an S3 bucket for as long as you need. Then if you need to recall it to investigate a data breach, you can use the replay feature in Stream to ingest it back through to whatever source you want without having to use your license or processing power.

You can also use Cribl Stream to take advantage of EDR data. We see a lot of companies make enormous investments in EDR tools that also produce very accurate data, especially around assets — but then they don’t take that data and put it into their SIEM because it’s just too expensive. With Stream, you can take the majority of that EDR data and route it to a data lake, and then get value from the other 10-15% by routing it to your SIEM in the exact format you need it.

Data Volume Management Strategies to Get the Best Results for Security

To get the most value out of your data for security, you need to know what regulatory compliance you have to meet — what type of logs do you have to retain, and for how long? It also helps to have a good understanding of all the tools you have, what systems are in place, and what the limits are on your ingestion licenses.

From there, securing your perimeter is the best place to start. You want your authentication sources, MFA sources, and VPN set up first, and then you can start bringing in all your security tools. The Mitre Attack framework is incredibly helpful to figure out what vertical you’re in and see the common threat actors or attacks right you might encounter so you can decide which sources and services you’ll need visibility from.

Having had a long career in IT, I became used to constraints and compromise — which is why I was caught off guard when I first saw Cribl Stream back before I joined the company. Not having to make concessions on which data to pull in, where I could send it, what format it was in, or what my vendor would support was unexpected, to say the least. This choice and control is giving security teams the ability to have faster detections and even better responses to cyber threats.

Be sure to watch the full conversation between Brenden and I, and connect with us in our Cribl Slack community if you have any questions or want to continue the discussion!

Ed Bailey

Ed Bailey is a passionate engineering advocate with more than 20 years of experience in instrumenting a wide variety of applications, operating systems and hardware for operations and security observability. He has spent his career working to empower users with the ability to understand their technical environment and make the right data backed decisions quickly.

View all posts

Cribl, the AI Platform for Telemetry, empowers enterprises to manage and analyze telemetry for both humans and agents with no lock-in, no data loss, no compromises. Trusted by organizations worldwide, including half of the Fortune 100, Cribl gives customers the choice, control, and flexibility to build what’s next.

We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Previous articleNext article