Data Lakes and Beyond: Complementing the New AWS CloudTrail Lake Service With Cribl Stream

February 1, 2022
Categories: Engineering

AWS announced CloudTrail Lake on January 5th, 2022, as a fully managed solution for storing and querying CloudTrail logs. At first glance, it is straightforward to set up, can be enabled for all your organization’s accounts with a radio button, and keeps data for up to seven years by default! It’s a huge time saver and headache eliminator for many, as getting CloudTrail logs from all organization accounts to a SIEM can be tedious and time-consuming. But all this comes with a cost. The examples in the AWS pricing guide are all based on volumes generated over a single month. What does that translate to annually? If you process 1 TB/day of CloudTrail data, that’s $403,080 per year. If you process 5 TB/day, that’s $1,167,360 per year. In addition, querying the data incurs further charges, as is expected with today’s on-demand analytics platforms. These high costs might be justifiable if you’re storing the data for CloudTrail Lake’s upper limit of seven years. For companies in highly regulated industries, this makes a lot of sense.
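To put the two annual figures quoted above on a per-unit basis, the short Python sketch below divides each one by the yearly ingest volume. It uses only those two quoted numbers and assumes a steady daily volume and a 365-day year; it does not attempt to reproduce the AWS pricing tiers themselves.

```python
# Effective cost per TB implied by the annual figures quoted above.
# Assumptions (not from AWS): steady daily volume and a 365-day year.
DAYS_PER_YEAR = 365

# Daily volume in TB -> annual cost in USD, as quoted in this post.
quoted_annual_cost_usd = {
    1: 403_080,
    5: 1_167_360,
}


def effective_cost_per_tb(tb_per_day: float, annual_cost_usd: float) -> float:
    """Return the annual cost divided by the total TB ingested in a year."""
    return annual_cost_usd / (tb_per_day * DAYS_PER_YEAR)


for tb_per_day, annual in quoted_annual_cost_usd.items():
    per_tb = effective_cost_per_tb(tb_per_day, annual)
    print(f"{tb_per_day} TB/day: about ${per_tb:,.0f} per TB ingested")
```

On these numbers the effective rate works out to roughly $1,104 per TB at 1 TB/day and roughly $640 per TB at 5 TB/day, so the per-unit price falls with volume while the total bill still grows substantially.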

In its initial incarnation, as of this writing, CloudTrail Lake is a turnkey solution with no integrations. Sending CloudTrail logs to a SIEM still requires existing methods such as writing to Amazon S3 or publishing via SNS/SQS. With S3 as the data lake, solutions such as Amazon Athena can perform more robust analytics on the CloudTrail logs than the SQL querying initially available in CloudTrail Lake.

While CloudTrail Lake is “the easy button,” its guardrails mean you still need to process CloudTrail logs separately. With Cribl Stream as your Observability & Security pipeline platform, you can route CloudTrail logs to a SIEM and a UEBA solution, and adopt S3 as your data lake for all observability and security data, not just CloudTrail logs. Additionally, Stream helps drastically reduce storage and query costs by letting you choose what to keep, what to transform, and what to eliminate. Stream can ingest CloudTrail logs through Kinesis Data Firehose or an SQS-based S3 source. Stream QuickConnect allows you to route CloudTrail logs to multiple destinations easily. Here’s QuickConnect in action:


With Stream, CloudTrail logs can be intelligently managed. Placing all CloudTrail events in a SIEM can be daunting and wasteful, especially with large event sizes and enormous EPS (events per second). The average CloudTrail event is 1,485 bytes in our environment, and the max event size is a whopping 8,900 bytes! UEBA and other analytics solutions only care about select CloudTrail events. In the illustration below, the Stream pack for CloudTrail provides a starting point for pre-processing. Unwanted logs are filtered out with a drop function. Empty and unnecessary fields are removed through a parser function. Results are easily validated in the pane on the right, where the statistics show a 30% reduction in the size of the CloudTrail data.
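The drop-then-trim idea can be sketched outside of Stream in a few lines of Python. This is not the CloudTrail pack's actual logic: the sample records are made up for illustration, and the rule "drop read-only events" is an assumed filter criterion chosen only to show the pattern.

```python
import json

# Hypothetical sample records; real CloudTrail events carry many more fields.
events = [
    {"eventName": "DescribeInstances", "readOnly": True,
     "eventSource": "ec2.amazonaws.com", "errorCode": None,
     "requestParameters": {}},
    {"eventName": "DeleteBucket", "readOnly": False,
     "eventSource": "s3.amazonaws.com", "errorCode": None,
     "requestParameters": {"bucketName": "example-bucket"},
     "additionalEventData": {}},
]


def is_wanted(event: dict) -> bool:
    """Assumed rule: keep only events that change state (readOnly is false)."""
    return not event.get("readOnly", False)


def _is_empty(value) -> bool:
    return value is None or value == "" or value == {} or value == []


def strip_empty(value):
    """Recursively remove None, empty strings, and empty containers."""
    if isinstance(value, dict):
        cleaned = {k: strip_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if not _is_empty(v)}
    if isinstance(value, list):
        cleaned = [strip_empty(v) for v in value]
        return [v for v in cleaned if not _is_empty(v)]
    return value


def preprocess(records: list) -> list:
    """Drop unwanted events, then trim empty fields from the rest."""
    return [strip_empty(e) for e in records if is_wanted(e)]


def size_bytes(records: list) -> int:
    """Total size of the records when serialized as UTF-8 JSON."""
    return sum(len(json.dumps(e).encode("utf-8")) for e in records)


kept = preprocess(events)
before, after = size_bytes(events), size_bytes(kept)
print(f"{len(events)} -> {len(kept)} events, {before} -> {after} bytes")
```

The reduction on this toy input says nothing about real traffic; the 30% figure above comes from the pack running against actual CloudTrail data.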

When old CloudTrail logs need to be analyzed, Stream’s Replay function allows you to apply filters on time and metadata and forward the CloudTrail logs to the analytics platform of choice, regardless of how old they are.

Since Stream can write the data in JSON format, you could alternatively search the data in your observability lake using AWS Athena without having to ever replay the data.
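As a rough illustration of what searching in place might look like, the Python sketch below builds a SQL query for one day of a single event type. The table name `cloudtrail_logs` and the column names `eventtime`, `eventname`, and `sourceipaddress` are assumptions; the real schema depends on how the JSON written to S3 is mapped to a table.

```python
from datetime import date, timedelta


def build_cloudtrail_query(table: str, event_name: str, day: date,
                           limit: int = 100) -> str:
    """Build a SQL query for one day of a given CloudTrail event name.

    Table and column names are hypothetical; adjust them to your schema.
    """
    # Accept only simple identifiers for the table name.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    # Escape single quotes in the string literal by doubling them.
    safe_event = event_name.replace("'", "''")
    # ISO 8601 UTC timestamps sort lexicographically, so a string range works.
    start = f"{day.isoformat()}T00:00:00Z"
    end = f"{(day + timedelta(days=1)).isoformat()}T00:00:00Z"
    return (
        f"SELECT eventtime, eventname, sourceipaddress "
        f"FROM {table} "
        f"WHERE eventname = '{safe_event}' "
        f"AND eventtime >= '{start}' AND eventtime < '{end}' "
        f"ORDER BY eventtime LIMIT {int(limit)}"
    )


print(build_cloudtrail_query("cloudtrail_logs", "ConsoleLogin", date(2022, 1, 5)))
```

The resulting string could be pasted into the Athena console or submitted programmatically, for example through the boto3 Athena client's `start_query_execution` call.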

With Cribl Stream, you can optimize your long-term data storage and analysis requirements and use Amazon S3 as your complete data lake, while also making your SIEM leaner and faster by filtering out noise and reshaping large events. You can get started for free with Stream. Download it today or use Stream Cloud, which includes a hybrid option, so you can run a cloud-based control plane but keep the data plane on-premises.

The fastest way to get started with Cribl Stream is to sign up at Cribl.Cloud. You can process up to 1 TB of throughput per day at no cost. Sign up and start using Stream within a few minutes.
