Products
Product Portfolio

Cribl puts your IT and Security data at the center of your data management strategy and provides a one-stop shop for analyzing, collecting, processing, and routing it all at any scale. Try the Cribl suite of products and start building your data engine today!
Learn more ›

Evolving demands placed on IT and Security teams are driving a new architecture for how observability data is captured, curated, and queried. This new architecture provides flexibility and control while managing the costs of increasing data volumes.
Read white paper ›

Cribl Stream

Cribl Stream is a vendor-agnostic observability pipeline that gives you the flexibility to collect, reduce, enrich, normalize, and route data from any source to any destination within your existing data infrastructure.
Learn more ›

Vodafone Case Study

Vodafone Dials up Business Insights with Cribl Stream
Read Case Study ›

Cribl Edge

Cribl Edge provides an intelligent, highly scalable edge-based data collection system for logs, metrics, and application data.
Learn more ›

SpyCloud Edge Story

Listen to how SpyCloud uses Cribl Edge at scale.
Watch Video ›

Cribl Search

Cribl Search turns the traditional search process on its head, allowing users to search data in place without having to collect/store first.
Learn more ›

How Cribl Search Can Save You From Drowning in a Deluge of Data
Read Blog ›

Cribl Lake

Cribl Lake is a turnkey data lake solution that takes just minutes to get up and running — no data expertise needed. Leverage open formats, unified security with rich access controls, and central access to all IT and security data.
Learn more ›

Navigating the future of IT and Security Data management white paper
Read white paper ›

Cribl.Cloud

The Cribl.Cloud platform gets you up and running fast without the hassle of running infrastructure.
Learn more ›

Cribl.Cloud Solution Brief

The fastest and easiest way to realize the value of an observability ecosystem.
Read Solution Brief ›

Cribl Copilot

Cribl Copilot gets your deployments up and running in minutes, not weeks or months.
Learn more ›

Cribl Copilot

Your Trusted AI Advisor for Deploying, Configuring & Troubleshooting.
Read blog ›

AppScope

AppScope gives operators the visibility they need into application behavior, metrics and events with no configuration and no agent required.
Learn more ›

Sandbox

Launch an AppScope Sandbox today!
Launch Now ›
Solutions
Use Cases

Explore Cribl’s Solutions by Use Cases:

Supercharge Security Insights ›

Accelerate Cloud Migration ›

Avoid Vendor Lock-in ›

Agent Consolidation ›

Slash Storage Costs ›

Free Up Space for High-Value Data ›

Route From Any Source To Any Destination ›

Immediate Access to Archived Data ›

Replay Data from Low-Cost Storage ›

Reduce Log Volume & Pay Less for Infrastructure ›
Integration

Explore Cribl’s Solutions by Integrations:

Amazon ›

CrowdStrike ›

Elastic ›

Exabeam ›

Google ›

Microsoft ›

Splunk ›

Wiz ›

View All Integrations ›

Seamless Integrations for Your Observability Data
Learn More ›
Industries

Explore Cribl’s Solutions by Industry:

AIOps ›

Financial Services ›

Healthcare ›

Managed Security Services ›

Manufacturing and Logistics ›

Media and Entertainment ›

Public Sector ›

Retail ›
Resources
Resources

Resource Library ›

Documentation ›

Guides ›

AppScope Docs ›

Blog ›

Glossary ›

Podcasts ›

Telemetry 101

Understanding the Basics of Telemetry and Its Benefits
Learn More ›
Events & Webinars

Events ›

Webinars ›

CriblCon24
Watch On-Demand ›

July 31 | 10am PT / 1pm ET

Navigating the Data Current Report: Transforming IT & Security Operations in 2024
Register ›
Learning

Try the Sandboxes ›

Self Guided Trials ›

Cribl University ›

Cribl Community ›

Cribl Curious Forum ›

What is Observability? ›

Try Your Own Cribl Sandbox

Experience a full version of Cribl Stream and Cribl Edge in the cloud.
Launch Now ›
Tools & Pricing

Download Library ›

Past Releases ›

Pricing Plans ›

Stream ROI Calculator ›

Download Library

Download Cribl’s suite of products for free to get started.
Download ›
Customers
Customer Stories

Get inspired by how our customers are innovating IT, security and observability. They inspire us daily!
Read Customer Stories ›

Sally Beauty Holdings

Sally Beauty Swaps LogStash and Syslog-ng with Cribl.Cloud for a Resilient Security and Observability Pipeline
Read Case Study ›
Customer Experience

Support & Success ›

Professional Services ›

Service Delivery Partners ›

Documentation ›

AppScope Docs ›

Professional Services

Check out our new Professional Services offering.
Learn More ›
Learning

Try the Sandboxes ›

Self Guided Trials ›

Cribl University ›

Cribl Community ›

Cribl Curious Forum ›

Try Your Own Cribl Sandbox

Experience a full version of Cribl Stream and Cribl Edge in the cloud.
Launch Now ›
Company
About Cribl

Transform data management with Cribl, the Data Engine for IT and Security
Learn More ›

Cribl Corporate Overview

Cribl makes open observability a reality, giving you the freedom and flexibility to make choices instead of compromises.
Get the Guide ›

Cribl Newsroom

Stay up to date on all things Cribl and observability.
Visit the Newsroom ›

Press Releases

Read our most recent press releases.
Recent Press Releases ›

Leadership

Cribl’s leadership team has built and launched category-defining products for some of the most innovative companies in the technology sector, and is supported by the world’s most elite investors.
Meet our Leaders ›

Careers

Join the Cribl herd! The smartest, funniest, most passionate goats you’ll ever meet.
Learn More ›

Cribl Named to the Inc. 5000 List of Fastest Growing Private Companies
Learn More ›

Cribl for Startups

Whether you’re just getting started or scaling up, the Cribl for Startups program gives you the tools and resources your company needs to be successful at every stage.
Learn More ›

Contact Us

Want to learn more about Cribl from our sales experts? Send us your contact information and we’ll be in touch.
Talk to an Expert ›

Try Cribl Talk to an expert

Serverless data forwarding to Cribl for AWS Services

November 27, 2018

Categories: Engineering, Learn

Back To Blogs

Organizations with AWS footprint have many options to get data in to their log and event management platforms. So did we. Up until recently we were using a pull based solution supplied from one of our vendors. Data collection worked, until it didn’t and we were starting to run into problems:

We had to operate an instance to pull the data with. This added to our overall infrastructure and most importantly to our ops & maintenance cost. Ain’t nobody got time for yet another server in 2018!
Scaling out was starting to become an issue as the solution was not distributed or clustered. No clustering also means cursoring, resiliency and failover have to provided by external means or custom built. Some would say about scaling “That’s a good problem to have” but now that we have it, now what?
The solution offered very little data controlling capabilities. We needed volume control, data shaping, masking, filtering, enriching, sampling and routing capabilities to meet our business and technology requirements.
CloudWatch Logs was getting pretty expensive, at $0.5/GB ingested, for the latencies and the querying & controlling capabilities, or lack thereof rather, that it offers. (We’ll probably do a full blog post about our findings here)

Is there greener grass somewhere else?

We started searching for other options. We went on to explore and analyze alternative data collection solutions and a few patterns emerged:

They are typically either push or pull based. We’re trying to stay away from pull-based ones because of the problems laid out in 1) above.
Some are vendor supported and some open source. For the most part they’re both fine with us as long as they don’t have any of the issues above.
Some were simple or cost-effective some weren’t. Some required long running instances and we’re looking to minimize the cost of maintenance over time.
Nearly all of them provided very light capabilities with respect to controlling data.

F*ck it, we’ll do it live!

It was clear that most solutions were not going to check all the boxes for us – they were sub-optimal in a few fronts. They either lacked controls or were too complex or not cost-effective enough or some combination thereof. So, after we added support for HTTP inputs with our v1.1 release, we decided to implement our own solution based on AWS Lambda and Cribl to collect these sources from S3 driven S3 Event Notifications:

Amazon CloudFront Access Logs – used to track access to our public web assets.
AWS CloudTrail Logs – used to track AWS account/API activity.
Amazon VPC Flow Logs – captures network flow metadata in/out of our VPCs.
Elastic Load Balancing Access Logs – capture request information to our LBs

With AWS Lambda we get:

The simplicity that we’re looking for. No instances to operate and maintain.
The effectiveness of an event-driven solution. Runs only when needed.
And the auto scaling capabilities that we need. Supports spikes/bursts just fine.

Cribl LogStream provides for:

Data masking and volume control capabilities to meet our business needs.
- Sensitive data is obfuscated and high noise sources are trimmed down to acceptable levels.
Data shaping, enriching, and routing capabilities to meet our technology needs.
- Some of the events are custom-shaped, converted into metrics and some are routed to required destinations.

If you’d like to see how we did it, please read on.

For the impatient: GitHub repos with the Lambda function, or the Cribl S3 Collector Serverless App alternative, and Cribl pipeline configurations that we built and use to normalize, filter and sample our AWS events.

Step 1. Configure Cribl to receive events over HTTPS

We configured our deployment to accept events over HTTPS using the Cribl Event API.

While in Cribl, click Sources from the top menu and select HTTP on the left.
Enter an Input name and set Host to 0.0.0.0 (i.e. listen on all interfaces)
Supply a Port number that the firewalls allow inbound traffic.
Enter a Shared secret string. This will be the authentication token that Lambda will use via itsCRIBL_AUTH environment variable (see below).
Toggle TLS Settings and fill in the fields as required. Save!

Step 2. Configuring the Lambda function

How does it work: The Lambda function, available on GitHub, or as a Serverless App, will first get invoked by S3 Event Notifications. Next, it check the objects passed by S3 against patterns that match the following supported services logs: CloudTrail, VPC Flowlogs, CloudFront Access Log and ELB/ALB Access Logs. CloudTrail events are in JSON format and will be handled as is. For the rest, fields will be extracted given the information present in the file header, or via a list manually. Event will then be sent to Cribl. As you notice very light processing is done in Lambda. The goal is to get the events as fast and as efficiently as possible and express a more thorough processing logic in Cribl instead.

Option A: Deploying the function from scratch:

Get the Lambda function. It’s open source under an MIT license.
In your AWS Lambda console create a function via Author from scratch
Give it a name and select Node.js 8.10 as your runtime
Next, select an IAM Role with read permissions on S3 Buckets from which this function will receive events.
Copy the function code inline into the Editor.
Allocate 256MB of RAM and a Timeout of 10s.

Option B: Deploying the function as a serverless app:

In your AWS Lambda console create a function via AWS Serverless App Repo
- Or, click here to get started.
Search for cribl and select Cribl-S3-Log-Collector. Proceed as instructed.
Select an IAM Role with read permissions on S3 Buckets from which this function will receive events.

In both cases, there are two environment variables that you’ll need to supply:

CRIBL_URL: This is the receiving endpoint where the function will deliver events and it should look like this: https://<host>:<port>/cribl/_bulk
CRIBL_AUTH: this defines the token string that Lambda will use to authenticate against Cribl receiving endpoint above.

Configuring S3 to trigger Lambda:

In your S3 Console, go the bucket(s) where your logs are stored.
- Note that triggers can also be configured in Designer section in Lambda, too.
Select the Properties tab and locate Events under Advanced settings
Click on Events, then click on + Add Notification.
Enter a name and select All objects create events under Events
Enter a prefix that indicates the location of the logs of interest in this bucket.
Select Lambda Function under Send to.
In dropdown find and select the Lambda function from above. Save!
Repeat this for each log type/prefix.

After configuring S3 Events the Lambda function screen looks like this. Note the S3 trigger box on the left hand-side.

3. Configuring Cribl Pipelines

Once you have Lambda sending data into Cribl, proceed to install and configure the pipelines.

Get them on our GitHub repo.
Go to Pipelines and click on + Add Pipeline. Click on Advanced Mode.
Select Import to import a pipeline or paste its JSON config. Change name as necessary.
Select a Destination from the top right. Save!

Before you associate pipelines with Routes, please check their functions and change them to meet your needs.

Let’s take a look at one of the pipelines: VPC FLOW LOGS

VPC flows are extremely voluminous but we all want to eat the cake and have it, too 🙂. We wanted to keep all the data but we wanted to be selective with what we ingested. We also want day to day querying to be as fast as possible. This pipeline allows us to keep all data in S3 and ingest ~60% less of the total events resulting in very significant storage savings and incredible search speed improvements via |tstats.

Function #1 (Eval) sets a sourcetype and index. This is used in our downstream system (Splunk) for searching, tagging/eventtyping and access control.
Function #2 (Sampling) samples ACCEPT events at a 5:1 ratio. These are the events that contribute to the majority of the volume.
Function #3 (Comment) comment function.
Function #4 (Eval) normalizes field names by mapping native field names, extracted in Lambda, to Splunk Common Information Model name.
Function #5 function (Eval) remove VPC Flowlogs native fields names and leaves event/_raw intact.
Using last two Eval functions:
- To send to Splunk as-is (as delivered by Lambda), disable both Function #4 and #5 Evals.
- To send to Splunk with index-time CIM fields, enable Function #4 Eval and disable Function #5. Index-time CIM fields allow for fast |tstats querying.
- To send to Splunk only raw events and extract search-time CIM fields, disable Function #4 Eval and enable theFunction #6 .
Note: you can add more functions to this pipeline. E.g., you can Filter/Drop out events from certain IPs (e.g., known chatty IPs), or Sample on protocol numbers, or port numbers for even greater storage savings and performance improvements.

Done!

Using Cribl with Lambda is an effective and scalable way to effortlessly collect and process AWS logs. Please check us out at Cribl.io and get started with your deployment.

If you’d like more details on installation or configuration, see our documentation or join us in Slack #cribl, tweet at us @cribl_io, or contact us via hello@cribl.io. We’d love to help you!

Enjoy it! — The Cribl Team

Get Cribl LogStream Now!

Blog

Preventing Friction With an Impactful Security Champions Program

Blog

From Necessity to Opportunity: The Customer Push for SIEM Options

Blog

Securing the Foundation of Cribl Copilot

Try Your Own Cribl Sandbox

Experience a full version of Cribl Stream and Cribl Edge in the cloud with pre-made sources and destinations.

Launch Now

Product Portfolio

Cribl Stream

Cribl Edge

Cribl Search

Cribl Lake

Cribl.Cloud

Cribl Copilot

AppScope

Use Cases

Integration

Industries

Resources

Events & Webinars

Learning

Tools & Pricing

Download Library

Customer Stories

Customer Experience

Learning

Try Your Own Cribl Sandbox

About Cribl

Cribl Newsroom

Leadership

Careers

Cribl for Startups

Contact Us

Serverless data forwarding to Cribl for AWS Services

Is there greener grass somewhere else?

F*ck it, we’ll do it live!

Step 1. Configure Cribl to receive events over HTTPS

Step 2. Configuring the Lambda function

3. Configuring Cribl Pipelines

Done!

Get Cribl LogStream Now!

Blog

Preventing Friction With an Impactful Security Champions Program

Blog

From Necessity to Opportunity: The Customer Push for SIEM Options

Blog

Securing the Foundation of Cribl Copilot

Try Your Own Cribl Sandbox

So you're rockin' Internet Explorer!