Cribl Sources: Amazon S3 Collector vs. S3 Input (SQS) - Which one is right for you?

Last edited: November 7, 2025

You’ve got logs landing in Amazon S3: CloudTrail, Palo Alto firewall logs, or CrowdStrike FDR, for example. The question that inevitably comes up is:

“Should we use an Amazon S3 Collector or an Amazon S3 Source (via SQS) to get them into Cribl Stream?”

It’s a great question, and like many things in the security and observability space, the answer depends on how you want to work with your data.

Let’s break it down.

What’s an Amazon S3 Collector Source?

Think of the Amazon S3 Collector Source as a batch retrieval engine. It’s perfect when you need to reach back in time and grab a chunk of data from S3 — say, “get me three hours of logs from noon yesterday for host==abcd.”

It doesn’t care about real-time ingestion or message queues. You tell it what timeframe and what filters you want, and it goes and gets them, either on demand or on a schedule.

Ideal when you:

Need to rehydrate historical logs (for example, for a retroactive investigation)
Want to replay old data for validation, benchmarking, or pipeline testing
Work with data that’s organized neatly by date or prefix (e.g., YYYY/MM/DD/hh/mm/...)
Don’t need continuous ingestion, just periodic snapshots
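
To make the date-organized point concrete, here is a minimal Python sketch of how a time range can be turned into a short list of key prefixes when objects are laid out by date and hour. It is an illustration only, not how Cribl Stream implements the Collector: the hourly_prefixes helper, the logs/ base prefix, and the YYYY/MM/DD/HH layout are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def hourly_prefixes(start, end, base="logs/"):
    """Yield one key prefix per hour that overlaps [start, end).

    Assumes a hypothetical layout of <base>YYYY/MM/DD/HH/.
    """
    # Snap to the top of the hour so a partial first hour is still covered.
    cursor = start.replace(minute=0, second=0, microsecond=0)
    while cursor < end:
        yield f"{base}{cursor:%Y/%m/%d/%H}/"
        cursor += timedelta(hours=1)


# "Three hours of logs from noon": only three prefixes need to be listed.
start = datetime(2025, 11, 6, 12, 0, tzinfo=timezone.utc)
prefixes = list(hourly_prefixes(start, start + timedelta(hours=3)))
for prefix in prefixes:
    print(prefix)
```

With a layout like this, a batch job only has to list the handful of prefixes that overlap the requested window instead of walking the whole bucket, which is why neatly partitioned data suits the Collector so well.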

What’s the Amazon S3 Source (with SQS)?

The Amazon S3 Source takes a more event-driven approach. It uses SQS notifications to track new objects as they’re written to S3 and then ingests those files automatically.

In other words: it’s a “pull as you go” model — continuous, near-real-time ingestion without having to define time ranges or run manual jobs.

Ideal when you:

Want to ingest new data continuously as it lands in S3
Are collecting logs from many sources that drop files at unpredictable intervals
Don’t want to waste time or compute scanning the bucket for already-processed data
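Need efficient ingestion with a low risk of duplicates, because each new object is announced by a notification rather than rediscovered by a scan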
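
For a sense of what the event-driven model works from, the Python sketch below parses the body of an S3 event notification as it would arrive on an SQS queue and extracts the bucket and key of each newly created object. It shows the general shape of the notification, not Cribl Stream's internal code: the object_refs helper, the example-logs bucket, and the sample key are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def object_refs(message_body):
    """Return (bucket, key) pairs for objects created according to one message."""
    payload = json.loads(message_body)
    refs = []
    # Messages without "Records" (such as the test event S3 sends when
    # notifications are first configured) simply produce no references.
    for record in payload.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3 = record["s3"]
        # Object keys are URL-encoded in notifications, with spaces as "+".
        refs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return refs


# Hypothetical message body, trimmed to the fields used above.
sample = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "example-logs"},
            "object": {"key": "cloudtrail/2025/11/06/file+one.json.gz"},
        },
    }]
})
refs = object_refs(sample)
print(refs)
```

Because each message names the exact object that was written, the consumer never has to list the bucket to find out what is new.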

Wrap up

At the end of the day, both options serve the same goal: getting your data out of Amazon S3 and into Cribl Stream. The real difference comes down to how you work and what your priorities are. If your team regularly needs to go back in time, replay logs for investigations, or validate data flows after a configuration change, the Amazon S3 Collector is ideal. It gives you the control to pull exactly what you need, when you need it.

If your focus is on keeping data flowing in near real time, the Amazon S3 Source with SQS is built for that continuous pace. It automatically keeps up as new logs land in Amazon S3, without the manual work of defining time ranges or worrying about overlap. Once it is configured, it is a set-and-forget option that keeps data fresh and consistent across your environment.

In many cases, the best answer is not one or the other but both. Use the Amazon S3 Collector when you need to look back or recover lost data, and use the Amazon S3 Source with SQS when you need nonstop ingestion. Cribl Stream gives you the flexibility to mix and match depending on your workload so you can build a data pipeline that fits your team instead of forcing your team to fit the pipeline.

