You’ve got logs landing in Amazon S3: maybe CloudTrail, maybe Palo Alto Networks firewall logs, maybe CrowdStrike Falcon Data Replicator (FDR) output. The question that inevitably comes up is:
“Should we use an Amazon S3 Collector or an Amazon S3 Source (via SQS) to get them into Cribl Stream?”
It’s a great question, and like many things in the security and observability space, the answer depends on how you want to work with your data.
Let’s break it down.
What’s an Amazon S3 Collector Source?
Think of the Amazon S3 Collector Source as a batch retrieval engine. It’s well suited to reaching back in time and grabbing a specific chunk of data from S3, say, “get me three hours of logs starting at noon yesterday for host==abcd.”
It doesn’t rely on real-time notifications or message queues. You tell it which time range and which filters you want, and it retrieves the matching data, either on demand or on a schedule.
Ideal when you:
Need to rehydrate historical logs (for example, for a retroactive investigation)
Want to replay old data for validation, benchmarking, or pipeline testing
Work with data that’s organized neatly by date or prefix (e.g., YYYY/MM/DD/hh/mm/...)
Don’t need continuous ingestion, just periodic snapshots
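To make the time-range idea concrete, here is a minimal Python sketch of how a batch collector can turn a request like “three hours starting at noon” into a short list of S3 prefixes to list, rather than walking the whole bucket. It assumes a hypothetical UTC-based `<base>/YYYY/MM/DD/hh/` layout and an arbitrary example date; `hourly_prefixes` is an illustrative helper, not Cribl Stream’s implementation. Event-level filters such as `host==abcd` would be applied after the objects are read and are not shown.

```python
from datetime import datetime, timedelta, timezone


def hourly_prefixes(base: str, start: datetime, end: datetime) -> list[str]:
    """Return one S3 key prefix per hour overlapping [start, end).

    Hypothetical helper: assumes keys are laid out as <base>/YYYY/MM/DD/hh/
    with UTC timestamps. Only these prefixes need to be listed, so the rest
    of the bucket is never scanned.
    """
    if end <= start:
        return []
    # Truncate the start to the top of its hour so a partial first hour is included.
    current = start.replace(minute=0, second=0, microsecond=0)
    prefixes = []
    while current < end:
        prefixes.append(f"{base.rstrip('/')}/{current:%Y/%m/%d/%H}/")
        current += timedelta(hours=1)
    return prefixes


# Usage: three hours starting at noon on an example day.
start = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
for prefix in hourly_prefixes("cloudtrail", start, start + timedelta(hours=3)):
    print(prefix)
# cloudtrail/2024/05/01/12/
# cloudtrail/2024/05/01/13/
# cloudtrail/2024/05/01/14/
```

This is why a tidy date-based layout matters for the collector: the narrower the time range, the fewer prefixes it has to list and read.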
What’s the Amazon S3 Source (with SQS)?
The Amazon S3 Source takes a more event-driven approach. It relies on S3 event notifications delivered to an Amazon SQS queue to learn about new objects as they’re written to S3, and then ingests those files automatically.
In other words, it’s a “pull as you go” model: continuous, near-real-time ingestion without having to define time ranges or run manual jobs.
Ideal when you:
Want to ingest new data continuously as it lands in S3
Are collecting logs from many sources that drop files at unpredictable intervals
Don’t want to waste time or compute scanning the bucket for already-processed data
Need high efficiency and low duplication risk
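Here is a minimal Python sketch of the notification side of this model: a function that reads one SQS message body and extracts the bucket and key of each newly created object, which is exactly what an S3-plus-SQS reader fetches next. It assumes S3 event notifications are sent straight to SQS (no SNS envelope); `objects_from_sqs_body` and the bucket and key names are hypothetical, and this is not Cribl Stream’s internal code.

```python
import json
from urllib.parse import unquote_plus


def objects_from_sqs_body(body: str) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs for newly created objects from one SQS message body.

    Assumes the body is a raw S3 event notification (not wrapped by SNS).
    """
    message = json.loads(body)
    objects = []
    # S3 sends an s3:TestEvent with no "Records" when a notification is first
    # configured; .get() with a default simply yields nothing for it.
    for record in message.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue  # ignore deletes and other non-create events
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


# Usage with a trimmed-down, hypothetical notification body.
sample = json.dumps({
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {"bucket": {"name": "my-log-bucket"},
               "object": {"key": "AWSLogs/firewall+logs/2024/05/01/part%3D1.json.gz"}},
    }]
})
print(objects_from_sqs_body(sample))
# [('my-log-bucket', 'AWSLogs/firewall logs/2024/05/01/part=1.json.gz')]
```

Because the queue hands over only the keys of new objects, the reader never has to list the bucket to discover what changed. A typical SQS consumer deletes each message only after its object has been processed; if processing fails, the message becomes visible again after its visibility timeout and can be retried.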
Wrap-up
At the end of the day, both options serve the same goal: getting your data out of Amazon S3 and into Cribl Stream. The real difference comes down to how you work and what your priorities are. If your team regularly needs to go back in time, replay logs for investigations, or validate data flows after a configuration change, the Amazon S3 Collector is the better fit. It gives you the control to pull exactly what you need, when you need it.
If your focus is on keeping data flowing in real time, the Amazon S3 Source with SQS is built for that continuous pace. It automatically keeps up as new logs land in Amazon S3, without the manual work of defining time ranges or worrying about overlap. It is a set-and-forget option that keeps data fresh and consistent across your environment.
In many cases, the best answer is not one or the other but both. Use the Amazon S3 Collector when you need to look back or recover lost data, and use the Amazon S3 Source with SQS when you need nonstop ingestion. Cribl Stream gives you the flexibility to mix and match depending on your workload so you can build a data pipeline that fits your team instead of forcing your team to fit the pipeline.