We are increasingly asked if a Cribl Stream instance can be used to send to another Stream instance. The answer is a definite yes! While there are various reasons for wanting to send data from one Stream instance to another, let’s walk through just one example: collecting data in one AWS region using Stream, while sending (using compression to minimize cost) to another instance in a different region.
We will compare the supported protocols for accomplishing this and provide our recommendation. We will also discuss data flow architecture options that may be of interest when running multiple Stream instances.
As with receiving data from any other source, the question of whether the data can be received at a destination is simply a matter of finding a common protocol that both the source and the destination support. In this scenario, Stream is both the sender and the receiver, so we have to find a protocol that Stream can use for both sending and receiving data. The following protocols currently satisfy this criterion: TCP JSON, syslog, Elastic API, Splunk HEC, and Splunk LB.
While any of these five protocols is acceptable for sending data from one Stream instance to another, one stands out as the best option when ranked against the features below.
Name | TLS | Compression | Load Balancing | Persistent Queueing (PQ) | Cribl native | Lightweight |
---|---|---|---|---|---|---|
TCP JSON | Yes | Yes | No | Yes | Yes | Yes |
syslog | Yes | No | No | Yes | No | Yes |
Elastic API | Yes | Yes | No | Yes | No | No |
Splunk HEC | Yes | Yes | No | Yes | No | No |
Splunk LB | Yes | No | Yes | Yes | No | Yes |
If you need to transfer data from one Stream instance to another within AWS, then TCP JSON is preferred over Splunk HEC and Elastic API because it is lighter-weight and a native Cribl protocol, while still saving on inter-region data transfer costs via compression. You can also use AWS’s load balancing capabilities until load balancing is natively supported. (That feature is on the roadmap!) Without an external load balancer, the TCP JSON destination type will create a TCP socket and remain bound to that host and port until the connection is broken. So you’ll want a load balancer between the source and destination Stream instances, to properly distribute data across the destination’s Stream workers.
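To get a feel for why compression matters for inter-region transfer costs, here is a minimal sketch that gzips a batch of newline-delimited JSON events and compares sizes. The event fields and values are hypothetical, and this does not reproduce Stream's actual wire format or compression settings; it only illustrates how well repetitive log-style JSON tends to compress.

```python
import gzip
import json

# Hypothetical sample events; field names and values are illustrative only.
events = [
    {
        "_time": 1700000000 + i,
        "host": f"web-{i % 4}",
        "source": "/var/log/nginx/access.log",
        "_raw": f"GET /api/items/{i} HTTP/1.1 200",
    }
    for i in range(1000)
]

# Serialize as newline-delimited JSON, one event per line.
payload = "\n".join(json.dumps(e) for e in events).encode("utf-8")
compressed = gzip.compress(payload)

ratio = len(compressed) / len(payload)
print(f"raw: {len(payload)} bytes, gzip: {len(compressed)} bytes, ratio: {ratio:.2f}")

# Round trip to confirm the compressed batch decodes to the same bytes.
restored = gzip.decompress(compressed)
```

The exact ratio depends heavily on your data, so treat the printed number as a rough indication rather than a prediction of your AWS bill.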
What about the other options?
Syslog is lightweight, but Stream does not currently support compression or load balancing with syslog. Of the six features used for comparison in the table, syslog provides three of them.
Elastic API and Splunk HEC are essentially equal to each other across the six feature criteria, and both tie with syslog by providing three features.
Splunk LB is the only protocol that supports native load balancing, but without compression, your inter-region AWS costs may be prohibitive. With four features, it offers more than syslog, Elastic API, or Splunk HEC.
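The feature counts cited above can be reproduced directly from the comparison table. The short sketch below transcribes the table and tallies the "Yes" cells per protocol. Weighting every feature equally is a simplifying assumption made for illustration; in practice, compression or load balancing may matter more to you than the others.

```python
# Column order matches the comparison table.
FEATURES = [
    "TLS",
    "Compression",
    "Load Balancing",
    "Persistent Queueing (PQ)",
    "Cribl native",
    "Lightweight",
]

# Rows transcribed from the comparison table.
TABLE = {
    "TCP JSON":    ["Yes", "Yes", "No",  "Yes", "Yes", "Yes"],
    "syslog":      ["Yes", "No",  "No",  "Yes", "No",  "Yes"],
    "Elastic API": ["Yes", "Yes", "No",  "Yes", "No",  "No"],
    "Splunk HEC":  ["Yes", "Yes", "No",  "Yes", "No",  "No"],
    "Splunk LB":   ["Yes", "No",  "Yes", "Yes", "No",  "Yes"],
}

# Count supported features per protocol, treating each feature equally.
counts = {name: row.count("Yes") for name, row in TABLE.items()}

# Sort from most to fewest features; ties keep their table order.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
for name, n in ranked:
    print(f"{name}: {n} of {len(FEATURES)}")

# Prints:
# TCP JSON: 5 of 6
# Splunk LB: 4 of 6
# syslog: 3 of 6
# Elastic API: 3 of 6
# Splunk HEC: 3 of 6
```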
So what’s the takeaway here? TCP JSON is a Cribl protocol that supports TLS, PQ, and compression, but without the overhead of HTTP or the Splunk TCP protocol. So it stands out as the best choice, despite lacking a load-balancing capability for now.
Astute readers may ask “But what about Amazon S3, Amazon Kinesis, Apache Kafka, or Azure Event Hubs?” Stream also supports those as both Sources and Destinations, and you are welcome to use those, if one or more of them suit your needs better than TCP JSON or another protocol mentioned above.
However, the protocols highlighted above are those for which Stream supports direct host-to-host communications between the Stream Worker Nodes. These others involve an intermediary. This intermediary will cause one or more of the following: latency, extra (