August 11, 2019
In this post we’ll walk through a memory leak pattern we recently encountered when using JavaScript Promises. For those unfamiliar, a Promise is a handle to a value that is generally computed asynchronously. One of the most useful features of Promises is that they can be chained to express a series of asynchronous operations to be performed in order, for example:
    asyncFetch(url)
      .then(content => findLinks(content))
      .then(links => insertIntoDb(links))
      ...
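The chain above calls asyncFetch(), findLinks() and insertIntoDb() without defining them. The sketch below makes it runnable by swapping in hypothetical stand-ins for all three (a fake fetch that resolves after a short timer, a naive regex link finder, and an in-memory array acting as the "database") and records the order in which the steps run:

```javascript
// Hypothetical stand-ins for the functions named in the chain above.
const db = [];   // in-memory "database"
const log = [];  // records the order in which the steps run

function asyncFetch(url) {
  log.push(`fetch ${url}`);
  // Simulate network latency with a timer.
  return new Promise(resolve =>
    setTimeout(() => resolve('<a href="/a">A</a> <a href="/b">B</a>'), 10));
}

function findLinks(content) {
  log.push('findLinks');
  // Naive href extraction, good enough for the fake content above.
  return [...content.matchAll(/href="([^"]+)"/g)].map(m => m[1]);
}

function insertIntoDb(links) {
  log.push('insertIntoDb');
  db.push(...links);
  return Promise.resolve(links.length);
}

// Each .then() callback runs only after the previous step has resolved.
const done = asyncFetch('https://example.com')
  .then(content => findLinks(content))
  .then(links => insertIntoDb(links));

done.then(count => console.log(`inserted ${count} links`, log));
```

findLinks() returns a plain array rather than a Promise; .then() accepts either, which is what lets synchronous and asynchronous steps be mixed in one chain.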
The problem
Cribl LogStream is a stream processing engine for machine data; as such, it is responsible for receiving, processing, and sending data to destination systems. Communication failures with destination systems need to be handled gracefully. Downstream systems can become unavailable for a number of reasons, including network issues, system upgrades, or outages. There are a few ways to deal with an unavailable downstream system: stop processing incoming data (i.e., apply backpressure), drop the data destined for that system, or queue it for later delivery. By default, Cribl LogStream applies backpressure while continuously retrying to send data to the downstream system(s), and this is where the fun begins:
The code for retrying sends looked something like this (greatly simplified for readability):
    function sendWithRetry(data, dest) {
      return dest.send(data)
        .catch(error => delayMs(100).then(() => sendWithRetry(data, dest))); // retry send after 100ms
    }
If the destination is available, dest.send(data) resolves; otherwise it rejects, and the code above retries the send 100ms later, ad infinitum. We want the Promise returned by sendWithRetry() to resolve only once the data has been successfully sent to the destination (so that we can stop reading in the meantime and thus apply backpressure). In the pattern above, every error chains a new Promise onto the previous one, and the whole chain eventually resolves once the destination becomes available. However, if dest.send() keeps rejecting for a prolonged time, the code above builds an ever-growing Promise chain, and since none of the Promises in the chain can be garbage collected until the chain resolves, this ultimately exhausts RAM. In our case, we had thousands of data streams attempting delivery to an unavailable system, i.e., thousands of such Promise chains growing without bound, and the process eventually crashed because the GC could not free enough heap space.
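The growth can be observed without waiting for the heap to run out. The sketch below keeps the recursive sendWithRetry() from the post but shortens the delay to 1ms and counts how many of the Promises it has returned are still pending at the same time. delayMs() is assumed to be a plain setTimeout wrapper (the post doesn't show it), and flakyDest() is a hypothetical destination that rejects a fixed number of sends before accepting one:

```javascript
// Assumed helper: resolves after `ms` milliseconds.
function delayMs(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical destination that is "down" for the first `failures` sends.
function flakyDest(failures) {
  let attempts = 0;
  return {
    get attempts() { return attempts; },
    send(data) {
      attempts++;
      return attempts <= failures
        ? Promise.reject(new Error('destination unavailable'))
        : Promise.resolve();
    },
  };
}

let pending = 0;     // Promises returned by sendWithRetry() that have not settled
let maxPending = 0;  // high-water mark of `pending`

// The recursive pattern from the post, delay shortened from 100ms to 1ms.
function sendWithRetry(data, dest) {
  const p = dest.send(data)
    .catch(error => delayMs(1).then(() => sendWithRetry(data, dest)));
  pending++;
  maxPending = Math.max(maxPending, pending);
  p.then(() => { pending--; });  // p never rejects: failures are always retried
  return p;
}

const dest = flakyDest(50);
const result = sendWithRetry('event', dest).then(() => {
  console.log(`attempts: ${dest.attempts}, max pending: ${maxPending}`);
});
```

With 50 simulated failures, all 51 Promises returned by sendWithRetry() are pending at once when the final send succeeds, because each one waits on the next. The pending count is only a proxy for what the engine retains, but it grows by one per failed attempt, which is the pattern that exhausted the heap.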
Solution
The solution is to break the Promise chain, and it looks like this:
    function sendWithRetry(data, dest) {
      return dest.send(data)
        .catch(error => new Promise(resolve => {
          function retryIt() {
            delayMs(100)
              .then(() => dest.send(data))
              .then(() => resolve())
              .catch(err => retryIt()); // DO NOT return a Promise here!
          }
          retryIt();
        }));
    }
Since retryIt() does not return a Promise, the chain is never extended. On each rejection a new Promise chain is created, which either resolves the Promise or is thrown away and retried. The solution above also satisfies the requirement that sendWithRetry() resolve only once the data has been successfully sent to the destination.
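To check that the fix really keeps the number of live retry chains bounded, the sketch below instruments it. delayMs() is assumed to be a setTimeout wrapper, flakyDest() is a hypothetical destination that rejects a fixed number of sends before accepting one, and the delay is shortened from 100ms to 1ms so the example runs quickly. The counter tracks how many retry chains started by retryIt() are unsettled at once:

```javascript
// Assumed helper: resolves after `ms` milliseconds.
function delayMs(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical destination that is "down" for the first `failures` sends.
function flakyDest(failures) {
  let attempts = 0;
  return {
    get attempts() { return attempts; },
    send(data) {
      attempts++;
      return attempts <= failures
        ? Promise.reject(new Error('destination unavailable'))
        : Promise.resolve();
    },
  };
}

let liveChains = 0;     // retry chains that have not settled yet
let maxLiveChains = 0;  // high-water mark of `liveChains`

// The fixed pattern from the post (delay shortened to 1ms), instrumented.
function sendWithRetry(data, dest) {
  return dest.send(data)
    .catch(error => new Promise(resolve => {
      function retryIt() {
        const chain = delayMs(1)
          .then(() => dest.send(data))
          .then(() => resolve())
          .catch(err => retryIt()); // retryIt() returns undefined, so nothing is chained
        liveChains++;
        maxLiveChains = Math.max(maxLiveChains, liveChains);
        chain.then(() => { liveChains--; });
      }
      retryIt();
    }));
}

const dest = flakyDest(50);
const result = sendWithRetry('event', dest).then(() => {
  console.log(`attempts: ${dest.attempts}, max live chains: ${maxLiveChains}`);
});
```

In the recursive version every failed attempt leaves another Promise waiting on the next one; here the previous chain settles as soon as its .catch() handler returns, so no more than two chains are live at any moment, however long the destination stays down.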
How/why does it work?
Let’s assume that the destination is down for a prolonged period of time, in which case the first send attempt will fail and we’ll enter the retry logic. Let’s see how that works by breaking it down into two parts:
1. delayMs(100).then(() => dest.send(data)).then(() => resolve())
this code tries to send again after a 100ms delay and, on success, resolves the Promise
2. .catch(err => retryIt());
this code calls the function itself again; remember that each new attempt has already been delayed by at least 100ms
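The two parts can be traced in a small runnable sketch. It keeps the fixed sendWithRetry() from the post unchanged apart from comments, assumes delayMs() is a setTimeout wrapper (the post doesn't show it), and uses a hypothetical destination that rejects its first three sends while recording when each send attempt happens:

```javascript
// Assumed helper: resolves after `ms` milliseconds.
function delayMs(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

const sendTimes = [];     // when each send attempt happened (ms since start)
const start = Date.now();

// Hypothetical destination that rejects the first three sends.
const dest = {
  send(data) {
    sendTimes.push(Date.now() - start);
    return sendTimes.length <= 3
      ? Promise.reject(new Error('destination unavailable'))
      : Promise.resolve();
  },
};

function sendWithRetry(data, dest) {
  return dest.send(data)
    .catch(error => new Promise(resolve => {
      function retryIt() {
        delayMs(100)                    // part 1: wait 100ms...
          .then(() => dest.send(data))  // ...send again...
          .then(() => resolve())        // ...and resolve on success
          .catch(err => retryIt());     // part 2: on failure, start a fresh attempt
      }
      retryIt();
    }));
}

const result = sendWithRetry('event', dest).then(() => {
  // Each retry waits for the 100ms delay before sending again.
  console.log('send attempts at (ms):', sendTimes);
});
```

Apart from the initial attempt, each send happens only after the delay in part 1 has elapsed, and each failure caught in part 2 starts a new attempt instead of extending the previous one.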
Now, since retryIt() returns nothing and is not chained to previous calls, any previous invocations of the function can be cleaned up: we are not creating a chain, and thus there is no real recursion either. Another way to think of it: every time a send fails we schedule a timeout to retry, while on success we don’t.
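The timeout analogy can be written out literally. The sketch below is an alternative formulation, not the code from the post: it schedules each retry with setTimeout() directly, so a failed attempt leaves nothing behind except the pending timer. flakyDest() is a hypothetical destination that rejects a fixed number of sends, and the added delay parameter defaults to the 100ms used in the post:

```javascript
// Retry by scheduling a timeout after each failure; nothing is chained.
function sendWithRetry(data, dest, delay = 100) {
  return new Promise(resolve => {
    function attempt() {
      dest.send(data).then(
        () => resolve(),                   // success: settle the outer Promise
        () => setTimeout(attempt, delay),  // failure: try again later
      );
    }
    attempt();
  });
}

// Hypothetical destination that is "down" for the first `failures` sends.
function flakyDest(failures) {
  let attempts = 0;
  return {
    get attempts() { return attempts; },
    send(data) {
      attempts++;
      return attempts <= failures
        ? Promise.reject(new Error('destination unavailable'))
        : Promise.resolve();
    },
  };
}

const dest = flakyDest(5);
const result = sendWithRetry('event', dest, 10)
  .then(() => console.log(`delivered after ${dest.attempts} attempts`));
```

The Promise returned by each failed attempt's .then() settles immediately with the timer handle and nothing waits on it, so only the single outer Promise lives for the whole retry period.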
Another way to express the above is with async/await (thx helloimsomeone):
    async function sendWithRetry(data, dest) {
      for (;;) {
        try {
          await dest.send(data);
          break;
        } catch {
          await delayMs(100);
        }
      }
    }
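The async/await version also depends on delayMs(), which the post doesn't define. A minimal runnable sketch, assuming delayMs() wraps setTimeout and using a hypothetical destination that rejects its first three sends; the retry delay is made a parameter (an addition for this example) and set to 10ms so it finishes quickly:

```javascript
// Assumed helper: resolves after `ms` milliseconds.
function delayMs(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// The loop from the post; `delay` is an added parameter (defaults to 100ms).
async function sendWithRetry(data, dest, delay = 100) {
  for (;;) {
    try {
      await dest.send(data);
      break;                 // delivered: exit the loop and resolve
    } catch {
      await delayMs(delay);  // failed: wait, then go around again
    }
  }
}

// Hypothetical destination that rejects its first three sends.
let attempts = 0;
const dest = {
  send(data) {
    attempts++;
    return attempts <= 3
      ? Promise.reject(new Error('destination unavailable'))
      : Promise.resolve();
  },
};

const result = sendWithRetry('event', dest, 10)
  .then(() => console.log(`delivered after ${attempts} attempts`));
```

Each loop iteration awaits one attempt and then moves on, so the only long-lived Promise is the one returned by the async function itself; failed attempts are not kept alive by later ones.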
The lesson here is to be on the lookout for infinitely chaining Promises, paying special attention to recursive functions that return Promises.
If you’ve enjoyed reading this and are looking to join a kick-ass engineering team, drop us a line at hello@cribl.io. We’re hiring!