In this post we’ll walk through a memory leak pattern we recently encountered when using Javascript Promises. For those unfamiliar, a Promise is a handle to a value that is generally computed asynchronously. One of the most useful features of Promises is that they can be chained to express a series of asynchronous operations to be performed in some order, for example

asyncFetch(url)
 .then(content => findLinks(content)) 
 .then(links => insertIntoDb(links))
...

The problem

Cribl LogStream is a stream processing engine for machine data, as such it is responsible for receiving, processing and sending data to destination systems. Communication failures with destination systems is something that need to be handled gracefully, a few reasons why downstream systems can be unavailable include: network issues, system upgrades or outages etc. There are a few ways we can deal with downstream systems not being available: stop processing incoming data (ie backpressure), drop data being sent to the unavailable system or queue it for later delivery. By default Cribl LogStream chooses to backpressure, while constantly retrying to send data to the downstream system(s) – and this is where the fun begins:

The code for retrying sends looked something like this (greatly simplified for readability):

function sendWithRetry(data, dest) {
  return dest.send(data)
    .catch(error => delayMs(100).then(() => sendWithRetry(data, dest))); // retry send after 100ms
}

If the destination is available¬† dest.send(data) would resolve, otherwise it would reject and the code above would retry the send 100ms later, ad infinitum. We want the Promise returned by sendWithRetry()to resolve only once the data has been successfully sent to the destination (so that we can stop reading and thus apply backpressure). In the pattern above we’re chaining Promises in the case of error, which would eventually resolve once destination becomes available.¬† However, if dest.send() rejects for a prolonged time, the above code can result in infinitely long Promise chains and since none of the Promises in the chain can be garbage collected until the chain resolves this situation ultimately results in RAM exhaustion. In our case we had 1000s of data streams attempting to be delivered to an unavailable system – ie 1000s of such Promise chains extending infinitely resulting in a process crash due to GC not being able to free up enough heap space.

Solution

The solution is to break the Promise chain and it looks like this:

function sendWithRetry(data, dest) { 
  return dest.send(data) 
   .catch(error => new Promise(resolve => {
      function retryIt() {
        delayMs(100).then(() => dest.send(data)).then(() => resolve())
          .catch(err => retryIt());
        // DO NOT return a Promise here!
      }
      retryIt();
   }));
 }

Since retryIt()does not return a Promise the chain is never extended. In the reject condition a new Promise chain is created, which would either resolve or be thrown away and retried. The above solution also satisfies the requirement for sendWithRetry()to resolve only once the data has been successfully sent to the destination.

How/why does it work?

Let’s assume that the destination is down for a prolonged period of time, in which case the first try to send will fail and we’ll enter the retry logic. Let’s see how that works by breaking it down into two parts:

1. delayMs(100).then(() => dest.send(data)).then(() => resolve())

this code tries to send again after 100ms of delay, and on success it will resolve the Promise

2. catch(err => retryIt());

this code will call the function itself again, remember that we’ve delayed the execution by at least 100ms

Now, since we’re not returning anything from the retryIt() and not chaining with previous calls, any previous invocations of the function can be cleaned up – ie we are not creating a chain and thus no real recursion for that matter. Another way to think of it would be imagine every time send fails we schedule a timeout to retry, while on success we don’t

Another way to express the above using async/awaitwould be (thx helloimsomeone):

async function sendWithRetry(data, dest) {
  for (;;) {
    try {
      await dest.send(data);
      break;
    } catch {
      await delayMs(100);
    }
  }
}

The lesson here is to be on the look out for infinitely chaining of Promises and paying special attention to recursive functions that return Promises.

If you’ve enjoyed reading this and are looking to join a kick ass engineering team drop us a line at hello@cribl.io – we’re hiring!