April 1, 2021
This is a short blog post about how we used AppScope to identify and resolve a DNS-related problem reported by one of our customers … and it is a fact that it’s always a DNS problem, except when it isn’t :). The problem description went something like this. The customer has a LogStream deployment to ingest data from S3. Everything works, but they are seeing the deployment make about 1,000 DNS queries per second. Their DNS server admin shared that LogStream was trying to resolve just one domain:
sqs.<region>.amazonaws.com. (The quick solution for this customer was to use nscd to cache the DNS responses at the OS level.)
With DNS logs at hand we’d be ready to tackle this problem. The solution could be simple enough. Just cache the DNS responses for some period of time at the application level and move on to sexier problems, right? However, the idea of caching DNS responses at the application layer felt wrong, and asking customers to install and maintain nscd felt even more wrong.
We started by reproducing the problem internally, and hit an immediate challenge: how do we get visibility into the DNS requests that LogStream is making? Here’s where AppScope came in super handy. We simply scoped LogStream and were able to see not only that it was making a ton of DNS requests to resolve the SQS endpoints, but more importantly that it was establishing new TCP connections and making HTTP requests just as frequently.
Armed with this information, we looked one step further and found out that the AWS SDK does not enable HTTP connection keep-alive by default. As a result, every API request established a new connection, and each new connection triggered its own DNS lookup – WTF?!? The fix for this problem was even simpler than trying to implement DNS caching at the application layer.
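For the curious, here is a minimal sketch of what turning on keep-alive can look like. It assumes a Node.js application using the AWS SDK for JavaScript v2, which the post above does not actually spell out, and `buildHttpOptions` is a hypothetical helper name rather than anything from LogStream's code. The sketch itself only uses Node's built-in `https` module; the SDK wiring is shown in a comment.

```typescript
import * as https from "https";

// Hypothetical helper (not from LogStream): builds an options object
// holding an HTTPS agent that reuses TCP connections across requests.
export function buildHttpOptions(maxSockets: number = 50): { agent: https.Agent } {
  const agent = new https.Agent({
    keepAlive: true, // keep sockets open between requests instead of reconnecting
    maxSockets,      // upper bound on concurrent sockets per host
  });
  return { agent };
}

const httpOptions = buildHttpOptions();

// Assumption: with the AWS SDK for JavaScript v2 installed, the agent
// would be handed to the SDK roughly like this:
//
//   import * as AWS from "aws-sdk";
//   AWS.config.update({ httpOptions });
//
// With connections reused, requests no longer need a fresh TCP connect
// (and a fresh DNS lookup) every time.
```

The `maxSockets` value of 50 is an arbitrary placeholder; the right number depends on how many concurrent requests the application makes to a single endpoint.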
The ASCII chart below shows the number of DNS requests over time.
With the level of data granularity provided by AppScope, we got better visibility than we would have had with DNS logs alone. We were able to identify and fix the actual root cause of the problem, rather than just addressing a symptom.