
How AppScope helped resolve a DNS problem

April 1, 2021
Written by Ledion Bitincka

Ledion Bitincka is Cribl's CTO and co-founder. Ledion has over a decade of engineering experience developing next-generation technologies and leading the launch of enterprise products. He was the Advanced Development Architect at Splunk where he introduced Search-Time Schema and led the design of Hunk and SmartStore.

Categories: Engineering, Learn

This is a short blog post about how we used AppScope to identify and resolve a DNS-related problem reported by one of our customers … and it is a fact that it’s always a DNS problem, except when it isn’t :). The problem description went something like this. The customer has a LogStream deployment to ingest data from S3. Everything works, but they are seeing the deployment make about 1,000 DNS queries per second. Their DNS server admin shared that Cribl Stream was trying to resolve just one domain: sqs.<region>.amazonaws.com. (The quick solution for this customer was to run nscd to cache the DNS responses at the OS level.)

With DNS logs in hand, we felt ready to tackle this problem. The solution could be simple enough. Just cache the DNS requests for some period of time at the application level and move on to sexier problems, right? However, the idea of caching DNS requests at the application layer felt wrong, and asking customers to install and maintain nscd felt even more wrong.

We started by reproducing the problem internally, and hit an immediate challenge: how do we get visibility into the DNS requests that LogStream is making? Here’s where AppScope came in super handy. We simply scoped it and were able to see not only that LogStream was making a ton of DNS requests to resolve the SQS endpoints, but more importantly that it was establishing new TCP connections and making HTTP requests just as frequently.

Get started by downloading AppScope 

Armed with this information, we started to look one step further and found out that the AWS SDK does not enable HTTP connection keep-alive by default. This resulted in all API requests establishing new connections – WTF?!?  The fix for this problem was even simpler than trying to implement DNS caching at the application layer. 
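To make the connection between keep-alive and lookup volume concrete, here is a minimal sketch in Python. It is not the AWS SDK or LogStream code; it is a stand-in that uses only the standard library. The names `CountingHandler` and `run_requests` are hypothetical. It starts a local HTTP server, sends the same number of requests with and without connection reuse, and counts both resolver calls (`socket.getaddrinfo`) and accepted TCP connections. The client talks to 127.0.0.1, so no real DNS query leaves the machine; the resolver-call count stands in for the lookups that a hostname such as the SQS endpoint would trigger.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from unittest import mock


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that counts accepted TCP connections."""
    protocol_version = "HTTP/1.1"  # allows persistent connections
    connections = 0
    lock = threading.Lock()

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        with CountingHandler.lock:
            CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_requests(n, keep_alive):
    """Send n GET requests; return (resolver_calls, tcp_connections)."""
    CountingHandler.connections = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    real_getaddrinfo = socket.getaddrinfo
    try:
        # Wrap getaddrinfo so every resolver call is counted, then passed through.
        with mock.patch("socket.getaddrinfo", side_effect=real_getaddrinfo) as lookups:
            if keep_alive:
                # One connection, reused for every request.
                conn = http.client.HTTPConnection(host, port)
                for _ in range(n):
                    conn.request("GET", "/")
                    conn.getresponse().read()
                conn.close()
            else:
                # A fresh connection (and resolver call) per request.
                for _ in range(n):
                    conn = http.client.HTTPConnection(host, port)
                    conn.request("GET", "/", headers={"Connection": "close"})
                    conn.getresponse().read()
                    conn.close()
            resolver_calls = lookups.call_count
    finally:
        server.shutdown()
        server.server_close()
    return resolver_calls, CountingHandler.connections


if __name__ == "__main__":
    print("no keep-alive:", run_requests(5, keep_alive=False))
    print("keep-alive:   ", run_requests(5, keep_alive=True))
```

Without reuse, resolver calls and connections grow in step with the request count. With reuse, both stay at one no matter how many requests are sent, which is why enabling keep-alive removed the DNS load without any application-level DNS cache.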

The ASCII chart below shows the number of DNS requests over time.

Having the level of data granularity provided by AppScope, we got better visibility than we would with only DNS logs. We were able to identify and fix the actual root cause of the problem, rather than just addressing a symptom.

If you want to read more about DNS resolution please go here or here. To test drive AppScope, explore our online sandbox, AppScope Fundamentals.
