
AIOHTTP: Detecting DNS timeout with custom nameservers

Using 'Client tracing' we can derive the variables dns_cache_miss and host_resolved to determine whether a timeout occurred in the resolver.

27 July 2022 Updated 27 July 2022
In Async

When using AIOHTTP to fetch data from a web page on the internet, you are probably using a timeout to limit the maximum waiting time.

If you are using a domain name then the IP Address must be resolved. Without using a separate resolver you are dependent on the underlying operating system. Any errors propagate to your application.

I did not want this dependency, so I specify the nameservers myself, using AsyncResolver and TCPConnector.

Now assume a timeout occurs. How do we know if the timeout is caused by the resolver or by the remote server connection?

The problem

The AIOHTTP request consists of two parts:

  • Resolve DNS
  • Receive data
    |----------------- request ----------------->|

    |---- resolve DNS --->|---- receive data --->|

    |                     |                      |
----+---------------------+----------------------+---> t
  start

With AIOHTTP we can specify a maximum time for the request. When this time expires, a TimeoutError exception is raised.

But this is for the whole request. There is no separate exception for a resolver timeout. Again, how do we know if the timeout is caused by the DNS resolver or by the remote server?
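
To make the problem concrete, here is a minimal simulation that uses only asyncio, not AIOHTTP. The coroutines fake_request and try_request and the two sleep phases are hypothetical stand-ins for DNS resolving and receiving data. The sketch only shows that a single overall timeout raises the same exception whichever phase is slow:

```python
import asyncio


async def fake_request(resolve_time, receive_time):
    # Hypothetical stand-in for a request: two sequential phases.
    await asyncio.sleep(resolve_time)   # "resolve DNS"
    await asyncio.sleep(receive_time)   # "receive data"
    return 'ok'


async def try_request(resolve_time, receive_time, total):
    # One timeout covers both phases, similar to ClientTimeout(total=...).
    try:
        return await asyncio.wait_for(
            fake_request(resolve_time, receive_time), timeout=total)
    except asyncio.TimeoutError as e:
        # The exception carries no hint about which phase was slow.
        return type(e).__name__


async def main():
    slow_dns = await try_request(0.2, 0.0, total=0.05)
    slow_server = await try_request(0.0, 0.2, total=0.05)
    fast = await try_request(0.0, 0.0, total=0.05)
    return slow_dns, slow_server, fast


results = asyncio.run(main())
print(results)
```

The slow "DNS" case and the slow "server" case are indistinguishable from the exception alone, which is why extra bookkeeping is needed.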

Client tracing to the rescue

Fortunately we can follow the execution flow of a request by attaching listener coroutines to the signals provided by the TraceConfig instance, which can be used as a parameter for the ClientSession constructor.

If we look at the AIOHTTP 'Tracing Reference' (see the links below) and zoom in on 'Connection acquiring' and 'DNS resolving', we see that we need listener coroutines for the following signals:

  • on_request_start
  • on_dns_cache_miss
  • on_dns_resolvehost_end

When a timeout occurs and 'on_dns_cache_miss' was called and 'on_dns_resolvehost_end' was not called, then we can assume the timeout is caused by the resolver.
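
The rule above can be sketched as a small pure function. The name classify_timeout is hypothetical and not part of AIOHTTP; it assumes the trace results are kept in a dictionary keyed by signal name, where None means "never called":

```python
def classify_timeout(error_name, trace_results):
    """Return 'DNSTimeoutError' if a timeout happened during DNS resolving."""
    if error_name != 'TimeoutError':
        return error_name
    # Compare against None: an elapsed time of 0.0 still means "was called".
    cache_miss = trace_results.get('on_dns_cache_miss') is not None
    resolved = trace_results.get('on_dns_resolvehost_end') is not None
    if cache_miss and not resolved:
        return 'DNSTimeoutError'
    return error_name


# Resolving started but never finished: blame the resolver.
dns_case = classify_timeout(
    'TimeoutError',
    {'on_dns_cache_miss': 0.001, 'on_dns_resolvehost_end': None})

# Resolving finished before the timeout: blame the remote server.
server_case = classify_timeout(
    'TimeoutError',
    {'on_dns_cache_miss': 0.001, 'on_dns_resolvehost_end': 0.03})

print(dns_case, server_case)
```

Note that this is an assumption-based heuristic: it only says that the resolver had not finished when the timeout fired.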

To get the coroutines running, we create a TraceConfig object and attach the coroutines. All these coroutines do is measure the time since the start of the request and store it in our 'trace_results' dictionary, which is passed around as the context and has initial values of None:

trace_results = {
	'on_dns_cache_hit': None,
	'on_dns_cache_miss': None,
	'on_dns_resolvehost_end': None,
}

The code

When an exception is raised, we first check whether it is a TimeoutError. If so, we use 'dns_cache_miss' and 'host_resolved' to check whether the timeout occurred in the resolver. In run(), choose one of the two resolvers: the working one with the nameservers of quad9.net, or one that points at an IP address that is not a DNS server, which causes a resolver timeout.

import asyncio
import aiohttp
from aiohttp.resolver import AsyncResolver
import socket
import sys
import traceback


class Runner:

    def __init__(self):
        pass

    async def on_request_start(self, session, trace_config_ctx, params):
        trace_config_ctx.start = asyncio.get_event_loop().time()

    async def on_dns_cache_miss(self, session, trace_config_ctx, params):
        elapsed = asyncio.get_event_loop().time() - trace_config_ctx.start
        trace_config_ctx.trace_request_ctx['on_dns_cache_miss'] = elapsed

    async def on_dns_resolvehost_end(self, session, trace_config_ctx, params):
        elapsed = asyncio.get_event_loop().time() - trace_config_ctx.start
        trace_config_ctx.trace_request_ctx['on_dns_resolvehost_end'] = elapsed

    async def get_trace_config(self):
        trace_config = aiohttp.TraceConfig()
        trace_config.on_request_start.append(self.on_request_start)
        trace_config.on_dns_cache_miss.append(self.on_dns_cache_miss)
        trace_config.on_dns_resolvehost_end.append(self.on_dns_resolvehost_end)
        trace_results = {
            'on_dns_cache_hit': None,
            'on_dns_cache_miss': None,
            'on_dns_resolvehost_end': None,
        }
        return trace_config, trace_results

    async def run(self, url):
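        # Working resolver: the quad9.net DNS servers.
        resolver = AsyncResolver(nameservers=['9.9.9.9', '149.112.112.112'])
        # Resolver timeout demo: the IP address of www.example.com is not a
        # DNS server. This assignment overrides the one above; comment it out
        # to use quad9.net.
        resolver = AsyncResolver(nameservers=['93.184.216.34'])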

        connector = aiohttp.TCPConnector(
            family=socket.AF_INET,
            resolver=resolver,
        )

        trace_config, trace_results = await self.get_trace_config()

        error = None
        e_str = None
        try:
            async with aiohttp.ClientSession(
                connector=connector,
                timeout=aiohttp.ClientTimeout(total=.5),
                trace_configs=[trace_config],
            ) as session:
                async with session.get(
                    url,
                    trace_request_ctx=trace_results,
                ) as client_response:
                    html = await client_response.text()

        except Exception as e:
            print(traceback.format_exc())
            error = type(e).__name__
            e_str = str(e)
            print('url = {}'.format(url))
            print('error = {}'.format(type(e).__name__))
            print('e_str = {}'.format(e_str))
            print('e.args = {}'.format(e.args))

        finally:
            print('url = {}'.format(url))
            for k, v in trace_results.items():
                print('trace_results: {} = {}'.format(k, v))

        # Compare against None: an elapsed time of 0.0 is falsy but valid.
        dns_cache_miss = trace_results['on_dns_cache_miss'] is not None
        host_resolved = trace_results['on_dns_resolvehost_end'] is not None

        if error == 'TimeoutError':
            if dns_cache_miss and not host_resolved:
                error = 'DNSTimeoutError'

        print('error = {}, e_str = {}'.format(error, e_str))

if __name__ == '__main__':
    # Pick one URL; the last assignment wins.
    # 'fast' website
    url = 'http://www.example.com'
    # 'slow' website
    url = 'http://www.imdb.com'
    runner = Runner()
    asyncio.run(runner.run(url))

More resolver errors

There are more resolver errors, and AIOHTTP is not helping us here either, for example:

  • Could not contact DNS servers
  • ConnectionRefusedError

The first certainly has to do with the resolver, but the ConnectionRefusedError can originate from either part of the request.

Summary

I want to know whether a raised exception comes from the resolver or from another part of the request. If it is the resolver, then I can mark this resolver as (temporarily) invalid and use another one.
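
Marking a resolver as temporarily invalid could look roughly like the sketch below. The class NameserverPool, its methods, and the cooldown value are all hypothetical, and the second nameserver set (1.1.1.1 and 1.0.0.1) is only an illustrative assumption that does not appear in the code above. The list returned by pick() is meant to be passed to AsyncResolver(nameservers=...).

```python
import time


class NameserverPool:
    """Hypothetical helper: rotate over nameserver sets, skipping invalid ones."""

    def __init__(self, nameserver_sets, cooldown=60.0, clock=time.monotonic):
        self.nameserver_sets = [tuple(s) for s in nameserver_sets]
        self.cooldown = cooldown
        self.clock = clock
        self.invalid_until = {}  # nameserver set -> time it is valid again

    def mark_invalid(self, nameservers):
        # Call this when a request ended with a DNS timeout.
        self.invalid_until[tuple(nameservers)] = self.clock() + self.cooldown

    def pick(self):
        # Return the first set that is not in its cooldown period, or None.
        now = self.clock()
        for s in self.nameserver_sets:
            if s not in self.invalid_until or self.invalid_until[s] <= now:
                return list(s)
        return None


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
pool = NameserverPool(
    [['9.9.9.9', '149.112.112.112'], ['1.1.1.1', '1.0.0.1']],
    cooldown=30.0,
    clock=lambda: fake_now[0],
)
first = pool.pick()
pool.mark_invalid(first)     # pretend we detected a DNS timeout
second = pool.pick()
fake_now[0] = 31.0           # the cooldown has expired
third = pool.pick()
print(first, second, third)
```

A time-based cooldown is just one possible design; a failure counter or a health check would work as well.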

I was hoping the AIOHTTP exceptions would give me all the information, but that turned out not to be the case. Maybe one day it will be implemented, but for the moment I must do the dirty work myself. Besides that, AIOHTTP is a very nice package!

Links / credits

AIOHTTP - Client exceptions
https://docs.aiohttp.org/en/stable/client_reference.html?highlight=exceptions#client-exceptions

AIOHTTP - Tracing Reference
https://docs.aiohttp.org/en/stable/tracing_reference.html

Monitoring network calls in Python using TIG stack
https://calendar.perfplanet.com/2020/monitoring-network-calls-in-python-using-tig-stack
