AIOHTTP: Detecting DNS timeout with custom nameservers
Using 'Client tracing' we can derive two flags, dns_cache_miss and host_resolved, that tell us whether a timeout occurred in the resolver.
When using AIOHTTP to fetch data from a web page on the internet you are probably using a timeout to limit the maximum waiting time.
If you are using a domain name then its IP address must be resolved first. Without a separate resolver you depend on the resolver of the underlying operating system, and any errors it produces propagate to your application.
I did not want this dependency, so I specify the nameservers myself, using the AsyncResolver and TCPConnector.
Now assume a timeout occurs. How do we know if the timeout is caused by the resolver or by the remote server connection?
The problem
The AIOHTTP request consists of two parts:
- Resolve DNS
- Receive data
    |----------------- request ----------------->|
    |---- resolve DNS --->|---- receive data --->|
    |                     |                      |
----+---------------------+----------------------+---> t
  start
With AIOHTTP we can specify a maximum time for the request. When this time expires, a TimeoutError exception is raised.
But this is for the whole request. There is no separate exception for a resolver timeout. Again, how do we know if the timeout is caused by the DNS resolver or by the remote server?
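To make the problem concrete, here is a minimal sketch that does not use AIOHTTP at all. It simulates the two phases with asyncio.sleep and wraps them in a single overall timeout. The names fake_request and run_with_timeout are made up for this illustration. Whichever phase is slow, the caller sees the same exception name.

```python
import asyncio


async def fake_request(dns_delay, data_delay):
    """Stand-in for a request: a DNS phase followed by a data phase."""
    await asyncio.sleep(dns_delay)   # pretend to resolve the host name
    await asyncio.sleep(data_delay)  # pretend to receive the response
    return 'ok'


async def run_with_timeout(dns_delay, data_delay, total):
    """Return the result, or the exception class name on a timeout."""
    try:
        return await asyncio.wait_for(fake_request(dns_delay, data_delay), total)
    except asyncio.TimeoutError as e:
        return type(e).__name__


async def main():
    # Same overall timeout, but a different phase is slow in each case.
    slow_dns = await run_with_timeout(dns_delay=0.2, data_delay=0.0, total=0.05)
    slow_server = await run_with_timeout(dns_delay=0.0, data_delay=0.2, total=0.05)
    return slow_dns, slow_server


if __name__ == '__main__':
    print(asyncio.run(main()))
```

Both calls report the same exception name, so the exception alone cannot tell us which phase was slow.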
Client tracing to the rescue
Fortunately we can follow the execution flow of a request by attaching listener coroutines to the signals provided by the TraceConfig instance, which can be used as a parameter for the ClientSession constructor.
If we look at the AIOHTTP 'Tracing Reference', see links below, and zoom in on 'Connection acquiring' and 'DNS resolving', then we see that we need the following coroutines:
- on_request_start
- on_dns_cache_miss
- on_dns_resolvehost_end
When a timeout occurs after 'on_dns_cache_miss' was called but before 'on_dns_resolvehost_end' was called, we can assume the timeout is caused by the resolver: the lookup was started but never finished.
To get the coroutines running, we create a TraceConfig object and attach the coroutines. All we do in these coroutines is measure the time since the start of the request and store it in our 'trace_results' dictionary, which is passed around as the request context and has initial values None:
trace_results = {
    'on_dns_cache_hit': None,
    'on_dns_cache_miss': None,
    'on_dns_resolvehost_end': None,
}
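The rule described above, applied to this dictionary, could be sketched as a small pure function. The name is_dns_timeout is hypothetical and not part of AIOHTTP. It only looks at the exception name and the recorded times.

```python
def is_dns_timeout(error_name, trace_results):
    """True when a TimeoutError happened while the resolver was still busy."""
    if error_name != 'TimeoutError':
        return False
    # Compare with None: an elapsed time of 0.0 is a valid (but falsy) value.
    dns_cache_miss = trace_results.get('on_dns_cache_miss') is not None
    host_resolved = trace_results.get('on_dns_resolvehost_end') is not None
    return dns_cache_miss and not host_resolved


if __name__ == '__main__':
    # Lookup started, never finished: blame the resolver.
    print(is_dns_timeout('TimeoutError', {
        'on_dns_cache_hit': None,
        'on_dns_cache_miss': 0.001,
        'on_dns_resolvehost_end': None,
    }))
```

Keeping this check free of I/O makes it easy to test without a network connection.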
The code
When an exception is raised, we first check if the error is a TimeoutError. If this is the case we check if the exception occurred in the resolver using 'dns_cache_miss' and 'host_resolved'. Choose either the working resolver with the nameservers of quad9.net, or use some IP address that is not a nameserver to force a resolver timeout.
import asyncio
import socket
import traceback

import aiohttp
from aiohttp.resolver import AsyncResolver


class Runner:

    async def on_request_start(self, session, trace_config_ctx, params):
        # remember when the request started, on the per-request trace context
        trace_config_ctx.start = asyncio.get_event_loop().time()

    async def on_dns_cache_miss(self, session, trace_config_ctx, params):
        # host not in the DNS cache: a real lookup is about to start
        elapsed = asyncio.get_event_loop().time() - trace_config_ctx.start
        trace_config_ctx.trace_request_ctx['on_dns_cache_miss'] = elapsed

    async def on_dns_resolvehost_end(self, session, trace_config_ctx, params):
        # the lookup finished
        elapsed = asyncio.get_event_loop().time() - trace_config_ctx.start
        trace_config_ctx.trace_request_ctx['on_dns_resolvehost_end'] = elapsed

    async def get_trace_config(self):
        trace_config = aiohttp.TraceConfig()
        trace_config.on_request_start.append(self.on_request_start)
        trace_config.on_dns_cache_miss.append(self.on_dns_cache_miss)
        trace_config.on_dns_resolvehost_end.append(self.on_dns_resolvehost_end)
        trace_results = {
            'on_dns_cache_hit': None,
            'on_dns_cache_miss': None,
            'on_dns_resolvehost_end': None,
        }
        return trace_config, trace_results

    async def run(self, url):
        # quad9.net dns servers (working resolver)
        resolver = AsyncResolver(nameservers=['9.9.9.9', '149.112.112.112'])
        # ip address of www.example.com, using this causes a resolver timeout;
        # comment out the next line to use the quad9.net nameservers instead
        resolver = AsyncResolver(nameservers=['93.184.216.34'])

        connector = aiohttp.TCPConnector(
            family=socket.AF_INET,
            resolver=resolver,
        )
        trace_config, trace_results = await self.get_trace_config()

        error = None
        e_str = None
        is_timeout = False
        try:
            async with aiohttp.ClientSession(
                connector=connector,
                timeout=aiohttp.ClientTimeout(total=.5),
                trace_configs=[trace_config],
            ) as session:
                async with session.get(
                    url,
                    trace_request_ctx=trace_results,
                ) as client_response:
                    html = await client_response.text()
        except Exception as e:
            print(traceback.format_exc())
            error = type(e).__name__
            e_str = str(e)
            # also matches subclasses such as aiohttp.ServerTimeoutError
            is_timeout = isinstance(e, asyncio.TimeoutError)
            print('error = {}'.format(error))
            print('e_str = {}'.format(e_str))
            print('e.args = {}'.format(e.args))
        finally:
            print('url = {}'.format(url))
            for k, v in trace_results.items():
                print('trace_results: {} = {}'.format(k, v))

        # compare with None: an elapsed time of 0.0 is valid but falsy
        dns_cache_miss = trace_results['on_dns_cache_miss'] is not None
        host_resolved = trace_results['on_dns_resolvehost_end'] is not None
        if is_timeout and dns_cache_miss and not host_resolved:
            error = 'DNSTimeoutError'
        print('error = {}, e_str = {}'.format(error, e_str))


if __name__ == '__main__':
    # 'fast' website
    url = 'http://www.example.com'
    # 'slow' website; comment out the next line to use the 'fast' one
    url = 'http://www.imdb.com'

    runner = Runner()
    asyncio.run(runner.run(url))
More resolver errors
There are more resolver errors, and AIOHTTP is not helping us here either, for example:
- Could not contact DNS servers
- ConnectionRefusedError
The first certainly has to do with the resolver, but the ConnectionRefusedError can originate from either part of the request.
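The same trace times might help here too. The sketch below rests on an assumption I have not verified for every AIOHTTP version: that 'on_dns_resolvehost_end' is not sent when the resolver raises. Under that assumption, a lookup that started but never ended points at the resolver, whatever the exception type. The function name error_origin is hypothetical.

```python
def error_origin(trace_results):
    """Guess which part of the request an exception came from."""
    lookup_started = trace_results.get('on_dns_cache_miss') is not None
    lookup_finished = trace_results.get('on_dns_resolvehost_end') is not None
    if lookup_started and not lookup_finished:
        # Assumption: no 'end' signal means the resolver failed or hung.
        return 'resolver'
    # Either the host was resolved, or no lookup was needed at all.
    return 'connection'


if __name__ == '__main__':
    print(error_origin({
        'on_dns_cache_miss': 0.001,
        'on_dns_resolvehost_end': None,
    }))
```

For a ConnectionRefusedError this would separate a refusing nameserver from a refusing web server, provided the assumption holds.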
Summary
I want to know whether a raised exception comes from the resolver or from another part of the request. If it is the resolver, then I can mark this resolver as temporarily invalid and use another one.
I was hoping the AIOHTTP exceptions would give me all the information, but that turned out not to be the case. Maybe one day it will be implemented, but for the moment I must do the dirty work myself. Besides that, AIOHTTP is a very nice package!
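Marking a resolver as temporarily invalid could look like the sketch below. The NameserverPool class, its cooldown parameter and the second nameserver set are assumptions made for this illustration; none of them come from AIOHTTP. The pool only does the bookkeeping. Building an AsyncResolver from the selected nameservers is left to the caller.

```python
import time


class NameserverPool:
    """Hypothetical helper: hand out nameserver sets, skipping failed ones."""

    def __init__(self, nameserver_sets, cooldown=60.0, clock=time.monotonic):
        self._sets = list(nameserver_sets)
        self._cooldown = cooldown
        self._clock = clock  # injectable so tests can control time
        self._invalid_until = {}  # index -> time the set is usable again

    def pick(self):
        """Return (index, nameservers) of the first usable set, or None."""
        now = self._clock()
        for index, nameservers in enumerate(self._sets):
            if self._invalid_until.get(index, float('-inf')) <= now:
                return index, nameservers
        return None

    def mark_invalid(self, index):
        """Take a set out of rotation for `cooldown` seconds."""
        self._invalid_until[index] = self._clock() + self._cooldown


if __name__ == '__main__':
    pool = NameserverPool([
        ['9.9.9.9', '149.112.112.112'],  # quad9.net, as used above
        ['1.1.1.1'],                     # example second set (assumption)
    ])
    index, nameservers = pool.pick()
    pool.mark_invalid(index)  # e.g. after a DNSTimeoutError
    print(pool.pick())
```

After a DNSTimeoutError the caller would call mark_invalid and retry with the next set returned by pick.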
Links / credits
AIOHTTP - Client exceptions
https://docs.aiohttp.org/en/stable/client_reference.html?highlight=exceptions#client-exceptions
AIOHTTP - Tracing Reference
https://docs.aiohttp.org/en/stable/tracing_reference.html
Monitoring network calls in Python using TIG stack
https://calendar.perfplanet.com/2020/monitoring-network-calls-in-python-using-tig-stack