Aiohttp with custom DNS servers, Unbound and Docker
Offload your Python aiohttp application by adding caching DNS resolvers to your local system.
Using aiohttp looks easy, but it is not; it can be confusing. The 'Client Quickstart' documentation begins with the following:
Note Don’t create a session per request. Most likely you need a session per application which performs all requests together. More complex cases may require a session per site, e.g. one for Github and other one for Facebook APIs. Anyway making a session for every request is a very bad idea. A session contains a connection pool inside. Connection reusage and keep-alives (both are on by default) may speed up total performance.
Hmmm ... ok ... repeat please ...
Anyway, the problem: I must check many different sites, and I also want to use custom DNS servers. This means a session per site. I do not know the sites in advance, I just get a list of URLs, so we go for a session per URL. And everything runs in Docker.
In this post we feed the aiohttp AsyncResolver with the IP addresses of two Unbound caching DNS resolvers, forwarding to Cloudflare and Quad9, running on our local system. As always, my development system is Ubuntu 22.04.
The Python application
Below is our (incomplete) Python application. It uses aiohttp to check sites (URLs). The script runs in a Python Docker container; I am not going to bore you here with setting up a Docker Python image and container.
Note that, using the IP addresses of Cloudflare or Quad9, we create an AsyncResolver for every TCPConnector, and a TCPConnector for every session.
# check_urls.py
import asyncio
import aiodns
import aiohttp
import logging
import os
import socket
import sys

def get_logger(
    console_log_level=logging.DEBUG,
    file_log_level=logging.DEBUG,
    log_file=os.path.splitext(__file__)[0] + '.log',
):
    logger_format = '%(asctime)s %(levelname)s [%(filename)-30s%(funcName)30s():%(lineno)03s] %(message)s'
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)
    if console_log_level:
        # console
        console_handler = logging.StreamHandler(sys.stdout)
        console_handler.setLevel(console_log_level)
        console_handler.setFormatter(logging.Formatter(logger_format))
        logger.addHandler(console_handler)
    if file_log_level:
        # file
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(file_log_level)
        file_handler.setFormatter(logging.Formatter(logger_format))
        logger.addHandler(file_handler)
    return logger

logger = get_logger(file_log_level=None)

async def check_url(task_number, url, nameservers=None):
    logger.debug(f'[{task_number}] url = {url}, nameservers = {nameservers}')
    resolver = None
    if nameservers:
        resolver = aiohttp.resolver.AsyncResolver(nameservers=nameservers)
    connector = aiohttp.TCPConnector(
        limit=1,
        use_dns_cache=True,
        ttl_dns_cache=300,
        family=socket.AF_INET,
        resolver=resolver,
    )
    async with aiohttp.ClientSession(
        connector=connector,
        # if we want to reuse the connector with other sessions,
        # we must not close it: connector_owner=False
        connector_owner=True,
    ) as session:
        async with session.get(
            url,
        ) as client_response:
            logger.debug(f'[{task_number}] status = {client_response.status}')
            logger.debug(f'[{task_number}] url = {client_response.url}')
            logger.debug(f'[{task_number}] content_type = {client_response.headers.get("Content-Type", None)}')
            logger.debug(f'[{task_number}] charset = {client_response.charset}')

async def main():
    logger.debug('()')
    dns_cloudflare = ['1.1.1.1', '1.0.0.1']
    dns_quad9 = ['9.9.9.9', '149.112.112.112']
    sites = [
        ('http://www.example.com', dns_cloudflare),
        ('http://www.example.org', dns_quad9),
    ]
    tasks = []
    for task_number, site in enumerate(sites):
        url, nameservers = site
        task = asyncio.create_task(check_url(task_number, url, nameservers))
        tasks.append(task)
    for task in tasks:
        await task
    logger.debug('ready')

asyncio.run(main())
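To try the script yourself: besides aiohttp you need the aiodns package, which AsyncResolver uses under the hood. Inside the container, for example:

pip install aiohttp aiodns
python check_urls.py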
Problem: Directly connected to remote DNS servers
Although the above works, it has some problems. If we check a lot of URLs, we are firing a lot of (separate) requests at the DNS servers of Cloudflare or Quad9.
We can reuse the TCPConnector, e.g. by creating a pool of TCPConnectors, and use the DNS caching of the connectors, as sketched below. This is a big improvement, but it is still far from perfect because our connectors remain 'directly' connected to the outside world (via the resolver).
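To make connector reuse concrete, here is a minimal sketch (not part of the final application) of one TCPConnector, and thus one connection pool and one DNS cache, shared by several sessions. The key is connector_owner=False, so a closing session does not close the shared connector:

# shared_connector.py
import asyncio
import socket
import aiohttp

async def fetch(connector, url):
    # connector_owner=False: closing this session must not close the shared connector
    async with aiohttp.ClientSession(connector=connector, connector_owner=False) as session:
        async with session.get(url) as client_response:
            return client_response.status

async def main():
    resolver = aiohttp.resolver.AsyncResolver(nameservers=['1.1.1.1', '1.0.0.1'])
    # one connector = one connection pool and one DNS cache for all sessions
    connector = aiohttp.TCPConnector(
        limit=10,
        use_dns_cache=True,
        ttl_dns_cache=300,
        family=socket.AF_INET,
        resolver=resolver,
    )
    try:
        statuses = await asyncio.gather(
            fetch(connector, 'http://www.example.com'),
            fetch(connector, 'http://www.example.org'),
        )
        print(statuses)
    finally:
        # we own the connector here, so we close it ourselves
        await connector.close()

asyncio.run(main())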
Solution: Local caching DNS servers
We can do better by running one or more caching DNS servers on our local system, and feeding the AsyncResolvers with the IP addresses of our caching DNS servers.
Caching DNS server: Unbound
There are a lot of Docker DNS server images and I selected 'Unbound DNS Server Docker Image', see links below. Why? It is easy to use, and by default it forwards queries to a remote DNS server, Cloudflare. A nice feature is that we can use DNS over TLS (DoT). This means we shield the requests from (ISP) tracking.
Because we want more than one local DNS server, we first copy some configuration files out of the container, into a new directory 'my_conf'. To get at these files, we start the DNS server:
docker run --name=my-unbound mvance/unbound:1.17.0
And in another terminal, we copy some files from inside the container to our system:
mkdir my_conf
docker cp my-unbound:/opt/unbound/etc/unbound/forward-records.conf my_conf
docker cp my-unbound:/opt/unbound/etc/unbound/a-records.conf my_conf
Stop the DNS server by hitting 'CTRL-C'.
I created the following docker-compose.yml file:
version: '3'

services:
  unbound_cloudflare_service:
    image: "mvance/unbound:1.17.0"
    container_name: unbound_cloudflare_container
    networks:
      - dns
    volumes:
      - type: bind
        read_only: true
        source: ./my_conf/forward-records.conf
        target: /opt/unbound/etc/unbound/forward-records.conf
      - type: bind
        read_only: true
        source: ./my_conf/a-records.conf
        target: /opt/unbound/etc/unbound/a-records.conf
    restart: unless-stopped

networks:
  dns:
    external: true
    name: unbound_dns_network

volumes:
  mydata:
We are connecting from the Python application Docker container to the DNS server container using a Docker network. This means that there is no need to publish ports in the docker-compose.yml file, and publishing no ports means better security. To create the Docker network 'unbound_dns_network':
docker network create unbound_dns_network
To start the DNS server:
docker-compose up
Check if the DNS server is working
For this I use 'netshoot: a Docker + Kubernetes network trouble-shooting swiss-army container', see links below. When we start it, we also connect to the 'unbound_dns_network':
docker run --rm -it --net=unbound_dns_network nicolaka/netshoot
Then we use 'dig' to check if our DNS server is working.
Note that we are referring to the Docker Compose service name, 'unbound_cloudflare_service', here:
dig @unbound_cloudflare_service -p 53 google.com
Result:
; <<>> DiG 9.18.13 <<>> @unbound_cloudflare_service -p 53 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55895
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 300 IN A 142.250.179.174
;; Query time: 488 msec
;; SERVER: 172.17.10.3#53(unbound_cloudflare_service) (UDP)
;; WHEN: Wed Jul 12 16:32:56 UTC 2023
;; MSG SIZE rcvd: 55
The 'ANSWER SECTION' gives the IP address. The query time is 488 milliseconds. If we run the command again, we get the same result, but the query time will be (close to) 0 because the answer now comes from the Unbound cache. Note that the IP address of our local DNS server service is also shown:
172.17.10.3
Anyway, our local DNS server is working!
We can repeat these steps for Quad9. Create a new directory and copy the files from the Cloudflare setup.
Edit the docker-compose.yml file and replace 'cloudflare' with 'quad9'.
Edit the 'forward-records.conf' file, see the sketch after this list:
- Comment out the lines for Cloudflare
- Uncomment the lines for Quad9
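The exact contents of 'forward-records.conf' depend on the version of the image, but assuming the stock layout, the edited forward-zone would look roughly like this (the '@853#<hostname>' suffix selects DNS over TLS):

forward-zone:
    name: "."
    # use DNS over TLS for queries to the upstream servers
    forward-tls-upstream: yes
    # Cloudflare, now commented out:
    # forward-addr: 1.1.1.1@853#cloudflare-dns.com
    # forward-addr: 1.0.0.1@853#cloudflare-dns.com
    # Quad9, now uncommented:
    forward-addr: 9.9.9.9@853#dns.quad9.net
    forward-addr: 149.112.112.112@853#dns.quad9.net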
And bring it up!
Using our local DNS servers in our Python script
This is the last step. We must do two things:
- Add the 'unbound_dns_network' to our Python Docker container.
- Translate the names 'unbound_cloudflare_service' and 'unbound_quad9_service' to IP addresses.
Adding the 'unbound_dns_network' to our Python Docker container is easy. We do this the same way as we did in the Unbound docker-compose.yml file.
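For example, assuming the Python application is also started with Docker Compose, and assuming a placeholder image 'my-python-aiohttp' that has aiohttp and aiodns installed, its docker-compose.yml file could look like this:

version: '3'

services:
  check_urls_service:
    # placeholder image, must contain aiohttp and aiodns
    image: "my-python-aiohttp"
    container_name: check_urls_container
    working_dir: /app
    volumes:
      - ./check_urls.py:/app/check_urls.py
    command: python check_urls.py
    networks:
      - dns

networks:
  dns:
    external: true
    name: unbound_dns_network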
We already know the IP addresses of our local DNS server services, but they can change. Instead of hard-coding the IP addresses, we translate the service names to IP addresses in our Python script, by changing the following code (see above) from:
dns_cloudflare = ['1.1.1.1', '1.0.0.1']
dns_quad9 = ['9.9.9.9', '149.112.112.112']
to:
dns_cloudflare = [socket.gethostbyname('unbound_cloudflare_service')]
dns_quad9 = [socket.gethostbyname('unbound_quad9_service')]
Of course this only works if the local DNS server services are up-and-running.
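If you want a clear error message instead of a traceback when a service is down, you could wrap the lookup in a small (hypothetical) helper:

import socket
import sys

def resolve_service(name):
    # hypothetical helper: translate a Docker service name to a list
    # with its IP address, exit with a clear message on failure
    try:
        return [socket.gethostbyname(name)]
    except socket.gaierror as exc:
        sys.exit(f'cannot resolve {name!r}, is the Unbound container running? ({exc})')

dns_cloudflare = resolve_service('unbound_cloudflare_service')
dns_quad9 = resolve_service('unbound_quad9_service')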
Now all DNS requests from our Python application are routed to our local DNS servers!
Summary
We wanted to be friendly and not overload remote DNS servers with too many connections. We also wanted to remove the direct connection between our Python application and remote DNS servers. We achieved this by spinning up local DNS server services and connecting our Python script to them using a Docker network.
We created an extra dependency, the local DNS servers, but we also removed a dependency: if a remote DNS server is down (for some time), our Python application keeps working.
Links / credits
Docker Container Published Port Ignoring UFW Rules
https://www.baeldung.com/linux/docker-container-published-port-ignoring-ufw-rules
netshoot: a Docker + Kubernetes network trouble-shooting swiss-army container
https://github.com/nicolaka/netshoot
Unbound
https://nlnetlabs.nl/projects/unbound/about
Unbound DNS Server Docker Image
https://github.com/MatthewVance/unbound-docker