From monolithic code to services with RabbitMQ and Pika
Use the examples in the Pika documentation to create your own RabbitMQ publisher and consumer classes.
This post is about using RabbitMQ in your Python application. If you are already using RabbitMQ, you will probably find nothing new in this post. Why a post about RabbitMQ? Because I have an application using it. It has been running for a year, and I want to share my experiences.
In this post, we transform a monolithic application into services decoupled by queues. I will talk about services instead of microservices; you decide how 'big' your service is.
Monolithic application example: Contact form
The example we will use is a contact form process. A user of our website enters data into the contact form. The data is processed, an email is prepared, and the email is sent to an external SMTP server.
         +-----------------+   smtp
user     | - process form  |   server
-------->| - prepare email |-------->
         | - send email    |
         +-----------------+
Why would you want to split your code in services?
Some pros of splitting your code:
- Stricter separation between the main services (functions) of your code.
- Services can be stopped and restarted without loss of data.
- Performance: by decoupling, a service becomes available sooner.
- Asynchronous operation: a service can wait for another service to become available, without impacting other services.

Some cons:
- Increased complexity.
- Increased development time.
- Increased difficulty to test.
Summarizing: the pros give you a faster and more flexible system; the cons are (much) more work.
There are many articles on the internet mentioning the 'microservices hell', but there are as many articles mentioning the 'monolithic hell'. I suggest you do some research yourself. Here is a nice article: 'You Don’t Need Microservices', see links below. Also read the comments.
The services after splitting
Here are the services, after we split our code.
         +---------+     +---------+     +---------+   smtp
user     | process |     | prepare |     | send    |   server
-------->| form    |---->| email   |---->| email   |-------->
         |         |     |         |     |         |
         +---------+     +---------+     +---------+
Not very special, this is just a well-structured monolith.
RabbitMQ and Pika
To decouple the services, we use queues. RabbitMQ is a message broker, and supports message queues, multiple messaging protocols, and more. To communicate with RabbitMQ from Python, we use the Pika package. Pika is a pure-Python implementation of the AMQP 0-9-1 protocol.
Queues in RabbitMQ are FIFO, first-in-first-out. With RabbitMQ, you can define exchanges and queues. Typically, you define an exchange and then bind the queues to this exchange.
A publisher is used to put items in the queue. A consumer is used to get items from the queue. You can find examples of publishers and consumers in the Pika documentation.
         +-----------+     +-------------+     +-----------+
         |           |     | RabbitMQ    |     |           |
-------->| publisher |---->| - exchange  |---->| consumer  |-------->
         |           |     | - queue     |     |           |
         +-----------+     +-------------+     +-----------+
Publishers and consumers can be synchronous or asynchronous.
In many cases we will have a:
- Synchronous publisher. This means we send an item to the RabbitMQ queue and wait for the confirmation that the item has been placed in the queue, before continuing with other things.
- Asynchronous consumer. This means our receive function is called by RabbitMQ as soon as there is an item available in the queue.
RabbitMQ and acknowledgments
RabbitMQ has many options; the most important ones to understand are acknowledgments. There are two types of acknowledgments:
- Positive acknowledgment
- Negative acknowledgment
(Synchronous) Publisher acknowledgments
When the publisher (our code) sends an item to RabbitMQ, RabbitMQ responds by sending a:
- Positive acknowledgment
The item has been received by RabbitMQ
- Negative acknowledgment
Something went wrong. Maybe the exchange or queue did not exist, or the routing key was wrong.
(Asynchronous) Consumer acknowledgments
Our consumer is called by RabbitMQ with the next item from the queue. After processing the item, the consumer (our code) responds by sending a:
- Positive acknowledgment
The item has been received and processed, I am ready for the next item
- Negative acknowledgment
The item has been received, but something went wrong during processing. RabbitMQ will NOT remove the item from the queue, and will try again.
Decoupling the services with queues
To decouple our services, we put a queue between the 'process form' service and the 'prepare email' service:
         publisher                     consumer
         +---------+     +-------+     +---------+
user     | process |     |       |     | prepare |
-------->| form    |---->| queue |---->| email   |---->
         |         |     |       |     |         |
         +---------+     +-------+     +---------+
Service 'process form' acts as the publisher. It contains code to send items to the queue. Service 'prepare email' acts as the consumer. It contains code to receive items from the queue.
The output queue is 'part' of the publisher
Now assume the two services are running on two servers. Because we want our services to be stopped and restarted without loss of data, it is important to understand where to put the queue between the services: on the publisher server, or on the consumer server.
The answer is of course that the queue must be on the same server as the 'process form' service. Then, when the server of the 'prepare email' service is down, the 'process form' service can keep running. Users can keep sending contact forms, the data is temporarily stored in the queue. Once the 'prepare email' service is up again, it processes all data waiting in the queue.
             server 1                        server 2
- - - - - - - - - - - - - - - - - -   - - - - - - - - - - -
         publisher                | |        consumer
         +---------+     +-------+ | |       +---------+
user     | process |     |       | | |       | prepare |
-------->| form    |---->| queue |----------->| email   |---->
         |         |     |       | | |       |         |
         +---------+     +-------+ | |       +---------+
- - - - - - - - - - - - - - - - - -   - - - - - - - - - - -
In the above example, we used servers. But the same applies when you have everything on a single server, but have two different Docker-Compose projects, one for the 'process form' service and one for the 'prepare email' service. In this case we add RabbitMQ to the 'process form' Docker-Compose project.
A consumer and publisher at the same time
It makes perfect sense for a service to be a consumer and a publisher at the same time. In our example, both the 'prepare email' service and the 'send email' service act as consumer and publisher.
                               consumer                consumer
                               +                       +
         publisher             publisher               publisher
         +---------+     +-------+     +---------+     +-------+     +---------+
user     | process |     |       |     | prepare |     |       |     | send    |
-------->| form    |---->| queue |---->| email   |---->| queue |---->| email   |---->
         |         |     |       |     |         |     |       |     |         |
         +---------+     +-------+     +---------+     +-------+     +---------+
Normal operation for the 'prepare email' service is:
- Consumer is called with next item from the input-queue.
- Item is processed.
- New data is published to the output-queue.
- A publishing positive acknowledgment is received.
- The consumer sends a positive acknowledgment to the input-queue.
- RabbitMQ removes the item from the input-queue.
- RabbitMQ calls the consumer with the next item from the input-queue, once available.
If anything goes wrong in the consumer process, for example failure to publish to the output-queue, the consumer can send a negative acknowledgment to RabbitMQ. In this case, RabbitMQ will not remove the item from the input-queue, but will try the same item again.
Getting started with RabbitMQ
Getting started with RabbitMQ can be overwhelming. Application developers want recipes, not an endless amount of documents to read. I started by installing Docker RabbitMQ with the Management Plugin. The Management Plugin allows you to create exchanges and queues, see the number of items in the queues, and much more.
Then I tried the examples from the Pika documentation. Once you understand these, you can use this code to build your own publisher and consumer classes, to be used in your application.
Asynchronous publishing is complex; I have not used it so far. Synchronous publishing with RabbitMQ is not very fast, especially when you also want the data to persist. And you probably want the data in the queue to persist, because then the data does not get lost when you stop RabbitMQ or when the system crashes.
You can speed up synchronous publishing by sending a list of items to the queue instead of a single item. Whether this is possible depends on your application.
I do not use a single fault-tolerant RabbitMQ instance, but instead spin up several RabbitMQ instances together with the services. I use RabbitMQ in multiple Docker-Compose projects; it is just one of the services.
RabbitMQ is memory hungry. My application is not very complex or demanding, but one RabbitMQ instance uses about 200MB according to Docker Stats.
Some people on the internet complain about high CPU usage. I noticed some CPU usage spikes myself. RabbitMQ stores a lot of data, and this data did not appear to go away after a crash or kill. After removing this stale data, the CPU spikes went away.
I also use the Shovel Plugin to send data from one RabbitMQ instance to another, using Docker Network. This also works fine.
You can use your application to configure the RabbitMQ exchanges and queues. I do not do this; instead, I configure the exchanges and queues using the Management Plugin, export the configuration, and copy it to the RabbitMQ configuration file.
I have been using Dockerized RabbitMQ for over a year now and it runs very reliably. I never had to restart it because something was wrong.
Is splitting your monolith into services with RabbitMQ worth it? Yes, but unless you are a big company, do not exaggerate, because it also adds a lot of complexity. As always, the pain lessens once you understand more of RabbitMQ. Finally, RabbitMQ should be in the toolbox of every developer.
Links / credits
You Don’t Need Microservices