Using UUIDs instead of Integer Autoincrement Primary Keys with SQLAlchemy and MariaDb
UUID Primary Keys are 'universe unique' and have many advantages. Consider using them in your next project.
We all know the Integer Autoincrement Primary Key. Works great, much software is doing it this way. It is fast. But it also has limitations.
An alternative for the Integer Autoincrement Primary Key is the UUID (Universally Unique IDentifier) Primary Key. A UUID is a 128-bit number. It can be represented by a string, for example:
or a hex string (the same as above with the dash character removed):
or a binary value.
UUIDs are unique, there is no numbering involved. If you generate a UUID on your laptop in London and your colleague in New York generates a UUID on his laptop, then both UUIDs are unique, they can never be the same. There are several versions of UUIDs, here we are going to use version 4, also known as 'random' UUID. The probability of a collision is almost none.
Are UUID Primary Keys a good choice?
I believe there are enough reasons even for small projects. Everyone knows the problem when merging database records or tables. If your database tables use Integer Autoincrement Primary Keys this can be messy but with UUIDs it is a breeze.
Example#1: Blog with multiple offline authors
Assume you have this CMS / blog website and you have two persons, blog post writers, each working on a local copy, running on their laptops. Then how do you transfer the new records to the production system?
If your database uses Integer Autoincrement Primary Keys then you must be extremely careful. Probably you do this by exporting some records and then import them in your production database. You cannot import the IDs of the records, both writers can have different posts with identical IDs. It becomes more difficult when the blog posts refer to (new) records of a table holding the links/references for a Blog post. Probably you will write a script that deals with this.
If we would be using UUID Primary Keys on the other hand, we just export the new records from both laptops and import them in the production database. No conflicts, easy does it.
Example#2: Webshop with offline prepared product information
Assume you have a webshop and you have suppliers who deliver their prepared product information records with UUID Primary Keys. If your webshop already uses UUID Primary Keys this is very easy to import, no conversions.
Pros and cons of using a UUID as a primary key
- You can generate UUIDs everywhere
This means that you can create records without connecting to a database
- The UUIDs are unique across tables, databases, systems
This makes it easy to merge records from different tables, databases
- UUIDs make replication more easy
- Uses more memory and disk space, index tables get bigger
But who cares about this today?
- Slower for SELECT with big tables
But optimizations are possible
- Can be much slower for INSERT
It takes more time to recalculate the index, can be a serious problem
- More difficult to debug because of missing insert order
But you can add a DATETIME (microseconds) 'created_on' column and use this to sort.
UUIDs prevent exposure of business information (secrets)
An additional important advantage of using a UUID primary key instead of an Integer Autoincrement Primary Key is that you can worry less about exposing business information. Example: contacts are stored in a contact table. If this table has a Integer Autoincrement Primary Key an edit view could expose the user_id in the URL:
This means that you can guess the number of users of a website. If you use a UUID primary key then the user_id in the URL does not expose anything:
Of course it is still better never to expose any type of Primary Key. This may look trivial but can cause a lot of extra work.
Example: consider a contact record contains a Foreign Key for a city. The contact Foreign Key references the city Primary Key. If we do not want to expose the Primary Keys we can add a Surrogate Key column to every table and generate a Surrogate Key for every record we create. When editing the contact we must show a select / dropdown list with the available cities. We do not want the Primary Keys of the Cities exposed and select the Surrogate Keys instead. Then when the Contact form is submitted we must lookup the Primary Key of the City first using the submitted Surrogate Key value. Once we retrieved the City Primary Key we can update the Contact.
Python and UUIDs
Generating UUIDs in Python is easy:
>>> import uuid >>> uuid_obj = uuid.uuid4() >>> uuid_obj UUID('05390f2b-48e8-45c7-8ae4-08272dc69122') >>> str(uuid_obj) '05390f2b-48e8-45c7-8ae4-08272dc69122' >>> uuid_obj.hex '05390f2b48e845c78ae408272dc69122'
We can also convert the hex UUID back to a string UUID:
>>> uuid_obj2 = uuid.UUID(hex='05390f2b48e845c78ae408272dc69122', version=4) >>> str(uuid_obj2) '05390f2b-48e8-45c7-8ae4-08272dc69122'
Order of inserts
When using UUID primary keys we loose the order of inserts. In my model every table has a MariaDb DATETIME 'created_on' column meaning that the minimum resolution is 1 second. When the insert order is important we can increase the resolution of the 'created_on' column to microseconds by using DATETIME(6). If you really need the insert order you can add an extra Autoincrement column.
Using UUIDs for primary keys is not really speeding up queries. SELECT queries perform slower because of the string compare instead of the integer compare. INSERT queries can be really slow because for every insert the index must be rebuild with random numbers. There are optimizations possible but they also have drawbacks, like that the UUIDs are no longer 'random'.
SQLAlchemy, MariaDb, UUIDs and Alembic
I use SQLAlchemy and MariaDb so let's use a 32-character hex value UUID. PostgreSQL already supports UUIDs out-of-the-box. There is recipe on the SQLAlchemy site: 'Backend-agnostic GUID Type', see links below. I included this recipe (script), with a small modification, in my models.py file. Then I write my UUIDCity model as:
class UUIDCity(Base): __tablename__ = 'uuid_city' id = Column(GUID(), primary_key=True, default=uuid.uuid4) ...
This takes care of the generation of a UUID for a new record. I changed the script a little to make sure it returns a string value and not a GUID object. This makes it far more easy to use in queries. (But I may loose specific SQLAlchemy advantages?)
... def process_result_value(self, value, dialect): if value is None: return value else: # 20200328/PPM commented the next two lines to return the db id as string # if not isinstance(value, uuid.UUID): # value = uuid.UUID(value) return value
There is an Alembic problem here. When you run Alembic for the UUIDCity model:
alembic revision --autogenerate -m "some version text"
Alembic generates a versions file with:
def upgrade(): # ### commands auto generated by Alembic - please adjust! ### op.create_table('uuid_demo_city', sa.Column('id', shared.models.GUID(), nullable=False), sa.Column('name', sa.String(length=250), server_default='', nullable=True), sa.PrimaryKeyConstraint('id') ) op.create_index(op.f('ix_uuid_demo_city_name'), 'uuid_demo_city', ['name'], unique=False)
I have a folder called 'shared' and in this folder is the models.py file. This will make the upgrade command fail. The result of the command:
alembic upgrade head
File "...", line 17, in upgrade sa.Column('id', shared.models.GUID(), nullable=False), NameError: name 'shared' is not defined
This is bit confusing but it has to do with the fact that there is no data type shared.models.GUID(). You can modify env.py and script.py.mako. There is some information in the links below. I did not do this yet. For the moment I edited the versions file and changed:
sa.Column('id', shared.models.GUID(), nullable=False),
sa.Column('id', sa.CHAR(32), nullable=False),
Now the upgrade command runs without error. We also have to follow this procedure for Foreign Keys. Of course it is better to modify the scripts mentioned above but this editing is just once when creating the model.
How did I come to write this post? I think I was looking for ways to generate unique random codes. In Python using UUIDs is easy. I may have typed random unique code + MySQL. Then you hit also the UUID Primary Keys. I read about them in the past, but this time I took time to research this subject.
If you are running a database with Integer Autoincrement Primary Keys is it worth converting to using UUID Primary Keys? I think not. It can take a lot of time and you can get into a lot of trouble.
But for new databases, even very small ones, I recommend using UUID Primary Keys. You never know if you will ever come in a situation where you must merge tables, import records with UUID keys, have persons work offline with your database.
Of course there are exceptions. Assume you sample webcam images and store their references in a database record. In this case you want maximum INSERT performance so do not use UUID Primary Keys here!
A high performance website with a big database typically does not have to suffer from slower UUID Primary Keys. SELECT is only little bit slower and if designed properly, the majority of (SELECT) requests are served from cache.
For this CMS/ blog website I decided to (very) slowly convert existing Integer Autoincrement Primary Keys to UUID Primary Keys.
Links / credits
Alembic: How to migrate custom type in a model?
Backend-agnostic GUID Type
Best UUID data type?
Configuring MySQL to allow generation of Primary Key values
How can I bind a list to a parameter in a custom query in sqlalchemy?
How can I use UUIDs in SQLAlchemy?
Microseconds in MariaDB
UUIDs are Popular, but Bad for Performance — Let’s Discuss
- Using UUIDs instead of Integer Autoincrement Primary Keys with SQLAlchemy and MariaDb
- Using Python's pyOpenSSL to verify SSL certificates downloaded from a host
- Connect to a service on a Docker host from a Docker container
- Using PyInstaller and Cython to create a Python executable
- SQLAlchemy: Using Cascade Deletes to delete related objects
- Flask RESTful API request parameter validation with Marshmallow schemas