SQLAlchemy PostgreSQL: Add a second BigInteger Primary Key
Using UUIDs is fine, but when you're dealing with millions of records, stick with BigIntegers

Suppose we use UUID Primary Keys in our project. This works fine, but there are also a few tables with millions of records, and SELECT and INSERT operations on tables with a UUID-based Primary Key can be much slower than on tables with an Integer-based Primary Key. What we want is for these huge tables to have only BigInteger Primary Keys, while still linking them to the other, UUID Primary Key based tables, using BigInteger Foreign Keys.
Test and TestResult
In our project we have the models Test and TestResult:
import uuid

import sqlalchemy as sa
import sqlalchemy.dialects.postgresql as pg
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = 'test'
    id = sa.Column(pg.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, unique=True)
    ...

class TestResult(Base):
    __tablename__ = 'test_result'
    id = sa.Column(pg.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, unique=True)
    # foreign key
    test_id = sa.Column(pg.UUID(as_uuid=True), sa.ForeignKey('test.id'), nullable=False, index=True)
Step 1: In TestResult, replace the UUID Primary Key with a BigInteger Primary Key
This is easy. In TestResult we replace the line:
id = sa.Column(pg.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, unique=True)
by:
id = sa.Column(sa.BigInteger, primary_key=True)
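Note that we do not add a sequence or default ourselves: for a single-column Integer or BigInteger Primary Key, SQLAlchemy adds the autoincrement behavior for us. Below is a minimal sketch, using a throwaway 'demo' table, to print the DDL that SQLAlchemy (1.4) generates for PostgreSQL:

import sqlalchemy as sa
import sqlalchemy.dialects.postgresql as pg
from sqlalchemy.schema import CreateTable

# a throwaway table with only a BigInteger primary key
metadata = sa.MetaData()
demo = sa.Table('demo', metadata, sa.Column('id', sa.BigInteger, primary_key=True))

# print the CREATE TABLE statement SQLAlchemy would emit for PostgreSQL
print(CreateTable(demo).compile(dialect=pg.dialect()))

On PostgreSQL the id column should be rendered as BIGSERIAL NOT NULL, meaning the database assigns the values and we no longer need a Python-side default like uuid.uuid4.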
Step 2: In Test, add a second autoincrement BigInteger 'Primary Key'
For this we need to dig into the PostgreSQL and SQLAlchemy documentation. It turns out that PostgreSQL 10 and above support IDENTITY columns, and SQLAlchemy 1.4 and above support them as well, see the links below. To add a second autoincrement 'Primary Key' to Test we add an extra column:
# extra 'primary key' for use with test_result
id_bi = sa.Column(sa.BigInteger, sa.Identity(always=True, increment=1), unique=True, index=True)
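The same trick shows what sa.Identity renders to. A sketch, again with a throwaway 'demo' table holding only the new column:

import sqlalchemy as sa
import sqlalchemy.dialects.postgresql as pg
from sqlalchemy.schema import CreateTable

# a throwaway table with only the Identity column from above
metadata = sa.MetaData()
demo = sa.Table(
    'demo',
    metadata,
    sa.Column('id_bi', sa.BigInteger, sa.Identity(always=True, increment=1), unique=True),
)
print(CreateTable(demo).compile(dialect=pg.dialect()))

This should print something like id_bi BIGINT GENERATED ALWAYS AS IDENTITY (INCREMENT BY 1), plus a UNIQUE constraint for unique=True. Remember that sa.Identity requires SQLAlchemy 1.4 or newer, next to PostgreSQL 10 or newer.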
Step 3: In TestResult, replace the UUID Foreign Key with a BigInteger Foreign Key
This is also easy: we no longer link to test.id but to test.id_bi. Replace:
# foreign key
test_id = sa.Column(pg.UUID(as_uuid=True), sa.ForeignKey('test.id'), nullable=False, index=True)
by:
# foreign key
test_id_bi = sa.Column(sa.BigInteger, sa.ForeignKey('test.id_bi'), nullable=False, index=True)
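The UUID of a Test is still reachable from a TestResult, it just takes a join on the BigInteger key. A sketch, assuming the changed models shown in the next section and an already created session:

import sqlalchemy as sa

# get the UUID of the Test that a given TestResult belongs to,
# joining on the BigInteger key instead of a UUID foreign key
stmt = (
    sa.select(Test.id)
    .join(TestResult, TestResult.test_id_bi == Test.id_bi)
    .where(TestResult.id == 12345)   # 12345 is just an example TestResult id
)
test_uuid = session.execute(stmt).scalar_one()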
Our models after these changes
After our changes, Test and TestResult look like this:
class Test(Base):
    __tablename__ = 'test'
    id = sa.Column(pg.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, unique=True)
    # extra 'primary key' for use with test_result
    id_bi = sa.Column(sa.BigInteger, sa.Identity(always=True, increment=1), unique=True, index=True)
    ...

class TestResult(Base):
    __tablename__ = 'test_result'
    id = sa.Column(sa.BigInteger, primary_key=True)
    # foreign key
    test_id_bi = sa.Column(sa.BigInteger, sa.ForeignKey('test.id_bi'), nullable=False, index=True)
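Finally, a sketch of how inserting works with these models. Here engine is assumed to point at our PostgreSQL database, and Test is assumed to have no other required columns:

from sqlalchemy.orm import Session

with Session(engine) as session:
    test = Test()
    session.add(test)
    session.flush()        # INSERT test; the database generates id_bi
    session.refresh(test)  # make sure the server-generated id_bi is loaded

    test_result = TestResult(test_id_bi=test.id_bi)
    session.add(test_result)
    session.commit()

Because id_bi is generated on the server, we flush and refresh before using it. Alternatively, setting the mapper option eager_defaults=True makes SQLAlchemy fetch it in the RETURNING clause of the INSERT.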
Summary
When dealing with millions of records, it is best to avoid UUIDs. They can significantly slow down SELECT and, more importantly, INSERT operations. Here we have removed all UUIDs from the table containing millions of records, by adding a second BigInteger 'Primary Key' to Test and changing the Foreign Key from UUID to BigInteger. We can't get any faster than that.
Links / credits
PostgreSQL Identity Column
https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-identity-column
SQLAlchemy - PostgreSQL
https://docs.sqlalchemy.org/en/14/dialects/postgresql.html