
SQLAlchemy PostgreSQL: Add a second BigInteger Primary Key

Using UUIDs is fine, but when you're dealing with millions of records, stick with BigIntegers

28 May 2022 Updated 29 May 2022
Image credit: https://www.pexels.com/nl-nl/@fotios-photos

Suppose we use UUID Primary Keys in our project. This works fine, but there are also a few tables with millions of records. SELECT and INSERT operations on tables with UUID-based Primary Keys can be much slower than on tables with Integer-based Primary Keys. What we want is for these huge tables to use only BigInteger Primary Keys, while still linking them to the other UUID-keyed tables via BigInteger Foreign Keys.

Test and TestResult

In our project we have the models Test and TestResult:

import uuid
import sqlalchemy as sa
import sqlalchemy.dialects.postgresql as pg
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Test(Base):
    __tablename__ = 'test'

    id = sa.Column(pg.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, unique=True)
    ...

class TestResult(Base):
    __tablename__ = 'test_result'

    id = sa.Column(pg.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, unique=True)

    # foreign key
    test_id = sa.Column(pg.UUID(as_uuid=True), sa.ForeignKey('test.id'), nullable=False, index=True)

Step 1: In TestResult, replace the UUID Primary Key with a BigInteger Primary Key

This is easy. In TestResult we replace the line:

    id = sa.Column(pg.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, unique=True)

with:

    id = sa.Column(sa.BigInteger, primary_key=True)

Step 2: In Test, add a second autoincrement BigInteger 'Primary Key'

For this we need to dig into the PostgreSQL and SQLAlchemy documentation. It turns out that PostgreSQL 10 and above support IDENTITY columns, and so does SQLAlchemy; see the links below. To add a second autoincrement 'Primary Key' to Test, we add an extra column:

    # extra 'primary key' for use with test_result
    id_bi = sa.Column(sa.BigInteger, sa.Identity(always=True, increment=1), unique=True, index=True)
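As a quick sanity check, no database connection is needed to see what this column produces: SQLAlchemy can compile the DDL for the PostgreSQL dialect. A minimal sketch, using a throwaway Table that contains only the new identity column:

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateTable

metadata = sa.MetaData()

# throwaway table containing only the new identity column
test = sa.Table(
    'test', metadata,
    sa.Column('id_bi', sa.BigInteger,
              sa.Identity(always=True, increment=1),
              unique=True, index=True),
)

# compile the CREATE TABLE statement for the PostgreSQL dialect
ddl = str(CreateTable(test).compile(dialect=postgresql.dialect()))
print(ddl)
```

The emitted DDL renders id_bi as a BIGINT GENERATED ALWAYS AS IDENTITY column, which is exactly the PostgreSQL 10+ identity feature mentioned above.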

Step 3: In TestResult, replace the UUID Foreign Key with a BigInteger Foreign Key

This is also easy: we no longer link back to test.id but to test.id_bi. Replace:

    # foreign key
    test_id = sa.Column(pg.UUID(as_uuid=True), sa.ForeignKey('test.id'), nullable=False, index=True)

with:

    # foreign key
    test_id_bi = sa.Column(sa.BigInteger, sa.ForeignKey('test.id_bi'), nullable=False, index=True)

Our models after these changes

After our changes, Test and TestResult look like this:

class Test(Base):
    __tablename__ = 'test'

    id = sa.Column(pg.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4, unique=True)

    # extra 'primary key' for use with test_result
    id_bi = sa.Column(sa.BigInteger, sa.Identity(always=True, increment=1), unique=True, index=True)


class TestResult(Base):
    __tablename__ = 'test_result'

    id = sa.Column(sa.BigInteger, primary_key=True)

    # foreign key
    test_id_bi = sa.Column(sa.BigInteger, sa.ForeignKey('test.id_bi'), nullable=False, index=True)
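To confirm the two tables are wired together correctly, we can compile a query that joins TestResult back to Test over the BigInteger key. This is a self-contained sketch (the models are repeated with an assumed declarative_base(), and the UUID in the filter is a placeholder); the statement is only compiled to SQL, not executed:

```python
import uuid
import sqlalchemy as sa
import sqlalchemy.dialects.postgresql as pg
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Test(Base):
    __tablename__ = 'test'

    id = sa.Column(pg.UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    # extra 'primary key' for use with test_result
    id_bi = sa.Column(sa.BigInteger, sa.Identity(always=True, increment=1), unique=True, index=True)

class TestResult(Base):
    __tablename__ = 'test_result'

    id = sa.Column(sa.BigInteger, primary_key=True)
    test_id_bi = sa.Column(sa.BigInteger, sa.ForeignKey('test.id_bi'), nullable=False, index=True)

# all results for one test: the UUID only appears in the WHERE clause,
# the join itself runs entirely over the BigInteger columns
stmt = (
    sa.select(TestResult)
    .join(Test, Test.id_bi == TestResult.test_id_bi)
    .where(Test.id == uuid.UUID('00000000-0000-0000-0000-000000000000'))
)
print(stmt)
```

The compiled SQL joins on test.id_bi = test_result.test_id_bi, so the huge test_result table never touches a UUID.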

Summary

When dealing with millions of records, it is best to avoid UUID Primary Keys. They can significantly slow down SELECT and, more importantly, INSERT operations. Here we removed all UUIDs from the table containing millions of records by adding a second BigInteger 'Primary Key' to Test and changing the Foreign Key from UUID to BigInteger. It doesn't get much faster than that.

Links / credits

PostgreSQL Identity Column
https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-identity-column

SQLAlchemy - PostgreSQL
https://docs.sqlalchemy.org/en/14/dialects/postgresql.html

