angle-uparrow-clockwisearrow-counterclockwisearrow-down-uparrow-leftatcalendarcard-listchatcheckenvelopefolderhouseinfo-circlepencilpeoplepersonperson-fillperson-plusphoneplusquestion-circlesearchtagtrashx

Converting a SQLAlchemy MariaDb / MySQL database with Integer Primary Keys to UUID Primary Keys for a Flask website

Create scripts to automate the conversion. They will save you time and prevent unnecessary errors.

26 April 2020 Updated 30 June 2020
post main image
https://unsplash.com/@enginakyurt

This website uses Flask, (plain) SQLAlchemy with Alembic for migrations and MariaDb. If you do not use these, this post probably is not what you are looking for.

In a previous post I wrote why many websites should use UUID Primary Keys instead of Integer Primary Keys. Even if you have a high performance website then probably only few tables are responsible for this performance. High performance websites use caching meaning that (most of the time) SELECT performance difference will not result in page response time differences.

The main issue when using UUID Primary Keys is INSERT performance with tables with many records. Recalculating the keys after an INSERT is far more time consuming than when using Integer Primary Keys. For this website I record every request in a database table page_request, also a blog post view is recorded in a table content_item_view. Both tables must have maximum INSERT performance. The other tables do not have that many records. These two tables, and some demo tables, are the only tables that I did not convert. The total number of tables in the database is 80. All have the id column that must be converted, maybe half (?) is not using Foreign Keys.

Should you convert the database of your website? Probably not if it is running fine and you are happy. I wanted to convert to UUIDs so continue reading this post if you also want this. Does the described method work? Yes. This post was written on the new (converted) database.

Automating conversion of the existing database

There are two important reasons to automate the conversion:

  • Minimum production website downtime
  • Avoiding mistakes

The process to move the converted database to production looks something like:

  • Take website down
  • Make a copy of the existing database
  • Copy database to another (local) system
  • Convert database
  • Copy converted database to production system
  • Enable new software version
  • Bring website up

The time to convert the database should be minimized, in fact, it must be automated because we also want to avoid mistakes. A script that runs the conversion can do this for you. You can prepare and test this script as long as you want. Once you know it is working fine, you are ready for the production database conversion.

I also developed two other scrips, see also below:

  • compare tables and columns
  • copy data from one database to another

Convert the database in four steps

Before anything else, make backups of your database and check that this backup database is correct by restoring and testing! You do not want to lose any data. The four steps are:

  1. Convert the existing database to a new database DB_DATA with UUIDs
    This database has the data. We do not care about constraints, in fact we remove them all. This means that in the new database Foreign Keys are just values. The main things we do here is: convert Integer Primary Keys to UUID Primary Keys, and convert Integer Foreign Keys to UUID Foreign Keys
  2. Generate a database DB_FINAL with Alembic with UUIDs
    This database has the UUID Primary Keys, UUID Foreign Keys and relationships. But it has no data ... yet. I created a new tree or version of the software. Here I make all changes to handle UUID Primary Keys instead of Integer Primary Keys. In this new version I made changes to the models.py file: Integer Primary Keys to UUID Primary Keys, Integer Foreign Keys to UUID Foreign Keys
  3. Compare the database tables and columns of database DB_DATA with database tables and columns of database DB_FINAL
    We do not compare constraints. They MUST be identical otherwise we cannot copy the data in the next step!
  4. Copy the database table data of DB_DATA to the database tables of DB_FINAL
    I do this using SELECT INTO OUTFILE and LOAD DATA INFILE.

Some words about the code

The most time-consuming piece I had to develop was the database class. This contains mainly MariaDb / MySQL administrative methods. I am not going to show all my code but to give you an idea here is the database class with some methods:

import pymysql.cursors
import uuid


class Database:

    def __init__(self, user=None, passwd=None, host=None, db=None, charset=None, autocommit=None, connect_timeout=None):
        ...

    def set_foreign_key_checks(self, off_on):
        
    def table_get_constraints(self, table_name):

    def table_has_integer_primary_key_id(self, table_name):

    def table_create_column(self, table_name, column_definition):

    def table_column_add_key(self, table_name, column_name):

    def table_drop_column(self, table_name, column_name):

    def table_drop_constraint(self, table_name, constraint_name):

    def load_column_with_uuids(self, table_name, column_name, force=False):

    def get_all_foreign_keys_to_table(self, table_name, column_name=None):

    def select_into_outfile(self, outfile, table_name, column_names):

    def load_from_infile(self, infile, table_name, column_names):

    def copy_uuid_from_referenced_table_to_foreign_key_table(self, 
                referenced_table_name, referenced_table_column_name, 
                foreign_key_table_name, foreign_key_table_column_name, 
                from_referenced_table_column_name, to_foreign_key_table_column_name):

1. Convert the existing database to a new database DB_DATA with UUIDs

We must create a new database with UUIDs in the proper places. This means we have to convert Integer Primary Keys and Integer Foreign Keys to UUIDs. After having created UUID Foreign Keys we can drop the constraints.
This makes it easy to rename columns.

Creating a UUID Primary Key column is not a big problem. I create a CHAR(32) column 'id_upk_copy' and load it with UUIDs. To handle Foreign Keys we can use the INFORMATION_SCHEMA table to get all Foreign Keys to a table. Below is the example for the Foreign Keys referring to the id column of the user table. The query is:

SELECT
  TABLE_NAME,COLUMN_NAME,CONSTRAINT_NAME, REFERENCED_TABLE_NAME,REFERENCED_COLUMN_NAME
  FROM
  INFORMATION_SCHEMA.KEY_COLUMN_USAGE 
  WHERE 
    REFERENCED_TABLE_SCHEMA = 'peterspython_dev_existing' 
    AND
    REFERENCED_TABLE_NAME = 'user';

And the result is:

+---------------------+-------------+----------------------------+-----------------------+------------------------+
| TABLE_NAME          | COLUMN_NAME | CONSTRAINT_NAME            | REFERENCED_TABLE_NAME | REFERENCED_COLUMN_NAME |
+---------------------+-------------+----------------------------+-----------------------+------------------------+
| ...                 | ...         | ...                        | user                  | id                     |
| user_mtm_user_group | user_id     | user_mtm_user_group_ibfk_1 | user                  | id                     |
| user_mtm_user_role  | user_id     | user_mtm_user_role_ibfk_1  | user                  | id                     |
| user_login          | user_id     | user_login_ibfk_1          | user                  | id                     |
| comment             | user_id     | comment_ibfk_3             | user                  | id                     |
| contact_form        | user_id     | contact_form_ibfk_2        | user                  | id                     |
| ...                 | ...         | ...                        | user                  | id                     |
+---------------------+-------------+----------------------------+-----------------------+------------------------+

The result includes everything we need to convert the Foreign Keys. Tables 'user_mtm_user_group' and 'user_mtm_user_role' are the many-to-many relationship tables . What I do is, for every constraint:

  • create a new column user_id_ufk_copy
  • copy the UUID from the REFERENCED_TABLE id_upk_copy column to TABLE_NAME user_id_ufk_copy

The latter is done using a method in my Database class:

    def copy_uuid_from_referenced_table_to_foreign_key_table(self, 
                referenced_table_name, referenced_table_column_name, 
                foreign_key_table_name, foreign_key_table_column_name, 
                from_referenced_table_column_name, to_foreign_key_table_column_name):

        # we alias the referenced_table because the key can be in the same table
        query = '''
            UPDATE {} 
              INNER JOIN {} abcde
                ON abcde.{} = {}.{}
              SET {}.{} = abcde.{}'''\
                .format(
                    foreign_key_table_name, 
                    referenced_table_name, 
                    referenced_table_column_name, 
                    foreign_key_table_name, foreign_key_table_column_name,
                    foreign_key_table_name, to_foreign_key_table_column_name,
                    from_referenced_table_column_name)

        cursor = self.conn.cursor()
        self.execute_query(cursor, query)

        return True

The script for the database conversion looks like this:

# convert integer primary keys and integer foreign keys to uuids
def to_uuid():

    db = Database(
        user=DB_USER, 
        passwd=DB_PASSWORD, 
        host=DB_HOST,
        db=DB_DATABASE_NAME)

    # connect to database

    # get all tables

    # get all tables with integer primary key id

    # create and load new columns: id_ipk_copy, INT(11), and id_upk_copy, CHAR(32)
    for table_name in table_names_ipkid:

        # create new column id_ipk_copy and copy id column into it

        # create new column id_upk_copy and fill with uuids

    # create new foreign key uuid columns in tables
    for table_name_ipkid in table_names_ipkid:
        for row in db.get_all_foreign_keys_to_table(table_name_ipkid, column_name='id'):

            foreign_key_table_name = row['FOREIGN_KEY_TABLE_NAME']
            foreign_key_table_column_name = row['FOREIGN_KEY_TABLE_COLUMN_NAME']

            if not db.table_column_is_integer(foreign_key_table_name, foreign_key_table_column_name):
                # already uuid?
                continue

            referenced_table_name = row['REFERENCED_TABLE_NAME']
            referenced_table_column_name = row['REFERENCED_TABLE_COLUMN_NAME']

            foreign_key_table_new_column_name = foreign_key_table_column_name + '_ufk_copy'
            foreign_key_table_new_column_definition = {
                'column_name': foreign_key_table_new_column_name,
                'definition': foreign_key_table_new_column_name + ' CHAR(32)',
            }
            db.table_create_column(foreign_key_table_name, foreign_key_table_new_column_definition)

            # copy uuids (id_upk_copy) from referenced_table to new column of foreign_key_table
            db.copy_uuid_from_referenced_table_to_foreign_key_table(
                referenced_table_name, referenced_table_column_name, 
                foreign_key_table_name, foreign_key_table_column_name, 
                'id_upk_copy', foreign_key_table_new_column_name)

    # at this point we have:
    # 
    # - tables with an integer primary key column id:
    #   - have a new column id_ipk_copy with a copy of id
    #   - have a new column id_upk_copy with uuids values 
    # - all tables with foreign keys:
    #   - have an extra foreign key column foreign_key_table_column_name + '_uuid'
    # 

    # now we must rearrange

    # first we drop all constraints
    ...

    # 1. Rearrange primary key
    # - drop integer primary key column id
    # - rename column id_upk_copy to id
    ...

    # 2. Rearrange foreign keys
    # for all tables, using column names ending with '_ufk_copy':
    # - drop integer foreign key column 
    # - rename uuid foreign key column to dropped integer foreign key column name
    ...

    # 3. Remove temporary columns
    ...

    # at this point we have:
    # a database that has all the data in the correct columns

2. Generate a database DB_FINAL with Alembic with UUID Primary Keys and Foreign Keys

This database has the UUID Primary Keys, UUID Foreign Keys and relationships. But it has no data. I created a new tree or version of the software. Here I make all changes to handle UUID Primary Keys instead of Integer Primary Keys. In this new version I made the following changes to the models.py file:

  • Replaced Integer Primary Keys by UUID Primary Keys
  • Replaced Integer Foreign Keys by UUID Foreign Keys

First step is create an empty database. In mysql I did:

drop database peterspython_dev;
create database peterspython_dev;

Next we empty the alembic/versions directory. Then we run Alembic to create a fresh version:

alembic revision -m "v2.20 first" 
alembic upgrade head

And do this a second time to generate all tables and relationships:

alembic revision --autogenerate -m "v2.20 2e time"
alembic upgrade head

I did not look into importing GUID() in the Alembic process yet. Before running 'alembic upgrade head' the second time I edit the versions file and replaced the string:

shared.models.GUID()

by:

sa.CHAR(32)

3. Compare the database tables and columns of database DB_DATA with database tables and columns of database DB_FINAL

I also created a script for this. It is not very difficult but must be done, our databases must be identical! Not only do I compare table names and column names but also I check if the DATA_TYPE of the columns match. The compare script gave me (error) messages like:

db1 column content_item_translation.intro data_type text not same as db2 data_type longtext
db1 column content_item_translation.summary data_type text not same as db2 data_type longtext
db1 column mail_smtp_message.body_text data_type text not same as db2 data_type mediumtext
db1 column contact_form.message data_type text not same as db2 data_type mediumtext

Probably I changed them by hand because I was just starting with SQLAlchemy. I was using the SQLAlchemy Text data type and needed bigger space. Anyway, time to do this right.

Unfortunately the only way is to use MySQL dialect data types MEDIUMTEXT and LONGTEXT which takes me another step away from supporting both MariaDb / MySQL and PostgreSQL. But at least now both databases are identical!

4. Copy the database table data of DB_DATA to the database tables of DB_FINAL

Of course I ran into a problem here. NULL values of the DB_DATA database were not NULL values in the DB_FINAL database. The reason was I had defined my own field separator, line separator. Reading about this on the internet the solution appeared simple: use the MariaDb / MySQL defaults. This means exporting by:

SELECT <column_names_list> FROM <table_name> INTO OUTFILE <outfile_infile>

And then importing by:

LOAD DATA LOCAL INFILE <outfile_infile> INTO TABLE <table_name> <column_names_list>

Now NULL values are transferred correct. Another problem you may encounter is the OUTFILE. If this file exists, you will get an error so make sure to remove these files before a new run.

Things that went wrong

Database did not respond anymore

Maybe all this adding and removing columns is something that messes something up. But probably there was something wrong in my code. In the first version I forgot closing the connection. When I went into mysql and wanted to drop the database I got this message:

MySQL: Error dropping database (errno 13; errno 17; errno 39)

This means there is one or more files that are locked or whatever by the database server. The solution is go into this database directory on your filesystem and remove these files manually, for example:

sudo rm /var/lib/mysql/your_database_name/category_translation.ibd

Be careful here! After I closed the connection in the script I did not see this message anymore but I thought I' d just share this. 

UUID object has no attribute 'translate'

While testing the contact form I got the message:

AttributeError: 'UUID' object has no attribute 'translate' when using backend-agnostic GUID type

Why? This happened when I first committed the object mail_message and then use this mail_message.id as a Foreign Key when adding records to table mail_message_to. After commit, type(item.id) = <class 'uuid.UUID'>, item.id = dcd0fd6b-cd0f-44b0-92c6-a5f08c0d784f and it could not be written into the ForeignKey field.

The problem: I had converted the Integer Primary Keys to UUID Primary Keys but did not do this for the Foreign Keys ... uh oh ... stupid me ... The solution was to replace CHAR(32) in the models.py for GUID():

    ... = Column(CHAR(32), ForeignKey(...

Must be changed to:

    ... = Column(GUID(), ForeignKey(...

It was not necessary to run Alembic after this change.

Added missing Foreign Keys

At (at least, so far) three places I just added an Integer to a table where I should have added a Foreign Key. Two of them were in ContentItem and ContentItemTranslation classess. I used them to mark the parent item. After I edit a ContentItem or ContentItemTranslation object, I click the Save button. Then a copy of the current record is made, with a parent_id of the original record. To fix this I went back to original database, added Foreign Keys, and ran the process again.

Raw SQL

Although I am using mosty SLQAlchemy ORM queries I am also using SQLAlchemy Raw SQL. It appeared that some of these queries failed, because after the conversion the ids were un-quoted.

Flask routes

Of course there is a problem with many routes. In Flask you can specify a data type for the view function parameters, and most of the time they were Integer. Now they must be strings. Example:

@pages_blueprint.route('/category', defaults={'category_id': 0})
@pages_blueprint.route('/category/<int:category_id>')
def category(category_id):
    ...

Must be changed to:

@pages_blueprint.route('/category', defaults={'category_id': ''})
@pages_blueprint.route('/category/<category_id>')
def category(category_id):
    ...

Integer checking

At some places I explicitly check if a value is an Integer. That's just me in paranoid mode. For example, this website uses Flask-Login and in create_app() we have the user_loader:

    @login_manager.user_loader
    def load_user(user_id):
        ...
        return g.db_session.query(User).get(user_id)

I had to remove checking that the user_id really was an Integer.

Sorting

Yes, of course I used the Integer Primary Key id in a number of cases to sort the records. With UUIDs this sorting gets messed up. Solved this by using the (record) created_on timestamp. I already had this created_on column in every table.

WTForms SelectField

In most forms with selects I use (id, name) Tuples. But the id is no longer an Integer. The solution is to remove the coerce=int parameter. Example:

class ContentItemAssignAuthorsForm(FlaskForm):

    author_ids = MultiCheckboxField(_('Select authors for content item'), coerce=int)
    submit = SubmitField(_l('Assign'))

Must be changed to:

class ContentItemAssignAuthorsForm(FlaskForm):

    author_ids = MultiCheckboxField(_('Select authors for content item'))
    submit = SubmitField(_l('Assign'))

Summary

This was the first time that I used Python pymysql with mainly MariaDb / MySQL maintenance commands, like get column information, contraints, adding and removing columns, storing and loading data. I expected serious difficulties somewhere during the conversion but this did not happen.

Using UUIDs can be confusing. Is it at a certain moment a UUID object or a string? I am using the Backend-agnostic GUID Type recipe, see links below. I think I would prefer that UUIDs are strings everywhere, after all we only need the UUID object when inserting a Primary Key or Foreign Key. 

Links / credits

AttributeError: 'UUID' object has no attribute 'replace' when using backend-agnostic GUID type
https://stackoverflow.com/questions/47429929/attributeerror-uuid-object-has-no-attribute-replace-when-using-backend-agno

Backend-agnostic GUID Type
https://docs.sqlalchemy.org/en/13/core/custom_types.html?highlight=guid#backend-agnostic-guid-type

How can I use UUIDs in SQLAlchemy?
https://stackoverflow.com/questions/183042/how-can-i-use-uuids-in-sqlalchemy

How do I see all foreign keys to a table or column?
https://stackoverflow.com/questions/201621/how-do-i-see-all-foreign-keys-to-a-table-or-column

MySQL: Error dropping database (errno 13; errno 17; errno 39)
https://stackoverflow.com/questions/12196996/mysql-error-dropping-database-errno-13-errno-17-errno-39

Replace integer id field with uuid
https://stackoverflow.com/questions/42674931/replace-integer-id-field-with-uuid

SQLAlchemy one to one relationship get parent and add child
https://stackoverflow.com/questions/59227673/sqlalchemy-one-to-one-relationship-get-parent-and-add-child

Using custom UUID Type from SQLAlchemy
https://stackoverflow.com/questions/47275130/using-custom-uuid-type-from-sqlalchemy

What column type does SQLAlchemy use for “Text” on MySQL?
https://stackoverflow.com/questions/47644739/what-column-type-does-sqlalchemy-use-for-text-on-mysql

Leave a comment

Comment anonymously or log in to comment.

Comments

Leave a reply

Reply anonymously or log in to reply.