angle-uparrow-clockwisearrow-counterclockwisearrow-down-uparrow-leftatcalendarcard-listchatcheckenvelopefolderhouseinfo-circlepencilpeoplepersonperson-fillperson-plusphoneplusquestion-circlesearchtagtrashx

SQLAlchemy Many-To-Many: Four ways to select data

12 April 2022 Updated 12 April 2022
post main image

This is a short post about Many-To-Many selection with SQLAlchemy. In the past I used the association (link) table in ORM queries, because I thought it must be fastest. This time I did a small test comparing various ways to select data.

Spoiler: The association (link) table way is (of course) fastest.

The model

We have a Many-To-Many relationship between Orders and Products.

# link table: order - product
order_mtm_product_table = sa.Table(
    'order_mtm_product',
    Base.metadata,
    sa.Column('order_id', sa.ForeignKey('order.id')),
    sa.Column('product_id', sa.ForeignKey('product.id')),
)

class Order(Base):
    __tablename__ = 'order'    

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    products = relationship(
        'Product',
        secondary=order_mtm_product_table,
        back_populates='orders',
    )

class Product(Base):
    __tablename__ = 'product'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    orders = relationship(
        'Order',
        secondary=order_mtm_product_table,
        back_populates="products",
    )

Four ways to select data

The four ways to select data for a Many-To-Many relationship:

  • Walk
  • Orm
  • Join
  • Link

'Walk' is selecting the order and then walking the products

	stmt = sa.select(Order).\
		where(
			Order.name == order_name,
		)
	order_01 = session.execute(stmt).scalars().first()
	products = []
	for product in order_01.products:
		products.append(product)

'Orm' is the trivial way to select products

	stmt = sa.select(Product).\
		where(
			Product.orders.any(Order.name == order_name),
		)
	products = session.execute(stmt).scalars().all()

'Join' uses a 'join' to select the products

	stmt = sa.select(Product).\
		join(Product.orders).\
		where(
			Order.name == order_name,
		)
	products = session.execute(stmt).scalars().all()

'Link' uses the link table to select the products

	stmt = sa.select(Product).\
		where(sa.and_(
			Order.name == order_name,
			# mtm
			Order.id == order_mtm_product_table.c.order_id,
			Product.id == order_mtm_product_table.c.product_id,
		))
	products = session.execute(stmt).scalars().all()

The results

We measure the number of queries and the total time of getting the products.

+------------+-------+------+----------+---------+---------
| order_name | howto | type | products | queries | msecs
+------------+-------+------+----------+---------+---------
| order_01   |     1 | Walk |       18 |       2 | 4.315
| order_01   |     2 | Orm  |       18 |       1 | 1.991
| order_01   |     3 | Join |       18 |       1 | 1.681
| order_01   |     4 | Link |       18 |       1 | 1.242
+------------+-------+------+----------+---------+---------

The 'Walk' method is slowest because it uses two queries. We can reduce this to one query by using eager loading. If you need maximum performance then use the 'Link' method.

Extra results after adding search, ordering and limit

We extend the queries of 'Orm', 'Join' and 'Link' to allow searching, ordering and putting a limit on the returned items. The results:

+------------+-------+------+----------+---------+---------
| order_name | howto | type | products | queries | msecs
+------------+-------+------+----------+---------+---------
| order_01   |     1 | Walk |       18 |       2 | 4.356
| order_01   |     2 | Orm  |       18 |       1 | 2.686
| order_01   |     3 | Join |       18 |       1 | 2.380
| order_01   |     4 | Link |       18 |       1 | 1.657
| order_01   |     5 | Orm  |        2 |       1 | 3.328
| order_01   |     6 | Join |        2 |       1 | 2.647
| order_01   |     7 | Link |        2 |       1 | 1.664
+------------+-------+------+----------+---------+---------

No changes in the performance order.

The code

In case you want to try yourself:

# query_mtm.py
import datetime
import sys

import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, sessionmaker, relationship

# query counter - start
# see:
# How to get SQL execution count for a query? #5709 
# https://github.com/sqlalchemy/sqlalchemy/issues/5709
from sqlalchemy import event
import contextlib

@contextlib.contextmanager
def count_queries(conn):
    queries = []
    def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        queries.append(statement)

    event.listen(conn, "before_cursor_execute", before_cursor_execute)
    try:
        yield queries
    finally:
        event.remove(conn, "before_cursor_execute", before_cursor_execute)
# query counter - end

connection_uri = 'sqlite:///query_mtm.sqlite'
engine = sa.create_engine(connection_uri, echo = True)
Base = declarative_base()


# link table: order - product
order_mtm_product_table = sa.Table(
    'order_mtm_product',
    Base.metadata,
    sa.Column('order_id', sa.ForeignKey('order.id')),
    sa.Column('product_id', sa.ForeignKey('product.id')),
)

class Order(Base):
    __tablename__ = 'order'    

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    products = relationship(
        'Product',
        secondary=order_mtm_product_table,
        back_populates='orders',
    )

class Product(Base):
    __tablename__ = 'product'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    orders = relationship(
        'Order',
        secondary=order_mtm_product_table,
        back_populates="products",
    )

Base.metadata.drop_all(engine, checkfirst=True)
Base.metadata.create_all(engine)

# ### create products and orders ###
print('create session')
Session = sessionmaker(bind=engine)
session = Session()

n_products = 20
print('create products ...')
products = []
for i in range(n_products):
    products.append(
        Product(
            name='name_{:02d}'.format(i)
        )
    )
session.add_all(products)
session.commit()

print('create orders ...')
order_00 = Order(
    name='order_00',
	products=[
		products.pop(3),
		products.pop(3),
	]
)
order_01 = Order(
    name='order_01',
    products=products,
)
session.add_all([order_00, order_01])
session.commit()

print('close session and close all connections of the connection pool')
session.close()
engine.dispose() 


# ### query products and orders ###
print('create session')
engine = sa.create_engine(connection_uri, echo=True)
Session = sessionmaker(bind=engine)
session = Session()

print('dummy query to start sqlalchemy ...')
stmt = sa.select(Order).where(Order.name == 'order_00')
order_00 = session.execute(stmt).scalars().first()


class OrderOps:

    def __init__(self):
        self.summary = []

    def get(self, howto, howto_type, order_name, search=None):
        if howto == 1:
            self.ftitle('1. [{}] get order, then get products ...'.format(howto_type))
            stmt = sa.select(Order).\
                where(
                    Order.name == order_name,
                )
            order_01 = session.execute(stmt).scalars().first()
            products = []
            for product in order_01.products:
                products.append(product)
            return products

        elif howto == 2:
            self.ftitle('2. [{}] select all products from order ...'.format(howto_type))
            stmt = sa.select(Product).\
                where(
                    Product.orders.any(Order.name == order_name),
                )
            return session.execute(stmt).scalars().all()

        elif howto == 3:
            self.ftitle('3. [{}] select all products from order using join ...'.format(howto_type))
            stmt = sa.select(Product).\
                join(Product.orders).\
                where(
                    Order.name == order_name,
                )
            return session.execute(stmt).scalars().all()

        elif howto == 4:
            self.ftitle('4. [{}] select all products from order using the link table ourselves ...'.format(howto_type))
            stmt = sa.select(Product).\
                where(sa.and_(
                    Order.name == order_name,
                    # mtm
                    Order.id == order_mtm_product_table.c.order_id,
                    Product.id == order_mtm_product_table.c.product_id,
                ))
            return session.execute(stmt).scalars().all()

        # with search, order and limit

        elif howto == 5:
            self.ftitle('5. as 2. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                where(sa.and_(
                    Product.orders.any(Order.name == order_name),
                    Product.name.like(search),
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

        elif howto == 6:
            self.ftitle('6. as 3. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                join(Product.orders).\
                where(sa.and_(
                    Order.name == order_name,
                    Product.name.like(search),
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

        elif howto == 7:
            self.ftitle('7. as 4. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                where(sa.and_(
                    Order.name == order_name,
                    Product.name.like(search),
                    # mtm
                    Order.id == order_mtm_product_table.c.order_id,
                    Product.id == order_mtm_product_table.c.product_id,
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

    def get_products(self, howto, howto_type, order_name, search=None):
        dt_start = datetime.datetime.now()
        with count_queries(session.connection()) as queries:
            products = self.get(howto, howto_type, order_name, search)
        print("total number of queries: %s" % len(queries))
        products_len = len(products)
        msecs = (datetime.datetime.now() - dt_start).total_seconds()*1000
        self.summary.append(dict(
            order_name=order_name,
            howto=howto,
            howto_type=howto_type,
            products_cnt=products_len,
            queries_cnt=len(queries),
            msecs=msecs,
        ))
        print('{} products: total {}, executed in {:.2f} msecs'.format(order_name, products_len, msecs))
        print(*('{:02d}: {}'.format(i, x.name) for i, x in enumerate(products)), sep='\n')

    def ftitle(self, title):
        print('+-{}'.format('-'*60))
        print('| {}'.format(title))
        print('+-{}'.format('-'*60))

    def print_summary(self):
        hs = '+-{}-+-{}-+-{}-+-{}-+-{}-+-{}'.format('-'*10, '-'*5, '-'*4, '-'*8, '-'*7, '-'*8)
        print(hs)
        print('| {} | {} | {} | {} | {} | {}'.format('order_name', 'howto', 'type', 'products', 'queries', 'msecs'))
        print(hs)
        for s in self.summary:
            print('| {: <10} | {: >5} | {: <4} | {: >8} | {: >7} | {}'.format(s['order_name'], s['howto'], s['howto_type'], s['products_cnt'], s['queries_cnt'], s['msecs']))
        print(hs)

order_ops = OrderOps()
order_ops.get_products(1, 'Walk', 'order_01')
order_ops.get_products(2, 'Orm ', 'order_01')
order_ops.get_products(3, 'Join', 'order_01')
order_ops.get_products(4, 'Link', 'order_01')

order_ops.print_summary()

order_ops.get_products(5, 'Orm ', 'order_01', '%2%')
order_ops.get_products(6, 'Join', 'order_01', '%2%')
order_ops.get_products(7, 'Link', 'order_01', '%2%')

order_ops.print_summary()

Summary

Of course this is very limited test. But I believe it is very important that you know what you are doing when you select a method. If you need maximum performance, use the 'Link' method,
it is not more difficult than the other methods.

Links / credits

How to get SQL execution count for a query? #5709
https://github.com/sqlalchemy/sqlalchemy/issues/5709

SQLAlchemy - Basic Relationship Patterns
https://docs.sqlalchemy.org/en/14/orm/basic_relationships.html

Read more

SQLAlchemy

Leave a comment

Comment anonymously or log in to comment.

Comments

Leave a reply

Reply anonymously or log in to reply.