angle-uparrow-clockwisearrow-counterclockwisearrow-down-uparrow-leftatcalendarcard-listchatcheckenvelopefolderhouseinfo-circlepencilpeoplepersonperson-fillperson-plusphoneplusquestion-circlesearchtagtrashx

SQLAlchemy Many-To-Many: Vier manieren om gegevens te selecteren

12 april 2022
post main image

Dit is een korte post over Many-To-Many selectie met SQLAlchemy. In het verleden gebruikte ik de associatie (link) tabel in ORM queries, omdat ik dacht dat die het snelst moest zijn. Deze keer heb ik een kleine test gedaan waarbij ik verschillende manieren om gegevens te selecteren heb vergeleken.

Spoiler: De associatie (link) tabel manier is (natuurlijk) het snelst.

Het model

We hebben een Many-To-Many relatie tussen Orders en Products.

# link table: order - product
order_mtm_product_table = sa.Table(
    'order_mtm_product',
    Base.metadata,
    sa.Column('order_id', sa.ForeignKey('order.id')),
    sa.Column('product_id', sa.ForeignKey('product.id')),
)

class Order(Base):
    __tablename__ = 'order'    

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    products = relationship(
        'Product',
        secondary=order_mtm_product_table,
        back_populates='orders',
    )

class Product(Base):
    __tablename__ = 'product'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    orders = relationship(
        'Order',
        secondary=order_mtm_product_table,
        back_populates="products",
    )

Vier manieren om gegevens te selecteren

De vier manieren om gegevens te selecteren voor een Many-To-Many relatie:

  • Loop
  • Orm
  • Join
  • Link

Walk' is het selecteren van de order en dan de producten doorlopen

	stmt = sa.select(Order).\
		where(
			Order.name == order_name,
		)
	order_01 = session.execute(stmt).scalars().first()
	products = []
	for product in order_01.products:
		products.append(product)

Orm' is de triviale manier om producten te selecteren

	stmt = sa.select(Product).\
		where(
			Product.orders.any(Order.name == order_name),
		)
	products = session.execute(stmt).scalars().all()

Join' gebruikt een 'join' om de producten te selecteren

	stmt = sa.select(Product).\
		join(Product.orders).\
		where(
			Order.name == order_name,
		)
	products = session.execute(stmt).scalars().all()

Link' gebruikt de koppelingstabel om de producten te selecteren

	stmt = sa.select(Product).\
		where(sa.and_(
			Order.name == order_name,
			# mtm
			Order.id == order_mtm_product_table.c.order_id,
			Product.id == order_mtm_product_table.c.product_id,
		))
	products = session.execute(stmt).scalars().all()

De resultaten

We meten het aantal queries en de totale tijd van het ophalen van de producten.

+------------+-------+------+----------+---------+---------
| order_name | howto | type | products | queries | msecs
+------------+-------+------+----------+---------+---------
| order_01   |     1 | Walk |       18 |       2 | 4.315
| order_01   |     2 | Orm  |       18 |       1 | 1.991
| order_01   |     3 | Join |       18 |       1 | 1.681
| order_01   |     4 | Link |       18 |       1 | 1.242
+------------+-------+------+----------+---------+---------

De 'Walk' methode is het traagst omdat deze twee queries gebruikt. We kunnen dit terugbrengen tot één query door eager loading te gebruiken. Als je maximale prestaties wilt, gebruik dan de 'Link' methode.

Extra resultaten na toevoegen van zoeken, ordenen en beperken

We breiden de query's van 'Orm', 'Join' en 'Link' uit met zoeken, ordenen en het zetten van een limiet op de geretourneerde items. De resultaten:

+------------+-------+------+----------+---------+---------
| order_name | howto | type | products | queries | msecs
+------------+-------+------+----------+---------+---------
| order_01   |     1 | Walk |       18 |       2 | 4.356
| order_01   |     2 | Orm  |       18 |       1 | 2.686
| order_01   |     3 | Join |       18 |       1 | 2.380
| order_01   |     4 | Link |       18 |       1 | 1.657
| order_01   |     5 | Orm  |        2 |       1 | 3.328
| order_01   |     6 | Join |        2 |       1 | 2.647
| order_01   |     7 | Link |        2 |       1 | 1.664
+------------+-------+------+----------+---------+---------

Geen veranderingen in de prestatievolgorde.

De code

Voor het geval je het zelf wilt proberen:

# query_mtm.py
import datetime
import sys

import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, sessionmaker, relationship

# query counter - start
# see:
# How to get SQL execution count for a query? #5709 
# https://github.com/sqlalchemy/sqlalchemy/issues/5709
from sqlalchemy import event
import contextlib

@contextlib.contextmanager
def count_queries(conn):
    queries = []
    def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        queries.append(statement)

    event.listen(conn, "before_cursor_execute", before_cursor_execute)
    try:
        yield queries
    finally:
        event.remove(conn, "before_cursor_execute", before_cursor_execute)
# query counter - end

connection_uri = 'sqlite:///query_mtm.sqlite'
engine = sa.create_engine(connection_uri, echo = True)
Base = declarative_base()


# link table: order - product
order_mtm_product_table = sa.Table(
    'order_mtm_product',
    Base.metadata,
    sa.Column('order_id', sa.ForeignKey('order.id')),
    sa.Column('product_id', sa.ForeignKey('product.id')),
)

class Order(Base):
    __tablename__ = 'order'    

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    products = relationship(
        'Product',
        secondary=order_mtm_product_table,
        back_populates='orders',
    )

class Product(Base):
    __tablename__ = 'product'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    orders = relationship(
        'Order',
        secondary=order_mtm_product_table,
        back_populates="products",
    )

Base.metadata.drop_all(engine, checkfirst=True)
Base.metadata.create_all(engine)

# ### create products and orders ###
print('create session')
Session = sessionmaker(bind=engine)
session = Session()

n_products = 20
print('create products ...')
products = []
for i in range(n_products):
    products.append(
        Product(
            name='name_{:02d}'.format(i)
        )
    )
session.add_all(products)
session.commit()

print('create orders ...')
order_00 = Order(
    name='order_00',
	products=[
		products.pop(3),
		products.pop(3),
	]
)
order_01 = Order(
    name='order_01',
    products=products,
)
session.add_all([order_00, order_01])
session.commit()

print('close session and close all connections of the connection pool')
session.close()
engine.dispose() 


# ### query products and orders ###
print('create session')
engine = sa.create_engine(connection_uri, echo=True)
Session = sessionmaker(bind=engine)
session = Session()

print('dummy query to start sqlalchemy ...')
stmt = sa.select(Order).where(Order.name == 'order_00')
order_00 = session.execute(stmt).scalars().first()


class OrderOps:

    def __init__(self):
        self.summary = []

    def get(self, howto, howto_type, order_name, search=None):
        if howto == 1:
            self.ftitle('1. [{}] get order, then get products ...'.format(howto_type))
            stmt = sa.select(Order).\
                where(
                    Order.name == order_name,
                )
            order_01 = session.execute(stmt).scalars().first()
            products = []
            for product in order_01.products:
                products.append(product)
            return products

        elif howto == 2:
            self.ftitle('2. [{}] select all products from order ...'.format(howto_type))
            stmt = sa.select(Product).\
                where(
                    Product.orders.any(Order.name == order_name),
                )
            return session.execute(stmt).scalars().all()

        elif howto == 3:
            self.ftitle('3. [{}] select all products from order using join ...'.format(howto_type))
            stmt = sa.select(Product).\
                join(Product.orders).\
                where(
                    Order.name == order_name,
                )
            return session.execute(stmt).scalars().all()

        elif howto == 4:
            self.ftitle('4. [{}] select all products from order using the link table ourselves ...'.format(howto_type))
            stmt = sa.select(Product).\
                where(sa.and_(
                    Order.name == order_name,
                    # mtm
                    Order.id == order_mtm_product_table.c.order_id,
                    Product.id == order_mtm_product_table.c.product_id,
                ))
            return session.execute(stmt).scalars().all()

        # with search, order and limit

        elif howto == 5:
            self.ftitle('5. as 2. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                where(sa.and_(
                    Product.orders.any(Order.name == order_name),
                    Product.name.like(search),
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

        elif howto == 6:
            self.ftitle('6. as 3. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                join(Product.orders).\
                where(sa.and_(
                    Order.name == order_name,
                    Product.name.like(search),
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

        elif howto == 7:
            self.ftitle('7. as 4. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                where(sa.and_(
                    Order.name == order_name,
                    Product.name.like(search),
                    # mtm
                    Order.id == order_mtm_product_table.c.order_id,
                    Product.id == order_mtm_product_table.c.product_id,
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

    def get_products(self, howto, howto_type, order_name, search=None):
        dt_start = datetime.datetime.now()
        with count_queries(session.connection()) as queries:
            products = self.get(howto, howto_type, order_name, search)
        print("total number of queries: %s" % len(queries))
        products_len = len(products)
        msecs = (datetime.datetime.now() - dt_start).total_seconds()*1000
        self.summary.append(dict(
            order_name=order_name,
            howto=howto,
            howto_type=howto_type,
            products_cnt=products_len,
            queries_cnt=len(queries),
            msecs=msecs,
        ))
        print('{} products: total {}, executed in {:.2f} msecs'.format(order_name, products_len, msecs))
        print(*('{:02d}: {}'.format(i, x.name) for i, x in enumerate(products)), sep='\n')

    def ftitle(self, title):
        print('+-{}'.format('-'*60))
        print('| {}'.format(title))
        print('+-{}'.format('-'*60))

    def print_summary(self):
        hs = '+-{}-+-{}-+-{}-+-{}-+-{}-+-{}'.format('-'*10, '-'*5, '-'*4, '-'*8, '-'*7, '-'*8)
        print(hs)
        print('| {} | {} | {} | {} | {} | {}'.format('order_name', 'howto', 'type', 'products', 'queries', 'msecs'))
        print(hs)
        for s in self.summary:
            print('| {: <10} | {: >5} | {: <4} | {: >8} | {: >7} | {}'.format(s['order_name'], s['howto'], s['howto_type'], s['products_cnt'], s['queries_cnt'], s['msecs']))
        print(hs)

order_ops = OrderOps()
order_ops.get_products(1, 'Walk', 'order_01')
order_ops.get_products(2, 'Orm ', 'order_01')
order_ops.get_products(3, 'Join', 'order_01')
order_ops.get_products(4, 'Link', 'order_01')

order_ops.print_summary()

order_ops.get_products(5, 'Orm ', 'order_01', '%2%')
order_ops.get_products(6, 'Join', 'order_01', '%2%')
order_ops.get_products(7, 'Link', 'order_01', '%2%')

order_ops.print_summary()

Samenvatting

Natuurlijk is dit een zeer beperkte test. Maar ik denk dat het heel belangrijk is dat je weet wat je doet als je een methode kiest. Als je maximale prestaties wil, gebruik dan de 'Link' methode,
die is niet moeilijker dan de andere methodes.

Links / credits

How to get SQL execution count for a query? #5709
https://github.com/sqlalchemy/sqlalchemy/issues/5709

SQLAlchemy - Basic Relationship Patterns
https://docs.sqlalchemy.org/en/14/orm/basic_relationships.html

Lees meer

SQLAlchemy

Laat een reactie achter

Reageer anoniem of log in om commentaar te geven.

Opmerkingen

Laat een antwoord achter

Antwoord anoniem of log in om te antwoorden.