angle-uparrow-clockwisearrow-counterclockwisearrow-down-uparrow-leftatcalendarcard-listchatcheckenvelopefolderhouseinfo-circlepencilpeoplepersonperson-fillperson-plusphoneplusquestion-circlesearchtagtrashx

SQLAlchemy Many-To-Many: Vier Möglichkeiten zur Auswahl von Daten

12 April 2022
post main image

Dies ist ein kurzer Beitrag über die Many-To-Many Auswahl mit SQLAlchemy. In der Vergangenheit habe ich die Assoziationstabelle (Link) in ORM -Abfragen verwendet, weil ich dachte, dass sie am schnellsten sein muss. Diesmal habe ich einen kleinen Test durchgeführt, bei dem ich verschiedene Möglichkeiten der Datenauswahl verglichen habe.

Spoiler: Die Assoziations-(Link-)Tabelle ist (natürlich) am schnellsten.

Das Modell

Wir haben eine Many-To-Many-Beziehung zwischen Orders und Products.

# link table: order - product
order_mtm_product_table = sa.Table(
    'order_mtm_product',
    Base.metadata,
    sa.Column('order_id', sa.ForeignKey('order.id')),
    sa.Column('product_id', sa.ForeignKey('product.id')),
)

class Order(Base):
    __tablename__ = 'order'    

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    products = relationship(
        'Product',
        secondary=order_mtm_product_table,
        back_populates='orders',
    )

class Product(Base):
    __tablename__ = 'product'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    orders = relationship(
        'Order',
        secondary=order_mtm_product_table,
        back_populates="products",
    )

Vier Möglichkeiten der Datenauswahl

Die vier Möglichkeiten zur Auswahl von Daten für eine Many-To-Many-Beziehung:

  • Gehen Sie
  • Orm
  • Verbinden
  • Link

Walk" bedeutet, die Bestellung auszuwählen und dann die Produkte durchzugehen.

	stmt = sa.select(Order).\
		where(
			Order.name == order_name,
		)
	order_01 = session.execute(stmt).scalars().first()
	products = []
	for product in order_01.products:
		products.append(product)

Orm" ist die triviale Art, Produkte auszuwählen

	stmt = sa.select(Product).\
		where(
			Product.orders.any(Order.name == order_name),
		)
	products = session.execute(stmt).scalars().all()

Join" verwendet eine Verknüpfung, um die Produkte auszuwählen

	stmt = sa.select(Product).\
		join(Product.orders).\
		where(
			Order.name == order_name,
		)
	products = session.execute(stmt).scalars().all()

'Link' verwendet die Verknüpfungstabelle zur Auswahl der Produkte

	stmt = sa.select(Product).\
		where(sa.and_(
			Order.name == order_name,
			# mtm
			Order.id == order_mtm_product_table.c.order_id,
			Product.id == order_mtm_product_table.c.product_id,
		))
	products = session.execute(stmt).scalars().all()

Die Ergebnisse

Wir messen die Anzahl der Abfragen und die Gesamtzeit für das Abrufen der Produkte.

+------------+-------+------+----------+---------+---------
| order_name | howto | type | products | queries | msecs
+------------+-------+------+----------+---------+---------
| order_01   |     1 | Walk |       18 |       2 | 4.315
| order_01   |     2 | Orm  |       18 |       1 | 1.991
| order_01   |     3 | Join |       18 |       1 | 1.681
| order_01   |     4 | Link |       18 |       1 | 1.242
+------------+-------+------+----------+---------+---------

Die Methode 'Walk' ist am langsamsten, weil sie zwei Abfragen benötigt. Durch die Verwendung von Eager Loading können wir dies auf eine Abfrage reduzieren. Wenn Sie maximale Leistung benötigen, verwenden Sie die Methode "Link".

Zusätzliche Ergebnisse nach Hinzufügen von Suche, Bestellung und Begrenzung

Wir erweitern die Abfragen von "Orm", "Join" und "Link", um die Suche, das Ordnen und das Setzen einer Grenze für die zurückgegebenen Elemente zu ermöglichen. Die Ergebnisse:

+------------+-------+------+----------+---------+---------
| order_name | howto | type | products | queries | msecs
+------------+-------+------+----------+---------+---------
| order_01   |     1 | Walk |       18 |       2 | 4.356
| order_01   |     2 | Orm  |       18 |       1 | 2.686
| order_01   |     3 | Join |       18 |       1 | 2.380
| order_01   |     4 | Link |       18 |       1 | 1.657
| order_01   |     5 | Orm  |        2 |       1 | 3.328
| order_01   |     6 | Join |        2 |       1 | 2.647
| order_01   |     7 | Link |        2 |       1 | 1.664
+------------+-------+------+----------+---------+---------

Keine Änderungen in der Leistungsreihenfolge.

Der Code

Für den Fall, dass Sie es selbst versuchen wollen:

# query_mtm.py
import datetime
import sys

import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, sessionmaker, relationship

# query counter - start
# see:
# How to get SQL execution count for a query? #5709 
# https://github.com/sqlalchemy/sqlalchemy/issues/5709
from sqlalchemy import event
import contextlib

@contextlib.contextmanager
def count_queries(conn):
    queries = []
    def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        queries.append(statement)

    event.listen(conn, "before_cursor_execute", before_cursor_execute)
    try:
        yield queries
    finally:
        event.remove(conn, "before_cursor_execute", before_cursor_execute)
# query counter - end

connection_uri = 'sqlite:///query_mtm.sqlite'
engine = sa.create_engine(connection_uri, echo = True)
Base = declarative_base()


# link table: order - product
order_mtm_product_table = sa.Table(
    'order_mtm_product',
    Base.metadata,
    sa.Column('order_id', sa.ForeignKey('order.id')),
    sa.Column('product_id', sa.ForeignKey('product.id')),
)

class Order(Base):
    __tablename__ = 'order'    

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    products = relationship(
        'Product',
        secondary=order_mtm_product_table,
        back_populates='orders',
    )

class Product(Base):
    __tablename__ = 'product'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    orders = relationship(
        'Order',
        secondary=order_mtm_product_table,
        back_populates="products",
    )

Base.metadata.drop_all(engine, checkfirst=True)
Base.metadata.create_all(engine)

# ### create products and orders ###
print('create session')
Session = sessionmaker(bind=engine)
session = Session()

n_products = 20
print('create products ...')
products = []
for i in range(n_products):
    products.append(
        Product(
            name='name_{:02d}'.format(i)
        )
    )
session.add_all(products)
session.commit()

print('create orders ...')
order_00 = Order(
    name='order_00',
	products=[
		products.pop(3),
		products.pop(3),
	]
)
order_01 = Order(
    name='order_01',
    products=products,
)
session.add_all([order_00, order_01])
session.commit()

print('close session and close all connections of the connection pool')
session.close()
engine.dispose() 


# ### query products and orders ###
print('create session')
engine = sa.create_engine(connection_uri, echo=True)
Session = sessionmaker(bind=engine)
session = Session()

print('dummy query to start sqlalchemy ...')
stmt = sa.select(Order).where(Order.name == 'order_00')
order_00 = session.execute(stmt).scalars().first()


class OrderOps:

    def __init__(self):
        self.summary = []

    def get(self, howto, howto_type, order_name, search=None):
        if howto == 1:
            self.ftitle('1. [{}] get order, then get products ...'.format(howto_type))
            stmt = sa.select(Order).\
                where(
                    Order.name == order_name,
                )
            order_01 = session.execute(stmt).scalars().first()
            products = []
            for product in order_01.products:
                products.append(product)
            return products

        elif howto == 2:
            self.ftitle('2. [{}] select all products from order ...'.format(howto_type))
            stmt = sa.select(Product).\
                where(
                    Product.orders.any(Order.name == order_name),
                )
            return session.execute(stmt).scalars().all()

        elif howto == 3:
            self.ftitle('3. [{}] select all products from order using join ...'.format(howto_type))
            stmt = sa.select(Product).\
                join(Product.orders).\
                where(
                    Order.name == order_name,
                )
            return session.execute(stmt).scalars().all()

        elif howto == 4:
            self.ftitle('4. [{}] select all products from order using the link table ourselves ...'.format(howto_type))
            stmt = sa.select(Product).\
                where(sa.and_(
                    Order.name == order_name,
                    # mtm
                    Order.id == order_mtm_product_table.c.order_id,
                    Product.id == order_mtm_product_table.c.product_id,
                ))
            return session.execute(stmt).scalars().all()

        # with search, order and limit

        elif howto == 5:
            self.ftitle('5. as 2. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                where(sa.and_(
                    Product.orders.any(Order.name == order_name),
                    Product.name.like(search),
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

        elif howto == 6:
            self.ftitle('6. as 3. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                join(Product.orders).\
                where(sa.and_(
                    Order.name == order_name,
                    Product.name.like(search),
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

        elif howto == 7:
            self.ftitle('7. as 4. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                where(sa.and_(
                    Order.name == order_name,
                    Product.name.like(search),
                    # mtm
                    Order.id == order_mtm_product_table.c.order_id,
                    Product.id == order_mtm_product_table.c.product_id,
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

    def get_products(self, howto, howto_type, order_name, search=None):
        dt_start = datetime.datetime.now()
        with count_queries(session.connection()) as queries:
            products = self.get(howto, howto_type, order_name, search)
        print("total number of queries: %s" % len(queries))
        products_len = len(products)
        msecs = (datetime.datetime.now() - dt_start).total_seconds()*1000
        self.summary.append(dict(
            order_name=order_name,
            howto=howto,
            howto_type=howto_type,
            products_cnt=products_len,
            queries_cnt=len(queries),
            msecs=msecs,
        ))
        print('{} products: total {}, executed in {:.2f} msecs'.format(order_name, products_len, msecs))
        print(*('{:02d}: {}'.format(i, x.name) for i, x in enumerate(products)), sep='\n')

    def ftitle(self, title):
        print('+-{}'.format('-'*60))
        print('| {}'.format(title))
        print('+-{}'.format('-'*60))

    def print_summary(self):
        hs = '+-{}-+-{}-+-{}-+-{}-+-{}-+-{}'.format('-'*10, '-'*5, '-'*4, '-'*8, '-'*7, '-'*8)
        print(hs)
        print('| {} | {} | {} | {} | {} | {}'.format('order_name', 'howto', 'type', 'products', 'queries', 'msecs'))
        print(hs)
        for s in self.summary:
            print('| {: <10} | {: >5} | {: <4} | {: >8} | {: >7} | {}'.format(s['order_name'], s['howto'], s['howto_type'], s['products_cnt'], s['queries_cnt'], s['msecs']))
        print(hs)

order_ops = OrderOps()
order_ops.get_products(1, 'Walk', 'order_01')
order_ops.get_products(2, 'Orm ', 'order_01')
order_ops.get_products(3, 'Join', 'order_01')
order_ops.get_products(4, 'Link', 'order_01')

order_ops.print_summary()

order_ops.get_products(5, 'Orm ', 'order_01', '%2%')
order_ops.get_products(6, 'Join', 'order_01', '%2%')
order_ops.get_products(7, 'Link', 'order_01', '%2%')

order_ops.print_summary()

Zusammenfassung

Natürlich ist dieser Test sehr begrenzt. Aber ich glaube, es ist sehr wichtig, dass Sie wissen, was Sie tun, wenn Sie eine Methode auswählen. Wenn Sie die maximale Leistung benötigen, sollten Sie die Methode "Link" verwenden,
die nicht schwieriger ist als die anderen Methoden.

Links / Impressum

How to get SQL execution count for a query? #5709
https://github.com/sqlalchemy/sqlalchemy/issues/5709

SQLAlchemy - Basic Relationship Patterns
https://docs.sqlalchemy.org/en/14/orm/basic_relationships.html

Mehr erfahren

SQLAlchemy

Einen Kommentar hinterlassen

Kommentieren Sie anonym oder melden Sie sich zum Kommentieren an.

Kommentare

Eine Antwort hinterlassen

Antworten Sie anonym oder melden Sie sich an, um zu antworten.