angle-uparrow-clockwisearrow-counterclockwisearrow-down-uparrow-leftatcalendarcard-listchatcheckenvelopefolderhouseinfo-circlepencilpeoplepersonperson-fillperson-plusphoneplusquestion-circlesearchtagtrashx

SQLAlchemy Many-To-Many: Cuatro formas de seleccionar datos

12 abril 2022
post main image

Este es un breve post sobre la selección Many-To-Many con SQLAlchemy. En el pasado utilicé la tabla de asociación (enlace) en las consultas de ORM , porque pensé que debía ser más rápida. Esta vez hice una pequeña prueba comparando varias formas de seleccionar los datos.

Spoiler: La forma de la tabla de asociación (enlace) es (por supuesto) más rápida.

El modelo

Tenemos una relación Muchos-a-Muchos entre Pedidos y Productos.

# link table: order - product
order_mtm_product_table = sa.Table(
    'order_mtm_product',
    Base.metadata,
    sa.Column('order_id', sa.ForeignKey('order.id')),
    sa.Column('product_id', sa.ForeignKey('product.id')),
)

class Order(Base):
    __tablename__ = 'order'    

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    products = relationship(
        'Product',
        secondary=order_mtm_product_table,
        back_populates='orders',
    )

class Product(Base):
    __tablename__ = 'product'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    orders = relationship(
        'Order',
        secondary=order_mtm_product_table,
        back_populates="products",
    )

Cuatro formas de seleccionar los datos

Las cuatro formas de seleccionar datos para una relación Muchos-a-Muchos:

  • Caminar
  • Orm
  • Unir
  • Enlace

Walk' es la selección de la orden y el recorrido de los productos

	stmt = sa.select(Order).\
		where(
			Order.name == order_name,
		)
	order_01 = session.execute(stmt).scalars().first()
	products = []
	for product in order_01.products:
		products.append(product)

Orm' es la forma trivial de seleccionar los productos

	stmt = sa.select(Product).\
		where(
			Product.orders.any(Order.name == order_name),
		)
	products = session.execute(stmt).scalars().all()

Join' utiliza un 'join' para seleccionar los productos

	stmt = sa.select(Product).\
		join(Product.orders).\
		where(
			Order.name == order_name,
		)
	products = session.execute(stmt).scalars().all()

Link' utiliza la tabla de enlaces para seleccionar los productos

	stmt = sa.select(Product).\
		where(sa.and_(
			Order.name == order_name,
			# mtm
			Order.id == order_mtm_product_table.c.order_id,
			Product.id == order_mtm_product_table.c.product_id,
		))
	products = session.execute(stmt).scalars().all()

Los resultados

Medimos el número de consultas y el tiempo total de obtención de los productos.

+------------+-------+------+----------+---------+---------
| order_name | howto | type | products | queries | msecs
+------------+-------+------+----------+---------+---------
| order_01   |     1 | Walk |       18 |       2 | 4.315
| order_01   |     2 | Orm  |       18 |       1 | 1.991
| order_01   |     3 | Join |       18 |       1 | 1.681
| order_01   |     4 | Link |       18 |       1 | 1.242
+------------+-------+------+----------+---------+---------

El método 'Walk' es el más lento porque utiliza dos consultas. Podemos reducirlo a una sola consulta utilizando eager loading. Si necesita el máximo rendimiento, utilice el método 'Link'.

Resultados extra después de añadir búsqueda, ordenación y límite

Ampliamos las consultas de 'Orm', 'Join' y 'Link' para permitir la búsqueda, el ordenamiento y el establecimiento de un límite en los elementos devueltos. Los resultados:

+------------+-------+------+----------+---------+---------
| order_name | howto | type | products | queries | msecs
+------------+-------+------+----------+---------+---------
| order_01   |     1 | Walk |       18 |       2 | 4.356
| order_01   |     2 | Orm  |       18 |       1 | 2.686
| order_01   |     3 | Join |       18 |       1 | 2.380
| order_01   |     4 | Link |       18 |       1 | 1.657
| order_01   |     5 | Orm  |        2 |       1 | 3.328
| order_01   |     6 | Join |        2 |       1 | 2.647
| order_01   |     7 | Link |        2 |       1 | 1.664
+------------+-------+------+----------+---------+---------

No hay cambios en el orden de actuación.

El código

Por si quieres probar tú mismo:

# query_mtm.py
import datetime
import sys

import sqlalchemy as sa
from sqlalchemy.orm import declarative_base, sessionmaker, relationship

# query counter - start
# see:
# How to get SQL execution count for a query? #5709 
# https://github.com/sqlalchemy/sqlalchemy/issues/5709
from sqlalchemy import event
import contextlib

@contextlib.contextmanager
def count_queries(conn):
    queries = []
    def before_cursor_execute(conn, cursor, statement, parameters, context, executemany):
        queries.append(statement)

    event.listen(conn, "before_cursor_execute", before_cursor_execute)
    try:
        yield queries
    finally:
        event.remove(conn, "before_cursor_execute", before_cursor_execute)
# query counter - end

connection_uri = 'sqlite:///query_mtm.sqlite'
engine = sa.create_engine(connection_uri, echo = True)
Base = declarative_base()


# link table: order - product
order_mtm_product_table = sa.Table(
    'order_mtm_product',
    Base.metadata,
    sa.Column('order_id', sa.ForeignKey('order.id')),
    sa.Column('product_id', sa.ForeignKey('product.id')),
)

class Order(Base):
    __tablename__ = 'order'    

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    products = relationship(
        'Product',
        secondary=order_mtm_product_table,
        back_populates='orders',
    )

class Product(Base):
    __tablename__ = 'product'

    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(100), nullable=False)

    orders = relationship(
        'Order',
        secondary=order_mtm_product_table,
        back_populates="products",
    )

Base.metadata.drop_all(engine, checkfirst=True)
Base.metadata.create_all(engine)

# ### create products and orders ###
print('create session')
Session = sessionmaker(bind=engine)
session = Session()

n_products = 20
print('create products ...')
products = []
for i in range(n_products):
    products.append(
        Product(
            name='name_{:02d}'.format(i)
        )
    )
session.add_all(products)
session.commit()

print('create orders ...')
order_00 = Order(
    name='order_00',
	products=[
		products.pop(3),
		products.pop(3),
	]
)
order_01 = Order(
    name='order_01',
    products=products,
)
session.add_all([order_00, order_01])
session.commit()

print('close session and close all connections of the connection pool')
session.close()
engine.dispose() 


# ### query products and orders ###
print('create session')
engine = sa.create_engine(connection_uri, echo=True)
Session = sessionmaker(bind=engine)
session = Session()

print('dummy query to start sqlalchemy ...')
stmt = sa.select(Order).where(Order.name == 'order_00')
order_00 = session.execute(stmt).scalars().first()


class OrderOps:

    def __init__(self):
        self.summary = []

    def get(self, howto, howto_type, order_name, search=None):
        if howto == 1:
            self.ftitle('1. [{}] get order, then get products ...'.format(howto_type))
            stmt = sa.select(Order).\
                where(
                    Order.name == order_name,
                )
            order_01 = session.execute(stmt).scalars().first()
            products = []
            for product in order_01.products:
                products.append(product)
            return products

        elif howto == 2:
            self.ftitle('2. [{}] select all products from order ...'.format(howto_type))
            stmt = sa.select(Product).\
                where(
                    Product.orders.any(Order.name == order_name),
                )
            return session.execute(stmt).scalars().all()

        elif howto == 3:
            self.ftitle('3. [{}] select all products from order using join ...'.format(howto_type))
            stmt = sa.select(Product).\
                join(Product.orders).\
                where(
                    Order.name == order_name,
                )
            return session.execute(stmt).scalars().all()

        elif howto == 4:
            self.ftitle('4. [{}] select all products from order using the link table ourselves ...'.format(howto_type))
            stmt = sa.select(Product).\
                where(sa.and_(
                    Order.name == order_name,
                    # mtm
                    Order.id == order_mtm_product_table.c.order_id,
                    Product.id == order_mtm_product_table.c.product_id,
                ))
            return session.execute(stmt).scalars().all()

        # with search, order and limit

        elif howto == 5:
            self.ftitle('5. as 2. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                where(sa.and_(
                    Product.orders.any(Order.name == order_name),
                    Product.name.like(search),
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

        elif howto == 6:
            self.ftitle('6. as 3. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                join(Product.orders).\
                where(sa.and_(
                    Order.name == order_name,
                    Product.name.like(search),
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

        elif howto == 7:
            self.ftitle('7. as 4. but with search, ordering and limit ...')
            stmt = sa.select(Product).\
                where(sa.and_(
                    Order.name == order_name,
                    Product.name.like(search),
                    # mtm
                    Order.id == order_mtm_product_table.c.order_id,
                    Product.id == order_mtm_product_table.c.product_id,
                )).\
                order_by(Product.name.desc()).\
                limit(3)
            return session.execute(stmt).scalars().all()

    def get_products(self, howto, howto_type, order_name, search=None):
        dt_start = datetime.datetime.now()
        with count_queries(session.connection()) as queries:
            products = self.get(howto, howto_type, order_name, search)
        print("total number of queries: %s" % len(queries))
        products_len = len(products)
        msecs = (datetime.datetime.now() - dt_start).total_seconds()*1000
        self.summary.append(dict(
            order_name=order_name,
            howto=howto,
            howto_type=howto_type,
            products_cnt=products_len,
            queries_cnt=len(queries),
            msecs=msecs,
        ))
        print('{} products: total {}, executed in {:.2f} msecs'.format(order_name, products_len, msecs))
        print(*('{:02d}: {}'.format(i, x.name) for i, x in enumerate(products)), sep='\n')

    def ftitle(self, title):
        print('+-{}'.format('-'*60))
        print('| {}'.format(title))
        print('+-{}'.format('-'*60))

    def print_summary(self):
        hs = '+-{}-+-{}-+-{}-+-{}-+-{}-+-{}'.format('-'*10, '-'*5, '-'*4, '-'*8, '-'*7, '-'*8)
        print(hs)
        print('| {} | {} | {} | {} | {} | {}'.format('order_name', 'howto', 'type', 'products', 'queries', 'msecs'))
        print(hs)
        for s in self.summary:
            print('| {: <10} | {: >5} | {: <4} | {: >8} | {: >7} | {}'.format(s['order_name'], s['howto'], s['howto_type'], s['products_cnt'], s['queries_cnt'], s['msecs']))
        print(hs)

order_ops = OrderOps()
order_ops.get_products(1, 'Walk', 'order_01')
order_ops.get_products(2, 'Orm ', 'order_01')
order_ops.get_products(3, 'Join', 'order_01')
order_ops.get_products(4, 'Link', 'order_01')

order_ops.print_summary()

order_ops.get_products(5, 'Orm ', 'order_01', '%2%')
order_ops.get_products(6, 'Join', 'order_01', '%2%')
order_ops.get_products(7, 'Link', 'order_01', '%2%')

order_ops.print_summary()

Resumen

Por supuesto esta es una prueba muy limitada. Pero creo que es muy importante que sepas lo que estás haciendo cuando seleccionas un método. Si necesitas el máximo rendimiento, utiliza el método 'Link',
no es más difícil que los otros métodos.

Enlaces / créditos

How to get SQL execution count for a query? #5709
https://github.com/sqlalchemy/sqlalchemy/issues/5709

SQLAlchemy - Basic Relationship Patterns
https://docs.sqlalchemy.org/en/14/orm/basic_relationships.html

Leer más

SQLAlchemy

Deje un comentario

Comente de forma anónima o inicie sesión para comentar.

Comentarios

Deje una respuesta.

Responda de forma anónima o inicie sesión para responder.