angle-uparrow-clockwisearrow-counterclockwisearrow-down-uparrow-leftatcalendarcard-listchatcheckenvelopefolderhouseinfo-circlepencilpeoplepersonperson-fillperson-plusphoneplusquestion-circlesearchtagtrashx

Flask + SQLAlchemy + MariaDB profiling

Checking timing is another way of looking for errors

29 July 2019 Updated 29 August 2019
post main image
Original photo unsplash.com/@elmeng.

Performance is a never ending story. At a certain moment I noticed that putting the home page on the screen took some 370 milliseconds using the Flask development server with debug messages shown. That seemed far too long! How did I notice this? Because at the bottom of every page I show the time it takes to generate a page. Please note that the timings below are from a more optimized version of the blog_reference model where I added an index to the name field.

Showing page generation time

This is measured using Flask's before_request and after_request handlers.

    @app.before_request
    def before_request():
        app.logger.debug(fname + ': before_request')
        # start time of request
        g.start = time.time()

    ... 

    @app.after_request
    def after_request(response):
        if ((response.response) and (200 <= response.status_code < 300) and (response.content_type.startswith('text/html'))):
            diff_milliseconds = int( (time.time() - g.start) * 1000 )
            response.set_data(response.get_data().replace(b'47', bytes(str(diff_milliseconds), 'utf-8')))
        return response

This will slow down your page a little bit, because after the page has been rendered the REQUEST_TIME is inserted using a string replace.

SQLAlchemy with ApacheBench test and connection failures

I like ApacheBench, it is easy to use and gives fast an indication of response times and failures. For a concurrency of 1, the command is:

ab -n 100 -c 1 http://127.0.0.1:8000/

The result:

This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)...
..done


Server Software:        Werkzeug/0.14.1
Server Hostname:        127.0.0.1
Server Port:            8000

Document Path:          /
Document Length:        25160 bytes

Concurrency Level:      1
Time taken for tests:   6.897 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      2539800 bytes
HTML transferred:       2516000 bytes
Requests per second:    14.50 [#/sec] (mean)
Time per request:       68.972 [ms] (mean)
Time per request:       68.972 [ms] (mean, across all concurrent requests)
Transfer rate:          359.61 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    64   69   4.1     68      81
Waiting:       64   69   4.0     68      81
Total:         64   69   4.1     68      81

Percentage of the requests served within a certain time (ms)
  50%     68
  66%     70
  75%     71
  80%     73
  90%     76
  95%     78
  98%     80
  99%     81
 100%     81 (longest request)

Important: always check that Failed requests is 0! Also check the Document length. With a concurrency of 10, I got Failed requests:

ab -n 100 -c 10 http://127.0.0.1:8000/

SQLAlchemy has a default of 10 connections so I increased this to 50 with a maximum of 60:

engine = create_engine(os.getenv('SQLALCHEMY_DATABASE_URI'), pool_size=50, max_overflow=60, pool_pre_ping=True, echo=True)

You cannot keep increasing the number of connections. MariaDB/MySQL has a maximum number of connections, check it by typing in the database console:

show variables like 'max_connections';

For me this was 100. The pool_pre_ping parameter is used to prevent database disconnects, see also links below. I added this after random failures. With the pool_size set to 50, the Failed requests did not show up anymore.

Locating the cause of the slow page generation

I could look into SQLAlchemy profiling but was more interested in seeing what happend in the view function showing the home page. First, I added the timer code to the start and end of parts of the home page view function:

    query_timers = []
    ...
    query_timers.append( {'query': 'post_id2authors', 'time_start': datetime.datetime.now() } )
    post_id2authors=get_blog_post_authors(post_ids),
    query_timers.append( {'query': 'post_id2authors', 'time_end': datetime.datetime.now() } )
    ...
    query_timers.append( {'query': 'post_id2tags', 'time_start': datetime.datetime.now() } )
    post_id2tags=get_blog_post_tags(post_ids),
    query_timers.append( {'query': 'post_id2tags', 'time_end': datetime.datetime.now() } )
    ...

Second, I added the following 'show timers' code to the end of the home page view function:

    # timer: get queries
    query_timers_queries = []
    query_timer_query2time_start_time_ends = {}
    for query_timer in query_timers: 
        query = query_timer['query']
        if query not in query_timer_query2time_start_time_ends:
            query_timer_query2time_start_time_ends[query] = { 'time_start': 0, 'time_end': 0 }
        if 'time_start' in query_timer:
            query_timer_query2time_start_time_ends[query]['time_start'] = query_timer['time_start']
        if 'time_end' in query_timer:
            query_timer_query2time_start_time_ends[query]['time_end'] = query_timer['time_end']

    # timer: loop through queries
    current_app.logger.debug(fname + ': len(query_timer_query2time_start_time_ends) = {}'.format(len(query_timer_query2time_start_time_ends)))
    current_app.logger.debug(fname + ': query_timer_query2time_start_time_ends = {}'.format(query_timer_query2time_start_time_ends))
    for query in query_timer_query2time_start_time_ends:
        if query_timer_query2time_start_time_ends[query]['time_start'] != 0 and query_timer_query2time_start_time_ends[query]['time_end']: 
            delta = query_timer_query2time_start_time_ends[query]['time_end'] - query_timer_query2time_start_time_ends[query]['time_start']
            current_app.logger.debug(fname + ': query = {}, delta = {}'.format(query, delta))

This code prints the time of all parts. Running the website home home page gave the following debug information:

peterspython_container_web | [2019-07-29 18:33:17,932] DEBUG in views: homepage: query = select_posts, delta = 0:00:00.009654
peterspython_container_web | [2019-07-29 18:33:17,932] DEBUG in views: homepage: query = post_id2authors, delta = 0:00:00.007860
peterspython_container_web | [2019-07-29 18:33:17,933] DEBUG in views: homepage: query = post_id2categories, delta = 0:00:00.011795
peterspython_container_web | [2019-07-29 18:33:17,933] DEBUG in views: homepage: query = post_id2tags, delta = 0:00:00.009891
peterspython_container_web | [2019-07-29 18:33:17,933] DEBUG in views: homepage: query = post_id2references, delta = 0:00:00.042603
peterspython_container_web | [2019-07-29 18:33:17,933] DEBUG in views: homepage: query = render_template, delta = 0:00:00.019005

This showed that the post_id2references query was not as expected. Inspecting the query I noticed that I forgot a query condition ... :-( After fixing this, performance was much much better!

MariaDB / MySQL query profiling

Afterwards I also checked the query in MariaDB. This is easy to do. Using the database terminal, first enable profiling:

SET profiling = 1;

To clear the cache of the database engine:

RESET QUERY CACHE

To show the timings of the queries type:

SHOW PROFILES

This gives a list like:

+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                                                                                                                                                                                                                                                        |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|        1 | 0.00007394 | SELECT blog_post.id AS blog_post_id, blog_reference.id AS blog_reference_id, blog_reference.site_id AS blog_reference_site_id, blog_reference.deleted AS blog_reference_deleted, blog_reference.deleted_on AS blog_reference_deleted_on, blog_reference.status AS blog_reference_status, blog_reference.crea |
|        2 | 0.00005384 | RESET QUERY CACHE                                                                                                                                                                                                                                                                                            |
|        3 | 0.00133299 | SELECT blog_post.id AS blog_post_id, blog_reference.id AS blog_reference_id, blog_reference.site_id AS blog_reference_site_id, blog_reference.deleted AS blog_reference_deleted, blog_reference.deleted_on AS blog_reference_deleted_on, blog_reference.status AS blog_reference_status, blog_reference.crea |
|        4 | 0.00006290 | RESET QUERY CACHE                                                                                                                                                                                                                                                                                            |
|        5 | 0.00154395 | SELECT blog_post.id AS blog_post_id, blog_reference.id AS blog_reference_id, blog_reference.site_id AS blog_reference_site_id, blog_reference.deleted AS blog_reference_deleted, blog_reference.deleted_on AS blog_reference_deleted_on, blog_reference.status AS blog_reference_status, blog_reference.crea |
|        6 | 0.00175268 | describe blog_reference                                                                                                                                                                                                                                                                                      |
|        7 | 0.00013651 | RESET QUERY CACHE                                                                                                                                                                                                                                                                                            |
|        8 | 0.00223707 | SELECT blog_post.id AS blog_post_id, blog_reference.id AS blog_reference_id, blog_reference.site_id AS blog_reference_site_id, blog_reference.deleted AS blog_reference_deleted, blog_reference.deleted_on AS blog_reference_deleted_on, blog_reference.status AS blog_reference_status, blog_reference.crea |
|        9 | 0.00008391 | SELECT blog_post.id AS blog_post_id, blog_reference.id AS blog_reference_id, blog_reference.site_id AS blog_reference_site_id, blog_reference.deleted AS blog_reference_deleted, blog_reference.deleted_on AS blog_reference_deleted_on, blog_reference.status AS blog_reference_status, blog_reference.crea |
+----------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)

You can see that after a RESET QUERY CACHE the query is really executed, the time goes from 0.07 milliseconds to some 1.3 milliseconds. This means that MariaDB/MySQL performs query caching.

Because queries are hashed in the MariaDB Query Cache there is no way to see which queries are in the cache. But it is still possible to check if SQLAlchemy queries actually use cached query results by examining the MariaDB Query Cache status:

SHOW STATUS LIKE 'Qcache%';

The output of this command is:

+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 7        |
| Qcache_free_memory      | 66685576 |
| Qcache_hits             | 23370    |
| Qcache_inserts          | 2873     |
| Qcache_lowmem_prunes    | 0        |
| Qcache_not_cached       | 29673    |
| Qcache_queries_in_cache | 62       |
| Qcache_total_blocks     | 152      |
+-------------------------+----------+

Now we refresh the home page and run ApacheBench, -n 100 -c 1, to generate 100 home page requests. Then we run the Query Cache status command again and compare the Qcache_hits and other data:

+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 7        |
| Qcache_free_memory      | 66685576 |
| Qcache_hits             | 23870    |
| Qcache_inserts          | 2873     |
| Qcache_lowmem_prunes    | 0        |
| Qcache_not_cached       | 29873    |
| Qcache_queries_in_cache | 62       |
| Qcache_total_blocks     | 152      |
+-------------------------+----------+

I know the home page uses 5 to 6 queries.  The number of Qcache_hits is up from 23366 to 23870 = 504, which looks very reasonable.  

After a few hours I feel that the SQL Alchemy queries much (?) slower than raw sql. For the moment it is good enough. I believe It is no use to look into SQLAlchemy dogpile because I may want to scale the frontend over multiple servers.

Links / credits

Calculate execution time for every page in Python's Flask
https://stackoverflow.com/questions/12273889/calculate-execution-time-for-every-page-in-pythons-flask

Clear MySQL query cache without restarting server
https://stackoverflow.com/questions/5231678/clear-mysql-query-cache-without-restarting-server

Connection Pooling
https://docs.sqlalchemy.org/en/13/core/pooling.html

Dealing with Disconnects
https://docs.sqlalchemy.org/en/13/core/pooling.html?highlight=disconnects#dealing-with-disconnects

How can I profile a SQLAlchemy powered application?
https://docs.sqlalchemy.org/en/13/faq/performance.html

How to increase MySQL connections(max_connections)?
https://stackoverflow.com/questions/22297773/how-to-increase-mysql-connectionsmax-connections

Query Cache
https://mariadb.com/kb/en/library/query-cache/

SHOW PROFILE Syntax
https://dev.mysql.com/doc/refman/5.5/en/show-profile.html

Leave a comment

Comment anonymously or log in to comment.

Comments

Leave a reply

Reply anonymously or log in to reply.