fix: vastly reduces number of db queries #252

abaisero · 2022-04-11T04:48:04Z

Problem: The creation of some tables was causing a large number of
queries to be run which itself scaled with the size of the database,
i.e., larger databases (such as the one in production) caused more
queries to be made the the db itself (a problem which componds on
itself). This issue was not apparent in development because of the
smaller size of the development db.

First, we can verify the above problem by increasing the size of the
development db. We can do this by temporarily editing
scripts/entrypoint.sh, chanding line

    python make_fake_fixtures.py 1000 1000 1000 > /tmp/fake_agagd_data.json

with

    python make_fake_fixtures.py 100 10000 100 > /tmp/fake_agagd_data.json

and then deleting and recreating the docker images/containers/volumes.

Before the change, the development database was creating data for 1000
players, 1000 games, and 1000 tournaments. Note that this meant that,
on average, each player only played one game and each tournament only
contained one game. This was helping mask the primary issue. After the
change, the development database is creating 100 players, 10_000 games,
and 100 tournaments, meaning that, on average, each player played 100
games, and each tournament contained 100 games. We note that, although
this new development db is still significantly smaller than the one in
production, the local development app is already having trouble
responding to requests in involving player and tournament pages.

With the new larger development database in place, we observe the
following concerning facts:

a random player page requires ~1000 queries to be made.
a random tournament page requires ~100 queries to be made.

Solution:

This commit addresses the above issues by making 2 primary changes in
how some tables are computed or rendered:

The first involves the way player names and ids were rendered in the
game tables using the custom
agagd_core.tables.players.LinkFullMembersNameColumn. This custom
column had a .render() method which internally made a query to the
db. Then, for the rendering of the whole table, the .render()
method was being called for every entry in the table data, causing
many queries to be made. To solve this issue, we removed
LinkFullMembersNameColumn altogether, and constructed the
appropriate queries to construct the "player-name (player-id)" label
once for all entries in the table.
The second involves the way opponent and tournament data was
collected to create the opponent and tournament tables in the
player_profile view. This data was collected and manipulated in pure
python, using explicit python loops over the Games model, and at
least one db query per iteration. To solve this issue, we created
appropriate queries which gathered and aggregated the required data
using a couple of large queries (rather than many small ones).

After the above changes:

the number of queries made in a random player page dropped from ~1000
to ~10.
the number of queries made in a random tournament page dropped from
~100 to ~5.

Problem: The creation of some tables was causing a large number of queries to be run which itself scaled with the size of the database, i.e., larger databases (such as the one in production) caused more queries to be made the the db itself (a problem which componds on itself). This issue was not apparent in development because of the smaller size of the development db. First, we can verify the above problem by increasing the size of the development db. We can do this by temporarily editing `scripts/entrypoint.sh`, chanding line python make_fake_fixtures.py 1000 1000 1000 > /tmp/fake_agagd_data.json with python make_fake_fixtures.py 100 10000 100 > /tmp/fake_agagd_data.json and then deleting and recreating the docker images/containers/volumes. Before the change, the development database was creating data for 1000 players, 1000 games, and 1000 tournaments. Note that this meant that, on average, each player only played one game and each tournament only contained one game. This was helping mask the primary issue. After the change, the development database is creating 100 players, 10_000 games, and 100 tournaments, meaning that, on average, each player played 100 games, and each tournament contained 100 games. We note that, although this new development db is still significantly smaller than the one in production, the local development app is already having trouble responding to requests in involving player and tournament pages. With the new larger development database in place, we observe the following concerning facts: * a random player page requires ~1000 queries to be made. * a random tournament page requires ~100 queries to be made. Solution: This commit addresses the above issues by making 2 primary changes in how some tables are computed or rendered: 1. The first involves the way player names and ids were rendered in the game tables using the custom `agagd_core.tables.players.LinkFullMembersNameColumn`. This custom column had a `.render()` method which internally made a query to the db. Then, for the rendering of the whole table, the `.render()` method was being called for every entry in the table data, causing many queries to be made. To solve this issue, we removed `LinkFullMembersNameColumn` altogether, and constructed the appropriate queries to construct the "player-name (player-id)" label once for all entries in the table. 2. The second involves the way opponent and tournament data was collected to create the opponent and tournament tables in the player_profile view. This data was collected and manipulated in pure python, using explicit python loops over the Games model, and at least one db query per iteration. To solve this issue, we created appropriate queries which gathered and aggregated the required data using a couple of large queries (rather than many small ones). After the above changes: * the number of queries made in a random player page dropped from ~1000 to ~10. * the number of queries made in a random tournament page dropped from ~100 to ~5.

abaisero · 2022-04-11T04:52:21Z

I will note that this commit gets rid of a complicated part of code which was constructing the data for some tables, and reimplements similar logic using more straightforward queries. However, in doing so, some facets of the previous data-collection may have been changed. I will add some comments in the places where I know this is happening, so that somebody can review the importance of those aspects.

abaisero · 2022-04-11T04:54:08Z

agagd/agagd_core/views/players_profile.py

-                # Set default game_date to None
-                game_date = None
-
-                # Check for 0000-00-00 dates


The purpose of this check is not entirely clear; supposedly all games should have a valid date? This piece of logic, which leaves the game_date as None in selected cases, is not reflected in the new implementation.

the records in the games database are not clean or unified. generally a good idea to assume that the checks are because there's bad data in prod that are >20 years old :) I'll see if i can run some queries.

very fair, but I'm still a bit confused as to how what exactly this check is doing;

the Game table uses DateField to represent dates, which returns datetime.date objects. So this comparison against a string would always return True, meaning that the field is always used anyway.. Further, you cannot create a `datetime.date(0, 0, 0)' object to begin with, as the year/month/day combination is out of range, so it's not like there's any chance that any of the dates actually match '0000-00-00'. I'm a bit skeptical this check is doing anything at all; if there's a way for you to access the database and find some examples of problematic game dates, that'll help clear a light into what's going on here, and how I can integrate the special cases in the current query.

abaisero · 2022-04-11T04:56:16Z

agagd/agagd_core/views/players_profile.py

-                tourney_data.values(),
-                key=lambda d: d.get("date", date.today()) or date.today(),
-                reverse=True,
-            ),


The purpose of giving tourney_data twice to the PlayersTournamentTable is unclear to me. Especially since the first time is unsorted while the second time is sorted. This may represent some kind of logic which is not reflected in the new implementation.

Before this commit, the construction of "player_name (player_id)" was performed at the db query level by creating a new composite data field. After this commit, the construction of "player_name (player_id)" is performed post-query as a simpler rendering function based on given fields.

amj

this is definitely faster but the tournament totals are not working correctly.

amj · 2022-05-01T21:11:42Z

agagd/agagd_core/views/players_profile.py

-                # Set default game_date to None
-                game_date = None
-
-                # Check for 0000-00-00 dates


the records in the games database are not clean or unified. generally a good idea to assume that the checks are because there's bad data in prod that are >20 years old :) I'll see if i can run some queries.

amj · 2022-05-01T21:17:17Z

agagd/agagd_core/views/players_profile.py

+                tournament_date=F("tournament_code__tournament_date"),
+                tournament_total_players=F("tournament_code__total_players"),
+                date=F("game_date"),
+                won=Count("game_id", filter=Q_won),


This doesn't seem to be working -- all tournaments report just a single game. Possibly missing a group-by?

I don't see them always as being a single game with the fixture data, but I do see something fishy going on; a bit hard to diagnose because the fixture data is not realistic, e.g., many players share the same name, and a tournament is made up from games from different dates. Currently looking into this to see what is the problem.

found the issue; the entire aggregation was wrong because I was trying to construct the data starting from the Games table rather than the Tournament table.. Already have a fix; I'll push it after I address the other feedback.

Actually nvm; I've pushed this one for now, since I'm not sure what to make of the other issue. Latest commit should fix the tournament table in the player profile.

the previous commit which reduces the number of db queries was constructing tournament tables in player profiles with wrong numbers of won/lost games. This commit fixes that issue.

abaisero requested a review from a team as a code owner April 11, 2022 04:48

abaisero commented Apr 11, 2022

View reviewed changes

amj reviewed May 1, 2022

View reviewed changes

fix: tournaments table in player profile

65649ca

the previous commit which reduces the number of db queries was constructing tournament tables in player profiles with wrong numbers of won/lost games. This commit fixes that issue.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: vastly reduces number of db queries #252

fix: vastly reduces number of db queries #252

abaisero commented Apr 11, 2022

abaisero commented Apr 11, 2022

abaisero Apr 11, 2022

amj May 1, 2022

abaisero May 2, 2022

abaisero Apr 11, 2022

amj left a comment

amj May 1, 2022

amj May 1, 2022

abaisero May 1, 2022

abaisero May 2, 2022

abaisero May 2, 2022

fix: vastly reduces number of db queries #252

Are you sure you want to change the base?

fix: vastly reduces number of db queries #252

Conversation

abaisero commented Apr 11, 2022

abaisero commented Apr 11, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

amj left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment