
No fast-track for bulk inserts in Cursor.executemany with INSERT/REPLACE syntax introduced in MySQL 8.0.19 #116

Open
kozchris opened this issue Jan 24, 2025 · 2 comments

Comments

@kozchris

asyncmy does not handle the new bulk-insert syntax correctly. The same issue also exists in aiomysql, where there is a good explanation of what is happening: aio-libs/aiomysql#968

MySQL is deprecating this syntax:

INSERT INTO t1 (a,b,c) VALUES (1,2,3),(4,5,6)
  ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b);

and asyncmy emits a warning about it. However, if you try to use the new syntax:

INSERT INTO t1 (a,b,c) VALUES (1,2,3),(4,5,6) AS new(m,n,p)
  ON DUPLICATE KEY UPDATE c = m+n;

the insertion speed becomes extremely slow.
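
For context, the slowdown is consistent with the explanation in the aiomysql issue: PyMySQL-family cursors decide in executemany whether a statement qualifies for the multi-row fast path by matching it against a VALUES regex, and anything that does not match falls back to one execute call per row. Here is a small sketch of that check; the pattern below is a reconstruction of PyMySQL's RE_INSERT_VALUES, and asyncmy (a port of PyMySQL) is assumed to apply the same logic:

import re

# Reconstruction of PyMySQL's RE_INSERT_VALUES fast-path check (assumed to be
# shared by asyncmy). Group 2 must be the VALUES placeholder tuple and group 3
# may only be an ON DUPLICATE KEY UPDATE clause, so the "AS new (...)" row
# alias makes the whole pattern fail to match.
RE_INSERT_VALUES = re.compile(
    r"\s*((?:INSERT|REPLACE)\b.+\bVALUES?\s*)"
    r"(\(\s*(?:%s|%\(.+\)s)\s*(?:,\s*(?:%s|%\(.+\)s)\s*)*\))"
    r"(\s*(?:ON DUPLICATE.*)?);?\s*\Z",
    re.IGNORECASE | re.DOTALL,
)

old_style = (
    "INSERT INTO t1 (a, b, c) VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE c = VALUES(a) + VALUES(b)"
)
new_style = (
    "INSERT INTO t1 (a, b, c) VALUES (%s, %s, %s) AS new (m, n, p) "
    "ON DUPLICATE KEY UPDATE c = m + n"
)

print(bool(RE_INSERT_VALUES.match(old_style)))  # True  -> rewritten as one multi-row INSERT
print(bool(RE_INSERT_VALUES.match(new_style)))  # False -> per-row fallback, hence the slowdown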

I'm attaching the same example program as I did in the aiomysql bug.

I am seeing this issue in asyncmy = "0.2.10".

From my test app:

DEBUG:root:On deprecated named col duplicate insert of 8000 rows took 0.275493860244751 seconds
DEBUG:root:On named col duplicate insert of 8000 rows took 24.421008110046387 seconds
DEBUG:root:On placeholder duplicate insert of 8000 rows took 1.0064198970794678 seconds

I'll attach a sample program demonstrating the issue. For ease, add a .env file with the following information for your DB and then run the db_test2.py file (a rough sketch of that benchmark follows below for reference):

#.env

DATABASE_USER=xxx
DATABASE_PASSWORD=yyy
DATABASE_HOST=aaa
DATABASE_NAME=zzzz

db_test2.py.zip
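
For reference, a minimal sketch of the kind of benchmark db_test2.py presumably runs. The table name and schema here are placeholders, and the asyncmy connect/cursor/executemany calls follow its documented API, but parameter names should be treated as assumptions:

# Assumes a table t1(a INT PRIMARY KEY, b INT, c INT) already exists and that
# the .env values above are exported as environment variables (.env loading omitted).
import asyncio
import os
import time

from asyncmy import connect

ROWS = [(i, i * 2, i * 3) for i in range(8000)]

DEPRECATED_SQL = (
    "INSERT INTO t1 (a, b, c) VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE c = VALUES(a) + VALUES(b)"
)
ROW_ALIAS_SQL = (
    "INSERT INTO t1 (a, b, c) VALUES (%s, %s, %s) AS new (m, n, p) "
    "ON DUPLICATE KEY UPDATE c = m + n"
)


async def main():
    # Connection parameter names assumed from the .env keys above.
    conn = await connect(
        host=os.environ["DATABASE_HOST"],
        user=os.environ["DATABASE_USER"],
        password=os.environ["DATABASE_PASSWORD"],
        database=os.environ["DATABASE_NAME"],
    )
    async with conn.cursor() as cursor:
        for label, sql in (("deprecated", DEPRECATED_SQL), ("row alias", ROW_ALIAS_SQL)):
            start = time.monotonic()
            await cursor.executemany(sql, ROWS)
            await conn.commit()
            print(f"{label} insert of {len(ROWS)} rows took {time.monotonic() - start} seconds")


asyncio.run(main())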

@kozchris
Author

An additional note: in my example, changing the syntax from something like:

INSERT INTO my_table (c1, c2)
VALUES (%(c1)s, %(c2)s)
AS new
ON DUPLICATE KEY UPDATE
c2=new.c2

(with named values as the input) to something like:

INSERT INTO my_table (c1, c2)
VALUES (%s, %s )
AS new 
ON DUPLICATE KEY UPDATE
c2=new.c2

(with plain rows as the input) made the query perform almost as fast as the deprecated syntax. However, when I tried to create a generic table to demonstrate the phenomenon, I couldn't, so something is also special about the example table structure. That could be related to the underlying issue, or it could be a separate issue.
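
For concreteness, here is a sketch of the two parameter shapes being compared above; my_table and its columns are hypothetical and taken only from the snippets in this comment:

from typing import Sequence

NAMED_SQL = (
    "INSERT INTO my_table (c1, c2) VALUES (%(c1)s, %(c2)s) "
    "AS new ON DUPLICATE KEY UPDATE c2 = new.c2"
)
POSITIONAL_SQL = (
    "INSERT INTO my_table (c1, c2) VALUES (%s, %s) "
    "AS new ON DUPLICATE KEY UPDATE c2 = new.c2"
)


async def upsert_named(cursor, rows: Sequence[dict]):
    # rows like [{"c1": 1, "c2": 10}, ...] -- the slow variant reported above
    await cursor.executemany(NAMED_SQL, rows)


async def upsert_positional(cursor, rows: Sequence[tuple]):
    # rows like [(1, 10), ...] -- reported as nearly as fast as the deprecated syntax
    await cursor.executemany(POSITIONAL_SQL, rows)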

@kozchris
Author

Another note: the mysql-connector-python library handles this syntax with no problems.
