-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Got rid of redundant JOINs/WHEREs of WhereInWalker #10917
base: 3.3.x
Are you sure you want to change the base?
Got rid of redundant JOINs/WHEREs of WhereInWalker #10917
Conversation
e5d9d71
to
e60ba0f
Compare
e60ba0f
to
2d74ff0
Compare
Please explain what problem exactly you're attempting to solve with your change. And please have a look at the failing tests. |
@derrabus please check the updated PR description, and also I pushed some code changes that fix coding standards/test and added testRedunandQueryPartsAreRemovedForWhereInWalker test which demonstrates this PR changes. |
@SerheyDolgushev do you have any benchmarks to show the performance improvement from this PR? I'm interested to see if there is any real-world benefit besides tweaking the generated query. |
@cjunge I see your concerns, but performance improvement depends on the actual query. The more joins and conditions the original query has, the more its performance will improve with this change. I just used a simple CREATE TABLE cms_users (
id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
name VARCHAR(32) NOT NULL,
PRIMARY KEY(id)
);
CREATE TABLE cms_addresses (
id BIGINT UNSIGNED AUTO_INCREMENT NOT NULL,
user_id BIGINT UNSIGNED NOT NULL,
country CHAR(2) NOT NULL,
city VARCHAR(32) NOT NULL,
street VARCHAR(32) NOT NULL,
PRIMARY KEY(id)
);
ALTER TABLE cms_addresses ADD CONSTRAINT FK_user FOREIGN KEY (user_id) REFERENCES cms_users (id);
INSERT INTO cms_users (`name`)
VALUES ('John'), ('Nicole'), ('David');
INSERT INTO cms_addresses (`user_id`, `country`, `city`, `street`)
VALUES
(1, 'UK', 'London', 'Street #1'),
(1, 'US', 'Tampa', 'Street #2'),
(2, 'FR', 'Paris', 'Street #3'),
(3, 'US', 'Miami', 'Street #4')
; And let's assume that in the original query, you are getting users with US addresses. In this case, the first query to get unique entity IDs will be: mysql> SELECT DISTINCT c0_.id as id_0
-> FROM cms_users c0_
-> INNER JOIN cms_addresses c1_ ON c1_.user_id = c0_.id
-> WHERE c1_.country = 'US';
+------+
| id_0 |
+------+
| 1 |
| 3 |
+------+
2 rows in set (0.00 sec) And without this PR changes the second query to fetch actual users data will be: SELECT
c0_.id AS id_0,
c0_.name AS name_1
FROM cms_users c0_
INNER JOIN cms_addresses c1_ ON c1_.user_id = c0_.id
WHERE
c1_.country = 'US'
AND c0_.id IN (1, 3); But with this PR changes the second query to fetch actual users data will be: SELECT
c0_.id AS id_0,
c0_.name AS name_1
FROM cms_users c0_
WHERE c0_.id IN (1, 3); The difference between these two queries is: mysql> EXPLAIN SELECT
-> c0_.id AS id_0,
-> c0_.name AS name_1
-> FROM cms_users c0_
-> INNER JOIN cms_addresses c1_ ON c1_.user_id = c0_.id
-> WHERE
-> c1_.country = 'US'
-> AND c0_.id IN (1, 3);
+----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | c0_ | NULL | range | PRIMARY | PRIMARY | 8 | NULL | 2 | 100.00 | Using where |
| 1 | SIMPLE | c1_ | NULL | ref | FK_user | FK_user | 8 | test.c0_.id | 1 | 25.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec) VS mysql> EXPLAIN SELECT
-> c0_.id AS id_0,
-> c0_.name AS name_1
-> FROM cms_users c0_
-> WHERE c0_.id IN (1, 3);
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | c0_ | NULL | range | PRIMARY | PRIMARY | 8 | NULL | 2 | 100.00 | Using where |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec) This will become more noticeable the more joins/conditions/rows are used in the query. Regarding introducing additional complexity with this PR, I tried to cover just the most used and basic cases to make this as simple as possible. Also, all the changes in the unit tests should give a good perspective on the context here. Please let me know if there are any other questions you would like to discuss. |
There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days. |
@SerheyDolgushev Looks like this PR did not get much traction, which is probably due to its complexity and the nieche problem that it solves. We don't introduce this kind of improvement to ORM 2 anymore, so the PR would need to be reworked to ORM 3. Do you want to do that or shall we close the PR? |
@derrabus to be honest I had to go through the PR description to refresh my memories about it. And seems like it is still useful change and brings value. So I merged it with |
There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days. |
I am leaving this comment to keep this PR active (and pinging @derrabus). |
Doctrine\ORM\Tools\Pagination\Paginator
is an amazing tool that lets you fetch entities in chunks. And it sends two separate SQL queries:For simplicity let's call them id and data queries.
To demonstrate it, lets use CmsUser and CmsAddress entities:
The code above will trigger two SQL queries:
id
data
The problem is that the data query has:
INNER JOIN cms_addresses c1_
c1_.city = ?
This PR tries to optimize the data query.
Basically, this PR does the following:
JOIN
s are skipped for the data query if not any join is used inSELECT
,GROUP BY
, orORDER BY
clauses. It is done only if theHAVING
clause is not specified (simplicity is the main reason, and it can be optimized in the future).I want to clarify that the EasyAdmin
createIndexQueryBuilder
triggered this PR, and all the index actions in EasyAdmin should benefit from it.