Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEW Add ORM abstraction for "WITH" clauses #10943

Merged

Conversation

GuySartorelli
Copy link
Member

@GuySartorelli GuySartorelli commented Sep 12, 2023

Note that several times the code or comments in this PR refer to "CTE" - that stands for "Common Table Expression" and is a very very common way to refer to a WITH statement in SQL, making it a lot easier to find with a text search since "with" is such a common word.

Issue

@GuySartorelli GuySartorelli marked this pull request as draft September 12, 2023 00:37
@GuySartorelli GuySartorelli force-pushed the pulls/5/recursive-orm-queries branch 11 times, most recently from 491d5b2 to 08ab07e Compare September 18, 2023 21:15
@GuySartorelli GuySartorelli force-pushed the pulls/5/recursive-orm-queries branch 3 times, most recently from feb2260 to 53f3750 Compare September 21, 2023 00:06
@GuySartorelli GuySartorelli force-pushed the pulls/5/recursive-orm-queries branch 5 times, most recently from d4eac7c to 983bd0a Compare September 25, 2023 03:10
Comment on lines +798 to +843
/**
* Tests that CTE queries have appropriate JOINs for subclass tables etc.
* If `$query->query()->` was replaced with `$query->query->` in DataQuery::with(), this test would throw an exception.
* @doesNotPerformAssertions
*/
public function testWithUsingDataQueryAppliesRelations()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using @doesNotPerformAssertions is the correct way to mark that a test passes simply by not throwing an exception. There's another test in the codebase that also does this.
The alternative would be to either:

  1. add an assertion in here like $this->assertTrue(true); which is redundant with the @doesNotPerformAssertions annotation, or
  2. add an assertion testing the result of the query, which makes it look like we care about the result of the query, which we don't

@GuySartorelli GuySartorelli force-pushed the pulls/5/recursive-orm-queries branch 5 times, most recently from 92b5d80 to e68f33e Compare September 25, 2023 03:54
@GuySartorelli GuySartorelli marked this pull request as ready for review September 25, 2023 03:54
Comment on lines +344 to +347
// Assume it's lower if it's not a proper version number
if (!preg_match('/^(\d+\.){2}\d+$/', $actualVersion)) {
return -1;
}
Copy link
Member Author

@GuySartorelli GuySartorelli Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doing this for safety against things like a proxy-db which could have a weird version syntax. I'd much rather just assume those don't support it so we can safely use alternative queries rather than throwing exceptions due to unexpected version strings or incorrectly saying it's supported when it isn't.

@GuySartorelli GuySartorelli force-pushed the pulls/5/recursive-orm-queries branch 3 times, most recently from 2ceb05e to ca92098 Compare September 26, 2023 04:05
Copy link
Member

@emteknetnz emteknetnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only looked at the code portion, I haven't looked at tests yet

src/ORM/Connect/DBQueryBuilder.php Show resolved Hide resolved
* @param string[] $cteFields Aliases for any columns selected in $query which can be referenced in any queries
* UNIONed to the $query and in this query directly, as though they were columns in a real table.
* NOTE: If $query is a DataQuery, then cteFields must be the names of real columns on that DataQuery's data class.
* @param string|string[] $onClause The "ON" clause (escaped SQL statement) for an INNER JOIN on the query.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should just not have the $onClause param at, seems like we're trying to be helpful by making it a tiny bit easier to add an INNER JOIN at the cost of impurifying this method?

Honestly I think if you're using this API at this point you've probably already written you complex query with a CTE manually using DB::query() and now you're just trying to do it "properly" with more ORM

Copy link
Member Author

@GuySartorelli GuySartorelli Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should just not have the $onClause param at, seems like we're trying to be helpful by making it a tiny bit easier to add an INNER JOIN at the cost of impurifying this method?

Inner join is (from what I've seen while I was looking into how to implement this) the most common type of join used in CTE queries. i.e. you're using the CTE to filter your result set to include only the records that are discovered by the CTE, and which exist in the main database table you're querying.
So this caters for the most common scenario while allowing alternatives.

Honestly I think if you're using this API at this point you've probably already written you complex query with a CTE manually using DB::query() and now you're just trying to do it "properly" with more ORM

I don't know where that assumption comes from. It hasn't been supported until MySQL 8 so the likelyhood of any DB::query() using CTEs in existing project codebases is pretty low.
There's a lot of room to use this API in core code now that we're introducing it, and we don't have any purely raw SQL queries using CTEs right now.

Copy link
Member

@emteknetnz emteknetnz Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what I meant is that because CTE's are complex, personally I'd write it out as raw SQL first using DB::query() and then convert it to use the ORM later and validate it against my raw query. I wouldn't start out with alterDataQuery() :-)

I still don't think we should be polluting this method with this inner join "helper", after all you can just go $dataQuery->with(<withParams>)->innerJoin(<innerJoinParams>) to do the same thing which also reads better

Copy link
Member Author

@GuySartorelli GuySartorelli Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what I meant is that because CTE's are complex, personally I'd write it out as raw SQL first using DB::query() and then convert it to use the ORM later and validate it against my raw query. I wouldn't start out with alterDataQuery() :-)

Ahh yup that makes sense, fair enough.

I still don't think we should be polluting this method with this inner join "helper", after all you can just go $dataQuery->with(<withParams>)->innerJoin(<innerJoinParams>) to do the same thing which also reads better

Inner join will be by far the most common join for a CTE query. Part of the point of having higher level abstractions is that they take care of common scenarios for you, which is what is happening here. So this really is all about convenience.

To put it into perspective:

  • The only time you'd want a left join is if you want all of the records in the main query, and are only adding to the result set with the CTE. I can't think of a single scenario where you'd want to to that.
  • You'd use a right join only when your CTE is filling in some gaps in your DataQuery result set. There's an example of that in the unit tests, where the CTE gets the full date range between the min and max dates in the result set, and adds a bunch of null results for the missing dates. There's really only a handful of cases where you'd want to do something like this, and in probably 90% of those cases you'd more likely just do it directly in PHP instead.
  • Inner join would be used when you want to effectively filter your result set based on the CTE query. This covers scenarios like getting the ancestors/descendents for a record, or getting all archived versions of records (where finding "wasDeleted" versions in the latest_versions set is the non-recursive CTE, which gets inner joined on ID = version ID).

In the scenarios I've identified so far where we could use CTEs to simplify or optimise queries in our codebase, none of them would use a left or right join.

Regarding the "pollution" of this method, I don't understand which part of it you consider pollution, or how it represents a pollution of the method?

Regarding chaining an innerJoin() instead of passing in args, one of the main points of having the $onClause argument is the array syntax, which greatly simplifies the information you need to pass in to create the join in the first place. The join method call would look something like this otherwise: innerJoin(DataObject::getSchema()->sqlColumnForField($dataQuery->dataClass(), 'JoinField') . ' = ' . Convert::symbol2sql([$cteName, 'CTEField'])]);
Yes obviously some of that can be made into variables but it's a lot more fussing about than ['JoinField' => 'CTEField']

Copy link
Member

@emteknetnz emteknetnz Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we keep the $onClause param at very least we kind of need to get rid of array support because it's simply inconsistent with the rest of the ORM, e.g. DataList::innerJoin() which should be more "helpful" given it's the API projects actually use only supports the string param for $onClause not an array param

Once the with() method is down to only supporting string params then at that point all we're doing is $this->query->addInnerJoin($name, $onClause); so at that point you may as well just go ->with()->innerJoin($onClause)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't understand where you're coming from with this, and especially I don't understand why you keep putting quotes around "helpful" lol. I can't see a world where this isn't useful additional abstraction.
However, given that you also seemingly can't see where I'm coming from, I'll remove this piece of code. Just don't be surprised if in a year's time we have a lot of this

$joinField = DataObject::getSchema()->sqlColumnForField($dataQuery->dataClass(), 'JoinField');
$cteJoinField = Convert::symbol2sql([$cteName, 'CTEField']);
$dataQuery->with(...)->innerJoin("$joinField = $cteJoinField");

that we want to abstract away.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also

e.g. DataList::innerJoin() which should be more "helpful" given it's the API projects actually use only supports the string param for $onClause not an array param

I absolutely agree that the DataList join abstractions are shit and should be made more useful. If adding this array syntax support to those would help you accept this I'd very gladly raise a PR to do that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the $onClause parameter, though I still say this makes the abstraction considerably less useful.

src/ORM/Connect/DBQueryBuilder.php Outdated Show resolved Hide resolved
src/ORM/DataQuery.php Outdated Show resolved Hide resolved
src/ORM/DataQuery.php Show resolved Hide resolved
src/ORM/DataQuery.php Outdated Show resolved Hide resolved
@GuySartorelli GuySartorelli force-pushed the pulls/5/recursive-orm-queries branch from 4a9d157 to 900b4c3 Compare September 28, 2023 20:44
@GuySartorelli
Copy link
Member Author

Ugh, now CI is failing lol. I'll take a look at what's going on there.

@GuySartorelli GuySartorelli force-pushed the pulls/5/recursive-orm-queries branch 2 times, most recently from 931bc08 to 043f610 Compare September 28, 2023 22:54
Copy link
Member

@emteknetnz emteknetnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there

src/ORM/DataQuery.php Show resolved Hide resolved
* @param string[] $cteFields Aliases for any columns selected in $query which can be referenced in any queries
* UNIONed to the $query and in this query directly, as though they were columns in a real table.
*/
public function addWith(string $name, self $query, array $cteFields = [], bool $recursive = false): static
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public function addWith(string $name, self $query, array $cteFields = [], bool $recursive = false): static
public function addWith(string $name, SQLSelect $query, array $cteFields = [], bool $recursive = false): static

Just make it a little easier to read

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - you allowed self as the typehint for union in a recent PR so I've changed that as well to be consistent.

src/ORM/Queries/SQLSelect.php Show resolved Hide resolved
src/ORM/DataQuery.php Show resolved Hide resolved
Copy link
Member

@emteknetnz emteknetnz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a couple of tiny changes to make to not use self for method param types

@GuySartorelli GuySartorelli force-pushed the pulls/5/recursive-orm-queries branch from 043f610 to 94dcd9f Compare October 2, 2023 01:02
@emteknetnz emteknetnz merged commit 44b1700 into silverstripe:5 Oct 2, 2023
@emteknetnz emteknetnz deleted the pulls/5/recursive-orm-queries branch October 2, 2023 02:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants