FIX Use ORM functions to perform eager loading instead of raw query #11509

Open
wants to merge 1 commit into
base: 5.3

Kevinn1109

Description

The original query was hardcoded and a rather hacky way of retaining the order of the child objects while fetching from a many_many table. It retained this order with 'FIELD', a MySQL-specific function that breaks on other SQL engines. This change converts the query to use the ORM functions and reuses the ordering of the original fetch query directly.

Manual testing steps

  1. Create and populate two DataObjects that have a many_many relation between them
  2. Perform an eagerLoad on DataObject::get() and convert the result to an array
  3. Observe that the new SQLSelect query is performed correctly
  4. Observe that grabbing the first relation entry from the first array object is the expected DataObject and does not perform another query

Issues

Pull request checklist

  • The target branch is correct
  • All commits are relevant to the purpose of the PR (e.g. no debug statements, unrelated refactoring, or arbitrary linting)
    • Small amounts of additional linting are usually okay, but if it makes it hard to concentrate on the relevant changes, ask for the unrelated changes to be reverted, and submitted as a separate PR.
  • The commit messages follow our commit message guidelines
  • The PR follows our contribution guidelines
  • Code changes follow our coding conventions
  • This change is covered with tests (or tests aren't necessary for this change)
  • Any relevant User Help/Developer documentation is updated; for impactful changes, information is added to the changelog for the intended release
  • CI is green

. ' ORDER BY FIELD(' . $childIDField . ', ' . $fetchedIDsAsString . ')'
);
$joinRows = SQLSelect::create()
->setSelect('"' . $joinTable . '".' . "*")
Member

Suggested change
->setSelect('"' . $joinTable . '".' . "*")

It will select everything by default, I believe.

Author

Removing the table-specific select may lead to ambiguity with the columns from the joined table. This ensures that we're always working with the correct values.
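
To illustrate the ambiguity concern, here is a minimal, framework-free sketch of what can happen when a joined result is read as an associative array and both tables have an `ID` column. The table and column names (`Team_Players`, `TeamID`, `PlayerID`, `Name`) are hypothetical, and `array_merge` only stands in for a driver that keys each row by bare column name; it is not how the ORM fetches rows.

```php
<?php
// Hypothetical rows: one from the join table, one from the joined child table.
$joinRow  = ['ID' => 10, 'TeamID' => 1, 'PlayerID' => 7];
$childRow = ['ID' => 7, 'Name' => 'Alice'];

// SELECT * over the join: both "ID" columns share one array key, so the later one wins.
$selectAll = array_merge($joinRow, $childRow);

// SELECT "Team_Players".*: only the join table's columns are returned.
$selectJoinTableOnly = $joinRow;

echo $selectAll['ID'], PHP_EOL;           // the child's ID, not the join row's
echo $selectJoinTableOnly['ID'], PHP_EOL; // the join row's ID
```

With the table-specific select, the join row's own columns cannot be overwritten by same-named columns from the joined table.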

src/ORM/DataList.php (outdated)
Comment on lines 1372 to 1373
->addWhere('"' . $parentIDField . '" IN (' . implode(',', $parentIDs) . ')')
->addWhere('"' . $childIDField . '" IN (' . $fetchedIDsAsString . ')')
Member

You need to include the table name in addWhere so there's no ambiguity if we change the query in the future.

->setFrom([$joinTable => $joinTable])
->addWhere('"' . $parentIDField . '" IN (' . implode(',', $parentIDs) . ')')
->addWhere('"' . $childIDField . '" IN (' . $fetchedIDsAsString . ')')
->addLeftJoin($childTable, "$childTable.ID = $joinTable.$childIDField")
Member

Why do we need this join?

Author

Let's say we have a join table AB linking A's many_many to B. The original query preserved the order of the items fetched from table B by putting their IDs in the FIELD list. This change joins table B instead, so that the ORDER BY used to fetch the B items can be reused in this query.

The perfect solution would be to right join the join table to the relation query (line 1350), which removes the need for this query altogether, though I'm not sure what the implications will be. I'll have to look into that.
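
For readers following along, a rough sketch of the SQL shape this describes. `sketchJoinQuery` is a hypothetical helper written only for this illustration, the names `A_Bs`, `AID`, `BID`, and `Title` are made up, and the string it returns is not the exact output of `SQLSelect`; it only shows the join table being joined to the child table so the child's ORDER BY can be applied.

```php
<?php
/**
 * Hypothetical helper: builds SQL of roughly the shape described above.
 * It is an illustration only, not what SQLSelect generates.
 */
function sketchJoinQuery(
    string $joinTable,
    string $childTable,
    string $parentIDField,
    string $childIDField,
    array $parentIDs,
    array $fetchedIDs,
    string $childOrderBy
): string {
    // Cast IDs to integers so they are safe to interpolate.
    $parentList = implode(',', array_map('intval', $parentIDs));
    $childList = implode(',', array_map('intval', $fetchedIDs));

    return 'SELECT "' . $joinTable . '".* FROM "' . $joinTable . '"'
        . ' LEFT JOIN "' . $childTable . '" ON "' . $childTable . '"."ID" = "' . $joinTable . '"."' . $childIDField . '"'
        . ' WHERE "' . $joinTable . '"."' . $parentIDField . '" IN (' . $parentList . ')'
        . ' AND "' . $joinTable . '"."' . $childIDField . '" IN (' . $childList . ')'
        . ' ORDER BY ' . $childOrderBy;
}

// The last argument stands in for whatever ORDER BY the child query used.
echo sketchJoinQuery('A_Bs', 'B', 'AID', 'BID', [1, 2], [3, 2, 1], '"B"."Title" DESC'), PHP_EOL;
```

The point is that no engine-specific function is needed: the ordering comes from the child table's own sort columns.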

->addWhere('"' . $parentIDField . '" IN (' . implode(',', $parentIDs) . ')')
->addWhere('"' . $childIDField . '" IN (' . $fetchedIDsAsString . ')')
->addLeftJoin($childTable, "$childTable.ID = $joinTable.$childIDField")
->setOrderBy($fetchedOrderBy)
Member

What is this orderby? How do we know it's what we actually want?

Author

This order by is the replacement for the FIELD order by in the original query as explained in the other comment.

$query = $fetchList->dataQuery()->query();
$fetchedOrderBy = $query->getOrderBy();
$childTables = $query->queriedTables();
$childTable = reset($childTables);
Member

Why do we arbitrarily take the first table?

Author

As far as I could tell this is the only way to derive the base table (the FROM) from a DataList query. An alternative is to grab the table name from the baseClass data object.

I'll add comments to my next commit to make clear what's happening exactly.

LennyObez commented Dec 23, 2024

Alternative proposal: Optimize sorting logic with CASE

Your solution is a clean implementation that aligns well with SilverStripe ORM practices. However, I propose an alternative approach that focuses on performance and simplicity, particularly for large datasets or scenarios where additional joins might introduce unnecessary overhead.


Differences between the two approaches

| Aspect | Your solution | Proposed alternative |
| --- | --- | --- |
| Performance | Relies on an additional join (addLeftJoin) and follows $fetchedOrderBy, which may introduce overhead for large datasets. | Uses a CASE SQL clause for sorting, avoiding unnecessary joins. |
| Compatibility | Compatible with various SQL databases, avoids raw SQL. | Compatible with various SQL databases; uses raw SQL with a manual CASE expression. |
| Conformity | Fully conforms to SilverStripe ORM standards via SQLSelect. | Bypasses ORM methods, introducing direct SQL for simplicity and efficiency. |
| Maintainability | More maintainable for developers familiar with SilverStripe ORM. | Slightly less maintainable due to manual SQL logic. |
| Use case | Preferred when maintaining conformity and ORM standards. | Ideal for high-performance requirements without additional joins. |

Proposed implementation

Below is the alternative logic for the join records, utilizing a CASE SQL clause for ordering while ensuring compatibility across supported databases:

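$joinRows = [];
if (!empty($parentIDs) && !empty($fetchedIDs)) {
    // Cast IDs to integers so they are safe to interpolate into the SQL
    $parentIDList = implode(',', array_map('intval', $parentIDs));
    $fetchedIDList = implode(',', array_map('intval', $fetchedIDs));

    // Construct CASE-based ORDER BY clause: each ID sorts by its position in $fetchedIDs
    $orderClauses = [];
    foreach (array_values($fetchedIDs) as $index => $id) {
        $orderClauses[] = 'WHEN "' . $childIDField . '" = ' . (int) $id . ' THEN ' . $index;
    }
    $orderByCase = implode(' ', $orderClauses);

    // Construct the query. Identifiers are double-quoted because engines such as
    // PostgreSQL fold unquoted mixed-case names to lowercase.
    $joinQuery = 'SELECT * FROM "' . $joinTable . '"'
        . ' WHERE "' . $parentIDField . '" IN (' . $parentIDList . ')'
        . ' AND "' . $childIDField . '" IN (' . $fetchedIDList . ')'
        . ' ORDER BY CASE ' . $orderByCase . ' ELSE 999 END';

    $joinRows = DB::query($joinQuery);
}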
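
As a worked example of the ordering clause on its own, the sketch below builds the CASE expression for a small ID list and prints it. `buildOrderByCase` is a hypothetical standalone helper, not part of the ORM; it assumes the field name is passed in already quoted and that all IDs are integers.

```php
<?php
/**
 * Hypothetical helper: builds a CASE expression that sorts rows by the
 * position of their ID in $ids. IDs are cast to int before interpolation.
 */
function buildOrderByCase(string $quotedField, array $ids): string
{
    $clauses = [];
    foreach (array_values($ids) as $index => $id) {
        $clauses[] = 'WHEN ' . $quotedField . ' = ' . (int) $id . ' THEN ' . $index;
    }
    // Rows whose ID is not listed fall through to the ELSE branch and sort last.
    return 'CASE ' . implode(' ', $clauses) . ' ELSE 999 END';
}

echo buildOrderByCase('"ChildID"', [3, 2, 1]), PHP_EOL;
// CASE WHEN "ChildID" = 3 THEN 0 WHEN "ChildID" = 2 THEN 1 WHEN "ChildID" = 1 THEN 2 ELSE 999 END
```

Note that the expression grows by one WHEN branch per fetched ID, so its length scales with the size of the ID list.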

Unit test

The following unit test validates the functionality and correctness of the proposed implementation:

<?php

namespace SilverStripe\ORM\Tests\DataListTest\EagerLoading;

use SilverStripe\Dev\SapphireTest;
use SilverStripe\ORM\DB;

class CustomOrderEagerLoadingTest extends SapphireTest
{
    /**
     * Test that the ORDER BY CASE logic correctly sorts the results
     */
    public function testOrderByCaseSorting()
    {
        // Define test data
        $parentIDs = [1, 2];
        $fetchedIDs = [3, 2, 1]; // Expected order
        $expectedOrder = [3, 2, 1];

        $joinTable = 'TestJoinTable';
        $parentIDField = 'ParentID';
        $childIDField = 'ChildID';

        // Create a temporary test table
        DB::query("
            CREATE TEMPORARY TABLE {$joinTable} (
                {$childIDField} INT NOT NULL,
                {$parentIDField} INT NOT NULL
            )
        ");

        // Insert mock data
        DB::query("
            INSERT INTO {$joinTable} ({$childIDField}, {$parentIDField})
            VALUES (1, 1), (2, 2), (3, 1)
        ");

        // Construct the ORDER BY CASE logic
        $orderClauses = [];
        foreach ($fetchedIDs as $index => $id) {
            $orderClauses[] = "WHEN {$childIDField} = {$id} THEN {$index}";
        }
        $orderByCase = implode(' ', $orderClauses);

        // Build the query with ORDER BY CASE
        $query = "
            SELECT *
            FROM {$joinTable}
            WHERE {$parentIDField} IN (" . implode(',', $parentIDs) . ")
              AND {$childIDField} IN (" . implode(',', $fetchedIDs) . ")
            ORDER BY CASE {$orderByCase} ELSE 999 END
        ";

        // Execute the query and convert results to an array
        $result = DB::query($query);
        $rows = [];
        foreach ($result as $row) {
            $rows[] = $row;
        }

        // Extract the actual order of fetched ChildIDs
        $actualOrder = array_column($rows, $childIDField);

        // Assert that the results are ordered as expected
        $this->assertEquals(
            $expectedOrder,
            $actualOrder,
            'The results should maintain the custom order defined by fetchedIDs'
        );
    }
}

Contributor

lekoala commented Dec 23, 2024

In addition to that, it might make sense to add a sortByField function to the abstract database class that outputs the CASE WHEN helper (instead of inlining the code just for the eager loading part).
The MySQL database class could then override this function and use its own FIELD function, which is likely more efficient (and nicer to read, anyway).

@LennyObez

Thank you for your insightful suggestion. I agree that introducing a sortByField method in the abstract Database class would provide cleaner and more reusable code. This approach ensures that the sorting logic can be encapsulated in one place and overridden for specific database engines, such as MySQL's FIELD() function. Below, I've outlined the necessary changes and the implementation for your review.


Proposed implementation: sortByField

Changes to Database

Add the sortByField method to the Database class, using CASE WHEN as the default implementation for databases that do not support optimized sorting.

/**
 * Generate SQL for sorting by a specific field using CASE WHEN logic.
 *
 * Subclasses can override this method to provide optimized implementations
 * (e.g., using MySQL's FIELD function).
 *
 * @param string $field The field to sort by, already quoted as an SQL identifier.
 * @param array $values The values to order by, in the desired order. Must not be empty.
 * @return string SQL snippet for ordering.
 */
public function sortByField(string $field, array $values): string
{
    $caseStatements = [];
    foreach (array_values($values) as $index => $value) {
        // Each value sorts by its position in $values; quoteString() escapes and quotes it
        $caseStatements[] = "WHEN {$field} = " . $this->quoteString($value) . " THEN {$index}";
    }

    // Values that are not in the list sort last
    return "CASE " . implode(' ', $caseStatements) . " ELSE 999 END";
}

Changes to MySQLDatabase

Override the sortByField method in MySQLDatabase to leverage MySQL's FIELD() function for better performance and readability.

/**
 * Generate SQL for sorting by a specific field using MySQL's FIELD function.
 *
 * @param string $field The name of the field to sort by.
 * @param array $values The values to order by.
 * @return string SQL snippet for ordering.
 */
public function sortByField(string $field, array $values): string
{
    // quoteString() escapes and quotes each value using the database connector
    $escapedValues = array_map(fn($value) => $this->quoteString($value), $values);
    return "FIELD({$field}, " . implode(', ', $escapedValues) . ")";
}
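
One behavioural difference between the two implementations is worth keeping in mind: MySQL's FIELD() returns 0 for a value that is not in the list, so such rows sort first, whereas the CASE version sends them to its ELSE branch, so they sort last. The sketch below models both sort keys in plain PHP. `fieldSortKey` and `caseSortKey` are hypothetical helpers for illustration, not SQL and not part of the proposal.

```php
<?php
// Sort key that MySQL's FIELD() yields: 1-based position, or 0 when the value is not listed.
function fieldSortKey(int $value, array $values): int
{
    $pos = array_search($value, array_values($values), true);
    return $pos === false ? 0 : $pos + 1;
}

// Sort key of the CASE expression: 0-based position, or the ELSE fallback (999 in the proposal).
function caseSortKey(int $value, array $values, int $fallback = 999): int
{
    $pos = array_search($value, array_values($values), true);
    return $pos === false ? $fallback : $pos;
}

$values = [3, 2, 1];
$rows = [1, 4, 3, 2]; // 4 is not in $values

$byField = $rows;
usort($byField, fn($a, $b) => fieldSortKey($a, $values) <=> fieldSortKey($b, $values));

$byCase = $rows;
usort($byCase, fn($a, $b) => caseSortKey($a, $values) <=> caseSortKey($b, $values));

echo implode(',', $byField), PHP_EOL; // 4,3,2,1
echo implode(',', $byCase), PHP_EOL;  // 3,2,1,4
```

In the eager loading query this should not matter, because the WHERE clause already restricts the child IDs to the listed values, but it would matter for any other caller of sortByField.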

Changes to DataList

Replace the inline sorting logic in the eager loading implementation with a call to sortByField.

// Get the join records so we can correctly identify which children belong to which parents
// If there are no parents and no children, skip this to avoid an error (and to skip an unnecessary DB call)
// Note that $joinRows also holds extra fields data
$joinRows = [];
if (!empty($parentIDs) && !empty($fetchedIDs)) {
    // Use sortByField to generate the ORDER BY clause.
    // The field is passed as a quoted identifier, matching the rest of the query.
    $orderByClause = DB::get_conn()->sortByField('"' . $childIDField . '"', $fetchedIDs);

    // Construct the query
    $joinQuery =
        'SELECT * FROM "' . $joinTable . '" ' .
        'WHERE "' . $parentIDField . '" IN (' . implode(',', $parentIDs) . ') ' .
        'AND "' . $childIDField . '" IN (' . implode(',', $fetchedIDs) . ') ' .
        'ORDER BY ' . $orderByClause;

    // Execute the query
    $joinRows = DB::query($joinQuery);
}

Unit test

The following unit test validates the functionality of the sortByField method in Database:

<?php

namespace SilverStripe\ORM\Tests\DataListTest\EagerLoading;

use SilverStripe\Dev\SapphireTest;
use SilverStripe\ORM\DB;

class CustomOrderEagerLoadingTest extends SapphireTest
{
    /**
     * Test that the sortByField method correctly generates an ORDER BY clause
     * and the query returns results in the expected order.
     */
    public function testSortByFieldIntegration()
    {
        // Define test data
        $parentIDs = [1, 2];
        $fetchedIDs = [3, 2, 1]; // Expected order
        $expectedOrder = [3, 2, 1];

        $joinTable = 'TestJoinTable';
        $parentIDField = 'ParentID';
        $childIDField = 'ChildID';

        // Create a temporary test table
        DB::query("
            CREATE TEMPORARY TABLE {$joinTable} (
                {$childIDField} INT NOT NULL,
                {$parentIDField} INT NOT NULL
            )
        ");

        // Insert mock data
        DB::query("
            INSERT INTO {$joinTable} ({$childIDField}, {$parentIDField})
            VALUES (1, 1), (2, 2), (3, 1)
        ");

        // Use sortByField to generate the ORDER BY clause
        $orderByClause = DB::get_conn()->sortByField($childIDField, $fetchedIDs);

        // Build the query with sortByField
        $query = "
            SELECT *
            FROM {$joinTable}
            WHERE {$parentIDField} IN (" . implode(',', $parentIDs) . ")
              AND {$childIDField} IN (" . implode(',', $fetchedIDs) . ")
            ORDER BY {$orderByClause}
        ";

        // Execute the query and convert results to an array
        $result = DB::query($query);
        $rows = [];
        foreach ($result as $row) {
            $rows[] = $row;
        }

        // Extract the actual order of fetched ChildIDs
        $actualOrder = array_column($rows, $childIDField);

        // Assert that the results are ordered as expected
        $this->assertEquals(
            $expectedOrder,
            $actualOrder,
            'The results should maintain the custom order defined by fetchedIDs'
        );
    }
}
