Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Allowed array as schema property for indexes, foreign keys and unique constraints #17

Open
wants to merge 1 commit into
base: 1.0
Choose a base branch
from

Conversation

Steveb-p
Copy link
Contributor

@Steveb-p Steveb-p commented Sep 17, 2021

Allows index / foreign keys / unique constraint declarations to not contain an explicit name.

Doctrine Schema will automatically generate a unique name based on table name & columns, preventing name collisions.

Before:

tables:
    my_table:
        id:
            id: { type: integer, nullable: false, options: { autoincrement: true } }
        uniqueConstraints:
            foreign_key_identifier_here: { fields: [name] }
        indexes:
            index_1_identifier_here: { fields: [data1] }
            index_2_identifier_here: { fields: [data1, data2] }
        fields:
            data1: { type: integer, nullable: false }
            data2: { type: integer, nullable: false }
            name: { type: string, nullable: false }
    my_secondary_table:
        id:
            id: { type: integer, nullable: false, options: { autoincrement: true } }
        fields:
            main_id: { type: integer, nullable: false }
        foreignKeys:
            foreign_key_identifier_here:
                fields: [ main_id ]
                foreignTable: my_main_table
                foreignFields: [ id ]
                options: { onDelete: CASCADE, onUpdate: CASCADE }

After:

tables:
    my_table:
        id:
            id: { type: integer, nullable: false, options: { autoincrement: true } }
        uniqueConstraints:
            - { fields: [name] } # Autogenerated as "UNIQ_9AEF3D825E237E06"
        indexes:
            - { fields: [data1] } # Autogenerated as "IDX_9AEF3D8257CA2CA6"
            - { fields: [data1, data2] } # Autogenerated as "IDX_9AEF3D8257CA2CA6CEC37D1C"
        fields:
            data1: { type: integer, nullable: false }
            data2: { type: integer, nullable: false }
            name: { type: string, nullable: false }
    my_secondary_table:
        id:
            id: { type: integer, nullable: false, options: { autoincrement: true } }
        fields:
            main_id: { type: integer, nullable: false }
        foreignKeys:
            -   fields: [ main_id ] # Autogenerated as "FK_D8A74C1627EA78A"
                foreignTable: my_main_table
                foreignFields: [ id ]
                options: { onDelete: CASCADE, onUpdate: CASCADE }

@Steveb-p Steveb-p force-pushed the rfc/nameless-indexes branch from 62795b0 to 5f469db Compare September 17, 2021 12:50
@Steveb-p Steveb-p changed the title RFC: Allow array as schema indexes property RFC: Allowed array as schema property for indexes, foreign keys and unique constraints Sep 17, 2021
@Steveb-p Steveb-p requested review from alongosz, adamwojs, mikadamczyk and a team September 17, 2021 12:56
@Steveb-p Steveb-p marked this pull request as ready for review September 17, 2021 12:56
@@ -70,12 +70,6 @@ private function importSchemaTable(
): void {
$table = $schema->createTable($tableName);

if (!empty($defaultTableOptions)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: $defaultTableOptions variable cannot exist at this point.

Copy link
Member

@alongosz alongosz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Steveb-p what is the benefit here? IMHO this rather leads to bad practices. You can name indices for a very good reason - you can work with them easily when writing DDL statements. Everyone should use names for constraints.

@Steveb-p
Copy link
Contributor Author

Steveb-p commented Sep 17, 2021

@Steveb-p what is the benefit here? IMHO this rather leads to bad practices. You can name indices for a very good reason - you can work with them easily when writing DDL statements. Everyone should use names for constraints.

@alongosz The difference is that you can name them, but don't need to. Quite honestly most of indexes or names for foreign keys are simply their columns, or name of the relationship table. That's something you can easily get by looking at table schema, so why duplicate? In my opinion naming of indexes or constraints gives you very little, other than the extra effort of coming up with a name. Rarely there are additional info that you might put into a name.

image

And if you want to, you can.

I see no bad practice here. Generated names are constant as long as the definition isn't changed. Name collisions are almost impossible, since name is in practice hash of it's meaningful parts of definition.

Everyone should use names for constraints.

Respectfully, but strongly disagree. Information necessary to understand what foreign key constraint does is contained in it's definition.

And if not, then naming pattern for indices, foreign keys and unique constraints should be more clearly defined going forward, so it's consistent. But I find it not worth our time.

@adamwojs
Copy link
Member

adamwojs commented Oct 3, 2021

I agree with @alongosz that automatically generated names for indexes, foreign keys and unique constraints is not something that product will be benefit from but IMHO it's brings an value on project level.

@lserwatka
Copy link
Member

lserwatka commented Oct 13, 2021

I see value here for projects. Can we make it disabled by default and make it configurable for those who really need it? Rebase is required here BTW.

@Steveb-p
Copy link
Contributor Author

@lserwatka I doesn't break the current behavior, so you can think of it as disabled.

@Steveb-p Steveb-p force-pushed the rfc/nameless-indexes branch from 5f469db to ff001d8 Compare October 13, 2021 12:25
@lserwatka
Copy link
Member

Let's do it then @Steveb-p . What do you think @adamwojs and @alongosz ?

@Steveb-p
Copy link
Contributor Author

Rebase is required here BTW.

Rebased.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 1 Code Smell

No Coverage information No Coverage information
0.0% 0.0% Duplication

@alongosz
Copy link
Member

@lserwatka of course, for this package itself, to be used by 3rd party, it is beneficial because exposes things available in Doctrine and provided OOTB by DBMS. I was just trying to make sure we don't use it ourselves, because we build a product not a project :-)

I still would recommend everyone to use named keys, but it's just a recommendation for 3rd party, requirement for us (IMO)

@Steveb-p
Copy link
Contributor Author

I still would recommend everyone to use named keys, but it's just a recommendation for 3rd party, requirement for us (IMO)

Agreed, however this also means that we should be more specific about how the indices, foreign keys and unique constraints are named. When we looked over the schema the combinations were all over the place.

I was considering being able to automatically name the indices based on our eventual agreement, so we can remove the duplication of column/table names (which I assume will be part of the name) and have proper names in the schema. WDYT?

Some sort of an optional service injected into the schema importer, that would calculate the name if not provided.

new Column('name', Type::getType('string')),
],
[
new Index('IDX_9AEF3D8257CA2CA6', ['data1'], false, false),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really deterministic, not some variation of UUID?

Copy link
Contributor Author

@Steveb-p Steveb-p Oct 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alongosz Doctrine will always generate the same name, as long as columns included are not changed.

See the function that generates this in Doctrine:

/**
 * Generates an identifier from a list of column names obeying a certain string length.
 *
 * This is especially important for Oracle, since it does not allow identifiers larger than 30 chars,
 * however building idents automatically for foreign keys, composite keys or such can easily create
 * very long names.
 *
 * @param string[] $columnNames
 * @param string   $prefix
 * @param int      $maxSize
 *
 * @return string
 */
protected function _generateIdentifierName($columnNames, $prefix = '', $maxSize = 30)
{
    $hash = implode('', array_map(static function ($column) {
        return dechex(crc32($column));
    }, $columnNames));

    return strtoupper(substr($prefix . '_' . $hash, 0, $maxSize));
}

And the calling function:

public function addIndex(array $columnNames, $indexName = null, array $flags = [], array $options = [])
{
    if ($indexName === null) {
        $indexName = $this->_generateIdentifierName(
            array_merge([$this->getName()], $columnNames),
            'idx',
            $this->_getMaxIdentifierLength()
        );
    }

    return $this->_addIndex($this->_createIndex($columnNames, $indexName, false, false, $flags, $options));
}

@alongosz
Copy link
Member

I still would recommend everyone to use named keys, but it's just a recommendation for 3rd party, requirement for us (IMO)

Agreed, however this also means that we should be more specific about how the indices, foreign keys and unique constraints are named. When we looked over the schema the combinations were all over the place.

It should always follow the pattern <table_name>_<column1>_...<columnN> where column name can be abbreviated because of key name length limitations (don't remember if it still applies). They can have descriptive suffixes like fx, ux, pk to describe what that is in a self commenting way. We might consider dropping that last part. Otherwise I think schema.yaml from Core follows that conventions, mostly.

I was considering being able to automatically name the indices based on our eventual agreement, so we can remove the duplication of column/table names (which I assume will be part of the name) and have proper names in the schema. WDYT?

Some sort of an optional service injected into the schema importer, that would calculate the name if not provided.

But then we would get back to the square one - it is explicit, so writing automation around it is straightforward. Not saying no, but we need to carefully consider disadvantages and find a way to overcome them. I think some configurable strategy should be applied for that not to cover entirely Doctrine feature we just exposed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants