Database collation concerns from Spookerton #3646

MistakeNot4892 · 2024-01-30T23:37:54Z

MistakeNot4892
Jan 30, 2024
Maintainer

Copied from Spook's ticket: #3018

Meta, I guess.

Database collation is not exactly a modern performance concern at SS13 scale. Because Nebula aims to serve communities across languages and character sets, it might still be a discussion worth having: collation determines how characters are compared and sorted.

In #1318 back at the start of 2021 I mentioned utf8mb4_unicode_520_ci as a more up to date collation for text-centric tables, which are the most commonly used for ss13ish purposes. The PR's base utf8mb4 was still an upgrade over latin1_swedish_ci, and good. In the last couple of years things have moved on, but sadly not toward common ground.

MySQL is still iterating on 8.0 and still supports the same utf8mb4_0900_ai_ci. MariaDB has recently released 10.10 (mid 2022) and 10.11 (the new LTS, last month). They also skipped matching MySQL after all that time: from 10.10 onward they ship a Unicode 14 collation set and prefer uca1400_ai_ci.

On premise: the later the Unicode version referenced in the collation, in theory the more natural the sorting is and the better the grouping of those symbols is. To add, derived from MySQL naming: the "ai"/"as" and "ci"/"cs" suffixes stand for accent insensitive/sensitive and case insensitive/sensitive.

As an example, in cases like "ted" and "Téd", you probably want them to show up together in a search for either. Base utf8mb4 is completely unaware of both similarities, and that's just within extended latin.
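To make the "ted" / "Téd" case concrete, here is a rough Python sketch of what an accent- and case-insensitive comparison does. It is only an approximation using Unicode decomposition and case folding, not the UCA algorithm the database collations actually implement, and the helper name `fold_ai_ci` is made up for illustration.

```python
import unicodedata


def fold_ai_ci(text: str) -> str:
    """Build an approximate accent- and case-insensitive comparison key.

    Hypothetical helper for illustration; real collations use UCA weights.
    """
    # NFD splits "é" into "e" plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks so only the base letters remain.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a stronger lower() intended for caseless matching.
    return stripped.casefold()


names = ["ted", "Téd", "TED", "tod"]

# A plain comparison treats every spelling as different.
exact = [n for n in names if n == "ted"]

# Comparing folded keys groups the accent and case variants together.
matches = [n for n in names if fold_ai_ci(n) == fold_ai_ci("ted")]

print(exact)    # ['ted']
print(matches)  # ['ted', 'Téd', 'TED']
```

An `_ai_ci` collation gives you this grouping inside the database itself, for lookups and sorting alike, without every query having to normalise its input first.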

So!

utf8mb4_unicode_520_ci is sadly still the best text collation the two have in common. Given the choice of shipping least-worst (one collation that works on both) or duplicate-best (separate files with each server's preferred collation), least-worst probably comes out on top for convenience? So, utf8mb4_unicode_520_ci is a good upgrade path for the collation indicated by the sql files the repo ships in sql/*.
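For an existing database, the upgrade could look something like the sketch below, which just generates `ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE ...` statements (syntax both MySQL and MariaDB accept). The table names are placeholders, not the ones in sql/*, and `convert_statements` is a hypothetical helper; treat it as a starting point and back up before running anything it prints.

```python
def convert_statements(tables, charset="utf8mb4",
                       collation="utf8mb4_unicode_520_ci"):
    """Return one ALTER TABLE statement per table name.

    Hypothetical helper: it only builds SQL text and never touches a database.
    """
    statements = []
    for table in tables:
        # Backtick-quote the identifier, doubling any embedded backticks.
        quoted = "`" + table.replace("`", "``") + "`"
        statements.append(
            f"ALTER TABLE {quoted} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


# Placeholder table names, assumed purely for the example.
statements = convert_statements(["example_players", "example_bans"])

for statement in statements:
    print(statement)
```

CONVERT TO rewrites the text columns of each table, so on anything bigger than a toy database it is worth trying on a copy first.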
