Remove iconv for decision search highlighting
Unfortunately, using `iconv` makes it more likely that the
converted text is shorter or longer than the original. For example,
the euro symbol (€) artificially increases the length of the texts
we are comparing:

```php
iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', '€');
// 'EUR'
```

This is problematic, as it results in incorrectly aligned `<mark>`
elements. While this can be mitigated by carefully correcting the
offsets, doing so quickly makes this functionality harder to
maintain, especially when more of these exceptions are needed.
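
To make the misalignment concrete, here is a minimal sketch. The
strings are made up for illustration, and the converted text is
hard-coded to the `EUR` result shown above rather than produced by
`iconv`, since its output can differ between platforms:

```php
<?php

// Illustrative strings only; $converted is hard-coded to what the
// conversion above reports for the euro symbol.
$original = 'Budget: €50 approved';
$converted = 'Budget: EUR50 approved';
$search = 'approved';

// The match is located in the converted (searchable) text...
$position = mb_stripos($converted, $search, 0, 'UTF-8');

// ...but the slice is taken from the original text, which is two
// characters shorter before the match (`€` versus `EUR`).
$highlighted = mb_substr($original, $position, mb_strlen($search, 'UTF-8'), 'UTF-8');

echo $highlighted, PHP_EOL; // proved
```

The match starts at position 14 in the converted text, but `approved`
starts at position 12 in the original, so the marker would wrap
`proved` instead.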

Using only the transliterator with `Any-Latin; Latin-ASCII` seems
to preserve the length of the texts being compared while still
allowing searches for accented/special characters. Some characters
have no `Latin-ASCII` equivalent; however, these characters are
probably never used in the setting of the association.
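
Since the length preservation is an observation rather than a
guarantee, it could be guarded explicitly. The following is only a
sketch and not part of this commit: `transliteratePreservingLength`
is a hypothetical helper, and falling back to the untouched text is
just one possible way to handle a mismatch. It assumes the `intl`
and `mbstring` extensions are available:

```php
<?php

/**
 * Hypothetical helper (not part of this commit): transliterate $text, but
 * return the original text when the result is not exactly as long.
 */
function transliteratePreservingLength(string $text): string
{
    $result = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);

    // transliterator_transliterate() returns false on failure.
    if (false === $result) {
        return $text;
    }

    // Compare lengths in characters, matching the mb_* functions used for highlighting.
    if (mb_strlen($result, 'UTF-8') !== mb_strlen($text, 'UTF-8')) {
        return $text;
    }

    return $result;
}

echo transliteratePreservingLength('café'), PHP_EOL;
```

With the fallback, a text whose transliteration changes length is
searched as-is: accent-insensitive matching is lost for that text,
but the `<mark>` offsets stay aligned.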
tomudding committed Nov 19, 2023
1 parent 22e8d95 commit ad2166f
Showing 2 changed files with 5 additions and 13 deletions.
1 change: 0 additions & 1 deletion composer.json
@@ -15,7 +15,6 @@
 "ext-exif": "*",
 "ext-fileinfo": "*",
 "ext-gd": "*",
-"ext-iconv": "*",
 "ext-intl": "*",
 "ext-imagick": "^3.5.0",
 "ext-mbstring": "*",
17 changes: 5 additions & 12 deletions module/Decision/view/decision/decision/search.phtml
@@ -23,25 +23,18 @@ function highlightSearch(
 string $decision,
 string $search,
 ): string {
-// Convert the decision to something that is easily searchable (i.e. it MUST NOT contain any non-ASCII characters).
-$transliteratedDecision = iconv(
-'UTF-8',
-'ASCII//TRANSLIT//IGNORE',
-transliterator_transliterate('Any-Latin; Latin-ASCII', $decision),
-);
+// Convert the decision to something that is easily searchable (i.e. it MUST contain only Latin-ASCII characters).
+$transliteratedDecision = transliterator_transliterate('Any-Latin; Latin-ASCII', $decision);
 // Do the same for the search prompt, as otherwise searches WITH non-ASCII characters will not work.
-$search = iconv(
-'UTF-8',
-'ASCII//TRANSLIT//IGNORE',
-transliterator_transliterate('Any-Latin; Latin-ASCII', $search),
-);
+$search = transliterator_transliterate('Any-Latin; Latin-ASCII', $search);

 $offset = 0;
 $output = '';
 $length = mb_strlen($search);

 // There is a very important assumption here; the transliterated version of the decision MUST be exactly as long as
-// the original version. Otherwise, the insertion is done with an incorrect offset.
+// the original version. Otherwise, the insertion is done with an incorrect offset. As such, using `iconv` is NOT
+// good as it will either extend (e.g. `€` becomes `EUR`) or completely remove characters (`//IGNORE` option).
 while (false !== ($position = mb_stripos($transliteratedDecision, $search, $offset, 'UTF-8'))) {
 // Progressively insert markers into the original decision.
 $output .= sprintf('%s%s%s%s',
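
The hunk above cuts off inside the loop. As a rough illustration of the "progressively insert markers" idea, and explicitly not the code from this commit, the following sketch locates matches in a searchable text while slicing the original text. The function name `markMatches`, its parameters, and the use of `htmlspecialchars` are assumptions made for the example; it relies on both texts having the same length in characters:

```php
<?php

/**
 * Hypothetical sketch, not the code from this commit: wrap every match of
 * $needle in <mark>, finding matches in $searchable but slicing $original.
 * Assumes $original and $searchable are equally long (in characters).
 */
function markMatches(string $original, string $searchable, string $needle): string
{
    $length = mb_strlen($needle, 'UTF-8');

    // An empty needle would match everywhere; return the escaped text unchanged.
    if (0 === $length) {
        return htmlspecialchars($original);
    }

    $offset = 0;
    $output = '';

    while (false !== ($position = mb_stripos($searchable, $needle, $offset, 'UTF-8'))) {
        // Append the text between the previous match and this one, then the match itself.
        $output .= htmlspecialchars(mb_substr($original, $offset, $position - $offset, 'UTF-8'));
        $output .= '<mark>' . htmlspecialchars(mb_substr($original, $position, $length, 'UTF-8')) . '</mark>';

        // Continue searching directly after this match.
        $offset = $position + $length;
    }

    // Append whatever follows the last match.
    return $output . htmlspecialchars(mb_substr($original, $offset, null, 'UTF-8'));
}

echo markMatches('Café au lait, café noir', 'Cafe au lait, cafe noir', 'cafe'), PHP_EOL;
// <mark>Café</mark> au lait, <mark>café</mark> noir
```

Because the positions come from the searchable text and are applied to the original one, any length difference between the two shifts every later marker, which is why the equal-length assumption matters.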
