Allow emojis without selector-16 variation character to be recognized #26
+46
−8
Why is this change needed?
We use this library at our company to display emojis in text from chats uploaded by users. The chats come from different sources (WhatsApp, Facebook, Instagram, Telegram), different platforms (Android, iOS, Windows) and different versions.
Over time we have found that emojis are sometimes not composed of the strictly correct Unicode characters. Specifically, the Variation Selector 16 (Unicode character FE0F) is sometimes missing. In theory, this character indicates that a character that defaults to text presentation should instead be shown with emoji presentation. To me, the spec is not clear about whether this selector is mandatory. From what I understand, it was added "fairly recently" and some clients may not implement it (some of our users have really old phones). In any case, the spec is one thing and what actual implementations do is another, and some implementations do not add this selector.
When a sequence arrives without the selector, the library is not able to recognize it as an emoji because it does not match any entry in `emojiList`.
What's the change
The idea is to take every emoji that has the FE0F character and generate all possible combinations with and without it. For example, for "002a-fe0f-20e3" it generates both "002a-fe0f-20e3" (the same) and "002a-20e3". For "1f441-fe0f-200d-1f5e8-fe0f" it generates "1f441-200d-1f5e8-fe0f", "1f441-fe0f-200d-1f5e8", "1f441-200d-1f5e8" and "1f441-fe0f-200d-1f5e8-fe0f". Then we associate all of those combinations with the same emoji.
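To make the combination step concrete, here is a minimal Python sketch of the idea. It is not the code in this PR. The function names (`fe0f_variants`, `is_plain_ascii`, `build_lookup`), the shape of the lookup table, and the sample names are all assumptions made for illustration. The sketch also includes a simple version of the ASCII guard discussed in the next paragraph, assuming the guard means "skip a generated variant that is a single ASCII code point".

```python
from itertools import product

VS16 = "fe0f"  # Variation Selector 16, as it appears in hyphen-separated keys


def fe0f_variants(code: str) -> list[str]:
    """Return every variant of `code` with each FE0F either kept or dropped.

    The original sequence always comes first in the result.
    """
    parts = code.lower().split("-")
    slots = [i for i, part in enumerate(parts) if part == VS16]
    variants: list[str] = []
    # Each FE0F position is independently kept (True) or dropped (False).
    for keep in product((True, False), repeat=len(slots)):
        dropped = {i for i, kept in zip(slots, keep) if not kept}
        variant = "-".join(p for i, p in enumerate(parts) if i not in dropped)
        if variant and variant not in variants:
            variants.append(variant)
    return variants


def is_plain_ascii(code: str) -> bool:
    """True if `code` is a single code point in the ASCII range (assumed guard)."""
    parts = code.split("-")
    return len(parts) == 1 and int(parts[0], 16) < 0x80


def build_lookup(emoji_list: dict[str, str]) -> dict[str, str]:
    """Map every accepted variant to the same entry as its original sequence."""
    lookup = dict(emoji_list)  # original sequences always win
    for code, name in emoji_list.items():
        for variant in fe0f_variants(code):
            if variant != code and is_plain_ascii(variant):
                continue  # would turn a regular character such as "1" into an emoji
            lookup.setdefault(variant, name)
    return lookup


# Hypothetical sample data; the names are placeholders, not real library entries.
sample = {"002a-fe0f-20e3": "asterisk_keycap", "0031-fe0f": "digit_one"}
table = build_lookup(sample)
```

With this sample input, `table` gains an entry for "002a-20e3" pointing at the same name as "002a-fe0f-20e3", while the bare "0031" variant is skipped by the guard.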
There's a small twist to that. Some emojis, after removing the FE0F character, end up being the same as a commonly used ASCII character (these are the emojis `:digit_one`, etc.). For those, I added a guard clause so they are not added as combinations; otherwise a regular character such as "1" would be transformed into an emoji.
Risks
I understand that this change may feel risky and that it may break the recognition of some emojis. I know it's difficult to take just my word for it, but we have been using this modification for a long time (~1 year) while processing hundreds of chats per day, and it has not caused any issues.
In any case, I can understand if this is seen as an edge case that should not always be applied. In that case, I would ask you to consider still adding it behind an optional flag. I can do that if there is interest. It would be very beneficial for us to have this code (even behind a flag) integrated in the library, because it would make future updates much easier. Not sure if it's relevant, but we have a paid Joypixels license.
TODO