Using "semantic variant" data from Unihan, we may be able to start fixing the problem of differing glyphs at the tokenizer level. See the NewNLP docs, and probably also spaCy's docs on tokenizer exceptions. Basically, we'd want to add a `NORM` form for each semantic variant.
This wouldn't change `token.text`, however, so we'd still need to figure out the best way to use the normalized form.
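
As a rough sketch of the idea (not a worked-out design), the snippet below parses `kSemanticVariant` entries in the tab-separated layout used by Unihan's `Unihan_Variants.txt` and builds one `ORTH`/`NORM` entry per variant character, in the shape spaCy's tokenizer exceptions use. The sample lines, the function names, and the choice of the first listed variant as the `NORM` are all assumptions made for illustration. Check the sample lines against the real data file before relying on them.

```python
import re

# Sample lines in the Unihan_Variants.txt layout (code point, field, value,
# separated by tabs). These are illustrative, not a verified excerpt.
SAMPLE = """\
# Unihan-style sample data
U+3400\tkSemanticVariant\tU+4E18<kMatthews
U+5CF6\tkSemanticVariant\tU+5D8B<kMatthews U+5C9B
U+3400\tkZVariant\tU+4E18
"""

CODEPOINT = re.compile(r"U\+([0-9A-F]{4,6})")


def parse_semantic_variants(lines):
    """Map each character to the list of its kSemanticVariant targets."""
    variants = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        source, field, value = line.split("\t", 2)
        if field != "kSemanticVariant":
            continue  # ignore other variant types
        char = chr(int(source[2:], 16))
        # Each target looks like "U+4E18<kMatthews"; keep only the code point.
        targets = [chr(int(cp, 16)) for cp in CODEPOINT.findall(value)]
        variants.setdefault(char, []).extend(targets)
    return variants


def norm_exceptions(variants):
    """Build exception-style entries. Assumption: first target is the NORM."""
    return {
        char: [{"ORTH": char, "NORM": targets[0]}]
        for char, targets in variants.items()
        if targets
    }


variants = parse_semantic_variants(SAMPLE.splitlines())
exceptions = norm_exceptions(variants)
```

Each entry leaves the surface form (`ORTH`) untouched and only records a normalized form, which matches the concern above: the original text stays as written, and anything that wants variant-insensitive matching would have to read the norm instead. Whether entries like these can actually be registered with the tokenizer used for Chinese text is an open question this sketch does not answer.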