https://opus.nlpl.eu/OpenSubtitles/en&ka/v2018/OpenSubtitles
Almost every data point in this corpus is damaged. The Georgian side is nonsense. When I looked these entries up on the OpenSubtitles site, I found that they are just Russian characters mapped onto the Georgian alphabet. Many multilingual models are now poisoned by this data. It would be great to investigate this further.