There seem to be 417 language varieties represented in https://opus.nlpl.eu/JW300.php. This would imply 417 choose 2 = 86,736 undirected language pairs. However, I count only 54,376 of them, and the paper confirms this number. Do you know where the missing 32,360 language pairs are, and would you be willing to provide them?
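For reference, the pair arithmetic above can be checked with a few lines of Python. The two constants are simply the figures quoted in this issue; `math.comb` (Python 3.8+) computes n choose 2 = n(n-1)/2.

```python
import math

# Number of language varieties listed for JW300 (figure quoted in this issue).
NUM_LANGUAGES = 417
# Number of undirected pairs actually counted (also reported in the paper).
OBSERVED_PAIRS = 54_376

# Every unordered pair of distinct languages: n choose 2 = n * (n - 1) / 2.
possible_pairs = math.comb(NUM_LANGUAGES, 2)
missing_pairs = possible_pairs - OBSERVED_PAIRS

print(f"possible: {possible_pairs:,}")  # possible: 86,736
print(f"missing:  {missing_pairs:,}")   # missing:  32,360
```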
I notice that the adjacency matrix appears to form a single connected component. For example, although ady has no parallel data with en, it does have parallel data with "jw_rmv", which in turn has parallel data with en. So it seems likely that ady and en could be aligned by pivoting through jw_rmv. To demonstrate that this is at least conceptually possible, I found these two pairs in the respective corpora:
jw_rmv: Пала со амэ подаса дума андэ авэр статья ?
ady: Сыда къыкІэлъыкІорэ статьям щызэхэтфыщтыр ?
jw_rmv: Пала со амэ подаса дума андэ авэр статья ?
en: What will we consider in the following article ?
Implication: the following should be a valid sentence pair between English and Adyghe:
ady: Сыда къыкІэлъыкІорэ статьям щызэхэтфыщтыр ?
en: What will we consider in the following article ?
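The pivot step above amounts to joining the jw_rmv-ady and jw_rmv-en bitexts on identical jw_rmv sentences. Below is a minimal sketch, assuming each bitext has already been loaded as a list of (pivot_sentence, other_sentence) tuples; the loading step and the function name `pivot_align` are hypothetical, and only exact string matches on the pivot side are used.

```python
from collections import defaultdict


def pivot_align(pivot_x, pivot_y):
    """Join two bitexts that share a pivot language (hypothetical helper).

    pivot_x: list of (pivot_sentence, x_sentence) pairs, e.g. jw_rmv-ady
    pivot_y: list of (pivot_sentence, y_sentence) pairs, e.g. jw_rmv-en
    Returns (x_sentence, y_sentence) pairs whose pivot sentences match exactly.
    """
    # Index the second bitext by its pivot side; a pivot sentence may occur
    # more than once, so keep every translation attached to it.
    by_pivot = defaultdict(list)
    for pivot, y in pivot_y:
        by_pivot[pivot].append(y)

    pairs = []
    seen = set()
    for pivot, x in pivot_x:
        for y in by_pivot.get(pivot, []):
            if (x, y) not in seen:  # drop duplicate pairs from repeated sentences
                seen.add((x, y))
                pairs.append((x, y))
    return pairs


# The two sentence pairs quoted above.
rmv_ady = [("Пала со амэ подаса дума андэ авэр статья ?",
            "Сыда къыкІэлъыкІорэ статьям щызэхэтфыщтыр ?")]
rmv_en = [("Пала со амэ подаса дума андэ авэр статья ?",
           "What will we consider in the following article ?")]

for ady, en in pivot_align(rmv_ady, rmv_en):
    print("ady:", ady)
    print("en: ", en)
```

Exact matching assumes the jw_rmv side is segmented and tokenized identically in both bitexts. Frequently repeated stock sentences (like this one) can also map to several translations, so the output pairs would likely need filtering before use.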
(Interestingly, jw_rmv, which actually seems to be Vlax Romany in Cyrillic script, is the language aligned with the most other languages, even more than English!)
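The observations about connectivity and about jw_rmv having the most partners can be checked by treating the language pairs as an undirected graph. A minimal sketch, assuming the list of available pairs has been extracted from the OPUS listing beforehand (that step is not shown); the function name `analyze` is hypothetical, and the usage example uses only the three languages and two pairs discussed in this issue.

```python
from collections import defaultdict


def analyze(languages, edges):
    """Summarize an undirected 'has parallel data' graph (hypothetical helper).

    languages: iterable of language codes
    edges: iterable of (lang_a, lang_b) pairs that have a bitext
    Returns (components, degree, missing): a list of connected components
    (sets), each language's number of aligned partners, and the unordered
    pairs that have no direct bitext.
    """
    languages = list(languages)
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Connected components via iterative depth-first search.
    components, visited = [], set()
    for start in languages:
        if start in visited:
            continue
        stack, comp = [start], set()
        visited.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in visited:
                    visited.add(nb)
                    stack.append(nb)
        components.append(comp)

    degree = {lang: len(adj[lang]) for lang in languages}
    langs = sorted(languages)
    missing = [(a, b) for i, a in enumerate(langs) for b in langs[i + 1:]
               if b not in adj[a]]
    return components, degree, missing


langs = ["ady", "en", "jw_rmv"]
edges = [("ady", "jw_rmv"), ("jw_rmv", "en")]
components, degree, missing = analyze(langs, edges)
print(len(components))              # 1
print(max(degree, key=degree.get))  # jw_rmv
print(missing)                      # [('ady', 'en')]
```

If the full graph is a single component, every missing pair is reachable through some chain of pivots, although multi-hop chains would likely compound alignment errors.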