Hi, first of all, thanks for the repo. I have one question. In the README you state:
"some binaries, like the diacritizer, might take very long time to start. Hence, this option (Standalone) is preferred when you have long text and you want to do it only once."
In addition, loading the diacritizer in Interactive mode raises the following warning:
[2024-07-17 11:03:58,511 - farasapy_logger - WARNING]: Be careful with large lines as they may break on interactive mode. You may switch to Standalone mode for such cases.
By the way, I'm trying to find the Diacritizer's limit, but I haven't been able to. Do you have any idea how long the text must be to break the Diacritizer? Is there a length that actually breaks it, or does it simply perform worse on long text?
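One way to look for such a limit empirically is a binary search over input length. This is only a sketch: `find_breaking_length` and the `fails` predicate are hypothetical names, not part of farasapy, and you would have to write `fails` yourself so that it runs the diacritizer in Interactive mode and returns True when the output looks broken (an exception, empty output, or truncated output). It also assumes that breakage depends only on length and is monotonic, which may not hold for every model or every text.

```python
from typing import Callable, Optional


def find_breaking_length(
    fails: Callable[[str], bool],
    sample: str,
    max_len: int,
) -> Optional[int]:
    """Binary-search the shortest input length (in characters) for which `fails` is True.

    Assumes breakage is monotonic in length: if an input of n characters fails,
    every longer input built from the same sample also fails.
    Returns None if even `max_len` characters do not fail.
    """
    if not sample or max_len < 1:
        raise ValueError("sample must be non-empty and max_len >= 1")
    # Repeat the sample text until it is at least max_len characters long, then cut.
    text = (sample * (max_len // len(sample) + 1))[:max_len]
    if not fails(text):
        return None
    lo, hi = 1, max_len  # invariant: fails(text[:hi]) is True
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(text[:mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    # Stand-in predicate for demonstration only; replace it with a real check
    # that runs the diacritizer in Interactive mode and inspects its output.
    def fake_fails(s: str) -> bool:
        return len(s) >= 1000

    print(find_breaking_length(fake_fails, "some sample text ", 5000))
```

Because each probe restarts nothing and only sends one line, this needs roughly log2(`max_len`) calls instead of one call per length.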
Thanks for the issue, @lbdave94. Unfortunately, I do not have many details on this. It is model dependent and can differ from model to model. If you want to dig deeper, you can contact the Farasa authors [https://farasa.qcri.org/]. If you have long lines, just run the package in Standalone mode to be safe. You can also evaluate both modes on your datasets and analyze the difference.
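To evaluate both modes as suggested, one option is a token-level diff of the two outputs for the same input. The sketch below assumes you have already obtained both diacritized strings yourself; `compare_outputs` is a hypothetical helper, not part of farasapy, and whitespace tokenization is assumed to be adequate. It uses Python's standard `difflib.SequenceMatcher`, so a dropped or truncated token shows up as one non-matching span instead of shifting every later comparison.

```python
import difflib
from typing import List, Tuple


def compare_outputs(
    interactive_out: str, standalone_out: str
) -> Tuple[float, List[Tuple[str, str, str]]]:
    """Compare two diacritized outputs of the same input, token by token.

    Returns the SequenceMatcher similarity ratio (1.0 = identical token
    sequences) and a list of (opcode, interactive_span, standalone_span)
    for every non-matching region.
    """
    a = interactive_out.split()
    b = standalone_out.split()
    # autojunk=False so frequent tokens are not silently ignored on long texts.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    return matcher.ratio(), diffs
```

With the interactive output as the first argument, tokens missing from it appear as `insert` spans (present only in the Standalone output); a cluster of those near the end of long lines might suggest truncation, while scattered `replace` spans would point to differing diacritics instead.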