Interactive mode limits #24

Open
lbdave94 opened this issue Jul 17, 2024 · 1 comment

Comments

@lbdave94

Hi, first of all, thanks for the repo. I have one question. In the README you state:

some binaries, like the diacritizer, might take very long time to start. Hence, this option (Standalone) is preferred when you have long text and you want to do it only once.

In addition, loading the diacritizer in Interactive mode raises the following warning:

[2024-07-17 11:03:58,511 - farasapy_logger - WARNING]: Be careful with large lines as they may break on interactive mode. You may switch to Standalone mode for such cases.

I'm trying to find the limit of the Diacritizer, but I haven't been able to. Do you have any idea how long a text has to be to break the Diacritizer? Is there a text length that actually breaks it, or does it simply not work well with long text?

@MagedSaeed
Owner

Thanks for the issue, @lbdave94. Unfortunately, I do not have many details on this. It is model-dependent and can differ from model to model. If you want to know more, you can contact the Farasa authors at https://farasa.qcri.org/. If you have long lines, you can run the package in Standalone mode, just to be safe. You can also evaluate both modes on your own datasets and analyze the difference.
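
For anyone who wants to try the comparison suggested above, here is a minimal sketch, not something from the repo. `compare_modes` and `Mismatch` are hypothetical names. The two modes are passed in as plain callables, so the helper makes no assumption about farasapy itself; with farasapy they would presumably be the `diacritize` methods of two `FarasaDiacritizer` instances, one of them created with `interactive=True`, but that wiring is an assumption and is left out here.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class Mismatch(NamedTuple):
    """One input line on which the two modes disagreed (hypothetical helper type)."""
    index: int                  # position of the line in the input
    length: int                 # number of characters in the input line
    interactive: Optional[str]  # interactive output, or None if it raised
    standalone: str             # standalone output, used as the reference
    error: Optional[str]        # repr of the exception raised by interactive mode


def compare_modes(
    lines: Iterable[str],
    interactive_fn: Callable[[str], str],
    standalone_fn: Callable[[str], str],
) -> List[Mismatch]:
    """Run both callables on every line and collect the lines where they differ.

    Standalone output is treated as the reference, since that is the mode
    recommended for long lines. Exceptions from the interactive callable are
    recorded instead of aborting the whole run.
    """
    mismatches: List[Mismatch] = []
    for index, line in enumerate(lines):
        expected = standalone_fn(line)
        try:
            actual: Optional[str] = interactive_fn(line)
            error = None
        except Exception as exc:  # keep going and record the failure
            actual, error = None, repr(exc)
        if error is not None or actual != expected:
            mismatches.append(Mismatch(index, len(line), actual, expected, error))
    return mismatches
```

Sorting the returned records by `length` should show whether disagreements cluster above some line length for a given model, which is the closest thing to an empirical limit you can get without input from the Farasa authors.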
