Hello,
Does Kraken plan to integrate KenLM or a similar language model to help fix errors in transcriptions? If so, is there already something (code or ideas) that I could use? PyLaia has something, but I have never tested it.
Thank you,
Weslley O.
There is already a beam decoder in `lib/ctc_decoder.py` which is
compatible with KenLM but proper language model integration is somewhat
tricky. The issue is that decoding happens in label space but the
language model works in code point space (at whatever granularity it was
trained on). This makes it quite easy to create broken combinations.
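
To make the mismatch concrete, here is a minimal toy sketch. It does not use kraken's codec or the KenLM bindings: the label table `LABEL_TO_TEXT` and the helper names are hypothetical, and the "language model" is only a stand-in that tokenizes NFC-normalized text into characters. It shows how one beam hypothesis has a different number of steps in label space than the LM sees.

```python
import unicodedata

# Hypothetical codec table (an assumption for illustration, not kraken's):
# each recognizer label maps to one or more code points.
LABEL_TO_TEXT = {
    1: "a",
    2: "e",
    3: "\u0301",  # a combining acute accent emitted as its own label
    4: "st",      # a single label covering two code points
}


def labels_to_text(labels):
    """Map a label sequence from the decoder to a code point string."""
    return "".join(LABEL_TO_TEXT[label] for label in labels)


def toy_lm_tokens(text):
    """Stand-in for a character-level LM trained on NFC-normalized text."""
    return list(unicodedata.normalize("NFC", text))


hypothesis = [4, 2, 3]             # three decoding steps in label space
text = labels_to_text(hypothesis)  # four code points: s, t, e, U+0301
tokens = toy_lm_tokens(text)       # three LM tokens: s, t, U+00E9

print(len(hypothesis), len(text), len(tokens))
```

In this sketch label 4 alone produces two LM tokens, while labels 2 and 3 together produce one, so an LM score cannot simply be added once per decoding step. A prefix ending in label 2 is also scored as a plain "e" even though the following label changes that character.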
The new party recognizer on the other hand has much stronger language
modeling with its pretrained Llama decoder so there shouldn't be any
need for an external LM anymore. It is going to end up in kraken as soon
as I figure out how to fine-tune the thing (and have the time to start
the integration work).