This repository has been archived by the owner on Apr 21, 2024. It is now read-only.

Change truecaser #5

Open
dmlls opened this issue Mar 30, 2021 · 0 comments
Assignees
Labels
enhancement Improvement of a feature

Comments

@dmlls
Member

dmlls commented Mar 30, 2021

The truecasing process (carried out in the post-processing stage) is of great importance, since it is the last step performed before the summary is returned to the user. Even if the generated summary could be considered "good", the user's overall perception would suffer greatly if truecasing were applied with low accuracy.

The current truecaser works fine, but its accuracy could be improved. However, its inefficient implementation (to put it lightly) makes it tedious both to train and to use: the whole vocabulary is stored in memory, so dozens of GB of RAM are needed to work with a big vocabulary, which in turn is needed for good accuracy.
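To illustrate the memory trade-off, here is a minimal sketch of a frequency-based truecaser. It is an assumption made for illustration, not the current truecaser's actual implementation, and the names `train` and `truecase` are hypothetical. The point is that every distinct token gets its own in-memory entry, so RAM usage grows with the vocabulary.

```python
from collections import Counter, defaultdict


def train(sentences):
    """Count the surface forms seen for each lowercased token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Skip the first token: its capitalization is positional, not lexical.
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    # One in-memory entry per distinct token: this is what grows with the vocabulary.
    return counts


def truecase(text, counts):
    """Restore the most frequent casing of each token and capitalize the first one."""
    tokens = []
    for token in text.split():
        forms = counts.get(token.lower())
        # Unknown tokens are left unchanged.
        tokens.append(forms.most_common(1)[0][0] if forms else token)
    if tokens:
        tokens[0] = tokens[0][0].upper() + tokens[0][1:]
    return " ".join(tokens)


# Toy corpus, made up for this example.
corpus = [
    "Yesterday we flew to Paris with Maria",
    "The trip to Paris was long",
]
model = train(corpus)
print(truecase("maria flew to paris", model))  # prints "Maria flew to Paris"
```

With a toy corpus the table is tiny, but a vocabulary large enough for good accuracy would make this kind of lookup table correspondingly large.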

We are currently considering the Stanford NLP Group's TrueCaseAnnotator as a suitable candidate to replace the current truecaser.

@dmlls dmlls added the enhancement Improvement of a feature label Mar 30, 2021
@dmlls dmlls self-assigned this Mar 30, 2021