Add BERTLMScore #3
base: master
Conversation
Can you please add a flag to allow setting a batch size?
There is a bug in the branch you are using (fixed here: df67f70) that won't allow you to run poetry. To pick up the fix, rebase onto the upstream master:

git fetch upstream
git rebase upstream/master
All the tests run automatically every time you push changes to this PR. If you click on "Details" and expand the line with the red cross, you will see exactly what the problem is.
OK, I managed to get black and poetry working. Two test failures remain, and the second one is more surprising. I will add the flag option.
I added the possibility to input the masked sentences in batches. I have also tried to add new test functions (to check that batching does not change the score results); however, they are not taken into account when I run the test suite.
I would like to get #4 merged before reviewing this. Since we are changing the API, we will have to update this PR later anyway.
Things to do here before we can merge this: …
Hi Simone, I agree and will work on that next week.
Sure, no hurry! The comment was meant as a reminder for the future.
Hi @simonepri, I have begun to work on adapting BERTLMScore to the new API and I have a question about the batching part.
Since each token is masked in turn, every sentence expands into one masked copy per token, so in my example I will in fact input 8 sentences to BERT.
And so my first batch will consist of: …
and the second of: …
So what I have done for now is the following: … The code is not finished yet (not tested, not cleaned...), but just to give you an idea. My question is: in your opinion, what is the best way to handle this batch specificity? Do I simply override _tokens_log_prob from BatchedLMScorer?
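Roughly, the splitting step could look like this (a simplified, hypothetical helper, not the gist's exact code):

import torch

def split_into_batches(masked_copies, batch_size):
    # masked_copies: (num_copies, seq_len) tensor, one masked variant per token
    for start in range(0, masked_copies.size(0), batch_size):
        yield masked_copies[start:start + batch_size]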
What I would do is the following: first, you put your code inside a function called …. In that function, …. How does it sound?

PS: You don't want to tokenize all the sentences at once if you then process the sentences independently. If you do so, you only introduce unnecessary padding. Just process each sentence one by one using ….

Notes on the code: here (https://gist.github.com/dldk-gael/2025f0584f781716697463c62a9779ee#file-bert_score-py-L12-L23) you don't need this. Just add the following:

tokens_mask = encoding["special_tokens_mask"] == 0
ids = encoding["input_ids"][tokens_mask]
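For context, the full encoding step could look like this (an illustrative sketch assuming the HuggingFace fast tokenizer, not code from this PR):

from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
encoding = tokenizer(
    "Hello world",
    return_tensors="pt",
    return_special_tokens_mask=True,
)
# special_tokens_mask is 1 for [CLS]/[SEP]/[PAD] and 0 for real tokens,
# so this boolean index keeps only the actual sentence tokens
tokens_mask = encoding["special_tokens_mask"] == 0
ids = encoding["input_ids"][tokens_mask]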
By doing so, when the user inputs lots of small sentences but uses a large batch size (for instance because they have a lot of GPU memory), we won't fill the batch on each call.
Yes, that's true, but it also makes the logic much easier to maintain. Anyway, if you find a clean way to batch multiple sentences, then sure, go ahead and do it.
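For instance, such a packing could look like this (a hypothetical sketch, not code from this PR): accumulate masked copies across sentences and flush whenever the batch is full.

def iter_full_batches(masked_copies_per_sentence, batch_size):
    # masked_copies_per_sentence: iterable of lists, one list of masked
    # token-id sequences per input sentence
    batch = []
    for copies in masked_copies_per_sentence:
        for copy in copies:
            batch.append(copy)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # flush the last, possibly partial, batch
        yield batch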
Hi @simonepri, …
Hi @simonepri, did you have time to look?
Quick update: I ran some tests on the BLiMP dataset. With the GPT2 (base) score I get 0.90 accuracy vs. 0.85 with the BERT score. So I agree we can go back to a more easy-to-maintain version of the batch handling within BERT.
@dldk-gael Sorry, I don't have time at the moment to review the code.
What do you mean? It's just the harmonic mean of the individual token probabilities. I added it just for completeness (i.e., to have all three Pythagorean means). It wasn't meant to be particularly useful.
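In code, the three reductions would look something like this (an illustrative sketch, not the library's actual implementation):

import math

def arithmetic_mean(probs):
    return sum(probs) / len(probs)

def geometric_mean(probs):
    # n-th root of the product, computed in log space for numerical stability
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

def harmonic_mean(probs):
    return len(probs) / sum(1.0 / p for p in probs)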
Hi @simonepri, are you still planning to work on this one?
Hi,
I propose to add a BERT-based language-modeling score.
I implemented the idea described in Effective Sentence Scoring Method Using BERT for Speech Recognition. In this paper, the authors report that they obtained better results by using a bidirectional LM (in comparison to a unidirectional LM such as GPT2).
The idea is to successively mask each token and retrieve the probability that BERT gives to that token in the output.
For now, I pass all the masked sentences in one single batch to BERT, so it may raise memory problems for very long sentences.
Also, I did not manage to run the different tests with poetry as you asked. I am not familiar with automatic testing and poetry, so I will have to look into it.
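To illustrate the idea, here is a simplified sketch (assuming the HuggingFace transformers API; the actual implementation in this PR differs in its details):

import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def bert_sentence_log_prob(sentence):
    # Sum of log p(token_i | all other tokens), masking each token in turn.
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    n = ids.size(0) - 2                # real tokens, excluding [CLS]/[SEP]
    rows = torch.arange(n)
    positions = rows + 1               # positions of the real tokens
    batch = ids.repeat(n, 1)           # one masked copy per token
    batch[rows, positions] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(input_ids=batch).logits  # (n, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits[rows, positions], dim=-1)
    return log_probs[rows, ids[positions]].sum().item()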