
Add BERTLMScore #3

Draft · wants to merge 12 commits into master

Conversation

@dldk-gael (Contributor) commented Apr 28, 2020

Hi,
I propose to add a BERT-based language-modeling score.

I implemented the idea described in Effective Sentence Scoring Method Using BERT for Speech Recognition. In that paper, the authors report better results from a bidirectional LM (compared to a unidirectional LM such as GPT-2).

The idea is to successively mask each token and retrieve the probability that BERT gives to that token in the output.

For now, I pass all the masked sentences to BERT in one single batch, so it may cause memory problems for very long sentences.
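
Concretely, the per-sentence scoring looks roughly like this (a minimal sketch rather than the PR's actual code; it assumes Hugging Face's BertForMaskedLM and, as described above, sends all masked copies in a single batch):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def bert_tokens_log_prob(sentence: str) -> torch.Tensor:
    ids = tokenizer.encode(sentence, return_tensors="pt")[0]  # [CLS] t1 ... tn [SEP]
    n = ids.size(0) - 2  # number of real tokens
    # One masked copy per token: row i has token i+1 replaced by [MASK].
    batch = ids.repeat(n, 1)
    positions = torch.arange(1, n + 1)
    batch[torch.arange(n), positions] = tokenizer.mask_token_id
    with torch.no_grad():
        logits = model(batch)[0]  # (n, seq_len, vocab_size)
    log_probs = logits.log_softmax(dim=-1)
    # Log-probability BERT gives to each original token at its masked position;
    # summing these yields the sentence's pseudo-log-likelihood.
    return log_probs[torch.arange(n), positions, ids[positions]]
```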

Also, I did not manage to run the tests with poetry as you asked.
I am not familiar with automatic testing and poetry, so I will have to look into it.

@simonepri simonepri changed the title add BERTLMScore Add BERTLMScore Apr 28, 2020
@simonepri (Owner)

> For now, I pass all the masked sentences to BERT in one single batch, so it may cause memory problems for very long sentences.

Can you please add a flag that lets the user set a batch size?

> Also, I did not manage to run the tests with poetry as you asked.

There is a bug in the branch you are using (fixed here: df67f70) that prevents you from running poetry.
Please run the following on your branch to fetch the fix:

git fetch upstream
git rebase upstream/master

> I am not familiar with automatic testing and poetry, so I will have to look into it.

All the tests run automatically every time you push changes to this PR.

If you click on Details and expand the line with the red cross, you will see exactly what the problem is.

@dldk-gael (Contributor, Author)

OK, I succeeded in getting black and poetry to work.
I also copied the test file you made for GPT-2, and it is interesting to see that it does not pass two of the tests:

I need to finish this project, but I do not have enough time.
I need to finish this project, and I do not have enough time.

and

I can walk there.
I can ambulate there.

The second one is more surprising.

I will add the flag option.

@dldk-gael (Contributor, Author)

I added the possibility to feed the masked sentences to the model in batches. I have also tried to add new test functions (to check that this does not change the score results); however, they are not picked up when I run poetry run task test (by the way, there is a small typo in the readme: it asks you to run poetry run task testS). I will look into it tomorrow.

@simonepri (Owner)

I would like to get #4 merged before reviewing this. Since we are changing the API, we will have to update this PR later anyway.

@simonepri simonepri self-requested a review April 30, 2020 15:00
@simonepri simonepri marked this pull request as draft April 30, 2020 22:19
@simonepri (Owner) commented May 16, 2020

Things to do here before we can merge this:

  • Refactor the code to match the new API
  • Update the tests to match the ones implemented for GPT-2

@dldk-gael (Contributor, Author)

Hi Simone, I agree and will work on that next week.

@simonepri (Owner)

Sure, no hurry! The comment was meant as a reminder for the future.

@dldk-gael (Contributor, Author)

Hi @simonepri,

I have begun working on adapting BERTScore to the new API, and I have a question about the batching part.
In BERTScore, you cannot directly use BatchedLMScorer, because you don't batch by sentence but by masked sentence.
For instance, suppose the user sets a batch size of 4 and inputs 2 sentences:

I like tennis
I like to eat potatoes

I will in fact feed 8 sentences to BERT:

MASK like tennis
I MASK tennis
I like MASK
MASK like to eat potatoes
I MASK to eat potatoes
I like MASK eat potatoes
I like to MASK potatoes
I like to eat MASK

And so my first batch will consist of:

MASK like tennis
I MASK tennis
I like MASK
MASK like to eat potatoes

and the second of:

I MASK to eat potatoes
I like MASK eat potatoes
I like to MASK potatoes
I like to eat MASK

So what I have done for now is the following:

  1. Encode every sentence provided by the user in one shot using BatchEncoding.
  2. Expand the input ids, replacing each sentence with its masked variants.
  3. Feed them to BERT in batches of batch_size.
  4. Re-split the results by sentence.

The code is not finished yet (not tested, not cleaned, ...), but just to give you an idea:
https://gist.github.com/dldk-gael/2025f0584f781716697463c62a9779ee
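
For reference, a minimal sketch of steps 1-4 above with cross-sentence batching (helper names are hypothetical, and padding/attention masks are omitted for brevity; the gist contains the actual work in progress):

```python
import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def masked_variants(sentences):
    """Yield (sentence_index, masked_position, input_ids) for every token."""
    for i, sentence in enumerate(sentences):
        ids = tokenizer.encode(sentence)  # [CLS] t1 ... tn [SEP]
        for pos in range(1, len(ids) - 1):
            masked = list(ids)
            masked[pos] = tokenizer.mask_token_id
            yield i, pos, torch.tensor(masked)

def iter_batches(sentences, batch_size):
    """Group masked variants from *all* sentences into batches of batch_size,
    so one batch can mix variants coming from different sentences."""
    batch = []
    for item in masked_variants(sentences):
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With batch_size = 4 and the two sentences above, the first batch holds the three variants of "I like tennis" plus the first variant of "I like to eat potatoes", matching the split shown earlier. Because the variants are generated lazily, only one batch of sequences needs to exist in memory at a time.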

My question is: in your opinion, what is the best way to proceed given this batching specificity? Do I simply override _tokens_log_prob from BatchedLMScorer?

@simonepri (Owner)

What I would do is the following:

First, you put your code inside a function called _tokens_log_prob_for_sentence.
Then, inside _tokens_log_prob_for_batch, you simply do a for loop that calls _tokens_log_prob_for_sentence for every sentence in the batch.

In _tokens_log_prob_for_sentence, you then generate the masked sentences and batch them according to the batch_size field (which is what you are already doing, I guess).

How does that sound?
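
A hypothetical skeleton of that structure (the BatchedLMScorer internals are assumed, and the per-sentence body is left as a placeholder):

```python
def _tokens_log_prob_for_batch(self, sentences):
    # Process one sentence at a time to keep the masking logic self-contained.
    return [self._tokens_log_prob_for_sentence(s) for s in sentences]

def _tokens_log_prob_for_sentence(self, sentence):
    # Generate the masked copies of `sentence` here and feed them to the
    # model in chunks of self.batch_size, as described above.
    ...
```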

PS: You don't want to tokenize all the sentences at once if you then process them independently. Doing so only introduces unnecessary padding. Just process each sentence one by one using encode_plus instead.

Notes on the code:

Here https://gist.github.com/dldk-gael/2025f0584f781716697463c62a9779ee#file-bert_score-py-L12-L23

You don't need this. Just pass return_special_tokens_mask=True when you call encode_plus, and then do the following:

tokens_mask = encoding["special_tokens_mask"] == 0
ids = encoding["input_ids"][tokens_mask]
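
For context, a self-contained version of that snippet (the tensors are built explicitly so the boolean indexing works regardless of the tokenizer's return types; the model name is illustrative):

```python
import torch
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

enc = tokenizer.encode_plus("I like tennis", return_special_tokens_mask=True)
input_ids = torch.tensor(enc["input_ids"])           # [CLS] i like tennis [SEP]
special = torch.tensor(enc["special_tokens_mask"])   # 1 at [CLS]/[SEP], 0 elsewhere

tokens_mask = special == 0       # True at regular-token positions
ids = input_ids[tokens_mask]     # input ids with [CLS]/[SEP] dropped
```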

@dldk-gael (Contributor, Author) commented May 25, 2020

By doing so, when the user inputs a lot of small sentences but uses a large batch size (for instance, because they have a lot of GPU memory), we won't fill the batch on each forward pass.
And sometimes the problem will be the reverse. That's why I think one batch should not necessarily contain only masked sentences coming from a single sentence.

@simonepri (Owner)

Yes, that's true, but it also makes the logic much easier to maintain.
I would rather implement a simple version first, test it, and then improve it if it's worth it.

Anyway, if you find a clean way to batch multiple sentences, then sure, go ahead and do it.
If you do so, just be careful not to build all the masked sequences at once.
Say the user inputs 64 sentences, each 64 tokens long: you don't want to store 64² sequences of tokens in memory just to process up to batch_size sequences at a time.

@dldk-gael (Contributor, Author)

Hi @simonepri,
I gave the second option a try, because BERT's scoring method requires many more operations than GPT-2's, so I want to use the full batch capacity in all cases.
It is working, even if there are still some tests to update (and some minor changes to make, such as handling the empty-string case correctly), but because I am not respecting the BatchedLMScorer structure here, I would like your opinion on this draft before going further.

@dldk-gael (Contributor, Author) commented Jun 11, 2020

Hi @simonepri, did you have time to take a look?
For your information, I next plan to compare the different scorers using the BLiMP dataset. I don't know if you have used this dataset before (https://arxiv.org/pdf/1912.00582.pdf), but it is really related to what we are trying to achieve here! (And it is available on huggingface/nlp! :) )

@dldk-gael (Contributor, Author)

Quick update: I ran some tests on the BLiMP dataset. With the GPT-2 (base) score I get 0.9 accuracy vs. 0.85 with the BERT score. So I agree we can go back to a more easy-to-maintain version of the batch handling within BERT.
(By the way, I observe a huge drop when using the hmean reduction strategy, but I did not understand the idea behind that one.)

@simonepri (Owner)

@dldk-gael Sorry, I don't have time at the moment to review the code.
I should be able to come back to you in about a week. Ping me in case I forget.

> but I did not understand the idea behind that one

What do you mean? It's just the harmonic mean of the individual token probabilities. I added it just for completeness (i.e., to have all three Pythagorean means). It wasn't meant to be particularly useful.
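
For reference, over the token probabilities $p_1, \dots, p_n$ the three reductions are:

$$\mathrm{amean} = \frac{1}{n}\sum_{i=1}^{n} p_i, \qquad \mathrm{gmean} = \Big(\prod_{i=1}^{n} p_i\Big)^{1/n}, \qquad \mathrm{hmean} = \frac{n}{\sum_{i=1}^{n} 1/p_i}.$$

Since the harmonic mean is dominated by the smallest $p_i$, a single low-probability token can sink the whole sentence score, which may explain the drop observed above.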

@dldk-gael (Contributor, Author)

Hi @simonepri, are you still planning to work on this one?
