Fine-tuned BERT LM #2

Open
zparcheta opened this issue Jun 8, 2020 · 2 comments
zparcheta commented Jun 8, 2020

Hi,
I used pytorch_pretrained_BERT/examples/run_lm_finetuning.py to fine-tune the model on a monolingual set of sentences, starting from the BERT multilingual cased model.

Once the model is fine-tuned, I compute the loss for a given sentence with the following code:

import math
import torch
from pytorch_pretrained_bert import BertTokenizer, BertForMaskedLM

def get_score(sentence, model):
    # Tokenize and map to vocabulary ids (no [CLS]/[SEP] added here).
    tokenize_input = tokenizer.tokenize(sentence)
    tensor_input = torch.tensor([tokenizer.convert_tokens_to_ids(tokenize_input)])
    model.eval()
    with torch.no_grad():
        # BertForMaskedLM returns prediction scores of shape (1, seq_len, vocab_size).
        predictions = model(tensor_input)
    # Cross-entropy of the unmasked sentence against itself, exponentiated.
    loss_fct = torch.nn.CrossEntropyLoss()
    loss = loss_fct(predictions.squeeze(), tensor_input.squeeze())
    return math.exp(loss.item())

sentence = "ﺶﻋﺮﺴﺗﺎﻧ؛ ﺩ پښﺕﻭ ﺶﻋﺭپﻮﻬﻧې ﻥﻭی پړﺍﻭ - ﺕﺎﻧﺩ"
tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
# Load the fine-tuned weights and initialise the masked-LM model with them.
stats = torch.load('pytorch_model.bin')
bertMaskedLM = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased', state_dict=stats)

print(get_score(sentence, bertMaskedLM))

78637.05198167797

bertMaskedLM_orig = BertForMaskedLM.from_pretrained('bert-base-multilingual-cased')
print(get_score(sentence, bertMaskedLM_orig))

7.919475431571431

The strange thing is that the fine-tuned model returns a much higher loss, even for sentences that appeared in the monolingual training data.

Am I doing something wrong? I just want to check how well a given sentence fits the language model.
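
One way to make that check concrete is a per-token masked score (pseudo-perplexity): mask each position in turn and score the model's prediction for it. A minimal sketch, assuming the same tokenizer and BertForMaskedLM objects as above (pseudo_perplexity is just an illustrative name, not something from the repo):

def pseudo_perplexity(sentence, model, tokenizer):
    # Wrap with [CLS]/[SEP] and map to vocabulary ids.
    tokens = ['[CLS]'] + tokenizer.tokenize(sentence) + ['[SEP]']
    ids = tokenizer.convert_tokens_to_ids(tokens)
    mask_id = tokenizer.convert_tokens_to_ids(['[MASK]'])[0]
    model.eval()
    log_probs = []
    with torch.no_grad():
        for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
            masked = list(ids)
            masked[i] = mask_id  # mask one position at a time
            predictions = model(torch.tensor([masked]))  # (1, seq_len, vocab_size)
            log_softmax = torch.log_softmax(predictions[0, i], dim=-1)
            log_probs.append(log_softmax[ids[i]].item())
    # exp of the negative mean log-probability of the true tokens.
    return math.exp(-sum(log_probs) / len(log_probs))

print(pseudo_perplexity(sentence, bertMaskedLM, tokenizer))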

Regards and thanks in advance

@pangbochen

I suggest using https://github.com/huggingface/transformers instead.

This repo is a copy of Hugging Face's project; upstream, pytorch_pretrained_BERT has been renamed to transformers.
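
With transformers, the scoring above could look roughly like this (a sketch, assuming a recent 4.x release of transformers; 'finetuned_dir' is just a placeholder for the directory where the fine-tuned pytorch_model.bin and its config.json were saved):

import math
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained('bert-base-multilingual-cased')
model = BertForMaskedLM.from_pretrained('finetuned_dir')  # placeholder path to the fine-tuned checkpoint
model.eval()

sentence = "ﺶﻋﺮﺴﺗﺎﻧ؛ ﺩ پښﺕﻭ ﺶﻋﺭپﻮﻬﻧې ﻥﻭی پړﺍﻭ - ﺕﺎﻧﺩ"  # the sentence from the snippet above
inputs = tokenizer(sentence, return_tensors='pt')
with torch.no_grad():
    # Passing labels makes the model return a cross-entropy loss over all positions,
    # mirroring the unmasked scoring in the snippet above.
    outputs = model(**inputs, labels=inputs['input_ids'])
print(math.exp(outputs.loss.item()))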

@pangbochen

See the original code at https://github.com/huggingface/transformers/tree/0.5.0.

Best regards
