
The performance of the model I reproduced does not meet the standards outlined in the paper. #14

Open
stddddd opened this issue May 28, 2024 · 5 comments


stddddd commented May 28, 2024

I reproduced the main result for LoRA + InstructERC based on Llama2, and the performance I got does not match the paper. The table below is the comparison:

| W-F1 | IEMOCAP | MELD | EmoryNLP |
| --- | --- | --- | --- |
| reproduce | 65.47 | 66.96 | 39.16 |
| paper | 71.39 | 69.15 | 41.37 |
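
(For context, W-F1 above refers to the weighted F1 score over the emotion classes, as is standard in ERC work; below is a minimal sketch of how such a score is computed, assuming scikit-learn is available.)

    # Toy example of a weighted F1 computation (the labels here are made up,
    # not the actual IEMOCAP/MELD/EmoryNLP label sets).
    from sklearn.metrics import f1_score

    y_true = ["happy", "sad", "neutral", "happy", "angry"]
    y_pred = ["happy", "neutral", "neutral", "happy", "sad"]

    w_f1 = f1_score(y_true, y_pred, average="weighted")  # weighted by class support
    print(f"W-F1: {w_f1 * 100:.2f}")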

Compared to the original code, I only made the following modifications:

  1. data_percent: 1/64 -> 1

  2. set the LLaMA2 MODELPATH to my local model path; the Llama2 version I used is Llama-2-7b-chat-hf

  3. While running the code, I ran into an error: RuntimeError: probability tensor contains either inf, nan or element < 0.

    To work around it, I added the following code to the Llama2 model file:

    probs = nn.functional.softmax(next_token_scores, dim=-1)

    # Workaround: if any row of probs contains NaN, replace that whole row with a
    # one-hot distribution on token id 2 (Llama-2's EOS token).
    nans = torch.isnan(probs)
    if nans.any():
        idx = torch.argwhere(torch.sum(nans, 1))  # indices of rows containing NaN
        z = torch.zeros_like(probs[idx][0])       # a (1, vocab_size) all-zero row
        z[0][2] = 1.                              # put all probability mass on token id 2
        probs[idx] = z

    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
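
As a side note, an alternative workaround (just a sketch under my own assumptions, not the repository's code) would be to fall back to greedy argmax for any row whose probabilities are not finite, instead of forcing a fixed token id:

    # Self-contained sketch of an alternative NaN/inf workaround: for rows whose
    # probabilities contain NaN or inf, fall back to greedy argmax over the raw
    # scores instead of forcing token id 2.
    import torch
    import torch.nn as nn

    def sample_with_fallback(next_token_scores: torch.Tensor) -> torch.Tensor:
        """next_token_scores: (batch, vocab_size) logits after processing/warping."""
        probs = nn.functional.softmax(next_token_scores, dim=-1)
        bad_rows = ~torch.isfinite(probs).all(dim=-1)        # rows with NaN or inf
        if bad_rows.any():
            greedy = next_token_scores[bad_rows].nan_to_num(-1e9).argmax(dim=-1)
            one_hot = torch.zeros_like(probs[bad_rows])
            one_hot.scatter_(1, greedy.unsqueeze(-1), 1.0)   # all mass on greedy token
            probs[bad_rows] = one_hot
        return torch.multinomial(probs, num_samples=1).squeeze(1)

    # Tiny usage example: the second row is deliberately broken.
    scores = torch.randn(2, 32000)
    scores[1] = float("nan")
    print(sample_with_fallback(scores).shape)                # torch.Size([2])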

What else should I modify to reach the performance mentioned in the paper?

@LIN-SHANG (Owner)

The large performance gap is indeed confusing; below is something that may help you:

LLaMA version: https://huggingface.co/meta-llama/Llama-2-7b-hf or https://huggingface.co/meta-llama/Llama-2-7b
I haven't tried any of the LLaMA Chat versions.

Besides, I haven't encountered the RuntimeError you reported. My GPU, NVIDIA driver, and CUDA versions are:

A100, 470, 11.7
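
For anyone comparing environments, here is a quick sanity check of what PyTorch actually sees (the driver version itself is easiest to read off nvidia-smi); this is just a generic snippet, not part of the InstructERC code:

    # Print the CUDA / GPU details visible to PyTorch, to compare against the
    # A100 / driver 470 / CUDA 11.7 setup mentioned above.
    import torch

    print("torch:", torch.__version__)
    print("CUDA runtime:", torch.version.cuda)        # e.g. "11.7"
    print("cuDNN:", torch.backends.cudnn.version())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))  # e.g. "NVIDIA A100-SXM4-40GB"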


stddddd commented May 29, 2024

I reproduced it again using the environment you mentioned: A100, NVIDIA driver 470, and CUDA 11.7. I also downloaded the Llama-2-7b-hf model from the link you provided: https://huggingface.co/meta-llama/Llama-2-7b-hf.

However, the performance I got still does not match the paper. The table below is the comparison:

| W-F1 | IEMOCAP | MELD | EmoryNLP |
| --- | --- | --- | --- |
| reproduce | 67.53 | 67.46 | 39.20 |
| paper | 71.39 | 69.15 | 41.37 |

Do you have any idea about it?


@LIN-SHANG (Owner)

It seems that the gap has narrowed a bit. You could try adjusting the historical window (from 5 to 12); this parameter has an impact on the best performance.
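
To illustrate what that parameter typically controls (a hypothetical sketch, not the InstructERC implementation): the historical window bounds how many preceding utterances are packed into the prompt for the target utterance.

    # Hypothetical sketch: keep only the last `window` utterances before the
    # target utterance when building its prompt context.
    from typing import List

    def build_context(utterances: List[str], target_idx: int, window: int) -> List[str]:
        start = max(0, target_idx - window)
        return utterances[start:target_idx + 1]      # history + target utterance

    dialogue = [f"utt_{i}" for i in range(20)]
    print(len(build_context(dialogue, target_idx=15, window=5)))   # 6 utterances
    print(len(build_context(dialogue, target_idx=15, window=12)))  # 13 utterances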


stddddd commented May 29, 2024

The historical window was already set to 12 in both of the previous reproductions.



stddddd commented May 31, 2024

These are the hyper-parameter settings I used while reproducing the model. What should I modify to improve the performance?

| hyper-parameter | IEMOCAP / MELD / EmoryNLP |
| --- | --- |
| GPU | A100 |
| Nvidia Driver | 470 |
| CUDA version | 11.7 |
| llm-model | llama-2-7b-hf |
| experiment setting | lora |
| historical window | 12 |
| accumulations | 8 |
| graphics card | 4 |
| speaker task | None |
| domain base | False |
| emotion prediction | False |
| data percent | 1.0 |
| LR | 2e-4 |
| eval batch size | 8 |
| num train epochs | 6 |
| save steps | 100000 |
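
The same settings, gathered into one place as a plain Python record (key names below are descriptive placeholders, not necessarily the repository's actual argument names):

    # The reported reproduction settings as a single dictionary, for easy diffing
    # against other runs. Keys are descriptive, not the repo's exact flag names.
    repro_config = {
        "llm_model": "llama-2-7b-hf",
        "experiment_setting": "lora",
        "historical_window": 12,
        "accumulations": 8,
        "num_gpus": 4,
        "speaker_task": None,
        "domain_base": False,
        "emotion_prediction": False,
        "data_percent": 1.0,
        "learning_rate": 2e-4,
        "eval_batch_size": 8,
        "num_train_epochs": 6,
        "save_steps": 100_000,
    }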
