
Problems with train_llava inference results #197

Closed

weiaicunzai opened this issue Oct 31, 2024 · 7 comments

Comments

@weiaicunzai

weiaicunzai commented Oct 31, 2024

Hi. After my train_llava training finishes, two problems show up at inference time:

1. The prediction ends with < | i m _ e n d | >

2. There is a space between every character of the prediction

T h e   i m a g e   s h o w s   a   p e r s o n   s t a n d i n g   i n   f r o n t   o f   a   d o o r ,   w i t h   t h e i r   h a n d s   i n   t h e i r   p o c k e t s   a n d   t h e i r   e y e s   f i x e d   o n   t h e   g r o u n d .   T h e   p e r s o n ' s   f a c e   i s   e x p r e s s i o n l e s s ,   a n d   t h e r e   i s   n o   c l e a r   a c t i o n   o r   m o v e m e n t   i n   t h e   s c e n e .   T h e   i m a g e   c o u l d   b e   a   s c e n e   f r o m   a   m o v i e ,   a   v i d e o   g a m e ,   o r   a   r e a l - l i f e   s i t u a t i o n   w h e r e   s o m e o n e   i s   s t a n d i n g   i n   f r o n t   o f   a   d o o r . < | i m _ e n d | >

while the label string is:

chatbot: the test - footed nerve's steps an evening with frank zappa by michael e schwartz

I have no idea why this happens.

Here is my inference code:

import os

import pandas as pd
from PIL import Image
from transformers import LlavaForConditionalGeneration, AutoProcessor


def load_model_and_processor(model_path):
    model = LlavaForConditionalGeneration.from_pretrained(model_path)
    processor = AutoProcessor.from_pretrained(model_path)
    return model, processor


def build_model_input(data_path, processor):
    # Pick one sample from the LLaVA-Pretrain annotation file.
    json_path = os.path.join(data_path, 'blip_laion_cc_sbu_558k.json')
    df = pd.read_json(json_path)
    name, image_path, conversations = df.iloc[55]
    image_path = os.path.join(data_path, 'images', image_path)
    human_input = conversations[0].get('value')
    chatbot_output = conversations[1].get('value')
    print('human:', human_input)
    print('chatbot:', chatbot_output)
    print('image_path', image_path)
    print(json_path)

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": human_input},
    ]

    image = Image.open(image_path)
    # Render the chat template to a string, then tokenize it together with the image.
    prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    prompt = processor(text=prompt, images=image, return_tensors='pt')
    return prompt


model, processor = load_model_and_processor('/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mlm/by/train_llava/pretrained_model/model001')
prompt = build_model_input('/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/BERT_TRAINING_SERVICE/platform/dataset/liuhaotian/LLaVA-Pretrain/main/', processor)

model.eval()
model = model.to('cuda:1')

for tk in prompt.keys():
    prompt[tk] = prompt[tk].to(model.device)

generate_ids = model.generate(**prompt, max_new_tokens=100)

# Drop the prompt tokens so only the newly generated tokens get decoded.
generate_ids = [
    oid[len(iids):] for oid, iids in zip(generate_ids, prompt.input_ids)
]

# Note: skip_special_tokens=False keeps tokens such as <|im_end|> in the decoded text.
gen_text = processor.batch_decode(generate_ids, skip_special_tokens=False, clean_up_tokenization_spaces=False)[0]

print('pred:', gen_text)
@weiaicunzai
Author

weiaicunzai commented Oct 31, 2024

#185

I found a similar bug there: spaces inside the text, plus a trailing token at the end that never got stripped. Could this be caused by using two tokenizers, one from CLIP and one from Qwen?

@yuanzhoulvpi2017
Owner

Check the value of "processor_class" in your preprocessor_config.json file. See whether yours only says "image_processor_type": "CLIPImageProcessor".

That value has to be "LlavaProcessor" for it to work.
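(A minimal sketch for inspecting that config file; the model directory is reused from the inference snippet above, and the field names are the standard ones transformers writes out.)

import json
import os

# Reusing the model directory from the inference snippet above.
model_dir = '/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mlm/by/train_llava/pretrained_model/model001'

with open(os.path.join(model_dir, 'preprocessor_config.json')) as f:
    cfg = json.load(f)

# AutoProcessor decides which wrapper class to build from "processor_class";
# if it is missing or wrong, the image processor and tokenizer may not be
# loaded as a matched LlavaProcessor pair.
print('processor_class:     ', cfg.get('processor_class'))
print('image_processor_type:', cfg.get('image_processor_type'))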

@yuanzhoulvpi2017
Owner

For the concrete code, refer to #185 (comment).

@weiaicunzai
Author

weiaicunzai commented Jan 1, 2025

Thanks for the reply.
One follow-up question: why would the processor class cause this kind of bug? I see that llava-next's config also says "image_processor_type": "CLIPImageProcessor":

https://hf-mirror.com/lmms-lab/llama3-llava-next-8b/blob/main/preprocessor_config.json

Why don't they have this problem?

@yuanzhoulvpi2017

@weiaicunzai
Author

weiaicunzai commented Jan 1, 2025

Oh, I think I get it now.

In my code, I used the CLIP tokenizer to format the prompt:

prompt = self.processor.tokenizer.apply_chat_template(
            messages, 
            tokenize=False, 
            add_generation_prompt=True
        )

Because I put the processor files in the same directory as the Qwen files, the CLIP tokenizer cannot find the vocabulary matching its own BPE merges, and the Qwen vocabulary probably doesn't match it either, so it ends up treating each character as its own token, which is where the spaces come from? llava-next only uses the image-processor part of CLIP, not its tokenizer, so they are unaffected.

Does my reasoning sound right to you? I haven't verified it experimentally.

If that's the case, then several bugs mixed together in a way that still happened to run, only with bad results. Amazing. Bugs like this are also the hardest to track down, since there is no interpreter error message.
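(One quick way to test this hypothesis without retraining: load the processor from the merged directory and print which classes were actually instantiated. A hedged sketch; the directory path is reused from the snippet above, and the expected output is a guess, not a confirmed diagnosis.)

from transformers import AutoProcessor

# Same merged model directory as in the inference snippet above.
model_dir = '/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mlm/by/train_llava/pretrained_model/model001'

processor = AutoProcessor.from_pretrained(model_dir)

# If the hypothesis holds, the tokenizer below is CLIP's rather than Qwen's,
# and a round-trip through it mangles the text.
print(type(processor).__name__)
print(type(processor.tokenizer).__name__)

ids = processor.tokenizer("hello world")["input_ids"]
print(processor.tokenizer.decode(ids))  # character-by-character output would support the hypothesis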

@yuanzhoulvpi2017
Owner

This is not the hardest kind of bug to track down; rather, you have mixed up the key building blocks of a LLaVA model:

  1. The image processing module.
  2. The text processing module.
  3. The miscellaneous processing details. I suggest watching my Bilibili videos again as a refresher.
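(A minimal sketch of how those modules get combined into a single processor, which is also what puts "processor_class": "LlavaProcessor" into preprocessor_config.json. The checkpoint names here are illustrative assumptions, not taken from the thread.)

from transformers import AutoTokenizer, CLIPImageProcessor, LlavaProcessor

# Illustrative checkpoints: CLIP supplies the image module, Qwen the text module.
image_processor = CLIPImageProcessor.from_pretrained('openai/clip-vit-large-patch14-336')
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen1.5-4B-Chat')

# LlavaProcessor wraps both; saving it records "processor_class": "LlavaProcessor"
# in preprocessor_config.json, so AutoProcessor later reloads the matched pair
# instead of a bare CLIP processor with a mismatched tokenizer.
processor = LlavaProcessor(image_processor=image_processor, tokenizer=tokenizer)
processor.save_pretrained('model001')  # hypothetical output directory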

@weiaicunzai
Author

Got it. I've already watched your videos three times.
