
Problems with train_llava inference results #197

Closed

weiaicunzai opened this issue Oct 31, 2024 · 7 comments

Comments

@weiaicunzai

weiaicunzai commented Oct 31, 2024

Hi. After my train_llava training finishes, two problems show up at inference time:

1. The prediction ends with < | i m _ e n d | >

2. There is a space between every character of the prediction

T h e   i m a g e   s h o w s   a   p e r s o n   s t a n d i n g   i n   f r o n t   o f   a   d o o r ,   w i t h   t h e i r   h a n d s   i n   t h e i r   p o c k e t s   a n d   t h e i r   e y e s   f i x e d   o n   t h e   g r o u n d .   T h e   p e r s o n ' s   f a c e   i s   e x p r e s s i o n l e s s ,   a n d   t h e r e   i s   n o   c l e a r   a c t i o n   o r   m o v e m e n t   i n   t h e   s c e n e .   T h e   i m a g e   c o u l d   b e   a   s c e n e   f r o m   a   m o v i e ,   a   v i d e o   g a m e ,   o r   a   r e a l - l i f e   s i t u a t i o n   w h e r e   s o m e o n e   i s   s t a n d i n g   i n   f r o n t   o f   a   d o o r . < | i m _ e n d | >

while the label string is:

chatbot: the test - footed nerve's steps an evening with frank zappa by michael e schwartz

I have no idea why this happens.

Here is my inference code:

import os

import pandas as pd
from PIL import Image
from transformers import LlavaForConditionalGeneration, AutoProcessor


def load_model_and_processor(model_path):
    model = LlavaForConditionalGeneration.from_pretrained(model_path)
    processor = AutoProcessor.from_pretrained(model_path)
    return model, processor


def build_model_input(data_path, processor):
    # Pick one sample from the LLaVA-Pretrain annotation file.
    json_path = os.path.join(data_path, 'blip_laion_cc_sbu_558k.json')
    df = pd.read_json(json_path)
    name, image_path, conversations = df.iloc[55]
    image_path = os.path.join(data_path, 'images', image_path)
    human_input = conversations[0].get('value')
    chatbot_output = conversations[1].get('value')
    print('human:', human_input)
    print('chatbot:', chatbot_output)
    print('image_path', image_path)
    print(json_path)

    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": human_input},
    ]

    image = Image.open(image_path)
    # Render the chat template to a string, then tokenize it together with the image.
    prompt = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    prompt = processor(text=prompt, images=image, return_tensors='pt')
    return prompt


model, processor = load_model_and_processor('/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mlm/by/train_llava/pretrained_model/model001')
prompt = build_model_input('/mnt/dolphinfs/hdd_pool/docker/user/hadoop-aipnlp/BERT_TRAINING_SERVICE/platform/dataset/liuhaotian/LLaVA-Pretrain/main/', processor)

model.eval()
model = model.to('cuda:1')

for tk in prompt.keys():
    prompt[tk] = prompt[tk].to(model.device)

generate_ids = model.generate(**prompt, max_new_tokens=100)

# Drop the prompt tokens so only the newly generated tokens get decoded.
generate_ids = [
    oid[len(iids):] for oid, iids in zip(generate_ids, prompt.input_ids)
]

# Note: skip_special_tokens=False keeps tokens such as <|im_end|> in the decoded text.
gen_text = processor.batch_decode(generate_ids, skip_special_tokens=False, clean_up_tokenization_spaces=False)[0]

print('pred:', gen_text)
@weiaicunzai
Author

weiaicunzai commented Oct 31, 2024

#185

I found a similar bug there: spaces inside the text, plus a trailing token at the end that never got stripped. Could this be caused by using two tokenizers, one from CLIP and one from Qwen?

@yuanzhoulvpi2017
Owner

Check the value of "processor_class" in your preprocessor_config.json file. See whether yours only says "image_processor_type": "CLIPImageProcessor".

That value has to be "LlavaProcessor" for it to work.
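(A minimal sketch for inspecting that config file; the model directory is reused from the inference snippet above, and the field names are the standard ones transformers writes out.)

import json
import os

# Reusing the model directory from the inference snippet above.
model_dir = '/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mlm/by/train_llava/pretrained_model/model001'

with open(os.path.join(model_dir, 'preprocessor_config.json')) as f:
    cfg = json.load(f)

# AutoProcessor decides which wrapper class to build from "processor_class";
# if it is missing or wrong, the image processor and tokenizer may not be
# loaded as a matched LlavaProcessor pair.
print('processor_class:     ', cfg.get('processor_class'))
print('image_processor_type:', cfg.get('image_processor_type'))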

@yuanzhoulvpi2017
Owner

For the concrete code, refer to #185 (comment).

@weiaicunzai
Author

weiaicunzai commented Jan 1, 2025

Thanks for the reply.
One follow-up question: why would the processor class cause this kind of bug? I see that llava-next's config also says "image_processor_type": "CLIPImageProcessor":

https://hf-mirror.com/lmms-lab/llama3-llava-next-8b/blob/main/preprocessor_config.json

Why don't they have this problem?

@yuanzhoulvpi2017

@weiaicunzai
Author

weiaicunzai commented Jan 1, 2025

Oh, I think I get it now.

In my code, I used the CLIP tokenizer to format the prompt:

prompt = self.processor.tokenizer.apply_chat_template(
            messages, 
            tokenize=False, 
            add_generation_prompt=True
        )

Because I put the processor files in the same directory as the Qwen files, the CLIP tokenizer cannot find the vocabulary matching its own BPE merges, and the Qwen vocabulary probably doesn't match it either, so it ends up treating each character as its own token, which is where the spaces come from? llava-next only uses the image-processor part of CLIP, not its tokenizer, so they are unaffected.

Does my reasoning sound right to you? I haven't verified it experimentally.

If that's the case, then several bugs mixed together in a way that still happened to run, only with bad results. Amazing. Bugs like this are also the hardest to track down, since there is no interpreter error message.
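(One quick way to test this hypothesis without retraining: load the processor from the merged directory and print which classes were actually instantiated. A hedged sketch; the directory path is reused from the snippet above, and the expected output is a guess, not a confirmed diagnosis.)

from transformers import AutoProcessor

# Same merged model directory as in the inference snippet above.
model_dir = '/mnt/dolphinfs/ssd_pool/docker/user/hadoop-mlm/by/train_llava/pretrained_model/model001'

processor = AutoProcessor.from_pretrained(model_dir)

# If the hypothesis holds, the tokenizer below is CLIP's rather than Qwen's,
# and a round-trip through it mangles the text.
print(type(processor).__name__)
print(type(processor.tokenizer).__name__)

ids = processor.tokenizer("hello world")["input_ids"]
print(processor.tokenizer.decode(ids))  # character-by-character output would support the hypothesis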

@yuanzhoulvpi2017
Owner

This is not the hardest kind of bug to track down; rather, you have mixed up the key building blocks of a LLaVA model:

  1. The image processing module.
  2. The text processing module.
  3. The miscellaneous processing details. I suggest watching my Bilibili videos again as a refresher.
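(A minimal sketch of how those modules get combined into a single processor, which is also what puts "processor_class": "LlavaProcessor" into preprocessor_config.json. The checkpoint names here are illustrative assumptions, not taken from the thread.)

from transformers import AutoTokenizer, CLIPImageProcessor, LlavaProcessor

# Illustrative checkpoints: CLIP supplies the image module, Qwen the text module.
image_processor = CLIPImageProcessor.from_pretrained('openai/clip-vit-large-patch14-336')
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen1.5-4B-Chat')

# LlavaProcessor wraps both; saving it records "processor_class": "LlavaProcessor"
# in preprocessor_config.json, so AutoProcessor later reloads the matched pair
# instead of a bare CLIP processor with a mismatched tokenizer.
processor = LlavaProcessor(image_processor=image_processor, tokenizer=tokenizer)
processor.save_pretrained('model001')  # hypothetical output directory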

@weiaicunzai
Author

Got it. I've already watched your videos three times.
