inference with LLM and vision frozen #400

Open
simoneriggi opened this issue Jan 22, 2025 · 0 comments
Dear all,
I have fine-tuned LLaVA-OneVision models (0.5B and 7B) with the LLM and vision components frozen. The checkpoint output directory contains these files:

runs
checkpoint-1000     
...
...
checkpoint-22000
trainer_state.json
mm_projector.bin
config.json

Loading the trained model with load_pretrained_model(model_path, model_base=None, model_name='llava_qwen', device_map='auto') fails because no tokenizer files are found in model_path. When I copy the tokenizer files (tokenizer_config.json, tokenizer.json) from the base model, loading fails with this error: OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory.
I had a look at the load_pretrained_model function in builder.py. It seems I should set model_base to the base model (e.g. lmms-lab/llava-onevision-qwen2-0.5b-ov) rather than to None. It also seems that the logic for loading a Qwen-based model from a model_base is missing from that function.
I tried to add this code:

elif "qwen" in model_name.lower():
    from llava.model.language_model.llava_qwen import LlavaQwenConfig, LlavaQwenForCausalLM
            	
    tokenizer = AutoTokenizer.from_pretrained(model_base, use_fast=False)
    if overwrite_config is not None:
        llava_cfg = LlavaQwenConfig.from_pretrained(model_path)
        rank0_print(f"Overwriting config with {overwrite_config}")
        for k, v in overwrite_config.items():
             setattr(llava_cfg, k, v)
        model = LlavaQwenForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, attn_implementation=attn_implementation, config=llava_cfg, **kwargs)
    else:
        model = LlavaQwenForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, attn_implementation=attn_implementation, **kwargs)

right after:

elif model_base is not None:
    ...
    ...
    elif (
        "wizardlm-2" in model_name.lower()
        and "vicuna" in model_name.lower()
        or "llama" in model_name.lower()
        or "yi" in model_name.lower()
        or "nous-hermes" in model_name.lower()
        or "llava-v1.6-34b" in model_name.lower()
        or "llava-v1.5" in model_name.lower()
    ):
        ....
        ....
        model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=llava_cfg, **kwargs)

   [ADD CODE HERE]

I managed to load the model with this fix. Can you please confirm whether this is correct, or whether I am doing something wrong?
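
For reference, this is roughly how I call the loader after the change (the model_path value is illustrative):

from llava.model.builder import load_pretrained_model

# Fine-tuned output directory (contains config.json and mm_projector.bin) and the
# base model the frozen LLM/vision weights should come from.
model_path = "output_dir"  # illustrative
model_base = "lmms-lab/llava-onevision-qwen2-0.5b-ov"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, model_base=model_base, model_name="llava_qwen", device_map="auto"
)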
Thanks a lot for your help.
