Dear all,
I have fine-tuned LLaVA-OneVision models (0.5B and 7B) with the LLM and vision components frozen. The checkpoint output directory contains these files:
Loading the trained model with load_pretrained_model(model_path, model_base=None, model_name='llava_qwen', device_map='auto') fails because no tokenizer files are found in model_path. When I copy the tokenizer files (tokenizer_config.json, tokenizer.json) from the base model, loading fails with this error: OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory.
I had a look at the load_pretrained_model method in the builder.py file. It seems that I should set model_base to the base model (e.g. lmms-lab/llava-onevision-qwen2-0.5b-ov) rather than to None. Also, some logic for loading Qwen-based models seems to be missing from the method.
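For reference, here is the call I would expect to work once the fix below is in place. This is just a sketch: the base model ID is my assumption for the 0.5B variant, and the checkpoint path is a placeholder.

from llava.model.builder import load_pretrained_model

# model_path is my fine-tuned checkpoint directory (placeholder path);
# model_base is the original OV checkpoint the fine-tune started from.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    "/path/to/finetuned/checkpoint",
    model_base="lmms-lab/llava-onevision-qwen2-0.5b-ov",
    model_name="llava_qwen",
    device_map="auto",
)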
I tried to add this code:
elif "qwen" in model_name.lower():
from llava.model.language_model.llava_qwen import LlavaQwenConfig, LlavaQwenForCausalLM
tokenizer = AutoTokenizer.from_pretrained(model_base, use_fast=False)
if overwrite_config is not None:
llava_cfg = LlavaQwenConfig.from_pretrained(model_path)
rank0_print(f"Overwriting config with {overwrite_config}")
for k, v in overwrite_config.items():
setattr(llava_cfg, k, v)
model = LlavaQwenForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, attn_implementation=attn_implementation, config=llava_cfg, **kwargs)
else:
model = LlavaQwenForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, attn_implementation=attn_implementation, **kwargs)
right after:
elif model_base is not None:
    ...
    ...
    elif (
        "wizardlm-2" in model_name.lower()
        and "vicuna" in model_name.lower()
        or "llama" in model_name.lower()
        or "yi" in model_name.lower()
        or "nous-hermes" in model_name.lower()
        or "llava-v1.6-34b" in model_name.lower()
        or "llava-v1.5" in model_name.lower()
    ):
        ....
        ....
        model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=llava_cfg, **kwargs)
[ADD CODE HERE]
I managed to load the model with this fix. Can you please confirm if this is correct or if I am doing something wrong?
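As a quick sanity check on my side, I compared the projector weights of the loaded model against the base model. This is only a sketch: the paths are placeholders, and it assumes the projector under model.get_model().mm_projector is the part that was trainable in my setup.

import torch
from llava.model.builder import load_pretrained_model

base_id = "lmms-lab/llava-onevision-qwen2-0.5b-ov"  # assumed base checkpoint
ft_path = "/path/to/finetuned/checkpoint"           # placeholder path

# Fine-tuned model, loaded through the patched qwen branch (model_base set).
tokenizer, model, image_processor, context_len = load_pretrained_model(
    ft_path, model_base=base_id, model_name="llava_qwen", device_map="cpu"
)

# Original base model for comparison.
_, base_model, _, _ = load_pretrained_model(
    base_id, model_base=None, model_name="llava_qwen", device_map="cpu"
)

ft_proj = model.get_model().mm_projector.state_dict()
base_proj = base_model.get_model().mm_projector.state_dict()
for name in ft_proj:
    same = torch.allclose(ft_proj[name].float(), base_proj[name].float())
    print(f"{name}: {'identical to base' if same else 'updated by fine-tuning'}")

If the projector tensors differ from the base model, that suggests the fine-tuned weights were actually picked up.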
Thanks a lot for your help.