RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != float #693
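For context: this error comes from a matrix multiply whose two operands hold different dtypes, typically a float16 activation flowing into a layer whose weight is still float32. A minimal sketch (hypothetical, not taken from this issue) that raises the same error on a CUDA device:

```python
import torch

# A float16 input hitting a float32 nn.Linear trips the same mat1/mat2
# dtype check as in the title (assumes a CUDA device is available).
layer = torch.nn.Linear(4, 3).cuda()                 # weight is float32
x = torch.randn(2, 4, dtype=torch.float16).cuda()    # activation is float16
layer(x)  # RuntimeError: expected mat1 and mat2 to have the same dtype
```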
Comments
The two tensors have different dtypes. Can you provide a code sample?
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer

device_map = {'transformer.word_embeddings': 0, 'transformer.final_layernorm': num_gpus - 1, 'lm_head': num_gpus - 1}
model = AutoModelForCausalLM.from_pretrained("/starfs-dev1/miaochaowei/deepctrcloudml/DeepCTR2/llmdemo/glm-4-9b-chat/", trust_remote_code=True).half().cuda()
model = model.eval()
tokenizer = AutoTokenizer.from_pretrained("/starfs-dev1/miaochaowei/deepctrcloudml/DeepCTR2/llmdemo/glm-4-9b-chat/", trust_remote_code=True)

# -------------------------------------------- lora train -----------
model = get_peft_model(model, config)

def process_func(example): ...

args = TrainingArguments(...)   # full arguments under "Reproduction" below
trainer = Trainer(...)          # full arguments under "Reproduction" below
```
That is all of the code.
Are you using int4 quantization or something similar, or does your GPU not support BF16?
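One quick way to answer both questions, sketched here as a suggestion rather than part of the original thread (`model` is the PEFT-wrapped model from the snippet above):

```python
import torch

# Does this GPU support BF16? (False on pre-Ampere cards such as V100/T4.)
print(torch.cuda.is_bf16_supported())

# Which dtypes does the wrapped model actually hold? A mix of torch.float16
# and torch.float32 here would explain the mat1/mat2 mismatch.
print({p.dtype for p in model.parameters()})
```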
System Info / 系統信息
transformers 4.47.1
python 3.11
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
Reproduction / 复现过程
```python
args = TrainingArguments(
    per_device_train_batch_size=2,
    optim="adafactor",
    gradient_checkpointing=False,
    gradient_accumulation_steps=4,
    warmup_steps=2,
    max_steps=10,
    learning_rate=2e-4,
    seed=42,
    report_to="wandb",
    fp16=False,
    logging_steps=1,
    output_dir="./GLM4",
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_id,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
trainer.train()  # this call raises the RuntimeError above
```
Expected behavior / 期待表现
I hope this issue can be resolved.
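A plausible fix, offered as an assumption rather than a confirmed resolution of this issue: keep the trainable weights in a single, supported setup by loading with `torch_dtype` instead of calling `.half()`, and align the Trainer's mixed-precision flag with that choice. The sketch reuses the model path and arguments from the report:

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

MODEL_PATH = "/starfs-dev1/miaochaowei/deepctrcloudml/DeepCTR2/llmdemo/glm-4-9b-chat/"
use_bf16 = torch.cuda.is_bf16_supported()

# On BF16-capable GPUs, load the model in bfloat16 end to end; otherwise keep
# float32 master weights and rely on fp16 autocast rather than .half().
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16 if use_bf16 else torch.float32,
    trust_remote_code=True,
).cuda()

# Match the mixed-precision flag to the model dtype, so the LoRA layers added
# by get_peft_model never multiply half-precision activations against
# float32 weights outside autocast.
args = TrainingArguments(
    output_dir="./GLM4",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_steps=10,
    bf16=use_bf16,
    fp16=not use_bf16,
)
```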