
eval results make no sense #3

Open
findlet39 opened this issue Jun 6, 2024 · 5 comments

@findlet39

When running inference with your pretrained weights, the generated sentences are just jumbles of unrelated words, and I can't figure out why.
[screenshot of the garbled generated captions]
My steps were: download the bert-base-uncased model files locally and update the corresponding model path; download the visual encoder model and the pretrained weights from the links you provided, place them in local folders, and update the corresponding paths in coco_eval.py.
I ran inference on both my own photos and the COCO 2014 dataset, and in both cases the generated sentences were gibberish.
P.S. The following warning was printed at runtime:
Some weights of the model checkpoint at D:\pyproject\LAVIS-main\models\blip_2\bert-base-uncased were not used when initializing BertLowModel: ['bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.8.output.dense.bias', 'cls.predictions.transform.dense.weight', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.query.weight', 'cls.predictions.transform.dense.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.6.attention.output.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 
'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'bert.encoder.layer.7.attention.self.key.bias', 'cls.predictions.transform.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.11.attention.self.query.bias', 'cls.predictions.bias']

  • This IS expected if you are initializing BertLowModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertLowModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights were not loaded; could that be the cause of this problem?
@wanghua-lei

wanghua-lei commented Jun 6, 2024

It's normal that those weights aren't loaded: the encoder only loads the first 6 BERT layers, so the last 6 go unused. The decoder used for reconstruction loads all 12 BERT layers, but it looks like the author's decoder forward pass doesn't actually go through the last 6 layers; I'm not sure whether my understanding is correct.
[screenshot]
My inference results also look like this: lots of meaningless words, or just ".", or repeated phrases, or the wrong subject, and I don't know why.
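For reference, a minimal sketch of why this warning is expected; it uses a plain HuggingFace BertModel rather than the repo's BertLowModel. Truncating the config to 6 layers leaves the checkpoint's layer-6 to layer-11 tensors (and any heads the custom model drops) unmatched, which is exactly what the warning lists.

```python
# Minimal sketch (not the repo's BertLowModel): loading the full bert-base-uncased
# checkpoint into a BERT truncated to its first 6 layers.
from transformers import BertConfig, BertModel

config = BertConfig.from_pretrained("bert-base-uncased")
config.num_hidden_layers = 6  # keep only the first 6 transformer layers

# The layer-6..11 tensors (plus pooler/cls heads, if the custom model omits them)
# have nowhere to go, so HuggingFace prints the "Some weights of the model
# checkpoint ... were not used when initializing ..." warning shown above.
encoder = BertModel.from_pretrained("bert-base-uncased", config=config)
```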

@findlet39
Author

findlet39 commented Jun 6, 2024 via email

@wangyuchi369
Owner

@wanghua-lei @findlet39 Hi! Thanks for the questions. First, regarding the pretrained-weight loading, @wanghua-lei is right: for implementation convenience I load all 12 decoder layers, but the forward pass does not go through the first six. As for the garbled output, we ran some tests and suspect that, following the suggestion in an earlier issue or the default setting in the config, you set the variance dilation to 9. The checkpoint we provide, however, was trained with var_dilate = 4, so at inference time it has never seen noise of that magnitude.

To address this, we added a parameter "var_dilate_val" in a new commit, separating the training-time and evaluation-time variances. We recommend setting "var_dilate_val" to match the value used at training time, or simply to 1. (At minimum, make sure val_var < train_var.)
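A rough sketch of why the mismatch breaks inference (hypothetical variable names and an assumed noise formulation, not necessarily the repo's exact schedule): the dilation factor scales the noise std in the forward process, so sampling with a larger factor than the one used in training feeds the checkpoint noise levels it was never trained on.

```python
import torch

def noisy_latent(x0, alpha_bar_t, var_dilate):
    # q(x_t | x_0) with the noise std scaled by the dilation factor (assumed form).
    noise = torch.randn_like(x0)
    return (alpha_bar_t ** 0.5) * x0 + var_dilate * ((1.0 - alpha_bar_t) ** 0.5) * noise

train_var_dilate = 4  # what the released checkpoint was trained with
var_dilate_val = 1    # evaluation-time value; match training or simply use 1
assert var_dilate_val <= train_var_dilate, "eval variance should not exceed the training one"
```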

Our test results are below for reference: we also get the warning about unloaded weights, but the output is normal. If this isn't the issue, feel free to keep commenting.

[screenshot of the test output]

@wanghua-lei

@wangyuchi369 Thanks for the answer. One more question: in the training part, what is the purpose of concatenating an all-zero vector along the feature (dim) dimension? Is the idea to inject this "zero" signal and help the model predict these special tokens as all zeros? Could the vector instead be added element-wise, keeping the dim dimension unchanged?

@wangyuchi369
Owner

@wanghua-lei Hi, this is the self-conditioning technique; please refer to https://arxiv.org/pdf/2208.04202
[screenshot]
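For context, a minimal sketch of self-conditioning as described in the cited paper (not the repo's exact code): the denoiser's input is the noisy latent concatenated along the feature dimension with the model's own previous estimate of x0, and the all-zero vector is the placeholder used when no estimate is available (on a fraction of training steps, and at the first sampling step). Keeping the estimate in its own concatenated channel lets "all zeros" unambiguously mean "no estimate yet", which an element-wise addition into x_t could not signal.

```python
import torch

def self_conditioned_training_step(model, x_t, t, p_self_cond=0.5):
    # Self-conditioning sketch following arXiv:2208.04202 (assumed interface:
    # model(inputs, t) predicts x0 from cat([x_t, x0_estimate], dim=-1)).
    zeros = torch.zeros_like(x_t)
    if torch.rand(()).item() < p_self_cond:
        # First pass with the zero placeholder; the estimate carries no gradient.
        with torch.no_grad():
            x0_est = model(torch.cat([x_t, zeros], dim=-1), t)
    else:
        x0_est = zeros
    # Gradient-carrying pass, conditioned on the (possibly zero) estimate.
    return model(torch.cat([x_t, x0_est], dim=-1), t)
```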
