
The following error is raised when training and evaluating the classification model. What is the cause? #301

Open
CachCheng opened this issue Jun 27, 2024 · 2 comments

Comments

@CachCheng

Traceback (most recent call last):
File "/meta/cash/llm/InternImage/classification/main.py", line 661, in <module>
main(config)
File "/meta/cash/llm/InternImage/classification/main.py", line 275, in main
acc1, acc5, loss = validate(config, data_loader_val, model)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/meta/cash/llm/InternImage/classification/main.py", line 531, in validate
for idx, (images, target) in enumerate(data_loader):
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/_utils.py", line 644, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/meta/cash/llm/InternImage/classification/dataset/cached_image_folder.py", line 323, in __getitem__
img = self.transform(img)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 95, in __call__
img = t(img)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 137, in __call__
return F.to_tensor(pic)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torchvision/transforms/functional.py", line 166, in to_tensor
img = torch.from_numpy(np.array(pic, mode_to_nptype.get(pic.mode, np.uint8), copy=True))
RuntimeError: Numpy is not available
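
A possible first diagnostic for `RuntimeError: Numpy is not available`: this message is frequently reported when NumPy 2.x is installed alongside a PyTorch build that was compiled against NumPy 1.x (the export log further down shows torch 2.0.1+cu117). Whether that is the cause in this environment is an assumption, so the sketch below only prints the installed versions. `major_version` and `report_versions` are hypothetical helper names, not part of InternImage.

```python
import importlib
import importlib.util


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.0.1+cu117'."""
    return int(version.split(".")[0])


def report_versions(names=("numpy", "torch", "torchvision")) -> str:
    """List installed versions without raising if a package is missing or broken."""
    lines = []
    for name in names:
        if importlib.util.find_spec(name) is None:
            lines.append(f"{name}: not installed")
            continue
        try:
            module = importlib.import_module(name)
            lines.append(f"{name}: {module.__version__}")
        except Exception as exc:  # a broken install can fail at import time
            lines.append(f"{name}: import failed ({exc})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report_versions())
```

If the report shows a NumPy major version of 2 next to torch 2.0.1, pinning NumPy with `pip install "numpy<2"` in the same conda environment is one thing to try.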

@CachCheng
Author

[2024-06-27 16:04:18 internimage_b_1k_224](main.py 354): INFO Max accuracy: 99.56%
[2024-06-27 16:04:19 internimage_b_1k_224](main.py 561): INFO Test: [0/25] Time 0.757 (0.757) Loss 8.1779 (8.1779) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 20310MB
[2024-06-27 16:04:22 internimage_b_1k_224](main.py 561): INFO Test: [10/25] Time 0.308 (0.349) Loss 8.1767 (8.2419) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 20310MB
[2024-06-27 16:04:25 internimage_b_1k_224](main.py 561): INFO Test: [20/25] Time 0.308 (0.329) Loss 8.4152 (8.3329) Acc@1 0.000 (0.000) Acc@5 0.000 (0.000) Mem 20310MB
[2024-06-27 16:04:27 internimage_b_1k_224](main.py 568): INFO [Epoch:6] * Acc@1 0.000 Acc@5 0.000
[2024-06-27 16:04:27 internimage_b_1k_224](main.py 359): INFO Accuracy of the ema network on the 1600 test images: 0.0%
[2024-06-27 16:04:27 internimage_b_1k_224](main.py 374): INFO Max ema accuracy: 0.00%
[2024-06-27 16:04:28 internimage_b_1k_224](main.py 506): INFO Train: [7/300][0/100] eta 0:01:55 lr 0.000022 time 1.1594 (1.1594) model_time 0.6929 (0.6929) loss 1.5148 (1.5148) grad_norm 6.1735 (6.1735/0.0000) mem 20310MB
[2024-06-27 16:04:35 internimage_b_1k_224](main.py 506): INFO Train: [7/300][10/100] eta 0:01:06 lr 0.000022 time 0.6905 (0.7409) model_time 0.6904 (0.6983) loss 1.2380 (1.4515) grad_norm 6.4725 (6.8944/2.1804) mem 20310MB
[2024-06-27 16:04:42 internimage_b_1k_224](main.py 506): INFO Train: [7/300][20/100] eta 0:00:57 lr 0.000023 time 0.6914 (0.7173) model_time 0.6912 (0.6949) loss 1.5632 (1.4158) grad_norm 5.4286 (6.2839/2.3783) mem 20310MB

During training, why does the log report INFO Max accuracy: 99.56% while Max ema accuracy is 0.00%? What is going on?
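
One possible explanation, offered as a sketch rather than a confirmed diagnosis: an exponential moving average with a decay close to 1 moves very slowly, so after only a few hundred updates the EMA weights are still dominated by their starting values. The log shows 100 iterations per epoch and an evaluation at epoch 6, so roughly 700 updates if the EMA is updated once per iteration. The decay of 0.9999 below is an assumed value; check the one in your training config. `initial_weight_share` and `ema_update` are illustrative names.

```python
def ema_update(ema_value: float, model_value: float, decay: float) -> float:
    """One standard EMA step: keep `decay` of the old value, blend in the rest."""
    return decay * ema_value + (1.0 - decay) * model_value


def initial_weight_share(decay: float, num_updates: int) -> float:
    """Fraction of the EMA value still contributed by its starting point."""
    return decay ** num_updates


decay = 0.9999      # assumed decay, not read from the issue
num_updates = 700   # about 7 epochs x 100 iterations, inferred from the log

# Toy scalar "weight": the EMA starts at 0.0 while the trained model sits at 1.0.
ema = 0.0
for _ in range(num_updates):
    ema = ema_update(ema, 1.0, decay)

share = initial_weight_share(decay, num_updates)
print(f"share of initial weights left in the EMA: {share:.3f}")
print(f"EMA value after {num_updates} updates: {ema:.3f}")
```

Under these assumptions more than 90% of the EMA model is still its initial state, which would be consistent with the EMA network scoring near zero while the raw model already fits the data. A smaller decay or a longer run would let the EMA catch up.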

@CachCheng
Author

(llm) ahs@ahs-SYS-4029GP-TRTC-ZY001:/meta/cash/llm/InternImage/classification$ python export.py --model_name internimage_b_1k_224 --ckpt_dir output/internimage_b_1k_224 --trt
=> merge config from ./configs/internimage_b_1k_224.yaml
using core type: DCNv3
using activation layer: GELU
using main norm layer: LN
using dpr: linear, 0.5
level2_post_norm: False
level2_post_norm_block_ids: None
res_post_norm: False
remove_center: False
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Traceback (most recent call last):
File "/meta/cash/llm/InternImage/classification/export.py", line 121, in <module>
main()
File "/meta/cash/llm/InternImage/classification/export.py", line 112, in main
torch2onnx(args, cfg)
File "/meta/cash/llm/InternImage/classification/export.py", line 61, in torch2onnx
torch.onnx.export(model,
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/onnx/utils.py", line 506, in export
_export(
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/onnx/utils.py", line 1548, in _export
graph, params_dict, torch_out = _model_to_graph(
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/onnx/utils.py", line 1117, in _model_to_graph
graph = _optimize_graph(
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/onnx/utils.py", line 665, in _optimize_graph
graph = _C._jit_pass_onnx(graph, operator_export_type)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/onnx/utils.py", line 1708, in _run_symbolic_method
return symbolic_fn(graph_context, *args)
File "/meta/cash/llm/InternImage/classification/ops_dcnv3/functions/dcnv3_func.py", line 88, in symbolic
return g.op(
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/onnx/_internal/jit_utils.py", line 86, in op
return _add_op(self, opname, *raw_args, outputs=outputs, **kwargs)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/onnx/_internal/jit_utils.py", line 245, in _add_op
node = _create_node(
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/onnx/_internal/jit_utils.py", line 304, in _create_node
_add_attribute(node, key, value, aten=aten)
File "/home/ahs/anaconda3/envs/llm/lib/python3.9/site-packages/torch/onnx/_internal/jit_utils.py", line 337, in _add_attribute
raise ValueError(
ValueError: Invalid attribute specifier 'remove_center' names must be suffixed with type, e.g. 'dim_i' or 'dims_i'
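
The `ValueError` above comes from the naming rule that `torch.onnx` applies to keyword arguments passed to `g.op(...)` inside a `symbolic` function: each attribute name must end in a type suffix such as `_i` (int), `_f` (float), `_s` (string) or `_t` (tensor). The sketch below is a simplified, standalone illustration of that rule. `has_type_suffix` and `typed_attr_name` are hypothetical helpers and the regular expression is an approximation, not the library's own check.

```python
import re

# Simplified approximation of the suffix rule: name ends in _i, _f, _s or _t.
_TYPED_ATTR = re.compile(r"^.+_[ifst]$")


def has_type_suffix(name: str) -> bool:
    """Return True if the attribute name carries one of the common type suffixes."""
    return bool(_TYPED_ATTR.match(name))


def typed_attr_name(name: str, value) -> str:
    """Append the suffix matching a Python scalar (bools are exported as ints)."""
    if isinstance(value, (bool, int)):
        return f"{name}_i"
    if isinstance(value, float):
        return f"{name}_f"
    if isinstance(value, str):
        return f"{name}_s"
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")


print(has_type_suffix("remove_center"))        # the name rejected in the traceback
print(typed_attr_name("remove_center", False))
```

Applied to this traceback, a likely fix (an assumption, not the maintainers' patch) is to pass the flag in `ops_dcnv3/functions/dcnv3_func.py` as `remove_center_i=int(remove_center)` in the `g.op(...)` call. Any TensorRT plugin that consumes the exported node would then need to read the attribute under the name `remove_center`.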
