ERROR - total_loss not present in the dictionary. Available keys are: []. Exiting!!! #104

Open
111hyq111 opened this issue Apr 16, 2024 · 0 comments

Training on my own dataset fails with the following error:
2024-04-16 18:56:22 - DEBUG - Training epoch 0 with 0 samples
  File "/home/hyq/anaconda3/envs/cvnets/bin/cvnets-train", line 8, in <module>
    sys.exit(main_worker())
  File "/home/hyq/文档/ml-cvnets/main_train.py", line 235, in main_worker
    main(opts=opts, **kwargs)
  File "/home/hyq/anaconda3/envs/cvnets/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/hyq/文档/ml-cvnets/main_train.py", line 174, in main
    training_engine.run(train_sampler=train_sampler)
  File "/home/hyq/文档/ml-cvnets/engine/training_engine.py", line 606, in run
    train_loss, train_ckpt_metric = self.train_epoch(epoch)
  File "/home/hyq/文档/ml-cvnets/engine/training_engine.py", line 357, in train_epoch
    avg_loss = train_stats.avg_statistics(
  File "/home/hyq/文档/ml-cvnets/metrics/stats.py", line 148, in avg_statistics
    logger.error(
  File "/home/hyq/文档/ml-cvnets/utils/logger.py", line 46, in error
    traceback.print_stack()
2024-04-16 18:56:22 - LOGS - Training took 00:00:02.11
2024-04-16 18:56:22 - ERROR - total_loss not present in the dictionary. Available keys are: []. Exiting!!!
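
The log line `Training epoch 0 with 0 samples` suggests that no training samples were found under the dataset root, in which case no loss would ever be recorded and the statistics dictionary would stay empty. A quick way to check this, as a minimal sketch: count the image and mask files on disk before launching training. The `images/<split>` and `annotations/<split>` folder layout (the usual ADE20K arrangement) and the `count_samples` helper are assumptions for illustration, not something taken from the cvnets code.

```python
from pathlib import Path

# File extensions treated as images/masks (an assumption; adjust to your data).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_samples(root: str, split: str = "training") -> dict:
    """Count image and mask files under an assumed ADE20K-style layout.

    Assumed (hypothetical) layout:
        <root>/images/<split>/*.jpg
        <root>/annotations/<split>/*.png
    """
    root_path = Path(root)
    image_dir = root_path / "images" / split
    mask_dir = root_path / "annotations" / split

    def count_files(directory: Path) -> int:
        # A missing directory simply counts as zero files.
        if not directory.is_dir():
            return 0
        return sum(
            1
            for p in directory.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )

    return {
        "image_dir_exists": image_dir.is_dir(),
        "mask_dir_exists": mask_dir.is_dir(),
        "num_images": count_files(image_dir),
        "num_masks": count_files(mask_dir),
    }


if __name__ == "__main__":
    import sys

    # Pass the dataset root (the value of dataset.root_train) as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for key, value in count_samples(root).items():
        print(f"{key}: {value}")
```

If either count is 0 for the path given as `root_train`, the problem is likely in the dataset layout or path rather than in the loss configuration.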

Command used for training: cvnets-train --common.config-file /home/hyq/下载/pspnet-mobilevitv2-1.0.yaml --common.results-loc segmentation_results

pspnet-mobilevitv2-1.0.yaml:
common:
  run_label: "run_1"
  accum_freq: 1
  accum_after_epoch: -1
  log_freq: 200
  auto_resume: false
  mixed_precision: true
  grad_clip: 10.0
dataset:
  root_train: "/media/hyq/西部数据2TB/ml-cvnets_data/"
  root_val: "/media/hyq/西部数据2TB/ml-cvnets_data/"
  name: "ade20k1"
  category: "segmentation"
  train_batch_size0: 4 # effective batch size is 16 ( 4 * 4 GPUs)
  val_batch_size0: 4
  eval_batch_size0: 1
  workers: 4
  persistent_workers: false
  pin_memory: false
image_augmentation:
  random_crop:
    enable: true
    seg_class_max_ratio: 0.75
    pad_if_needed: true
    mask_fill: 0 # background idx is 0
  random_horizontal_flip:
    enable: true
  resize:
    enable: true
    size: [512, 512]
    interpolation: "bicubic"
  random_short_size_resize:
    enable: true
    interpolation: "bicubic"
    short_side_min: 256
    short_side_max: 768
    max_img_dim: 1024
  photo_metric_distort:
    enable: true
  random_rotate:
    enable: true
    angle: 10
    mask_fill: 0 # background idx is 0
  random_gaussian_noise:
    enable: true
sampler:
  name: "batch_sampler"
  bs:
    crop_size_width: 512
    crop_size_height: 512
loss:
  category: "segmentation"
  ignore_idx: -1
  segmentation:
    name: "cross_entropy"
    cross_entropy:
      aux_weight: 0.4
optim:
  name: "sgd"
  weight_decay: 1.e-4
  no_decay_bn_filter_bias: true
  sgd:
    momentum: 0.9
scheduler:
  name: "cosine"
  is_iteration_based: false
  max_epochs: 120
  cosine:
    max_lr: 0.02
    min_lr: 0.0002
model:
  segmentation:
    name: "encoder_decoder"
    lr_multiplier: 1
    seg_head: "pspnet"
    output_stride: 8
    use_aux_head: true
    activation:
      name: "relu"
    pspnet:
      psp_dropout: 0.1
      psp_out_channels: 512
      psp_pool_sizes: [ 1, 2, 3, 6 ]
  classification:
    name: "mobilevit_v2"
    mitv2:
      width_multiplier: 1.0
      attn_norm_layer: "layer_norm_2d"
    activation:
      name: "swish"
  normalization:
    name: "sync_batch_norm"
    momentum: 0.1
  activation:
    name: "swish"
    inplace: false
  layer:
    global_pool: "mean"
    conv_init: "kaiming_uniform"
    linear_init: "normal"
ema:
  enable: true
  momentum: 0.0005
stats:
  val: [ "loss", "iou" ]
  train: [ "loss", "grad_norm" ]
  checkpoint_metric: "iou"
  checkpoint_metric_max: true
