
train error #39

Open
jiyuwangbupt opened this issue Nov 11, 2024 · 1 comment
@jiyuwangbupt

```
Traceback (most recent call last):
  File "main.py", line 132, in <module>
    launch(
  File "/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/detectron2/engine/launch.py", line 87, in launch
    main_func(*args)
  File "main.py", line 126, in main
    do_train(args, cfg)
  File "main.py", line 78, in do_train
    train_loader = instantiate(cfg.dataloader.train)
  File "/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/detectron2/config/instantiate.py", line 67, in instantiate
    cfg = {k: instantiate(v) for k, v in cfg.items()}
  File "/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/detectron2/config/instantiate.py", line 67, in <dictcomp>
    cfg = {k: instantiate(v) for k, v in cfg.items()}
  File "/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/detectron2/config/instantiate.py", line 83, in instantiate
    return cls(**cfg)
  File "/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/torch/utils/data/distributed.py", line 68, in __init__
    num_replicas = dist.get_world_size()
  File "/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1181, in get_world_size
    return _get_group_size(group)
  File "/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 566, in _get_group_size
    default_pg = _get_default_group()
  File "/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 697, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
```
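The crash comes from `DistributedSampler`: when `num_replicas` is not supplied, its `__init__` calls `dist.get_world_size()` unconditionally, which raises this `RuntimeError` in any run where `init_process_group` was never called — including a plain single-GPU run. A minimal sketch of a guard (the helper name `safe_world_size` is my own, not from the ViTMatte code):

```python
import torch.distributed as dist

def safe_world_size() -> int:
    """World size if a process group is initialized, else 1.

    DistributedSampler falls back to dist.get_world_size() when
    num_replicas is None, which raises the RuntimeError above in a
    plain single-process run.
    """
    if dist.is_available() and dist.is_initialized():
        return dist.get_world_size()
    return 1
```

Passing `num_replicas=safe_world_size()` and `rank=0` explicitly to `DistributedSampler` sidesteps the error; alternatively, initialize a trivial one-process group before building the dataloader, e.g. `dist.init_process_group(backend="gloo", init_method="tcp://127.0.0.1:50166", rank=0, world_size=1)`.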

@jiyuwangbupt
Author

(ViTMatte) jyw@rtx6000:~/vitmatte/ViTMatte$ python main.py --config-file configs/ViTMatte_S_100ep.py --num-gpus 1
args Namespace(config_file='configs/ViTMatte_S_100ep.py', dist_url='tcp://127.0.0.1:50166', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
num_gpus_per_machine 1
num_machines 1
Using single-process mode
[11/11 19:26:30 detectron2]: Rank of current process: 0. World size: 1
[11/11 19:26:30 detectron2]: Environment info:


sys.platform linux
Python 3.8.8 (default, Apr 13 2021, 19:58:26) [GCC 7.3.0]
numpy 1.24.4
detectron2 0.6 @/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/detectron2
Compiler GCC 11.4
CUDA compiler CUDA 11.7
detectron2 arch flags 8.6
DETECTRON2_ENV_MODULE
PyTorch 2.0.0+cu117 @/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0 NVIDIA RTX 6000 Ada Generation (arch=8.9)
Driver version 535.183.01
CUDA_HOME /usr
Pillow 10.4.0
torchvision 0.15.1+cu117 @/home/jyw/anaconda3/envs/ViTMatte/lib/python3.8/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.5.3


PyTorch built with:

  • GCC 9.3
  • C++ Version: 201703
  • Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.7
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.5
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

[11/11 19:26:30 detectron2]: Command line arguments: Namespace(config_file='configs/ViTMatte_S_100ep.py', dist_url='tcp://127.0.0.1:50166', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)
[11/11 19:26:30 detectron2]: Contents of args.config_file=configs/ViTMatte_S_100ep.py:
```python
from .common.train import train
from .common.model import model
from .common.optimizer import optimizer
from .common.scheduler import lr_multiplier
from .common.dataloader import dataloader

train.max_iter = int(43100 / 16 / 2 * 100)
train.checkpointer.period = int(43100 / 16 / 2 * 10)

optimizer.lr = 5e-4
lr_multiplier.scheduler.values = [1.0, 0.1, 0.05]
lr_multiplier.scheduler.milestones = [int(43100 / 16 / 2 * 30), int(43100 / 16 / 2 * 90)]
lr_multiplier.scheduler.num_updates = train.max_iter
lr_multiplier.warmup_length = 250 / train.max_iter

train.init_checkpoint = './pretrained/dino_vit_s_fna.pth'
train.output_dir = './output_of_train/ViTMatte_S_100ep'

dataloader.train.batch_size = 16
dataloader.train.num_workers = 2
```
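The schedule constants above are all derived from `43100 / 16 / 2` — presumably 43,100 training samples, batch size 16, and 2 GPUs in the original recipe (my reading; the config does not name these quantities). A quick check of the derived values:

```python
samples, batch_size, num_gpus = 43100, 16, 2      # assumed meaning of the constants
iters_per_epoch = samples / batch_size / num_gpus  # 1346.875

max_iter = int(iters_per_epoch * 100)              # 100 "epochs"
ckpt_period = int(iters_per_epoch * 10)            # checkpoint every 10 epochs
milestones = [int(iters_per_epoch * 30), int(iters_per_epoch * 90)]

print(max_iter, ckpt_period, milestones)  # → 134687 13468 [40406, 121218]
```

Note that the command in this report uses `--num-gpus 1`, so the actual per-epoch iteration count no longer matches this 2-GPU arithmetic even once the process-group error is fixed.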

WARNING [11/11 19:26:33 d2.config.lazy]: The config contains objects that cannot serialize to a valid yaml. ./output_of_train/ViTMatte_S_100ep/config.yaml is human-readable but cannot be loaded.
WARNING [11/11 19:26:33 d2.config.lazy]: Config is saved using cloudpickle at ./output_of_train/ViTMatte_S_100ep/config.yaml.pkl.
[11/11 19:26:33 detectron2]: Full config saved to ./output_of_train/ViTMatte_S_100ep/config.yaml
[11/11 19:26:33 d2.utils.env]: Using a generated random seed 33626290
[11/11 19:26:33 detectron2]: Model:
