FileNotFoundError: [Errno 2] No such file or directory: 'datasets/coco_13/trainval/00020596.jpg' #42

Open
wowangle97 opened this issue Dec 10, 2024 · 6 comments

@wowangle97

The error in the title occurs when I run the code for the teacher embedding preparation step. I am using the COCO dataset and have set up the data folders for the annotations and images. Is there a problem with my setup? Thanks for your help!

[2024-12-10 19:32:17 vit_h](save_embedding.py 56): INFO number of params: 637026048
[2024-12-10 19:32:17 vit_h](utils.py 60): INFO ==============> Resuming form weights/sam_vit_h_4b8939.pth....................
[2024-12-10 19:32:18 vit_h](utils.py 75): INFO
[2024-12-10 19:32:19 vit_h](save_embedding.py 69): INFO Start saving embeddings
Traceback (most recent call last):
  File "training/save_embedding.py", line 238, in <module>
    main(config)
  File "training/save_embedding.py", line 79, in main
    save_embeddings_one_epoch(config, model, data_loader_train, epoch)
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "training/save_embedding.py", line 99, in save_embeddings_one_epoch
    for idx, ((samples, _), (keys, seeds)) in enumerate(data_loader):
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/work/EdgeSAM/training/data/augmentation/dataset_wrapper.py", line 31, in __getitem__
    return self.__getitem_for_write(index)
  File "/home/work/EdgeSAM/training/data/augmentation/dataset_wrapper.py", line 39, in __getitem_for_write
    item = self.dataset[index]
  File "/home/work/EdgeSAM/training/data/coco_dataset.py", line 98, in __getitem__
    img = Image.open(img_path).convert('RGB')
  File "/home/work/miniforge3/envs/edgesam/lib/python3.8/site-packages/PIL/Image.py", line 3431, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/work/EdgeSAM/datasets/coco_13/trainval/00020596.jpg'

@wowangle97
Author

In addition, I wonder why the folder datasets/coco_13/trainval/ is needed. The data preparation instructions do not say that I need to create a folder named trainval.
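One way to narrow down a FileNotFoundError like the one above is to compare the file names listed in the annotation file against the folder the loader actually reads from. The sketch below assumes a standard COCO-style JSON with an "images" list whose entries carry a "file_name" field; the helper name `find_missing_images` and the example paths are hypothetical and are not part of the EdgeSAM code.

```python
import json
from pathlib import Path


def find_missing_images(annotation_file, image_dir):
    """Return file names listed in a COCO-style annotation file
    that do not exist inside image_dir."""
    with open(annotation_file, "r", encoding="utf-8") as f:
        coco = json.load(f)
    image_dir = Path(image_dir)
    # COCO-style files keep one entry per image under "images",
    # each with a "file_name" field.
    return [
        entry["file_name"]
        for entry in coco.get("images", [])
        if not (image_dir / entry["file_name"]).is_file()
    ]


# Example call (hypothetical paths, adjust to your own layout):
# missing = find_missing_images(
#     "datasets/coco_13/annotations/train.json",
#     "datasets/coco_13/trainval",
# )
# print(len(missing), "missing image(s)", missing[:10])
```

If every image is reported missing, the folder name the loader expects probably differs from the one on disk. If only a few are missing, the annotation file and the image folder are likely out of sync.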

@gold123fish

Hello, I am also using a COCO-format dataset and have not run into the missing-image problem you describe. Could you check whether your dataset layout is wrong? It should be datasets/coco/ containing annotations/, train2017/ and val2017/. Or maybe DATASET in the YAML config has not been changed (DATASET: coco). Your weight file also appears to be the wrong one: you need to use repvit instead of sam.

That said, I am hitting a similar DataLoader error on my side: ValueError: Caught ValueError in DataLoader worker process 0.

I also run into a problem with distributed training later on. Have you perhaps encountered it? I don't know whether it is a version issue. Thanks for your help!

[2024-12-11 05:38:30 rep_vit_m1_fuse_sa_distill](train.py 186): INFO Start training
Traceback (most recent call last):
  File "/home/user/EdgeSAM/training/train.py", line 693, in <module>
    main(args, config)
  File "/home/user/EdgeSAM/training/train.py", line 195, in main
    train_one_epoch_distill_using_saved_embeddings(
  File "/home/user/EdgeSAM/training/train.py", line 241, in train_one_epoch_distill_using_saved_embeddings
    for idx, ((samples, annos), (saved_embeddings, seeds)) in enumerate(data_loader):
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 634, in __next__
    data = self._next_data()
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
    return self._process_data(data)
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
    data.reraise()
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user/anaconda3/envs/edgesam/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user/EdgeSAM/training/data/augmentation/dataset_wrapper.py", line 32, in __getitem__
    return self.__getitem_for_read(index)
  File "/home/user/EdgeSAM/training/data/augmentation/dataset_wrapper.py", line 46, in __getitem_for_read
    with AugRandomContext(seed=seed):
  File "/home/user/EdgeSAM/training/data/augmentation/aug_random.py", line 14, in __enter__
    RNG = Generator(PCG64(seed=self.seed))
  File "_pcg64.pyx", line 123, in numpy.random._pcg64.PCG64.__init__
  File "bit_generator.pyx", line 535, in numpy.random.bit_generator.BitGenerator.__init__
  File "bit_generator.pyx", line 315, in numpy.random.bit_generator.SeedSequence.__init__
  File "bit_generator.pyx", line 389, in numpy.random.bit_generator.SeedSequence.get_assembled_entropy
  File "bit_generator.pyx", line 140, in numpy.random.bit_generator._coerce_to_uint32_array
  File "bit_generator.pyx", line 70, in numpy.random.bit_generator._int_to_uint32_array
ValueError: expected non-negative integer

Batch 0:
Samples shape before stack: [torch.Size([3, 256, 256])]
Saved embeddings shape before stack: [(1048576,)]
Samples shape after stack: torch.Size([1, 3, 256, 256])
Saved embeddings shape after reshape: torch.Size([1, 1048576])
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 4042320 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 4042321) of binary: /home/user/anaconda3/envs/edgesam/bin/python
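The ValueError at the bottom of the traceback above is raised by NumPy itself: PCG64 rejects a negative seed. The sketch below only reproduces that behaviour in isolation so the seed read back in `__getitem_for_read` can be inspected. The helpers `check_seed` and `to_non_negative_seed` are hypothetical names, not EdgeSAM functions, and the masking step is an assumption about how one might probe the value, not the project's fix.

```python
from numpy.random import Generator, PCG64


def check_seed(seed):
    """Return True if PCG64 accepts this seed, False if it raises ValueError."""
    try:
        Generator(PCG64(seed=seed))
        return True
    except ValueError:
        return False


def to_non_negative_seed(seed):
    """Hypothetical probe: reinterpret a signed 32-bit value as unsigned."""
    return int(seed) & 0xFFFFFFFF


print(check_seed(12345))                     # a non-negative seed is accepted
print(check_seed(-1))                        # a negative seed is rejected
print(check_seed(to_non_negative_seed(-1)))  # the masked value is accepted
```

Masking only avoids the crash. It also changes the seed, so the augmentation replayed at training time may no longer match the one used when the embeddings were saved. A negative seed may instead indicate that the saved seeds are being read back incorrectly, so it is worth printing the seed value before changing any code.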

@wowangle97
Author

Finally, I modified line 97 of /training/data/coco_dataset.py, changing the folder to train/. Training now runs normally, but I hit ZeroDivisionError: division by zero during the final evaluation.

@wowangle97
Author

wowangle97 commented Dec 11, 2024

@gold123fish I don't have the same problem as you, sorry. Also, may I ask why you think I used the wrong weight file? The author's instructions for the teacher embedding step say to download weights/sam_vit_h_4b8939.pth. Why would I need to use repvit instead of sam? Thank you.

@gold123fish

I noticed that I had already modified the line 98 you mentioned. But it still reports a problem with distributed training, and I still can't train. Regarding the weight file, I thought you had finished the Teacher Embeddings step and moved on to (Phase 1) Encoder-Only Knowledge Distillation, which requires repvit. That was my mistake.

@wowangle97
Author

You should first try without distributed training: run on a single GPU to see whether it works, and check whether it is an environment problem or a CUDA problem.
