
How to set up the camelyon17 dataset #151

Open
yuu-Wang opened this issue Jul 25, 2024 · 14 comments

Comments

@yuu-Wang
Hello, if I want to use Camelyon17, how should the dataset directory be structured? Is DomainBed/domainbed/data/camelyon17/ correct? It keeps reporting that the dataset cannot be found.

Traceback (most recent call last):
  File "/root/wangxy/AlignClip/main.py", line 155, in <module>
    main(args)
  File "/root/wangxy/AlignClip/main.py", line 44, in main
    train_iter, val_loader, test_loaders, train_class_names, template = get_dataset(args)
  File "/root/wangxy/AlignClip/engine.py", line 59, in get_dataset
    converter_domainbed.get_domainbed_datasets(dataset_name=args.data, root=args.root, targets=args.targets,
  File "/root/wangxy/AlignClip/converter_domainbed.py", line 21, in get_domainbed_datasets
    datasets = vars(dbdatasets)[dataset_name](root, targets, hparams)
  File "/root/wangxy/AlignClip/DomainBed/domainbed/datasets.py", line 347, in __init__
    dataset = Camelyon17Dataset(root_dir=root)
  File "/root/anaconda3/envs/pytorch_2.0.1/lib/python3.8/site-packages/wilds/datasets/camelyon17_dataset.py", line 64, in __init__
    self._data_dir = self.initialize_data_dir(root_dir, download)
  File "/root/anaconda3/envs/pytorch_2.0.1/lib/python3.8/site-packages/wilds/datasets/wilds_dataset.py", line 341, in initialize_data_dir
    self.download_dataset(data_dir, download)
  File "/root/anaconda3/envs/pytorch_2.0.1/lib/python3.8/site-packages/wilds/datasets/wilds_dataset.py", line 368, in download_dataset
    raise FileNotFoundError(
FileNotFoundError: The camelyon17 dataset could not be found in DomainBed/domainbed/data/camelyon17_v1.0. Initialize the dataset with download=True to download the dataset. If you are using the example script, run with --download. This might take some time for large datasets.
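The path in the error message comes from WILDS appending a versioned folder name to whatever root is passed in. A stdlib-only sketch of that resolution (the helper names are mine; the folder name and version are taken from the traceback above):

```python
# Hedged sketch: mimics how WILDS builds the data directory before raising
# FileNotFoundError. root_dir must be the *parent* of camelyon17_v1.0/.
import os

def resolve_camelyon17_dir(root_dir: str, version: str = "1.0") -> str:
    # WILDS appends a versioned folder name to the root you pass in.
    return os.path.join(root_dir, f"camelyon17_v{version}")

def dataset_present(root_dir: str) -> bool:
    # True only if <root_dir>/camelyon17_v1.0/ exists on disk.
    return os.path.isdir(resolve_camelyon17_dir(root_dir))
```

So a relative root like DomainBed/domainbed/data/ is resolved against the current working directory, which is one way the "could not be found" error can occur even when the data exists.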

@piotr-teterwak
Collaborator

Hi, unfortunately I only speak English, but it looks like you're having issues using Camelyon17 because it hasn't been downloaded?

I would run this script, with line 304 uncommented, to download the dataset: https://github.com/facebookresearch/DomainBed/blob/main/domainbed/scripts/download.py#L304
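The programmatic equivalent of uncommenting that line would look roughly like this (a hedged sketch: it assumes the `wilds` package is installed, and the wrapper function name is my own):

```python
# Hedged sketch: equivalent of uncommenting line 304 of DomainBed's
# domainbed/scripts/download.py. Assumes the `wilds` package is installed.
def download_camelyon17(data_dir: str):
    # Imported lazily so the sketch can be read without wilds installed.
    from wilds.datasets.camelyon17_dataset import Camelyon17Dataset

    # download=True fetches the patch archive (several GB) into
    # <data_dir>/camelyon17_v1.0/ on first use, then reuses it afterwards.
    return Camelyon17Dataset(root_dir=data_dir, download=True)

# Usage (downloads a large archive on first run):
# download_camelyon17("DomainBed/domainbed/data")
```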

@yuu-Wang
Author

Hello, I have already downloaded the dataset through this link, and the path is /root/wangxy/AlignClip/DomainBed/domainbed/data/camelyon17_v1.0/. However, I keep getting an error that says camelyon17 cannot be found. I'm not sure if my file naming is correct, but other datasets like PACS and OfficeHome ran successfully. Do I need to handle the Camelyon dataset separately?

@piotr-teterwak
Collaborator

Could you post here the command you use to run main.py, and the directory you run it from? To me it looks like args.root is set to DomainBed/domainbed/data/ instead of '/root/wangxy/AlignClip/DomainBed/domainbed/data/'.

@yuu-Wang
Author

main.zip
This is the main file, and these are the parameters I need to run: DomainBed/domainbed/data/ -d WILDSCamelyon --task domain_shift --targets 0 -b 36 --lr 5e-6 --epochs 10 --beta 0.5. This is the location where I placed my dataset.

@piotr-teterwak
Collaborator

Can you run with /root/wangxy/AlignClip/DomainBed/domainbed/data/ -d WILDSCamelyon --task domain_shift --targets 0 -b 36 --lr 5e-6 --epochs 10 --beta 0.5 instead? See the data path in the first parameter.

@yuu-Wang
Author

Hello, it's running now, but there's a problem. The camelyon17_v1.0 dataset contains the raw patches, organized in the format patient_00X_node_X, which is the layout it expects. However, I had already re-split the dataset into hospital0, hospital1, hospital2, hospital3, and hospital4, and now it's giving an error. Could you please explain why this is happening?

@piotr-teterwak
Collaborator

Could you post the stack trace so I can have more information?

@yuu-Wang
Author

Hello, why is the test dataset empty here? I reclassified the camelyon17 dataset and regenerated the metadata.csv file. Do I need to create separate CSV files for the test, validation, and training sets? (screenshots attached)

@piotr-teterwak
Collaborator

Hi @yuu-Wang ,

It's pretty hard to understand what exactly is going on here without more details. In order to help you, I will need a minimal reproducible example including:

  1. How you download the data.
  2. How you generate the new metadata.csv file
  3. A few lines of code, of how you load the data.

However, taking a quick look, I don't think the images need to be split into directories based on their hospital source.
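To illustrate why no per-hospital directories are needed: WILDS reads each patch's hospital assignment from the `center` column of metadata.csv rather than from the directory layout. A hedged sketch (column names are based on the Camelyon17 v1.0 metadata file; the helper name and sample rows are mine):

```python
# Hedged sketch: derive per-hospital patch counts from a metadata.csv,
# the way the domain split is driven by the `center` column rather than
# by where the image files live on disk.
import csv
import io
from collections import Counter

def patches_per_center(metadata_csv: str) -> Counter:
    # Count rows per hospital ("center") from the CSV text.
    reader = csv.DictReader(io.StringIO(metadata_csv))
    return Counter(row["center"] for row in reader)

# Hypothetical minimal metadata with the relevant columns.
sample = """patient,node,center,tumor
004,4,0,0
009,1,0,1
024,2,1,0
"""
```

Re-splitting the patches into hospital0/…/hospital4 folders breaks the correspondence between metadata.csv rows and file paths, which would explain the empty test split.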

@yuu-Wang
Author

hi,
1. The dataset I downloaded is from line 304 of download.py (the commented-out call # Camelyon17Dataset(root_dir=args.data_dir, download=True)).
2. Since I noticed that the WILDS wrapper in the file (class WILDSCamelyon(WILDSDataset)) requires four domains (hospital0, hospital1, hospital2, hospital3), I used this code (https://github.com/jameszhou-gl/gpt-4v-distribution-shift/blob/ccfcf00851ccd8867de7c6d92591eaedd8a66d0d/data/process_wilds.py#L21) to divide the downloaded dataset into those four domains.
3. I used this (https://github.com/thuml/CLIPood/blob/bc0d8745e8b0d97b0873bd8ed8589793abd1c1a7/engine.py#L53) and this (https://github.com/thuml/CLIPood/blob/bc0d8745e8b0d97b0873bd8ed8589793abd1c1a7/converter_domainbed.py#L18) to divide the dataset into training, validation, and test sets.

@piotr-teterwak
Collaborator

piotr-teterwak commented Jul 29, 2024

Steps 2 and 3 are not needed; the code takes care of this internally. Could you re-download with step 1, skip steps 2 and 3, and try again? If this does not work, could you please send a few lines of code of how exactly you are loading the dataset in your training code?

@yuu-Wang
Author

Sure, I downloaded it directly according to step one and then ran main.py
(https://github.com/thuml/CLIPood/blob/bc0d8745e8b0d97b0873bd8ed8589793abd1c1a7/main.py#L104).
Is it only this dataset that exceeds the length limit? (screenshot attached)

@piotr-teterwak
Collaborator

I see that you're running main.py from another repository, CLIPood, and not the DomainBed repository. I'm not very familiar with the CLIPood code. Could you run from an unmodified DomainBed codebase?

@yuu-Wang
Author

Thank you very much for your patience. It works now.
