Inconsistent json file from BaiduYun and Huggingface #1

cheliu-computation · 2024-01-05T12:59:23Z

Thanks for your impressive work and the effort on the dataset construction!

I have downloaded the dataset from BaiduYun and the JSON file from Hugging Face.
The JSON from BaiduYun lists 2,893 test cases and 32,891 training cases, but the last case in the training set is broken: it ends with an unterminated string.
The JSON from Hugging Face, however, lists 7,772 test cases and 31,086 training cases.
The paper reports 39,026 cases in total, but the BaiduYun JSON contains 35,784 and the Hugging Face JSON contains 38,858.
This inconsistency makes the work quite hard to reproduce.
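
For anyone who wants to re-check these numbers, here is a minimal sketch. It assumes each split is a single JSON file whose top-level value is a list or dict of cases; `count_cases` and `compare_totals` are hypothetical helper names, not part of the released code. On a truncated file such as the BaiduYun train JSON, `json.load` raises `json.JSONDecodeError`.

```python
import json


def count_cases(path):
    """Return the number of top-level entries in a JSON split file.

    Raises json.JSONDecodeError if the file is truncated or malformed.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return len(data)


def compare_totals(train, test, expected):
    """Return (total, missing) where missing = expected - (train + test)."""
    total = train + test
    return total, expected - total


# Counts reported in this issue, against the 39,026 cases stated in the paper.
print(compare_totals(32891, 2893, 39026))  # BaiduYun      -> (35784, 3242)
print(compare_totals(31086, 7772, 39026))  # Hugging Face  -> (38858, 168)
```

By this arithmetic the BaiduYun JSON is 3,242 cases short of the paper's total and the Hugging Face JSON is 168 cases short.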

So, if we want to reproduce this work from the 20 zip files on BaiduYun, which split file should we use?
Also, I found that subfolder1.zip and subfolder2.zip are corrupted. Even after repairing them with the 'zip -F' command, I am not sure that all files were restored. I would be really grateful if the authors could fix this.
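
One way to check whether a downloaded or repaired archive is intact is Python's standard `zipfile` module. This is only a sketch: `check_zip` is a hypothetical helper, and the glob pattern assumes the archives sit in the current directory under the names mentioned above. `ZipFile.testzip()` reads every member and returns the name of the first one with a bad CRC or header, or `None` if all pass. It cannot tell whether members are missing from a repaired archive.

```python
import glob
import zipfile


def check_zip(path):
    """Return (ok, detail) for a zip archive given as a path or file object."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the first corrupt member name, or None.
            bad = zf.testzip()
    except zipfile.BadZipFile as exc:
        return False, f"not a readable zip: {exc}"
    if bad is not None:
        return False, f"first corrupt member: {bad}"
    return True, "all members passed the CRC check"


# Assumed layout: archives named like subfolder1.zip in the working directory.
for archive in sorted(glob.glob("subfolder*.zip")):
    print(archive, check_zip(archive))
```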

After unzipping, there are multiple folders, but they do not match the paths in the JSON file.
Taking a path from the JSON such as '/processed_images/2534/1/CT_0', I guess that '2534' is the folder that contains the image. Is that correct?

The formats of the two sources also differ:
the json from huggingface: "image_path": [ "/remote-home/share/data200/172.16.11.200/zhengqiaoyu//processed_file/npys/32940/1/MRI_0.nii.gz", "/remote-home/share/data200/172.16.11.200/zhengqiaoyu//processed_file/npys/32940/1/MRI_1.nii.gz",
the json from baidu: "image_path": [ "/processed_images/1/1/CT_0", "/processed_images/1/1/CT_1", "/processed_images/1/1/CT_2", "/processed_images/1/1/CT_3" ]

It seems that the Hugging Face JSON points to the original medical images (.nii.gz), while the BaiduYun JSON refers to JPG images only. Is that right?
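
Both path styles end in the same three components, so they can be reduced to a common key for matching against local folders. The sketch below assumes, as guessed above and not confirmed by the authors, that the first number is the case folder and the second a sub-folder inside it; `parse_image_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def parse_image_path(path):
    """Split an "image_path" entry into (case_id, sub_id, stem).

    Works for both styles seen in this issue; the meaning of the two
    numeric components is an assumption.
    """
    p = PurePosixPath(path)
    name = p.name
    # Drop the .nii.gz extension used in the Hugging Face JSON, if present.
    stem = name[: -len(".nii.gz")] if name.endswith(".nii.gz") else name
    return p.parent.parent.name, p.parent.name, stem


print(parse_image_path("/processed_images/2534/1/CT_0"))
print(parse_image_path(
    "/remote-home/share/data200/172.16.11.200/zhengqiaoyu//processed_file/npys/32940/1/MRI_0.nii.gz"
))
```

The resulting tuples, for example `("2534", "1", "CT_0")`, can then be joined onto whatever local root the unzipped folders live under.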

Looking forward to your response!

cheliu-computation · 2024-01-05T13:02:26Z

Additionally, the Hugging Face JSON provides 'icd10s' codes, but the BaiduYun JSON does not have this field.

qiaoyu-zheng · 2024-01-08T05:54:13Z

I'm sorry, the JSON file from BaiduYun has some errors; please use the JSON file from Hugging Face. Because of an upload problem, there may be some small mismatches in the total number of cases; we will check this again later. However, the existing version is already sufficient for reproduction. If you hit a matching error in the dataloader, you can simply remove the affected cases; this will not significantly affect the reproduction results. Again, we will fix this error later.
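
A minimal sketch of the "remove these cases" step, done before building the dataloader. It assumes the split has been loaded as a list of dict records, each with an "image_path" list as in the excerpts above; `filter_cases` and the `resolve` callback (which maps a JSON path to a local file path) are hypothetical names, not part of the released code.

```python
import os


def filter_cases(cases, resolve, exists=os.path.exists):
    """Split cases into (kept, dropped) by whether all their images exist.

    cases:   list of dicts, each with an "image_path" list (assumed layout)
    resolve: function mapping a JSON path string to a local path
    exists:  existence check, injectable so the logic can be tested
    """
    kept, dropped = [], []
    for case in cases:
        paths = [resolve(p) for p in case.get("image_path", [])]
        # Keep a case only if it lists images and every one is present.
        if paths and all(exists(p) for p in paths):
            kept.append(case)
        else:
            dropped.append(case)
    return kept, dropped


# Tiny in-memory demonstration with a fake existence check.
demo = [{"image_path": ["a", "b"]}, {"image_path": ["c"]}]
kept, dropped = filter_cases(demo, resolve=lambda p: p, exists=lambda p: p in {"a", "b"})
print(len(kept), len(dropped))  # -> 1 1
```

Logging the dropped cases makes it easy to report how many were excluded alongside any reproduced results.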

SZUHvern · 2024-09-28T01:51:24Z

Hello,

Thank you very much for your work!
I wanted to check whether the corrected data and JSON files are now available.
I am eagerly awaiting your response. Many thanks!
