- Download the dataset from the PhysioNet/Computing in Cardiology Challenge 2012. The dataset contains three folders, one for each of three hospitals. You can also download the CSV file containing all 12k patients from here (`p12.csv`).
- Download the dataset from the PhysioNet/Computing in Cardiology Challenge 2019. The dataset contains two folders, one for each of two hospitals. You can also download the CSV files containing all ~43k patients from here (`df_A.csv` and `df_B.csv`).
- You need to request access to the dataset from PhysioNet. Once you have downloaded the CSV files, you can use this repo to extract ~50k patients into a `.csv` file.
Put the CSV files in the `data/raw` folder or another folder of your choice. Then add the `path_raw` entry to the `configs/data/{DATASET_NAME}.yaml` file.
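Before editing the config, it can help to confirm that the raw files are in the folder `path_raw` will point to. The sketch below is only an illustration: the helper `missing_raw_files` is hypothetical and not part of this repo, and the file names are the ones mentioned in the download steps above.

```python
from pathlib import Path


def missing_raw_files(raw_dir, expected):
    """Return the names in `expected` that are not regular files in `raw_dir`."""
    raw_dir = Path(raw_dir)
    return [name for name in expected if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    # File names taken from the download steps above; adjust to your setup.
    expected = ["p12.csv", "df_A.csv", "df_B.csv"]
    missing = missing_raw_files("data/raw", expected)
    if missing:
        print("Missing from data/raw:", ", ".join(missing))
    else:
        print("All raw files found.")
```

If you keep the files somewhere other than `data/raw`, pass that folder instead and use the same path for `path_raw`.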
- Please run:

      python gen_sim.py --n-vars 16 32 64 128 --lambdas 0.2 0.5 1 2
      python gen_sim.py --n-vars 16 --lambdas 0.5

  This will create 16 simulated datasets with different numbers of features and lambdas. The raw and processed datasets will be saved in the `data/raw` and `data/processed` folders, respectively. Then create a YAML file for the simulated data at `configs/data/{DATASET_NAME}.yaml`.
- Now, check `Prepare_Datasets.ipynb` to see how to prepare the raw datasets. The prepared datasets for `P12` and `P19` can be downloaded from here.
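
The count of 16 in the simulated-data step is consistent with one dataset per pairing of the four `--n-vars` values with the four `--lambdas` values. The following sketch enumerates that grid. That `gen_sim.py` generates the full Cartesian product is an assumption inferred from the count, not something confirmed from the script itself.

```python
from itertools import product

# Values copied from the gen_sim.py command in the simulated-data step.
n_vars = [16, 32, 64, 128]
lambdas = [0.2, 0.5, 1, 2]

# Assumption: one simulated dataset per (n_vars, lambda) pair.
grid = list(product(n_vars, lambdas))

for n, lam in grid:
    print(f"n_vars={n}, lambda={lam}")

print(len(grid), "combinations")
```

Listing the pairs this way is also a quick check of how many `configs/data/{DATASET_NAME}.yaml` files you may need if you configure each simulated dataset separately.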