[💡SUG] Automatic dataset and dataloader save path #864

damicoedoardo · 2021-07-02T15:49:48Z

As I understand it, there are two levels of dataset preprocessing:
Level 1, filtering and preprocessing operations: you can save a dataset after this step by calling dataset.save()
Level 2, dataset train/(val)/test split: you can save the resulting dataloaders with save_split_dataloaders()

In the first case you can specify the directory where the dataset is saved.
In the second case you cannot specify the directory; the dataloaders are always saved under config['checkpoint_dir'].

Would it be possible to unify the behaviour of the two saving processes?

I propose using the data_path config parameter as the root for all three levels of data, each stored in its own directory:

f'{data_path}/atomic': all the atomic files, which are currently under dataset
f'{data_path}/preprocessed': all the datasets that have been preprocessed (level 1 above)
f'{data_path}/{model_name}/splitted': all the dataloaders for an algorithm or category of algorithms (level 2 above)
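
The layout above could be sketched roughly as follows. This is only an illustration: `preprocessing_dirs` and `ensure_dirs` are hypothetical helper names, not existing library functions, and the example values "data" and "BPR" are placeholders.

```python
from pathlib import Path
from typing import Dict


def preprocessing_dirs(data_path: str, model_name: str) -> Dict[str, Path]:
    """Build the three proposed directories under data_path (hypothetical helper)."""
    root = Path(data_path)
    return {
        # raw atomic files
        "atomic": root / "atomic",
        # level 1: filtered / preprocessed datasets
        "preprocessed": root / "preprocessed",
        # level 2: split dataloaders, grouped per model (or model category)
        "splitted": root / model_name / "splitted",
    }


def ensure_dirs(dirs: Dict[str, Path]) -> None:
    """Create each directory if it does not exist yet."""
    for directory in dirs.values():
        directory.mkdir(parents=True, exist_ok=True)


# Placeholder values, for illustration only.
dirs = preprocessing_dirs("data", "BPR")
```

With something like this, both saving steps could take their default target from the same config parameter, while still allowing an explicit directory to override it.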

I find the proposed structure more intuitive and easier to use.

Thanks


2017pxy · 2021-07-05T01:18:06Z

@damicoedoardo Thanks for the suggestion, we will consider it!

damicoedoardo added the enhancement New feature or request label Jul 2, 2021

damicoedoardo closed this as completed Jul 2, 2021

damicoedoardo reopened this Jul 2, 2021

2017pxy self-assigned this Jul 5, 2021
