As I understand it, there are two different levels of dataset preprocessing:
Level 1, filtering and preprocessing operations: you can save the dataset after this step by calling `dataset.save()`.
Level 2, the train/(val)/test split: you can save the resulting dataloaders with `save_split_dataloaders()`.
In the first case you can specify the directory where the dataset is saved.
In the second case you cannot specify the directory; instead, the dataloaders are always saved under `config['checkpoint_dir']`.
Would it be possible to unify the behaviour of the two saving processes?
I would propose using the `data_path` config parameter as the root for the three levels of (pre)processed data, each stored in its own directory:
`f'{data_path}/atomic'`: all the atomic files that currently live under `dataset`
`f'{data_path}/preprocessed'`: all the datasets that have been preprocessed (level 1 above)
`f'{data_path}/{model_name}/splitted'`: all the dataloaders for an algorithm or category of algorithms (level 2 above)
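A minimal sketch of how the proposed layout could be resolved from `data_path`. The helper names `preprocessing_dirs` and `save_stage` are hypothetical (they are not part of the library's API), and `pickle` is only a stand-in for whatever serialization `dataset.save()` and `save_split_dataloaders()` actually use. The point is that both saving steps would go through one function that picks the directory from the preprocessing level, instead of one using a user-given path and the other `config['checkpoint_dir']`.

```python
import os
import pickle
from typing import Any, Dict


def preprocessing_dirs(data_path: str, model_name: str) -> Dict[str, str]:
    """Return the three proposed directories, keyed by preprocessing level (hypothetical helper)."""
    return {
        # raw atomic files (what currently lives under `dataset`)
        "atomic": os.path.join(data_path, "atomic"),
        # level 1: filtered/preprocessed datasets, shared by all models
        "preprocessed": os.path.join(data_path, "preprocessed"),
        # level 2: train/(val)/test dataloaders, specific to a model or model family
        "splitted": os.path.join(data_path, model_name, "splitted"),
    }


def save_stage(obj: Any, stage: str, filename: str, data_path: str, model_name: str) -> str:
    """Pickle `obj` into the directory for `stage` and return the written file path (hypothetical helper)."""
    target_dir = preprocessing_dirs(data_path, model_name)[stage]  # KeyError for an unknown stage
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, filename)
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path
```

For example, `save_stage(split_data, "splitted", "ml-100k-split.pth", data_path="dataset", model_name="BPR")` would write to `dataset/BPR/splitted/ml-100k-split.pth`, while a level 1 dataset saved with stage `"preprocessed"` would land in `dataset/preprocessed/` and could be reused by every model.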
I find the proposed structure more intuitive and easier to use.
Thanks