Fix dataset loading, and other minor fixes #21
Hi,

`DiskResource` has been completely changed to now support downloading from a HuggingFace datasets repository. (Just to keep things simple I completely removed the Google Cloud logic, but if you think it should stay then we can maybe just merge the two together.) As it stands, it's been hardcoded to download from this repo, but it can be changed to something else by overriding `DB_HF_DATA` (see `disk_resource.py`). It would be good if you could test this branch out with a sanitised `design_bench_data` folder to make sure that everything downloads correctly.

Sadly, most datasets are missing their pretrained oracle weights :(. This means that most tasks take forever to import, since the library will try to train an oracle instead. These are the only pretrained weights I have on hand:
If you are able to fill in some of these gaps that would be good.
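For testing the override, here is a minimal sketch of the repo-override pattern. The names below (including the placeholder repo id and the helper function) are illustrative assumptions, not the actual code; see `disk_resource.py` for the real `DB_HF_DATA` constant:

```python
# Illustrative sketch only: DB_HF_DATA is assumed to be a module-level
# constant naming the HuggingFace datasets repo that files are pulled from.
DB_HF_DATA = "some-user/design_bench_data"  # hypothetical placeholder repo id


def hf_resolve_url(filename, repo_id=None):
    """Build the standard HF 'resolve' URL for a file in a datasets repo."""
    repo_id = repo_id or DB_HF_DATA
    return f"https://huggingface.co/datasets/{repo_id}/resolve/main/{filename}"


# Overriding the source repo is then just a matter of rebinding DB_HF_DATA
# (or passing repo_id explicitly) before any resources are downloaded.
print(hf_resolve_url("some_task/data.npy", repo_id="me/my_sanitised_data"))
```

Pointing `repo_id` at a sanitised copy of `design_bench_data` would let you verify the download path without touching the default repo.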
Other changes:
- The warning "Setting 'max_len_sentences_pair' is now deprecated. This value is automatically set up. Setting 'max_len_single_sentence' is now deprecated. This value is automatically set up." has now been suppressed, since it spams the screen when you import.
- `np.loads`.

Thanks.
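On the warning suppression: the actual change may differ, but a minimal sketch of silencing that message with the stdlib `warnings` module (the filter pattern here is an assumption) looks like this:

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore the tokenizer deprecation spam by matching the message prefix;
    # filterwarnings treats `message` as a regex matched at the start.
    warnings.filterwarnings(
        "ignore", message=r"Setting 'max_len_\w+' is now deprecated"
    )
    warnings.warn(
        "Setting 'max_len_sentences_pair' is now deprecated. "
        "This value is automatically set up.",
        DeprecationWarning,
    )

# The warning was filtered out, so nothing was recorded.
print(len(caught))
```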