Fix dataset loading, and other minor fixes #21

Open: wants to merge 3 commits into base: new-api
@christopher-beckham (Collaborator) commented Apr 21, 2024

Hi,

DiskResource has been rewritten to support downloading from a HuggingFace datasets repository. (To keep things simple I removed the Google Cloud logic entirely, but if you think it should stay then we could merge the two together.)

As it stands, the download source is hardcoded to this repo, but it can be pointed elsewhere by overriding DB_HF_DATA (see disk_resource.py). It would be good if you could test this branch with a sanitised design_bench_data folder to make sure that everything downloads correctly.
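For anyone testing this, here is a minimal sketch of what an overridable download source could look like. It is not the code in disk_resource.py: the placeholder repo id, the DB_HF_DATA environment-variable lookup, and the helper names resolve_repo_id and download_file are all assumptions made for illustration. The only real library call is huggingface_hub.hf_hub_download with repo_type="dataset".

```python
import os

# Placeholder only: the branch hardcodes a real repo id in disk_resource.py.
DB_HF_DATA = "your-org/your-dataset-repo"


def resolve_repo_id(override=None):
    """Pick the repo id: explicit argument, then env var, then the default.

    Reading an environment variable is an assumption; the branch may only
    expect the module-level DB_HF_DATA to be reassigned.
    """
    return override or os.environ.get("DB_HF_DATA") or DB_HF_DATA


def download_file(filename, repo_id=None, local_dir=None):
    """Fetch one file from a HuggingFace *datasets* repo and return its path."""
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=resolve_repo_id(repo_id),
        filename=filename,
        repo_type="dataset",  # dataset repo, not a model repo
        local_dir=local_dir,
    )
```

With something like this, testing against a different repo would only need DB_HF_DATA to be set before the first download is triggered.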

Sadly, most datasets are missing their pretrained oracle weights :(. This means that most tasks take forever to import, since the import will try to train an oracle instead. These are the only pretrained weights I have on hand:

./ant_morphology/ant_morphology/gaussian_process.zip
./ant_morphology/ant_morphology/random_forest.zip
./dkitty_morphology/dkitty_morphology/gaussian_process.zip
./dkitty_morphology/dkitty_morphology/random_forest.zip
./hopper_controller/hopper_controller/random_forest.zip
./hopper_controller/hopper_controller/gaussian_process.zip
./superconductor/superconductor/random_forest.zip
./superconductor/superconductor/gaussian_process.zip
./tf_bind_8-SIX6_REF_R1/tf_bind_8/gaussian_process.zip
./tf_bind_8-SIX6_REF_R1/tf_bind_8/random_forest.zip

If you are able to fill in some of these gaps that would be good.
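To make the gaps easier to see, here is a small sketch that groups the paths listed above by dataset folder and reports which oracle archives are absent. The helper names and the idea of passing in an expected set of oracle types are assumptions for illustration; only gaussian_process and random_forest appear in the list, so nothing else is assumed to exist.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# The weights listed in this PR description.
AVAILABLE = [
    "./ant_morphology/ant_morphology/gaussian_process.zip",
    "./ant_morphology/ant_morphology/random_forest.zip",
    "./dkitty_morphology/dkitty_morphology/gaussian_process.zip",
    "./dkitty_morphology/dkitty_morphology/random_forest.zip",
    "./hopper_controller/hopper_controller/random_forest.zip",
    "./hopper_controller/hopper_controller/gaussian_process.zip",
    "./superconductor/superconductor/random_forest.zip",
    "./superconductor/superconductor/gaussian_process.zip",
    "./tf_bind_8-SIX6_REF_R1/tf_bind_8/gaussian_process.zip",
    "./tf_bind_8-SIX6_REF_R1/tf_bind_8/random_forest.zip",
]


def oracles_by_dataset(paths):
    """Map each top-level dataset folder to the oracle names found in it."""
    found = defaultdict(set)
    for raw in paths:
        path = PurePosixPath(raw)  # a leading "./" is dropped by pathlib
        found[path.parts[0]].add(path.stem)  # stem strips the ".zip" suffix
    return dict(found)


def missing_oracles(paths, datasets, expected):
    """For every dataset, list the expected oracle names with no archive."""
    found = oracles_by_dataset(paths)
    return {
        name: sorted(set(expected) - found.get(name, set()))
        for name in datasets
    }
```

Calling missing_oracles(AVAILABLE, datasets, expected) with the full list of dataset folders in design_bench_data would show, per dataset, which weights still need to be uploaded.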

Other changes:

  • The warning "Setting 'max_len_sentences_pair' is now deprecated. This value is automatically set up. Setting 'max_len_single_sentence' is now deprecated. This value is automatically set up." is now suppressed, since it spams the screen on import.
  • Fixed an error involving np.loads.
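For reference, one way to silence a message like the one in the first bullet is a standard-library logging filter. This is only a sketch and may not match what the branch does: it assumes the message is emitted through the logging module rather than warnings.warn, and the class name, the install_filter helper, and the logger name passed to it are hypothetical.

```python
import logging

# Fragments of the two messages quoted in the PR description.
SUPPRESSED_SNIPPETS = (
    "Setting 'max_len_sentences_pair' is now deprecated",
    "Setting 'max_len_single_sentence' is now deprecated",
)


class DropDeprecationSpam(logging.Filter):
    """Drop log records whose text contains one of the known snippets."""

    def filter(self, record):
        message = record.getMessage()
        # Returning False tells logging to discard the record.
        return not any(snippet in message for snippet in SUPPRESSED_SNIPPETS)


def install_filter(logger_name):
    """Attach the filter to the named logger and return that logger.

    A filter on a logger only sees records logged directly on that logger,
    so logger_name has to be the logger that actually emits the message.
    """
    logger = logging.getLogger(logger_name)
    logger.addFilter(DropDeprecationSpam())
    return logger
```

The filter matches on message text rather than lowering the log level, so other warnings from the same logger still get through.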

Thanks.
