Speed up downloading of Cubico data to make it more accessible #276

Open
charlie9578 opened this issue Feb 27, 2024 · 1 comment
Labels
documentation, enhancement, on hold (Something that, for a variety of reasons, might not be addressed in the short term.)

Comments

@charlie9578
Contributor

It takes a long time to download all of the Cubico data along with the corresponding reanalysis data, and the file sizes are large! This makes the dataset hard for people to use.

I'm thinking it would be good to provide the files required to run the examples (plus some additional signals to explore) to make the data more accessible. Initially I'm thinking of just using the "to_csv" function, uploading the data to Zenodo, using the download tool, and then loading in the project data, which should significantly speed things up.
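As a rough illustration of the "download once, then load" step, here is a minimal sketch of a cached download helper. It is not the project's download tool: `fetch_if_missing` and its `downloader` parameter are hypothetical names made up for this example, and the URL in the usage comment is a placeholder, not a real Zenodo record.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve


def fetch_if_missing(
    url: str,
    dest: Path,
    downloader: Callable[[str, Path], object] = urlretrieve,
) -> bool:
    """Download `url` to `dest` unless `dest` already exists.

    Returns True if a download happened, False if the cached file was reused.
    `downloader` is injectable (an assumption of this sketch) so the helper
    can be exercised without network access.
    """
    dest = Path(dest)
    if dest.exists():
        return False

    dest.parent.mkdir(parents=True, exist_ok=True)

    # Download to a temporary name first so an interrupted transfer never
    # leaves a partial file that looks like a valid cached copy.
    tmp = dest.with_name(dest.name + ".part")
    downloader(url, tmp)
    tmp.replace(dest)
    return True


# Usage (placeholder URL and file name, for illustration only):
# fetch_if_missing("https://zenodo.org/records/<record-id>/files/scada.csv",
#                  Path("data/cubico/scada.csv"))
```

With something like this in place, the first run of an example pays the download cost and later runs read the pre-processed CSV files straight from disk.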

The project_Cubico script should still contain the full functionality for reference though, or even be converted into a notebook, as others will want to see how they could use their own data.

Any thoughts?

@RHammond2
Collaborator

@charlie9578 if the data were readily available on Zenodo, then I think that would speed up the example quite a bit. I think we've been hesitant to add more data to the repository itself. That said, if 20-year profiles were made available from the get-go, that would speed up much of the work of the example.

A couple of suggestions on how this could be set up to be a little quicker:

  • Set the start_date and end_date for the reanalysis periods
  • Add the data to Zenodo, so that it's a one-step download process, though not much faster
  • The current suite of data is ~7 GB, so it could be helpful to remove any intermediary files after the download process (mostly reanalysis data), and reassess how much extra data it would ultimately add to the repository. I think the cleanup steps are still important to keep available as well, so it might be good to have a more formal chat about this (and likely the other open issues too).
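The cleanup of intermediary files suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: `remove_intermediate_files` is a hypothetical helper, and the `*.nc` pattern in the usage comment assumes the intermediate reanalysis downloads are NetCDF files, which should be checked against what the download step actually writes.

```python
from pathlib import Path
from typing import Iterable


def remove_intermediate_files(
    data_dir: Path,
    patterns: Iterable[str],
    dry_run: bool = True,
) -> int:
    """Remove files under `data_dir` that match any glob pattern.

    Returns the number of bytes freed, or the number that would be freed
    when `dry_run` is True (the default, so nothing is deleted by accident).
    """
    root = Path(data_dir)

    # Collect matches into a set first so a file matched by several
    # patterns is counted and deleted only once.
    targets = {
        path
        for pattern in patterns
        for path in root.rglob(pattern)
        if path.is_file()
    }

    freed = sum(path.stat().st_size for path in targets)
    if not dry_run:
        for path in targets:
            path.unlink()
    return freed


# Usage (directory and pattern are assumptions, for illustration only):
# size = remove_intermediate_files(Path("data/cubico"), ["*.nc"])
# print(f"Would free {size / 1e9:.2f} GB")
```

Running it with `dry_run=True` first also gives a quick way to reassess how much of the ~7 GB is intermediate data before deciding what to keep.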

@RHammond2 added the "on hold" label on Jun 6, 2024