Speed up downloading of Cubico data to make it more accessible #276

charlie9578 · 2024-02-27T19:42:55Z

It takes a long time to download all the Cubico data, and also corresponding reanalysis data, and the file sizes are large! This makes the dataset hard for people to use.

I'm thinking it would be good to provide the files required to run the examples (plus some additional signals to explore). Initially I'm thinking of just using the "to_csv" function to export the data, uploading it to Zenodo, then using the download tool and loading in the project data, which should significantly speed things up and make the data more accessible.
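For reference, the download step could look roughly like the sketch below. This uses only the standard library and is not the existing download tool: `fetch_once` and `file_md5` are hypothetical helper names, the URL would be whatever direct file link the Zenodo record ends up exposing, and the checksum is assumed to be an MD5 hex digest. The point is to skip the download entirely when a verified copy is already on disk.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path
from typing import Optional


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_once(url: str, dest: Path, md5: Optional[str] = None) -> Path:
    """Download `url` to `dest` unless a valid cached copy already exists.

    Hypothetical helper: `md5` is the expected checksum (assumed to be an
    MD5 hex digest); pass None to skip verification.
    """
    dest = Path(dest)
    if dest.exists() and (md5 is None or file_md5(dest) == md5):
        return dest  # cached copy is usable, so nothing is downloaded

    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")  # avoid leaving a half-written file at `dest`
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)

    if md5 is not None and file_md5(tmp) != md5:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {url}")

    tmp.replace(dest)
    return dest
```

Calling `fetch_once(url, Path("data/cubico/scada.csv"), md5=expected)` a second time returns immediately, so rerunning an example would not repeat the download. The file path here is only an illustration.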

The project_Cubico script should still contain the full functionality for reference, or even be converted into a notebook, as others will want to see how they could use their own data.

Any thoughts?

RHammond2 · 2024-03-05T00:18:44Z

@charlie9578 if the data were readily available on Zenodo, I think that would speed up the example quite a bit. I think we've been hesitant to add more data to the repository itself. That said, if 20-year profiles were made available from the get-go, that would speed up much of the work of the example.

A couple of suggestions on how this could be set up to be a little quicker:

- Set the start_date and end_date for the reanalysis periods
- Add the data to Zenodo, then it's a one step download process, though not much faster
- The current suite of data is ~7GB, so it could be helpful to remove any intermediary files after the download process (mostly reanalysis data), and reassess how much extra data it would ultimately add to the repository. I think the cleanup steps are still important to keep available as well, so it might be good to have a more formal chat about this (and likely the other open issues too).
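As a rough illustration of the cleanup suggestion, the sketch below deletes intermediate files after a download and reports how much space was freed. It is an assumption-laden sketch rather than anything in the current scripts: `remove_intermediates` is a hypothetical helper, and the `*.nc` pattern and the kept file name are guesses at how the intermediate reanalysis files might be laid out.

```python
from pathlib import Path
from typing import Iterable


def remove_intermediates(
    data_dir: Path,
    patterns: Iterable[str] = ("*.nc",),  # assumed pattern for intermediate files
    keep: Iterable[str] = (),             # file names that must survive the cleanup
) -> int:
    """Delete files under `data_dir` matching `patterns`; return bytes freed."""
    keep_names = set(keep)
    freed = 0
    for pattern in patterns:
        # sorted() materialises the matches before anything is deleted
        for path in sorted(Path(data_dir).rglob(pattern)):
            if path.is_file() and path.name not in keep_names:
                freed += path.stat().st_size
                path.unlink()
    return freed
```

For example, `remove_intermediates("data/cubico", keep=("era5_combined.nc",))` would drop every other `.nc` file under that folder while leaving the CSV outputs untouched. Both the folder and the file name are placeholders.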

RHammond2 mentioned this issue Mar 5, 2024

Static Yaw Error analysis example run with the Cubico data #277

Open

RHammond2 added enhancement documentation labels Mar 5, 2024

RHammond2 added the on hold label Jun 6, 2024
