
storage options #44

Open
lkeegan opened this issue Nov 15, 2022 · 3 comments

Comments

@lkeegan
Member

lkeegan commented Nov 15, 2022

Full results data will be up to ~96 × 300 MB ≈ 30 GB/week

So up to ~120 GB/month

Options for storing this (a rough cost comparison is sketched after the list):

  • (only for initial testing) just store on the VM
    • no extra cost, but only ~25 GB of space available
  • pay for a heicloud storage volume & attach it to the VM
    • from the point of view of the VM this would just be another hard drive, so no problem transferring this to another server
    • cost: ~20 EUR/TB/month
  • connect the VM to SDS@hd
    • likely the best option, but specific to Heidelberg
    • some admin work required to set up an SDS account
    • cost: ~2.5 EUR/TB/month
  • store the larger files externally, e.g. on AWS S3
    • the web server would then just store links to these files
    • cost: ~30 EUR/TB/month
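
A very rough cost projection, purely illustrative, assuming data accumulates at ~120 GB/month and nothing is ever deleted, using the prices quoted above:

```python
# Illustrative cost projection; growth rate and prices are taken from the list above.
growth_tb_per_month = 0.12  # ~120 GB/month

# approximate prices in EUR per TB per month
prices = {"heicloud volume": 20.0, "SDS@hd": 2.5, "AWS S3": 30.0}

for months in (6, 12, 24):
    stored_tb = growth_tb_per_month * months
    summary = ", ".join(
        f"{name}: ~{stored_tb * eur:.0f} EUR/month" for name, eur in prices.items()
    )
    print(f"after {months} months (~{stored_tb:.2f} TB stored): {summary}")
```
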
@EdGreen21

AWS S3 has the best potential, but it might be tricky to set up permissions... never tried this myself. But as I was thinking about S3, I wondered whether, from a design perspective, it might not be better to have the 'file server' as an actual server - again through a separate Docker image? Examples online:

Plugging into existing solutions might have a number of advantages over building for the Heidelberg ecosystem - but I could be completely wrong here.

The first objective would be a working solution: if Heidelberg is simpler then we go for that, and if there's dev time left at the end of the project we move to S3. A working solution means users, which makes follow-up funding likely, especially as HMHLSA has deep pockets and needs to show some early wins to justify their 30m funding.

@lkeegan
Member Author

lkeegan commented Nov 16, 2022

So from the coding perspective there's nothing Heidelberg-specific about using a heicloud storage volume - it's just more disk space that you can mount into your Docker image, so there shouldn't be any changes required other than updating the path to where the data should be stored when this is hosted on another server. Based on this I don't really see an advantage in adding a separate internal file server layer between our backend (which authenticates the user and returns a file) and the filesystem.
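
A minimal sketch of what I mean, assuming the backend reads its data directory from an environment variable (the variable name and default path here are made up, not our actual config):

```python
# Sketch: the backend only needs a configurable data directory, so moving from
# local disk to an attached heicloud volume (or any other mount) is just a
# matter of pointing this path at the mounted volume.
import os
from pathlib import Path

# hypothetical env var and default path - not the actual configuration
DATA_DIR = Path(os.environ.get("DATA_DIR", "/data/results"))


def result_path(user_id: str, filename: str) -> Path:
    """Location of a user's result file under the configured data directory."""
    return DATA_DIR / user_id / filename
```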

With S3 I see 2 straightforward ways to use it (a rough sketch of both follows the list):

  • as a public file server
    • all files on S3 are public but with names like dasgfuyqwt789ep894w39d5f.zip
    • users need to log in to get a link to their files
    • direct download from S3 to the user
    • but no security other than some basic obfuscation of URLs - anyone can download any file if they know/guess the name
  • as a private file server
    • all files on S3 are only accessible to our backend
    • it stores files on S3 instead of on the filesystem
    • same security as currently: users can only access their own files
    • but more bandwidth/costs: the backend gets the file from S3, the user downloads the file from the backend
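
A rough sketch of both approaches, assuming a Flask backend and boto3 (the bucket name, routes and lookup helper are all made-up placeholders, not our actual code):

```python
# Sketch of the two S3 approaches; bucket name, routes and helpers are assumptions.
import boto3
from flask import Flask, Response, abort, redirect

app = Flask(__name__)
s3 = boto3.client("s3")
BUCKET = "example-results-bucket"  # hypothetical bucket name


def lookup_key(user: str, filename: str) -> str:
    # placeholder: map an authenticated user + filename to an obfuscated S3 key
    return f"{user}/{filename}"


@app.route("/public/<filename>")
def public_link(filename):
    # Option 1: objects are public but have unguessable names; after login the
    # backend just redirects the user to the direct S3 URL (obfuscation only).
    key = lookup_key("some-authenticated-user", filename)
    return redirect(f"https://{BUCKET}.s3.amazonaws.com/{key}")


@app.route("/private/<filename>")
def private_download(filename):
    # Option 2: the bucket is private; the backend fetches the object itself
    # and streams it on, so traffic flows S3 -> backend -> user.
    key = lookup_key("some-authenticated-user", filename)
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=key)
    except s3.exceptions.NoSuchKey:
        abort(404)
    return Response(obj["Body"].iter_chunks(), mimetype="application/octet-stream")
```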

I think the simplest solution to get things working now would be to add a heicloud storage volume.

@EdGreen21

Agree to the simplest solution, leaving space for a follow-up proposal.

We received files from a commercial nanopore sequencing service today (expensive) and they use Google Drive(!)
