
storage options #44

Open
lkeegan opened this issue Nov 15, 2022 · 3 comments

Comments

@lkeegan
Member

lkeegan commented Nov 15, 2022

Full results data will be up to ~96 × 300 MB ≈ 30 GB/week

So up to ~120 GB/month

Options for storing this (a rough cost comparison is sketched after the list):

  • (only for initial testing) just store on the VM
    • no extra cost, but only ~25 GB of space available
  • pay for a heicloud storage volume & attach it to the VM
    • from the point of view of the VM this would just be another hard drive, so no problem transferring this to another server
    • cost: ~20 EUR/TB/month
  • connect the VM to SDS@hd
    • likely the best option, but specific to Heidelberg
    • some admin work required to set up an SDS account
    • cost: ~2.5 EUR/TB/month
  • store the larger files externally, e.g. on AWS S3
    • the web server would then just store links to these files
    • cost: ~30 EUR/TB/month
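
A very rough cost projection, purely illustrative, assuming data accumulates at ~120 GB/month and nothing is ever deleted, using the prices quoted above:

```python
# Illustrative cost projection; growth rate and prices are taken from the list above.
growth_tb_per_month = 0.12  # ~120 GB/month

# approximate prices in EUR per TB per month
prices = {"heicloud volume": 20.0, "SDS@hd": 2.5, "AWS S3": 30.0}

for months in (6, 12, 24):
    stored_tb = growth_tb_per_month * months
    summary = ", ".join(
        f"{name}: ~{stored_tb * eur:.0f} EUR/month" for name, eur in prices.items()
    )
    print(f"after {months} months (~{stored_tb:.2f} TB stored): {summary}")
```
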
@EdGreen21

AWS S3 has the best potential, but it might be tricky to set up permissions... never tried this myself. But as I was thinking about S3, I wondered whether, from a design perspective, it might not be better to have the 'file server' as an actual server - again through a separate Docker image? Examples online:

Plugging into existing solutions might have a number of advantages over building for the Heidelberg ecosystem - but I could be completely wrong here.

The first objective would be a working solution: if Heidelberg is simpler then we go for that, and if there's dev time left at the end of the project we move to S3. A working solution means users, which makes follow-up funding likely, especially as HMHLSA has deep pockets and needs to show some early wins to justify their 30m funding.

@lkeegan
Member Author

lkeegan commented Nov 16, 2022

So from the coding perspective there's nothing Heidelberg-specific about using a heicloud storage volume - it's just more disk space that you can mount into your Docker image, so there shouldn't be any changes required other than updating the path to where the data should be stored when this is hosted on another server. Based on this I don't really see an advantage in adding a separate internal file server layer between our backend (which authenticates the user and returns a file) and the filesystem.
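
A minimal sketch of what I mean, assuming the backend reads its data directory from an environment variable (the variable name and default path here are made up, not our actual config):

```python
# Sketch: the backend only needs a configurable data directory, so moving from
# local disk to an attached heicloud volume (or any other mount) is just a
# matter of pointing this path at the mounted volume.
import os
from pathlib import Path

# hypothetical env var and default path - not the actual configuration
DATA_DIR = Path(os.environ.get("DATA_DIR", "/data/results"))


def result_path(user_id: str, filename: str) -> Path:
    """Location of a user's result file under the configured data directory."""
    return DATA_DIR / user_id / filename
```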

With S3 I see 2 straightforward ways to use it (a rough sketch of both follows the list):

  • as a public file server
    • all files on S3 are public but with names like dasgfuyqwt789ep894w39d5f.zip
    • users need to log in to get a link to their files
    • direct download from S3 to the user
    • but no security other than some basic obfuscation of URLs - anyone can download any file if they know/guess the name
  • as a private file server
    • all files on S3 are only accessible to our backend
    • it stores files on S3 instead of on the filesystem
    • same security as currently: users can only access their own files
    • but more bandwidth/costs: the backend gets the file from S3, the user downloads the file from the backend
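
A rough sketch of both approaches, assuming a Flask backend and boto3 (the bucket name, routes and lookup helper are all made-up placeholders, not our actual code):

```python
# Sketch of the two S3 approaches; bucket name, routes and helpers are assumptions.
import boto3
from flask import Flask, Response, abort, redirect

app = Flask(__name__)
s3 = boto3.client("s3")
BUCKET = "example-results-bucket"  # hypothetical bucket name


def lookup_key(user: str, filename: str) -> str:
    # placeholder: map an authenticated user + filename to an obfuscated S3 key
    return f"{user}/{filename}"


@app.route("/public/<filename>")
def public_link(filename):
    # Option 1: objects are public but have unguessable names; after login the
    # backend just redirects the user to the direct S3 URL (obfuscation only).
    key = lookup_key("some-authenticated-user", filename)
    return redirect(f"https://{BUCKET}.s3.amazonaws.com/{key}")


@app.route("/private/<filename>")
def private_download(filename):
    # Option 2: the bucket is private; the backend fetches the object itself
    # and streams it on, so traffic flows S3 -> backend -> user.
    key = lookup_key("some-authenticated-user", filename)
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=key)
    except s3.exceptions.NoSuchKey:
        abort(404)
    return Response(obj["Body"].iter_chunks(), mimetype="application/octet-stream")
```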

I think the simplest solution to get things working now would be to add a heicloud storage volume.

@EdGreen21

Agree to the simplest solution, leaving space for a follow-up proposal.

We received files from a commercial nanopore sequencing service today (expensive) and they use Google Drive(!)
