LEAP has less than 10% free disk space #3230
Comments
I'll write that email. I also need to check in with LEAP re: their cloud spend (currently at 50% of their allocation, but this is also 6 months into their funding period, so they are right on track).
I hadn't noticed previously that Yuvi had already started that communication: https://2i2c.freshdesk.com/a/tickets/995
Merged my email and Yuvi's into one ticket: https://2i2c.freshdesk.com/a/tickets/995 (@yuvi -- apologies for butting into this. I hadn't realized you were already on top of it. I read 'contact community rep' in the GitHub description, noticed that it didn't have anyone assigned, and assumed that meant I should be engaging with LEAP in some way.) I note (mostly for my own reference) that this issue of LEAP disk space is directly connected with the still-open ticket https://2i2c.freshdesk.com/a/tickets/387, which references:
For home directory storage, are per-user quotas a feature we can enable? Looking at https://z2jh.jupyter.org/en/stable/jupyterhub/customizing/user-storage.html I see options like …, but perhaps the fact that we use Google Filestore prevents this quota feature from being an option. I agree that implementing per-user quotas on home directories would be ideal for LEAP (and possibly for many other communities). Since that is not in place, I assume it is due to a constraint that I do not understand. Related to the space issue, Julius is requesting better tooling for measuring, reporting, and notifying users about their storage use. I can think of a few ways to approach this in a more traditional HPC setting, but I am less certain of the options with Kubernetes PVCs, StorageClasses, and PVs.
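On the "measuring" side, one rough approach that does not depend on Kubernetes storage features is to walk each user's home directory on the shared volume and total up file sizes. The sketch below assumes that every user's home directory is a top-level subdirectory of a single mounted share; the mount path and the function names are hypothetical, and this is not how 2i2c actually collects usage data. It reports apparent size (`st_size`), so sparse files and hard links are not accounted for exactly, and a full walk of a large share can be slow.

```python
import os
import stat
from pathlib import Path


def directory_usage_bytes(path: Path) -> int:
    """Sum the apparent size (st_size) of regular files under `path`.

    Symlinks are neither followed nor counted, so a link pointing outside
    the home directory does not inflate the total.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path, followlinks=False):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                # File vanished or is unreadable mid-scan; skip it.
                continue
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def usage_per_user(home_root: Path) -> dict[str, int]:
    """Map each top-level directory under `home_root` (assumed one per user) to bytes used."""
    with os.scandir(home_root) as entries:
        return {
            entry.name: directory_usage_bytes(Path(entry.path))
            for entry in entries
            if entry.is_dir(follow_symlinks=False)
        }
```

Something like `usage_per_user(Path("/mnt/homes"))` (hypothetical mount point) would return a `{username: bytes}` mapping that can be sorted to find the largest home directories.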
Sorry, I didn't fully see this, @jmunroe. Yes, using Google Filestore is the reason we can't use the default from z2jh. That default uses dynamic disk provisioning, which creates one disk per user. Quoting the problems with that here:
In an HPC system or similar, they'd probably run their own NFS server and use something like XFS with its project quotas for per-directory quotas. We'd need to run our own NFS server to do this, and I think the team currently doesn't have enough filesystem-engineering capacity for us to do that. I did make a new dashboard with an actual table that can be used to look at usage per user, which is much better than the graph that currently exists. While this helps, I do agree that more automation needs to be made available.
Thanks @yuvipanda for the explanation. I agree that running our own NFS server is not something we can support in the near term. So, while not ideal, it sounds like "someone" (Julius? me?) should write a tool that scrapes this table (or, more likely, grabs the equivalent data with an API call) and then sends an automated email or Slack message to each user who is way over their quota, warning them about it. A cron job or Zapier/IFTTT-like tools could be used. Since it involves emailing users directly, I wouldn't want 2i2c to actually run that service, but perhaps it is a general enough issue across other communities that it is worth documenting how to set up some sort of automated "email my users daily if over quota" tool. @jbusecke -- is this the kind of thing that would be useful to you? Do you want me to attempt to mock up something that you could then use to make your life easier? This may even give you a way to have a weekly report emailed automatically to you flagging "unusual" activity. Again, I don't think 2i2c would be able to run that automation for you, but I am happy to work on it because I think other Hub Champions face similar issues.
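The "email my users if over quota" idea could be prototyped along these lines. This is a minimal sketch, assuming per-user usage is already available as a `{username: bytes}` mapping (from the dashboard's data source or a directory scan) and that a username-to-email lookup exists; the soft quota, the `OverQuota` type, and both functions are hypothetical. It only builds the messages and does not send anything.

```python
from dataclasses import dataclass
from email.message import EmailMessage

GIB = 1024 ** 3


@dataclass
class OverQuota:
    user: str
    used_bytes: int
    quota_bytes: int


def find_over_quota(usage: dict[str, int], quota_bytes: int) -> list[OverQuota]:
    """Return users whose usage exceeds a soft quota, largest first."""
    over = [OverQuota(u, b, quota_bytes) for u, b in usage.items() if b > quota_bytes]
    return sorted(over, key=lambda o: o.used_bytes, reverse=True)


def build_warning(entry: OverQuota, sender: str, recipient: str) -> EmailMessage:
    """Build (but do not send) a plain-text warning for one over-quota user."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Home directory over quota: {entry.user}"
    msg.set_content(
        f"Hi {entry.user},\n\n"
        f"Your home directory uses {entry.used_bytes / GIB:.1f} GiB, "
        f"above the suggested limit of {entry.quota_bytes / GIB:.1f} GiB.\n"
        "Please delete or move files you no longer need.\n"
    )
    return msg
```

Sending could then go through the standard library's `smtplib.SMTP(host).send_message(msg)` against whatever mail server the community already uses, with the script scheduled daily by cron. Keeping the selection logic separate from delivery also makes it easy to produce the weekly summary for the hub champion from the same `find_over_quota` result.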
The other relevant piece from my response on Freshdesk is probably:
But otherwise I think your message here matches what I had suggested we try to do on Freshdesk.
@jmunroe what is the status of this now? Can we close this?
These notifications are helpful, but as I elaborated via email, I think they are ultimately not enough to ensure smooth operation for many users. But from my end this issue can be closed now.
Just got an alert saying LEAP has less than 10% free disk space. Work with the community rep to figure out whether we should increase the size or whether they should ask some folks to clean up.
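The alert condition itself is a simple ratio test. As a hedged illustration only (the thread does not describe how 2i2c's alerting is actually configured), the sketch below checks whether free space on the filesystem holding a given path has dropped below 10%; the function names are hypothetical, and the path would be wherever the shared home-directory volume is mounted.

```python
import shutil


def free_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is free."""
    usage = shutil.disk_usage(path)
    if usage.total <= 0:
        raise ValueError(f"filesystem at {path!r} reports no capacity")
    return usage.free / usage.total


def below_threshold(free: int, total: int, threshold: float = 0.10) -> bool:
    """True when free space is strictly under `threshold` of the total (10% by default)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return free / total < threshold
```

Separating the pure `below_threshold` check from the filesystem call keeps the decision logic testable without a real volume.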