User data quota cron job #177

Open
8 tasks
asmacdo opened this issue Jul 23, 2024 · 6 comments · May be fixed by #199

@asmacdo
Member

asmacdo commented Jul 23, 2024

IMO the choice of which files are safe to remove should come from the science side; it's hard to guess what is safe to remove.

Here's an initial list:

Let's also provide each user with a scratch directory that cleans up files more than 30 days old.

The Plan (by @yarikoptic and @asmacdo )

  • develop a simple, efficient script that dumps a listing of the content on EFS into a JSON or JSON Lines file (it might be worth compressing it right away with one of Python's built-in compression modules?)
    • just run it interactively on an EC2 instance to get an idea of the time/size requirements
    • add to that script's output the date when it ran and how long it took to execute
  • set up that script to be run by AWS Lambda and upload the produced listing somewhere
    • unversioned S3 bucket might be the best choice
  • develop a script which consumes that dump and

Sample scripts from chatgpt to collect and analyze stats are available at https://chatgpt.com/share/6732630c-4e54-8002-bf09-41df8175b6d0 .
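
For discussion, here is a minimal sketch of the first step of the plan: walk a directory tree and write one record per file to a gzip-compressed JSON Lines file, ending with a summary record that holds the run date and duration. The function name `dump_listing`, the record fields, and the trailing-summary layout are assumptions for illustration, not anything already agreed on in this issue. The S3 upload and Lambda wiring are left out.

```python
import gzip
import json
import os
import time
from datetime import datetime, timezone


def dump_listing(root, out_path):
    """Write one JSON record per file under `root` to a gzip-compressed JSON Lines file.

    Hypothetical helper: the field names and summary record are illustrative.
    Returns the number of files listed.
    """
    started = time.monotonic()
    started_at = datetime.now(timezone.utc).isoformat()
    count = 0
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)  # lstat: do not follow symlinks
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                record = {"path": path, "size": st.st_size, "mtime": st.st_mtime}
                out.write(json.dumps(record) + "\n")
                count += 1
        # Last line: when the scan ran and how long it took.
        summary = {
            "summary": True,
            "root": root,
            "started_at": started_at,
            "elapsed_seconds": time.monotonic() - started,
            "file_count": count,
        }
        out.write(json.dumps(summary) + "\n")
    return count
```

Running it once interactively, e.g. `dump_listing("/path/to/efs", "listing.jsonl.gz")`, would give the time and size numbers the plan asks for; the summary line can be read back without decompressing to a separate file.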

@asmacdo
Member Author

asmacdo commented Jul 23, 2024

  • dendro tmp: dendro_compute_resource/jobs/9f26d4e2.6a1f740e/tmp/working/recording.dat: 0.106723

@kabilar
Member

kabilar commented Jul 24, 2024

Thanks Austin. Once we determine our policy, let's also update the DANDI Handbook to add this information.

@asmacdo
Member Author

asmacdo commented Jul 24, 2024

After chatting with @yarikoptic, we probably should not be too ambitious with a cron job.

All of the above should be run only prior to a migration (or kicked off manually if we want to reclaim some space).

There may still be a good use for a cron job: cleaning up files older than X in a provided scratch dir.
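
A rough sketch of what such a scratch cleanup could look like, assuming age is judged by modification time and that the job defaults to a dry run. The function names, the 30-day default, and the dry-run flag are illustrative assumptions, not a decided policy.

```python
import os
import time


def find_stale_files(scratch_dir, max_age_days, now=None):
    """Return files under `scratch_dir` whose mtime is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(scratch_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                continue  # file disappeared while scanning
    return stale


def clean_scratch(scratch_dir, max_age_days=30, dry_run=True, now=None):
    """List stale files and, unless `dry_run` is set, delete them.

    Returns the list of stale paths either way, so a cron wrapper can log it.
    """
    stale = find_stale_files(scratch_dir, max_age_days, now)
    if not dry_run:
        for path in stale:
            try:
                os.remove(path)
            except OSError:
                pass  # already gone or not removable; leave it for the next run
    return stale
```

Empty directories are left in place here; whether to prune them too is an open question.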

@yarikoptic
Member

I think we should indeed not clean up anything automatically, especially since we are not tracking "access time" but only "modification time" on files, so we can't judge whether anything is still in use or not.

What the cron job should do, per user:

  • check when user last used/logged in to hub
    • since we do not keep that info AFAIK, let's use last dandiarchive login information
    • that datetime would serve us as "cache-recency-identifier"
  • if for a given cache-recency-identifier we do not have up-to-date "statistics" for the user
    • run a script to du the entire home/{user} and /shared/{user}
    • get some specific dus:
      • find files larger than 1GB and mtime > 30 (?) days -- get total size and count
      • find __pycache__ and nwb-cache folders and pip cache and mtime > 30? days -- total sizes and list of them
    • If any of the conditions below are met, email the user to notify them about their usage (list large outdated files etc.) and about possible actions to be taken (to be decided), and annotate the record to show that an email was sent out for that recency identifier
      • Conditions:
        • total du exceeds some threshold (e.g. 100G)
        • total outdated caches size exceeds some threshold (e.g. 1G)
        • prior notification was sent more than a week ago
      • Actions:
        • ask to log in and clean up
  • after a sweep, given a list of users and their cache-recency-identifiers, for each user
    • if an email was sent and the cache-recency-identifier > 60 days (so they didn't log in/clean up)
      • email the list of large files and caches that will be cleaned up automagically in 10 days unless they log in/clean up
    • if an email was sent and the cache-recency-identifier > 70 days (so they didn't log in/clean up even after notification)
      • automatically clean up and send an email with a list of the files that were removed
      • remove the recency identifier so we get back into checking that user.

Something like that?
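
To make the per-user statistics concrete, here is a small sketch that gathers them in one pass over a user's directory. It makes several assumptions: sizes are apparent file sizes from `lstat` (not the block usage `du` reports), "outdated" means mtime older than the cutoff, cache content is identified by a `__pycache__` or `nwb-cache` path component, and the pip cache is not covered. `collect_stats` and its field names are hypothetical.

```python
import os
import time

GIB = 1024 ** 3
# Assumption: caches are recognized purely by directory name.
CACHE_DIR_NAMES = {"__pycache__", "nwb-cache"}


def collect_stats(root, large_bytes=GIB, max_age_days=30, now=None):
    """Summarize total size, large outdated files, and outdated cache content under `root`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stats = {
        "total_bytes": 0,
        "large_old_count": 0,
        "large_old_bytes": 0,
        "old_cache_bytes": 0,
        "old_cache_dirs": set(),
    }
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        # Index of the outermost cache directory on this path, if any.
        cache_idx = next((i for i, p in enumerate(parts) if p in CACHE_DIR_NAMES), None)
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable
            stats["total_bytes"] += st.st_size
            if st.st_mtime >= cutoff:
                continue  # modified recently; not outdated
            if cache_idx is not None:
                stats["old_cache_bytes"] += st.st_size
                stats["old_cache_dirs"].add(os.path.join(root, *parts[: cache_idx + 1]))
            elif st.st_size > large_bytes:
                stats["large_old_count"] += 1
                stats["large_old_bytes"] += st.st_size
    stats["old_cache_dirs"] = sorted(stats["old_cache_dirs"])
    return stats
```

The result for `home/{user}` and `/shared/{user}` could be stored next to the cache-recency-identifier so the scan is skipped when nothing changed.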

@asmacdo, please prepare a design doc with the above as a PR so we can iron out the behavior and then add it to the ToS etc.
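
As a starting point for that design doc, the escalation rules above could be written down as a single decision function. This is only one reading of the proposal: it treats "prior notification was sent more than a week ago" as a rate limit on repeat emails rather than as a trigger of its own, and it takes the cache-recency-identifier to be "days since last login". The names `Action` and `next_action` and the decimal-gigabyte thresholds are assumptions.

```python
from enum import Enum

GB = 10 ** 9


class Action(Enum):
    NONE = "none"        # nothing to do for this user
    NOTIFY = "notify"    # email usage summary, ask to log in and clean up
    WARN = "warn"        # email: cleanup will happen in 10 days
    CLEANUP = "cleanup"  # remove files, email the list, reset the record


def next_action(total_bytes, old_cache_bytes, days_inactive,
                notified_days_ago=None,
                du_threshold=100 * GB, cache_threshold=1 * GB):
    """Pick the next step for one user.

    `notified_days_ago` is None when no email has been sent for the current
    recency identifier.
    """
    if notified_days_ago is not None:
        if days_inactive > 70:
            return Action.CLEANUP
        if days_inactive > 60:
            return Action.WARN
    over_threshold = total_bytes > du_threshold or old_cache_bytes > cache_threshold
    # Assumed rate limit: at most one usage email per week.
    if over_threshold and (notified_days_ago is None or notified_days_ago > 7):
        return Action.NOTIFY
    return Action.NONE
```

A real implementation would also need to record which stage was last sent, so that the 10-day warning is not repeated on every sweep between day 60 and day 70.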

@asmacdo
Member Author

asmacdo commented Jul 24, 2024

Awesome, thanks @yarikoptic

@asmacdo
Member Author

asmacdo commented Jul 24, 2024

check when user last used/logged in to hub

This information is kept, but it's tracked by JupyterHub itself. I'll look into connecting to the REST API directly:
https://jupyterhub.readthedocs.io/en/stable/reference/rest-api.html#operation/get-users
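
A minimal sketch of reading that information, assuming an API token with permission to list users and relying on the `last_activity` field of the user model returned by `GET /users`. The helper names are hypothetical, and the hub URL and token are whatever the deployment provides.

```python
import json
import urllib.request
from datetime import datetime


def parse_last_activity(users):
    """Map user name -> last_activity as an aware datetime (None if never active)."""
    result = {}
    for user in users:
        raw = user.get("last_activity")
        # JupyterHub returns ISO 8601 timestamps ending in "Z"; older Pythons'
        # fromisoformat() does not accept "Z", so normalize it to an offset.
        result[user["name"]] = (
            datetime.fromisoformat(raw.replace("Z", "+00:00")) if raw else None
        )
    return result


def fetch_users(hub_api_url, token):
    """Fetch the user list from the hub REST API (e.g. https://<hub>/hub/api)."""
    req = urllib.request.Request(
        f"{hub_api_url.rstrip('/')}/users",
        headers={"Authorization": f"token {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would be along the lines of `parse_last_activity(fetch_users(hub_api_url, token))`. The user list may be paginated on larger hubs, so a complete sweep would likely need the endpoint's `offset`/`limit` parameters.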

@kabilar kabilar added this to the phase2 milestone Aug 19, 2024
@asmacdo asmacdo changed the title Cleanup cron job User data quota cron job Sep 17, 2024
@asmacdo asmacdo linked a pull request Sep 25, 2024 that will close this issue
17 tasks