"This repository is over its data quota" #162

Open
schuemie opened this issue Dec 13, 2022 · 13 comments

Comments

@schuemie
Member

When I try to clone this repo, I get

This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.

@leeevans: Maybe we can remove some of the older Shiny apps (from before we used databases to store the results) from this repo, while keeping them on the Shiny server? For example, SystematicEvidence takes up a lot of space and doesn't need to be modified anymore.

@leeevans
Contributor

@schuemie OK, I'll work on removing the older Shiny apps from this repo, starting with SystematicEvidence.

Is there already a list of the Shiny apps we can delete? If not, what's the best way to generate that list? Post the question on the OHDSI forums, perhaps?

@schuemie
Member Author

It might make sense to focus on the apps that take up the most space. I ran the script below, but I'm not sure my clone is correct since I got the error message above, so maybe you could rerun it?

```r
# Report the on-disk size (in MB) of each top-level app folder in a local clone.
folder <- "../ShinyDeploy"

# Keep only top-level entries that are directories (hidden ones such as .git are skipped).
subFolders <- list.files(folder)
subFolders <- subFolders[dir.exists(file.path(folder, subFolders))]

# Total size in bytes of all files below one app folder.
computeSize <- function(subFolder) {
  files <- list.files(file.path(folder, subFolder), recursive = TRUE, full.names = TRUE)
  sum(file.info(files)$size)
}
sizes <- plyr::laply(subFolders, computeSize, .progress = "text")  # requires the plyr package
data <- data.frame(
  subFolder = subFolders,
  mbs = sizes / 1024^2
)
data <- data[order(-data$mbs), ]
head(data, 10)
#                        subFolder       mbs
# 62       EhdenRaDmardsEstimation 1211.1382
# 166           SystematicEvidence  625.8941
# 76           IbdCharacterization  521.5577
# 157                    Sglt2iDka  482.2656
# 168      TicagrelorVsClopidogrel  400.0139
# 87         MskaiEstimationPrelim  324.2289
# 17                       corazon  303.5255
# 1               AceBeta9Outcomes  254.4665
# 145         RanitidineCancerRisk  234.2958
# 119 OutcomeMisclassificationEval  193.2686
```

If this is correct, then EhdenRaDmardsEstimation is taking up a whopping 1.2 GB!

@schuemie
Member Author

@leeevans: could you remove the biggest apps from this repo (but keep them on the Shiny server)? I still can't clone this repo, which means I also can't push anything.

@leeevans
Contributor

@schuemie,

The Shiny server deploy script just does a git pull, so I don’t know how we could do that.

As a workaround are you able to do a ‘partial clone’ or ‘shallow clone’ of the repo so you can make changes locally and push them back to the repo?

https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/
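For reference, a shallow clone and a blobless partial clone (both described in the linked post) can be tried offline against a throwaway local repository. This is a minimal sketch: for ShinyDeploy the `file://` URL would be replaced by `https://github.com/OHDSI/ShinyDeploy.git`, and the `src` repository with its `app.R` file is purely hypothetical. `--depth 1` fetches only the latest commit, while `--filter=blob:none` fetches all commits and trees but downloads file contents only when they are needed. Neither option by itself stops Git LFS from downloading LFS-tracked files at checkout.

```shell
set -e

WORK=$(mktemp -d)
cd "$WORK"

# Hypothetical source repository with three commits, standing in for the remote.
git init -q src
git -C src config user.name "Example User"
git -C src config user.email "user@example.org"
for i in 1 2 3; do
  echo "version $i" > src/app.R
  git -C src add app.R
  git -C src commit -q -m "Commit $i"
done
# Let this local "server" honor partial-clone filters and on-demand object requests.
git -C src config uploadpack.allowFilter true
git -C src config uploadpack.allowAnySHA1InWant true

# Shallow clone: only the most recent commit is fetched.
git clone -q --depth 1 "file://$WORK/src" shallow

# Blobless partial clone: all commits and trees, file contents fetched on demand.
git clone -q --filter=blob:none "file://$WORK/src" blobless

git -C shallow rev-list --count HEAD                      # commits available locally
git -C blobless rev-list --count HEAD                     # commits available locally
git -C blobless config remote.origin.partialclonefilter   # filter recorded by clone
```

The `file://` URL (rather than a plain path) matters here: for plain local paths Git uses its local-clone shortcut and ignores `--depth`.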

My suggestion would be that we inform the OHDSI community about the repo space issue and ask the shiny developers to remove any shiny apps/data they no longer need from the repo. Maybe share that on the OHDSI Forums/community call/HADES call?

@schuemie
Member Author

I'm probably missing something, but couldn't we just remove the big apps from version control?

How I would do that (maybe not optimal): on the Shiny server, temporarily move the apps to some folder outside the clone. Commit these changes (git will think the files have been deleted) and push. Move the apps back to the folder. Now they are no longer version controlled. We could add them to .gitignore to ensure they are not overwritten or pushed back to the repo.
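The same untracking step can be sketched without physically moving the folders by using `git rm --cached`, which removes paths from the index but leaves the files on disk. This is a minimal sketch in a throwaway repository; `BigApp` is a hypothetical folder name. On the real server it would need to run inside the Shiny server's own clone, because any other clone that pulls this commit will have the folder deleted from its working tree. It also only stops future tracking: the files remain in earlier commits.

```shell
set -e

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

# Simulate an app folder that is currently under version control (hypothetical name).
mkdir -p BigApp/data
echo "placeholder results" > BigApp/data/results.rds
git add BigApp
git commit -q -m "Add BigApp"

# Remove BigApp from the index only; --cached leaves the working-tree files alone,
# which has the same effect as moving the folder out, committing, and moving it back.
git rm -r -q --cached BigApp

# Ignore the folder so a later 'git add .' does not stage it again.
echo "BigApp/" >> .gitignore
git add .gitignore
git commit -q -m "Stop tracking BigApp; keep it only on the server"

git ls-files          # only .gitignore is tracked now
ls BigApp/data        # results.rds is still on disk
```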

@schuemie
Member Author

Alternatively, I remember we have another Shiny server (e.g. running http://shiny.ohdsi.org:2020/AhasHfBkleAmputation/). Maybe we could move the large apps there? (And redirect from their current URLs like this.)

@schuemie
Member Author

(Note that both a blobless and a treeless clone run into the same error. The linked blog post seems to suggest a shallow clone is a bad idea.)

@leeevans
Contributor

I've moved the data from the EhdenRaDmardsEstimation Shiny app out of the ShinyDeploy GitHub repo but kept the code in GitHub by making a one-line change to the global.R file (see below).

I moved the application data on the server to a separate shiny server level 'data' directory, in a subdirectory named after the shiny app.

Here is the R code change I made to reference the data under the shiny server data directory:

```r
shinySettings <- list(dataFolder = "../../../data/EhdenRaDmardsEstimation", blind = FALSE)
```

I will do the same for the other Shiny apps with large data files (>100MB) that reference their data files in the same way.

In future, all OHDSI Shiny apps with large data files (>100 MB) must access their data from the Shiny PostgreSQL database to avoid this repository data quota issue.

@schuemie
Member Author

schuemie commented Mar 1, 2023

Thanks @leeevans! Unfortunately, I am still unable to clone the repo:

```
git clone --filter=tree:0 https://github.com/OHDSI/ShinyDeploy.git
Cloning into 'ShinyDeploy'...
remote: Enumerating objects: 1457, done.
remote: Counting objects: 100% (34/34), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 1457 (delta 0), reused 25 (delta 0), pack-reused 1423
Receiving objects: 100% (1457/1457), 345.37 KiB | 11.51 MiB/s, done.
Resolving deltas: 100% (9/9), done.
remote: Enumerating objects: 2502, done.
remote: Counting objects: 100% (250/250), done.
remote: Compressing objects: 100% (240/240), done.
remote: Total 2502 (delta 60), reused 14 (delta 9), pack-reused 2252
Receiving objects: 100% (2502/2502), 459.80 KiB | 13.93 MiB/s, done.
Resolving deltas: 100% (92/92), done.
remote: Enumerating objects: 11086, done.
remote: Counting objects: 100% (606/606), done.
remote: Compressing objects: 100% (519/519), done.
remote: Total 11086 (delta 330), reused 116 (delta 85), pack-reused 10480
Receiving objects: 100% (11086/11086), 6.01 GiB | 55.99 MiB/s, done.
Resolving deltas: 100% (636/636), done.
Updating files: 100% (16539/16539), done.
Downloading TicagrelorVsClopidogrel/data/covariate_balance_t874_c929_OptumEHR.rds (120 MB)
Error downloading object: TicagrelorVsClopidogrel/data/covariate_balance_t874_c929_OptumEHR.rds (d37116b): Smudge error:
Error downloading TicagrelorVsClopidogrel/data/covariate_balance_t874_c929_OptumEHR.rds 
(d37116b49e6ce8d5116190fe8f4479bdb6c2822f2a89e14cded108b970c638cc): batch response: This repository is over its data quota. 
Account responsible for LFS bandwidth should purchase more data packs to restore access.

Errors logged to 'C:\Users\admin_mschuemi\Documents\git\ShinyDeploy\.git\lfs\logs\20230301T021803.4692937.log'.
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: TicagrelorVsClopidogrel/data/covariate_balance_t874_c929_OptumEHR.rds: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
```

@msuchard
Member

Deleting files from the HEAD of a repo does not remove them (or the space they take up) from its history. There are some nasty ways to permanently remove files, but yuck... this repo is doomed.
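A small local experiment illustrates the point: after a file is deleted in a new commit, its content is still stored as an object reachable from the older commit, so every full clone keeps downloading it. This is a sketch in a throwaway repository with a hypothetical file name. Permanently removing such content means rewriting history (for example with the separately installed `git filter-repo` tool), which changes the IDs of every affected commit.

```shell
set -e

REPO=$(mktemp -d)
cd "$REPO"
git init -q
git config user.name "Example User"
git config user.email "user@example.org"

# Commit a stand-in for a large data file (hypothetical name).
printf 'large study results\n' > results.rds
git add results.rds
git commit -q -m "Add results"
BLOB=$(git rev-parse HEAD:results.rds)   # object ID of the file's content

# Delete the file from HEAD in a follow-up commit.
git rm -q results.rds
git commit -q -m "Remove results"

# The file is gone from the latest commit...
git ls-tree -r --name-only HEAD          # lists nothing
# ...but its content is still stored and reachable through the earlier commit.
git cat-file -e "$BLOB" && echo "blob $BLOB is still in the repository"
git show HEAD~1:results.rds
```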

@leeevans
Contributor

leeevans commented Jun 1, 2023

@msuchard the GitHub LFS data quota is reset monthly.

As it is now June 1st, could you try this again?

@ablack3

ablack3 commented Jun 27, 2023

We need a better long-term solution for data.ohdsi.org (apps break and don't get fixed quickly): https://forums.ohdsi.org/t/multiple-shiny-apps-fail-to-load/18852/3

Also, study results should be in some kind of large data store, like a database or a flat-file storage system. Git/GitHub isn't good for storing data.

Linking discussion here: https://forums.ohdsi.org/t/organization-of-shinydeploy/6223

@schuemie
Member Author

Agreed. Continuing the discussion on the forums.
