GC + mappability corrections #4

Open
tlesluyes opened this issue Jan 23, 2020 · 2 comments

@tlesluyes

Hi. I’m not 100% sure whether this is an actual issue, but I have some questions regarding the GC and mappability corrections.

So, those factors are computed early but applied later, during the covariate corrections. The first covariate is GC, where you: 1) subsample bins to get 50,000 of them (that is x2s), 2) correct those with the pre-computed factors, 3) try to fit a loess regression on them, 4) try to fit a loess regression on the entire dataset (that is x2) if the previous one fails and 5) apply the correction factors for GC. Then the same process is performed for mappability. Is this correct?
Does that imply that only the subsampled bins are actually corrected with the pre-computed factors? See: https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L206
This seems to be performed only on the subset (x2s) and not on the whole dataset (x2). So if the regression fails at that stage, the fallback regression on the entire dataset uses uncorrected read values, because x2 has not been adjusted. If things go really wrong, both fits will fail for both covariates and the coverage will never be adjusted with these factors. But what if things go right: are coverage values actually corrected twice? The correction is inside the loop that iterates over the covariates, so it seems to be applied every time. Am I reading this correctly, and is this intended? Would it make sense to adjust read coverage for the entire dataset with these pre-computed factors first (a single time) and then correct for covariates (no matter whether one or two corrections are used, and whether the regression fails on the subsampled regions)?
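
To make the suggested ordering concrete, here is a minimal sketch in Python. It is not fragCounter's R code: the function names, the division by the pre-computed factor, the deterministic "every k-th bin" subsampling and the binned-median trend (used here in place of loess) are all assumptions made for illustration only.

```python
from statistics import median


def fit_trend(reads, cov, n_bins=5, min_points=20):
    """Hypothetical stand-in for a loess fit: median reads per covariate bin.

    Returns None when there are too few points, to mimic a failed fit.
    """
    if len(reads) < min_points:
        return None
    lo, hi = min(cov), max(cov)
    width = (hi - lo) / n_bins or 1.0
    overall = median(reads)

    def bin_of(c):
        # Clamp so covariate values outside the fitted range still map to a bin.
        return max(0, min(int((c - lo) / width), n_bins - 1))

    grouped = {}
    for r, c in zip(reads, cov):
        grouped.setdefault(bin_of(c), []).append(r)
    medians = {b: median(v) for b, v in grouped.items()}
    # Bins that received no points fall back to the overall median.
    return lambda c: medians.get(bin_of(c), overall)


def correct(reads, factors, covariates, subsample_size=50000):
    """Apply pre-computed factors once to all bins, then correct per covariate."""
    # Assumption: the pre-computed factor is removed by division.
    adjusted = [r / f for r, f in zip(reads, factors)]
    for cov in covariates:
        # Assumption: deterministic subsample (every k-th bin) for simplicity.
        k = max(1, len(adjusted) // subsample_size)
        idx = range(0, len(adjusted), k)
        trend = fit_trend([adjusted[i] for i in idx], [cov[i] for i in idx])
        if trend is None:
            # Fallback fit on the whole dataset, which is already adjusted.
            trend = fit_trend(adjusted, cov)
        if trend is None:
            continue  # both fits failed: skip this covariate only
        adjusted = [r / trend(c) if trend(c) > 0 else r
                    for r, c in zip(adjusted, cov)]
    return adjusted
```

With this ordering, the fallback fit sees the same adjusted values as the subsample fit, and the pre-computed factors are applied exactly once regardless of how many covariates are corrected or which fit succeeds.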

@imielinski
Collaborator

imielinski commented Jan 23, 2020 via email

@tlesluyes
Author

No problem, happy to help! I wanted a clear understanding of what fragCounter and Dryclean actually do, as I have a strong interest in cleaning signal from CNAs. I'm glad my review will help improve those tools. :)

PS: I really enjoy my current position so I'm afraid you cannot hire me (yet?) ;)

BW
