
multicoco going wrong #3

Open
tlesluyes opened this issue Jan 21, 2020 · 1 comment

Comments

@tlesluyes

Hi. There is an issue in the multicoco function where indices for bigger bins are wrongly generated.

In detail: genomic regions are aggregated into bigger bins (I picked 1000bp windows, so I end up with 100,000bp bins). The tricky part is that https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L77 defines a new column (lev1) in which the bigger bins are referenced by indices. At that point, the first 100 lines of cov.dt (that is, chr1:1-100000) have a value of 2 and the next 100 lines (chr1:100001-200000) have a value of 3. The problem is that the first 100 lines of chromosome 2 also have a value of 3, so the indices are already wrong at that stage because they are shared between chromosomes.

A few lines later, https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L97 computes mean values grouped by this index. My first value is fine (mean(cov.dt$reads[1:100])==tmp.cov$reads[1]), but the second value is the mean over chr1:100001-200000 and chr2:1-100000 (mean(cov.dt$reads[c(101:200,which(cov.dt$seqnames==2)[1:100])])==tmp.cov$reads[2]). The averaging does what it is told; it is the indices that are incorrect. The third value is the mean over chr1:200001-300000, chr2:100001-200000 and chr3:1-100000, and so forth for the following indices.

Also, tmp.cov contains far fewer lines than expected: it should have total/100 lines but has only 2473 (which roughly corresponds to the size of chr1, since a single bin is 100,000bp). Since several chromosomes are pooled under a single index, the chromosome names and IRanges in tmp.cov are also wrong.
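The collision described above can be reproduced on a toy table. This is only a sketch in base R under stated assumptions: the data are made up (3 chromosomes of 300 windows each), and the index expression (chromosome number plus bin number within the chromosome) is a stand-in chosen because it yields the values reported here (2, 3, and 3 again for the start of chromosome 2). It is not a copy of the package code.

```r
# Toy coverage table: 3 chromosomes x 300 windows of 1,000 bp (hypothetical data).
n.chr <- 3
n.win <- 300
bin.factor <- 100  # 100 windows of 1,000 bp make one 100,000 bp bin

cov.df <- data.frame(
  seqnames = rep(seq_len(n.chr), each = n.win),
  start    = rep((seq_len(n.win) - 1) * 1000 + 1, times = n.chr),
  reads    = seq_len(n.chr * n.win)
)

# Bin number of each window within its own chromosome (1, 2, 3).
bin.in.chr <- ceiling(rep(seq_len(n.win), times = n.chr) / bin.factor)

# ASSUMED stand-in for the colliding index: chr1 bin 1 -> 2, chr1 bin 2 -> 3,
# chr2 bin 1 -> 3, and so on. Indices are shared between chromosomes.
cov.df$lev1 <- cov.df$seqnames + bin.in.chr

# Mean reads per index, as an aggregation keyed on lev1 alone would compute it.
bad.means <- tapply(cov.df$reads, cov.df$lev1, mean)

# Number of distinct chromosomes pooled under each index.
chr.per.index <- tapply(cov.df$seqnames, cov.df$lev1,
                        function(x) length(unique(x)))

# Rows expected after aggregation vs rows actually obtained.
expected.bins <- nrow(cov.df) / bin.factor
obtained.bins <- length(bad.means)
```

On this toy table the second mean pools chr1 windows 101 to 200 with the first 100 windows of chr2, and the aggregated table has fewer rows than nrow(cov.df) / bin.factor, which matches the symptoms listed above.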

This error can be corrected by using the commented-out line https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L73 instead of the one indicated above. With it, lev1 values are no longer shared between chromosomes. The mean values are then computed correctly and tmp.cov contains all the chromosomes (not only chr1).
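For illustration, here is one way to build an index that is unique per chromosome, again as a base R sketch on made-up data. The key construction (pasting the chromosome and the within-chromosome bin number) is an assumption made for this example and is not necessarily what the commented-out line does.

```r
# Same hypothetical toy table: 3 chromosomes x 300 windows of 1,000 bp.
n.chr <- 3
n.win <- 300
bin.factor <- 100  # 100 windows of 1,000 bp make one 100,000 bp bin

cov.df <- data.frame(
  seqnames = rep(seq_len(n.chr), each = n.win),
  start    = rep((seq_len(n.win) - 1) * 1000 + 1, times = n.chr),
  reads    = seq_len(n.chr * n.win)
)

# Bin number of each window within its own chromosome (1, 2, 3).
bin.in.chr <- ceiling(rep(seq_len(n.win), times = n.chr) / bin.factor)

# Chromosome-unique key: the pair (chromosome, bin within chromosome).
key <- paste(cov.df$seqnames, bin.in.chr, sep = ":")

# Turn the key into a running integer index in order of first appearance.
cov.df$lev1 <- match(key, unique(key))

# Mean reads per index; each index now covers windows of a single chromosome.
good.means <- tapply(cov.df$reads, cov.df$lev1, mean)
chr.per.index <- tapply(cov.df$seqnames, cov.df$lev1,
                        function(x) length(unique(x)))
```

With such a key, the number of aggregated rows equals nrow(cov.df) / bin.factor and every index maps to exactly one chromosome, which is the behaviour expected from tmp.cov.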

Can you check, confirm and correct that?

tlesluyes changed the title from "muticoco going wrong" to "multicoco going wrong" on Jan 21, 2020
@imielinski
Collaborator

imielinski commented Jan 21, 2020 via email
