-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
multicoco going wrong #3
Comments
Thanks – we will look into this
From: tlesluyes <[email protected]>
Reply-To: mskilab/fragCounter <[email protected]>
Date: Tuesday, January 21, 2020 at 9:34 AM
To: mskilab/fragCounter <[email protected]>
Cc: Subscribed <[email protected]>
Subject: [mskilab/fragCounter] muticoco going wrong (#3)
Hi. There is an issue in the multicoco function where indices for bigger bins are wrongly generated.
In the details: genomic regions are divided into bigger bins (I picked 1000bp windows so I end up with 100,000bp bins). The tricky part is that: https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L77<https://urldefense.com/v3/__https:/github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R*L77__;Iw!!C6sPl7C9qQ!HltWktHjF7GaKdCnz2diq1wcTv3rbTnfSgVzC39fF6L4WNp9p02TrtGK3mvrpvR_vkY$> defines a new columns (lev1) where bigger bins are referenced with indices. At that point, the first 100 lines of cov.dt (that is chr1:1-100000) have a value of 2 and the next 100 lines (chr1:100001-200000) have a value of 3. The problem is that the first 100 lines of chromosome 2 also have a value of 3, so those are wrong at that stage because they are shared by multiple chromosomes. A few lines later: https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L97<https://urldefense.com/v3/__https:/github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R*L97__;Iw!!C6sPl7C9qQ!HltWktHjF7GaKdCnz2diq1wcTv3rbTnfSgVzC39fF6L4WNp9p02TrtGK3mvrK9TNThM$> mean values are computed based on this number. At that point, my first value is fine (mean(cov.dt$reads[1:100])==tmp.cov$reads[1]) but the second value represent the mean between chr1:100001-200000 and chr2:1-100000 (mean(cov.dt$reads[c(101:200,which(cov.dt$seqnames==2)[1:100])])==tmp.cov$reads[2]), which makes sense because it does the job but the indices are not correct. The third value is the mean between chr1:200001-300000, chr2:100001-200000 and chr3:1-100000, an so forth for next indices. Also, tmp.cov contains way less lines than expected as it should be total/100 but only is 2473 (which nearly correspond to chr1 size since a single bin is 100,000bp). Since multiple chromosomes are picked for a single index, chromosomes names and IRanges are also wrong in tmp.cov.
This error can be corrected by unsing the commented line: https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L73<https://urldefense.com/v3/__https:/github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R*L73__;Iw!!C6sPl7C9qQ!HltWktHjF7GaKdCnz2diq1wcTv3rbTnfSgVzC39fF6L4WNp9p02TrtGK3mvrDKsIwGU$> (instead of the one indicated above). This way, lev1 values are different and are not identical between chromosomes. The mean values are then correctly computed and tmp.cov contains all the chromosomes (not only chr1).
Can you check, confirm and correct that?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https:/github.com/mskilab/fragCounter/issues/3?email_source=notifications&email_token=ABFUFY6JUGBRKOIQP6IGMQTQ64BWNA5CNFSM4KJUOFLKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4IHU2MWQ__;!!C6sPl7C9qQ!HltWktHjF7GaKdCnz2diq1wcTv3rbTnfSgVzC39fF6L4WNp9p02TrtGK3mvrrpNK9bw$>, or unsubscribe<https://urldefense.com/v3/__https:/github.com/notifications/unsubscribe-auth/ABFUFY2KHGIHZB6GKIBTR53Q64BWNANCNFSM4KJUOFLA__;!!C6sPl7C9qQ!HltWktHjF7GaKdCnz2diq1wcTv3rbTnfSgVzC39fF6L4WNp9p02TrtGK3mvrY9KNdek$>.
…________________________________
This message is for the recipient’s use only, and may contain confidential, privileged or protected information. Any unauthorized use or dissemination of this communication is prohibited. If you received this message in error, please immediately notify the sender and destroy all copies of this message. The recipient should check this email and any attachments for the presence of viruses, as we accept no liability for any damage caused by any virus transmitted by this email.
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Hi. There is an issue in the
multicoco
function where indices for bigger bins are wrongly generated.In the details: genomic regions are divided into bigger bins (I picked 1000bp windows so I end up with 100,000bp bins). The tricky part is that: https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L77 defines a new columns (
lev1
) where bigger bins are referenced with indices. At that point, the first 100 lines ofcov.dt
(that is chr1:1-100000) have a value of 2 and the next 100 lines (chr1:100001-200000) have a value of 3. The problem is that the first 100 lines of chromosome 2 also have a value of 3, so those are wrong at that stage because they are shared by multiple chromosomes. A few lines later: https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L97 mean values are computed based on this number. At that point, my first value is fine (mean(cov.dt$reads[1:100])==tmp.cov$reads[1]
) but the second value represent the mean between chr1:100001-200000 and chr2:1-100000 (mean(cov.dt$reads[c(101:200,which(cov.dt$seqnames==2)[1:100])])==tmp.cov$reads[2]
), which makes sense because it does the job but the indices are not correct. The third value is the mean between chr1:200001-300000, chr2:100001-200000 and chr3:1-100000, an so forth for next indices. Also,tmp.cov
contains way less lines than expected as it should be total/100 but only is 2473 (which nearly correspond to chr1 size since a single bin is 100,000bp). Since multiple chromosomes are picked for a single index, chromosomes names and IRanges are also wrong intmp.cov
.This error can be corrected by unsing the commented line: https://github.com/mskilab/fragCounter/blob/575af9926e5177a39b45a31ad37048953a680ca4/R/fragCounter.R#L73 (instead of the one indicated above). This way,
lev1
values are different and are not identical between chromosomes. The mean values are then correctly computed andtmp.cov
contains all the chromosomes (not only chr1).Can you check, confirm and correct that?
The text was updated successfully, but these errors were encountered: