function(s) to identify and remove individuals based on GTscore conScore #30
Comments
I like the idea of removing individuals based on their |
@awbarclay , I'd considered having |
After talking this over with @krshedd, we think it would be great if contaminated fish could be given "0" genotypes before they are imported. That way, the fish will be removed using GCLr::remove_ind_miss_loci(). The lab staff would have to "no call" the fish before importing the genotypes, which will require functions similar to the ones that @krshedd suggested above to determine a threshold and give contaminated fish "0" scores for all loci to make their lives easier. Lab staff are already "no calling" fish for chip projects, so it wouldn't be much different. @csjalbert is this something that could be implemented in the future? |
I like the idea of contaminated fish being no-called prior to entering LOKI. (Replying to Andy Barclay's comment above, written Thu, Sep 7, 2023 at 11:25 AM.) |
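The no-call idea above can be sketched roughly as follows. This is a minimal Python illustration of the logic only, not GCLr, GTscore, or LOKI code: the function names, the dictionary layout, the "0" placeholder, and the 0.3 cutoff are all assumptions made for the example, and `remove_low_success` is only a stand-in for the 80% genotyping success rule discussed in this thread.

```python
NO_CALL = "0"  # assumed placeholder for a no-called genotype


def no_call_contaminated(genotypes, con_scores, cutoff):
    """Return a copy of `genotypes` in which every locus is no-called
    for fish whose conScore is above `cutoff` (hypothetical helper)."""
    cleaned = {}
    for fish, calls in genotypes.items():
        if con_scores.get(fish, 0.0) > cutoff:
            cleaned[fish] = {locus: NO_CALL for locus in calls}
        else:
            cleaned[fish] = dict(calls)
    return cleaned


def genotyping_success(calls):
    """Proportion of loci with a called (non-"0") genotype."""
    if not calls:
        return 0.0
    called = sum(1 for genotype in calls.values() if genotype != NO_CALL)
    return called / len(calls)


def remove_low_success(genotypes, min_success=0.8):
    """Keep only fish at or above `min_success` (stand-in for the 80% rule)."""
    return {
        fish: calls
        for fish, calls in genotypes.items()
        if genotyping_success(calls) >= min_success
    }


# Made-up example: fish_B is fully genotyped but has a high conScore.
genotypes = {
    "fish_A": {"locus1": "AA", "locus2": "AG", "locus3": "GG", "locus4": "AG", "locus5": "0"},
    "fish_B": {"locus1": "AG", "locus2": "AG", "locus3": "AG", "locus4": "AG", "locus5": "AG"},
}
con_scores = {"fish_A": 0.05, "fish_B": 0.60}

no_called = no_call_contaminated(genotypes, con_scores, cutoff=0.3)
print(sorted(remove_low_success(genotypes)))  # success rule alone
print(sorted(remove_low_success(no_called)))  # success rule after no-calling
```

In this made-up data the success rule alone keeps both fish, because the contaminated fish is fully genotyped; once it is no-called, the same rule drops it.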
My 2 cents: if a contaminated fish is very likely to already be < 80% successful and will be removed due to that rule, what is the benefit of loading it all up in Loki with 0/0 calls? Additionally, if it is all 0/0 in Loki, the assumption is going to be that it failed and that maybe it just needs to be rerun, vs. a crappy success rate fish that we therefore know is crappy. |
@hahoyt there are instances where fish can have a high |
Oh, I see. So we can't assume a contaminated fish (based on conScore) will have a < 80% success rate. Also, I think that when uSATs are scored and there is a clearly contaminated fish, the team no-calls it for all markers. So this wouldn't be any different than that. Cool. |
Exactly, same as how we do SNPs on chips and uSATs, get rid of fish with junk/contaminated genotypes before they go into LOKI. |
Not all SNPs on chips are 0/0'd out for a fish with contamination. They will have more 0/0's because of the fluffiness, but the genotypers don't select the fish for all markers and 0/0 it out. Or at least, we never have. |
Right, sorry for adding confusion. The point is those 0/0s for SNPs on chips (sounds like a tasty snack?) likely push the fish <80% genotyping success, so they drop out in downstream analyses. That is not the case with GTscore, hence my desire to do something with |
Roger that. :-) |
I agree that it makes sense to deal with these contaminated samples. This seems like something that could be implemented in the GTscore pipeline. It could be as simple as a script that runs post-GTscore, which is how the genotype rate plots work. That said, a few questions to make sure I'm understanding correctly:
|
Thanks @csjalbert for the clarifying questions and forgive my lack of a detailed understanding of the order of operations for different pipeline steps.
Does that make sense? Anything major I'm missing? |
@krshedd this makes sense to me. I don't see a way around human review on a project-by-project basis, so it makes sense to set this up on V:, where lab staff have easy access. The only additional comment is that I will not split LOKI files on the server. We can take care of the split on V: after the contaminated fish have been removed. I suppose this would be a 3rd function, one that we may not even need once we test the new importer.
I'll work on these functions soon and let you know what I can come up with. |
Just a note that apparently 60 MB files no longer work with our importer. |
We currently do not have any functions to remove contaminated individuals based on GTscore conScore. I propose that we create 2 functions:
find_ind_con - reads in conScore from GTscore singleSNP_sampleSummary.txt file(s), plots density distribution of conScore or heterozygosity vs. conScore similar to GTscore SampleSummaryPlots.pdf output, and outputs modified version of singleSNP_sampleSummary.txt; user inspects plot(s) to determine conScore cutoff.
remove_ind_con - takes the output from find_ind_con in concert with a conScore cutoff to remove individuals above a certain threshold; this threshold may be specific to a given GT-seq panel.
The idea would be for these 2 new functions to become part of our standard QA process along with remove_ind_miss_loci, dupcheck_within_silly, remove_dups, find_alt_species, and remove_alt_species. Previously, using TaqMan, contaminated individuals would likely be no-called for enough SNPs that they'd drop out with remove_ind_miss_loci, but that is not necessarily the case with GT-seq.
Open to other ideas, but @csjalbert and I can work on these when we analyze C015 SEAK coho baseline.