Using GATK CNV with TITAN #38
Hi @fpbarthel Yes, I was working with the team at the Broad to include files that would be compatible with the input format for TITAN. You should be able to use 2 of the output files ( TitanCNA/scripts/R_scripts/titanCNA.R Lines 35 to 38 in eb288ea
The current snakemake workflow that is in this GitHub repo does not support this but I may consider adding it in the future. Hope this helps. |
Hey @gavinha thanks for your answer! I'm actually not at the Broad so the pipelines I made do not output those exact files, and I cannot find a Broad WDL file on their github to output those files either. The output of Eg.
And
Would those be appropriate? For tumor samples, the het sites take into account hets in the matched normal, but the segments themselves are not necessarily filtered and germline CNV events are included from what I understand. |
Hi @fpbarthel Sorry I hadn't realized that original link was to Broad-specific workflows. Here are the instructions for using GATK4-ACNV that should be more useful: You will want to run all 3 steps and the final output will include TITAN input files. Let me know if this works. If so, I will consider including some of these steps in the TITAN snakemake. However, I want to make sure that GC content is handled (like it is in the current snakemake workflow in this repo). Best, |
Gavin; I'm happy to custom convert the right outputs of GATK into the copy number and allele frequency inputs but have a couple of questions about what TITAN expects:
Thanks again for helping support this. |
Hi @chapmanb and @gavinha, you may be interested in the (currently experimental/unsupported) GATK CNV post-processing WDL available here (I realized I never linked to this in my original post), which I guess would take care of step 3 in the guide @gavinha shared (link). I'm not exactly sure yet how to convert it to something that can be used with TITAN, although I got it working for ABSOLUTE as the authors intended. |
Thanks for this. The Broad CNV workflow WDLs are an incredible resource. They've really been helping me learn the correct workflow and how all the tools fit together. That post-processing WDL has a pretty extensive Python section that models AllelicCapSeg output to fit what ABSOLUTE needs. From my understanding, on the TitanCNA side most of these steps happen within TITAN, so I hope we can have a simpler conversion process. I'll definitely defer to Gavin though on what exactly we need. It's great that you've got ABSOLUTE working, and I'd be very interested in seeing how it compares to TitanCNA in your hands once we have the process working. Thanks again. |
Looks like you're right @chapmanb. I guess I wrongly assumed that some of that GATK post-processing WDL (in particular germline filtering) was important pre-processing for Titan. In GATK4 beta releases there seems to have been a tool called I did some digging into the beta GATK 4 source code, which has code for Titan conversion, and from that it looks like the output of
Either way, despite the updates to GATK4 in the current release compared to the beta, the Titan conversion tools here should be helpful in figuring out exactly what to do. |
I figured I'd test it out using both files as input and just see what happens:
Interestingly, Titan runs without issue for both inputs. However, the output from using the @gavinha your further thoughts on this much appreciated. Also regarding the hets file, I suppose there are several files that could potentially be used here. I'm using the
It seems like these genotypes somehow incorporated the normal counts, which I guess is similar to what you are doing with countPysam.py? One thing that is clearly missing is the quality score, but even in the GATK4 beta Titan conversion script they simply seem to leave this blank (see here). |
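For anyone following along, here is a rough sketch of the kind of hets conversion being discussed: it rewrites an allelic-counts table into a TITAN-style het table and leaves the quality column blank, as the beta conversion script reportedly did. This is not the GATK or TitanCNA code. The function name is made up, and the column names on both sides (`CONTIG`, `POSITION`, `REF_COUNT`, `ALT_COUNT`, `REF_NUCLEOTIDE`, `ALT_NUCLEOTIDE` in, `chr`, `position`, `ref`, `refCount`, `Nref`, `NrefCount`, `normQuality` out) are assumptions that should be checked against your own files.

```python
import csv
import io

# Assumed TITAN het header; verify against the TitanCNA version you run.
TITAN_HET_COLUMNS = ["chr", "position", "ref", "refCount",
                     "Nref", "NrefCount", "normQuality"]


def allelic_counts_to_titan_hets(in_handle, out_handle, missing_quality="NA"):
    """Convert an allelic-counts TSV to a TITAN-style het table.

    Lines starting with '@' are treated as a SAM-style header and skipped.
    Returns the number of het sites written.
    """
    data_lines = (line for line in in_handle if not line.startswith("@"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(TITAN_HET_COLUMNS)
    n_sites = 0
    for row in reader:
        writer.writerow([
            row["CONTIG"],
            row["POSITION"],
            row["REF_NUCLEOTIDE"],
            row["REF_COUNT"],
            row["ALT_NUCLEOTIDE"],
            row["ALT_COUNT"],
            missing_quality,  # no per-site quality available, so left as a placeholder
        ])
        n_sites += 1
    return n_sites


# Tiny in-memory example with made-up values.
example_in = io.StringIO(
    "@HD\tVN:1.6\n"
    "CONTIG\tPOSITION\tREF_COUNT\tALT_COUNT\tREF_NUCLEOTIDE\tALT_NUCLEOTIDE\n"
    "1\t100\t12\t8\tA\tG\n"
)
example_out = io.StringIO()
n_written = allelic_counts_to_titan_hets(example_in, example_out)
print(example_out.getvalue())
```

With real files you would pass open file handles instead of the `io.StringIO` objects.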
Thanks much for sharing this and for all the detective work resurrecting the old conversion scripts. You're exactly right that these are essentially file conversions of existing files and I'd implemented similar approaches to what you did as part of testing this out: The raw denoised log2 values (1000bp bins) versus segmentation bins comparison makes good sense as well since segmentation is designed to smooth out the noisiness. I've been feeding the denoised 1000bp inputs to TitanCNA in my initial tests with GATK4 inputs, wanting to give it the extra information present in the smaller bins but again will defer to Gavin's expertise. Glad we're approaching this the same way and making progress. Thanks again. |
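A minimal sketch of the copy-number side of that file conversion, for reference. It assumes the denoised copy ratio file is a TSV with an `@`-prefixed header block followed by `CONTIG`, `START`, `END`, `LOG2_COPY_RATIO` columns, and that TITAN wants `chr`, `start`, `end`, `log2_TNratio_corrected`. Those column names and the function name are assumptions for illustration, not something confirmed in this thread.

```python
import csv
import io

# Assumed TITAN copy-number header; verify against your TitanCNA version.
TITAN_CN_COLUMNS = ["chr", "start", "end", "log2_TNratio_corrected"]


def denoised_cr_to_titan_cn(in_handle, out_handle):
    """Rewrite a denoised copy-ratio TSV as a TITAN-style .cn table.

    Lines starting with '@' are treated as a SAM-style header and skipped.
    The log2 values are passed through unchanged. Returns the number of bins.
    """
    data_lines = (line for line in in_handle if not line.startswith("@"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(TITAN_CN_COLUMNS)
    n_bins = 0
    for row in reader:
        writer.writerow([row["CONTIG"], row["START"], row["END"],
                         row["LOG2_COPY_RATIO"]])
        n_bins += 1
    return n_bins


# Tiny in-memory example with made-up bins.
cr_in = io.StringIO(
    "@HD\tVN:1.6\n"
    "CONTIG\tSTART\tEND\tLOG2_COPY_RATIO\n"
    "1\t1001\t2000\t-0.12\n"
    "1\t2001\t3000\t0.05\n"
)
cn_out = io.StringIO()
n_bins_written = denoised_cr_to_titan_cn(cr_in, cn_out)
print(cn_out.getvalue())
```

The same function would work on segment-level input if it carries the same four columns; only the number of rows (bins versus segments) changes.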
Happy to close this issue if no further action required, are you OK @chapmanb? I feel confident using TITAN with the denoised segments from GATK4 as input, and the hets from The two plots in my post above are both In another post, Gavin writes:
To me it suggests that we want the CNA.pdf file to show a lot of data points, as in the top plot. Therefore, we should use the If one uses the |
Hi @fpbarthel and @chapmanb Thank you both for looking into this.
This is what I would recommend as well because it is the input that TitanCNA expects.
Well, it's not necessarily incorrect. Using Best, |
Gavin and Floris; https://github.com/bcbio/bcbio_validations/tree/master/TCGA-heterogeneity Thanks again for all the help and discussion, it's great to have this all in place in bcbio with some confidence bounds around the outputs. Much appreciated. |
Hi to both, I just wanted to briefly follow up with a concern with using the GATK workflow directly. To calculate hets, Looking at some of my results, I am noticing TITAN has called a far lower number of CDKN2A homozygous deletions than I am expecting (15% as opposed to 60% out of 600+ samples) and it seems that there are very few hets reported in this region in several samples. This has led me to believe that perhaps setting a minimum (or maximum?) depth threshold for the tumor sample is not appropriate? Especially when we assume the same experimental strategies were used to generate the data for the tumor and matching normals, we should be able to expect equal coverage of this region in both tumor and control UNLESS there is a deletion in the tumor. @gavinha I would be very interested to hear your thoughts on this.
Floris P.s. I've also raised my concern to the GATK team here P.p.s. related to a similar issue which reported the same for amplifications #50 |
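To make the concern concrete, here is a toy sketch (not GATK's actual logic) of how requiring a minimum tumor depth at each het site can throw away exactly the sites that sit inside a homozygous deletion, while thresholding on the matched normal alone keeps them. The record layout, the function name, the coordinates, and the threshold of 30 are all hypothetical.

```python
from typing import List, NamedTuple


class HetSite(NamedTuple):
    chrom: str
    pos: int
    normal_depth: int
    tumor_depth: int


def filter_hets(sites: List[HetSite],
                min_normal_depth: int = 30,
                min_tumor_depth: int = 0) -> List[HetSite]:
    """Keep sites that pass both depth thresholds (hypothetical filter)."""
    return [
        s for s in sites
        if s.normal_depth >= min_normal_depth
        and s.tumor_depth >= min_tumor_depth
    ]


# Made-up sites: the first mimics a het inside a tumor homozygous deletion,
# where the normal is well covered but the tumor has almost no reads.
sites = [
    HetSite("9", 1000, normal_depth=80, tumor_depth=3),
    HetSite("9", 5000, normal_depth=75, tumor_depth=70),
]

kept_with_tumor_threshold = filter_hets(sites, min_normal_depth=30, min_tumor_depth=30)
kept_normal_only = filter_hets(sites, min_normal_depth=30, min_tumor_depth=0)

print(len(kept_with_tumor_threshold), len(kept_normal_only))
```

In this toy case the tumor-depth threshold drops the deleted site, so the region would contribute no allelic information downstream, which is the behavior being questioned here.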
Hi @gavinha wonder if you have any thoughts on this? I've noticed that the Floris |
Hi!
I'm wondering how to use the new GATK4 copy number segmentation as input to TITAN. I guess it should be possible (see eg. https://software.broadinstitute.org/gatk/documentation/article?id=11088) but I'm not sure how. I was able to use a recent pipeline to convert its output to a format that mirrors the Allelic CapSeg output, which could be used for eg. ABSOLUTE. But I'd like to use TITAN.
Any advice?
Thanks!