Add option for TSV output #86

Open
whaleyr opened this issue Jan 29, 2022 · 3 comments

whaleyr commented Jan 29, 2022

Add TSV output as an option for Phenotyper and Reporter (perhaps NamedAlleleMatcher too?).

  • What do we want in the TSV format?
  • Does this need to anticipate multi-sample runs?
  • How will this work with warnings/messages/caveats?

This was brought up in group discussion and in issue #85.

katrinsangkuhl commented Feb 2, 2022

Update on the batch mode discussion (meeting notes from the Biobank data analyses call, 12/13/21). The discussion was about the matcher and phenotype output.

  • CSV/TSV output format combining the results from all the separate PharmCAT runs

  • Proposed format: index by sample, with the genotypes by gene in one file and the phenotypes in a second file; an additional log file would hold the warnings (needs further discussion; output after each matcher and phenotyper step is possible)

  • Could be provided as additional template scripts on the PharmCAT GitHub:
      - A GitHub wiki would document uses of the template scripts
      - We should not be responsible for maintaining those scripts
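
To make the proposed two-file layout concrete, here is a minimal sketch of what such a template script might do. It is not part of PharmCAT: the `calls` records, their field names, and the `to_tsv` helper are all hypothetical, and the values are made up. It pivots per-sample calls into one sample-indexed table of genotypes by gene and a second one of phenotypes by gene. The warnings log is left out because its format was still under discussion.

```python
import csv
import io

# Made-up calls for illustration only; real values would be parsed from the
# output of each separate PharmCAT run. The record layout is an assumption.
calls = [
    {"sample": "S1", "gene": "CYP2C19", "diplotype": "*1/*2",
     "phenotype": "Intermediate Metabolizer"},
    {"sample": "S1", "gene": "CYP2C9", "diplotype": "*1/*1",
     "phenotype": "Normal Metabolizer"},
    {"sample": "S2", "gene": "CYP2C19", "diplotype": "*2/*2",
     "phenotype": "Poor Metabolizer"},
]


def to_tsv(calls, field):
    """Return a TSV string with one row per sample and one column per gene.

    `field` selects which value fills the cells ("diplotype" or "phenotype").
    Genes with no call for a sample are left as empty cells.
    """
    genes = sorted({c["gene"] for c in calls})
    by_sample = {}
    for c in calls:
        by_sample.setdefault(c["sample"], {})[c["gene"]] = c[field]

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sample ID"] + genes)
    for sample in sorted(by_sample):
        writer.writerow([sample] + [by_sample[sample].get(g, "") for g in genes])
    return buf.getvalue()


genotype_tsv = to_tsv(calls, "diplotype")    # first file: genotypes by gene
phenotype_tsv = to_tsv(calls, "phenotype")   # second file: phenotypes by gene
```

A wide layout like this keeps one row per sample, but it cannot represent multiple diplotype calls for the same gene without further conventions.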

whaleyr commented Feb 7, 2022

After internal discussion we decided to close this issue about TSV output from the reporter.

The data that comes out of the reporter is quite large and complicated. Showing only the small portion that appears in the first table of the report glosses over a lot of the complexity and documentation that people should know when interpreting the results. We feel it would be a disservice to the user to have an option that discards all that information.

@BinglanLi

I am reopening this issue following the discussion about reporting a TSV to assist large-scale data analysis. The goal is not to generate one TSV across all samples of interest, as we previously discussed, but to focus on extracting the PGx inferences for a single sample.

The purpose is to help calculate PGx frequencies. I think there should be a warning that this TSV output should not be used as a substitute for the report when interpreting a person's PGx testing results or prescribing recommendations.

There should be different tables for calculating different frequencies (genotypes vs. phenotypes). I think we can use the base file name as the Sample ID in the tables below. In addition, information about present and missing variants in the VCF is not listed here, because it is helpful for quality checks but not so much for PGx frequency estimation.

For genotype frequencies, I am thinking about the following content:

| Sample ID | Diplotype Index | Diplotype | Haplotype Index | Haplotype | Function | Warning |
|---|---|---|---|---|---|---|
| S1 | Diplotype 1 | *2/*3 | Haplotype 1 | *2 | Poor Function | Multiple Diplotypes |
| S1 | Diplotype 1 | *2/*3 | Haplotype 2 | *3 | Poor Function | Multiple Diplotypes |
| S1 | Diplotype 2 | *4/*5 | Haplotype 1 | *4 | Normal Function | Multiple Diplotypes |
| S1 | Diplotype 2 | *4/*5 | Haplotype 2 | *5 | No Function | Multiple Diplotypes |
| S1 | Diplotype 3 | *6/*7 | Haplotype 1 | *6 | Normal Function | Multiple Diplotypes |
| S1 | Diplotype 3 | *6/*7 | Haplotype 2 | *7 | No Function | Multiple Diplotypes |

Note:

  • For DPYD, the haplotypes are the DPYD alleles a person carries.
  • CYP2C9 rs12777823
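
As a rough illustration of the genotype layout above, the following sketch expands a sample's diplotype calls into one row per haplotype. Everything here is hypothetical rather than PharmCAT code: `genotype_rows`, the `FUNCTIONS` lookup, and the rule that "Multiple Diplotypes" is emitted whenever more than one diplotype is called are assumptions inferred from the example rows. Splitting on "/" also assumes a simple two-allele diplotype, so it does not cover the DPYD case noted above.

```python
import csv
import io

HEADER = ["Sample ID", "Diplotype Index", "Diplotype",
          "Haplotype Index", "Haplotype", "Function", "Warning"]

# Placeholder allele functions copied from the example table; not real
# annotations for any particular gene.
FUNCTIONS = {
    "*2": "Poor Function", "*3": "Poor Function",
    "*4": "Normal Function", "*5": "No Function",
    "*6": "Normal Function", "*7": "No Function",
}


def genotype_rows(sample_id, diplotypes, functions):
    """Expand diplotype calls into one row per haplotype."""
    # Assumption: the warning is set whenever more than one diplotype is called.
    warning = "Multiple Diplotypes" if len(diplotypes) > 1 else ""
    rows = []
    for d_idx, diplotype in enumerate(diplotypes, start=1):
        # Assumes "allele/allele" notation; DPYD would need separate handling.
        for h_idx, haplotype in enumerate(diplotype.split("/"), start=1):
            rows.append([
                sample_id, f"Diplotype {d_idx}", diplotype,
                f"Haplotype {h_idx}", haplotype,
                functions.get(haplotype, ""), warning,
            ])
    return rows


def write_tsv(rows):
    """Serialize the header and rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


rows = genotype_rows("S1", ["*2/*3", "*4/*5", "*6/*7"], FUNCTIONS)
tsv_text = write_tsv(rows)
```

Repeating the diplotype on each haplotype row makes the file redundant, but it lets both haplotype and diplotype frequencies be counted from a single table.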

For phenotype frequencies, I am thinking about the following content:

| Sample ID | Phenotype Index | Phenotype | Diplotype Index | Diplotype | Function | Warning |
|---|---|---|---|---|---|---|
| S1 | Phenotype 1 | Poor Metabolizer | Diplotype 1 | *2/*3 | Poor Function/Poor Function | Discrepant Phenotypes |
| S1 | Phenotype 2 | Intermediate Metabolizer | Diplotype 1 | *4/*5 | Normal Function/No Function | Discrepant Phenotypes |
| S1 | Phenotype 2 | Intermediate Metabolizer | Diplotype 2 | *6/*7 | Normal Function/No Function | Discrepant Phenotypes |

Note:

  • For DPYD, only report the diplotypes that are used to infer the phenotype
  • CYP2C9 rs12777823
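
To show how a phenotype table like this might feed a frequency calculation, here is a minimal sketch. The function name `phenotype_frequencies` and the second sample `S2` are made up for illustration. The counting rule is also an assumption: each distinct (sample, phenotype) pair is counted once, so a sample with discrepant phenotypes contributes to every phenotype it is assigned. Whether such samples should instead be excluded or flagged is a decision this sketch does not settle.

```python
import csv
import io
from collections import Counter

HEADER = ["Sample ID", "Phenotype Index", "Phenotype",
          "Diplotype Index", "Diplotype", "Function", "Warning"]

# S1 follows the example table; S2 is an invented second sample.
EXAMPLE_ROWS = [
    ["S1", "Phenotype 1", "Poor Metabolizer", "Diplotype 1", "*2/*3",
     "Poor Function/Poor Function", "Discrepant Phenotypes"],
    ["S1", "Phenotype 2", "Intermediate Metabolizer", "Diplotype 1", "*4/*5",
     "Normal Function/No Function", "Discrepant Phenotypes"],
    ["S1", "Phenotype 2", "Intermediate Metabolizer", "Diplotype 2", "*6/*7",
     "Normal Function/No Function", "Discrepant Phenotypes"],
    ["S2", "Phenotype 1", "Poor Metabolizer", "Diplotype 1", "*2/*3",
     "Poor Function/Poor Function", ""],
]
example_tsv = "\n".join("\t".join(r) for r in [HEADER] + EXAMPLE_ROWS) + "\n"


def phenotype_frequencies(tsv_text):
    """Return {phenotype: fraction of samples assigned that phenotype}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Deduplicate: several diplotypes can map to the same phenotype.
    pairs = {(row["Sample ID"], row["Phenotype"]) for row in reader}
    samples = {sample for sample, _ in pairs}
    counts = Counter(phenotype for _, phenotype in pairs)
    return {phenotype: n / len(samples) for phenotype, n in counts.items()}


freqs = phenotype_frequencies(example_tsv)
```

Under this counting rule the fractions can sum to more than 1 when samples carry discrepant phenotypes, which is one reason the Warning column is worth keeping in the output.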

BinglanLi reopened this Jul 7, 2022