Merge pull request #265 from leexgh/add-note-column
Add '-n' note column to the MAF output and add error messages to the error report
leexgh authored Sep 25, 2023
2 parents 32a6856 + e622c16 commit da4edc4
Showing 39 changed files with 792 additions and 756 deletions.
8 changes: 2 additions & 6 deletions README.md
@@ -34,11 +34,7 @@ To use it, build the project using maven and run it like so:
--output-filename <OUTPUT DESTINATION> \
--isoform-override <mskcc or uniprot>

You can choose to replace the gene symbols in the new maf by the gene symbols
found by Genome Nexus by supplying the `-r` optional parameter. To output error
reporting to a file, supply the `-e` option a location for the file to be
saved. By running the jar without any arguments or by providing the optional
parameter `-h` you can view the full usage statement.
To output error reporting to a file, supply the `-e` option with a location for the file to be saved. By running the jar without any arguments or by providing the optional parameter `-h` you can view the full usage statement.

## Annotate data with Docker
Genome Nexus Annotation Pipeline is available on Docker: https://hub.docker.com/r/genomenexus/gn-annotation-pipeline.
@@ -71,7 +67,7 @@ Make sure to adjust the file paths according to your specific requirements. Once
| `-t` | `--output-format` | extended, minimal or a file path which includes output format (FORMAT EXAMPLE: Chromosome,Hugo_Symbol,Entrez_Gene_Id,Center,NCBI_Build)|
| `-i` | `--isoform-override` | Isoform Overrides. Options: mskcc or uniprot|
| `-e` | `--error-report-location` | Error report filename (including path)|
| `-r` | `--replace-symbol-entrez` | Replace gene symbols and entrez id with what is provided by annotator"|
| `-r` | `--replace-symbol-entrez` | Replace gene symbols and entrez id with what is provided by annotator, this is enabled by default|
| `-p` | `--post-interval-size` | Number of records to make POST requests to Genome Nexus with at a time |
| `-s` | `--strip-matching-bases` | Strip matching allele bases. Options: first, all, none. For example: AAC/AAT, strip-off first: AC/AT, strip-off all: C/T, strip-off none: AAC/AAT |
| `-a` | `--add-original-genomic-location` | Add original genomic location data columns into the output, name columns with prefix 'IGNORE_Genome_Nexus_Original_'). This would be useful if saving a reference of original input is needed and won't be changed in any condition|
@@ -67,7 +67,7 @@ public class AnnotationPipeline {
private static final Logger LOG = LoggerFactory.getLogger(AnnotationPipeline.class);

private static void annotateJob(String[] args, String filename, String outputFilename, String outputFormat, String isoformOverride,
String errorReportLocation, boolean replace, String postIntervalSize, String stripMatchingBases, Boolean ignoreOriginalGenomicLocation, Boolean addOriginalGenomicLocation) throws Exception {
String errorReportLocation, boolean replace, String postIntervalSize, String stripMatchingBases, Boolean ignoreOriginalGenomicLocation, Boolean addOriginalGenomicLocation, Boolean noteColumn) throws Exception {
SpringApplication app = new SpringApplication(AnnotationPipeline.class);
app.setWebApplicationType(WebApplicationType.NONE);
app.setAllowBeanDefinitionOverriding(Boolean.TRUE);
@@ -86,6 +86,7 @@ private static void annotateJob(String[] args, String filename, String outputFil
.addString("stripMatchingBases", stripMatchingBases)
.addString("ignoreOriginalGenomicLocation", String.valueOf(ignoreOriginalGenomicLocation))
.addString("addOriginalGenomicLocation", String.valueOf(addOriginalGenomicLocation))
.addString("noteColumn", String.valueOf(noteColumn))
.toJobParameters();
JobExecution jobExecution = jobLauncher.run(annotationJob, jobParameters);
if (!jobExecution.getExitStatus().equals(ExitStatus.COMPLETED)) {
@@ -257,9 +258,10 @@ private static void annotate(Subcommand subcommand, String[] args) throws Annota
try {
annotateJob(args, subcommand.getOptionValue("filename"), subcommand.getOptionValue("output-filename"), outputFormat, subcommand.getOptionValue("isoform-override"),
subcommand.getOptionValue("error-report-location", ""),
true, subcommand.getOptionValue("post-interval-size", "100"), subcommand.getOptionValue("strip-matching-bases", "all"), subcommand.hasOption("ignore-original-genomic-location"), subcommand.hasOption("add-original-genomic-location"));
true, subcommand.getOptionValue("post-interval-size", "100"), subcommand.getOptionValue("strip-matching-bases", "all"), subcommand.hasOption("ignore-original-genomic-location"), subcommand.hasOption("add-original-genomic-location"), true);
// When you change the default value of post-interval-size, do not forget to update MutationRecordReader.postIntervalSize accordingly
// "replace-symbol-entrez" is true by default
// noteColumn is hard-coded to true here; it can be switched to the noteColumn parameter if grouped arguments are supported in the future
} catch (Exception e) {
throw new AnnotationFailedException(e);
}
@@ -39,7 +39,8 @@ private static Options getOptions() {
.addOption("p", "post-interval-size", true, "Number of records to make POST requests to Genome Nexus with at a time")
.addOption("s", "strip-matching-bases", true, "Strip matching allele bases, options are: first,all,none")
.addOption("d", "ignore-original-genomic-location", false, "Ignore original genomic location in input file (columns with prefix 'IGNORE_Genome_Nexus_Original_').")
.addOption("a", "add-original-genomic-location", false, "Add original genomic location input columns in the output, name columns with prefix 'IGNORE_Genome_Nexus_Original_')");
.addOption("a", "add-original-genomic-location", false, "Add original genomic location input columns in the output, name columns with prefix 'IGNORE_Genome_Nexus_Original_')")
.addOption("n", "note-column", false, "Add 'Genomic Location Explanation' column for variants that have altered genomic location");

return gnuOptions;
}
@@ -89,6 +89,9 @@ public class MutationRecordReader implements ItemStreamReader<AnnotatedRecord> {
@Value("#{jobParameters[addOriginalGenomicLocation] ?: 'false'}")
private Boolean addOriginalGenomicLocation;

@Value("#{jobParameters[noteColumn] ?: 'true'}")
private Boolean noteColumn;

private AnnotationSummaryStatistics summaryStatistics;
private List<AnnotatedRecord> allAnnotatedRecords = new ArrayList<>();
private Set<String> header = new LinkedHashSet<>();
@@ -107,9 +110,9 @@ public void open(ExecutionContext ec) throws ItemStreamException {
List<MutationRecord> mutationRecords = loadMutationRecordsFromMaf();
if (!mutationRecords.isEmpty()) {
if (postIntervalSize > 1) {
allAnnotatedRecords = annotator.getAnnotatedRecordsUsingPOST(summaryStatistics, mutationRecords, isoformOverride, replace, postIntervalSize, true, stripMatchingBases, ignoreOriginalGenomicLocation, addOriginalGenomicLocation);
allAnnotatedRecords = annotator.getAnnotatedRecordsUsingPOST(summaryStatistics, mutationRecords, isoformOverride, replace, postIntervalSize, true, stripMatchingBases, ignoreOriginalGenomicLocation, addOriginalGenomicLocation, noteColumn);
} else {
allAnnotatedRecords = annotator.annotateRecordsUsingGET(summaryStatistics, mutationRecords, isoformOverride, replace, true, stripMatchingBases, ignoreOriginalGenomicLocation, addOriginalGenomicLocation);
allAnnotatedRecords = annotator.annotateRecordsUsingGET(summaryStatistics, mutationRecords, isoformOverride, replace, true, stripMatchingBases, ignoreOriginalGenomicLocation, addOriginalGenomicLocation, noteColumn);
}
// if output-format option is supplied, we only need to convert its data into header
if (outputFormat != null) {
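The reader hunk above injects `noteColumn` through the SpEL expression `#{jobParameters[noteColumn] ?: 'true'}`, while `AnnotationPipeline` stores the flag as text with `String.valueOf(...)`. A job launched without that parameter (the test hunks shown in this commit do not add it) would therefore presumably still produce the note column. The following is a minimal plain-Java sketch of that fallback, not the pipeline's code: `resolveFlag` and the map standing in for the job parameters are hypothetical, and treating an empty string like a missing value is an assumption about how the Elvis operator behaves.

```java
import java.util.HashMap;
import java.util.Map;

class NoteColumnDefaultSketch {

    // Hypothetical helper (not from the pipeline): mimics the fallback in
    // "#{jobParameters[noteColumn] ?: 'true'}" using a plain map of strings.
    static boolean resolveFlag(Map<String, String> jobParameters, String name, boolean defaultValue) {
        String raw = jobParameters.get(name);
        // Assumption: a missing or empty value triggers the default.
        if (raw == null || raw.isEmpty()) {
            return defaultValue;
        }
        return Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();

        // No "noteColumn" entry, as in a job launched without that parameter.
        System.out.println(resolveFlag(params, "noteColumn", true)); // prints true

        // The flag travels as text, the way String.valueOf(noteColumn) stores it.
        params.put("noteColumn", String.valueOf(Boolean.FALSE));
        System.out.println(resolveFlag(params, "noteColumn", true)); // prints false
    }
}
```

Keeping the default on the reader side means callers that predate the new parameter keep working, at the cost of changing their output, which is consistent with the updated fixture files later in this commit.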
@@ -68,7 +68,7 @@ public void check_if_maf_file_still_the_same_when_annotating_with_uniprot_transc
JobParameters jobParameters = new JobParametersBuilder()
.addString("filename", inputFile)
.addString("outputFilename", actualFile)
.addString("replace", String.valueOf(false))
.addString("replace", String.valueOf(true))
.addString("isoformOverride", "uniprot")
.addString("errorReportLocation", null)
.toJobParameters();
@@ -85,7 +85,7 @@ public void check_if_maf_file_still_the_same_when_annotating_with_mskcc_transcri
JobParameters jobParameters = new JobParametersBuilder()
.addString("filename", inputFile)
.addString("outputFilename", actualFile)
.addString("replace", String.valueOf(false))
.addString("replace", String.valueOf(true))
.addString("isoformOverride", "mskcc")
.addString("errorReportLocation", null)
.toJobParameters();
@@ -192,7 +192,7 @@ public void run_vcf2maf_test_case_mskcc() throws Exception {
JobParameters jobParameters = new JobParametersBuilder()
.addString("filename", inputFile)
.addString("outputFilename", actualFile)
.addString("replace", String.valueOf(false))
.addString("replace", String.valueOf(true))
.addString("isoformOverride", "mskcc")
.addString("errorReportLocation", null)
.toJobParameters();
@@ -214,7 +214,7 @@ public void run_vcf2maf_test_case_uniprot() throws Exception {
JobParameters jobParameters = new JobParametersBuilder()
.addString("filename", inputFile)
.addString("outputFilename", actualFile)
.addString("replace", String.valueOf(false))
.addString("replace", String.valueOf(true))
.addString("isoformOverride", "uniprot")
.addString("errorReportLocation", null)
.toJobParameters();
@@ -248,7 +248,7 @@ public void check_if_nucleotide_context_provides_Ref_Tri_and_Var_Tri_columns() t
JobParameters jobParameters = new JobParametersBuilder()
.addString("filename", inputFile)
.addString("outputFilename", actualFile)
.addString("replace", String.valueOf(false))
.addString("replace", String.valueOf(true))
.addString("isoformOverride", "uniprot")
.addString("errorReportLocation", null)
.toJobParameters();
@@ -1,5 +1,5 @@
#genome_nexus_version: 1.0.2
#isoform: mskcc
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Consequence Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count HGVSc HGVSp HGVSp_Short Transcript_ID RefSeq Protein_position Codons Exon_Number Annotation_Status
MET 4233 GRCh37 7 116411872 116411900 + splice_region_variant,intron_variant Splice_Region DEL TAACAAGCTCTTTCTTTCTCTCTGTTTTA - - ENST00000397752.3:c.2888-31_2888-3del p.X963_splice ENST00000397752 NM_000245.2 963 SUCCESS
PCM1 5108 GRCh37 8 17796382 17796383 + missense_variant Missense_Mutation DNP AC GT GT rs754721723 ENST00000325083.8:c.476_477inv p.Asn159Ser p.N159S ENST00000325083 NM_006197.3 159 aAC/aGT 5/39 SUCCESS
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Consequence Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count HGVSc HGVSp HGVSp_Short Transcript_ID RefSeq Protein_position Codons Exon_Number genomic_location_explanation Annotation_Status
MET 4233 GRCh37 7 116411872 116411900 + splice_region_variant,intron_variant Splice_Region DEL TAACAAGCTCTTTCTTTCTCTCTGTTTTA - - ENST00000397752.3:c.2888-31_2888-3del p.X963_splice ENST00000397752 NM_000245.2 963 SUCCESS
PCM1 5108 GRCh37 8 17796382 17796383 + missense_variant Missense_Mutation DNP AC GT GT rs754721723 ENST00000325083.8:c.476_477inv p.Asn159Ser p.N159S ENST00000325083 NM_006197.3 159 aAC/aGT 5/39 SUCCESS
@@ -1,6 +1,6 @@
#genome_nexus_version: 1.0.2
#isoform: mskcc
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Consequence Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count HGVSc HGVSp HGVSp_Short Transcript_ID RefSeq Protein_position Codons Exon_Number Annotation_Status
PCM1 5108 GRCh37 8 17796382 17796383 + frameshift_variant Frame_Shift_Ins INS - AAC A ENST00000325083.8:c.476dup p.Asn159LysfsTer14 p.N159Kfs*14 ENST00000325083 NM_006197.3 159 aac/aaAc 5/39 SUCCESS
KMT2D 8085 GRCh37 12 49435045 49435045 + missense_variant Missense_Mutation SNP G C C rs758743247 ENST00000301067.7:c.6508C>G p.Gln2170Glu p.Q2170E ENST00000301067 NM_003482.3 2170 Caa/Gaa 31/54 SUCCESS
PCM1 5108 GRCh37 8 17796382 17796383 + missense_variant Missense_Mutation DNP AC AC GT rs754721723 ENST00000325083.8:c.476_477inv p.Asn159Ser p.N159S ENST00000325083 NM_006197.3 159 aAC/aGT 5/39 SUCCESS
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Consequence Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count HGVSc HGVSp HGVSp_Short Transcript_ID RefSeq Protein_position Codons Exon_Number genomic_location_explanation Annotation_Status
PCM1 5108 GRCh37 8 17796382 17796383 + frameshift_variant Frame_Shift_Ins INS - AAC A ENST00000325083.8:c.476dup p.Asn159LysfsTer14 p.N159Kfs*14 ENST00000325083 NM_006197.3 159 aac/aaAc 5/39 Start position changes from 17796382 to 17796383 is attributed to the presence of common bases A. Reference allele changes from AC to C is attributed to the presence of common bases A. Variant allele changes from AAC to AC is attributed to the presence of common bases A. SUCCESS
KMT2D 8085 GRCh37 12 49435045 49435045 + missense_variant Missense_Mutation SNP G C C rs758743247 ENST00000301067.7:c.6508C>G p.Gln2170Glu p.Q2170E ENST00000301067 NM_003482.3 2170 Caa/Gaa 31/54 SUCCESS
PCM1 5108 GRCh37 8 17796382 17796383 + missense_variant Missense_Mutation DNP AC AC GT rs754721723 ENST00000325083.8:c.476_477inv p.Asn159Ser p.N159S ENST00000325083 NM_006197.3 159 aAC/aGT 5/39 Start position changes from 17796381 to 17796382 is attributed to the presence of common bases A. Reference allele changes from AAC to AC is attributed to the presence of common bases A. Variant allele changes from AGT to GT is attributed to the presence of common bases A. SUCCESS
@@ -1,6 +1,6 @@
#genome_nexus_version: 1.0.2
#isoform: uniprot
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Consequence Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count HGVSc HGVSp HGVSp_Short Transcript_ID RefSeq Protein_position Codons Exon_Number Annotation_Status
PCM1 5108 GRCh37 8 17796382 17796383 + frameshift_variant Frame_Shift_Ins INS - AAC A ENST00000325083.8:c.476dup p.Asn159LysfsTer14 p.N159Kfs*14 ENST00000325083 NM_006197.3 159 aac/aaAc 5/39 SUCCESS
KMT2D 8085 GRCh37 12 49435045 49435045 + missense_variant Missense_Mutation SNP G C C rs758743247 ENST00000301067.7:c.6508C>G p.Gln2170Glu p.Q2170E ENST00000301067 NM_003482.3 2170 Caa/Gaa 31/54 SUCCESS
PCM1 5108 GRCh37 8 17796382 17796383 + missense_variant Missense_Mutation DNP AC AC GT rs754721723 ENST00000325083.8:c.476_477inv p.Asn159Ser p.N159S ENST00000325083 NM_006197.3 159 aAC/aGT 5/39 SUCCESS
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Consequence Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count HGVSc HGVSp HGVSp_Short Transcript_ID RefSeq Protein_position Codons Exon_Number genomic_location_explanation Annotation_Status
PCM1 5108 GRCh37 8 17796382 17796383 + frameshift_variant Frame_Shift_Ins INS - AAC A ENST00000325083.8:c.476dup p.Asn159LysfsTer14 p.N159Kfs*14 ENST00000325083 NM_006197.3 159 aac/aaAc 5/39 Start position changes from 17796382 to 17796383 is attributed to the presence of common bases A. Reference allele changes from AC to C is attributed to the presence of common bases A. Variant allele changes from AAC to AC is attributed to the presence of common bases A. SUCCESS
KMT2D 8085 GRCh37 12 49435045 49435045 + missense_variant Missense_Mutation SNP G C C rs758743247 ENST00000301067.7:c.6508C>G p.Gln2170Glu p.Q2170E ENST00000301067 NM_003482.3 2170 Caa/Gaa 31/54 SUCCESS
PCM1 5108 GRCh37 8 17796382 17796383 + missense_variant Missense_Mutation DNP AC AC GT rs754721723 ENST00000325083.8:c.476_477inv p.Asn159Ser p.N159S ENST00000325083 NM_006197.3 159 aAC/aGT 5/39 Start position changes from 17796381 to 17796382 is attributed to the presence of common bases A. Reference allele changes from AAC to AC is attributed to the presence of common bases A. Variant allele changes from AGT to GT is attributed to the presence of common bases A. SUCCESS
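The `genomic_location_explanation` values in the fixtures above report a shifted start position and shortened alleles once shared bases are removed, and the README table in this commit gives AAC/AAT as an example (first: AC/AT, all: C/T, none: AAC/AAT). The sketch below illustrates that idea under stated assumptions and is not the pipeline's implementation: the class, `strip`, and `explain` are hypothetical names, only shared leading bases are handled, the position 100 is a made-up value, and the note wording is copied from the fixture rows.

```java
class StripMatchingBasesSketch {

    /** Adjusted start position and alleles, plus the shared bases that were removed. */
    static final class Stripped {
        final long start;
        final String ref;
        final String alt;
        final String commonBases;

        Stripped(long start, String ref, String alt, String commonBases) {
            this.start = start;
            this.ref = ref;
            this.alt = alt;
            this.commonBases = commonBases;
        }
    }

    // Hypothetical helper: removes shared leading bases according to the mode
    // ("first", "all" or "none") and shifts the start by the number removed.
    static Stripped strip(long start, String ref, String alt, String mode) {
        int limit;
        switch (mode) {
            case "none":  limit = 0; break;
            case "first": limit = 1; break;
            case "all":   limit = Math.min(ref.length(), alt.length()); break;
            default: throw new IllegalArgumentException("Unknown mode: " + mode);
        }
        int n = 0;
        while (n < limit && n < ref.length() && n < alt.length()
                && ref.charAt(n) == alt.charAt(n)) {
            n++;
        }
        return new Stripped(start + n, ref.substring(n), alt.substring(n), ref.substring(0, n));
    }

    // Builds a note in the style of the fixture rows; empty when nothing was stripped.
    static String explain(long start, String ref, String alt, Stripped s) {
        if (s.commonBases.isEmpty()) {
            return "";
        }
        String reason = " is attributed to the presence of common bases " + s.commonBases + ".";
        return "Start position changes from " + start + " to " + s.start + reason
                + " Reference allele changes from " + ref + " to " + s.ref + reason
                + " Variant allele changes from " + alt + " to " + s.alt + reason;
    }

    public static void main(String[] args) {
        Stripped first = strip(100, "AAC", "AAT", "first");
        Stripped all = strip(100, "AAC", "AAT", "all");
        System.out.println(first.ref + "/" + first.alt); // prints AC/AT
        System.out.println(all.ref + "/" + all.alt);     // prints C/T

        // Values taken from the first explanation in the fixtures above.
        Stripped s = strip(17796382L, "AC", "AAC", "all");
        System.out.println(explain(17796382L, "AC", "AAC", s));
    }
}
```

Returning the removed bases alongside the adjusted values lets the note be built without recomputing the comparison, and an empty note for unchanged variants matches the KMT2D rows, which carry no explanation.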