Error: unable to open file or unable to determine types for file synthetic.vcf #60
@gianfilippo happy to see your interest in NeuSomatic.
In addition, from the few lines of the VCF you have provided, I see that the VCF is in the wrong format. You need a header line like |
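The header line referred to in the comment above is cut off in this thread. As a general check, the VCF specification requires the first line to be a `##fileformat=VCFv4.x` line, and the `#CHROM` column header to be tab-delimited. A minimal sketch of such a check follows; the function name `check_vcf_header` is hypothetical and not part of NeuSomatic, and it assumes an uncompressed, plain-text VCF.

```python
# Hypothetical helper, not part of NeuSomatic: basic sanity check of a VCF header.
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_vcf_header(lines):
    """Return a list of problems found in the header of a VCF.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    problems = []
    it = iter(lines)
    first = next(it, "").rstrip("\r\n")
    if not first.startswith("##fileformat=VCF"):
        problems.append("first line is not a ##fileformat=VCFv4.x line")

    # Skip the remaining meta-information lines (they start with "##").
    line = first
    while line.startswith("##"):
        line = next(it, "").rstrip("\r\n")

    if not line.startswith("#CHROM"):
        problems.append("missing #CHROM column header line")
    elif line.split("\t")[:8] != REQUIRED_COLUMNS:
        problems.append("#CHROM line is not tab-delimited or lacks the 8 fixed columns")
    return problems


if __name__ == "__main__":
    with open("synthetic.vcf") as handle:
        for problem in check_vcf_header(handle):
            print(problem)
```

An empty list means the header passed these two checks only; it does not validate the records themselves.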
Thanks, you are right, I clearly got confused. I fixed it. Thanks for your help!
INFO 2020-04-04 21:25:27,955 find_records (ForkPoolWorker-54) N_none: 536
ERROR 2020-04-04 21:25:27,974 find_records (ForkPoolWorker-51)
ERROR 2020-04-04 21:25:28,010 main Aborting! |
@gianfilippo would you please make sure that in the |
Hi, I generated the truth.vcf with BAMSurgeon and the reference file is the same |
@gianfilippo I added more logging for failed asserts in a new branch called |
Hi, thanks for the help!
INFO 2020-04-06 12:37:31,556 find_records (ForkPoolWorker-51) N_none: 546 |
@gianfilippo I think we should assume those bases are correct synthetic truth variants. So, we should keep them to have valid training. |
The preprocess step now works, but the train step fails with the error below. What do you think?
INFO 2020-04-06 17:24:37,805 make_weights_for_balanced_classes count length classes: [(0, 3846), (1, 1753), (2, 52), (3, 121)] |
@gianfilippo sounds good. |
I am using the --ensemble already (see below the command line). |
@gianfilippo did you also use |
yes |
below is the preprocess call |
@gianfilippo Can you share with me a few lines from |
the zipped files (few lines) are attached. For the candidates.tsv I attached the lines from the work.0 dir. I had a typo in the command line, now corrected |
@gianfilippo The |
it is working now, going through the iterations. I will let you know if it completes. Thanks! |
the network completed training. Below are the last few output lines |
@gianfilippo from the loss, it seems that the training has not converged as expected. Can you send me the training log. |
Hi, the file is attached. I think there may be something wrong in my synthetic.bam and vcfs. |
Thanks for the suggestion. I have WES data. I used a modified version of the somaticseq/bamsurgeon script; the modifications simply allow the script to run under the Slurm Workload Manager. I requested 100000 SNVs, 20000 indels, and 1000 SVs, and ended up with 3179 SNVs and 609 indels. |
@gianfilippo I think that is unexpected for somaticseq/bamsurgeon. There must be something wrong there. Can you share your run script for bamsurgeon? |
Hi, the script is attached. I ran it on 4 threads and the attached file works on thread 1. The others are the same. I also included the mergeFiles script. |
Can you try the following somaticseq command, to see what's in the bam files for your snv/indel positions? If you have the latest SomaticSeq v3.4.0 installed,
You may also add caller vcf files as input above. All of the following are optional, e.g.,
And do one for indel as well. |
@gianfilippo Also, can you make sure you use |
Hi, thanks for the suggestions. The output from somatic_vcf2tsv.py for both snvs and indels, using also the callers vcf files are attached. |
Looking at the two tsv.txt files, it's not obvious to me why the four callers are all "negative" for all the variant positions. |
If you think the synthetic VCF file and the BAM files are OK, there must be something wrong in the way I set up the script using the callers. Likely some stupid mistake, at this point. I will let you know. Thanks for your help! |
Hi, it seems that now, using "--selector", BamSimulator_multiThreads.sh does not complete and I get core dumps and errors. I tried on 4 and 20 threads. Now I am rerunning it on a single thread just to see if I still get the same errors. |
@gianfilippo, make sure the synthetic_*.vcf files and the caller output VCF files are all obtained from the same pair of BAM files. |
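One rough way to test this is to count how many truth sites also appear in a caller's VCF. Near-zero overlap would be consistent with the files coming from different BAMs or references, although it does not prove it. The sketch below is illustrative only: the function names and the caller file name are hypothetical, it assumes uncompressed, tab-delimited VCFs, and it matches on chromosome and position alone, ignoring alleles.

```python
# Hypothetical diagnostic, not part of NeuSomatic or SomaticSeq.
def read_vcf_sites(lines):
    """Collect (chrom, pos) pairs from VCF record lines, skipping header lines."""
    sites = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        sites.add((fields[0], int(fields[1])))
    return sites


def site_overlap(truth_lines, caller_lines):
    """Return (shared, n_truth, n_called) site counts for two VCFs."""
    truth = read_vcf_sites(truth_lines)
    called = read_vcf_sites(caller_lines)
    return len(truth & called), len(truth), len(called)


if __name__ == "__main__":
    # "caller.vcf" is a placeholder for one caller's output file.
    with open("synthetic.vcf") as truth, open("caller.vcf") as caller:
        shared, n_truth, n_called = site_overlap(truth, caller)
    print(f"{shared} of {n_truth} truth sites found among {n_called} called sites")
```

Indels can be left-aligned differently by different tools, so position-only matching may undercount them.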
Hi,
I am trying to run neusomatic in ensemble mode, but got stuck after the SomaticSeq.Wrapper.sh step and at the "preprocess.py --mode train" step.
I get the following error
Error: unable to open file or unable to determine types for file synthetic.vcf
the synthetic.vcf is generated by processing the Ensemble.s*.tsv files generated by SomaticSeq.Wrapper.sh, following the details on your repository (my understanding of it, of course)
cat <(cat Ensemble.s*.tsv |grep CHROM|head -1) <(cat Ensemble.s*.tsv |grep -v CHROM) | sed "s/nan/0/g" > ensemble_ann1.tsv
python preprocess.py --mode train --reference $GENFILE.fa --region_bed $INTERVALFILE --tumor_bam $syntheticTumor.bam --normal_bam $syntheticNormal.bam --work WORK --truth_vcf synthetic.vcf --ensemble_tsv ensemble_ann.tsv --min_mapq 10 --num_threads 20 --scan_alignments_binary $HOME/bin/neusomatic/neusomatic/bin/scan_alignments
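For reference, the shell merge step above (keep one header line, append all data rows, replace "nan" with 0) can be sketched in Python as follows. This is only an illustrative equivalent of that pipeline, with hypothetical function names. Like the `grep` and `sed` calls, it treats any line containing "CHROM" as a header and replaces every occurrence of the substring "nan".

```python
import glob


def merge_ensemble_lines(tables):
    """Merge TSV tables given as iterables of lines.

    Keeps the first header line (any line containing "CHROM"), drops the
    other header lines, and replaces every "nan" substring with "0".
    """
    header = None
    rows = []
    for table in tables:
        for line in table:
            if "CHROM" in line:
                if header is None:
                    header = line
            else:
                rows.append(line)
    merged = ([header] if header is not None else []) + rows
    return [line.replace("nan", "0") for line in merged]


def merge_ensemble_files(pattern, out_path):
    """Merge all files matching `pattern` (sorted by name) into `out_path`."""
    tables = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            tables.append(handle.readlines())
    with open(out_path, "w") as out:
        out.writelines(merge_ensemble_lines(tables))


if __name__ == "__main__":
    merge_ensemble_files("Ensemble.s*.tsv", "ensemble_ann.tsv")
```

Whatever path is written here has to match the one passed to `--ensemble_tsv`. Note that the commands above write `ensemble_ann1.tsv` but pass `ensemble_ann.tsv`.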
A few lines from the synthetic.vcf are below
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SPIKEIN
chr1 1503492 . G A 100 PASS SOMATIC;VAF=0.2;DPR=9.66666666667 GT 0/1
chr1 3752754 . G A 100 PASS SOMATIC;VAF=0.307692307692;DPR=90.0 GT 0/1
chr1 3763621 . C A 100 PASS SOMATIC;VAF=0.222222222222;DPR=17.6666666667 GT 0/1
chr1 6152482 . T A 100 PASS SOMATIC;VAF=0.127868852459;DPR=304.666666667 GT 0/1
chr1 6199629 . G C 100 PASS SOMATIC;VAF=0.21978021978;DPR=181.333333333 GT 0/1
Can you please help me understand my mistake?
Thanks
Gianfilippo