Failure during Minimus2 Step #5
To add: the sequence in question, which appears to be incorrectly formatted, is actually #72: NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
MeGAMerge is not intended to be fed raw FASTQ files. You will first need to assemble the data with a current assembler (e.g. MetaSpades), using multiple assembly parameters. MeGAMerge is a post-assembly tool: it improves assemblies by leveraging the fact that no two assemblers seem to assemble the same data into the same contigs.
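Since MeGAMerge expects assembled contigs rather than reads, a quick pre-flight check on each input can catch raw FASTQ before a long run. The sketch below is a hypothetical helper (not part of MeGAMerge) that guesses the format from the first non-blank line, assuming FASTA headers start with `>` and FASTQ headers with `@`:

```python
import sys


def sniff_format(lines):
    """Guess the format of a sequence file from its first non-blank line.

    Assumes FASTA headers start with '>' and FASTQ headers with '@'.
    Returns "fasta", "fastq" or "unknown".
    """
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip leading blank lines
        if stripped.startswith(">"):
            return "fasta"
        if stripped.startswith("@"):
            return "fastq"
        return "unknown"
    return "unknown"  # empty input


def sniff_file(path):
    """Open a file and sniff its format."""
    with open(path) as handle:
        return sniff_format(handle)


if __name__ == "__main__":
    # Hypothetical usage: python check_inputs.py contigs_a.fasta contigs_b.fasta
    for path in sys.argv[1:]:
        fmt = sniff_file(path)
        if fmt != "fasta":
            print(f"{path}: looks like {fmt}, not assembled FASTA contigs", file=sys.stderr)
```

Run against the R1/R2 files from this report, it would flag both as `fastq`, since their records begin with `@HISEQ...` headers.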
So I think I may have found the root of the error, but I am unsure how to resolve it.
Reads are from a HiSeq 2500 (101 bp, paired-end).
Here is the output in debug mode (-d), with force write (-force) enabled:
~/bin/MeGAMerge/MeGAMerge-1.1.pl -force -d -cpu=32 DF1_MegaMerge/ 1-STH063DF1_CGTACTAG-TAGATCGC_L003_R1_001.fastq 1-STH063DF1_CGTACTAG-TAGATCGC_L003_R2_001.fastq
COMMAND
perl /pub40/nfellaby/bin/MeGAMerge/MeGAMerge-1.1.pl -force -d -cpu=32 DF1_MegaMerge/ 1-STH063DF1_CGTACTAG-TAGATCGC_L003_R1_001.fastq 1-STH063DF1_CGTACTAG-TAGATCGC_L003_R2_001.fastq
The Merged FASTA will be stored in DF1_MegaMerge//MergedContigs.fasta
Reading 1-STH063DF1_CGTACTAG-TAGATCGC_L003_R1_001.fastq
Reading 1-STH063DF1_CGTACTAG-TAGATCGC_L003_R2_001.fastq
Running Newbler assembly with 4991773 sequences
runAssembly -force -large -rip -mi 98 -ml 80 -pairt -cpu 32 -a 200 -o DF1_MegaMerge//newbler DF1_MegaMerge//newblerIn.fasta
Initialized assembly project directory DF1_MegaMerge//newbler
1 read file successfully added.
newblerIn.fasta (Fasta dataset)
Assembly computation starting at: Tue Jul 26 13:19:23 2016 (v2.9 (20130529_1641))
Indexing newblerIn.fasta...
Warning: No quality scores file found.
-> 3310606 reads, 2193389201 bases.
Warning: Suspected 5' primer CGTACTAGTAGATCGC, 438341 exact matches found.
Warning: Suspected 3' primer CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC, 9273 exact matches found (displayed as the reverse complement in reads).
Setting up overlap detection...
-> 457701 of 457701, 457701 reads to align
Starting seed building...
-> 457701 of 457701
Building a tree for 4913601 seeds...
Computing alignments...
-> 457000 of 457701
7690 sequences marked as repeats, 7689 in alignments, removing from 2433 chords...
-> 457701 of 457701
Checkpointing...
Detangling alignments...
-> Level 4, Phase 9, Round 2...
Checkpointing...
Building contigs/scaffolds...
-> 32 large contigs, 276 all contigs
Computing signals...
-> 168066 of 168066...
Generating output...
-> 168066 of 168066...
Assembly computation succeeded at: Tue Jul 26 13:33:20 2016
423325 newbler singletons sequences
Checking 454PairAlign.txt
Unique Newbler Singletons (>=200 bp): 174250
opening DF1_MegaMerge//minimus.fasta for writing
opening DF1_MegaMerge//largefile.fasta for reading
opening DF1_MegaMerge//newbler/All.fasta for reading
Running Minimus2 with 1539831 sequences
Error: Temporary read file /tmp/tmp.18421.seq was not formatted as expected at line 8033:
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
toAmos -s DF1_MegaMerge//minimus.fasta -o DF1_MegaMerge//minimus.fasta.afg failed:
Died at /pub40/nfellaby/bin/MeGAMerge/MeGAMerge-1.1.pl line 275.
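To gauge how widespread these N stretches are in the file that toAmos was given (`DF1_MegaMerge//minimus.fasta` in the log above), one option is to scan it record by record. This is a hypothetical helper, assuming a plain FASTA file and an arbitrary 50% N threshold; it does not reproduce toAmos's own format checks, so it can only point at suspicious records:

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Lines that appear before the first '>' header are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_fraction(seq):
    """Fraction of bases that are N (case-insensitive); 0.0 for an empty sequence."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


def flag_n_heavy(records, max_fraction=0.5):
    """Return (header, length, N fraction) for records above the threshold."""
    return [
        (header, len(seq), n_fraction(seq))
        for header, seq in records
        if n_fraction(seq) > max_fraction
    ]


if __name__ == "__main__":
    # Hypothetical usage: python flag_n.py DF1_MegaMerge/minimus.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            for header, length, frac in flag_n_heavy(read_fasta(handle)):
                print(f"{path}\t{header}\t{length}\t{frac:.2f}")
```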
What I would say about the sequence file is that it contains huge chunks of 'N' inserts, e.g.:
#1
@HISEQ:99:C5LAVANXX:3:1101:2978:1923 1:N:0:CGTACTAGTAGATCGCA
GTGCAATGCTGTGATCTCGGCTAACCACAACCTCCGCCTCCCAGGTTCAAGCGATTCTCC
TGCCTCAGCCTCCCGAGTAGCTGGGATTACAGGCATGCGCCACCACACCTGGCTAATTTT
GTA+=<<@FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
BGGGGGGG@HISEQ:99:C5LAVANXX:3:1101:3136:1908 1:N:0:CGTACTAGT
AGATCGCAATGAAGTAGTAAAACCTGCATCAAGGTTTATGGTTTCATTCGAAAAACTGCC
TCCAAAAGAAAATTATTTTTAAAAAGAAGATTTAGAAAAAATTACCTAATTTCCATTTTA
AATGACCCTCT+=<@BGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGEGGGGGGG@HISEQ:99:C5LAVANXX:3:1101:3083:1926 1:N:0:C
GTACTAGTAGATCGCCTGAGGATCTCTAAGTATTTAACATCAAAATACTGAAAATTGCCA
TTTTTCACCATTAATTGTAATTCAAATGGCATTTGATTAATGGATGTTCACCTTTTTCTA
AAAATTAAAAAATAAGATT+=<@BGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGG@HISEQ:99:C5LAVANXX:3:1101:3486:1940
1:N:0:CGTACTAGTAGATCGCTTCTAGCTACAAGAGACACAGTGGTCAGCAAAAGAAA
CACTGCCAGCCTTCCCAGAGCTTATAGATTAGTGAGGGAGAGGGTGAATAACCAGATAAA
CAAACGCTCGTCAGTGCTATGAAAGAA+<<@BGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@HISEQ:99:C5LAVANXX:3:1101:3
425:1958 1:N:0:CGTACTAGTAGATCGCCACTATGCTTGGCAGCGTGGTGTGGATAA
TAACTGGGTGCCCTGGGAGGGGTGAGGACACTCCAAGCTACAGAAGATCAGGGAAGAGGG
AGATCAAAGTGGCCTAGAGATTTTAAGAAAATCTT+=<@BGGFGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@HISEQ:99:C5LAVANXX:
3:1101:3282:1985 1:N:0:CGTACTAGTAGATCGCATCCCCACGTCATGGGACTCA
GTTAAAGGTAATTGAATCATGGGGGTGATTTCCCCCATGCTATTCTCATGATAGTGAGTA
ACTTCTTATGAGATCCGATGGTTTTATAAGGGGTTTCTTCCTT+=<@bgggggggggggg
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@HISEQ:99:C5
LAVANXX:3:1101:3700:1927 1:N:0:CGTACTAGTAGATCGCTGCCAAATAGTTC
TCATAAATGTTACTTTCAAGTTTATGTAAACAATATTATGAAGGGCTTTGATTCATCACA
AAATTTAAAACTTTTTAAAATAACATTTCCCAAGTACATTTAAAGTAAGAC+=@BFGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG@HISE
Q:99:C5LAVANXX:3:1101:3538:1931 1:N:0:CGTACTAGTAGATCGCATAAAA
CTGAGCCTGCATGGGAATTTCAACATTCCAAAATAACACAAACAGCATTATTGGACCCTT
TGGGTCAAATAAATCATTTACCAAGAGAAGAAGGAAAGGAATTGATGGGGATAGGAAT+=
<BBGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGG@HISEQ:99:C5LAVANXX:3:1101:3695:1947 1:N:0:CGTACTAGTAGATC
GCTGGAGCAGTTTCGAAACACACTATTTGTAGAATGTGCAAGTGGATATTTAGGCCTCTC
TGAGGATTTCGTTGGAAACGGGATAAACCGCACAGAACTAAACAGAAGCATTCTCAGAAC
CTTCTT+<<@FGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCEGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGEGGGGGGGGGGFDFGGGGGGGEGGGGGGGGGGGGGGGGFGGG
GGGGGGGGFF@HISEQ:99:C5LAVANXX:3:1101:3778:1935 1:N:0:CGTACTA
GAAGATCGCCATAAAGTATCTCATCCTCAACTCACCCTCCATGGGTGATGGCTCTTGTCT
GTGGGGTTTACGGGAATTGGGTGTGTCCCTGGAAAAGGCTTCCCCAGGTGGTGGTGGTTG
TGGGGATTAAAG+<BGCFGGGGGGGCG
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNN
Whilst I am unsure how the program is stitching these sections together, or whether the Ns are placeholders, I am pretty sure that they are causing the issue.
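One possible explanation for the "stitching", offered as a hypothesis rather than a reading of MeGAMerge's source: the pasted block looks like whole FASTQ records (the `@HISEQ...` header, the bases, the `+` separator and the quality string) concatenated into one string and re-wrapped at 60 characters, which is what you would expect if FASTQ input were handled as plain sequence text. That would also fit Newbler's warning about a suspected 5' primer `CGTACTAGTAGATCGC`, since that string appears in the index field of the read headers shown above. It does not by itself explain the N runs. The sketch below models the concatenation with a hypothetical `join_and_rewrap` function and made-up toy reads:

```python
def join_and_rewrap(lines, width=60):
    """Concatenate every line (headers, '+' separators and quality strings
    included) into one string, then cut it into fixed-width lines.

    A hypothetical model of FASTQ being treated as raw sequence text,
    not MeGAMerge's actual code.
    """
    joined = "".join(line.strip() for line in lines)
    return [joined[i:i + width] for i in range(0, len(joined), width)]


# Two made-up FASTQ records (not taken from the real data).
fastq_lines = [
    "@read1 1:N:0:ACGT\n",
    "GATTACA\n",
    "+\n",
    "IIIIIII\n",
    "@read2 1:N:0:ACGT\n",
    "CCGG\n",
    "+\n",
    "IIII\n",
]

for row in join_and_rewrap(fastq_lines, width=20):
    print(row)
```

The second printed line is `TACA+IIIIIII@read2 1`: the end of one read, its `+` separator and quality string, and the start of the next header run together, the same `bases+quality@HISEQ` pattern seen in the pasted block.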
Any suggestions would be marvellous.
Machine stats:
CPU count: 64
CPU speed: 3300 MHz
Total memory: 529195296 KB
Total swap: 0 KB
Platform: x86_64 GNU/Linux
Kernel: 3.16.7-ckt25
Thanks for your time,
nicholas.