Strange truncation of fastq files when demultiplexing short Illumina inserts libraries #90
Hi! Thanks for sharing the result files, and for testing out anglerfish. Sorry if I'm misunderstanding something about your library or sequencing setup.
Anglerfish only writes the library inserts to the demuxed.fastq.gz files, or more accurately, the sequence between the two matching i7 and i5 templates. I'll give an example for one read (the query sequence), using paftools view on the minimap2 alignment file you included.
We see that the first 33 bases of this 182 bp read do not match the template (but are probably still adaptor). The second non-matching region is between bp 102 and 129, which anglerfish counts as the insert. It's very fortunate that all except 2 reads map to either i5 or i7, because that means I know the length distribution. And indeed the majority of the 4000 reads seem to be this short (190 +- 31 bp):
This is all to say that if you expected longer inserts (90 bp as you say; in all these wells or only 4 of them?), I am not able to explain why these are so short from the anglerfish output alone. Even the few "long" ones, say over 300 bp,
seem to be mostly concatenated shorter fragments, e.g. i5-insert-i7-i5-insert-i7 constructs, which is not uncommon to see.
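To make the "sequence between the two adaptor matches" idea concrete, here is a minimal Python sketch. It is not anglerfish's actual code. The function names are made up for illustration, and I am assuming the adaptor hits on the example read cover bases 33-102 and 129-182 (0-based, half-open, like the query start/end columns in a PAF file), which is what the unmatched regions described above imply.

```python
def unmatched_regions(read_len, hits):
    """Return (start, end) stretches of the read not covered by any adaptor hit.

    hits: list of (query_start, query_end) intervals, 0-based and half-open.
    Hypothetical helper for illustration only.
    """
    gaps = []
    pos = 0
    for start, end in sorted(hits):
        if start > pos:
            gaps.append((pos, start))
        pos = max(pos, end)
    if pos < read_len:
        gaps.append((pos, read_len))
    return gaps


def inserts_between_hits(hits):
    """Return the gaps that lie between two consecutive adaptor hits."""
    ordered = sorted(hits)
    inserts = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            inserts.append((prev_end, next_start))
    return inserts


# Assumed adaptor hit coordinates for the 182 bp example read.
example_hits = [(33, 102), (129, 182)]
print(unmatched_regions(182, example_hits))   # [(0, 33), (102, 129)]
print(inserts_between_hits(example_hits))     # [(102, 129)]
```

Under these assumptions the only region flanked by adaptor hits on both sides is 102-129, i.e. 27 bases, which is in line with the short reads in the demuxed files. A concatenated i5-insert-i7-i5-insert-i7 read would simply show up as more than one such gap.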
That's really cool. It's something I've had in the back of my mind, but never implemented because it doesn't really fit with the workflow in our lab.
Hi there,
I have been using your tool to demultiplex libraries (384-well Illumina libraries per ONT run) which have a similar construction to TruSeq paired indexes.
The first sets tested worked very well: library insert size range 200 - 600 bp, a targeted PCR library with intentional bias to see whether the input ratio would be picked up, and it was.
However, we are also interested in the performance of the indexing plates, both for cross-contamination and for total library read counts per index, to then compare back to the original Illumina library run. We are using a 90 bp insert (x 4, one per quadrant of the plate) with a final amplicon 165 bp long. The combined demultiplexing report of 800-plus files gives a relatively realistic number of index pairs found. However, the demux fastq.gz files (one per well of the 384) contain only short reads, of mean 26 bases +/- 40 bp, strangely. The input fastq files were filtered to the 180 to 400 bp range and pre-mapped to verify the number of mappable reads. After the Anglerfish demux, the reads give low or no alignment (bowtie2 or MinKNOW medaka). Looking through them with fastqc showed they were too short, and BLAST did not find many that aligned with the insert or the i5 or i7 region.
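For anyone who wants to reproduce the read-length figures, a minimal Python sketch along these lines should do. It assumes plain 4-line FASTQ records, and the function names are just for illustration (they are not part of Anglerfish).

```python
import gzip
import statistics


def read_lengths(path):
    """Collect sequence lengths from a FASTQ file (gzipped or plain).

    Assumes strict 4-line records: header, sequence, '+', qualities.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    lengths = []
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of each record is the sequence
                lengths.append(len(line.strip()))
    return lengths


def summarize(lengths):
    """Return (mean, sample standard deviation) of the read lengths."""
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return mean, sd
```

Running `summarize(read_lengths(...))` on the filtered input file and on one of the demuxed files gives the before/after comparison described above.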
I have attached a zipped file with: the output of Anglerfish / a text file with the settings used (I have tried with and without -lenient) / the filtered fastq file / the resulting demux reads from this file / a copy of the sample sheet (384 indexes). I am using v0.6.1 on Ubuntu on WSL with Python 3.10, and minimap2 is installed. I am unsure whether very short amplicons cause problems with the pipeline. The fragment range includes singles and duplicated reads, as expected.
anglerfish_test.zip
Thanks very much for this pipeline! We are hoping to pipe to this for real-time demultiplexing.