Dear bamtofastq developer team,
I recently came across a very interesting behaviour. I am trying to reprocess a public dataset that consists of 22 10x GEX runs (I've checked, and I'm fairly sure that none of them are ATAC etc.). Here is the link to the dataset:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138669
SRA failed to recognise the "technical" R1 read, so they have made the submitter's 10x BAM files available. However, after running the latest (v1.4.1) bamtofastq (every job completed successfully), I found that the samples split into two big groups. Group 1 (GSM4115877-GSM4115889) produced the normal output: an index read, a technical R1 (26 bp), and a biological R2 (98 bp). Group 2 (GSM4115868-GSM4115876), however, produced 4 reads: an index read, R1 which is the biological read (98 bp), R2 containing the cell barcode (16 bp), and R3 containing the UMI (10 bp).
All BAM tags/headers appear to be the same, and the files even seem to have been made by the same version of Cell Ranger (v3, I think).
SRR10254548.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY
SRR10254549.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY xf
..............
SRR10254569.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY xf
Do you know what is causing this, and how can I fix it?
For your convenience, here are (NCBI) links to an "offending" and a "normal-behaving" BAM file:
bad BAM: https://sra-pub-src-2.s3.amazonaws.com/SRR10254550/SC4possorted_genome_bam.bam.1
good BAM: https://sra-pub-src-2.s3.amazonaws.com/SRR10254567/SC185possorted_genome_bam.bam.1
Thank you in advance!
-- Alex
OK, I have now realized that those samples were done with v1 chemistry. Is there a way to run bamtofastq so that it produces a normal pair of R1/R2 files, or do I have to combine them using some custom script?
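In case a custom script does turn out to be necessary for the v1 chemistry samples, here is a minimal sketch of one way to do the combining. It assumes that the barcode FASTQ and the UMI FASTQ are gzipped, list the same reads in the same order, and that simply concatenating barcode (16 bp) and UMI (10 bp) gives the 26 bp technical read layout seen in Group 1. The function names and file paths are hypothetical and not part of bamtofastq, and whether Cell Ranger accepts the merged file has not been tested.

```python
import gzip
from itertools import zip_longest


def read_fastq(path):
    """Yield (name, sequence, quality) tuples from a gzipped FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                return  # end of file
            seq = handle.readline().rstrip("\n")
            handle.readline()  # skip the '+' separator line
            qual = handle.readline().rstrip("\n")
            # Keep only the read name, dropping '@' and any trailing comment.
            yield header[1:].split()[0], seq, qual


def merge_barcode_umi(barcode_fq, umi_fq, out_fq):
    """Write one read per input pair: barcode followed by UMI.

    Assumption: both inputs hold the same reads in the same order.
    Returns the number of reads written.
    """
    n = 0
    with gzip.open(out_fq, "wt") as out:
        pairs = zip_longest(read_fastq(barcode_fq), read_fastq(umi_fq))
        for bc, umi in pairs:
            if bc is None or umi is None:
                raise ValueError("input files have different numbers of reads")
            if bc[0] != umi[0]:
                raise ValueError(f"read name mismatch: {bc[0]} vs {umi[0]}")
            # Concatenate sequences and qualities in the same order.
            out.write(f"@{bc[0]}\n{bc[1]}{umi[1]}\n+\n{bc[2]}{umi[2]}\n")
            n += 1
    return n
```

A call would look like merge_barcode_umi("sample_R2.fastq.gz", "sample_R3.fastq.gz", "sample_merged_R1.fastq.gz"), with the real bamtofastq output names substituted for these placeholder paths. The biological read file would then still need to be renamed to R2 to match the merged technical read.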