Strange read-splitting behaviour #184

Open
apredeus opened this issue Mar 11, 2024 · 3 comments
Comments

@apredeus

Dear bamtofastq developer team,

I recently came across a very interesting behaviour. I am trying to reprocess a public dataset that consists of 22 10x GEX runs (I've checked and I'm pretty positive that none of those are ATAC etc). Here is the link to the dataset:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE138669

SRA has failed to recognise the "technical" R1 read, so they have made the submitter's 10x BAM files available. However, upon running the latest (v1.4.1) bamtofastq (each job has completed successfully etc), I have discovered that samples were split into two big groups. Group 1 (GSM4115877-GSM4115889) has generated normal index, technical R1 (26 bp), and biological R2 (98 bp). However, Group 2 (GSM4115868-GSM4115876) has generated 4 reads: index, R1 which is a biological read (98 bp), R2 containing cell barcode (16 bp), and R3 containing UMI (10 bp).

GSM4115868 I1 R1 R2 R3
GSM4115869 I1 R1 R2 R3
GSM4115870 I1 R1 R2 R3
GSM4115871 I1 R1 R2 R3
GSM4115872 I1 R1 R2 R3
GSM4115873 I1 R1 R2 R3
GSM4115874 I1 R1 R2 R3
GSM4115875 I1 R1 R2 R3
GSM4115876 I1 R1 R2 R3
GSM4115877 I1 R1 R2
GSM4115878 I1 R1 R2
GSM4115879 I1 R1 R2
GSM4115880 I1 R1 R2
GSM4115881 I1 R1 R2
GSM4115882 I1 R1 R2
GSM4115883 I1 R1 R2
GSM4115884 I1 R1 R2
GSM4115885 I1 R1 R2
GSM4115886 I1 R1 R2
GSM4115887 I1 R1 R2
GSM4115888 I1 R1 R2
GSM4115889 I1 R1 R2

All BAM tags/headers appear to be the same, and the files even appear to have been made by the same version of Cell Ranger (v3, I think).

SRR10254548.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY
SRR10254549.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY xf
..............
SRR10254569.bam AS BC CB CR CY HI li NH nM QT RE RG UB UR UY xf
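
A tag inventory like the one above can be collected in several ways. The following is only a minimal sketch, assuming the BAM has first been converted to SAM text (for example with `samtools view`); the function name `sam_tags` and the `limit` parameter are hypothetical and not part of bamtofastq or Cell Ranger.

```python
def sam_tags(lines, limit=100000):
    """Return the set of optional tag names found in SAM-formatted text lines.

    `lines` is any iterable of SAM records (e.g. sys.stdin fed by
    `samtools view`). Only the first `limit` alignment records are inspected.
    """
    tags = set()
    seen = 0
    for line in lines:
        if line.startswith("@"):
            # Header lines carry no per-read tags; skip them.
            continue
        fields = line.rstrip("\n").split("\t")
        # The first 11 SAM columns are mandatory; optional tags follow
        # in the form TAG:TYPE:VALUE.
        for field in fields[11:]:
            tags.add(field.split(":", 1)[0])
        seen += 1
        if seen >= limit:
            break
    return tags
```

Called as `sam_tags(sys.stdin)` in a small script that reads the output of `samtools view`, this gives one tag set per file, which makes differences such as a missing `xf` tag easy to spot.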

Do you know what is causing it, and how I can fix it?

For your convenience, here are (NCBI) links to an "offending" and a "normal-behaving" BAM file:

bad BAM: https://sra-pub-src-2.s3.amazonaws.com/SRR10254550/SC4possorted_genome_bam.bam.1
good BAM: https://sra-pub-src-2.s3.amazonaws.com/SRR10254567/SC185possorted_genome_bam.bam.1

Thank you in advance!

-- Alex

@apredeus
Author

OK I realized now that those are samples done with v1 chemistry. Is there a way to run bamtofastq to produce a normal pair of R1/R2 files, or do I have to combine them using some custom script?

Thank you in advance!

@mortunco

Hi. I have the same issue with a Perturb-seq dataset (Dixit et al.). Thanks in advance.

@RickyLau0910

Hi, here is guidance from 10x Genomics on "How to format v1 chemistry datasets to work with current Cell Ranger versions?", which I found helpful.
https://kb.10xgenomics.com/hc/en-us/articles/360043386291-How-to-format-v1-chemistry-datasets-to-work-with-current-Cell-Ranger-versions
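
For the custom-script route asked about earlier in the thread, a rough sketch might look like the following. It assumes the layout described in the original report (one FASTQ with the 16 bp cell barcode, one with the 10 bp UMI, records in the same order) and simply concatenates them into a 26 bp read; the biological read file would then be used as R2. All function names here are hypothetical, and whether the result is accepted by a given Cell Ranger version has not been verified, so the linked 10x article remains the reference.

```python
import gzip
from itertools import zip_longest


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples from an open text FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def merge_barcode_umi(barcode_records, umi_records):
    """Yield synthetic R1 records: barcode sequence followed by UMI sequence.

    Both inputs are iterables of (header, sequence, quality) tuples that
    are assumed to be in the same read order.
    """
    for bc, umi in zip_longest(barcode_records, umi_records):
        if bc is None or umi is None:
            raise ValueError("barcode and UMI inputs have different lengths")
        # Compare only the read name, not the trailing read-number comment.
        if bc[0].split()[0] != umi[0].split()[0]:
            raise ValueError(f"read name mismatch: {bc[0]} vs {umi[0]}")
        # The barcode record's header is reused unchanged for the merged read.
        yield bc[0], bc[1] + umi[1], bc[2] + umi[2]


def write_merged(barcode_path, umi_path, out_path):
    """Write the merged barcode+UMI reads to a gzipped FASTQ file."""
    with gzip.open(barcode_path, "rt") as bc_fh, \
            gzip.open(umi_path, "rt") as umi_fh, \
            gzip.open(out_path, "wt") as out:
        merged = merge_barcode_umi(read_fastq(bc_fh), read_fastq(umi_fh))
        for header, seq, qual in merged:
            out.write(f"{header}\n{seq}\n+\n{qual}\n")
```

The name check is deliberately strict: if the two files are not in the same order, the script stops rather than silently pairing a barcode with the wrong UMI.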
