[BUG] Support for combinatorial barcode indexing (like SHARE) not present #156

Open
emattei opened this issue Mar 25, 2024 · 3 comments
Labels
bug Something isn't working

Comments

emattei commented Mar 25, 2024

Hi,
I am interested in using chromap to process SHARE-seq data. The barcodes come from three rounds of splitting and pooling.
These three barcodes (8 bp each) should be corrected individually, allowing 1 mismatch per barcode, but chromap requires a whitelist of 7M barcodes, that is, the Cartesian product of R1, R2, and R3. This is not correct, because it allows 1 mismatch across the full 24 bp instead of 1 mismatch in each 8 bp round barcode.

The README states: "This option also supports combinatorial barcoding, such as SHARE-seq."
Is combinatorial barcoding actually supported, with only the way to pass the whitelist left undocumented?

Thank you

emattei added the bug label Mar 25, 2024

Collaborator

mourisl commented Mar 26, 2024

Do you have a whitelist for the 8 bp barcodes? There can be at most 4^8 = 65536 entries in such a whitelist, so conflicts between barcodes could arise easily. I think the best approach for now is to run Chromap without a whitelist and correct the barcodes afterwards, either by identifying the real barcodes from their abundances or by filtering out barcodes with too few reads.
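
The abundance-based filtering suggested above could look roughly like the following. This is only a sketch: the function name `abundant_barcodes`, the `min_reads` threshold, and the toy input are hypothetical, and it assumes the uncorrected barcodes have already been pulled out of the Chromap output into a plain list of strings.

```python
from collections import Counter


def abundant_barcodes(barcodes, min_reads=100):
    """Return the set of barcodes observed at least `min_reads` times.

    `min_reads` is a hypothetical threshold; a real cutoff would have to be
    chosen from the data (for example from the read-count distribution).
    """
    counts = Counter(barcodes)
    return {bc for bc, n in counts.items() if n >= min_reads}


# Toy input: two well-supported barcodes and one likely sequencing error.
observed = ["ACGTACGT"] * 5 + ["ACGTACGA"] * 1 + ["TTTTCCCC"] * 4
kept = abundant_barcodes(observed, min_reads=3)
```

Barcodes that fall below the threshold could then be discarded or mapped to their closest well-supported neighbor in a separate step.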

Author

emattei commented Mar 26, 2024

Yes, I have a barcode whitelist, and it contains 192 barcodes. There are no conflicts, and all the barcodes are a Hamming distance of 3 away from each other. The problem is that if I pass a read format such as "bc:65:72,bc:103:110,bc:141:148,r1:0:-1,r2:0:49", which specifies the three barcodes, chromap expects the whitelist to contain 24 bp barcodes and corrects using a distance of 1 or 2, which is not the correct approach. Each 8 bp barcode should be corrected independently against the 192 possibilities.

It seems like you are confirming that chromap doesn't support correction for combinatorial barcoding. Is that a correct statement?
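
For illustration, the segment-wise correction requested here could be sketched as below. This is not Chromap code. All function names are hypothetical, the three-entry whitelist is a toy stand-in for the real 192-barcode list, and the sketch assumes that the `bc:start:end` coordinates are 0-based and inclusive (which matches 8 bp for `65:72`).

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def extract_segments(read, ranges):
    """Slice barcode segments out of a read.

    ASSUMPTION: each (start, end) pair is 0-based and inclusive,
    so (65, 72) yields an 8 bp segment.
    """
    return [read[start:end + 1] for start, end in ranges]


def correct_segment(segment, whitelist, max_mismatches=1):
    """Return the unique whitelist barcode within `max_mismatches`, else None."""
    if segment in whitelist:
        return segment
    hits = [w for w in whitelist
            if len(w) == len(segment) and hamming(segment, w) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def correct_combinatorial(segments, whitelist, max_mismatches=1):
    """Correct each round's segment independently, then concatenate."""
    corrected = [correct_segment(s, whitelist, max_mismatches) for s in segments]
    return None if None in corrected else "".join(corrected)


# Toy whitelist (hypothetical); the real one would hold the 192 round barcodes.
ROUND_WHITELIST = {"AAAAAAAA", "CCCCCCCC", "GGGGTTTT"}
BC_RANGES = [(65, 72), (103, 110), (141, 148)]

# Synthetic read with one sequencing error in each of the first two segments.
read = "N" * 65 + "AAAAAAAT" + "N" * 30 + "CCCCGCCC" + "N" * 30 + "GGGGTTTT"
segments = extract_segments(read, BC_RANGES)
cell_barcode = correct_combinatorial(segments, ROUND_WHITELIST)
```

Because each segment is matched against the small per-round whitelist, a read with one error in each of two rounds is still recovered, and the lookup never needs the full Cartesian product.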

Collaborator

mourisl commented Mar 26, 2024

Right. The current version of Chromap concatenates the barcode segments first and then performs error correction. We can add support for segment-wise error correction in a future version.

I think 1 correction across the 24 bp can still resolve most barcode sequencing errors.
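
A toy model of the concatenate-then-correct behavior described above shows where it differs from per-segment correction. This is a sketch under stated assumptions, not Chromap's implementation: `correct_concatenated` is a hypothetical helper, the three-entry round whitelist is made up, and a single allowed mismatch over the whole 24 bp is assumed.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_concatenated(barcode, full_whitelist, max_mismatches=1):
    """Correct a concatenated barcode against the full product whitelist."""
    if barcode in full_whitelist:
        return barcode
    hits = [w for w in full_whitelist if hamming(barcode, w) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


# Toy round whitelist (hypothetical); three rounds give 3**3 = 27 combinations.
round_whitelist = ["AAAAAAAA", "CCCCCCCC", "GGGGTTTT"]
full_whitelist = {"".join(p) for p in product(round_whitelist, repeat=3)}

# One error in total: recovered by whole-barcode correction.
one_error = "AAAAAAAT" + "CCCCCCCC" + "GGGGTTTT"
# One error in each of two rounds: two mismatches over 24 bp, so not recovered.
two_errors = "AAAAAAAT" + "CCCCGCCC" + "GGGGTTTT"

recovered = correct_concatenated(one_error, full_whitelist)
missed = correct_concatenated(two_errors, full_whitelist)

# With the 192-barcode whitelist from this thread, the product list has
# 192**3 entries, which is the roughly 7M figure mentioned in the issue.
full_size = 192 ** 3
```

Reads with a single error anywhere in the 24 bp are rescued either way; the reads lost under this model are those with errors in two or more rounds, which per-segment correction would still resolve.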
