Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Last record missing in barcode corrected BAM file #33

Open
hukai916 opened this issue Dec 24, 2021 · 8 comments
Open

Last record missing in barcode corrected BAM file #33

hukai916 opened this issue Dec 24, 2021 · 8 comments
Assignees
Labels

Comments

@hukai916
Copy link

Hi developers,

I encountered a mysterious issue when using Pheniqs for barcode correction.

Basically, when performing the correction, I first prepare a CRAM file containing barcodes to be corrected and a json file according to the tutorial, then run Pheniqs. However, the resulting BAM files is one record less than the CRAM, and the missing one seems to be always the last record in the CRAM.

This happens only occasionally. For example, I run the same code for 6 times and capture one (bam3):

$ samtools view -c pbmc_500_10p_2_aa.corrected.bam
450000
$ samtools view -c pbmc_500_10p_2_aa.corrected.bam1
450000
$ samtools view -c pbmc_500_10p_2_aa.corrected.bam2
450000
$ samtools view -c pbmc_500_10p_2_aa.corrected.bam3
449999
$ samtools view -c pbmc_500_10p_2_aa.corrected.bam4
450000
$ samtools view -c pbmc_500_10p_2_aa.corrected.bam5
450000

Not a big matter, but still annoying, I can provide more info if you would like to look into it.

Thanks,

--Kai

@moonwatcher moonwatcher self-assigned this Dec 25, 2021
@moonwatcher
Copy link
Contributor

Ill need more details. but is it possible one read fails decoding and the filter outgoing qc fail or filter incoming qc fail or is on?

@hukai916
Copy link
Author

What info do you need? I am attaching my test data and code here:
https://www.dropbox.com/sh/6idqunobck72q5j/AAClfpkwhusMVDPNx6HU7wSYa?dl=0

Note that bam2 is missing the last record where bam1 and bam3 are correct.
Let me know if you need other info. Thanks!

@moonwatcher
Copy link
Contributor

@hukai916 so all bam files in the dropbox folder are outputs from the cram file? Can you please post the json config file you use and which version of pheniqs you are using?

log_decode2.txt seems to be the one showing a missing record. 450000 are reported in incoming and only 449999 are reported in sample. being the last record does point to some thread synchronization issue but obviously an extremely rare one since I have never encountered it. How many cores is this running on?

@hukai916
Copy link
Author

hukai916 commented Dec 26, 2021

Hi, all bams are created from the same cram using the same pheniqs command. The missing-last-record cases occur occasionally, roughly 10%-20%, no matter how many cores I use. I will give more info after the holidays.

@moonwatcher
Copy link
Contributor

Can you please post pbmc_500_5p_3_aa.json and I will try and replicate it.

L.

@hukai916
Copy link
Author

Hi L.,
I have uploaded the json file to the same folder.
The Pheniqs I used is: pheniqs version 2.1.0 @moonwatcher

--Kai

@moonwatcher
Copy link
Contributor

moonwatcher commented Jan 10, 2022

Hello Kai

Just a quick update., I have been trying to reproduce this but have so far been unsuccessful. I tried both with a build of pheniqs 2.1.0 and with head.

A side comment: using an extension of ".bam1" pheniqs does not detect that you actually wanted bam encoding as so it will revert to the default SAM output. This is why the files you provided are bigger than I expected. They are actually uncompressed, simple text in SAM format. You can override the default output format with --format bam.

Are you observing this on MacOS or Linux? What exact platform? might also be useful to know what dependencies. Did you build the binary yourself or installed it from Conda?

It is possible this is related to one of the dependencies. Since this is so tricky to replicate, will you be willing to test on custom build? I can provide instructions if you are interested.

@hukai916
Copy link
Author

hukai916 commented Jan 11, 2022

Thanks L.,

I installed Pheniqs with conda on a Ubuntu Xenial container (Docker). The container contains solely miniconda, Pheniqs, pysam, and SAMTools. The container can be pulled by:

docker pull hukai916/pheniqs_xenial:0.2

I can perform more tests, pls share instructions. Thanks!

--Kai

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants