Questions about genotype phasing for each SNV in each cell #75

Open
Li-Chengyu opened this issue Sep 10, 2024 · 2 comments

Comments

@Li-Chengyu

Hi Dr. Dou,

Thank you for developing Monopogen, a great tool for detecting both germline and somatic SNVs in single-cell sequencing data. We found it very efficient at calling somatic mutations in human brain snATAC-seq data, and we plan to modify the Monopogen scripts to better suit our own data. While studying the scripts, I ran into a few points that I found hard to understand.

  1. In the script somatic.py, line 298: mat=mat.groupby(by=['snvIndex','cellIndex'], as_index=False).first()

    You keep only the first allele record when scanning read coverage for each SNV in each cell, given the widespread allelic dropout in single-cell sequencing data. In our snATAC-seq data, however, 8% of SNVs are still covered by both reference and alternative reads within a single cell. Is this proportion small enough to ignore in the downstream analysis, or should we assign the value 1 to an SNV in a cell when both reference and alternative reads are observed?

  2. In the script somatic.py, lines 309 to 337:

    You phase the genotype for each germline SNV in each cell, but why is the phased genotype flipped when only reference reads (value 0) are observed in the cell? As I understand it, a germline SNV should have the same phased genotype in every cell.
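To make question 1 concrete, here is a minimal pandas sketch of what the quoted groupby(...).first() call does to per-read records, next to one possible alternative that counts reference and alternative reads separately. The toy table and the `allele`, `ref`, `alt`, and `both` column names are assumptions for illustration only; only `snvIndex` and `cellIndex` come from the quoted line, and this is not Monopogen's actual data layout.

```python
import pandas as pd

# Hypothetical per-read records: one row per read covering an SNV in a cell.
# allele: 0 = reference read, 1 = alternative read (assumed encoding).
reads = pd.DataFrame({
    "snvIndex":  [0, 0, 0, 1, 1],
    "cellIndex": [0, 0, 1, 0, 0],
    "allele":    [0, 1, 1, 0, 0],
})

# Behaviour of the quoted line: keep only the first record per (SNV, cell).
# The alternative read at (snvIndex=0, cellIndex=0) is silently dropped.
first_only = reads.groupby(by=["snvIndex", "cellIndex"], as_index=False).first()

# Alternative sketch: count reference and alternative reads per (SNV, cell).
counts = (
    reads.assign(
        ref=(reads["allele"] == 0).astype(int),
        alt=(reads["allele"] == 1).astype(int),
    )
    .groupby(by=["snvIndex", "cellIndex"], as_index=False)[["ref", "alt"]]
    .sum()
)

# Flag (SNV, cell) pairs covered by both alleles and measure their share.
counts["both"] = (counts["ref"] > 0) & (counts["alt"] > 0)
frac_both = counts["both"].mean()
```

Computing `frac_both` on real data is one way to check how often both alleles are seen in the same cell before deciding whether the first-record shortcut is acceptable.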

Looking forward to your reply!

Sincerely,
Chengyu

@jinzhuangdou
Collaborator

jinzhuangdou commented Sep 14, 2024

  1. We usually observe only one allele per cell. If 8% of the SNVs in your data are covered by both alleles, you may keep both when converting the BAM files to matrices. We will upgrade this function in the future.
  2. Yes, the phased genotype is the same across cells for a given SNV. In the phase_info element, x|x, the left value denotes the number of reads supporting the reference allele and the right value the number supporting the alternative allele. If the phased genotype is 1(alt)|0(ref) and you observe one reference allele in a cell, it can be written as 0 (alt allele number)|1 (ref allele number).
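One way to read the example in point 2 is that the per-cell value keeps the haplotype order of the phased genotype and replaces each allele with the number of reads observed for it. The sketch below illustrates that reading only; the function name `haplotype_read_counts` and its parameters are hypothetical, and it is not taken from somatic.py.

```python
def haplotype_read_counts(phased_gt: str, n_ref: int, n_alt: int) -> str:
    """Return per-haplotype read counts as 'left|right' for one cell.

    phased_gt is the germline phased genotype of a heterozygous SNV,
    '0|1' or '1|0' (0 = reference allele, 1 = alternative allele).
    n_ref and n_alt are the read counts observed in the cell.
    """
    left, right = phased_gt.split("|")
    if {left, right} != {"0", "1"}:
        raise ValueError("expected a heterozygous phased genotype: '0|1' or '1|0'")
    # Map each allele on each haplotype to the reads supporting that allele.
    reads_by_allele = {"0": n_ref, "1": n_alt}
    return f"{reads_by_allele[left]}|{reads_by_allele[right]}"


# Example from the reply: genotype 1(alt)|0(ref), one reference read observed.
example = haplotype_read_counts("1|0", n_ref=1, n_alt=0)
```

Under this reading the germline genotype stays fixed across cells; only the read counts attached to each haplotype change, which can look like a flip when a cell carries only reference reads.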

@jinzhuangdou
Collaborator

We have updated the version to identify the read depth for both reference and alternative alleles.
