Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

valid read pairs is too few compared with other tools #340

Open
shiyi-pan opened this issue Jan 8, 2021 · 4 comments
Open

valid read pairs is too few compared with other tools #340

shiyi-pan opened this issue Jan 8, 2021 · 4 comments

Comments

@shiyi-pan
Copy link

Hi, I want to use TADbit to detect TADs, the TADbit detect 25,521,413 valid read pairs only ,but other tools such as HiCExplorer could detect 58949201 read pairs using the same data, I think maybe my code has some problem, could you give me some advise? I ues GEM as mapper and here is my code, thank you very much:

from pytadbit.mapping.full_mapper import full_mapping
species = 'soybean'

for the first side of the reads

full_mapping(mapper_index_path='NN1138-2.v1.0.genome.gem',
out_map_dir='results/fragment/{0}/01_mapping/mapped_{0}_r1/'.format(species),
fastq_path='soybean_R1.fastq.gz',
r_enz='MboI', frag_map=True, clean=True, nthreads=8,
temp_dir='results/fragment/{0}/01_mapping/mapped_r1_tmp/'.format(species))

@david-castillo
Copy link
Contributor

david-castillo commented Jan 10, 2021

Hi,

The number of reads that you get with any pipeline depends on several factors but mainly on two: the mismatch you allow in the mapping of the reads and the filters you applied to the mapped reads.
By default we use GEM, but even different versions of GEM have different default mismatches. Version 2 is about 4% or 6% if I remember correctly, other mappers allow higher mismatches like 15%. We also support bowtie2 and hisat2 and those will also produce a different number of reads depending on the parameters you use. By default we use the --very-sensitive option.
Then you have the filters to discard what you consider non-informative or errors. Each pipeline has his own (although conceptually similar) filters.

Regards

David

@shiyi-pan
Copy link
Author

Thank you for your reply, David. I will try other mappers like bowtie2 and hisat2 . I don't know clearly how to use bowtie2 into the TADbit pipeline . Is it just mapping Hi-C reads with bowtie2 and parse the bam file with parse_map function or not ?

@david-castillo
Copy link
Contributor

You can use the same code that you already have but specify another mapper in the full_mapper function (like mapper='bowtie2'). Be sure you input the right mapper_index_path because for bowtie2 is not a single file, you have to specify the path and the prefix to the index.

http://3dgenomes.github.io/TADbit/reference/reference_mapping.html#mapping

You can also customize bowtie2 parameters with the mapper_params parameter if you need.

After that the parse is exactly the same.

Regards

David

@shiyi-pan
Copy link
Author

Thanks a lot , David.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants