incomplete bed output file generated from Chromap with Chip-seq data #122

Open
ming1211 opened this issue Nov 16, 2022 · 19 comments


@ming1211

I tried twice. The first time I used the raw data:

chromap --preset chip -t 24 -x chromap-tair10_all -r lower_TAIR10_Chr.all.fasta -1 CRR303826_f1.fq.gz -2 CRR303826_r2.fq.gz --trim-adapters -o CRR303826_chromap.bed
The bed output contains incomplete lines:

chr5 17410543 17410997 N 60 +
chr5 1795 N 48 -
chr1 21850 62513 N 46 -
chr1 21850 63194 N 60 +
chr1 21850 49376 N 30 -

Then I tried the command without the -t 24 parameter:

chromap --preset chip -t 24 -x chromap-tair10_all -r lower_TAIR10_Chr.all.fasta -1 CRR303826_f1.fq.gz -2 CRR303826_r2.fq.gz --trim-adapters -o CRR303826_chromap.bed
The incomplete line is as follows:

chrM 366743 366811 N 60 +
chrM 366745 366856 N 60 -
chrM 366750 366871 N 60 -
chrM 366751 366900 N 60 -
chrM 366814 366907 N 60 -
0 -
chr1 10624289 10624580 N 60 +
chr1 10624326 10624637 N 60 -
chr1 10624327 10624639 N 60 +
chr1 10624333 10624491 N 60 -
chr1 10624337 10624450 N 60 +
chr1 10624347 10624846 N 60 -

The other time I used data trimmed by Trimmomatic:
chromap --preset chip -t 24 -x chromap-tair10_all -r lower_TAIR10_Chr.all.fasta -1 CRR303826_f1-paired.fq.gz-2 CRR303826_r2-paired.fq.gz --trim-adapters -o CRR303826_chromap.bed
There are also incomplete lines in the output bed:

chr3 9598971 9599119 N 60 -
chr3 9599003 9599351 N 60 -
chr3 9599010 9599280 N 60 +
chr3 9599050 9599476 N 60 +
chr3 9599052 9599520 N 60 -
chr3 9599070 95992421945 50354 N 39 -
chr1 21945 50840 N 39 -
chr1 21945 29959 N 60 -
chr1 21945 54054 N 61 +
chr1 21945 42568 N 33 -
chr1 21945 42783 N 33 -
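
A quick way to locate malformed lines like the ones above is to scan the bed file for records that do not match the layout of the intact lines. This is only a minimal sketch and not part of Chromap: the function name find_bad_bed_lines is hypothetical, and it assumes six whitespace- or tab-separated columns (chrom, start, end, name, MAPQ, strand) with start smaller than end.

```shell
#!/bin/sh
# Hypothetical helper: print "line number: content" for every BED line that
# does not have exactly six fields, or whose start/end columns are not
# non-negative integers with start < end.
find_bad_bed_lines() {
    awk 'NF != 6 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
        printf "%d: %s\n", NR, $0
    }' "$1"
}

# Run the scan only when a file name is passed on the command line.
if [ "$#" -gt 0 ]; then
    find_bad_bed_lines "$1"
fi
```

Saved as a script, it could be run on CRR303826_chromap.bed to list the affected line numbers. Lines that are well formed but look suspicious for other reasons, such as an unusually long interval, are not reported by this check.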

@mourisl
Collaborator

mourisl commented Nov 16, 2022

Which version of Chromap are you using? There were some bugs regarding the alignment near the end of a chromosome and using a different number of threads. Many of them have been fixed recently.

@ming1211
Author

ming1211 commented Nov 16, 2022 via email

@mourisl
Collaborator

mourisl commented Nov 16, 2022

Could you please use "git pull" (or git clone) and recompile to get the most recent version of Chromap? r407 was released about half a year ago, and many bugs have been fixed since then.

The message "Didn't reach the end of sequence file, which might be corrupted!" could point to a problem in the fastq files. Could you please try something like "gzip -t" to test them? Also, this might just be a copy/paste error on GitHub, but the command "chromap --preset chip -t 24 -x chromap-tair10_all -r lower_TAIR10_Chr.all.fasta -1 CRR303826_f1-paired.fq.gz-2 CRR303826_r2-paired.fq.gz --trim-adapters -o CRR303826_chromap.bed" is missing the space before the "-2" option.

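The "gzip -t" check suggested above could be wrapped in a small loop so that every input file is tested in one go. This is a sketch under assumptions: the function name check_gz is made up for illustration, and the only real tool used is gzip with its -t (test integrity) option.

```shell
#!/bin/sh
# Hypothetical helper: test each gzip-compressed file given as an argument.
# `gzip -t` reads the whole file and fails on truncation or checksum errors.
check_gz() {
    status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "FAILED  $f"
            status=1
        fi
    done
    return "$status"
}

# Run the check only when file names are passed on the command line.
if [ "$#" -gt 0 ]; then
    check_gz "$@"
fi
```

For the runs in this thread, the arguments would be the two fastq files, for example CRR303826_f1.fq.gz and CRR303826_r2.fq.gz. A FAILED line also appears when a file is missing or unreadable, so it does not always mean corruption.
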
@ming1211
Author

ming1211 commented Nov 16, 2022 via email

@mourisl
Collaborator

mourisl commented Nov 16, 2022

What was your command for this run? Could you please check whether you have write permission in the output folder?
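
One way to check the output folder is sketched below. The function name can_write_output is hypothetical; it only tests the directory that would contain the file passed to -o, using the standard shell tests -d and -w.

```shell
#!/bin/sh
# Hypothetical helper: report whether the directory that will hold the
# output file exists and is writable by the current user.
can_write_output() {
    out_dir=$(dirname -- "$1")
    if [ -d "$out_dir" ] && [ -w "$out_dir" ]; then
        echo "writable: $out_dir"
    else
        echo "not writable: $out_dir"
        return 1
    fi
}

# Run the check only when an output path is passed on the command line.
if [ "$#" -gt 0 ]; then
    can_write_output "$1"
fi
```

Called with CRR303826_chromap.bed, it would test the current directory, since that is where a relative output name ends up.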

@ming1211
Author

chromap --preset chip -x /genome/tair_10/chromap-tair10_all -r /genome/tair_10/lower_TAIR10_Chr.all.fasta -1 CRR303826_f1-paired.fq.gz -2 CRR303826_r2-paired.fq.gz -o CRR303826_chromap.bed

That is the command I used. I'm sure I have write permission.

@mourisl
Collaborator

mourisl commented Nov 16, 2022

Could you please run "./chromap -v" to check the version and "which chromap" to make sure it is on the right path? Just to be sure, could you also give the full path to the -o option?
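
The two checks above could be scripted roughly as follows. This is a sketch: locate_tool and absolute_output are hypothetical helper names, and "command -v" is used as a portable stand-in for "which".

```shell
#!/bin/sh
# Hypothetical helper: print where a tool resolves to on PATH,
# or a note if it cannot be found.
locate_tool() {
    if tool_path=$(command -v "$1"); then
        echo "$1 -> $tool_path"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# Hypothetical helper: turn a possibly relative output file name
# into an absolute path based on the current directory.
absolute_output() {
    case "$1" in
        /*) echo "$1" ;;
        *)  echo "$(pwd)/$1" ;;
    esac
}
```

For example, "locate_tool chromap" shows which binary a job script would pick up, and the result of "absolute_output CRR303826_chromap.bed" could be passed to -o.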

@ming1211
Author

Hi, dear Mourisl, thanks for all your replies.
I retried Chromap several times, but the results are still unsatisfactory.
The Chromap version is 0.2.3-r452.

cmd="$pathTochromap -i -r /SollycM82_v1.0.fasta -o ./chromap-M82_all"
eval $cmd
cmd2="$pathTochromap --preset chip -x ./chromap-M82_all -r /SollycM82_v1.0.fasta -1 /Trimmomatic/trim_M82.fastq.gz -o ./M82_chromap.bed"
eval $cmd2

The bed output is below.
With the same trimmed fastq.gz and the same genome file, Bowtie generates useful output successfully.

qwang11@node11:~/work/chromap$ head M82_chromap.bed -n 50
chr1 96 173 N 49 +
chr1 145 221 N 58 -
chr1 308 385 N 55 +
chr1 343 419 N 36 +
chr1 432 508 N 48 +
chr1 449 525 N 40 +
chr1 461 537 N 34 +
chr1 462 538 N 34 +
chr1 1220 1296 N 51 +
chr1 1237 1313 N 59 +
chr1 1254 1330 N 53 -
chr1 1280 1356 N 42 -
chr1 1282 1358 N 42 -
chr1 1415 1491 N 44 +
chr1 1544 1620 N 32 +
chr1 1598 1674 N 50 +
chr1 1631 1707 N 36 -

Could you check whether there is something wrong with my command?
Thanks again.

@mourisl
Collaborator

mourisl commented Nov 17, 2022

Could you please share the output on the screen, such as "Mapped 500000 XXX" and the final summary "Number of mapped reads XXX, ..., Total time: XXXs"? Thank you.

@ming1211
Author

Build index for the reference.
Kmer length: 17, window size: 7
Reference file: /SollycM82_v1.0.fasta
Output file: ./chromap-M82_all
Loaded all sequences successfully in 7.81s, number of sequences: 1237, number of bases: 829074230.
Collecting minimizers.
Collected 207873466 minimizers.
Sorting minimizers.
Sorted all minimizers.
Kmer size: 17, window size: 7.
Lookup table size: 108485844, # buckets: 268435456, occurrence table size: 122021551, # singletons: 85851915.
Built index successfully in 80.08s.
[M::Statistics] kmer size: 17; skip: 7; #seq: 1237
[M::Statistics::1.785] distinct minimizers: 108485844 (79.14% are singletons); average occurrences: 1.916; average spacing: 3.988
Saved in 94.55s.
Preset parameters for ChIP-seq are used.
Start to map reads.
Parameters: error threshold: 8, min-num-seeds: 2, max-seed-frequency: 500,1000, max-num-best-mappings: 1, max-insert-size: 2000, MAPQ-threshold: 30, min-read-length: 30, bc-error-threshold: 1, bc-probability-threshold: 0.90
Number of threads: 1
Analyze bulk data.
Won't try to remove adapters on 3'.
Will remove PCR duplicates after mapping.
Will remove PCR duplicates at bulk level.
Won't allocate multi-mappings after mapping.
Only output unique mappings after mapping.
Only output mappings of which barcodes are in whitelist.
Output mappings in BED/BEDPE format.
Reference file: /SollycM82_v1.0.fasta
Index file: ./chromap-M82_all
1th read 1 file: /Trimmomatic/trim_M82.fastq.gz
Output file: ./M82_chromap.bed
Loaded all sequences successfully in 1.47s, number of sequences: 1237, number of bases: 829074230.
Kmer size: 17, window size: 7.
Lookup table size: 108485844, occurrence table size: 122021551.
Loaded index successfully in 2.33s.
Loaded 500000 reads in 0.98s.
Loaded 500000 reads in 0.97s.
Mapped in 18.89s.
Loaded 500000 reads in 0.76s.
Mapped in 18.69s.
...... (similar output)
Mapped all reads in 1066.14s.
Number of reads: 29481645.
Number of mapped reads: 28068546.
Number of uniquely mapped reads: 25152571.
Number of reads have multi-mappings: 2915975.
Number of candidates: 1783296621.
Number of mappings: 28068546.
Number of uni-mappings: 25152571.
Number of multi-mappings: 2915975.
Sorted, deduped and outputed mappings in 10.59s.
uni-mappings: 7999112, multi-mappings: 1350545, total: 9349657.
Number of output mappings (passed filters): 6892883
Total time: 1091.66s.
/var/lib/slurm-llnl/slurmd/job873972/slurm_script: line 33: 26421 Segmentation fault
chromap --preset chip -x /genome/tair_10/chromap-tair10_all -r /genome/tair_10/lower_TAIR10_Chr.all.fasta -1 CRR303826_f1-paired.fq.gz -2 CRR303826_r2-paired.fq.gz -o CRR303826_chromap.bed

@haowenz
Owner

haowenz commented Nov 17, 2022

Why did you map single-end reads?

@ming1211
Author

This is single-end data.

@haowenz
Owner

haowenz commented Nov 17, 2022

And what was the issue with the mapping output for this single-end data?

@ming1211
Author

The fourth column of the bed output is null. Is that normal?

chr1 0 76 NB501040:284:HW72FBGXG:2:13204:16534:7425 1 - 76M
chr1 9 85 NB501040:284:HW72FBGXG:4:23506:22244:13876 1 - 76M
chr1 13 89 NB501040:284:HW72FBGXG:1:13104:26062:18156 1 - 76M
chr1 14 90 NB501040:284:HW72FBGXG:2:23203:5509:14520 1 - 76M
chr1 23 99 NB501040:284:HW72FBGXG:1:12307:2963:2171 0 + 76M
chr1 25 101 NB501040:284:HW72FBGXG:3:12402:5778:12121 6 + 76M
chr1 26 102 NB501040:284:HW72FBGXG:3:23406:9910:6155 6 + 76M
chr1 26 102 NB501040:284:HW72FBGXG:3:21401:26618:20265 0 - 76M
chr1 27 103 NB501040:284:HW72FBGXG:1:23205:3962:4993 7 - 76M
chr1 31 107 NB501040:284:HW72FBGXG:4:12505:23685:9779 6 + 76M
The lines above are the bowtie2 output.

@mourisl
Collaborator

mourisl commented Nov 17, 2022

We don't record the read id in the bed format (as a way to be more efficient), so the fourth column is all N. This is normal.

@haowenz
Owner

haowenz commented Nov 17, 2022 via email

@ming1211
Author

ming1211 commented Nov 17, 2022 via email

@ming1211
Author

@haowenz @mourisl
Dear Haowen and Song,

The error below occurred after I added --SAM. Do you know what went wrong?
chromap: src/temp_mapping.h:42: void ::TempMappingFileHandle::InitializeTempMappingLoading(uint32_t) [with MappingRecord = chromap::SAMMapping; uint32_t = unsigned int]: Assertion `file != NULL' failed.
/var/lib/slurm-llnl/slurmd/job874320/slurm_script: line 33: 23443 Aborted

The command is:
Chromap --preset chip -x ./chromap-M82_all -r /M82_ref/SollycM82_v1.0.fasta -1 /trim_001.fastq.gz --SAM -o ./M82_chromap.sam

@haowenz
Owner

haowenz commented Nov 18, 2022

Can you post the full log? It is hard to know what happened with just one line.
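
One way to capture the full log for a run is sketched below. It is only an illustration: run_logged is a hypothetical wrapper, and it assumes the mappings go to the file given with -o, so that everything the program prints to the terminal can be redirected into a single log file. The exit status is appended because crashes such as a segmentation fault or a failed assertion show up there.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, write everything it prints
# (stdout and stderr) to a log file, and append its exit status.
# Usage: run_logged LOG_FILE COMMAND [ARGS...]
run_logged() {
    log_file=$1
    shift
    status=0
    "$@" > "$log_file" 2>&1 || status=$?
    echo "exit status: $status" >> "$log_file"
    return "$status"
}
```

For example, the mapping command from the previous comment could be passed as the arguments after a log file name such as chromap_run.log, and that file could then be attached to the issue.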
