Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect peakcalling #7

Closed
kevinmuret opened this issue Jan 11, 2022 · 6 comments
Closed

Incorrect peakcalling #7

kevinmuret opened this issue Jan 11, 2022 · 6 comments
Assignees

Comments

@kevinmuret
Copy link

Hello,

Another big problem when using f-seq2:

  • It calls peaks where there are no reads
  • It does not call peaks where it should
  • It only calls on half of each chromosome

Are these known problems?
When I use f-seq1, the calls are done correctly.

  • Half of the chromosome not processed by fseq2 :
    image

  • Abnormal peaks on fseq2:
    image

Thanks,

Kévin

@nsamzhao
Copy link
Collaborator

Hi Kévin,
I suspect there are some errors when reading in files to memory. Could you show the log information for this run?

Sam

@nsamzhao
Copy link
Collaborator

Another potential reason: did you specify chrom sizes in this run? It can limit what reads to use for peak calling.
See here https://github.com/Boyle-Lab/F-Seq2#-chrom_size_file

@kevinmuret
Copy link
Author

Hi Sam,

Here the command-line :

  • fseq2 callpeak test.bed -o fseq_results -name tot -l 50 -t 8.0 -v -cpus 12 -pe

Here the .o file :
load module python/3.8.11 (Python)
load module flavor/hdf5/serial (HDF5 flavor)
load module hdf5/1.10.6 (HDF5)
load module h5py/3.5.0 (h5py)
load module bedtools/2.30.0 (bedtools)
load module pysam/0.18.0 (pysam)
load module pybedtools/0.8.2 (pybedtools)
load module f-seq2/2.0.3 (f-seq2)

Loading f-seq2/2.0.3
Loading requirement: flavor/hdf5/serial hdf5/1.10.6 h5py/3.5.0 bedtools/2.30.0
pysam/0.18.0 pybedtools/0.8.2
PROGRAM: fseq2.0.3 callpeak

F-Seq Version 2.0.3

#1: Read in files and calculate parameters
#1: Done
#1: Settings:

bandwidth = 8.333
threshold = 3.195
lambda bg lower bound = 0.000
est. fragment size = 0
avg. pe fragment size before filter = 73968668.585
avg. pe fragment size after filter = 73968668.585
filter out rate  = 0.000
sequence length = 3011413
total cuts = 14096429

#2: Reconstruct signal and call peaks

chr13_random: first=783, last=200160, completed in 0.115 seconds.
chr17_random: first=809, last=314264, completed in 0.105 seconds.
chr1_random: first=785, last=615495, completed in 0.168 seconds.

no peaks find on chr16_random
chr19: first=1500110, last=30671234, completed in 6.738 seconds.
chr18: first=1500052, last=45385619, completed in 10.716 seconds.
chr17: first=1500038, last=47636335, completed in 11.298 seconds.
chr16: first=1500161, last=49159477, completed in 11.324 seconds.
chr3_random: first=588, last=20939, completed in 0.519 seconds.
chr4_random: first=30, last=80090, completed in 2.194 seconds.
chr15: first=1500110, last=51747437, completed in 13.567 seconds.
chr12: first=1500469, last=60628683, completed in 15.581 seconds.
chr11: first=1500003, last=60920994, completed in 16.545 seconds.
chr13: first=1500257, last=60142105, completed in 16.506 seconds.
chr14: first=1500160, last=62596495, completed in 16.488 seconds.
chr10: first=1500323, last=64996577, completed in 16.590 seconds.
chrM: first=4, last=8169, completed in 0.061 seconds.
chr8_random: first=132, last=424597, completed in 0.094 seconds.
chr9_random: first=768, last=224671, completed in 0.080 seconds.
chrX_random: first=550, last=892522, completed in 0.174 seconds.
chrY: first=1406, last=1449883, completed in 0.268 seconds.
chrUn_random: first=93, last=2950171, completed in 0.561 seconds.
chr2: first=1500801, last=90873982, completed in 21.303 seconds.
chr1: first=1500035, last=98597719, completed in 21.875 seconds.
chrY_random: first=167, last=29339847, completed in 5.299 seconds.
chr3: first=1500151, last=79799354, completed in 17.112 seconds.
chr5: first=1500572, last=76268628, completed in 17.800 seconds.
chr4: first=1501363, last=77814893, completed in 20.757 seconds.
chr6: first=1500395, last=74758302, completed in 18.508 seconds.
chr9: first=1500006, last=62037989, completed in 15.630 seconds.
no peaks find on chr7_random
chr8: first=1501670, last=65868849, completed in 16.535 seconds.
no peaks find on chr5_random
chr7: first=1500956, last=76262291, completed in 18.464 seconds.
chrX: first=1500139, last=83325004, completed in 16.520 seconds.

#2: Compute p-value and q-value
#2: Done

#3: Write output
#3: Done

Thanks for using F-seq2.0.3!

Here the .e file :
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:57: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
bed['end_new'] = (bed[['end', 'strand']]).max(axis=1)
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:530: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
peak_regions = pd.concat([pd.Series([chrom] * peak_regions.shape[0], name='chrom'),
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:532: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
peak_indexes = pd.concat([pd.Series([chrom] * peak_indexes.shape[0], name='chrom'),
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:532: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
peak_indexes = pd.concat([pd.Series([chrom] * peak_indexes.shape[0], name='chrom'),
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:532: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
peak_indexes = pd.concat([pd.Series([chrom] * peak_indexes.shape[0], name='chrom'),
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:906: RuntimeWarning: divide by zero encountered in log10
result_df['q_value'] = -np.log10(multipletests(pvals=result_df['p_value'], method='fdr_bh', is_sorted=True)[1])

@kevinmuret
Copy link
Author

I try also on the smallest chromosome and always bad peakcalling (same as the total genome) and always half of the chromosome ... very strange

image

@nsamzhao
Copy link
Collaborator

Is your data paired-end? If so, can you make sure it is in bedpe format?
F-Seq2 doc: https://github.com/Boyle-Lab/F-Seq2/blob/master/INPUT_FORMAT.md#bampe-and-bedpe
bedpe format: https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format
If not paired-end data, run without -pe

@kevinmuret
Copy link
Author

Yes ! It works ! Sorry I thought the BEDPE format was similar to the BED format! Thanks !

@nsamzhao nsamzhao closed this as completed Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants