Incorrect peakcalling #7

kevinmuret · 2022-01-11T22:07:03Z

Hello,

Another big problem when using f-seq2:

It calls peaks where there are no reads
It does not call peaks where it should
It only calls on half of each chromosome

Are these known problems?
When I use f-seq1, the calls are done correctly.

Half of the chromosome not processed by fseq2 :
Abnormal peaks on fseq2:

Thanks,

Kévin

nsamzhao · 2022-01-11T22:23:16Z

Hi Kévin,
I suspect there are some errors when reading in files to memory. Could you show the log information for this run?

Sam

nsamzhao · 2022-01-12T18:32:37Z

Another potential reason: did you specify chrom sizes in this run? It can limit what reads to use for peak calling.
See here https://github.com/Boyle-Lab/F-Seq2#-chrom_size_file

kevinmuret · 2022-01-13T11:45:16Z

Hi Sam,

Here the command-line :

fseq2 callpeak test.bed -o fseq_results -name tot -l 50 -t 8.0 -v -cpus 12 -pe

Here the .o file :
load module python/3.8.11 (Python)
load module flavor/hdf5/serial (HDF5 flavor)
load module hdf5/1.10.6 (HDF5)
load module h5py/3.5.0 (h5py)
load module bedtools/2.30.0 (bedtools)
load module pysam/0.18.0 (pysam)
load module pybedtools/0.8.2 (pybedtools)
load module f-seq2/2.0.3 (f-seq2)

Loading f-seq2/2.0.3
Loading requirement: flavor/hdf5/serial hdf5/1.10.6 h5py/3.5.0 bedtools/2.30.0
pysam/0.18.0 pybedtools/0.8.2
PROGRAM: fseq2.0.3 callpeak

F-Seq Version 2.0.3

#1: Read in files and calculate parameters
#1: Done
#1: Settings:

bandwidth = 8.333
threshold = 3.195
lambda bg lower bound = 0.000
est. fragment size = 0
avg. pe fragment size before filter = 73968668.585
avg. pe fragment size after filter = 73968668.585
filter out rate  = 0.000
sequence length = 3011413
total cuts = 14096429

#2: Reconstruct signal and call peaks

chr13_random: first=783, last=200160, completed in 0.115 seconds.
chr17_random: first=809, last=314264, completed in 0.105 seconds.
chr1_random: first=785, last=615495, completed in 0.168 seconds.

no peaks find on chr16_random
chr19: first=1500110, last=30671234, completed in 6.738 seconds.
chr18: first=1500052, last=45385619, completed in 10.716 seconds.
chr17: first=1500038, last=47636335, completed in 11.298 seconds.
chr16: first=1500161, last=49159477, completed in 11.324 seconds.
chr3_random: first=588, last=20939, completed in 0.519 seconds.
chr4_random: first=30, last=80090, completed in 2.194 seconds.
chr15: first=1500110, last=51747437, completed in 13.567 seconds.
chr12: first=1500469, last=60628683, completed in 15.581 seconds.
chr11: first=1500003, last=60920994, completed in 16.545 seconds.
chr13: first=1500257, last=60142105, completed in 16.506 seconds.
chr14: first=1500160, last=62596495, completed in 16.488 seconds.
chr10: first=1500323, last=64996577, completed in 16.590 seconds.
chrM: first=4, last=8169, completed in 0.061 seconds.
chr8_random: first=132, last=424597, completed in 0.094 seconds.
chr9_random: first=768, last=224671, completed in 0.080 seconds.
chrX_random: first=550, last=892522, completed in 0.174 seconds.
chrY: first=1406, last=1449883, completed in 0.268 seconds.
chrUn_random: first=93, last=2950171, completed in 0.561 seconds.
chr2: first=1500801, last=90873982, completed in 21.303 seconds.
chr1: first=1500035, last=98597719, completed in 21.875 seconds.
chrY_random: first=167, last=29339847, completed in 5.299 seconds.
chr3: first=1500151, last=79799354, completed in 17.112 seconds.
chr5: first=1500572, last=76268628, completed in 17.800 seconds.
chr4: first=1501363, last=77814893, completed in 20.757 seconds.
chr6: first=1500395, last=74758302, completed in 18.508 seconds.
chr9: first=1500006, last=62037989, completed in 15.630 seconds.
no peaks find on chr7_random
chr8: first=1501670, last=65868849, completed in 16.535 seconds.
no peaks find on chr5_random
chr7: first=1500956, last=76262291, completed in 18.464 seconds.
chrX: first=1500139, last=83325004, completed in 16.520 seconds.

#2: Compute p-value and q-value
#2: Done

#3: Write output
#3: Done

Thanks for using F-seq2.0.3!

Here the .e file :
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:57: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
bed['end_new'] = (bed[['end', 'strand']]).max(axis=1)
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:530: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
peak_regions = pd.concat([pd.Series([chrom] * peak_regions.shape[0], name='chrom'),
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:532: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
peak_indexes = pd.concat([pd.Series([chrom] * peak_indexes.shape[0], name='chrom'),
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:532: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
peak_indexes = pd.concat([pd.Series([chrom] * peak_indexes.shape[0], name='chrom'),
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:532: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
peak_indexes = pd.concat([pd.Series([chrom] * peak_indexes.shape[0], name='chrom'),
/env/products/f-seq2/2.0.3/lib/python3.8/site-packages/fseq2/fseq2.py:906: RuntimeWarning: divide by zero encountered in log10
result_df['q_value'] = -np.log10(multipletests(pvals=result_df['p_value'], method='fdr_bh', is_sorted=True)[1])

kevinmuret · 2022-01-13T11:47:58Z

I try also on the smallest chromosome and always bad peakcalling (same as the total genome) and always half of the chromosome ... very strange

nsamzhao · 2022-01-13T16:22:10Z

Is your data paired-end? If so, can you make sure it is in bedpe format?
F-Seq2 doc: https://github.com/Boyle-Lab/F-Seq2/blob/master/INPUT_FORMAT.md#bampe-and-bedpe
bedpe format: https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format
If not paired-end data, run without -pe

kevinmuret · 2022-01-14T09:39:15Z

Yes ! It works ! Sorry I thought the BEDPE format was similar to the BED format! Thanks !

aboyle assigned nsamzhao Jan 11, 2022

nsamzhao closed this as completed Mar 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect peakcalling #7

Incorrect peakcalling #7

kevinmuret commented Jan 11, 2022

nsamzhao commented Jan 11, 2022

nsamzhao commented Jan 12, 2022

kevinmuret commented Jan 13, 2022

kevinmuret commented Jan 13, 2022

nsamzhao commented Jan 13, 2022

kevinmuret commented Jan 14, 2022

Incorrect peakcalling #7

Incorrect peakcalling #7

Comments

kevinmuret commented Jan 11, 2022

nsamzhao commented Jan 11, 2022

nsamzhao commented Jan 12, 2022

kevinmuret commented Jan 13, 2022

Loading f-seq2/2.0.3 Loading requirement: flavor/hdf5/serial hdf5/1.10.6 h5py/3.5.0 bedtools/2.30.0 pysam/0.18.0 pybedtools/0.8.2 PROGRAM: fseq2.0.3 callpeak

F-Seq Version 2.0.3

#2: Compute p-value and q-value #2: Done

#3: Write output #3: Done

kevinmuret commented Jan 13, 2022

nsamzhao commented Jan 13, 2022

kevinmuret commented Jan 14, 2022

Loading f-seq2/2.0.3
Loading requirement: flavor/hdf5/serial hdf5/1.10.6 h5py/3.5.0 bedtools/2.30.0
pysam/0.18.0 pybedtools/0.8.2
PROGRAM: fseq2.0.3 callpeak

#2: Compute p-value and q-value
#2: Done

#3: Write output
#3: Done