You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello, thanks for the nice and user-friendly tool! I have atac-seq data (150 bp pair-end) from small plant genome, and have been trying HMMRATAC for a while. So the problem for me is that I can't get the model right, as I had tried many combination of -l, -u and even -z. I showed one of the example as below.
Version: 1.2.10
Arguments Used:
-b Sample.atac.forHMMRATAC.bam
-i Sample.atac.forHMMRATAC.bam.bai
-g Sample.genome
-o Sample_60_25
--bedgraph True
-u 60
-l 25
Fragment Expectation Maximum Done
Mean 50.0 StdDevs 20.0
Mean 190.63425024911237 StdDevs 62.502097145190604
Mean 400.60895549134216 StdDevs 50.93694387902569
Mean 729.022089951699 StdDevs 145.67453137113827
ScalingFactor 103.086173
Training Regions found and Zscore regions for exclusion found
Training Fragment Pileup completed
Kmeans Model:
HMM with 3 state(s)
State 0
Pi: 0.3333333333333333
Aij: 0.333 0.333 0.333
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.531 1.169 1.604 4.339 ]
State 1
Pi: 0.3333333333333333
Aij: 0.333 0.333 0.333
Opdf: Multi-variate Gaussian distribution --- Mean: [ 1.075 1.678 1.373 0.931 ]
State 2
Pi: 0.3333333333333333
Aij: 0.333 0.333 0.333
Opdf: Multi-variate Gaussian distribution --- Mean: [ 2.349 4.864 4.519 4.93 ]
Model created and refined. See Can_60_25.model
Model:
HMM with 3 state(s)
State 0
Pi: 0.3333333333333333
Aij: 0.979 0.015 0.006
Opdf: Multi-variate Gaussian distribution --- Mean: [ 0.501 0.855 0.831 1.056 ]
State 1
Pi: 0.3333333333333333
Aij: 0.012 0.985 0.003
Opdf: Multi-variate Gaussian distribution --- Mean: [ 1.495 2.338 1.724 0.847 ]
State 2
Pi: 0.3333333333333333
Aij: 0.015 0.009 0.976
Opdf: Multi-variate Gaussian distribution --- Mean: [ 1.065 1.737 2.032 2.792 ]
Genome split and subtracted masked regions
0 round viterbi done
37 round viterbi done
Total time (seconds)= 4519
I had tried at least ten different combinations of -l and -u, and none of them can get the ideal model you described in your paper. After checking some of the log files showed in the issues channel, I realised that most of people have mean values of 0 for state 0 (which is the background or starting model if I understand correctly?). So does this mean that the background noise of my data is pretty high?
I also attach a insert size picture here (our data were generated though the purification of nuclei, and the insert size calculation is for the clean bam after multiple filtering) Can.dedup.clean.bam.insertsize.hist.pdf
Could you please give me some advice how I could solve the problem of my data? Thanks a lot in advance!
The text was updated successfully, but these errors were encountered:
Hi,
A few things. 1. the fragment length distribution is off. You don't see any multi-modal behavior in your plot. Generally, this indicates that there is something wrong with the data. This is one of the ways people QC ATAC-seq data (including with ATACQC and from the original ATAC-seq authors). Because of this, I can't say for sure that any of my following suggestions will help nor can I be sure that any other tool wouldn't produce misleading results.
2. You are right that the first state does seem to be causing trouble. Not so much in that the values aren't zeroes, but that they aren't the smallest values. For example, values 3 and 4 for that state are higher than values 3 and 4 for state 2 and value 4 is almost as high as value 4 in state 3. One option might be to remove the fourth (and maybe 3rd) value from the estimation. You would use the --trim option to remove the fourth value (--trim 1) or both 3rd and 4th (--trim 2).
3. Another option is that the algorithm is having trouble finding training sites that don't show this behavior, in which case you could give it a list of training sites to use to train a more appropriate model. If you happen to have some known enhancers/promoters/cis-regulatory elements, you could input them with the -t option.
Thanks a lot! here is the distribution figure. Before I was not sure why I can't see the pattern of multi-modal. Now with your confirmation, I'm kinda sure now. Thanks! I will try with the --trim and see what happen. Unfortunately we don't have other data for the known elements to feed -t option.
Hello, thanks for the nice and user-friendly tool! I have atac-seq data (150 bp pair-end) from small plant genome, and have been trying HMMRATAC for a while. So the problem for me is that I can't get the model right, as I had tried many combination of -l, -u and even -z. I showed one of the example as below.
I had tried at least ten different combinations of -l and -u, and none of them can get the ideal model you described in your paper. After checking some of the log files showed in the issues channel, I realised that most of people have mean values of 0 for state 0 (which is the background or starting model if I understand correctly?). So does this mean that the background noise of my data is pretty high?
I also attach a insert size picture here (our data were generated though the purification of nuclei, and the insert size calculation is for the clean bam after multiple filtering)
Can.dedup.clean.bam.insertsize.hist.pdf
Could you please give me some advice how I could solve the problem of my data? Thanks a lot in advance!
The text was updated successfully, but these errors were encountered: