chess sim result with all NaN #16
Hi @chenggang108,
Then, to speed things up a bit, it might make sense to convert the input files to fanc format. That way you won't need to wait 1.5 hours every time you run chess sim.
Could you then re-run the analysis on the converted data and paste the logs here?
Hi Nick, thanks!
Dear @nickmachnik, I have the same problem after converting the cool data to fanc format using the following command. The logs are as follows:
I hope you can help. Best wishes,
Hi @biozzq, I think I will have to try to reproduce this to understand what is going on. Do you have a suggestion for a small example dataset in cooler format that I could use for this (not necessarily yours; I understand if you don't want to share that data)? @chenggang108, does the error persist for you after the upgrade?
Dear @nickmachnik, https://drive.google.com/drive/folders/1dm66NJD8LgZ-N8HTNNo4FKkYIWI65zCc?usp=sharing I hope these files help. Best wishes,
Dear @nickmachnik, the commands I used are as follows:
Best wishes,
Hi @nickmachnik, I did all the updates and then tried three things:
First, I ran chess sim with the .cool files: not working.
Second, I converted the .cool files to .fanc files with fanc from-cooler: not working.
Finally, I prepared .hic files from the allvalidpairs generated by HiC-Pro. These .hic files work, but I need to remove 'chr' from the BED file prepared by chess pair. I did not generate .fanc files from the allvalidpairs only because I am not familiar with FAN-C.
It looks like something is wrong with the cool files.
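The 'chr' removal can be scripted instead of done by hand. A minimal Python sketch, assuming the chess pair output is whitespace-separated with the two chromosome names in columns 1 and 4 (as in the `head` output further down in this issue); `strip_chr_prefix` and `strip_chr_file` are hypothetical helpers, not part of CHESS or FAN-C:

```python
def strip_chr_prefix(line: str) -> str:
    """Drop a leading 'chr' from the two chromosome columns of one BEDPE line.

    Assumes whitespace-separated fields with the chromosome names in
    columns 1 and 4 (0-based indices 0 and 3); the result is tab-separated.
    """
    fields = line.split()
    for idx in (0, 3):
        if fields[idx].startswith("chr"):
            fields[idx] = fields[idx][len("chr"):]
    return "\t".join(fields)


def strip_chr_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: rewrite a whole BEDPE file, skipping blank lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for raw in src:
            if raw.strip():
                dst.write(strip_chr_prefix(raw) + "\n")
```

Only the chromosome columns are touched, so the pair IDs and strand columns pass through unchanged.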
@biozzq I can reproduce the all-NaN output with your data, but I don't know what is wrong yet. I will try to find out.
Dear @nickmachnik, thank you very much, I will try to use the .hic files generated by HiC-Pro. Also, which normalization should be done before running chess? In your publication (see the passage below), you used KR normalization, not ICE, whereas I mostly use ICE through HiC-Pro. In addition, from the following passage, I think the normalization should be done after masking the bins as zero. Is this right? "Finally, bins with less than 25% (human) or 10% (mouse) of the median number of fragments per bin were masked and the matrix was normalized using Knight–Ruiz (KR) matrix balancing on each chromosome independently." Best wishes,
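The filtering step in the quoted passage (mask low-coverage bins, then balance what is left) can be sketched with numpy. This is only an illustration under assumptions: it uses the row sums of the contact matrix as a stand-in for "number of fragments per bin", it is not the FAN-C implementation, and the KR balancing itself is not shown. `mask_low_coverage_bins` is a hypothetical name.

```python
import numpy as np


def mask_low_coverage_bins(matrix: np.ndarray, fraction: float) -> np.ma.MaskedArray:
    """Mask every row/column whose coverage is below fraction * median coverage.

    Assumption: coverage is approximated by the row sums of the matrix.
    Masked pixels are excluded from later numpy.ma computations rather
    than being set to zero.
    """
    coverage = matrix.sum(axis=1)
    threshold = fraction * np.median(coverage)
    low = coverage < threshold
    # A pixel is masked if either of its two bins is a low-coverage bin.
    pixel_mask = low[:, None] | low[None, :]
    return np.ma.masked_array(matrix, mask=pixel_mask)


# Toy symmetric matrix: bin 2 has far less coverage than bins 0 and 1.
toy = np.array([[10.0, 10.0, 0.0],
                [10.0, 10.0, 1.0],
                [0.0, 1.0, 1.0]])
filtered = mask_low_coverage_bins(toy, fraction=0.25)  # 0.25 human, 0.10 mouse
```

A balancing step such as KR would then be run on the unmasked part of `filtered` only, which matches the order described in the quoted passage (mask first, then normalize).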
Hi @biozzq , I am not sure what you mean by 'masking the bins as zero' though, could you elaborate? |
Hi, @kaukrise found a potential fix for this issue, see #23 (comment) |
Dear @nickmachnik, sorry, I was not clear. Masking bins can be done in different ways, for example by treating the interaction frequencies between these bins as zero, or by removing these bins from the contact maps. I want to confirm which way you used in your study. Thank you. Best wishes,
FAN-C uses numpy masked arrays for masking bins. Masked bins are simply ignored in downstream analyses, which should be different from setting them to 0. You can read more about the FAN-C pipeline here.
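The difference between ignoring a bin and setting it to 0 shows up as soon as any statistic is computed. A small numpy illustration on a made-up 3x3 matrix; it demonstrates numpy masked-array semantics in general, not FAN-C internals:

```python
import numpy as np

# Toy 3x3 contact matrix; bin 2 plays the role of an unmappable bin.
values = np.array([[4.0, 2.0, 1.0],
                   [2.0, 4.0, 2.0],
                   [1.0, 2.0, 9.0]])
bad_bin = np.array([False, False, True])
pixel_mask = bad_bin[:, None] | bad_bin[None, :]

# Option 1: mask the bin, so its pixels are left out of the statistic.
masked = np.ma.masked_array(values, mask=pixel_mask)
masked_mean = float(masked.mean())  # mean of the 4 remaining pixels: 3.0

# Option 2: set the bin to zero, so its pixels still count, as zeros.
zeroed = np.where(pixel_mask, 0.0, values)
zeroed_mean = float(zeroed.mean())  # 12 / 9, about 1.33
```

With masking, the bad bin does not pull the mean down; with zero-filling, five of the nine pixels are zeros and the mean drops accordingly.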
Hi,
I have an issue similar to #5 and #9. The results from chess sim are all NaN. I checked the discussions in issues #5 and #9, but my situation looks different.
Here is how I ran it:
First, I downloaded the example files and ran them successfully, so my system works.
I generated a BED file with chess pair:
head mm10_chr1_3mb_win_100kb_step.bed
chr1 1 3000001 chr1 1 3000001 0 . + +
chr1 100001 3100001 chr1 100001 3100001 1 . + +
chr1 200001 3200001 chr1 200001 3200001 2 . + +
chr1 300001 3300001 chr1 300001 3300001 3 . + +
chr1 400001 3400001 chr1 400001 3400001 4 . + +
chr1 500001 3500001 chr1 500001 3500001 5 . + +
chr1 600001 3600001 chr1 600001 3600001 6 . + +
chr1 700001 3700001 chr1 700001 3700001 7 . + +
chr1 800001 3800001 chr1 800001 3800001 8 . + +
chr1 900001 3900001 chr1 900001 3900001 9 . + +
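One way to back up the window-size argument: at the 20 kb resolution of these cool files, each 3 Mb window from chess pair spans about 150 bins. A minimal Python sketch, assuming the whitespace-separated layout shown above (reference start/end in columns 2 and 3, query start/end in columns 5 and 6); `bins_per_region` is a hypothetical helper and ignores the 0-/1-based coordinate offset:

```python
from typing import Tuple


def bins_per_region(bedpe_line: str, resolution: int) -> Tuple[int, int]:
    """Approximate number of bins in the reference and query region of a BEDPE line."""
    fields = bedpe_line.split()
    ref_start, ref_end = int(fields[1]), int(fields[2])
    qry_start, qry_end = int(fields[4]), int(fields[5])
    # Integer division: a partial bin at the edge is not counted.
    return ((ref_end - ref_start) // resolution,
            (qry_end - qry_start) // resolution)


first_window = bins_per_region("chr1 1 3000001 chr1 1 3000001 0 . + +", 20_000)
```

Here `first_window` is `(150, 150)`, so the regions themselves are clearly not too small to compare.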
Then I run:
chess sim reference.balanced.chr1.cool query.chr1.cool mm10_chr1_3mb_win_100kb_step_test.bed test3_chr1.tsv
The cool files were balanced with cooler, and the resolution is 20 kb.
Here is what the log shows:
2020-11-09 19:38:59,424 INFO Note: detected 72 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
2020-11-09 19:38:59,424 INFO Note: NumExpr detected 72 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2020-11-09 19:38:59,424 INFO NumExpr defaulting to 8 threads.
2020-11-09 19:39:02,755 INFO CHESS version: 0.3.4
2020-11-09 19:39:02,755 INFO FAN-C version: 0.9.6
2020-11-09 19:39:02,759 INFO Loading reference contact data
Expected 100% (3209946 of 3209946) |#####| Elapsed Time: 0:06:28 Time: 0:06:28
Expected 100% (5892473 of 5892473) |#####| Elapsed Time: 0:11:48 Time: 0:11:48
2020-11-09 21:00:25,584 INFO Loading region pairs
2020-11-09 21:00:25,783 INFO Launching workers
2020-11-09 21:00:26,240 INFO Submitting pairs for comparison
2020-11-09 21:02:40,942 INFO Could not compute similarity for 1925 region pairs.This can be due to faulty coordinates, too smallregion sizes or too many unmappable bins.
I couldn't figure out the problem. The window size is big enough. I also tried removing 'chr' from the BED file, as suggested in #5, but it does not work.
Could you please help with it?
Thanks
Gang