Running a job for a long time without output (kunpeng920 CPU) #286
Comments
Which version of dorado are you using? How large is your input? It's possible dorado is collecting some metadata from the pod5s first and that's taking a while. Is your data on an external disk? Can you try running with a smaller dataset for debugging? |
The dorado version is 0.3.1.
There is only one pod5 file in /dev/shm; it is 1.2 GB.
Is your data on an external disk?
No, I copied the pod5 file to /dev/shm on the local host.
(Sent as an email reply to Joyjit's message of 07/07/2023 09:56.)
|
Setup looks good to me. I did some digging online about the [...]. Can you also try to run with [...]? |
Yes, I found the "jemalloc: Unsupported page size" issue online, so I set [...]. I will try to run with [...]. |
Dear author |
I added '-x cpu' to the script. After the script had run for 1 hour, there was still no useful output, as follows:
and no CPU utilization.
|
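Hmm I'm not sure what this would entail tbh. It feels more like something jemalloc would have to support rather than something we can add in dorado.
The fact that it's not making any progress with CPU either makes me think of I/O issues. Have you tried to run dorado (same binary) in any other environment? I can suggest the following:
1. Run dorado on a local machine instead of the cluster, with the data local as well
2. Copy the data to /tmp in your HPC job first and then run dorado on the copied data
|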
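One way to test the I/O hypothesis above is to time a plain sequential read of the input file from each location (cluster storage, /tmp, /dev/shm) before involving dorado at all. The sketch below is not from the thread: the function name `read_throughput` is hypothetical, and it assumes Python 3 is available on the node. Note that a second read of the same file may be served from the page cache, so the first run is the informative one.

```python
import sys
import time


def read_throughput(path, chunk_size=8 * 1024 * 1024):
    """Read `path` sequentially and return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__":
    # Usage (hypothetical file name): python read_check.py /dev/shm/sample.pod5
    for name in sys.argv[1:]:
        size, seconds = read_throughput(name)
        rate = size / (1024 ** 2) / max(seconds, 1e-9)
        print(f"{name}: {size} bytes in {seconds:.2f}s ({rate:.1f} MiB/s)")
```

If the file reads at a normal rate from /dev/shm, storage is unlikely to be what is stalling the job.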
1. I have no local aarch64 machine.
2. I copied the data to an in-memory file system (/dev/shm), so I don't think I/O is the problem.
---- Replied Message (from Joyjit, 07/10/2023 22:51) ----
Dear author,
I would like to ask: is it possible to add a version for aarch64 systems with a 64K page size to the dorado binary and source code distributions? That may completely solve the "jemalloc: Unsupported page size" issue.
Hmm I'm not sure what this would entail tbh. It feels more like something jemalloc would have to support rather than something we can add in dorado.
The fact that it's not making any progress with CPU either makes me think of I/O issues. Have you tried to run dorado (same binary) in any other environment? I can suggest the following -
Run dorado on a local machine instead of the cluster with the data local as well
Copy the data to /tmp in your HPC job first and then run dorado on the copied data
|
Hi @zhoujingyu13687306871 - are you able to compile Dorado yourself on the kunpeng920 machine by any chance? This is not a problem we've encountered before, I suspect that during compilation the page size of your host would be detected and Dorado will be compiled to work with the appropriate (64KB?) page size (Side note is that this may have performance implications, though I think it will be fine) |
Yes, I compiled dorado on the kunpeng920 machine, whose system page size is 64K. |
OK - this is probably because the POD5 dependency is not compiled to use 64KB page size. We are investigating a solution |
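For anyone checking whether their own host is affected, the page size can be read at runtime. The sketch below is an illustration rather than anything dorado or POD5 ships: `page_size_report` is a hypothetical helper, and it assumes Python 3 on a Unix-like system. The `lg_page` value is the base-2 logarithm of the page size (12 for 4 KiB, 16 for 64 KiB), which is the form jemalloc's `--with-lg-page` configure option takes when jemalloc is built from source.

```python
import os


def page_size_report(page_size):
    """Summarise a page size given in bytes (must be a power of two)."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError(f"not a power of two: {page_size}")
    return {
        "bytes": page_size,
        "kib": page_size // 1024,
        "lg_page": page_size.bit_length() - 1,  # log2 of the page size
    }


if __name__ == "__main__":
    # os.sysconf is only available on Unix-like systems.
    print(page_size_report(os.sysconf("SC_PAGE_SIZE")))
```

A binary whose allocator was built for 4 KiB pages would be expected to hit the "Unsupported system page size" message on a host where this reports 64 KiB.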
thank you very much
(Sent as an email reply to Mike's message of 07/12/2023 18:23.)
|
@vellamike |
Hi @zhoujingyu13687306871 - we haven't looked at fixing this yet |
See #637 for updates |
Dear author,
I submitted the job to run on a single node of the cluster, but after a long time there is no output. The node's CPU is aarch64 (model kunpeng920) and the GPU is an A100-40 PCIe. The CPU information and the script content are as follows:
After running for 1 hour, there is only debug output and no real results, as shown in the figure below:
The debug output is on the left and the GPU utilization on the right; the figure below shows the CPU utilization, with the process in the S state for a long time. I don't know whether this is caused by the CPU instruction set or by the system page size (jemalloc: Unsupported system page size). I hope to get your reply, thank you!
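The "S state" observation can also be recorded without a screenshot by reading the process status from /proc. This is a minimal sketch under stated assumptions: Linux with a mounted /proc, Python 3, and hypothetical helper names (`parse_proc_status`, `process_state`). A state of "S (sleeping)" only says the process is waiting on something; it does not say what.

```python
import sys


def parse_proc_status(text):
    """Parse the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line without a colon
            fields[key.strip()] = value.strip()
    return fields


def process_state(pid):
    """Return the State field for `pid`, e.g. 'S (sleeping)'. Linux only."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_proc_status(handle.read()).get("State", "unknown")


if __name__ == "__main__":
    # Usage: python proc_state.py <pid of the dorado process>
    for pid in sys.argv[1:]:
        print(pid, process_state(pid))
```

Sampling this every few seconds while the job runs gives a text log that can be pasted into the issue.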