Asking for help running dpgen #1728
maruru0902 started this conversation in General
I got the following error message when I tested the CH4 example from the tutorial. Please give me some advice. Thank you.
INFO:dpgen:start running
INFO:dpgen:continue from iter 000 task 06
INFO:dpgen:=============================iter.000000==============================
INFO:dpgen:-------------------------iter.000000 task 07--------------------------
2025-03-18 21:22:54,596 - INFO : info:check_all_finished: False
2025-03-18 21:22:54,640 - INFO : job: 3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 submit; job_id is 788079
2025-03-18 21:22:54,644 - INFO : job: c93b47a7b407b4803c4c2a44fb8673c46f55c85f submit; job_id is 788082
2025-03-18 21:22:54,650 - INFO : job: 6d98526c0f06b02d25c558fd742c2b32dbb80e07 submit; job_id is 788086
2025-03-18 21:22:54,656 - INFO : job: 8516491a1e772723c5ec1004ae1e76156bf2b4c9 submit; job_id is 788091
2025-03-18 21:22:54,661 - INFO : job: 0d1382560fb12b8c32b145c683f7967712e9c29d submit; job_id is 788097
2025-03-18 21:22:54,667 - INFO : job: badaae76be609089ec6ad1ae55b09bb028f08e3e submit; job_id is 788101
2025-03-18 21:22:54,672 - INFO : job: e10a4c341ffe1e63cd43d989a050ba40cf3a159b submit; job_id is 788106
2025-03-18 21:22:54,678 - INFO : job: 18522bc59005e9de2fc8f8b45bfd767ab1de078a submit; job_id is 788111
2025-03-18 21:22:54,684 - INFO : job: 7032c9cdf23ef7581450e8bc438e1c0a372ac3cd submit; job_id is 788116
2025-03-18 21:22:54,690 - INFO : job: ef0ce53101b463dfcee47babff5266d9226bba8c submit; job_id is 788122
2025-03-18 21:22:54,696 - INFO : job: 31af1fac65eb1a27519a11c5e9e39e8608da4d16 submit; job_id is 788126
2025-03-18 21:22:54,704 - INFO : job: 8b4363367a5acec248ab37374aef5f39a71818c7 submit; job_id is 788132
2025-03-18 21:22:54,711 - INFO : job: 2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 submit; job_id is 788137
2025-03-18 21:22:54,717 - INFO : job: 7dd38e4d96ad25bf6644a82ba79a15093866d5a4 submit; job_id is 788142
2025-03-18 21:22:54,723 - INFO : job: 78b3c01f18c6c214e19a179ac6bd678fa92cd416 submit; job_id is 788146
2025-03-18 21:22:54,731 - INFO : job: c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 submit; job_id is 788152
2025-03-18 21:22:54,738 - INFO : job: 8f7e7186419dce4573872b512b651c54a2600d5b submit; job_id is 788157
2025-03-18 21:22:54,744 - INFO : job: 76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 submit; job_id is 788162
2025-03-18 21:22:54,752 - INFO : job: c35f9d8b932e3c3922ebac3b714af87404bd01dc submit; job_id is 788167
2025-03-18 21:22:54,761 - INFO : job: 044e451365c4edc0ec0b91625c4e17967926880e submit; job_id is 788172
2025-03-18 21:22:56,808 - INFO : job: 0d1382560fb12b8c32b145c683f7967712e9c29d 788097 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:56,814 - INFO : job:0d1382560fb12b8c32b145c683f7967712e9c29d re-submit after terminated; new job_id is 798465
2025-03-18 21:22:57,102 - INFO : job:0d1382560fb12b8c32b145c683f7967712e9c29d job_id:798465 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:57,102 - INFO : job: badaae76be609089ec6ad1ae55b09bb028f08e3e 788101 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:57,108 - INFO : job:badaae76be609089ec6ad1ae55b09bb028f08e3e re-submit after terminated; new job_id is 798476
2025-03-18 21:22:57,397 - INFO : job:badaae76be609089ec6ad1ae55b09bb028f08e3e job_id:798476 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:57,397 - INFO : job: e10a4c341ffe1e63cd43d989a050ba40cf3a159b 788106 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:57,401 - INFO : job:e10a4c341ffe1e63cd43d989a050ba40cf3a159b re-submit after terminated; new job_id is 798516
2025-03-18 21:22:57,690 - INFO : job:e10a4c341ffe1e63cd43d989a050ba40cf3a159b job_id:798516 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:57,691 - INFO : job: 18522bc59005e9de2fc8f8b45bfd767ab1de078a 788111 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:57,696 - INFO : job:18522bc59005e9de2fc8f8b45bfd767ab1de078a re-submit after terminated; new job_id is 798943
2025-03-18 21:22:57,981 - INFO : job:18522bc59005e9de2fc8f8b45bfd767ab1de078a job_id:798943 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:57,981 - INFO : job: 7032c9cdf23ef7581450e8bc438e1c0a372ac3cd 788116 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:57,986 - INFO : job:7032c9cdf23ef7581450e8bc438e1c0a372ac3cd re-submit after terminated; new job_id is 799433
2025-03-18 21:22:58,276 - INFO : job:7032c9cdf23ef7581450e8bc438e1c0a372ac3cd job_id:799433 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:58,276 - INFO : job: ef0ce53101b463dfcee47babff5266d9226bba8c 788122 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:58,281 - INFO : job:ef0ce53101b463dfcee47babff5266d9226bba8c re-submit after terminated; new job_id is 799947
2025-03-18 21:22:58,570 - INFO : job:ef0ce53101b463dfcee47babff5266d9226bba8c job_id:799947 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:58,570 - INFO : job: 31af1fac65eb1a27519a11c5e9e39e8608da4d16 788126 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:58,575 - INFO : job:31af1fac65eb1a27519a11c5e9e39e8608da4d16 re-submit after terminated; new job_id is 800476
2025-03-18 21:22:58,863 - INFO : job:31af1fac65eb1a27519a11c5e9e39e8608da4d16 job_id:800476 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:58,863 - INFO : job: 8b4363367a5acec248ab37374aef5f39a71818c7 788132 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:58,868 - INFO : job:8b4363367a5acec248ab37374aef5f39a71818c7 re-submit after terminated; new job_id is 801003
2025-03-18 21:22:59,158 - INFO : job:8b4363367a5acec248ab37374aef5f39a71818c7 job_id:801003 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:59,158 - INFO : job: 2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 788137 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:59,163 - INFO : job:2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 re-submit after terminated; new job_id is 801521
2025-03-18 21:22:59,452 - INFO : job:2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 job_id:801521 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:59,453 - INFO : job: 7dd38e4d96ad25bf6644a82ba79a15093866d5a4 788142 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:59,458 - INFO : job:7dd38e4d96ad25bf6644a82ba79a15093866d5a4 re-submit after terminated; new job_id is 802051
2025-03-18 21:22:59,746 - INFO : job:7dd38e4d96ad25bf6644a82ba79a15093866d5a4 job_id:802051 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:22:59,746 - INFO : job: 78b3c01f18c6c214e19a179ac6bd678fa92cd416 788146 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:22:59,751 - INFO : job:78b3c01f18c6c214e19a179ac6bd678fa92cd416 re-submit after terminated; new job_id is 802572
2025-03-18 21:23:00,040 - INFO : job:78b3c01f18c6c214e19a179ac6bd678fa92cd416 job_id:802572 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:00,041 - INFO : job: c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 788152 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:00,045 - INFO : job:c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 re-submit after terminated; new job_id is 803088
2025-03-18 21:23:00,333 - INFO : job:c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 job_id:803088 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:00,334 - INFO : job: 8f7e7186419dce4573872b512b651c54a2600d5b 788157 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:00,338 - INFO : job:8f7e7186419dce4573872b512b651c54a2600d5b re-submit after terminated; new job_id is 803599
2025-03-18 21:23:00,627 - INFO : job:8f7e7186419dce4573872b512b651c54a2600d5b job_id:803599 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:00,627 - INFO : job: 76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 788162 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:00,632 - INFO : job:76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 re-submit after terminated; new job_id is 804108
2025-03-18 21:23:00,921 - INFO : job:76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 job_id:804108 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:00,921 - INFO : job: c35f9d8b932e3c3922ebac3b714af87404bd01dc 788167 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:00,926 - INFO : job:c35f9d8b932e3c3922ebac3b714af87404bd01dc re-submit after terminated; new job_id is 804644
2025-03-18 21:23:01,214 - INFO : job:c35f9d8b932e3c3922ebac3b714af87404bd01dc job_id:804644 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:01,214 - INFO : job: 044e451365c4edc0ec0b91625c4e17967926880e 788172 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:01,219 - INFO : job:044e451365c4edc0ec0b91625c4e17967926880e re-submit after terminated; new job_id is 805183
2025-03-18 21:23:01,505 - INFO : job:044e451365c4edc0ec0b91625c4e17967926880e job_id:805183 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:32,310 - INFO : job: 3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 788079 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:32,317 - INFO : job:3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 re-submit after terminated; new job_id is 806847
2025-03-18 21:23:32,603 - INFO : job:3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 job_id:806847 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:32,604 - INFO : job: c93b47a7b407b4803c4c2a44fb8673c46f55c85f 788082 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:32,609 - INFO : job:c93b47a7b407b4803c4c2a44fb8673c46f55c85f re-submit after terminated; new job_id is 806858
2025-03-18 21:23:32,898 - INFO : job:c93b47a7b407b4803c4c2a44fb8673c46f55c85f job_id:806858 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:32,899 - INFO : job: 6d98526c0f06b02d25c558fd742c2b32dbb80e07 788086 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:32,904 - INFO : job:6d98526c0f06b02d25c558fd742c2b32dbb80e07 re-submit after terminated; new job_id is 806898
2025-03-18 21:23:33,193 - INFO : job:6d98526c0f06b02d25c558fd742c2b32dbb80e07 job_id:806898 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:33,193 - INFO : job: 8516491a1e772723c5ec1004ae1e76156bf2b4c9 788091 terminated; fail_cout is 1; resubmitting job
2025-03-18 21:23:33,198 - INFO : job:8516491a1e772723c5ec1004ae1e76156bf2b4c9 re-submit after terminated; new job_id is 807304
2025-03-18 21:23:33,487 - INFO : job:8516491a1e772723c5ec1004ae1e76156bf2b4c9 job_id:807304 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:33,488 - INFO : job: 0d1382560fb12b8c32b145c683f7967712e9c29d 798465 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:33,492 - INFO : job:0d1382560fb12b8c32b145c683f7967712e9c29d re-submit after terminated; new job_id is 807825
2025-03-18 21:23:33,782 - INFO : job:0d1382560fb12b8c32b145c683f7967712e9c29d job_id:807825 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:33,782 - INFO : job: badaae76be609089ec6ad1ae55b09bb028f08e3e 798476 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:33,787 - INFO : job:badaae76be609089ec6ad1ae55b09bb028f08e3e re-submit after terminated; new job_id is 808339
2025-03-18 21:23:34,075 - INFO : job:badaae76be609089ec6ad1ae55b09bb028f08e3e job_id:808339 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:34,075 - INFO : job: e10a4c341ffe1e63cd43d989a050ba40cf3a159b 798516 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:34,080 - INFO : job:e10a4c341ffe1e63cd43d989a050ba40cf3a159b re-submit after terminated; new job_id is 808852
2025-03-18 21:23:34,369 - INFO : job:e10a4c341ffe1e63cd43d989a050ba40cf3a159b job_id:808852 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:34,369 - INFO : job: 18522bc59005e9de2fc8f8b45bfd767ab1de078a 798943 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:34,374 - INFO : job:18522bc59005e9de2fc8f8b45bfd767ab1de078a re-submit after terminated; new job_id is 809392
2025-03-18 21:23:34,663 - INFO : job:18522bc59005e9de2fc8f8b45bfd767ab1de078a job_id:809392 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:34,663 - INFO : job: 7032c9cdf23ef7581450e8bc438e1c0a372ac3cd 799433 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:34,668 - INFO : job:7032c9cdf23ef7581450e8bc438e1c0a372ac3cd re-submit after terminated; new job_id is 809910
2025-03-18 21:23:34,957 - INFO : job:7032c9cdf23ef7581450e8bc438e1c0a372ac3cd job_id:809910 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:34,957 - INFO : job: ef0ce53101b463dfcee47babff5266d9226bba8c 799947 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:34,962 - INFO : job:ef0ce53101b463dfcee47babff5266d9226bba8c re-submit after terminated; new job_id is 810427
2025-03-18 21:23:35,250 - INFO : job:ef0ce53101b463dfcee47babff5266d9226bba8c job_id:810427 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:35,250 - INFO : job: 31af1fac65eb1a27519a11c5e9e39e8608da4d16 800476 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:35,255 - INFO : job:31af1fac65eb1a27519a11c5e9e39e8608da4d16 re-submit after terminated; new job_id is 810951
2025-03-18 21:23:35,542 - INFO : job:31af1fac65eb1a27519a11c5e9e39e8608da4d16 job_id:810951 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:35,542 - INFO : job: 8b4363367a5acec248ab37374aef5f39a71818c7 801003 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:35,547 - INFO : job:8b4363367a5acec248ab37374aef5f39a71818c7 re-submit after terminated; new job_id is 811451
2025-03-18 21:23:35,835 - INFO : job:8b4363367a5acec248ab37374aef5f39a71818c7 job_id:811451 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:35,835 - INFO : job: 2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 801521 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:35,840 - INFO : job:2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 re-submit after terminated; new job_id is 811960
2025-03-18 21:23:36,126 - INFO : job:2d2e519af624efa1cfdfc16fbf117d5fc9baa1d2 job_id:811960 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:36,126 - INFO : job: 7dd38e4d96ad25bf6644a82ba79a15093866d5a4 802051 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:36,131 - INFO : job:7dd38e4d96ad25bf6644a82ba79a15093866d5a4 re-submit after terminated; new job_id is 812493
2025-03-18 21:23:36,418 - INFO : job:7dd38e4d96ad25bf6644a82ba79a15093866d5a4 job_id:812493 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:36,418 - INFO : job: 78b3c01f18c6c214e19a179ac6bd678fa92cd416 802572 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:36,423 - INFO : job:78b3c01f18c6c214e19a179ac6bd678fa92cd416 re-submit after terminated; new job_id is 812999
2025-03-18 21:23:36,712 - INFO : job:78b3c01f18c6c214e19a179ac6bd678fa92cd416 job_id:812999 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:36,712 - INFO : job: c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 803088 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:36,717 - INFO : job:c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 re-submit after terminated; new job_id is 813517
2025-03-18 21:23:37,005 - INFO : job:c101af64b9e50f8a82e9c20f7ab87fa8572ebde0 job_id:813517 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:37,005 - INFO : job: 8f7e7186419dce4573872b512b651c54a2600d5b 803599 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:37,010 - INFO : job:8f7e7186419dce4573872b512b651c54a2600d5b re-submit after terminated; new job_id is 814066
2025-03-18 21:23:37,299 - INFO : job:8f7e7186419dce4573872b512b651c54a2600d5b job_id:814066 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:37,299 - INFO : job: 76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 804108 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:37,304 - INFO : job:76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 re-submit after terminated; new job_id is 814601
2025-03-18 21:23:37,592 - INFO : job:76a2fd8e5aa5104fa83a962e8c9efdb072ea6d08 job_id:814601 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:37,592 - INFO : job: c35f9d8b932e3c3922ebac3b714af87404bd01dc 804644 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:37,597 - INFO : job:c35f9d8b932e3c3922ebac3b714af87404bd01dc re-submit after terminated; new job_id is 815114
2025-03-18 21:23:37,885 - INFO : job:c35f9d8b932e3c3922ebac3b714af87404bd01dc job_id:815114 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:23:37,886 - INFO : job: 044e451365c4edc0ec0b91625c4e17967926880e 805183 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:23:37,890 - INFO : job:044e451365c4edc0ec0b91625c4e17967926880e re-submit after terminated; new job_id is 815631
2025-03-18 21:23:38,177 - INFO : job:044e451365c4edc0ec0b91625c4e17967926880e job_id:815631 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:24:08,984 - INFO : job: 3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 806847 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:24:08,989 - INFO : job:3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 re-submit after terminated; new job_id is 817309
2025-03-18 21:24:09,273 - INFO : job:3b6c1d0b7bd7ffe55458965b0fe8e3701bd23914 job_id:817309 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:24:09,273 - INFO : job: c93b47a7b407b4803c4c2a44fb8673c46f55c85f 806858 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:24:09,280 - INFO : job:c93b47a7b407b4803c4c2a44fb8673c46f55c85f re-submit after terminated; new job_id is 817322
2025-03-18 21:24:09,568 - INFO : job:c93b47a7b407b4803c4c2a44fb8673c46f55c85f job_id:817322 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:24:09,568 - INFO : job: 6d98526c0f06b02d25c558fd742c2b32dbb80e07 806898 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:24:09,573 - INFO : job:6d98526c0f06b02d25c558fd742c2b32dbb80e07 re-submit after terminated; new job_id is 817337
2025-03-18 21:24:09,859 - INFO : job:6d98526c0f06b02d25c558fd742c2b32dbb80e07 job_id:817337 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:24:09,860 - INFO : job: 8516491a1e772723c5ec1004ae1e76156bf2b4c9 807304 terminated; fail_cout is 2; resubmitting job
2025-03-18 21:24:09,864 - INFO : job:8516491a1e772723c5ec1004ae1e76156bf2b4c9 re-submit after terminated; new job_id is 817708
2025-03-18 21:24:10,153 - INFO : job:8516491a1e772723c5ec1004ae1e76156bf2b4c9 job_id:817708 after re-submitting; the state now is <JobStatus.running: 3>
2025-03-18 21:24:10,153 - INFO : job: 0d1382560fb12b8c32b145c683f7967712e9c29d 807825 terminated; fail_cout is 3; resubmitting job
Traceback (most recent call last):
File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpdispatcher/submission.py", line 356, in handle_unexpected_submission_state
job.handle_unexpected_job_state()
File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpdispatcher/submission.py", line 855, in handle_unexpected_job_state
raise RuntimeError(err_msg)
RuntimeError: job:0d1382560fb12b8c32b145c683f7967712e9c29d 807825 failed 3 times.
Possible remote error message: ==> /homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b/task.000.000009/fp.log <==
/homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b/0d1382560fb12b8c32b145c683f7967712e9c29d.sub.run: line 6: mpirun: command not found
/homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b/0d1382560fb12b8c32b145c683f7967712e9c29d.sub.run: line 6: mpirun: command not found
/homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b/0d1382560fb12b8c32b145c683f7967712e9c29d.sub.run: line 6: mpirun: command not found
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/homea/wangyl/miniconda3/envs/deepmd/bin/dpgen", line 10, in <module>
sys.exit(main())
^^^^^^
File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpgen/main.py", line 255, in main
args.func(args)
File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpgen/generator/run.py", line 5474, in gen_run
run_iter(args.PARAM, args.MACHINE)
File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpgen/generator/run.py", line 4826, in run_iter
run_fp(ii, jdata, mdata)
File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpgen/generator/run.py", line 4048, in run_fp
run_fp_inner(
File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpgen/generator/run.py", line 4027, in run_fp_inner
submission.run_submission()
File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpdispatcher/submission.py", line 260, in run_submission
self.handle_unexpected_submission_state()
File "/homea/wangyl/miniconda3/envs/deepmd/lib/python3.12/site-packages/dpdispatcher/submission.py", line 360, in handle_unexpected_submission_state
raise RuntimeError(
RuntimeError: Meet errors will handle unexpected submission state.
Debug information: remote_root==/homea/wangyl/dpgen/dpgen_example/run/work/c6ba8e6375537415ea5880ac6caf51ae51946e1b.
Debug information: submission_hash==c6ba8e6375537415ea5880ac6caf51ae51946e1b.
Please check error messages above and in remote_root. The submission information is saved in /homea/wangyl/.dpdispatcher/submission/c6ba8e6375537415ea5880ac6caf51ae51946e1b.json.
For furthur actions, run the following command with proper flags: dpdisp submission c6ba8e6375537415ea5880ac6caf51ae51946e1b
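The remote error message in the log says `mpirun: command not found`, which suggests the shell that runs the generated `.sub.run` script does not have an MPI launcher on its `PATH`. As a rough diagnostic sketch (not part of dpgen or dpdispatcher), the Python snippet below pulls the missing command names out of such a message and checks whether the current shell can find them. The helper name `missing_commands` and the sample paths are hypothetical, and the regular expression assumes the error format shown above.

```python
import re
import shutil

# Matches shell errors of the form "<script>: line N: <command>: command not found".
# This pattern is an assumption based on the messages in the log above.
NOT_FOUND = re.compile(r"line \d+: (?P<cmd>[^\s:]+): command not found")


def missing_commands(log_text: str) -> list[str]:
    """Return the distinct commands reported as 'command not found', in order of appearance."""
    seen: list[str] = []
    for match in NOT_FOUND.finditer(log_text):
        cmd = match.group("cmd")
        if cmd not in seen:
            seen.append(cmd)
    return seen


# Hypothetical sample mirroring the remote error message (paths shortened for illustration).
sample = (
    "/path/to/job.sub.run: line 6: mpirun: command not found\n"
    "/path/to/job.sub.run: line 6: mpirun: command not found\n"
)

for cmd in missing_commands(sample):
    # shutil.which returns None when the command is not on PATH in this environment.
    location = shutil.which(cmd)
    print(f"{cmd}: {location or 'not on PATH in this shell'}")
```

If `mpirun` is found in an interactive shell but not inside the submitted job, the job environment probably differs from the login environment. In that case the MPI module or setup script likely needs to be loaded in the job script itself, for example through the resources section of the machine file. The exact keys for this should be checked against the dpdispatcher documentation.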