Simulation becomes slow due to Legion warning1097 #5
Comments
Hi, I am sorry for the late reply. Line 1740 in 8794571
Line 1746 in 8794571
I appreciate your kind reply :) The following is the specification of the system. There are two computational nodes.
I will try your temporary solution and share the results. Additionally, could I ask one more question? I have a GPU workstation with four RTX 3080 cards. I turned off the GASNET option because the GPUs sit in a single machine and no network is needed to connect them. When I tried to run HTR-solver with GPUs on this workstation, I noticed that the Legion version currently used by the HTR-solver GitHub repository supports older NVIDIA architectures. So I used the latest Legion from the Stanford GitLab, and HTR-solver builds successfully with GPU_ARCH = AMPERE. However, the simulation fails with the message below. My guess is that the error occurs because the RTX 3080 has the sm_86 architecture, not sm_80.
prometeo_ConstPropMix.exec: /home/mwkim/htr_stanfrod/src/Utils/task_helper.hpp:136: void TaskHelper::base_gpu_wrapper(const Legion::Task*, const std::vector<Legion::PhysicalRegion>&, Legion::Context, Legion::Runtime*) [with T = LoadMixtureTask; Legion::Context = Legion::Internal::TaskContext*]: Assertion `task->arglen == 0' failed.
The other approach I tried is building for the older architecture with GPU_ARCH = VOLTA. This also builds, but the simulation does not run. In this case, the warning message shows "the program installed in lower architecture than sm_80". I hope I am not bothering you.
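The sm_80 versus sm_86 guess above can be checked against CUDA's binary compatibility rule: device code compiled for compute capability X.y runs on a device of capability X.z when z >= y, and does not carry over to a different major version. The following is a minimal sketch of that rule only. The helper names `parse_sm` and `cubin_runs_on` are hypothetical, the mapping of GPU_ARCH = AMPERE to sm_80 is taken from the comment above, and the mapping of GPU_ARCH = VOLTA to sm_70 is an assumption. It ignores PTX just-in-time compilation, which can let older builds run on newer GPUs.

```python
def parse_sm(tag):
    """Split a tag such as 'sm_86' into (major, minor), here (8, 6)."""
    digits = tag.split("_", 1)[1]
    return int(digits[:-1]), int(digits[-1])


def cubin_runs_on(built_for, device):
    """Binary compatibility rule: same major version, device minor >= build minor.

    Hypothetical helper; it models compiled device binaries only, not embedded PTX.
    """
    built_major, built_minor = parse_sm(built_for)
    dev_major, dev_minor = parse_sm(device)
    return built_major == dev_major and dev_minor >= built_minor


# RTX 3080 reports compute capability 8.6 (sm_86).
ampere_build_ok = cubin_runs_on("sm_80", "sm_86")  # AMPERE build, as assumed in the thread
volta_build_ok = cubin_runs_on("sm_70", "sm_86")   # VOLTA build, sm_70 is an assumption
print("AMPERE (sm_80) binary on sm_86:", ampere_build_ok)
print("VOLTA (sm_70) binary on sm_86:", volta_build_ok)
```

Under this rule an sm_80 binary is expected to load on an sm_86 device, which suggests the failed assertion may have a different cause than the architecture mismatch, while an sm_70 binary would not run there without PTX.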
This choice could be suboptimal in my experience. I would rather use the psm conduit as done for the HPC Kraken @ CERFACS.
Are you sure that you are starting from a clean build? This error usually happens when objects built with different setups are mixed in the same executable.
First one. "configure error: Requested PMI support could not be found". Because of this, I changed psm to ibv. I did not try to fix the error at the time, but I should do so now... Second one
Hello, I am a user of HTR-solver using an HPC system (unfortunately, a CPU-only system).
I have a problem when I run the test case named "Franko".
While the simulation runs, the output contains the message below.
[21 - 2aab1f891880] 116.356580 {4}{runtime}: [warning 1097] LEGION WARNING: WARNING: The runtime has failed to memoize the trace more than 5 times, due to the absence of a replayable template. It is highly likely that trace 0 will not be memoized for the rest of execution. The most recent template was not replayable for the following reason: Remote shard not replyable. Please change the mapper to stop making memoization requests. (from file /home01/x2242a06/legion/runtime/legion/legion_trace.cc:2019)
For more information see: http://legion.stanford.edu/messages/warning_code.html#warning_code_1097
The warning does not stop the simulation. However, the simulation becomes slower and slower. The initial wall time for one step is 3 sec, and it increases continuously.
As shown in the figure, even after only 5000 iterations, the wall time per step has grown to almost 10-15 sec. If this trend continues to 50000 iterations, it would reach 100-150 sec.
Here is my first question. Is this trend normal for the Franko case? My guess is that periodic variation of the wall time can be normal, while a continuous increase is unusual behavior.
My second question: if it is not normal, have you experienced this before? I would like to know how I can solve this problem.
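One way to separate a periodic wall-time pattern from a continuous increase is to fit a straight line to the per-step wall times and look at the slope. The sketch below is only an illustration under stated assumptions: the function names `linear_trend` and `extrapolate` are hypothetical, and the data are synthetic, not measurements from this run. A slope near zero indicates purely periodic behavior, and a clearly positive slope indicates steady growth that can be extrapolated.

```python
import math


def linear_trend(steps, wall_times):
    """Ordinary least-squares fit wall_time = intercept + slope * step."""
    n = len(steps)
    mean_x = sum(steps) / n
    mean_y = sum(wall_times) / n
    sxx = sum((x - mean_x) ** 2 for x in steps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(steps, wall_times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate(slope, intercept, step):
    """Wall time predicted by the fitted line at a given step."""
    return intercept + slope * step


# Synthetic examples (assumed values, not measured data).
steps = list(range(0, 5001, 100))
periodic = [3.0 + math.sin(2.0 * math.pi * s / 1000.0) for s in steps]
growing = [3.0 + 0.002 * s for s in steps]

slope_periodic, _ = linear_trend(steps, periodic)
slope_growing, intercept_growing = linear_trend(steps, growing)
print("periodic slope [sec/step]:", slope_periodic)
print("growing slope  [sec/step]:", slope_growing)
print("growing case at step 50000 [sec]:",
      extrapolate(slope_growing, intercept_growing, 50000))
```

Applied to the actual per-step timings from the solver output, the same fit would give a rough estimate of how the cost per step scales over a longer run.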