Run Regent on a distributed system #69
Comments
Hi @crl123. This means that Task Bench is computing the wrong result. I'm a little confused; I thought the Regent implementation was fully debugged. I'm not expecting this to make a difference, but can you confirm what Task Bench branch/tag you're on? I'll try to confirm on my end as well.
I'm on the 'origin/master' branch.
OK, I'm a bit swamped with things going on this week, but I'll try to find time to verify the Regent implementation on my own machine.
Sorry for taking so long to get back to this. Looking back at your configuration here, I don't see any settings for the network. Typically you'd use something like:
Otherwise what you're doing is running N copies of the single-node program, which is probably why this is misbehaving.
Hi @elliottslaughter. I have a further question about multi-node benchmarks.
@ysfess22 Please submit this as a new issue unless it's specifically related to the original posting. The answer will depend on how you have configured your system, and I will require more information, which will clog this thread if it's not specifically related.
Good afternoon,
I am running Regent on my cluster of 9 nodes with the following parameters:
mpirun -np 9 -ppn 1 ./TaskBench/task-bench/regent/main.shard14 -steps 10 -type fft -kernel compute_bound -iter 1000000
And it is giving me the following problem:
main.shard14: core.cc:588: void TaskGraph::execute_point(long int, long int, char*, size_t, const char**, const size_t*, size_t, char*, size_t) const: Assertion `input[i].second == dep' failed.
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
And sometimes the following problem:
main.shard14: core.cc:565: void TaskGraph::execute_point(long int, long int, char*, size_t, const char**, const size_t*, size_t, char*, size_t) const: Assertion `offset <= point && point < offset+width' failed.
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
I have the same problem when I use the tree type, but when I use the stencil_1d type I don't have the problem.
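For readers unfamiliar with the two assertions quoted above, here is a minimal Python sketch of the kind of check each condition expresses. It is not Task Bench's code: the function names are hypothetical, and the assumption that every input carries a (timestep, point) tag written by its producer is mine, made only so the `input[i].second == dep` comparison has something concrete to compare against.

```python
# Illustrative sketch only; helper names and data layout are assumptions,
# not taken from Task Bench's core.cc.

def point_in_range(point, offset, width):
    """Mirrors `offset <= point && point < offset+width`:
    the point being executed must lie inside the active range."""
    return offset <= point < offset + width


def inputs_match_deps(inputs, expected_deps, timestep):
    """Mirrors `input[i].second == dep`, assuming each input is tagged
    with the (timestep, point) of the task that produced it."""
    if len(inputs) != len(expected_deps):
        return False
    return all(
        produced_at == timestep - 1 and produced_by == dep
        for (produced_at, produced_by), dep in zip(inputs, expected_deps)
    )


if __name__ == "__main__":
    # A point inside a range of width 9 starting at offset 0.
    print(point_in_range(3, offset=0, width=9))
    # The second input claims to come from point 7, but point 3 was expected.
    print(inputs_match_deps([(4, 2), (4, 7)], expected_deps=[2, 3], timestep=5))
```

Under these assumptions, a failure of the second check means a task received data that was not produced by the dependency it expected, which is consistent with the benchmark computing a wrong result rather than merely running slowly.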
I compile Regent as follows:
DEFAULT_FEATURES=0 USE_REGENT=1 ./get_deps.sh
export CXX=mpicxx
export CC=mpicc
./build_all.sh
Thank you in advance for your help,