Run Regent on a distributed system #69

crl123 · 2020-11-04T17:35:16Z

Good afternoon,
I am running Regent on my cluster of 9 nodes with the following parameters:

```
mpirun -np 9 -ppn 1 ./TaskBench/task-bench/regent/main.shard14 -steps 10 -type fft -kernel compute_bound -iter 1000000
```
It fails with the following error:

```
main.shard14: core.cc:588: void TaskGraph::execute_point(long int, long int, char*, size_t, const char**, const size_t*, size_t, char*, size_t) const: Assertion `input[i].second == dep' failed.

= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
```
Sometimes it fails with this error instead:

```
main.shard14: core.cc:565: void TaskGraph::execute_point(long int, long int, char*, size_t, const char**, const size_t*, size_t, char*, size_t) const: Assertion `offset <= point && point < offset+width' failed.

= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
```
I get the same errors with the tree type, but not with the stencil_1d type.
I compile Regent as follows:

```
DEFAULT_FEATURES=0 USE_REGENT=1 ./get_deps.sh
export CXX=mpicxx
export CC=mpicc
./build_all.sh
```
Thank you in advance for your help,

elliottslaughter · 2020-11-04T17:50:40Z

Hi @crl123,

This means that Task Bench is computing the wrong result. I'm a little confused; I thought the Regent implementation was fully debugged.
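The two assertions in the original post are Task Bench's self-checks on each task's inputs and position in the graph. As a rough illustration only, the sketch below restates them as bash functions. The function names are hypothetical, and modeling each input as a pair whose second element names the producing point is an assumption drawn from the `input[i].second == dep` text, not a reading of core.cc.

```shell
#!/usr/bin/env bash
# Hypothetical restatement of the two assertions quoted from core.cc.
# Assumption: each input records the point that produced it, and the
# consumer compares that against the dependency it expected.

# Mirrors: assert(offset <= point && point < offset+width)
point_in_range() {
  local point=$1 offset=$2 width=$3
  (( offset <= point && point < offset + width ))
}

# Mirrors: assert(input[i].second == dep)
input_matches_dep() {
  local producer_point=$1 dep=$2
  (( producer_point == dep ))
}

# A timestep of width 4 starting at offset 0: valid points are 0..3.
point_in_range 3 0 4 && echo "point 3: ok"
point_in_range 4 0 4 || echo "point 4: out of range"

# A consumer that expected data from point 2.
input_matches_dep 2 2 && echo "input from 2: ok"
input_matches_dep 5 2 || echo "input from 5: mismatch"
```

Either check failing means a task received data it should not have, which is why the program aborts instead of reporting a result.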

I'm not expecting this to make a difference, but can you confirm what Task Bench branch/tag you're on?

I'll try to confirm on my end as well.

crl123 · 2020-11-04T17:56:15Z

I'm on the 'origin/master' branch.
I updated my local copy of the repository this Sunday.

elliottslaughter · 2020-11-04T18:00:45Z

Ok, I'm a bit swamped with things going on this week, but I'll try to find time to verify the Regent implementation on my own machine.

elliottslaughter · 2022-09-13T18:02:10Z

Sorry for taking so long to get back to this.

Looking back at your configuration here, I don't see any settings for the network. Typically you'd use something like:

```
export USE_GASNET=1
export CONDUIT=aries
```

Otherwise, what you're doing is running N independent copies of the single-node program, which is probably why this is misbehaving.
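Combining the build commands from the original post with these settings, a multi-node build might look like the sketch below. It assumes get_deps.sh and build_all.sh pick up USE_GASNET and CONDUIT from the environment, as the suggestion above implies. aries is only the example conduit from this reply; choose the one that matches your interconnect (GASNet also provides ibv, udp and mpi conduits, among others).

```shell
#!/usr/bin/env bash
# Sketch: rebuild Task Bench's Regent implementation with GASNet networking.
# Assumption: get_deps.sh and build_all.sh read these variables from the
# environment. Run from the task-bench checkout.
set -e

export CXX=mpicxx
export CC=mpicc

# Enable the network layer; without it each process runs standalone.
export USE_GASNET=1
# Must match your interconnect (e.g. aries, ibv, udp, mpi).
export CONDUIT=aries

DEFAULT_FEATURES=0 USE_REGENT=1 ./get_deps.sh
./build_all.sh
```

How the resulting binary is launched depends on the conduit (the udp conduit, for example, uses its own amudprun spawner), so check the GASNet documentation for your conduit rather than reusing the original mpirun line unchanged.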

ysfess22 · 2023-10-05T17:44:03Z

Hi @elliottslaughter. I have a further question about multi-node benchmarks.
Using GASNet as you described on a cluster with two nodes (udp conduit) creates double the number of tasks in the graph; half of the tasks are run by node 1 and the other half by node 2. Is that the expected behavior? Or is there a way to have the tasks split between the nodes? For example, given a 10x10 stencil graph, the 100 tasks would be split between the two nodes.
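The split asked about here amounts to a block partition of each timestep's points across nodes. The sketch below computes such a split. block_range is a hypothetical helper, and reading the 10x10 graph as width 10 by 10 steps is an assumption; it illustrates the requested behavior only, not how Regent or Task Bench actually assigns tasks to nodes.

```shell
#!/usr/bin/env bash
# Illustrative block partition of a graph's points across nodes.
# Hypothetical: shows the desired split, not Task Bench's real mapping.

# Print the first and last point owned by node $1 when $2 points are
# divided into $3 contiguous blocks (lower nodes absorb any remainder).
block_range() {
  local node=$1 npoints=$2 nnodes=$3
  local base=$(( npoints / nnodes )) extra=$(( npoints % nnodes ))
  local start=$(( node * base + (node < extra ? node : extra) ))
  local size=$(( base + (node < extra ? 1 : 0) ))
  echo "$start $(( start + size - 1 ))"
}

width=10   # points per timestep (assumed from "10x10")
steps=10   # timesteps (assumed from "10x10")
nodes=2

for (( n = 0; n < nodes; n++ )); do
  read -r first last <<< "$(block_range "$n" "$width" "$nodes")"
  echo "node $n: points $first-$last, $(( (last - first + 1) * steps )) tasks"
done
```

With two nodes this prints points 0-4 (50 tasks) for node 0 and points 5-9 (50 tasks) for node 1, i.e. 100 tasks in total rather than 200.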

elliottslaughter · 2023-10-05T17:48:35Z

@ysfess22 Please submit this as a new issue unless it's specifically related to the original posting.

The answer will depend on how you have configured your system, and I will require more information, which will clog this thread if it's not specifically related.

hyviquel mentioned this issue on Jan 5, 2021 in Compute-Intensive Kernel Hits NaN #71 (open).