Add support for enhanced hypercube topologies. #1
This support depends on the switches being consistently cabled in port-to-dimension order (in this respect, it is identical to EHC support in OFA/InfiniBand). It is activated by setting the Hypercube tag to 1 in the sm section of the xml file. Either the shortestpath or dgshortestpath RoutingAlgorithm can be used. Traffic is distributed only across the port group associated with the lowest dimension to a neighbor switch along a shortest path. The cost of a hop needs to be constant so that routing follows dimension ordering rules and is not influenced by differing port speeds.
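To illustrate the dimension-ordering rule described above, here is a minimal sketch in C. It is not taken from the patch. It assumes a simplified model in which each switch is identified by a binary hypercube coordinate (one bit per dimension), and every name in it (`hc_coord_t`, `HC_HOP_COST`, `hc_lowest_diff_dim`, `hc_next_hop`, `hc_path_cost`) is hypothetical. The port group that carries the traffic would be the one cabled to the dimension returned by `hc_lowest_diff_dim`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a switch is identified by its hypercube
 * coordinate, one bit per dimension. Not the patch's data structure. */
typedef uint32_t hc_coord_t;

/* Assumption: every hop costs the same, regardless of port speed,
 * so all shortest paths have equal cost and dimension order decides. */
enum { HC_HOP_COST = 1 };

/* Lowest dimension in which src and dst differ. src must not equal dst. */
static int hc_lowest_diff_dim(hc_coord_t src, hc_coord_t dst)
{
    hc_coord_t diff = src ^ dst;
    int dim = 0;

    assert(diff != 0);
    while ((diff & 1u) == 0) {
        diff >>= 1;
        dim++;
    }
    return dim;
}

/* Neighbor switch reached by correcting the lowest differing dimension. */
static hc_coord_t hc_next_hop(hc_coord_t src, hc_coord_t dst)
{
    return src ^ ((hc_coord_t)1 << hc_lowest_diff_dim(src, dst));
}

/* Path cost under the constant hop cost: the number of differing
 * dimensions (Hamming distance) times HC_HOP_COST. */
static unsigned hc_path_cost(hc_coord_t src, hc_coord_t dst)
{
    hc_coord_t diff = src ^ dst;
    unsigned hops = 0;

    while (diff != 0) {
        hops += diff & 1u;
        diff >>= 1;
    }
    return hops * HC_HOP_COST;
}
```

In this model, routing from coordinate 0 to coordinate 6 first corrects dimension 1 and then dimension 2. If hop cost varied with port speed, a path that corrects a higher dimension first could look cheaper, which is why the description asks for a constant cost.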
Hello Andy,

I am Scott Breyer, the maintainer of this GitHub repository. I wanted to let you know that we are processing your patch, and I apologize for the delay in responding. A couple of preliminary questions for you:

· We were under the impression that a much larger patch was in the works; this one seems to handle hypercube with a much smaller scope. Are more patches in the pipeline, or is this the complete proposed patch for hypercube?
· A reviewer indicated that the patch does not seem to handle errors where cabling is not consistent with “dimension ordered cabling”. How are those identified and reported to the sysadmin?

Thanks,
There are 2 more features in process and to be expected:
You are correct, there is no identification of inconsistent cabling. We have not done that in the past for IB. While possibly a nice feature, we are not sure how we might approach it. Do not expect us to implement this.
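For readers wondering what such a check could look like, the following is a hypothetical sketch only. It is not part of this patch or of the existing DOR code. It assumes that ports are numbered from 1 and grouped in fixed-size blocks per dimension (`ports_per_dim` is an invented parameter), and that a link counts as consistently cabled when both of its ends sit in the port group of the same dimension.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: ports are 1-based and the first ports_per_dim ports
 * belong to dimension 0, the next ports_per_dim to dimension 1, etc. */
static int hc_port_dim(int port, int ports_per_dim)
{
    assert(port >= 1 && ports_per_dim >= 1);
    return (port - 1) / ports_per_dim;
}

/* A link is treated as consistently cabled when the local and remote
 * ports map to the same dimension. A false result is what a subnet
 * manager could report to the sysadmin. */
static bool hc_link_consistent(int local_port, int remote_port,
                               int ports_per_dim)
{
    return hc_port_dim(local_port, ports_per_dim) ==
           hc_port_dim(remote_port, ports_per_dim);
}
```

With two ports per dimension, a cable from port 3 to port 4 stays within dimension 1 and passes, while a cable from port 2 to port 3 joins dimensions 0 and 1 and would be flagged.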
The additions you mention sound larger in scope. Given that the shortest path and dg shortest path algorithms are only 1500 lines of code, it would seem better to create a new routing algorithm for hypercube, which would also avoid impacts to global functions like SpeedWidth_to_Cost. If in the end there are significant portions of common code, they can be refactored into a “routing library” for use by multiple algorithms. Routing algorithms are often challenging to review and validate, so limiting the number of modes per routing algorithm would be preferred.

This first cut at a hypercube algorithm, if copied into a separate source file, could in fact remove portions of the shortest path code: features like SpineFirstRouting are orthogonal to hypercube, and some of the balancing and alternate path features are likely to conflict with DOR.

For questions like recognizing errors, having configurable port ordering, etc., please look at the existing DOR code. That code was fully functional on truescale but has not yet been ported to OPA. The main area needing porting was the toroidal handling, since the expensive SL2VL mechanism used for datelines in IB is better handled via SC2SC in OPA. However, toroidal handling would not be relevant to hypercube, so that part of the code could be ifdef'ed out. The DOR code is currently ifdef'ed out; search for CONFIG_INCLUDE_DOR in the code base and see the sm_dor.c file. Given the list of additional goals, that code is a better starting point.

One limitation in that code, which should be easy enough to correct, is that its config input was not prepared for mesh dimensions of size 2 (e.g. hypercube), but that is really a question about parsing the PortPair opafm.xml input.

Depending on what your RAS strategy is, note that the code also adds an alternate up/down algorithm, routed on a separate LMC LID, as a low-performance resilient alternative route to handle disrupted fabrics. Since you don’t mention that, you could potentially ifdef out that code for this initial hypercube solution.

Todd Rimmer