Add support for enhanced hypercube topologies. #1

Open
wants to merge 1 commit into
base: master
from

Conversation

andyw-lala

Dependent on the switches being consistently cabled in port-to-dimension order
(in this respect, it is identical to EHC support in OFA/Infiniband.)

It is activated by setting the Hypercube tag to 1 in the sm section of the xml file.
Either the shortestpath or dgshortestpath RoutingAlgorithm can be used. Traffic is
distributed only across the port group associated with the lowest dimension to a
neighbor switch along a shortest path. The cost of a hop needs to be constant so
that it follows dimension ordering rules and is not influenced by differing port speeds.
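The routing rule described above can be illustrated with a minimal sketch. It is not taken from the patch. It assumes each switch carries a binary coordinate (an integer whose bit `d` is its position in dimension `d`) and that every dimension has the same number of ports, numbered from 1 in dimension order; the names `lowest_differing_dimension`, `port_group`, and `route` are hypothetical.

```python
from typing import List, Optional


def lowest_differing_dimension(src: int, dst: int) -> Optional[int]:
    """Return the lowest dimension in which src and dst differ, or None if equal."""
    diff = src ^ dst
    if diff == 0:
        return None
    # diff & -diff isolates the lowest set bit; bit_length() - 1 is its index.
    return (diff & -diff).bit_length() - 1


def port_group(dim: int, ports_per_dim: int) -> List[int]:
    """Ports (1-based) assumed to serve a dimension under port-to-dimension cabling."""
    first = dim * ports_per_dim + 1
    return list(range(first, first + ports_per_dim))


def route(src: int, dst: int) -> List[int]:
    """Walk from src to dst, always correcting the lowest differing dimension."""
    path = [src]
    cur = src
    dim = lowest_differing_dimension(cur, dst)
    while dim is not None:
        cur ^= 1 << dim  # cross the link in that dimension
        path.append(cur)
        dim = lowest_differing_dimension(cur, dst)
    return path
```

With every hop costing the same, the path length equals the number of differing coordinate bits, so the walk is a shortest path regardless of link speed. For example, `route(0b000, 0b101)` visits `[0, 1, 5]`, and traffic for the first hop would be spread over `port_group(0, 2)`, that is ports 1 and 2 under the assumed layout.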

@sjb017
Contributor

sjb017 commented Mar 31, 2016

Hello Andy,

I am Scott Breyer, the maintainer of this GitHub repository.

I wanted to get back to you to let you know that we are processing your patch. I apologize for the delay in responding.

A couple of preliminary questions for you:

· We were under the impression that a much larger patch was in the works; this seems to handle hypercube with a much smaller scope. Are more patches in the pipe, or is this the complete proposed patch for hypercube?

· A reviewer indicated that the patch does not seem to handle errors where cabling is not consistent with “dimension ordered cabling”. How are those identified and reported to the sysadmin?

Thanks,
Scott Breyer
Intel Corporation

From: Andy Warner [mailto:[email protected]]
Sent: Monday, March 28, 2016 6:07 PM
To: 01org/opa-fm
Subject: [01org/opa-fm] Add support for enhanced hypercube topologies. (#1)


@andyw-lala
Author

Two more features are in progress and should be expected:

  1. Port ordering. This allows one to change the port<->dimension mapping when the cabling is not in port order. This is undergoing internal review & testing right now.
  2. Port weights/cost. This adds cost to routes that might otherwise perturb the EHC structure. For instance, when a storage fabric is connected to multiple points in an EHC.

You are correct, there is no identification of inconsistent cabling. We have not done that in the past for IB. While it might be a nice feature, we are not sure how we would approach it. Do not expect us to implement this.
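Purely as an illustration of what such a check could look like (it is not part of this patch or of the planned port-ordering feature), the sketch below assumes a fabric-wide port-to-dimension map and assumes that a correctly cabled link uses ports of the same dimension at both ends. The link tuple layout and the name `find_inconsistent_links` are hypothetical.

```python
from typing import Dict, List, Tuple

# (switch A, port on A, switch B, port on B); hypothetical representation.
Link = Tuple[str, int, str, int]


def find_inconsistent_links(links: List[Link],
                            port_to_dim: Dict[int, int]) -> List[Link]:
    """Return links whose two ends do not map to the same dimension."""
    problems = []
    for link in links:
        _sw_a, port_a, _sw_b, port_b = link
        dim_a = port_to_dim.get(port_a)  # None if the port is unmapped
        dim_b = port_to_dim.get(port_b)
        if dim_a is None or dim_b is None or dim_a != dim_b:
            problems.append(link)
    return problems


# Example: ports 1..3 mapped to dimensions 0..2 (cabling in port order).
port_map = {1: 0, 2: 1, 3: 2}
fabric = [("A", 1, "B", 1), ("A", 2, "C", 3)]
bad = find_inconsistent_links(fabric, port_map)  # flags the A:2 <-> C:3 link
```

A remapped cabling scheme would only change `port_map`; the check itself stays the same.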

@ToddRimmer

The additions mentioned seem likely to be larger in scope. Given that the shortest path and dg shortest path algorithms are only 1500 lines of code, it would seem better to create a new routing algorithm for hypercube, which would also avoid impacts on global functions like SpeedWidth_to_Cost. If in the end there are significant portions of common code, it can be refactored into a “routing library” for use by multiple algorithms. Routing algorithms are often challenging to review and validate, so limiting the number of modes per routing algorithm would be preferred.

It would seem that this first cut at a hypercube algorithm, if copied into a separate source file, could in fact drop portions of the shortest path code: features like SpineFirstRouting are orthogonal to hypercube, and some of the balancing and alternate path features are likely to conflict with DOR.

For questions like recognizing errors, having configurable port ordering, etc., please look at the existing DOR code. That code was fully functional on truescale but has not yet been ported to OPA. The main area needing porting was the toroidal handling, since the expensive SL2VL mechanism used for datelines in IB is better handled via SC2SC in OPA. However, toroidal handling would not be relevant to hypercube, so that part of the code could be ifdef'ed out. The current code is ifdef'ed out; search for CONFIG_INCLUDE_DOR in the code base and see the sm_dor.c file. Given the list of additional goals, that code is a better starting point.

One limitation in that code, which should be easy enough to correct, is that its config input was not prepared for mesh dimensions of size 2 (e.g. hypercube), but that is really a question about parsing PortPair opafm.xml input. Depending on what your RAS strategy is, note that the code also adds an alternate up/down algorithm, which was routed on a separate LMC LID as a low-performance resilient alternative route to handle disrupted fabrics. Since you don’t mention that, you could potentially ifdef out that code for this initial hypercube solution.

Todd Rimmer
DCSG Architecture
Voice: 610-312-2152 Fax: 610-312-2233

From: Andy Warner [mailto:[email protected]]
Sent: Thursday, March 31, 2016 12:15 PM
To: 01org/opa-fm
Subject: Re: [01org/opa-fm] Add support for enhanced hypercube topologies. (#1)


3 participants