Topological Routing in DC
- Sergio Leon Gaixas (@gaixas1) - Universitat Politècnica de Catalunya (ES)
In this tutorial, we show how to configure and use topological routing and forwarding policies for datacenters. In particular, we show how to work with the Group Rules and Exceptions forwarding policies (GRE family) and both distributed and centralized routing.
The scenario used in this tutorial can be found under the folder “/examples/Tutorials/Topological_DC_Routing”, and is composed of the following files:
- net.ned: Network description.
- omnet.ini: Network and overall configuration.
- shimqoscube.xml: QoSCubes definition for shim-DIFs.
- qoscube.xml: QoSCubes definition for upper DIFs.
- shimconnectionset[Central].xml: Definition of preconfigured flows.
- qosreq.xml: QoS requirements of preconfigured flows.
- directory.xml: Configuration of IPCP locations.
The network described in “net.ned” is a modified Clos DC network with 8 pods of 6 ToRs each and 3 spine-sets of 4 spines each, as seen in Figure 1. In addition, for the centralized routing configuration, manager servers have been connected to the first two ToRs of each pod.
Figure 1. Tutorial network.
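As a quick sanity check on the sizes (this is our own back-of-the-envelope count, not part of the scenario files; the per-pod fabric count comes from the pduFwdGenerator parameters shown later):

# Node counts implied by the topology description (illustrative only).
PODS, TORS_PER_POD, FABRICS_PER_POD = 8, 6, 3
SPINE_SETS, SPINES_PER_SET = 3, 4
MANAGERS_PER_POD = 2                # only present in the centralized scenario
print(PODS * TORS_PER_POD,          # 48 ToRs
      PODS * FABRICS_PER_POD,       # 24 fabric switches
      SPINE_SETS * SPINES_PER_SET,  # 12 spines
      PODS * MANAGERS_PER_POD)      # 16 manager servers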
The main objective of this scenario is to show how the routing and forwarding policies behave after some flow failures. In order to do that, multiple flows are “killed” after the network is stable and, once it has stabilized again, all ToRs try to communicate with each other.
In this scenario we consider 4 different configurations:
- Default configuration
Errorless configuration with default link-state routing and forwarding tables.
- GRE Static configuration “routingxceptions”
Errorless configuration with GRE policies and no routing.
- GRE Distributed configuration “routingxceptionsErrors”
Scenario with multiple random errors, GRE policies and distributed error-based routing.
- GRE Centralized configuration “centralized”
Scenario with multiple random errors, GRE policies and centralized error-based routing.
The configured network is a small datacenter fabric connecting all ToRs in a Clos topology.
In this network, instead of the more commonly used node modules, we use “ReachabilityTest_Switch” nodes together with the “ReachabilityTest_Listener” in order to inject and monitor traffic between all ToRs and thus test reachability between nodes.
In addition, in order to randomly “kill” flows, we use the “FailureSimulation” module, in charge of listening for established flows and deciding whether they have to be disconnected.
Finally, each pod has two ToRs connected to a “Manager” node. Those are used in the centralized routing scenario and represent servers located in the ToR rack.
Let’s explain the basic configuration of the network. In this scenario, we have two DIF levels.
At the base level “ipcProcess0” we have the shim DIFs (mediums) interconnecting all the nodes. For their configuration, we used a simple naming rule where each shim has a unique name, and node addresses were given as 0 for spine nodes, 1 for fabric nodes, 2 for ToRs and 3 for the Manager (master) nodes. Then, at time = 0, flows through all shims are allocated, connecting the upper IPCPs, as described in “shimconnectionset.xml” (“shimconnectionsetCentral.xml” for the centralized scenario).
In the upper layer, we have the “F” DIF, where addresses are location-dependent hexadecimal values of the form AABB (AA being the group, either spine-set or pod, and BB the node id within that group). In this DIF, we don’t pre-allocate any flows, as all the testing in this scenario is done via PDUs injected directly into the RMT.
# Shims
**._0_0.ipcProcess0[*].ipcAddress = "0"
**._0_0.ipcProcess0[0].difName = "00_3"
**._0_0.ipcProcess0[1].difName = "00_4"
**._0_0.ipcProcess0[2].difName = "00_5"
**._0_0.ipcProcess0[3].difName = "00_6"
**._0_0.ipcProcess0[4].difName = "00_7"
**._0_0.ipcProcess0[5].difName = "00_8"
**._0_0.ipcProcess0[6].difName = "00_9"
**._0_0.ipcProcess0[7].difName = "00_a"
...
# Fabric
**.ipcProcess1.difName = "F"
**._0_0.ipcProcess1.ipcAddress = "0000"
**._0_1.ipcProcess1.ipcAddress = "0001"
**._0_2.ipcProcess1.ipcAddress = "0002"
**._0_3.ipcProcess1.ipcAddress = "0003"
**._1_0.ipcProcess1.ipcAddress = "0100"
**._1_1.ipcProcess1.ipcAddress = "0101"
**._1_2.ipcProcess1.ipcAddress = "0102"
**._1_3.ipcProcess1.ipcAddress = "0103"
...
# Static/Distributed
**.ra.preallocation = xmldoc("shimconnectionset.xml", "ConnectionSet")
# Centralized
**.ra.preallocation = xmldoc("shimconnectionsetCentral.xml", "ConnectionSet")
In order to correctly relay PDUs between the different DIF levels, routing and forwarding policies are configured as follows. At the shims, we don’t use any special policy, as we are “on the wire”. At the F DIF we have multiple configurations:
- Default configuration
In the default configuration, simple link-state routing with a plain forwarding table is used.
**.ipcProcess1.resourceAllocator.pdufgPolicyName = "SimpleGenerator"
**.ipcProcess1.relayAndMux.ForwardingPolicyName = "MiniTable"
**.ipcProcess1.routingPolicyName = "SimpleLS"
- routingxceptions
In this scenario, we have the basic GRE configuration without the use of routing.
## Routing static rules&exceptions
**._*_F*.ipcProcess1.resourceAllocator.pdufgPolicyName = "StaticGenerator"
**._*_F*.ipcProcess1.relayAndMux.ForwardingPolicyName = "SimpleTable"
**._*_F*.ipcProcess1.routingPolicyName = "DummyRouting"
**._{0..2}_*.ipcProcess1.resourceAllocator.pdufgPolicyName ="GRE_Clos2R"
**._{0..2}_*.ipcProcess1.relayAndMux.ForwardingPolicyName = "Clos2"
**._*_{0..2}.ipcProcess1.resourceAllocator.pdufgPolicyName ="GRE_Clos1R"
**._*_{0..2}.ipcProcess1.relayAndMux.ForwardingPolicyName = "Clos1"
**._*_*.ipcProcess1.resourceAllocator.pdufgPolicyName ="GRE_Clos0R"
**._*_*.ipcProcess1.relayAndMux.ForwardingPolicyName = "Clos0"
**.ipcProcess1.resourceAllocator.pduFwdGenerator.pods = 8
**.ipcProcess1.resourceAllocator.pduFwdGenerator.fabrics = 3
**.ipcProcess1.resourceAllocator.pduFwdGenerator.spines = 4
**.ipcProcess1.resourceAllocator.pduFwdGenerator.tors = 6
**.ipcProcess1.routingPolicyName = "eRouting"
- routingxceptionsErrors
In this scenario, we have a GRE configuration with distributed routing.
**._*_F*.ipcProcess1.resourceAllocator.pdufgPolicyName = "StaticGenerator"
**._*_F*.ipcProcess1.relayAndMux.ForwardingPolicyName = "SimpleTable"
**._*_F*.ipcProcess1.routingPolicyName = "DummyRouting"
**._{0..2}_*.ipcProcess1.resourceAllocator.pdufgPolicyName ="GRE_Clos2R"
**._{0..2}_*.ipcProcess1.relayAndMux.ForwardingPolicyName = "Clos2"
**._*_{0..2}.ipcProcess1.resourceAllocator.pdufgPolicyName ="GRE_Clos1R"
**._*_{0..2}.ipcProcess1.relayAndMux.ForwardingPolicyName = "Clos1"
**._*_*.ipcProcess1.resourceAllocator.pdufgPolicyName ="GRE_Clos0R"
**._*_*.ipcProcess1.relayAndMux.ForwardingPolicyName = "Clos0"
**.ipcProcess1.resourceAllocator.pduFwdGenerator.pods = 8
**.ipcProcess1.resourceAllocator.pduFwdGenerator.fabrics = 3
**.ipcProcess1.resourceAllocator.pduFwdGenerator.spines = 4
**.ipcProcess1.resourceAllocator.pduFwdGenerator.tors = 6
**.ipcProcess1.routingPolicyName = "eRouting"
- centralized
In this scenario, we have a GRE configuration with centralized routing.
**._*_F*.ipcProcess1.**.addrComparatorName = "EndPoint"
**.ipcProcess1.**.addrComparatorName = "ReachabilityTest_Comparator"
## Routing default Link state
**._*_F*.ipcProcess1.resourceAllocator.pdufgPolicyName = "GRE_ManagerClos"
**._*_F*.ipcProcess1.relayAndMux.ForwardingPolicyName = "DefaultGW"
**._{0..2}_*.ipcProcess1.resourceAllocator.pdufgPolicyName ="GRE_ClosSpine"
**._{0..2}_*.ipcProcess1.relayAndMux.ForwardingPolicyName = "Clos2"
**._*_{0..2}.ipcProcess1.resourceAllocator.pdufgPolicyName ="GRE_ClosFabric"
**._*_{0..2}.ipcProcess1.relayAndMux.ForwardingPolicyName = "Clos1"
**._*_*.ipcProcess1.resourceAllocator.pdufgPolicyName ="GRE_ClosToR"
**._*_*.ipcProcess1.relayAndMux.ForwardingPolicyName = "Clos0"
**._*_F*.ipcProcess1.routingPolicyName = "RoutingManager"
**._{0..2}_*.ipcProcess1.routingPolicyName = "RoutingDumb"
**._*_*.ipcProcess1.routingPolicyName = "RoutingClient"
**.ipcProcess1.resourceAllocator.pduFwdGenerator.pods = 8
**.ipcProcess1.resourceAllocator.pduFwdGenerator.fabrics = 3
**.ipcProcess1.resourceAllocator.pduFwdGenerator.spines = 4
**.ipcProcess1.resourceAllocator.pduFwdGenerator.tors = 6
**.ipcProcess1.routingPolicy.pods = 8
**.ipcProcess1.routingPolicy.fabrics = 3
**.ipcProcess1.routingPolicy.spines = 4
**.ipcProcess1.routingPolicy.tors = 6
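To give an intuition of what the Clos0/Clos1/Clos2 tables encode, the sketch below shows only the rule-based part of topological forwarding on these AABB addresses. This is our own simplification for illustration; the actual GRE policies additionally hold the group rules and the exceptions that the routing policies install after link failures:

# Conceptual sketch of rule-based Clos forwarding on AABB addresses (illustrative).
# The real Clos0/Clos1/Clos2 policies also handle the exceptions installed by
# the routing policy when links fail; those are omitted here.
def next_hop_rule(level, my_group, dst):
    dst_group = int(dst[:2], 16)
    if level == 0:                 # ToR: always forward up to a fabric switch
        return "any fabric switch in pod %d" % my_group
    if level == 1:                 # fabric switch
        if dst_group == my_group:  # destination is in my own pod: go down
            return "ToR %s" % dst
        return "any spine in my spine-set"
    if level == 2:                 # spine: go down towards the destination pod
        return "the fabric switch of pod %d in my spine-set" % dst_group

print(next_hop_rule(1, 0x03, "0a08"))  # a pod-3 fabric switch sends it up to the spines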
In this example, we use the "FailureSimulation" object and compatible routing policies to emulate link failures in the network in order to test the routing policies.
The simulation of failures is quite simple. Whenever a new flow is established in a module compatible with the failure simulation, in this case between IPCPs in the F DIF, it is registered with the FailureSimulation. After time "killAt", it starts to kill "amount" links at intervals of "interKill" seconds. In addition, those flows can be resurrected at "resurrectAt".
**.fails.amount = 4
**.fails.killAt = 100
**.fails.interKill = 0.1
**.fails.resurrectAt = 0
Note: it is preferable to make the interKill time long enough for the routing updates to reach a stable state between failures, or at least to avoid an interKill time of 0, as otherwise more than one flow can be killed at the same node and the routing policies would send the same update at the same timestamp (a limitation of this kind of simulation).
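Reading the parameters above as “start killing at killAt, one kill every interKill seconds” (our interpretation, not simulator output), the four failures are scheduled roughly as follows:

# Our reading of the failure schedule implied by the parameters above.
amount, kill_at, inter_kill = 4, 100, 0.1
kill_times = [round(kill_at + i * inter_kill, 1) for i in range(amount)]
print(kill_times)  # [100.0, 100.1, 100.2, 100.3]

Note that the reachability test below only starts at t = 150, well after the last kill, so routing has time to converge before reachability is checked.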
In addition to removing random links, we are also interested in checking how forwarding in the network behaves once it has reached a stable state again. In order to do that, we use the "test" module in the "ReachabilityTest_Switch" nodes to send messages between all ToRs.
The configuration of the test parameters goes as follows:
- QoS: QoS of the test messages.
- header_size: Size of the test PDU.
- ini: Time at which the test starts. -1 means don't start.
- interval: Time between each PDU sent at each node.
- nodes: String with the node addresses to test, separated by spaces.
**.test.QoS = "A"
**.test.header_size = 20
**._{0..2}_*.test.ini = -1
**._*_{0..2}.test.ini = -1
**.test.interval = 0.01
**.test.ini = 150
**.test.nodes = "0303 0304 0305 0306 0307 0308 0403 0404 0405 0406 0407 0408 0503 0504 0505 0506 0507 0508 0603 0604 0605 0606 0607 0608 0703 0704 0705 0706 0707 0708 0803 0804 0805 0806 0807 0808 0903 0904 0905 0906 0907 0908 0a03 0a04 0a05 0a06 0a07 0a08"
In addition, to avoid storing or printing lost messages, we add the following:
**.coutLookError = false
**.deleteIfNotValid = true
There are two kinds of results that can be shown in this example:
- Reachability test
These are the results shown by default whenever the reachability test is run. After the simulation ends, it prints the number of reachability messages sent and received, as well as the log of failures (if any). When performing the test in a stable state, the number of failures should be 0, unless some nodes are disconnected from the network (or the paths to reach them fall outside the "valid" ones).
- Routing/Forwarding information
Information about the groups and exceptions stored in the forwarding tables, as well as routing information, can be shown by setting the parameter "printAtEnd" to true for the desired modules.