NoC Turn Model Routing Algorithms #2496
std::hash<vtr::StrongID>() was implemented as the identity function. The generated hash values were therefore very small, causing the west-first algorithm to almost always choose the right direction when down/up was also available.
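As a rough illustration of the difference between an identity hash and a mixing hash, here is a minimal sketch. `RouterId`, `identity_hash`, and `mix_hash` are hypothetical names, not the vtr::StrongID code, and the mixer is the well-known SplitMix64 finalizer rather than whatever this PR actually uses.

```cpp
#include <cstdint>

// Hypothetical stand-in for a strongly typed ID; not the vtr::StrongID class.
struct RouterId {
    std::uint64_t value;
};

// Identity-style hash: a small ID yields an equally small hash value,
// so only the lowest bits of the result ever vary.
inline std::uint64_t identity_hash(RouterId id) {
    return id.value;
}

// SplitMix64 finalizer: spreads small inputs over the full 64-bit range.
// Every step is invertible, so distinct IDs still map to distinct hashes.
inline std::uint64_t mix_hash(RouterId id) {
    std::uint64_t z = id.value + 0x9e3779b97f4a7c15ULL;
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}
```

Using a fixed-width `std::uint64_t` instead of `size_t` also sidesteps the machine-dependent width mentioned in the next commit message.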
…elect_next_direction. The size of size_t is machine and implementation dependent.
A cycle in NoC routing configuration may cause deadlock when packets wait on each other in a cycle.
The odd-even algorithm needs to know the location of the source router, so it is added as a new argument to get_legal_directions.
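To show why the source location matters, here is a minimal sketch of the commonly cited minimal odd-even routing function for a 2D mesh. It is not the code in this PR: `Coord`, `Direction`, and `odd_even_legal_directions` are hypothetical names, x is assumed to grow eastward and y northward, and coordinates are assumed to be non-negative.

```cpp
#include <vector>

enum class Direction { East, West, North, South };

struct Coord {
    int x;  // column index, grows eastward (assumed non-negative)
    int y;  // row index, grows northward
};

// Legal next hops under minimal odd-even routing (sketch, 2D mesh only).
std::vector<Direction> odd_even_legal_directions(Coord src, Coord cur, Coord dst) {
    std::vector<Direction> legal;
    const int ex = dst.x - cur.x;
    const int ey = dst.y - cur.y;
    if (ex == 0 && ey == 0) return legal;  // already at the destination

    const Direction vertical = (ey > 0) ? Direction::North : Direction::South;
    if (ex == 0) {
        legal.push_back(vertical);
    } else if (ex > 0) {  // destination lies to the east
        if (ey == 0) {
            legal.push_back(Direction::East);
        } else {
            // Turning from east to north/south is forbidden in even columns,
            // unless the packet is still in its source column.
            if (cur.x % 2 != 0 || cur.x == src.x) legal.push_back(vertical);
            // Do not enter an even destination column with vertical hops left.
            if (dst.x % 2 != 0 || ex != 1) legal.push_back(Direction::East);
        }
    } else {  // destination lies to the west
        legal.push_back(Direction::West);
        // Vertical hops are only taken in even columns when heading west.
        if (ey != 0 && cur.x % 2 == 0) legal.push_back(vertical);
    }
    return legal;
}
```

For the same current router (2,0) and destination (4,2), a packet that started at (0,0) may only go east, while a packet that started at (2,0) may also go north. That dependence on `src` is the reason for the extra argument.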
…ize NoC weighting factors
I assumed the placer would place 64 logical routers in an 8x8 grid. There are 64*63=4032 traffic flows. Within the 8x8 grid, there are 112 links. If these traffic flows are routed with minimal paths, the generated routes would traverse 21504 edges. Ideally, the placement algorithm would evenly distribute this aggregate bandwidth of 21504*BW over 112 links.
I forgot that there are two links between two neighboring routers. So the total number of edges within an 8x8 grid is 224.
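The numbers in these two commit messages can be reproduced with a short brute-force count. This is only a sanity-check sketch, assuming a full n x n mesh with one flow per ordered router pair; the function names are made up for illustration.

```cpp
#include <cstdlib>

// Sum of Manhattan distances over all ordered (source, destination) pairs
// in an n x n mesh, i.e. the links traversed when every flow is minimal.
long total_minimal_hops(int n) {
    long total = 0;
    for (int sx = 0; sx < n; ++sx)
        for (int sy = 0; sy < n; ++sy)
            for (int dx = 0; dx < n; ++dx)
                for (int dy = 0; dy < n; ++dy)
                    total += std::abs(sx - dx) + std::abs(sy - dy);
    return total;
}

// Number of all-to-all traffic flows between n*n routers.
int num_flows(int n) {
    return (n * n) * (n * n - 1);
}

// Each pair of neighboring routers is joined by two unidirectional links.
int num_directed_links(int n) {
    const int undirected = 2 * n * (n - 1);  // horizontal + vertical pairs
    return 2 * undirected;
}
```

For n = 8 this gives 4032 flows, 21504 traversed links, and 224 directed links, so a perfectly even distribution puts 21504 / 224 = 96 flows on each link.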
To minimize aggregate bandwidth, the placer would place 32 routers in a 6x6 grid. In such a placement, traffic flow routes traverse 3680 links. There are 120 links within a 6x6 grid. Assuming that an aggregate bandwidth of 3680*BW is evenly distributed among all 120 links, we can increase each traffic flow's bandwidth up to 3.26e4 without causing congestion.
The link bandwidth is 1e6 and the central router sends data to the other 31 routers. Ideally, the total traffic bandwidth is divided equally over the 4 links of the central router. If each of these 4 links carries data for 8 routers, we need to divide the link bandwidth by 8: 1e6 / 8 = 1.25e5.
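Both bandwidth limits above follow from the same "spread the load evenly" argument. A small sketch of that arithmetic is below; the function names are hypothetical, and a link bandwidth of 1e6 is assumed for the 6x6 case as well, which is consistent with the 3.26e4 figure.

```cpp
// Largest per-flow bandwidth that avoids congestion when the aggregate
// load (total_hops * flow_bw) is spread evenly over all links.
double max_flow_bandwidth(double link_bw, int num_links, int total_hops) {
    return link_bw * num_links / total_hops;
}

// Per-flow bandwidth for a hub router whose outgoing flows are split as
// evenly as possible over its links (ceiling division for the worst link).
double hub_flow_bandwidth(double link_bw, int num_destinations, int num_hub_links) {
    const int flows_per_link = (num_destinations + num_hub_links - 1) / num_hub_links;
    return link_bw / flows_per_link;
}
```

With the values from the commit messages, `max_flow_bandwidth(1e6, 120, 3680)` is roughly 3.26e4 and `hub_flow_bandwidth(1e6, 31, 4)` is 1.25e5.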
…s so that a congestion free solution exists
This is the link to the QoR spreadsheet for synthetic benchmarks. It is still being updated with new results. When completed, I'll post the summary here.
Looks good.
Also nice to give a normalized / summary tab.
Partly done ... let's edit more tomorrow or Wed.
Aggregate bandwidth compared to master branch without congestion modeling
In all algorithms, the aggregate bandwidth increases as the congestion weighting grows. To avoid congestion, the placement engine places some routers farther from each other to expose more path diversity, so certain traffic flows traverse more links, which raises the overall bandwidth. When the congestion weighting factor is zero, all algorithms have the same aggregate bandwidth. This is because they all employ minimal routing and generate routes with the same number of links. Since the other NoC-related cost terms depend only on the number of links traversed by each traffic flow, regardless of which links form the route, all algorithms behave the same as XY routing in master without any congestion modeling.
Aggregate latency compared to master branch with no congestion modeling
The aggregate latency behaves similarly to the aggregate bandwidth, so the same arguments apply to it.
Normalized average congestion ratio
Even when the congestion weighting factor is zero, Turn Model algorithms have a lower congestion ratio. Even though these algorithms have the same placement as the XY algorithm, they can exploit path diversity to some extent and reduce congestion compared to the XY algorithm, which cannot benefit from path diversity at all. Setting the congestion weighting factor to a non-zero value helps the placement algorithm reduce congestion in all algorithms. Increasing the congestion weighting factor further yields a slight additional improvement. However, it should be noted that this comes at the cost of higher aggregate bandwidth and latency. The odd-even algorithm performs better than the other algorithms. It is able to reduce congestion by 94% while increasing the aggregate bandwidth by only 9%. Some other algorithms can achieve the same congestion improvement but increase the aggregate bandwidth by at least 12%.
Placement time compared to master without congestion
Congestion modeling and virtual method overrides in the new routing algorithms are the main reasons for the runtime increase. The number of swaps during placement stays almost constant.
Thanks, these results look good.
Looks good ... some commenting suggestion and a few namespace suggestions attached.
@@ -0,0 +1,994 @@
<traffic_flows>
<single_flow src=".*noc_router_adapter_block_1[^\d].*" dst=".*noc_router_adapter_block_2[^\d].*" bandwidth="3.26e4" latency_cons="21e-9" />
Comment what the purpose / style of this benchmark is.
...chmarks/noc/Synthetic_Designs/congestion_traffic_flow_files/complex_64_noc_bucket_sort.fixed
...oc/Synthetic_Designs/congestion_traffic_flow_files/complex_64_noc_gaussian_elimination.fixed
...nchmarks/noc/Synthetic_Designs/congestion_traffic_flow_files/complex_64_noc_genome_seq.fixed
...enchmarks/noc/Synthetic_Designs/congestion_traffic_flow_files/complex_64_noc_page_rank.fixed
@vaughnbetz
Thanks. Can you also confirm there is no CPU time, memory use or quality change when NoC optimization is off? That should be the case, but I like to make sure. A run on one significant size circuit under controlled circumstances should be enough to confirm.
I added a new sheet to compare the QoR for titan benchmarks on this branch with master. Here is the summary:
Description
This PR adds four new NoC routing algorithms from the Turn Model paper. The XY routing algorithm can only generate a single route for a given source/destination location pair. These new algorithms can exploit path diversity in a mesh topology and generate multiple routes. This PR also adds a new class to check deadlock freedom in generated NoC routes.
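The difference in path diversity can be seen from the set of legal next hops each algorithm offers. The sketch below compares XY routing with minimal west-first routing on a 2D mesh; `Coord`, `Direction`, and both function names are hypothetical and do not mirror the classes added in this PR.

```cpp
#include <vector>

enum class Direction { East, West, North, South };

struct Coord {
    int x;  // grows eastward
    int y;  // grows northward
};

// XY routing: finish all horizontal hops first, then go vertical.
// There is never more than one legal direction.
std::vector<Direction> xy_legal_directions(Coord cur, Coord dst) {
    if (dst.x > cur.x) return {Direction::East};
    if (dst.x < cur.x) return {Direction::West};
    if (dst.y > cur.y) return {Direction::North};
    if (dst.y < cur.y) return {Direction::South};
    return {};
}

// Minimal west-first routing: turns into the west direction are forbidden,
// so all westward hops come first; afterwards any productive direction
// among east, north, and south may be taken.
std::vector<Direction> west_first_legal_directions(Coord cur, Coord dst) {
    if (dst.x < cur.x) return {Direction::West};
    std::vector<Direction> legal;
    if (dst.x > cur.x) legal.push_back(Direction::East);
    if (dst.y > cur.y) legal.push_back(Direction::North);
    if (dst.y < cur.y) legal.push_back(Direction::South);
    return legal;
}
```

For a packet at (0,0) heading to (2,2), XY routing offers only east, while west-first offers east and north, which is what gives the router a choice between multiple minimal routes.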
Motivation and Context
After adding NoC congestion modeling to VPR, it turned out that the XY routing algorithm cannot eliminate congestion in some benchmarks. Turn Model routing algorithms exploit path diversity and distribute traffic flows more evenly across NoC links. When combined with NoC congestion modeling, these algorithms are more effective in reducing NoC link congestion.
The main reason we do not route traffic flows arbitrarily is to avoid deadlock. Turn Model routing algorithms guarantee deadlock freedom. However, deadlock freedom is not currently checked in VPR. This PR implements a class that represents the channel dependency graph, which is used to detect possible deadlocks in a NoC routing solution.
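The idea behind such a check can be sketched as follows: every NoC link is a node, an edge is added from one link to the next for each consecutive pair of links in a route, and a cycle in that graph indicates a possible deadlock. The code below is an illustrative sketch only; `Route`, `build_cdg`, and `has_possible_deadlock` are hypothetical names and not the class added in this PR.

```cpp
#include <cstddef>
#include <vector>

// A route is the ordered list of link IDs traversed by one traffic flow.
using Route = std::vector<int>;
using Graph = std::vector<std::vector<int>>;

// Channel dependency graph: link a depends on link b if some route
// uses b immediately after a.
Graph build_cdg(const std::vector<Route>& routes, int num_links) {
    Graph adj(num_links);
    for (const Route& route : routes)
        for (std::size_t i = 0; i + 1 < route.size(); ++i)
            adj[route[i]].push_back(route[i + 1]);
    return adj;
}

enum class Mark { Unvisited, InProgress, Done };

// Depth-first search; reaching a node that is still in progress
// means a back edge was found, i.e. the graph contains a cycle.
bool dfs_finds_cycle(int node, const Graph& adj, std::vector<Mark>& mark) {
    mark[node] = Mark::InProgress;
    for (int next : adj[node]) {
        if (mark[next] == Mark::InProgress) return true;
        if (mark[next] == Mark::Unvisited && dfs_finds_cycle(next, adj, mark)) return true;
    }
    mark[node] = Mark::Done;
    return false;
}

// True if the routes induce a cyclic channel dependency graph.
bool has_possible_deadlock(const std::vector<Route>& routes, int num_links) {
    const Graph adj = build_cdg(routes, num_links);
    std::vector<Mark> mark(num_links, Mark::Unvisited);
    for (int link = 0; link < num_links; ++link)
        if (mark[link] == Mark::Unvisited && dfs_finds_cycle(link, adj, mark)) return true;
    return false;
}
```

Four routes that use links (0,1), (1,2), (2,3), and (3,0) form a dependency cycle and would be flagged, whereas routes (0,1,2) and (3,2) would not.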
Traffic flow files for old synthetic benchmarks are updated in this PR. In some old benchmarks, a congestion-free solution was not possible because some traffic flows had a bandwidth higher than the link bandwidth. This PR updates traffic flow bandwidths in flow files so that a congestion-free solution is feasible.
New synthetic benchmarks have been added to represent designs that access HBM channels. In these benchmarks, some NoC routers are locked down at the boundary of the device to imitate an HBM channel accessible through a NoC router.
How Has This Been Tested?
QoR has been measured on modified synthetic NoC benchmarks.
Types of changes
Checklist: