Converter from rr_nodes to RRGraph object #1048

Open · wants to merge 59 commits into master from rr_graph_refactoring

Conversation

@tangxifan (Contributor) commented Nov 15, 2019

Continued effort on refactoring the Routing Resource Graph (RRG) into a unified RRGraph object.
This change brings the RRGraph into the DeviceContext of the VPR engine: it creates an RRGraph object by porting the rr_nodes information (the classic routing resource graph) into the refactored object.
This change does not impact the routers, analyzers or drawers, which are the major clients of the routing resource graph; those downstream functions will be adapted in later changes.
Detailed design document:
https://docs.google.com/document/d/1LMIlpYoppFtuSi_OZZJNLZ7edWFuuC0js4hm1Ah0n2M/edit?usp=sharing

Description

  1. An RRGraph object has been added to the DeviceContext data structure, in parallel to the classic rr_nodes. This object will replace the legacy data structures (rr_nodes, rr_node_indices, rr_switches, rr_segments, etc.) when the refactoring is done.

  2. A loading function create_rr_graph() is introduced, which loads the rr_node information into the RRGraph object. This function duplicates the rr_node information (a rough sketch of this conversion is given at the end of this section).

  3. Multiple bug fixes to the RRGraph checking code.

  4. Added default constructors in vtr_geometry.h to resolve compatibility issues with Clang compilers.

  5. Changes in golden_results for the vtr_strong regression tests, mainly to relax the memory usage limits. This is temporary: since we duplicate the routing resource graph, the memory usage is expected to grow substantially.
    On my local machine, four tests in vtr_strong fail:

  • regression_tests/vtr_reg_strong/strong_titan
    stratixiv_arch.timing.xml/ucsb_152_tap_fir_stratixiv_arch_timing.blif/common max_vpr_mem: previous_golden = 1011288 current_golden = 1368492
  • regression_tests/vtr_reg_strong/strong_custom_grid
    non_column.xml/raygentop.v/common max_vpr_mem: previous_golden = 91316 current_golden = 193908
  • regression_tests/vtr_reg_strong/strong_custom_grid
    non_column_tall_aspect_ratio.xml/raygentop.v/common max_vpr_mem: previous_golden = 87860 current_golden = 191944
  • regression_tests/vtr_reg_strong/strong_custom_grid
    multiple_io_types.xml/raygentop.v/common max_vpr_mem: previous_golden = 417392 current_golden = 1258700

The peak memory usage exceeds the current QoR limits because the routing resource graph is one of the most memory-consuming data structures in VPR, and during refactoring we duplicate it, which adds memory overhead.
To ease the integration, I have done a run to rewrite the golden results. Feel free to push back if you would prefer that only a few of them be relaxed.
The golden results will certainly improve once the refactoring is done, because the routing resource graph will no longer be duplicated.
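
As a rough sketch of what this porting step amounts to (illustrative only: create_rr_graph() is the PR's entry point, but the create_node()/create_edge() calls and their argument lists are assumptions, not necessarily the actual API; types come from the VPR/RRGraph headers):

    void create_rr_graph_sketch(const t_rr_node* rr_nodes, size_t num_rr_nodes, RRGraph& rr_graph) {
        std::vector<RRNodeId> old_to_new(num_rr_nodes);

        /* Pass 1: duplicate the nodes and record the old index -> new id mapping */
        for (size_t inode = 0; inode < num_rr_nodes; ++inode) {
            old_to_new[inode] = rr_graph.create_node(rr_nodes[inode].type());
            /* ...copy capacity, coordinates, ptc_num, RC data, etc. ... */
        }

        /* Pass 2: duplicate the edges using the mapping built above */
        for (size_t inode = 0; inode < num_rr_nodes; ++inode) {
            for (int iedge = 0; iedge < (int)rr_nodes[inode].num_edges(); ++iedge) {
                int sink = rr_nodes[inode].edge_sink_node(iedge);
                rr_graph.create_edge(old_to_new[inode], old_to_new[sink],
                                     RRSwitchId(rr_nodes[inode].edge_switch(iedge)));
            }
        }
    }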

Related Issue

Continued effort on refactoring the Routing Resource Graph (RRG) into a unified RRGraph object.

#990

Motivation and Context

The aim is to group the scattered routing-related data structures into a unified object, which can ease router development and improve runtime and memory usage.
The RRGraph object models, in a general way, how routing resources are connected in an FPGA fabric, replacing the classical rr_node and rr_edge data structures.

How Has This Been Tested?

Passed the basic and strong regression tests, with the memory usage limits relaxed (see item 5 of the Description).

CentOS, g++-8.2
TravisCI

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@probot-autolabeler bot added the lang-cpp, libvtrutil, tests, VPR, and VTR Flow labels on Nov 15, 2019
@mithro (Contributor) left a comment

Why do the golden files change?

@tangxifan (Contributor, Author)

Why do the golden files change?

This is temporary, during our code construction process. To keep the pull requests incremental, we will first adapt the downstream routers to use the RRGraph object and then refactor the RRGraph builders (you can see more details in our design document: https://docs.google.com/document/d/1LMIlpYoppFtuSi_OZZJNLZ7edWFuuC0js4hm1Ah0n2M/edit?usp=sharing).

Therefore, at this stage, I create an RRGraph object that duplicates the routing resource graph modeled by the legacy rr_node data structure. This causes memory overhead relative to the current golden results; the duplication will be eliminated once the refactoring is done.
The RRGraph object is a unified data structure containing rr_node, rr_node_indices, rr_segment and rr_switches, without adding information beyond what is already in these legacy data structures, so there should be no memory overhead in the refactored version.

I am leaving this open for discussion. Feel free to suggest how you would prefer to handle the current golden results.

@kmurray (Contributor) commented Nov 19, 2019

I quickly ran this PR on the Titan23 design neuron (which is a moderate sized design; the vtr_reg_basic vtr_reg_strong tests only include small designs so that they run fast) and got the following:

## Build routing resource graph took 32.35 seconds (max_rss 2405.3 MiB, delta_rss +612.6 MiB)
  RR Graph Nodes: 5149691
  RR Graph Edges: 40724744
## Build routing resource graph object
## Build routing resource graph object took 30.41 seconds (max_rss 5911.4 MiB, delta_rss +3506.2 MiB)

which implies that the new object is using substantially more memory (5.7x) than the previous data structure. We probably need to look into why this is. I would expect the new RR graph object to use somewhat more (maybe 20-30%) but not this much more. My best guess is that there is some memory fragmentation occurring which is the problem. (I've previously seen these types of effects with the old RR graph as well).

Looking at the creation code, it seems like you aren't doing any reservations for the per-node edge lists. While you do reserve the edge elements themselves, we don't reserve the RRGraph::node_in_edges_ or RRGraph::node_out_edges_ sub vectors. That is probably the first thing I would try, as they will otherwise be constantly reallocated whenever we do RRGraph::add_edge().
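
A sketch of what such a reservation could look like, using the fan-in/fan-out counts already available from the legacy rr_nodes (the reserve_node_*_edges() helper names are illustrative, not necessarily RRGraph's actual API):

    for (size_t inode = 0; inode < rr_nodes.size(); ++inode) {
        const RRNodeId node_id = old_to_new_rr_node[inode];
        rr_graph.reserve_node_out_edges(node_id, rr_nodes[inode].num_edges()); /* fan-out is known per rr_node */
        rr_graph.reserve_node_in_edges(node_id, rr_nodes[inode].fan_in());     /* fan-in count is kept by rr_node */
    }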

After that I've had good success using the Valgrind's 'Massif' heap profiler to figure out memory usage hot spots.

@tangxifan (Contributor, Author)

I quickly ran this PR on the Titan23 design neuron (which is a moderate sized design; the vtr_reg_basic vtr_reg_strong tests only include small designs so that they run fast) and got the following:

## Build routing resource graph took 32.35 seconds (max_rss 2405.3 MiB, delta_rss +612.6 MiB)
  RR Graph Nodes: 5149691
  RR Graph Edges: 40724744
## Build routing resource graph object
## Build routing resource graph object took 30.41 seconds (max_rss 5911.4 MiB, delta_rss +3506.2 MiB)

which implies that the new object is using substantially more memory (5.7x) than the previous data structure. We probably need to look into why this is. I would expect the new RR graph object to use somewhat more (maybe 20-30%) but not this much more. My best guess is that there is some memory fragmentation occurring which is the problem. (I've previously seen these types of effects with the old RR graph as well).

Looking at the creation code, it seems like you aren't doing any reservations for the per-node edge lists. While you do reserve the edge elements themselves, we don't reserve the RRGraph::node_in_edges_ or RRGraph::node_out_edges_ sub vectors. That is probably the first thing I would try, as they will otherwise be constantly reallocated whenever we do RRGraph::add_edge().

After that I've had good success using the Valgrind's 'Massif' heap profiler to figure out memory usage hot spots.

Thanks for the insights, very helpful! I will follow this idea and optimize the RRGraph object.
I will also revert to the old golden results and see whether the memory fits within them again.
By the way, I am thinking about other possible sources of the problem:

  • Do the current statistics include the memory usage of rr_node_indices? The current RRGraph contains it in addition to rr_nodes.
  • We have two big mappings, old_to_new_rr_node and old_to_new_rr_edge. Will these create memory overhead?

@kmurray (Contributor) commented Nov 20, 2019

* Do the current statistics include the memory usage of `rr_node_indices`? The current `RRGraph` contains it in addition to `rr_nodes`.

Yes, I believe it does. Since rr_node_indices is created at the same time as the RR graph (in the traditional RR graph builder) it should be included in the +612MiB

* We have two big mappings, `old_to_new_rr_node` and `old_to_new_rr_edge`. Will these create memory overhead?

Yes, they will have some overhead. I'd focus on the edge lists first since I expect they will be the bigger issue.

You currently are using:

     std::map<int, RRNodeId> old_to_new_rr_node; 

Since both the key and value are small (2x32-bits) the overhead for a std::map BST node will be large (at least 2x64 bit pointers per key-value pair). Since we know the old RR node indices are contiguous integers you could achieve the same look-up effect with:

std::vector<RRNodeId> old_to_new_rr_node(rr_nodes.size(), RRNodeId::INVALID());

which would be much lower overhead (no pointer overhead per node). Note that we sized it exactly to avoid dynamically growing its size which can cause memory fragmentation.

The edge look-up is more challenging since the key is not a contiguous ID range:

     std::map<std::pair<int, int>, RREdgeId> old_to_new_rr_edge; // Key: 

Since we know precisely how many edges there are in the graph, you could do something like:

std::vector<std::pair<std::pair<int,short>,RREdgeId>> old_to_new_rr_edges_vec;
old_to_new_rr_edges_vec.reserve(num_edges); //reserve exactly so the vector never reallocates

for (size_t inode = 0; inode < rr_nodes.size(); ++inode) {
    for (short iedge : rr_nodes[inode].edges()) {
        RREdgeId edge_id = rr_graph.add_edge(/*...as done in the converter...*/);
        old_to_new_rr_edges_vec.emplace_back(std::make_pair((int)inode, iedge), edge_id);
    }
}
vtr::flat_map<std::pair<int,short>,RREdgeId> old_to_new_rr_edges(std::move(old_to_new_rr_edges_vec));

This uses a vtr::flat_map (a map implemented as a sorted vector rather than a BST), constructed directly from an exactly pre-sized vector. That again avoids the 2x64-bit pointer overhead of each BST node used by std::map.

@tangxifan tangxifan force-pushed the rr_graph_refactoring branch from 057ded1 to e0a212a on November 20, 2019 23:50
@tangxifan tangxifan force-pushed the rr_graph_refactoring branch 2 times, most recently from a8a3784 to 62ede3a on November 20, 2019 23:54
@kmurray (Contributor) commented Nov 21, 2019

Re-running on neuron with the latest changes does show improvement:

## Build routing resource graph took 29.99 seconds (max_rss 2505.3 MiB, delta_rss +656.2 MiB)
  RR Graph Nodes: 5149691
  RR Graph Edges: 40724744
## Build routing resource graph object
## Build routing resource graph object took 3.80 seconds (max_rss 3787.9 MiB, delta_rss +1282.6 MiB)

The object data structure only uses 2x the baseline RR graph (down from 5.7x).

@LNIS-Projects commented Nov 21, 2019

Re-running on neuron with the latest changes does show improvement:

## Build routing resource graph took 29.99 seconds (max_rss 2505.3 MiB, delta_rss +656.2 MiB)
  RR Graph Nodes: 5149691
  RR Graph Edges: 40724744
## Build routing resource graph object
## Build routing resource graph object took 3.80 seconds (max_rss 3787.9 MiB, delta_rss +1282.6 MiB)

The object data structure only uses 2x the baseline RR graph (down from 5.7x).

Indeed, I found that reserving node_in_edges and node_out_edges does help reduce memory fragmentation.
After applying this, the number of failing regression test cases drops from 4 to 2.
Moreover, after removing the big edge mapping old_to_new_rr_edge in convert_rr_graph, the memory footprint drops significantly, and the number of failing test cases further drops from 2 to 1.
The memory footprint of the test case multiple_io_types.xml/raygentop.v/common drops from 1 GB to 600 MB, while the golden result is 400 MB.
I am continuing to look into the memory hotspots in convert_rr_graph.
I have also tried reserving memory in the fast node_lookup, but it appears to have no impact on the memory footprint.

However, we have to admit that the RRGraph object does consume more memory than the legacy rr_node. This is because:

  • The input edges of each node are now stored in the RRGraph object, while rr_node only keeps the number of input edges (fan_in). This is the major source of memory overhead. In practice, though, node_in_edges is very useful when traversing the graph and applying back-annotation, so the memory investment is worthwhile.
  • The node and edge ids cost additional memory.
  • The segment id stored per node costs additional memory. I may try to remove it and measure the benefit.

Any thoughts are warmly welcomed.

@kmurray (Contributor) commented Nov 21, 2019

However, we have to admit that the RRGraph object does consume more memory than the legacy rr_node.

Yes this is true.

When I've looked at this previously, I've found it helpful to work out the memory usage on a spreadsheet; it's much easier to experiment with a spreadsheet than to re-code everything. Take a look at this spreadsheet, which works out roughly where we spend memory.

The major culprits are things which are O(edges) as there are far more edges than anything else.
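For scale: with the ~40.7 million edges reported above for neuron, every additional 8 bytes stored per edge costs roughly 310 MiB.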

Optimizations which seem to be no-brainers:

  • Use uint16_t for num_config in/out edges (size_t is hugely overkill, and the old RR graph also used shorts for this).
  • We can re-define the NodeLookup from:
    /* Fast look-up to search a node by its type, coordinate and ptc_num 
     * Indexing of fast look-up: [0..xmax][0..ymax][0..NUM_TYPES-1][0..ptc_max][0..NUM_SIDES-1] 
     */
    typedef std::vector<std::vector<std::vector<std::vector<std::vector<RRNodeId>>>>> NodeLookup;
    mutable NodeLookup node_lookup_;

to

    /* Fast look-up to search a node by its type, coordinate and ptc_num 
     * Indexing of fast look-up: [0..xmax][0..ymax][0..NUM_TYPES-1][0..ptc_max][0..NUM_SIDES-1] 
     */
    typedef vtr::NdMatrix<std::vector<std::vector<RRNodeId>>,3> NodeLookup;
    mutable NodeLookup node_lookup_;

Since the width, height and number of RR types are fixed (i.e. known) at RR graph construction time, this would avoid the std::vector overhead (24 bytes for each instance). We'd want to keep the inner two dimensions as std::vector, since they are not fully dense.

Potential functionality preserving optimizations:

  • Instead of storing each valid Id (node_ids_, edge_ids_), simply store the delimiters of the valid ranges in a vector and binary-search to check whether a given Id is valid or not. Since we usually only have valid IDs this should be effectively constant time, but will save ~175MiB (mostly from eliminating edge_ids). A small sketch of this idea is given after this list.

  • In various places where we're storing a small number of elements we could try using vtr::small_vector, which trades off maximum vector size for more efficient storage (for small sizes it can also store elements in place within the object). For instance, this could be used for the inner dimensions of the NodeLookup:

    /* Fast look-up to search a node by its type, coordinate and ptc_num 
     * Indexing of fast look-up: [0..xmax][0..ymax][0..NUM_TYPES-1][0..ptc_max][0..NUM_SIDES-1] 
     */
    typedef vtr::NdMatrix<vtr::small_vector<vtr::small_vector<RRNodeId,uint8_t>,uint16_t>,3> NodeLookup;
    mutable NodeLookup node_lookup_;

Since NUM_SIDES=4 we can get away with using 1 byte to store the size of the inner dimension, and since the number of PTCs is at most a few hundred we can use 2 bytes for the size.
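
A minimal, self-contained sketch of the "store only the delimiters of the valid ranges" idea from the first bullet above (class and member names are illustrative only):

    #include <algorithm>
    #include <cstddef>
    #include <utility>
    #include <vector>

    struct ValidIdRanges {
        std::vector<std::pair<size_t, size_t>> ranges; /* sorted, non-overlapping [start, end) ranges of valid ids */

        bool valid(size_t id) const {
            /* Find the first range starting beyond id; the candidate range is the one just before it */
            auto it = std::upper_bound(ranges.begin(), ranges.end(), id,
                                       [](size_t v, const std::pair<size_t, size_t>& r) { return v < r.first; });
            return it != ranges.begin() && id < std::prev(it)->second;
        }
    };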

Potential Optimizations with more trade-offs:

  • Don't store backward edges (i.e. node_in_edges). We lose some functionality (but match the functionality of the old RR graph). Saves ~260MiB.

  • Use a unified array per node to store all edges (in, out, config, non config) along with counts to calculate the start/end of each range of edges.

Small optimizations which probably don't offer a big pay off:

  • Use a union for node side/direction
  • Store only unique RC values

@kmurray (Contributor) commented Nov 21, 2019

If I had to push for the simplest path forward it would be to just apply the obvious optimizations, followed by:

  • Storing only the delimiters of valid ID ranges

and if that is insufficient:

  • Drop storing the node_in_edges

@vaughnbetz (Contributor)

Discussing this with Kevin ...
I think the edge_ids vector can be removed.
Its current use is storing whether or not an id is valid. This can be stored more efficiently as
an unordered_map of the invalid edges (usually this hash table will be empty).

Checking if an edge is valid: check that the id is in range (0 <= id < number of edges) and probe the unordered_map to see whether it has been invalidated.

Compressing the rr-graph to remove invalid edges: walk through the hash table, put the invalid edge ids in a vector, sort it, then use it to walk (once) through the edge ids and reindex them, since it tells us immediately by how much to reduce the value of every edge id.

We should do the same thing for node_ids, as it gives savings too.

Expected savings from the spreadsheet Kevin attached: 11.8% from edges + 1.5% from nodes = 13.3% of the memory footprint. This should have no impact on runtime.
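
A minimal sketch of this validity scheme (names are illustrative; an unordered_set is used here since the map's value would be unused):

    #include <cstddef>
    #include <unordered_set>

    struct EdgeIdValidity {
        size_t num_edge_ids = 0;                  /* total edge ids ever allocated */
        std::unordered_set<size_t> invalid_edges; /* usually empty */

        bool valid(size_t id) const {
            /* In range and not explicitly invalidated */
            return id < num_edge_ids && invalid_edges.count(id) == 0;
        }
    };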

@vaughnbetz (Contributor)

From the spreadsheet, bidirectional edges cost about 21.8% of the memory footprint. They are convenient, though. I suggest optimizing everything else first and then thinking about that one, as deleting them is a real trade-off.

@vaughnbetz (Contributor)

I like the single node edge array + count delimiter option: 13% expected memory reduction and no impact on typical users of the rr-graph. We lose constant-time insertion of an arbitrary type of edge, since we no longer have a separate vector per category, but only the rr-graph builder would care about that; it could just set up the edge_* arrays, and the end of the rr-graph builder could walk them to set up the node_edges* data.
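
A rough, self-contained illustration of that single-array layout (this is not the PR's actual class; the field names and sub-range ordering are assumptions):

    #include <cstdint>
    #include <vector>

    /* All edges touching a node live in one contiguous array, grouped into sub-ranges
     * (incoming first, then outgoing; configurable edges first within each group).
     * Only the sub-range sizes are stored; range boundaries are computed from them. */
    struct NodeEdgeList {
        std::vector<uint32_t> edges;  /* [0, num_in) incoming, [num_in, size) outgoing */
        uint16_t num_in = 0;          /* number of incoming edges */
        uint16_t num_in_config = 0;   /* configurable incoming edges in [0, num_in_config) */
        uint16_t num_out_config = 0;  /* configurable outgoing edges in [num_in, num_in + num_out_config) */

        size_t num_out() const { return edges.size() - num_in; }
        const uint32_t* out_begin() const { return edges.data() + num_in; }
        const uint32_t* out_end() const { return edges.data() + edges.size(); }
    };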

tangxifan and others added 11 commits November 25, 2019 22:51
This avoids exposing the details of how we are tracking invalid edges
throughout the RRGraph implementation code.
Previously used a std::unordered_map but the value was unused.
This should be more run-time and memory efficient than creating
a vector of the entire range and returning it.
Users of the RRGraph shouldn't care how the edge/node iteration is
implemented so move the implementation below the public methods to
improve readability.
We now use a single array to store the edges associated with a node.
The array is sorted into sub-ranges (e.g. incoming/outgoing,
configurable/nonconfigurable) which still allows efficient iteration
through the edge subranges. We store the sizes of the various sub-ranges
to enable quick determination of the delimiters between sub-ranges.

We also use a raw pointer for the edge array (rather than a vtr::vector)
which further saves memory.

The create_edge() routine now no longer inserts the edge into the node
data. Instead a single call is made to rebuild_node_edges() which will
walk the various RRGraph::edge_* members to precisely allocate the
relevant node_edges_ and node_num*edges_ data members.

The trade-off of this change is that the various node_*_edges() will not
work properly after an edge is added to the RR graph until
rebuild_node_edges() is called. Currently it rebuilds the node edges
from scratch, and so calls to it should be minimized (the preferred
approach is to add all edges and then call rebuild_node_edges() once).
If needed, support for incrementally rebuilding node_edges
could be added (while the use of a single array per node type in general
requires insertion not at the end of the array, an O(n) operation, the number
of edges per node is typically bounded by a small constant, so it would
still be reasonably efficient); it is not currently required and so is
left as *potential* future work should the need for it arise.
Instead clients should use the .size() member of the relevant range.
@tangxifan (Contributor, Author)

It seems that we have some errors when fetching the Titan benchmarks (download_titan.py uses Python 2 print syntax, which a Python 3 interpreter rejects):

Building target(s): get_titan_benchmarks
Downloading (~1GB) and extracting Titan benchmarks (~10GB) into VTR source tree.
  File "./vtr_flow/scripts/download_titan.py", line 86
    print "Found existing {} with matching checksum (skipping download and extraction)".format(tar_gz_filename)
                                                                                      ^
SyntaxError: invalid syntax

@tangxifan (Contributor, Author)

Are the titan failures due to the regtest flagging the resident memory set as being out of bounds? Or did the compile fail (out of memory or anything else)?
Can you produce a spreadsheet that gives the before and after peak memory footprint (with and without the new rr-graph object being created)? That would be another useful view of the memory footprint impact (it'll be higher than the real impact as both rr-graphs are loaded, but it would let us look for unusually large increases).

All the benchmarks passed run_vtr_flow, meaning that packing, placement and routing were successful. Errors occurred only when parse_vtr_flow checked the memory usage and complained that it exceeded the golden results by too much.

I am reworking the spreadsheet to record the golden memory results and the current memory usage.
I also have another idea for capturing the memory size of rr_node:
I would like to run only the VPR router for each benchmark. In that case, the packer and placer only read in previous results, which should not create many holes in memory.
The router will then build the rr_graph, and we can measure the actual memory size of rr_node.
What do you think?

@kmurray (Contributor) commented Nov 26, 2019

I am reworking the spreadsheet to record the golden memory results and the current memory usage.
I also have another idea for capturing the memory size of rr_node:
I would like to run only the VPR router for each benchmark. In that case, the packer and placer only read in previous results, which should not create many holes in memory.
The router will then build the rr_graph, and we can measure the actual memory size of rr_node.
What do you think?

Yes, that seems reasonable.

Also, your changes in 11f719a put the vtr::malloc_trim() calls after the vtr::ScopedStartFinishTimers are constructed. You'll actually want to do them the other way around (since the timers record the current memory usage when constructed in order to calculate the delta, we want to ensure memory is trimmed first).
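
In other words, the intended ordering is roughly (a two-line sketch, assuming the vtr::malloc_trim() / vtr::ScopedStartFinishTimer signatures used elsewhere in VTR):

    vtr::malloc_trim(0); /* return freed heap pages first, so the timer's baseline RSS is accurate */
    vtr::ScopedStartFinishTimer timer("Build routing resource graph object");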

@probot-autolabeler bot added the docs, infra, lang-python, and scripts labels on Nov 26, 2019
@tangxifan (Contributor, Author)

I am reworking the spreadsheet to record the golden memory results and the current memory usage.
I also have another idea for capturing the memory size of rr_node:
I would like to run only the VPR router for each benchmark. In that case, the packer and placer only read in previous results, which should not create many holes in memory.
The router will then build the rr_graph, and we can measure the actual memory size of rr_node.
What do you think?

Yes, that seems reasonable.

Also, your changes in 11f719a put the vtr::malloc_trim() calls after the vtr::ScopedStartFinishTimers are constructed. You'll actually want to do them the other way around (since the timers record the current memory usage when constructed in order to calculate the delta, we want to ensure memory is trimmed first).

Thanks for the advice. I have just made the modification.
From what I have observed so far, peak memory is reached most of the time while the create_device() function is executed. After looking into it, I realized that an rr_graph is created before placement and routing. But it seems that this rr_graph is a detailed rr_graph? Are we still using global rr_graphs for the placer?

@vaughnbetz (Contributor)

We create a detailed routing (wires, switches) rr-graph for the placer, so we can profile / search the routing graph to produce delay lookups that the placer can quickly access.

@tangxifan (Contributor, Author)

I have summarized the peak memory usage of the VPR flow and of the rr_graph in the spreadsheet.
It also compares the memory usage of the golden results, before refactoring (tested on our local server) and after refactoring (tested on our local server).
I have tested both the VTR benchmarks and the Titan benchmarks.
I noticed that on my local server all the Titan benchmarks fail the QoR checks (too much overhead in memory usage) even before refactoring.
Can you quickly check on your local machines and tell me whether it goes wrong on your side as well?

In the spreadsheet, you may notice that even when I run the router only, the memory statistics for rr_node are not accurate: some benchmarks report zero usage, which does not make sense. The memory statistics from running the full VPR flow are more accurate by comparison.

@vaughnbetz (Contributor)

For the Titan benchmarks, what is the difference between the two router-only runs? I see two columns, with and without refactoring, for the router only runs.

@tangxifan (Contributor, Author)

For the Titan benchmarks, what is the difference between the two router-only runs? I see two columns, with and without refactoring, for the router only runs.

Sorry, that is a typo. The left part is the full VPR flow run, while the right part is the router-only run.
I have corrected it in the spreadsheet.

@litghost mentioned this pull request Jan 24, 2020
@tangxifan (Contributor, Author)

Hi all, I understand that the memory footprint has become a critical concern here. To address it, I suggest unit-testing before deployment. My plan is as follows:

  1. Move the rr_node and RRGraph data structures out of vpr into a librrgraph library under libs, and remove the conversion function from the VPR code.
  2. Build readers and writers for RRGraph as we did for rr_node, ensuring that RRGraph can read and output the same XML as rr_node does.
  3. Compare the memory footprint of both objects on a set of routing resource graphs (*.xml).
  4. Optimize the internal data organization of RRGraph if needed.

As such, before deployment we can be confident that:

  1. the RRGraph object is functional, and
  2. the RRGraph object does not cause memory overhead.

After this, we can continue our incremental refactoring.
Your advice is warmly welcomed.

@kmurray mentioned this pull request Jan 31, 2020
@tangxifan (Contributor, Author)

Hi all,
I have experimentally deployed the RRGraph object on the VPR8 codebase (commit 2780988).
My implementation is now under testing in the OpenFPGA framework.
My modification covers:

  1. the routing resource graph builder
  2. the routers
  3. routing results storage
  4. routing stats print-out
  5. the drawer

So far, I have not seen any memory overhead reported by the vtr-basic and vtr-strong regression tests, which suggests that landing this refactoring effort is really worth a try.
During this effort, I have found that the current rr_node_indices have a lot of exceptions which are not allowed by the new data structures, for example the indexing of SOURCE/SINK/OPIN/IPIN for grids whose width and height are > 1. These cause a lot of difficulties, i.e., many hard-to-fix bugs, when adapting the rr_graph builders.
In terms of QoR, I have seen some shift:

  1. The minimum routable channel width is reduced for some benchmarks. I am still investigating why.
  2. The critical path delay is slightly reduced. I suspect something is wrong in the rc-tree annotation.

However, these problems should be addressed very soon.
