Converter from rr_nodes to RRGraph object #1048
Conversation
Why do the golden files change?
This is mainly temporary, during our code construction process. To enforce incremental pull requests, we will first adapt the downstream routers to use RRGraph objects and then refactor the builders for RRGraph (you can see more details in our design document: https://docs.google.com/document/d/1LMIlpYoppFtuSi_OZZJNLZ7edWFuuC0js4hm1Ah0n2M/edit?usp=sharing). Therefore, at this stage, I will create a … Actually, I opened this point for discussion. Feel free to share how you would like the current golden results to be kept. |
I quickly ran this PR on one of the Titan23 designs; the results imply that the new object is using substantially more memory (5.7x) than the previous data structure. We probably need to look into why this is. I would expect the new RR graph object to use somewhat more (maybe 20-30%) but not this much more. My best guess is that some memory fragmentation is occurring, which is the problem. (I've previously seen these types of effects with the old RR graph as well.) Looking at the creation code, it seems you aren't doing any reservations for the per-node edge lists. While you do reserve the edge elements themselves, we don't reserve the RRGraph::node_in_edges_ or RRGraph::node_out_edges_ sub-vectors. That is probably the first thing I would try, as they will otherwise be constantly reallocated whenever we call RRGraph::add_edge(). After that, I've had good success using Valgrind's 'Massif' heap profiler to figure out memory usage hot spots. |
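A minimal self-contained sketch of the reservation idea (ToyGraph and its members are simplified stand-ins, not the real RRGraph API): pre-sizing each node's edge list from its known fan-out means the subsequent add_edge() calls never trigger a reallocation.

```cpp
#include <cstdio>
#include <vector>

// Simplified stand-in for the per-node edge lists discussed above.
struct ToyGraph {
    std::vector<std::vector<int>> node_out_edges_; // per-node out-edge lists

    // Reserve each node's edge list up front from its known fan-out,
    // so add_edge() never has to grow the sub-vectors incrementally.
    void reserve_nodes(const std::vector<int>& fanouts) {
        node_out_edges_.resize(fanouts.size());
        for (size_t i = 0; i < fanouts.size(); ++i) {
            node_out_edges_[i].reserve(fanouts[i]);
        }
    }

    void add_edge(int src, int edge_id) {
        node_out_edges_[src].push_back(edge_id); // no reallocation after reserve
    }
};

int main() {
    ToyGraph g;
    g.reserve_nodes({2, 1}); // node 0 has fan-out 2, node 1 has fan-out 1
    g.add_edge(0, 0);
    g.add_edge(0, 1);
    g.add_edge(1, 2);
    std::printf("node 0 fan-out: %zu\n", g.node_out_edges_[0].size());
}
```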
Thanks for the insights. Very helpful! I will follow this idea and optimize the memory usage. |
Yes, I believe it does. Since rr_node_indices is created at the same time as the RR graph (in the traditional RR graph builder), it should be included in the +612MiB.
Yes, they will have some overhead. I'd focus on the edge lists first, since I expect they will be the bigger issue. You are currently using:

```cpp
std::map<int, RRNodeId> old_to_new_rr_node;
```

Since both the key and value are small (2x32 bits), the overhead of a std::map BST node is large (at least 2x64-bit pointers per key-value pair). Since we know the old RR node indices are contiguous integers, you could achieve the same look-up effect with:

```cpp
std::vector<RRNodeId> old_to_new_rr_node(rr_nodes.size(), RRNodeId(-1));
```

which would be much lower overhead (no pointer overhead per node). Note that we sized it exactly to avoid dynamically growing it, which can cause memory fragmentation.

The edge look-up is more challenging, since the key is not a contiguous ID range:

```cpp
std::map<std::pair<int, int>, RREdgeId> old_to_new_rr_edge; //Key: (old node index, edge index)
```

Since we know precisely how many edges there are in the graph, you could do something like:

```cpp
std::vector<std::pair<std::pair<int, short>, RREdgeId>> old_to_new_rr_edges_vec;
old_to_new_rr_edges_vec.reserve(num_edges); //Exactly sized, never grows dynamically
for (size_t inode = 0; inode < rr_nodes.size(); ++inode) {
    for (short iedge : rr_nodes[inode].edges()) {
        RREdgeId edge_id = rr_graph.create_edge(/*src, sink, switch*/);
        old_to_new_rr_edges_vec.push_back({{(int)inode, iedge}, edge_id});
    }
}
vtr::flat_map<std::pair<int, short>, RREdgeId> old_to_new_rr_edges(std::move(old_to_new_rr_edges_vec));
```

which uses a vtr::flat_map (a map implemented as a sorted vector, rather than a BST), constructed directly from an exactly sized vector. That again avoids the 2x64-bit pointer overhead of each BST node used by std::map. |
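Lookups on the resulting flat_map then work the same way as on a std::map; a minimal sketch, assuming vtr::flat_map mirrors std::map's find()/end() interface:

```cpp
// Translate an old (node index, edge index) pair to the new RREdgeId.
auto iter = old_to_new_rr_edges.find({inode, iedge});
if (iter != old_to_new_rr_edges.end()) {
    RREdgeId new_edge_id = iter->second;
}
```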
Re-running on the same design: the object data structure now only uses 2x the baseline RR graph memory (down from 5.7x). |
Indeed, I found that the … However, we have to admit that … Any thoughts are warmly welcomed. |
Yes, this is true. Previously, when I've looked at this, I've found it helpful to work out the memory usage on a spreadsheet. It's much easier to experiment with a spreadsheet than to re-code everything. Take a look at this spreadsheet, which works out roughly where we spend memory. The major culprits are things which are O(edges), as there are far more edges than anything else. Optimizations which seem to be no-brainers:
```cpp
/* Fast look-up to search a node by its type, coordinate and ptc_num
 * Indexing of fast look-up: [0..xmax][0..ymax][0..NUM_TYPES-1][0..ptc_max][0..NUM_SIDES-1]
 */
typedef std::vector<std::vector<std::vector<std::vector<std::vector<RRNodeId>>>>> NodeLookup;
mutable NodeLookup node_lookup_;
```

to

```cpp
/* Fast look-up to search a node by its type, coordinate and ptc_num
 * Indexing of fast look-up: [0..xmax][0..ymax][0..NUM_TYPES-1][0..ptc_max][0..NUM_SIDES-1]
 */
typedef vtr::NdMatrix<std::vector<std::vector<RRNodeId>>, 3> NodeLookup;
mutable NodeLookup node_lookup_;
```

since the width, height and number of RR types are fixed (i.e. known) at RR graph construction time. This would avoid the std::vector overhead (24 bytes for each instance). We'd want to keep the inner two dimensions as std::vector, since they are not fully dense.
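With the dimensions known up front, the dense part can be sized once at construction; a sketch, assuming vtr::NdMatrix's dimension-array constructor and the usual device grid accessors:

```cpp
// Size the three dense dimensions once at RR graph build time; the inner
// ptc/side dimensions stay as std::vector since they are not fully dense.
node_lookup_ = NodeLookup({grid.width(), grid.height(), size_t(NUM_RR_TYPES)});
```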
Potential functionality preserving optimizations:
Since NUM_SIDES=4, we can get away with using 1 byte to store the size of the inner dimension, and since the number of PTCs is at most a few hundred, we can use 2 bytes for the size (see the sketch below).
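A sketch of this size compaction (names are illustrative stand-ins, not the actual VPR types): the innermost [ptc][side] lookup carries explicit compact sizes instead of the 24-byte header each nested std::vector would add.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

using RRNodeId = std::uint32_t; // stand-in for the strong node id type

struct PtcSideLookup {
    std::unique_ptr<RRNodeId[]> ids; // flattened [ptc][side] storage
    std::uint16_t num_ptc = 0;       // at most a few hundred -> 2 bytes suffice
    std::uint8_t num_sides = 0;      // NUM_SIDES = 4 -> 1 byte suffices

    RRNodeId at(std::uint16_t ptc, std::uint8_t side) const {
        return ids[std::size_t(ptc) * num_sides + side];
    }
};
```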
Potential Optimizations with more trade-offs:
Small optimizations which probably don't offer a big pay-off:
|
If I had to push for the simplest path forward it would be to just apply the obvious optimizations, followed by:
and if that is insufficient:
|
Discussing this with Kevin ... Checking if an edge is valid: check that the id is in range (0 <= id < edge_ids.size()) and probe the unordered_map to see if it is invalid. Compressing the rr-graph to remove invalid edges: walk through the hash table and put the invalid edge ids in a vector, sort it, then use that to walk (once) through the edge_ids and reindex them, since we can determine immediately by how much to reduce the value of every edge id. We should do the same thing for node ids, as it gives savings too. Expected savings from the spreadsheet Kevin attached: 11.8% from edges + 1.5% from nodes = 13.3% of the memory footprint. This should have no impact on runtime. |
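A sketch of the reindexing pass, assuming ids are plain integer indices (the real code would use the strong RREdgeId/RRNodeId types): given the sorted invalid ids, every surviving id is decremented by the count of invalid ids below it, in a single walk over the id space.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<std::size_t> build_reindexing(std::size_t num_ids,
                                          std::vector<std::size_t> invalid_ids) {
    std::sort(invalid_ids.begin(), invalid_ids.end());
    std::vector<std::size_t> new_id(num_ids);
    std::size_t num_removed = 0; // invalid ids passed so far
    for (std::size_t old_id = 0; old_id < num_ids; ++old_id) {
        if (num_removed < invalid_ids.size() && invalid_ids[num_removed] == old_id) {
            ++num_removed; // invalid: gets no new id (slot left unused)
        } else {
            new_id[old_id] = old_id - num_removed;
        }
    }
    return new_id;
}
```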
From the spreadsheet, bidirectional edges are costing about 21.8% of the memory footprint. They are convenient, though. I suggest optimizing everything else first and then thinking about that one, as deleting them is a real trade-off. |
I like the single node edge array + count delimiter option: 13% expected memory reduction, with no impact on typical users of the rr-graph. We lose constant-time insertion of an arbitrary type of edge, as we don't have unique vectors for each category, but only the rr-graph builder would care about that; it could just set up the edge_* arrays, and at the end the rr-graph builder could walk them to set up the node_edges* data. |
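A rough sketch of one possible single-array layout with count delimiters (field names and the sub-range ordering are illustrative assumptions, not the actual implementation): storing the sub-range sizes lets the per-category edge accessors be returned as offsets into one allocation, with no per-category vectors.

```cpp
#include <cstdint>

using RREdgeId = std::uint32_t; // stand-in for the strong edge id type

// edges: [ incoming | outgoing-configurable | outgoing-nonconfigurable ]
struct NodeEdgeStorage {
    RREdgeId* edges = nullptr;           // raw pointer: no capacity/size header
    std::uint16_t num_edges = 0;         // total entries in 'edges'
    std::uint16_t num_in_edges = 0;      // delimiter between incoming/outgoing
    std::uint16_t num_out_nonconfig = 0; // tail sub-range of the outgoing edges

    const RREdgeId* in_begin() const { return edges; }
    const RREdgeId* in_end() const { return edges + num_in_edges; }
    const RREdgeId* out_begin() const { return edges + num_in_edges; }
    const RREdgeId* out_end() const { return edges + num_edges; }
};
```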
… to identify groups
This avoids exposing the details of how we are tracking invalid edges throughout the RRGraph implementation code.
Previously used a std::unordered_map but the value was unused.
This should be more run-time and memory efficient than creating a vector of the entire range and returning it.
Users of the RRGraph shouldn't care how the edge/node iteration is implemented so move the implementation below the public methods to improve readability.
We now use a single array to store the edges associated with a node. The array is sorted into sub-ranges (e.g. incoming/outgoing, configurable/nonconfigurable), which still allows efficient iteration through the edge sub-ranges. We store the sizes of the various sub-ranges to enable quick determination of the delimiters between them. We also use a raw pointer for the edge array (rather than a vtr::vector), which further saves memory. The create_edge() routine no longer inserts the edge into the node data. Instead, a single call is made to rebuild_node_edges(), which walks the various RRGraph::edge_* members to precisely allocate the relevant node_edges_ and node_num_*_edges_ data members. The trade-off of this change is that the various node_*_edges() accessors will not work properly after an edge is added to the RR graph until rebuild_node_edges() is called. Currently it rebuilds the node edges from scratch, so calls to it should be minimized (the preferred approach is to add all edges and then call rebuild_node_edges() once). If needed, support for incrementally rebuilding node edges could be added (while the use of a single array per node in general requires insertion not at the end of the array, an O(n) operation, the number of edges per node is typically bounded by a small constant, so it would still be reasonably efficient); it is not currently required and so is left as *potential* future work should the need for it arise.
Instead clients should use the .size() member of the relevant range.
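A simplified stand-in for the id ranges described in these commit messages (the real code presumably uses vtr::Range over strong-id iterators): the range is iterated lazily instead of materializing a std::vector of every id, and clients query .size() from the bounds.

```cpp
#include <cstddef>

class IdRange {
  public:
    class iterator {
      public:
        explicit iterator(std::size_t v) : v_(v) {}
        std::size_t operator*() const { return v_; }
        iterator& operator++() { ++v_; return *this; }
        bool operator!=(const iterator& other) const { return v_ != other.v_; }
      private:
        std::size_t v_;
    };

    IdRange(std::size_t begin, std::size_t end) : begin_(begin), end_(end) {}
    iterator begin() const { return iterator(begin_); }
    iterator end() const { return iterator(end_); }
    std::size_t size() const { return end_ - begin_; } // what clients should query
  private:
    std::size_t begin_, end_;
};

// Usage: for (std::size_t id : IdRange(0, num_nodes)) { ... } -- O(1) memory.
```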
It seems that we have some errors in getting the Titan benchmarks.
|
All the benchmarks passed the … I am reworking the spreadsheet to record the golden memory results and the current memory usage. |
Yes, that seems reasonable. Also, your changes in 11f719a put the … |
Thanks for the advice. Just made the modification. |
We create a detailed routing (wires, switches) rr-graph for the placer, so we can profile / search the routing graph to produce delay lookups that the placer can quickly access. |
I have summarized the peak memory usage of the VPR flow and rr_graph in the spreadsheet. In the spreadsheet, you may notice that even when I turn on router-only mode, the memory statistics on … |
For the Titan benchmarks, what is the difference between the two router-only runs? I see two columns, with and without refactoring, for the router only runs. |
Sorry, that is a typo. The left part is the full VPR flow run, while the right part is the router-only run. |
Hi all, I understand that the memory footprint has become a critical concern here. To address that, I suggest unit tests before deployment. My plan is as follows:
As such, we can be confident before deployment that …
After this, we can continue our incremental refactoring. |
Hi all,
|
Continued effort on refactoring the Routing Resource Graph (RRG) to a unified RRGraph object. This change brings the RRGraph to the DeviceContext of the VPR engine, and it creates an RRGraph object by porting the rr_nodes information (the classic routing resource graph) to the refactored object. This change does not impact the routers, analyzers and drawers, which are the major clients of the routing resource graph; changes to these downstream functions will be made later.
Detailed design document:
https://docs.google.com/document/d/1LMIlpYoppFtuSi_OZZJNLZ7edWFuuC0js4hm1Ah0n2M/edit?usp=sharing
Description
1. An RRGraph object has been added to the DeviceContext data structure, in parallel to the classic rr_nodes. This object will replace the legacy data structures, i.e., rr_nodes, rr_node_indices, rr_switches, rr_segments, etc., when refactoring is done.
2. A loading function create_rr_graph() is introduced, which loads the rr_node information into the RRGraph object. This function duplicates the rr_node information.
3. Multiple bug fixes to the checking code for RRGraph.
4. Added default constructors in vtr_geometry.h to resolve compatibility issues with Clang compilers.
5. Changes in golden_results for the vtr_strong regression tests, mainly to relax the memory usage. This is temporary: since we duplicate the routing resource graph, the memory usage is expected to grow significantly. On my local machine, four tests in vtr_strong fail:
fails:stratixiv_arch.timing.xml/ucsb_152_tap_fir_stratixiv_arch_timing.blif/common max_vpr_mem: previous_golden = 1011288 current_golden = 1368492
non_column.xml/raygentop.v/common max_vpr_mem: previous_golden = 91316 current_golden = 193908
non_column_tall_aspect_ratio.xml/raygentop.v/common max_vpr_mem: previous_golden = 87860 current_golden = 191944
multiple_io_types.xml/raygentop.v/common max_vpr_mem: previous_golden = 417392 current_golden = 1258700
The peak memory usage will exceed the current QoR because the routing resource graph is one of the most memory-consuming data structures in VPR. During refactoring, we duplicate the data structure, and this leads to memory overhead.
To ease the integration, I have done a run to rewrite the golden results. Feel free to push back if you want only a few of them to be relaxed.
The golden results will certainly improve when refactoring is done and the routing resource graph is no longer duplicated.
Related Issue
Continued Effort on refactoring the Routing Resource Graph (RRG) to a unified RRGraph object.
#990
Motivation and Context
The aim is to group the discrete routing-related data structures into a unified object, which can ease router development and improve runtime and memory usage.
The RRGraph object is designed to describe, in a general way, how routing resources are connected in an FPGA fabric, replacing the classical rr_node and rr_edge data structures.
How Has This Been Tested?
Passed basic and strong regression tests with relaxation on memory usage. (See details in Description section 5)
CentOS, g++-8.2
TravisCI