-
Notifications
You must be signed in to change notification settings - Fork 210
2019 02 05
I'm starting the make the hierarchical processing features available to the Region class (so far, only booleans were supported). First achievement is the hierarchical merge function. "Merge" is an operation in which all connected polygons are joined into a big single one. The purpose is simplification and analysis - for example selecting polygons by area can only happen after merging.
Hierarchical merge utilizes the net formation feature and connects polygons if they touch. Each such connected cluster forms one merged polygon. The merged polygon will appear at the lowest possible level in the hierarchy at which all constituents are known.
I added a small sample P&R block into the project. Thanks to Jean-Paul Chaput for allowing me to use it. This block has about 8k standard cells. The design rules are symbolic 1µm rules, but the hierarchy is realistic.
Place-and-route blocks have a hierarchical structure of standard cells, sometimes also via cells or tie-down diodes. The routine usually is within the top cell of the block and a flat arrangement of metal wires (paths, polygons) and vias rectangles (unless these are cells on their own).
The layer table is:
1 n well
3 n implant (source/drain)
4 p implant (source/drain)
5 n implant (tie-down)
6 p implant (tie-down)
7 poly
10 contact
11 metal 1
14 via 1
16 metal 2
18 via 2
19 metal 3
21 via 3
22 metal 4
25 via 4
26 metal 5
Here is some code that uses the merge feature to merge the shapes from these layout's layers. It also performs boolean operations for forming source/drain and gate areas and separating poly from diffusion contacts. Both operations happen in flat and deep mode (controlled by the $flat global variable). You can set this flag externally using
klayout -b -r this_script.rb -rd flat=1
All operations are repeated 10 times, because otherwise the timer resolution won't be sufficient to capture the
Here is the script:
dss = RBA::DeepShapeStore::new
ly = RBA::Layout::new(false)
# run from the repo's root for this file:
ly.read("testdata/algo/vexriscv_clocked_r.oas.gz")
flat = ($flat.to_i != 0)
class TimedSection
def initialize(what, &block)
$stdout.write(what + " ... ")
$stdout.flush
timer = RBA::Timer::new
timer.start
# 10 times for more precise measurements
10.times do
yield
end
timer.stop
$stdout.write(timer.to_s + "\n")
$stdout.flush
end
end
puts "Running merge in #{flat ? %q/flat/ : %q/deep/} mode"
ly.layer_indexes.sort { |a,b| ly.get_info(a).layer <=> ly.get_info(b).layer }.each do |li|
ld = ly.get_info(li)
lres = ly.layer(1000 + ld.layer, ld.datatype)
iter = ly.top_cell.begin_shapes_rec(li)
merged = nil
TimedSection::new((flat ? "Flat" : "Deep") + " merge of layer " + ld.to_s) do
region = flat ? RBA::Region::new(iter) : RBA::Region::new(iter, dss)
merged = region.merged
end
ly.insert(ly.top_cell.cell_index, lres, merged)
end
nactive, pactive, poly, cont = [ 3, 4, 7, 10 ].collect do |layer|
iter = ly.top_cell.begin_shapes_rec(ly.layer(layer, 0))
flat ? RBA::Region::new(iter) : RBA::Region::new(iter, dss)
end
nsd = nil
TimedSection::new((flat ? "Flat" : "Deep") + " nactive-poly ") do
nsd = nactive - poly
end
ly.insert(ly.top_cell.cell_index, ly.layer(1200, 0), nsd)
psd = nil
TimedSection::new((flat ? "Flat" : "Deep") + " pactive-poly ") do
psd = pactive - poly
end
ly.insert(ly.top_cell.cell_index, ly.layer(1201, 0), psd)
ngate = nil
TimedSection::new((flat ? "Flat" : "Deep") + " nactive&poly ") do
ngate = nactive & poly
end
ly.insert(ly.top_cell.cell_index, ly.layer(1202, 0), ngate)
pgate = nil
TimedSection::new((flat ? "Flat" : "Deep") + " pactive&poly ") do
pgate = pactive & poly
end
ly.insert(ly.top_cell.cell_index, ly.layer(1203, 0), pgate)
poly_cont = nil
TimedSection::new((flat ? "Flat" : "Deep") + " cont&poly ") do
poly_cont = cont & poly
end
ly.insert(ly.top_cell.cell_index, ly.layer(1204, 0), poly_cont)
diff_cont = nil
TimedSection::new((flat ? "Flat" : "Deep") + " cont-poly ") do
diff_cont = cont - poly
end
ly.insert(ly.top_cell.cell_index, ly.layer(1205, 0), diff_cont)
ly.write("merged_#{flat ? %q/flat/ : %q/deep/}.oas.gz")
The performance numbers on my laptop (Ubuntu 16, 64 bit) are:
Flat:
Flat merge of layer 1/0 ... 0.13s (sys), 11.05s (user), 11.177s (wall)
Flat merge of layer 3/0 ... 0s (sys), 6.81s (user), 6.808s (wall)
Flat merge of layer 4/0 ... 0s (sys), 8.9s (user), 8.899s (wall)
Flat merge of layer 5/0 ... 0s (sys), 1.3s (user), 1.3s (wall)
Flat merge of layer 6/0 ... 0s (sys), 1.23s (user), 1.232s (wall)
Flat merge of layer 7/0 ... 0.15s (sys), 13.7s (user), 13.851s (wall)
Flat merge of layer 10/0 ... 0.15s (sys), 7.92s (user), 8.076s (wall)
Flat merge of layer 11/0 ... 0.27s (sys), 22.93s (user), 23.204s (wall)
Flat merge of layer 14/0 ... 0s (sys), 0.82s (user), 0.821s (wall)
Flat merge of layer 16/0 ... 0s (sys), 1.86s (user), 1.865s (wall)
Flat merge of layer 18/0 ... 0.01s (sys), 0.84s (user), 0.848s (wall)
Flat merge of layer 19/0 ... 0.01s (sys), 2.07s (user), 2.081s (wall)
Flat merge of layer 21/0 ... 0s (sys), 0.06s (user), 0.062s (wall)
Flat merge of layer 22/0 ... 0s (sys), 0.09s (user), 0.094s (wall)
Flat merge of layer 24/0 ... 0.02s (sys), 5.43s (user), 5.446s (wall)
Flat merge of layer 25/0 ... 0s (sys), 0.05s (user), 0.054s (wall)
Flat merge of layer 26/0 ... 0s (sys), 0.11s (user), 0.103s (wall)
Flat merge of layer 255/0 ... 0.02s (sys), 2.27s (user), 2.287s (wall)
Flat nactive-poly ... 0.46s (sys), 21.98s (user), 22.44s (wall)
Flat pactive-poly ... 0.48s (sys), 25.67s (user), 26.146s (wall)
Flat nactive&poly ... 1.02s (sys), 22.93s (user), 23.941s (wall)
Flat pactive&poly ... 0.58s (sys), 24.68s (user), 25.268s (wall)
Flat cont&poly ... 0.26s (sys), 20.98s (user), 21.237s (wall)
Flat cont-poly ... 0.26s (sys), 22.5s (user), 22.762s (wall)
Hierarchical:
Deep merge of layer 1/0 ... 0.01s (sys), 1.75s (user), 1.744s (wall)
Deep merge of layer 3/0 ... 0s (sys), 0.29s (user), 0.296s (wall)
Deep merge of layer 4/0 ... 0s (sys), 0.32s (user), 0.32s (wall)
Deep merge of layer 5/0 ... 0s (sys), 0.25s (user), 0.253s (wall)
Deep merge of layer 6/0 ... 0s (sys), 0.23s (user), 0.239s (wall)
Deep merge of layer 7/0 ... 0s (sys), 0.39s (user), 0.386s (wall)
Deep merge of layer 10/0 ... 0s (sys), 0.49s (user), 0.496s (wall)
Deep merge of layer 11/0 ... 0.03s (sys), 12.76s (user), 12.793s (wall)
Deep merge of layer 14/0 ... 0.02s (sys), 3.54s (user), 3.559s (wall)
Deep merge of layer 16/0 ... 0.01s (sys), 4.56s (user), 4.583s (wall)
Deep merge of layer 18/0 ... 0.03s (sys), 3.51s (user), 3.532s (wall)
Deep merge of layer 19/0 ... 0.01s (sys), 3.23s (user), 3.252s (wall)
Deep merge of layer 21/0 ... 0s (sys), 0.33s (user), 0.331s (wall)
Deep merge of layer 22/0 ... 0s (sys), 0.33s (user), 0.338s (wall)
Deep merge of layer 24/0 ... 0s (sys), 0.54s (user), 0.547s (wall)
Deep merge of layer 25/0 ... 0s (sys), 0.3s (user), 0.299s (wall)
Deep merge of layer 26/0 ... 0s (sys), 0.31s (user), 0.315s (wall)
Deep merge of layer 255/0 ... 0.01s (sys), 6.07s (user), 6.073s (wall)
Deep nactive-poly ... 0s (sys), 1.26s (user), 1.275s (wall)
Deep pactive-poly ... 0s (sys), 1.2s (user), 1.2s (wall)
Deep nactive&poly ... 0s (sys), 1.18s (user), 1.183s (wall)
Deep pactive&poly ... 0s (sys), 1.16s (user), 1.17s (wall)
Deep cont&poly ... 0.01s (sys), 1.25s (user), 1.265s (wall)
Deep cont-poly ... 0s (sys), 1.31s (user), 1.306s (wall)
So overall a great improvement in the hierarchical frontend layers (factor 5 to 20 in layers 1-10). Still about factor 2 in layer 11 (metal 1) which has both hierarchical parts in the local wiring of the standard cells, but also flat parts from the power routing and pins. There is a 2-3x disadvantage of the hierarchical implementation in the flat layers (14-26) because the hierarchical analysis demands some overhead and there is no gain for the merge computation itself.
But the booleans pay this effort back with a 8x performance boost. That's because they are mainly frontend-driven and benefit from the hierarchical nature of the standard cells.
Total runtime of the script on my system is
- 46s (hierarchical)
- 210s (flat)
Another remarkable fact is the reduction of the memory footprint (maxresident on process)
- 518M (hierarchical)
- 1.6G (flat)