
Improve opt_merge performance #4175

Closed · wants to merge 8 commits

Conversation

povik (Member) commented Feb 1, 2024

See the commits. This replaces the hashing implementation, and makes it ignore cells holding memory initialization data.
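
A rough sketch of the general idea (editor's illustration, not the actual PR code): instead of building a per-cell string and hashing that, combine the hash() values that IdString, Const and SigSpec already provide. Names and structure below are assumptions based on the commit messages and the usual Yosys API.

// Rough sketch only -- not the PR's implementation.
// mkhash()/mkhash_init come from kernel/hashlib.h; IdString, Const and
// SigSpec provide hash() members.
#include "kernel/yosys.h"
USING_YOSYS_NAMESPACE

unsigned int sketch_cell_hash(const RTLIL::Cell *cell) {
	unsigned int h = mkhash_init;
	h = mkhash(h, cell->type.hash());
	for (auto &param : cell->parameters) {
		h = mkhash(h, param.first.hash());   // parameter name
		h = mkhash(h, param.second.hash());  // parameter value (Const)
	}
	for (auto &conn : cell->connections()) {
		h = mkhash(h, conn.first.hash());    // port name
		h = mkhash(h, conn.second.hash());   // connected SigSpec
	}
	return h;
}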

Ping @rmlarsen since they were interested in opt_merge performance before.

rmlarsen (Contributor) commented Feb 1, 2024

Cool. I'll run our benchmarks on it when I get a chance.

rmlarsen (Contributor) commented Feb 5, 2024

For our circuit, this reduces the time spent in OptMerge from ~25.5s to ~21.8s, so a ~17% speedup.

Flame graph before: (image)

Flame graph after: (image)

povik (Member, Author) commented Feb 5, 2024

@rmlarsen Interesting. There seems to be much more time spent in compare_cell_parameters_and_connections after the change, which suggests there are more false positives where the hashes match but the cells actually don't.

rmlarsen (Contributor) commented Feb 5, 2024

@povik yeah, that is odd. Is the mkhash function not mixing the hashes sufficiently?

passes/opt/opt_merge.cc: outdated review thread (resolved)
rmlarsen (Contributor) commented Feb 5, 2024

@povik yeah, mkhash creates a simple 32-bit hash:

// The XOR version of DJB2
inline unsigned int mkhash(unsigned int a, unsigned int b) {
	return ((a << 5) + a) ^ b;
}

You potentially end up combining a lot of those, so the hashing is definitely weaker.
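
A quick, self-contained way to see how weak that mixing is (editor's illustration): with the DJB2-XOR combine, flipping one bit of the second argument flips exactly that one bit of the result, for any first argument, so there is no avalanche at all.

#include <cassert>
#include <cstdint>

// Same combine step as Yosys' mkhash(), written standalone for the demo.
static inline uint32_t djb2_xor(uint32_t a, uint32_t b) {
	return ((a << 5) + a) ^ b;
}

int main() {
	uint32_t a = 0x12345678u, b = 0xdeadbeefu;
	// Flipping bit 4 of b flips only bit 4 of the combined hash.
	assert((djb2_xor(a, b) ^ djb2_xor(a, b ^ 0x10u)) == 0x10u);
	return 0;
}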

QuantamHD (Contributor):

I wonder if pulling in xxhash or cityhash would be an improvement here.

rmlarsen (Contributor) commented Feb 5, 2024

BTW: This code is really weird. unsigned int is 32 bit per the C++ standard.

inline unsigned int mkhash_xorshift(unsigned int a) {

hash_conn_strings.push_back(s + "\n");
conn_hash = mkhash(a.hash(), acc);
} else {
for (auto conn : cell->connections())
A Contributor commented on the diff above:

I don't quite grok what's going on here, but it seems quite different from the original hashing of connections. Could this be the source of more collisions?

povik (Member, Author) commented Feb 6, 2024

BTW: This code is really weird. unsigned int is 32 bit per the C++ standard.

inline unsigned int mkhash_xorshift(unsigned int a) {

Right, that doesn't make much sense.

povik (Member, Author) commented Feb 6, 2024

Turns out the collisions are due to a combination of mkhash being a weak mixing primitive and Const hashing not working at all (hash() for those returns a constant due to a coding error). Let me collect my changes and push.

nakengelhardt (Member):

BTW: This code is really weird. unsigned int is 32 bit per the C++ standard.

Wait, what? ILP64 may not be popular but since when does the C++ standard ban it?

povik (Member, Author) commented Feb 6, 2024

BTW: This code is really weird. unsigned int is 32 bit per the C++ standard.

Wait, what? ILP64 may not be popular but since when does the C++ standard ban it?

A second, closer look at https://en.cppreference.com/w/cpp/language/types confirms the standard does allow int to be 64-bit; the page just doesn't list it as an option in the "common data models" table. I will revert the mkhash_xorshift change.

povik (Member, Author) commented Feb 6, 2024

I will revert the mkhash_xorshift change.

On second thought, ILP64 platforms are so rare that there's no point keeping a special branch for them if the default branch is basically fine too. I think I will keep the change in.

nakengelhardt (Member) commented Feb 6, 2024

It's currently the only way to get around the limitation on design size, which people do run into somewhat regularly.

yosys/kernel/hashlib.h, lines 203 to 204 (at d00843d):

if (sizeof(int) == 4)
	throw std::length_error("hash table exceeded maximum size.\nDesign is likely too large for yosys to handle, if possible try not to flatten the design.");

I don't know if anyone has taken this path in practice, but from past conversations in issues I suspect some people patched yosys locally to replace "int" with "long". We might want to go that way long-term too... we've stayed away from it so far out of fear of introducing more subtle issues that our tests won't be able to uncover, but given how we keep finding that the hash wasn't working properly in the first place maybe we should just go for it.

Either way we should probably take that discussion to a separate issue/PR.

rmlarsen (Contributor) commented Feb 6, 2024

@nakengelhardt @povik if this is indeed something people struggle with in Yosys, I'd highly recommend going with the standard types int32_t, int64_t, etc. We don't have to suffer these indignities anymore.

nakengelhardt (Member):

@nakengelhardt @povik if this is indeed something people struggle with in Yosys, I'd highly recommend going with the standard types int32_t, int64_t, etc. We don't have to suffer these indignities anymore.

How does that help? What benefit do you get from requiring a precise size rather than a minimum size here?

rmlarsen (Contributor) commented Feb 6, 2024

@nakengelhardt @povik if this is indeed something people struggle with in Yosys, I'd highly recommend going with the standard types int32_t, int64_t, etc. We don't have to suffer these indignities anymore.

How does that help? What benefit do you get from requiring a precise size rather than a minimum size here?

Well-defined semantics of your code, perhaps?

povik (Member, Author) commented Feb 6, 2024

But it doesn't seem like the lack of defined semantics is the issue here, or something anyone is struggling with. It seems like the underspecification of int is a feature right now, there's sizeof(int) == 4 which is the well-tested case, and sizeof(int) == 8 for adventurers who are hitting limits otherwise. Of course eventually moving to X with fixed sizeof(X) == 8 in the code where people are hitting limits is desirable.

rmlarsen (Contributor) commented Feb 6, 2024

But it doesn't seem like the lack of defined semantics is the issue here, or something anyone is struggling with. It seems like the underspecification of int is a feature right now, there's sizeof(int) == 4 which is the well-tested case, and sizeof(int) == 8 for adventurers who are hitting limits otherwise. Of course eventually moving to X with fixed sizeof(X) == 8 in the code where people are hitting limits is desirable.

Sure. But in this instance, the code is less readable and has different behavior on different platforms, which should be avoided unless absolutely necessary IMHO.

nakengelhardt (Member):

Yes, we do need configurability of the hash size for different use cases. The current way of doing it has just fallen out of favor since (or possibly before) this code was first written. Ideally we would have 64b hash be the default on 64b architectures, but we need to maintain the option of 32b hashes for smaller/slower architecture targets where the performance hit matters (people running yosys on an older raspi are still a thing, and then there's that proof-of-concept someone made of a softcore doing synthesis to partially reconfigure the FPGA it is running on, which we want to enable just for the heck of it). So there's no way around it that any caller of the function has to work for all sizes anyway.
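
A sketch of one way such configurability could be expressed at build time (editor's illustration), defaulting to 64 bits on 64-bit hosts; YOSYS_SMALL_HASH is a made-up flag here, not an existing Yosys option.

#include <cstdint>

#if defined(YOSYS_SMALL_HASH)        // hypothetical opt-in for small targets
typedef uint32_t hash_t;
#elif UINTPTR_MAX > 0xFFFFFFFFu      // 64-bit host: default to wide hashes
typedef uint64_t hash_t;
#else
typedef uint32_t hash_t;
#endif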

povik (Member, Author) commented Feb 6, 2024

s/int/long/ in the hashing code then? 😄

rmlarsen (Contributor) commented Feb 6, 2024

Yes, we do need configurability of the hash size for different use cases. The current way of doing it has just fallen out of favor since (or possibly before) this code was first written. Ideally we would have 64b hash be the default on 64b architectures, but we need to maintain the option of 32b hashes for smaller/slower architecture targets where the performance hit matters (people running yosys on an older raspi are still a thing, and then there's that proof-of-concept someone made of a softcore doing synthesis to partially reconfigure the FPGA it is running on, which we want to enable just for the heck of it). So there's no way around it that any caller of the function has to work for all sizes anyway.

I see. [This is C++, so this is usually done with explicit overloads, not by adding a runtime check for sizeof. But there appears to be a lot of C code in Yosys.]
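
For illustration, the overload-based version of the same idea (editor's sketch, not existing Yosys code): callers pick the right mixer by argument type instead of branching on sizeof(int) at run time.

#include <cstdint>

inline uint32_t hash_combine(uint32_t a, uint32_t b) {
	return ((a << 5) + a) ^ b;               // DJB2-XOR, as in mkhash()
}

inline uint64_t hash_combine(uint64_t a, uint64_t b) {
	return (a ^ b) * 0x9E3779B97F4A7C15ull;  // simple 64-bit multiply mix
}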

rmlarsen (Contributor) commented Feb 6, 2024

s/int/long/ in the hashing code then? 😄

Noooooooo! ;-)

QuantamHD (Contributor):

What's the purpose of this custom hash function in the first place? It seems like adopting an existing function that works on char* streams would both be platform independent, and faster than what we've done here.
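
For reference, if something like xxHash were pulled in, hashing a run of bytes could look roughly like this (editor's illustration using the standard xxhash.h API; not part of this PR):

#include <xxhash.h>
#include <cstdint>
#include <string>

uint64_t hash_bytes(const std::string &bytes) {
	// XXH64 hashes an arbitrary byte buffer with a caller-chosen seed.
	return XXH64(bytes.data(), bytes.size(), /*seed=*/0);
}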

rmlarsen (Contributor) commented Feb 6, 2024

What's the purpose of this custom hash function in the first place? It seems like adopting an existing function that works on char* streams would both be platform independent, and faster than what we've done here.

Moreover, in the code in question it is also being used to combine hashes, for which this hash function is very poor. There should probably be a separate hash function for that purpose. AFAICT, DJB2 is a very simple string hash.

kernel/rtlil.h (outdated diff)
@@ -712,7 +712,7 @@ struct RTLIL::Const
inline unsigned int hash() const {
	unsigned int h = mkhash_init;
	for (auto b : bits)
-		mkhash(h, b);
+		h = mkhash(h, b);
A Contributor commented on the diff above:

good catch!

QuantamHD (Contributor):

For reference on the state of the art: (image of a hash function benchmark leaderboard)

povik (Member, Author) commented Feb 6, 2024

DJB2 didn't make the leaderboard? Regardless, we are probably using it wrong in the changed opt_merge code and in a few other places in hashlib.h.

whitequark (Member):

Ideally we would have 64b hash be the default on 64b architectures, but we need to maintain the option of 32b hashes for smaller/slower architecture targets where the performance hit matters (people running yosys on an older raspi are still a thing, and then there's that proof-of-concept someone made of a softcore doing synthesis to partially reconfigure the FPGA it is running on, which we want to enable just for the heck of it).

Has anyone actually benchmarked this and found that the hash function is an issue? Or is this conjecture?

rmlarsen (Contributor) commented Feb 6, 2024

Ideally we would have 64b hash be the default on 64b architectures, but we need to maintain the option of 32b hashes for smaller/slower architecture targets where the performance hit matters (people running yosys on an older raspi are still a thing, and then there's that proof-of-concept someone made of a softcore doing synthesis to partially reconfigure the FPGA it is running on, which we want to enable just for the heck of it).

Has anyone actually benchmarked this and found that the hash function is an issue? Or is this conjecture?

It is not a hotspot in my benchmarks, but this was measured on Intel Skylake x86_64. The main issue in this context was the poor hashing, which caused a significant number of hash collisions. But I think Martin fixed the primary source of that (lack of hashing of constants).

povik (Member, Author) commented Feb 6, 2024

I assume Catherine's question was about hashing in general all over Yosys code, and the associated dict<> and pool<> operations which consume the hashes. Was that what you looked at, Rasmus, or did you mean opt_merge specifically?

But I think Martin fixed the primary source of that (lack of hashing of constants).

FWIW the poor properties of mkhash contributed the most in my testing, which went away when I inserted mkhash_xorshift64 in a few key places in the PR here. It may not be the best approach but it's at least something we can improve on.
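
For reference, the textbook Marsaglia xorshift64 step looks like the following (editor's illustration); a mkhash_xorshift64 helper would presumably be something along these lines, though the exact shifts used in the PR may differ.

#include <cstdint>

inline uint64_t xorshift64(uint64_t x) {
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return x;   // note: maps 0 to 0, so feed it a nonzero state
}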

rmlarsen (Contributor) commented Feb 6, 2024

I assume Catherine's question was about hashing in general all over Yosys code, and the associated dict<> and pool<> operations which consume the hashes. Was that what you looked at, Rasmus, or did you mean opt_merge specifically?

But I think Martin fixed the primary source of that (lack of hashing of constants).

FWIW the poor properties of mkhash contributed the most in my testing, which went away when I inserted mkhash_xorshift64 in a few key places in the PR here. It may not be the best approach but it's at least something we can improve on.

Ah yes. Hash table manipulation in general (but not mkhash) still dominates the profile. Top of "Bottom up view":

(image: top of the profiler's "Bottom up" view)

FWIW, we turned on more ABC passes since my original PRs, so ignore those. But in yosys proper, the largest time sinks are hashing related.

rmlarsen (Contributor) commented Feb 6, 2024

I assume Catherine's question was about hashing in general all over Yosys code, and the associated dict<> and pool<> operations which consume the hashes. Was that what you looked at, Rasmus, or did you mean opt_merge specifically?

But I think Martin fixed the primary source of that (lack of hashing of constants).

FWIW the poor properties of mkhash contributed the most in my testing, which went away when I inserted mkhash_xorshift64 in a few key places in the PR here. It may not be the best approach but it's at least something we can improve on.

Ah yes. Hashing in general still dominates the profile:

(image: profile)

FWIW, we turned on more ABC passes since my original PRs, so ignore those. But in yosys proper, the largest time sinks are hashing related.

I believe the hot spots in my profile are related to the non-coherent layout of the hash tables in yosys, not time spent in the hash function itself. Replacing the hash table implementation with something like the swisstables from Abseil would likely give a significant speedup. I have not had the time to experiment with this, as it is a rather invasive change. https://abseil.io/about/design/swisstables
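
For a sense of what such an experiment involves at the call sites, usage of Abseil's SwissTable containers looks roughly like this (editor's illustration, assuming Abseil is available; not something this PR does):

#include "absl/container/flat_hash_map.h"
#include <string>

// SwissTable-backed map: open addressing with SIMD-accelerated probing.
absl::flat_hash_map<std::string, int> cell_count;

void count_cell(const std::string &type) {
	++cell_count[type];
}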

rmlarsen (Contributor) commented Feb 6, 2024

@povik here is the flame graph for most of the Yosys cost corresponding to the "Bottom up" list above. It's a mix of OptMergePass/OptExprPass/CleanupPass/OptMuxTreePass.

(flame graph image)

povik added the status-paused label (Status: Unfinished, not actively worked on, PR author intends to continue PR in the future) on Apr 13, 2024
povik added 7 commits on July 29, 2024 10:32, including:

Avoid building a string which we subsequently hash when hashing cells; instead use the readily available `hash()` on IDs and `SigSpec` and combine those to build the overall cell hash.

There's little value in treating cells holding memory initialization data with `opt_merge`, but they can be relatively expensive to hash if their init data is large.
widlarizer force-pushed the opt_merge-performance branch from 0790bfb to 379c462 on July 29, 2024 08:35
widlarizer mentioned this pull request on Nov 15, 2024
povik added the status-superseded label (Status: Work continues in a different PR or was made redundant) and removed the status-paused label on Nov 18, 2024
povik (Member, Author) commented Nov 18, 2024

Superseded by Emil's work in #4677, which makes opt_merge use a new hashing interface.

povik closed this on Nov 18, 2024