
Improve opt_merge performance #4175

Closed · wants to merge 8 commits

Conversation

povik (Member) commented Feb 1, 2024

See the commits. This replaces the hashing implementation, and makes it ignore cells holding memory initialization data.
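
A rough sketch of the general idea (editor's illustration, not the actual PR code): instead of building a per-cell string and hashing that, combine the hash() values that IdString, Const and SigSpec already provide. Names and structure below are assumptions based on the commit messages and the usual Yosys API.

// Rough sketch only -- not the PR's implementation.
// mkhash()/mkhash_init come from kernel/hashlib.h; IdString, Const and
// SigSpec provide hash() members.
#include "kernel/yosys.h"
USING_YOSYS_NAMESPACE

unsigned int sketch_cell_hash(const RTLIL::Cell *cell) {
	unsigned int h = mkhash_init;
	h = mkhash(h, cell->type.hash());
	for (auto &param : cell->parameters) {
		h = mkhash(h, param.first.hash());   // parameter name
		h = mkhash(h, param.second.hash());  // parameter value (Const)
	}
	for (auto &conn : cell->connections()) {
		h = mkhash(h, conn.first.hash());    // port name
		h = mkhash(h, conn.second.hash());   // connected SigSpec
	}
	return h;
}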

Ping @rmlarsen since they were interested in opt_merge performance before.

rmlarsen (Contributor) commented Feb 1, 2024

Cool. I'll run our benchmarks on it when I get a chance.

rmlarsen (Contributor) commented Feb 5, 2024

For our circuit, this reduces the time spent in OptMerge from ~25.5s to ~21.8s, so a ~17% speedup.

Flame graph before: (image)

Flame graph after: (image)

povik (Member, Author) commented Feb 5, 2024

@rmlarsen Interesting. There seems to be much more time spent in compare_cell_parameters_and_connections after the change, which suggests there are more false positives where the hashes match but the cells actually don't.

rmlarsen (Contributor) commented Feb 5, 2024

@povik yeah, that is odd. Is the mkhash function not mixing the hashes sufficiently?

passes/opt/opt_merge.cc: outdated review thread (resolved)
rmlarsen (Contributor) commented Feb 5, 2024

@povik yeah, mkhash creates a simple 32-bit hash:

// The XOR version of DJB2
inline unsigned int mkhash(unsigned int a, unsigned int b) {
	return ((a << 5) + a) ^ b;
}

You potentially end up combining a lot of those, so the hashing is definitely weaker.
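
A quick, self-contained way to see how weak that mixing is (editor's illustration): with the DJB2-XOR combine, flipping one bit of the second argument flips exactly that one bit of the result, for any first argument, so there is no avalanche at all.

#include <cassert>
#include <cstdint>

// Same combine step as Yosys' mkhash(), written standalone for the demo.
static inline uint32_t djb2_xor(uint32_t a, uint32_t b) {
	return ((a << 5) + a) ^ b;
}

int main() {
	uint32_t a = 0x12345678u, b = 0xdeadbeefu;
	// Flipping bit 4 of b flips only bit 4 of the combined hash.
	assert((djb2_xor(a, b) ^ djb2_xor(a, b ^ 0x10u)) == 0x10u);
	return 0;
}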

QuantamHD (Contributor):

I wonder if pulling in xxhash or cityhash would be an improvement here.

rmlarsen (Contributor) commented Feb 5, 2024

BTW: This code is really weird. unsigned int is 32 bit per the C++ standard.

inline unsigned int mkhash_xorshift(unsigned int a) {

hash_conn_strings.push_back(s + "\n");
conn_hash = mkhash(a.hash(), acc);
} else {
for (auto conn : cell->connections())
A Contributor commented on the diff above:

I don't quite grok what's going on here, but it seems quite different from the original hashing of connections. Could this be the source of more collisions?

povik (Member, Author) commented Feb 6, 2024

BTW: This code is really weird. unsigned int is 32 bit per the C++ standard.

inline unsigned int mkhash_xorshift(unsigned int a) {

Right, that doesn't make much sense.

povik (Member, Author) commented Feb 6, 2024

Turns out the collisions are due to a combination of mkhash being a weak mixing primitive and Const hashing not working at all (hash() for those returns a constant due to a coding error). Let me collect my changes and push.

nakengelhardt (Member):

BTW: This code is really weird. unsigned int is 32 bit per the C++ standard.

Wait, what? ILP64 may not be popular but since when does the C++ standard ban it?

povik (Member, Author) commented Feb 6, 2024

BTW: This code is really weird. unsigned int is 32 bit per the C++ standard.

Wait, what? ILP64 may not be popular but since when does the C++ standard ban it?

A second, closer look at https://en.cppreference.com/w/cpp/language/types confirms the standard does allow int to be 64-bit; the page just doesn't list it as an option in the "common data models" table. I will revert the mkhash_xorshift change.

povik (Member, Author) commented Feb 6, 2024

I will revert the mkhash_xorshift change.

On second thought, ILP64 platforms are so rare that there's no point keeping a special branch for them if the default branch is basically fine too. I think I will keep the change in.

nakengelhardt (Member) commented Feb 6, 2024

It's currently the only way to get around the limitation on design size, which people do run into somewhat regularly.

yosys/kernel/hashlib.h, lines 203 to 204 (at d00843d):

if (sizeof(int) == 4)
	throw std::length_error("hash table exceeded maximum size.\nDesign is likely too large for yosys to handle, if possible try not to flatten the design.");

I don't know if anyone has taken this path in practice, but from past conversations in issues I suspect some people patched yosys locally to replace "int" with "long". We might want to go that way long-term too... we've stayed away from it so far out of fear of introducing more subtle issues that our tests won't be able to uncover, but given how we keep finding that the hash wasn't working properly in the first place maybe we should just go for it.

Either way we should probably take that discussion to a separate issue/PR.

rmlarsen (Contributor) commented Feb 6, 2024

@nakengelhardt @povik if this is indeed something people struggle with in Yosys, I'd highly recommend going with the standard types int32_t, int64_t, etc. We don't have to suffer these indignities anymore.

nakengelhardt (Member):

@nakengelhardt @povik if this is indeed something people struggle with in Yosys, I'd highly recommend going with the standard types int32_t, int64_t, etc. We don't have to suffer these indignities anymore.

How does that help? What benefit do you get from requiring a precise size rather than a minimum size here?

rmlarsen (Contributor) commented Feb 6, 2024

@nakengelhardt @povik if this is indeed something people struggle with in Yosys, I'd highly recommend going with the standard types int32_t, int64_t, etc. We don't have to suffer these indignities anymore.

How does that help? What benefit do you get from requiring a precise size rather than a minimum size here?

Well-defined semantics of your code, perhaps?

povik (Member, Author) commented Feb 6, 2024

But it doesn't seem like the lack of defined semantics is the issue here, or something anyone is struggling with. It seems like the underspecification of int is a feature right now, there's sizeof(int) == 4 which is the well-tested case, and sizeof(int) == 8 for adventurers who are hitting limits otherwise. Of course eventually moving to X with fixed sizeof(X) == 8 in the code where people are hitting limits is desirable.

rmlarsen (Contributor) commented Feb 6, 2024

But it doesn't seem like the lack of defined semantics is the issue here, or something anyone is struggling with. It seems like the underspecification of int is a feature right now, there's sizeof(int) == 4 which is the well-tested case, and sizeof(int) == 8 for adventurers who are hitting limits otherwise. Of course eventually moving to X with fixed sizeof(X) == 8 in the code where people are hitting limits is desirable.

Sure. But in this instance, the code is less readable and has different behavior on different platforms, which should be avoided unless absolutely necessary IMHO.

nakengelhardt (Member):

Yes, we do need configurability of the hash size for different use cases. The current way of doing it has just fallen out of favor since (or possibly before) this code was first written. Ideally we would have 64b hash be the default on 64b architectures, but we need to maintain the option of 32b hashes for smaller/slower architecture targets where the performance hit matters (people running yosys on an older raspi are still a thing, and then there's that proof-of-concept someone made of a softcore doing synthesis to partially reconfigure the FPGA it is running on, which we want to enable just for the heck of it). So there's no way around it that any caller of the function has to work for all sizes anyway.
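
A sketch of one way such configurability could be expressed at build time (editor's illustration), defaulting to 64 bits on 64-bit hosts; YOSYS_SMALL_HASH is a made-up flag here, not an existing Yosys option.

#include <cstdint>

#if defined(YOSYS_SMALL_HASH)        // hypothetical opt-in for small targets
typedef uint32_t hash_t;
#elif UINTPTR_MAX > 0xFFFFFFFFu      // 64-bit host: default to wide hashes
typedef uint64_t hash_t;
#else
typedef uint32_t hash_t;
#endif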

povik (Member, Author) commented Feb 6, 2024

s/int/long/ in the hashing code then? 😄

rmlarsen (Contributor) commented Feb 6, 2024

Yes, we do need configurability of the hash size for different use cases. The current way of doing it has just fallen out of favor since (or possibly before) this code was first written. Ideally we would have 64b hash be the default on 64b architectures, but we need to maintain the option of 32b hashes for smaller/slower architecture targets where the performance hit matters (people running yosys on an older raspi are still a thing, and then there's that proof-of-concept someone made of a softcore doing synthesis to partially reconfigure the FPGA it is running on, which we want to enable just for the heck of it). So there's no way around it that any caller of the function has to work for all sizes anyway.

I see. [This is C++, so this is usually done with explicit overloads, not by adding a runtime check for sizeof. But there appears to be a lot of C code in Yosys.]
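
For illustration, the overload-based version of the same idea (editor's sketch, not existing Yosys code): callers pick the right mixer by argument type instead of branching on sizeof(int) at run time.

#include <cstdint>

inline uint32_t hash_combine(uint32_t a, uint32_t b) {
	return ((a << 5) + a) ^ b;               // DJB2-XOR, as in mkhash()
}

inline uint64_t hash_combine(uint64_t a, uint64_t b) {
	return (a ^ b) * 0x9E3779B97F4A7C15ull;  // simple 64-bit multiply mix
}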

rmlarsen (Contributor) commented Feb 6, 2024

s/int/long/ in the hashing code then? 😄

Noooooooo! ;-)

QuantamHD (Contributor):

What's the purpose of this custom hash function in the first place? It seems like adopting an existing function that works on char* streams would both be platform independent, and faster than what we've done here.
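
For reference, if something like xxHash were pulled in, hashing a run of bytes could look roughly like this (editor's illustration using the standard xxhash.h API; not part of this PR):

#include <xxhash.h>
#include <cstdint>
#include <string>

uint64_t hash_bytes(const std::string &bytes) {
	// XXH64 hashes an arbitrary byte buffer with a caller-chosen seed.
	return XXH64(bytes.data(), bytes.size(), /*seed=*/0);
}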

rmlarsen (Contributor) commented Feb 6, 2024

What's the purpose of this custom hash function in the first place? It seems like adopting an existing function that works on char* streams would both be platform independent, and faster than what we've done here.

Moreover, in the code in question it is also being used to combine hashes, for which this hash function is very poor. There should probably be a separate hash function for that purpose. AFAICT, DJB2 is a very simple string hash.

kernel/rtlil.h (outdated diff)
@@ -712,7 +712,7 @@ struct RTLIL::Const
inline unsigned int hash() const {
	unsigned int h = mkhash_init;
	for (auto b : bits)
-		mkhash(h, b);
+		h = mkhash(h, b);
A Contributor commented on the diff above:

good catch!

QuantamHD (Contributor):

For reference on the state of the art: (image of a hash function benchmark leaderboard)

povik (Member, Author) commented Feb 6, 2024

DJB2 didn't make the leaderboard? Regardless, we are probably using it wrong in the changed opt_merge code and in a few other places in hashlib.h.

whitequark (Member):

Ideally we would have 64b hash be the default on 64b architectures, but we need to maintain the option of 32b hashes for smaller/slower architecture targets where the performance hit matters (people running yosys on an older raspi are still a thing, and then there's that proof-of-concept someone made of a softcore doing synthesis to partially reconfigure the FPGA it is running on, which we want to enable just for the heck of it).

Has anyone actually benchmarked this and found that the hash function is an issue? Or is this conjecture?

rmlarsen (Contributor) commented Feb 6, 2024

Ideally we would have 64b hash be the default on 64b architectures, but we need to maintain the option of 32b hashes for smaller/slower architecture targets where the performance hit matters (people running yosys on an older raspi are still a thing, and then there's that proof-of-concept someone made of a softcore doing synthesis to partially reconfigure the FPGA it is running on, which we want to enable just for the heck of it).

Has anyone actually benchmarked this and found that the hash function is an issue? Or is this conjecture?

It is not a hotspot in my benchmarks, but this was measured on Intel Skylake x86_64. The main issue in this context was the poor hashing, which caused a significant number of hash collisions. But I think Martin fixed the primary source of that (lack of hashing of constants).

povik (Member, Author) commented Feb 6, 2024

I assume Catherine's question was about hashing in general all over Yosys code, and the associated dict<> and pool<> operations which consume the hashes. Was that what you looked at, Rasmus, or did you mean opt_merge specifically?

But I think Martin fixed the primary source of that (lack of hashing of constants).

FWIW the poor properties of mkhash contributed the most in my testing, which went away when I inserted mkhash_xorshift64 in a few key places in the PR here. It may not be the best approach but it's at least something we can improve on.
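
For reference, the textbook Marsaglia xorshift64 step looks like the following (editor's illustration); a mkhash_xorshift64 helper would presumably be something along these lines, though the exact shifts used in the PR may differ.

#include <cstdint>

inline uint64_t xorshift64(uint64_t x) {
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return x;   // note: maps 0 to 0, so feed it a nonzero state
}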

rmlarsen (Contributor) commented Feb 6, 2024

I assume Catherine's question was about hashing in general all over Yosys code, and the associated dict<> and pool<> operations which consume the hashes. Was that what you looked at, Rasmus, or did you mean opt_merge specifically?

But I think Martin fixed the primary source of that (lack of hashing of constants).

FWIW the poor properties of mkhash contributed the most in my testing, which went away when I inserted mkhash_xorshift64 in a few key places in the PR here. It may not be the best approach but it's at least something we can improve on.

Ah yes. Hash table manipulation in general (but not mkhash) still dominates the profile. Top of "Bottom up view":

(image: top of the profiler's "Bottom up" view)

FWIW, we turned on more ABC passes since my original PRs, so ignore those. But in yosys proper, the largest time sinks are hashing related.

rmlarsen (Contributor) commented Feb 6, 2024

I assume Catherine's question was about hashing in general all over Yosys code, and the associated dict<> and pool<> operations which consume the hashes. Was that what you looked at, Rasmus, or did you mean opt_merge specifically?

But I think Martin fixed the primary source of that (lack of hashing of constants).

FWIW the poor properties of mkhash contributed the most in my testing, which went away when I inserted mkhash_xorshift64 in a few key places in the PR here. It may not be the best approach but it's at least something we can improve on.

Ah yes. Hashing in general still dominates the profile:

(image: profile)

FWIW, we turned on more ABC passes since my original PRs, so ignore those. But in yosys proper, the largest time sinks are hashing related.

I believe the hot spots in my profile are related to the non-coherent layout of the hash tables in yosys, not time spent in the hash function itself. Replacing the hash table implementation with something like the swisstables from Abseil would likely give a significant speedup. I have not had the time to experiment with this, as it is a rather invasive change. https://abseil.io/about/design/swisstables
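
For a sense of what such an experiment involves at the call sites, usage of Abseil's SwissTable containers looks roughly like this (editor's illustration, assuming Abseil is available; not something this PR does):

#include "absl/container/flat_hash_map.h"
#include <string>

// SwissTable-backed map: open addressing with SIMD-accelerated probing.
absl::flat_hash_map<std::string, int> cell_count;

void count_cell(const std::string &type) {
	++cell_count[type];
}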

rmlarsen (Contributor) commented Feb 6, 2024

@povik here is the flame graph for most of the Yosys cost corresponding to the "Bottom up" list above. It's a mix of OptMergePass/OptExprPass/CleanupPass/OptMuxTreePass.

(flame graph image)

povik added the status-paused label (Status: Unfinished, not actively worked on, PR author intends to continue PR in the future) on Apr 13, 2024
povik added 7 commits on July 29, 2024 10:32, including:

Avoid building a string which we subsequently hash when hashing cells; instead use the readily available `hash()` on IDs and `SigSpec` and combine those to build the overall cell hash.

There's little value in treating cells holding memory initialization data with `opt_merge`, but they can be relatively expensive to hash if their init data is large.
widlarizer force-pushed the opt_merge-performance branch from 0790bfb to 379c462 on July 29, 2024 08:35
widlarizer mentioned this pull request on Nov 15, 2024
povik added the status-superseded label (Status: Work continues in a different PR or was made redundant) and removed the status-paused label on Nov 18, 2024
povik (Member, Author) commented Nov 18, 2024

Superseded by Emil's work in #4677, which makes opt_merge use a new hashing interface.

povik closed this on Nov 18, 2024