inflate: make use of enable-dfa-jump-thread
#257
// not the entirety of `dispatch`. We get a massive boost from that pass.
//
// It unfortunately does duplicate the code for some of the states; deduplicating it by having
// more of the states call this function is slower.
Is it possible to remove the duplication by using macros for the content of the states that are currently duplicated? It would make rust-analyzer work less well on that code though.
I don't think it is, because of the labels that we break/continue to. Also, it's fine because hopefully we won't touch this much ever again. But not ideal, for sure.
You can pass the labels as macro arguments, right?
can you? what would be the fragment specifier of the label? is it an identifier somehow? `tt` might work but often requires brackets
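On the fragment specifier question: as far as I know, `macro_rules!` accepts a loop label through the `lifetime` fragment, since a label is lexed the same way as a lifetime. A minimal sketch, assuming a made-up three-state machine (`Mode`, `len_state!`, and `run` are hypothetical names, not the crate's actual code):

```rust
/// Hypothetical stand-in for the decoder's states; not the real enum.
#[derive(Clone, Copy)]
enum Mode {
    Head,
    Len,
    Done,
}

// `$label:lifetime` matches a loop label such as `'top`, so the shared state
// body can `continue` a loop that is defined outside the macro. The locals are
// passed in as identifiers because macro hygiene would otherwise hide them.
macro_rules! len_state {
    ($label:lifetime, $mode:ident, $remaining:ident, $copied:ident) => {{
        if $remaining == 0 {
            $mode = Mode::Done;
            continue $label;
        }
        $remaining -= 1;
        $copied += 1;
    }};
}

/// Drives the toy state machine and returns how many items were "copied".
fn run(input_len: usize) -> usize {
    let mut mode = Mode::Head;
    let mut remaining = input_len;
    let mut copied = 0usize;

    'top: loop {
        match mode {
            Mode::Head => {
                mode = Mode::Len;
                continue 'top;
            }
            // The duplicated state body would be written once, in the macro.
            Mode::Len => len_state!('top, mode, remaining, copied),
            Mode::Done => break 'top,
        }
    }

    copied
}
```

Calling `run(3)` steps through `Head`, stays in `Len` for three iterations, and then leaves via `Done`. Whether this is worth the hit to rust-analyzer is a separate question.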
also in our function we load the reader and writer to the stack. you could parameterize the macro on that too but, idk, is that worth it?
At the very least please add a comment to every copy of every state that is duplicated indicating that they should be kept in sync.
Does this regress performance when not enabling the LLVM flag or is it perf neutral?
it's a win even without the flag; loading the values to the stack in this restricted case is advantageous. I get a ~10% increase at chunk size 4 (versus ~20% with the flag)
Refactor so that the LLVM `enable-dfa-jump-thread` flag has an effect. The numbers are really good for the small chunk sizes. We're now on par with zlib-ng for a chunk size of 4, and doing very well overall.
It really is a massive jump for chunk sizes 4 and 5 (20% and 12% respectively), and then it matters less and less for bigger chunk sizes.
NOTE: these benchmarks are run with `-Cllvm-args=-enable-dfa-jump-thread`; this commit does not enable that flag in any way, so (for now) it has to be enabled manually via rustflags.
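For context, a rough sketch of the code shape this pass is meant to help with, as I understand it: a loop that switches on a state variable, where most arms assign the next state as a constant, so the jump back through the `match` can be replaced by a direct jump to the next arm. Everything below (`State`, `count_body`, the `<`/`>` framing) is a made-up illustration, not the inflate code:

```rust
/// Hypothetical three-state machine; the real decoder has many more states.
#[derive(Clone, Copy)]
enum State {
    Start,
    Body,
    End,
}

/// Counts the bytes between the first `<` and the following `>` (or the end
/// of the input if no `>` is found).
fn count_body(input: &[u8]) -> usize {
    // Hot values live in locals for the duration of the loop, similar in
    // spirit to loading the reader and writer to the stack.
    let mut pos = 0usize;
    let mut count = 0usize;
    let mut state = State::Start;

    loop {
        match state {
            State::Start => match input.get(pos).copied() {
                None => state = State::End,
                Some(b'<') => {
                    pos += 1;
                    // The next state is a compile-time constant on this path.
                    state = State::Body;
                }
                Some(_) => pos += 1,
            },
            State::Body => match input.get(pos).copied() {
                None | Some(b'>') => state = State::End,
                Some(_) => {
                    pos += 1;
                    count += 1;
                }
            },
            State::End => return count,
        }
    }
}
```

For example, `count_body(b"ab<cde>f")` walks `Start` until the `<`, counts `c`, `d`, `e` in `Body`, and returns from `End`. How much the pass actually does with a given function depends on its size and shape, which is why the restricted function in this PR matters.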