Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rtlil: Speeds up string decoding by 30% #3940

Closed

Conversation

QuantamHD
Copy link
Contributor

This change represents about a 2% speed up of Yosys as a whole.

Written by @rmlarsen

@rmlarsen
Copy link
Contributor

Did you intend to include the change to decode_string here? Or is that going to be a separate PR?

This change represents about a 2% speed up of Yosys as a
whole.

Written by @rmlarsen

Co-authored-by: Rasmus Larsen <[email protected]>
Signed-off-by: Ethan Mahintorabi <[email protected]>
@QuantamHD QuantamHD force-pushed the const_string_improvements branch from 8f6a9f5 to 2235c3f Compare September 19, 2023 00:12
@QuantamHD
Copy link
Contributor Author

Good catch. I added it back.

bits.reserve(str.size() * 8);
for (int i = str.size() - 1; i >= 0; --i) {
const unsigned char ch = str[i];
bits.insert(bits.end(), LUT[ch].begin(), LUT[ch].end());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am bit surprised by this being faster -- would you perhaps have a figure for this change alone?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This in itself is about 20% faster. Decode_string is about 30% faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before:
image

After:
image


int i = n_over_8 * 8;
if (i < n) {
fill_byte(i, n);
Copy link
Member

@povik povik Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must say I really dislike the closure here. What about something like:

    int cap = GetSize(bits); // where the current byte ends
    for (int i = GetSize(bits) / 8 * 8; i >= 0; i -= 8) {
        for (int j = i; j < cap; j++) {
            ...
        }
        cap = i;
    }

Copy link
Contributor

@rmlarsen rmlarsen Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a named helper function for filling each byte is more readable to me, but I suppose it's a matter of what style you are used to reading. But we can certainly change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do change it. While at it, please fix the patch to use tabs like in the rest of the file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also to consider is splitting the patches: we are ready to merge the decode_string() change but we are not so sure about the lookup table in the constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuantamHD do you want to update / split the PR or should I just file a new one?

@povik
Copy link
Member

povik commented Sep 26, 2023

We are curious about the timings. Couple of questions: when you say this speeds up Yosys as a whole by 2 %, do you mean some representative ASIC flow you are running on your end? Would you have some typical timings for the read_verilog / hierarchy steps on their own, ideally absolute? We assume those steps are where the significant speedup is, e.g. in constructing the Const values for all the src attributes, et cetera.

@rmlarsen
Copy link
Contributor

rmlarsen commented Sep 26, 2023

Hi Martin,

Thanks for the comments. Response below.

We are curious about the timings. Couple of questions: when you say this speeds up Yosys as a whole by 2 %, do you mean some representative ASIC flow you are running on your end? Would you have some typical timings for the read_verilog / hierarchy steps on their own, ideally absolute? We assume those steps are where the significant speedup is, e.g. in constructing the Const values for all the src attributes, et cetera.

Yes, the 2% is the reduction in overall time of the synthesis flow for a representative circuit from a Google hardware block that we are running through Yosys. I can provide you with a flame graph for example, since that probably gives more context, or a wider slice of the Tree view.

@povik
Copy link
Member

povik commented Sep 26, 2023

Hello Rasmus,

Yes, anything you could provide to inform our discussion would be helpful.

@rmlarsen
Copy link
Contributor

@povik Will do. Notice that @QuantamHD is OOO until tomorrow. I'd like to coordinate with him on how to proceed in terms of splitting, so give us a few days. Thanks.

@povik
Copy link
Member

povik commented Sep 26, 2023

We will probably next look at this on Monday 14:00 UTC during the weekly development call (which is btw open to public and you are more than welcome to join if interested).

@rmlarsen
Copy link
Contributor

Ah, good to know. Thanks! Would be nice to join the call if I can at some point.

@povik
Copy link
Member

povik commented Sep 26, 2023

Link here: https://meet.jit.si/yosyshq-slack-devel-discuss

It's scheduled to be an hour long, occurs every week.

@rmlarsen
Copy link
Contributor

@povik I moved the decode_string() change to #3959. I got rid of the lambda and after a few tries found a variant that was a bit faster too.

@QuantamHD
Copy link
Contributor Author

@rmlarsen Should we close this then?

@rmlarsen
Copy link
Contributor

@QuantamHD sure. We can upstream the Const constructor separately, if the Yosys authors find it acceptable.

@QuantamHD QuantamHD closed this Sep 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants