Add several new formatting features used by Amaranth to translate Python format strings #4301

Merged: 10 commits, Apr 2, 2024

whitequark commented Mar 28, 2024

This is a work-in-progress PR at the moment.

  • H format type (uppercase hex digits)
  • c format type (Unicode code point emitted as UTF-8)
  • = justification (align right, but after the padding)
  • arbitrary fill character
  • sign mode (either - or space, as opposed to either - or +)
  • _ option (insert an underscore between every 3 decimal or 4 binary/octal/hex digits)
  • # option (insert 0x/0o/0b after sign and before padding)
  • signed hex numbers
  • padding interacts poorly with base
  • consider combination of 0 padding and _
  • test integration with Amaranth
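
For reference, several items on this list have direct counterparts in Python's own format specification mini-language, which is what Amaranth translates from. The snippet below is a minimal sketch that only exercises standard Python behavior; it does not show how Yosys serializes these options, and the dictionary keys are illustrative labels.

```python
# Standard Python format-spec behavior for some of the options listed above.
value = 170  # 0b1010_1010

examples = {
    "c type": format(0x20AC, "c"),           # code point -> character
    "= justification": format(-42, "*=8d"),  # fill character goes after the sign
    "space sign": format(42, " d"),          # space where '+' would otherwise go
    "_ decimal": format(1234567, "_d"),      # groups of three digits
    "_ binary": format(value, "_b"),         # groups of four digits
    "# option": format(value, "#x"),         # base prefix
    "signed hex": format(-value, "x"),       # sign precedes the hex digits
}

for name, text in examples.items():
    print(f"{name:>16}: {text!r}")
```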

whitequark marked this pull request as draft March 28, 2024 06:15
whitequark commented

@mmicko I need to have these in the next Yosys release to meet the current Amaranth release schedule. The PR should be done by the 1st or very shortly after.

povik commented Mar 28, 2024

This looks like it overlaps or supersedes #4192.

whitequark commented

@povik I think there's actually no overlap? Note VerilogFmtArg vs FmtPart.

povik commented Mar 28, 2024

Right you are

Before this commit, the `STRING` variant inserted a literal string;
the `CHARACTER` variant inserted a string. This commit renames them
to `LITERAL` and `STRING` respectively.

This is necessary for translating Python format strings in Amaranth.
Of the three sign modes, the first two (`-` only, and `+` or `-`) were
already supported with the `plus` boolean flag. The third one (space
or `-`) is a new specifier, which is allocated the ` ` character.
In addition, `MINUS` is now allocated the `-` character, but the old
format, where there is no `+`, ` `, or `-` in the respective position,
is also accepted for compatibility.

Before this commit, the existing alignments were `LEFT` and `RIGHT`,
which added the `padding` character to the right and left, respectively,
just before finishing formatting. However, if `padding == '0'` and the
alignment was to the right, then the padding character (digit zero) was
added after the sign, if one was present.

After this commit, the special case for `padding == '0'` is removed,
and the new justification `NUMERIC` adds the padding character like
the justification `RIGHT`, except after the sign, if one is present.
(Space, for the `SPACE_MINUS` sign mode, counts as the sign.)
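
The three justifications described above can be sketched as follows. This is an illustration of the stated behavior, not the Yosys implementation; the `justify` helper and its parameters are hypothetical, and only the names `LEFT`, `RIGHT`, and `NUMERIC` come from the commit message.

```python
def justify(sign: str, digits: str, width: int, padding: str, mode: str) -> str:
    """Hypothetical helper: place padding relative to the sign and digits."""
    fill = padding * max(0, width - len(sign) - len(digits))
    if mode == "LEFT":
        return sign + digits + fill   # padding on the right
    if mode == "RIGHT":
        return fill + sign + digits   # padding on the left, before the sign
    if mode == "NUMERIC":
        return sign + fill + digits   # padding after the sign
    raise ValueError(f"unknown justification: {mode}")


print(justify("-", "42", 6, "0", "NUMERIC"))  # sign first, then zeroes
print(justify("-", "42", 6, " ", "RIGHT"))    # spaces first, then sign
print(justify(" ", "42", 6, "0", "NUMERIC"))  # a space sign also counts as the sign
```

With `NUMERIC` in place, zero padding no longer needs a special case: it is simply `NUMERIC` justification with `0` as the padding character.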

The `UNICHAR` format type is used to print a Unicode character (code
point) as its UTF-8 serialization. To this end, two UTF-8 encoders (one
for fmt, one for cxxrtl) are added for rendering. When converted to
a Verilog format specifier, `UNICHAR` degrades to `%c` with the low
7 bits of the code point, which has equivalent behavior for inputs not
exceeding ASCII. (SystemVerilog leaves source and display encodings
completely undefined.)
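
As an illustration of what rendering a code point as UTF-8 involves, here is a minimal sketch in Python. The function names are hypothetical and this is not the code added to fmt or cxxrtl; for simplicity it does not reject surrogate code points.

```python
def encode_utf8(code_point: int) -> bytes:
    """Hypothetical sketch: serialize one code point as UTF-8 bytes."""
    if code_point < 0x80:
        return bytes([code_point])
    if code_point < 0x800:
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point out of range")


def verilog_fallback(code_point: int) -> bytes:
    """Keep only the low 7 bits, as described for the `%c` conversion."""
    return bytes([code_point & 0x7F])


print(encode_utf8(0x20AC))     # three-byte sequence for the euro sign
print(verilog_fallback(0x41))  # ASCII input is unchanged
```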

The base prefix option is serialized to RTLIL as `#` (to match Python's
and Rust's option with the same symbol), and sets the `show_base` flag.
Because the flag is called `show_base` and not e.g. `alternate_format`
(which is what Python and Rust call it), in addition to the prefixes
`0x`, `0X`, `0o`, `0b`, the RTLIL option also prints the `0d` prefix.
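
The difference from Python can be sketched as below. The `render` helper and the base labels (`"bin"`, `"oct"`, `"dec"`, `"hex"`, `"HEX"`) are hypothetical and chosen for this example only; the prefixes and the `show_base` name are the ones given in the commit message.

```python
# Hypothetical labels for the bases; only the prefixes come from the text above.
PREFIX = {"bin": "0b", "oct": "0o", "dec": "0d", "hex": "0x", "HEX": "0X"}
PY_TYPE = {"bin": "b", "oct": "o", "dec": "d", "hex": "x", "HEX": "X"}


def render(value: int, base: str, show_base: bool) -> str:
    """Sketch: sign first, then the optional base prefix, then the digits."""
    sign = "-" if value < 0 else ""
    prefix = PREFIX[base] if show_base else ""
    return sign + prefix + format(abs(value), PY_TYPE[base])


print(render(170, "dec", True))   # decimal gets a prefix here
print(format(170, "#d"))          # Python's '#' leaves decimal unprefixed
print(render(-170, "hex", True))  # prefix goes after the sign
```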

The digit grouping option is serialized to RTLIL as `_` (to match
Python's option with the same symbol), and sets the `group` flag. This
flag inserts an `_` symbol between each group of three digits (for
decimal) or four digits (for binary, hex, and octal).
Also fix interaction of `NUMERIC` justification with `show_base`.
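
The grouping rule can be sketched in a few lines. The `group_digits` function is a hypothetical illustration, not the Yosys code; it counts groups from the least significant digit, which matches what Python's `_` option does.

```python
def group_digits(digits: str, base: int) -> str:
    """Hypothetical sketch: insert '_' every 3 (decimal) or 4 (other) digits."""
    size = 3 if base == 10 else 4
    out = []
    # Walk from the least significant digit so groups align on the right.
    for index, char in enumerate(reversed(digits)):
        if index and index % size == 0:
            out.append("_")
        out.append(char)
    return "".join(reversed(out))


print(group_digits("1234567", 10))
print(group_digits("10101010", 2))
```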

When converted to Verilog, padding characters other than `0` and space
are replaced with one of these two. Otherwise padding is performed with
exactly that character.

Before this commit, the combination of `_` and `0` format characters
would produce a result like `000000001010_1010`.
After this commit, it would be `0000_0000_1010_1010`.

This has a slight quirk where a format like `{:020_b}` results in
the output `0_0000_0000_1010_1010`, which is one character longer than
requested. Python has the same behavior, and it's not clear what would
be strictly speaking correct, so Python behavior is implemented.
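
The Python behavior that this commit mirrors can be checked directly. The snippet only runs Python's built-in `format`; it does not exercise Yosys.

```python
value = 0b1010_1010

# Width 19 is filled exactly: four groups of four digits plus three separators.
exact = format(value, "019_b")

# Width 20 cannot be met without a leading separator, so Python emits one
# extra zero digit and the result is 21 characters long.
quirk = format(value, "020_b")

print(exact, len(exact))
print(quirk, len(quirk))
```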

mwkmwkmwk marked this pull request as ready for review April 2, 2024 09:40
mwkmwkmwk requested a review from zachjs as a code owner April 2, 2024 09:40
mwkmwkmwk merged commit 9417038 into YosysHQ:main Apr 2, 2024
18 checks passed
whitequark deleted the extend-format branch April 8, 2024 13:23