Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hjson: quoteless strings shall not include a comma (#552) #553

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

cousteaulecommandant
Copy link
Contributor

Some hjson files in X-HEEP use trailing comma after a "quoteless string" which isn't valid in the standard (https://hjson.github.io/syntax.html):

Do not add commas or comments as they would become part of the string.

This includes "barewords" such as active: low, but also (surprisingly) hexadecimal values such as address: 0x10000000, which is technically also interpreted as a string and not an integer value. So if you add a comma there, the hjson parser considers it as part of the string (this is not a bug in the parser, it's the correct interpretation of the file).
Commas are fine (but superfluous) if the string is quoted, and also after numbers, true, and false.

This was reported as #552, which this PR fixes.

This fix also deprecates certain workarounds in the code that were using .split(',') or .strip(',') to address this issue (but the issue was really that the file format was wrong, not that the parser failed).
I decided not to remove those fixes from mcu_gen.py just in case, but in principle they're no longer necessary if files are kept consistent with the Hjson format.

It would probably be a good idea to get rid of commas in hjson files entirely (including after numbers, quoted strings, or closing brackets/braces), since they are superfluous and the standard discourages its use, but I decided to leave that out of this PR as well.

@davideschiavone
Copy link
Member

@davidmallasen , can you pls give it a look?

@davidmallasen
Copy link
Member

@davideschiavone I completely agree with @cousteaulecommandant. In fact, I would suggest to also remove the commas after quoted strings, numbers, etc. to avoid any confusion and to keep things consistent.

I see that in the config folder this was already taken into account and there are only commas after quoted strings or blocks.

Leaving the .split(',') or .strip(',') in mcu_gen.py depends on how strict you want to be and how backwards compatible you want to be in other projects that end up using the latest version of X-HEEP.

The changes required in hw/vendor/lowrisc_opentitan/hw/ip/i2c/data/i2c_testplan.hjson could be an issue, but it should be easily solvable with a vendor patch. @cousteaulecommandant in this case the CI is failing because you are directly changing a file inside the hw/vendor/* folders, and this should only be done with patches. You can look in hw/vendor/patches for examples and https://opentitan.org/book/util/doc/vendor.html for the documentation.

@cousteaulecommandant
Copy link
Contributor Author

I would suggest to also remove the commas after quoted strings, numbers, etc. to avoid any confusion and to keep things consistent.

I was tempted to add that as well (on a second commit), but I felt that that could cause a lot of trouble and merge conflicts. But I could try to add that as well if you want.

Leaving the .split(',') or .strip(',') in mcu_gen.py depends on how strict you want to be and how backwards compatible you want to be in other projects that end up using the latest version of X-HEEP.

Yes; I was afraid that removing that would break every single fork, derived project and ongoing work. It may be a good idea to remove it in a future though.

The changes required in hw/vendor/lowrisc_opentitan/hw/ip/i2c/data/i2c_testplan.hjson could be an issue, but it should be easily solvable with a vendor patch. @cousteaulecommandant in this case the CI is failing because you are directly changing a file inside the hw/vendor/* folders, and this should only be done with patches.

I can exclude that file from the PR for now. It's already fixed in upstream OpenTitan anyway.

Or should I fix the patch as well?

@davidmallasen
Copy link
Member

davidmallasen commented Jul 18, 2024

@davideschiavone I'll leave these decisions in your hands. My suggestion would be:

  1. Remove all the commas in the hjsons to keep things consistent with the hjson specification and the python package we use.
  2. Keep the current functionality in mcu-gen unless you know it's safe to remove it as well.
  3. Leave the OpenTitan file as-is or patch/update it if we're removing 2.

@davideschiavone
Copy link
Member

davideschiavone commented Jul 18, 2024

@davideschiavone I'll leave these decisions in your hands. My suggestion would be:

  1. Remove all the commas in the hjsons to keep things consistent with the hjson specification and the python package we use.
  2. Keep the current functionality in mcu-gen unless you know it's safe to remove it as well.
  3. Leave the OpenTitan file as-is or patch/update it if we're removing 2.

I do not have any personal preference or opinion, so I will leave to the top (still active) contributors the decision to take.

Please, @JoseCalero @JuanSapriza @simone-machetti @christophmuellerorg @davidmallasen @LuigiGiuffrida98 @StMiky @danivz
vote by reacting to this message with 👍 if you want to apply the change or 👎 if you do not want to apply those changes.

@benoitdenkinger , although not active any more, feel free to vote and I will take it into account

@davideschiavone
Copy link
Member

@davideschiavone I'll leave these decisions in your hands. My suggestion would be:

  1. Remove all the commas in the hjsons to keep things consistent with the hjson specification and the python package we use.
  2. Keep the current functionality in mcu-gen unless you know it's safe to remove it as well.
  3. Leave the OpenTitan file as-is or patch/update it if we're removing 2.

I do not have any personal preference or opinion, so I will leave to the top (still active) contributors the decision to take.

Please, @JoseCalero @JuanSapriza @simone-machetti @christophmuellerorg @davidmallasen @LuigiGiuffrida98 @StMiky @danivz vote by reacting to this message with 👍 if you want to apply the change or 👎 if you do not want to apply those changes.

@benoitdenkinger , although not active any more, feel free to vote and I will take it into account

@davidmallasen @cousteaulecommandant , we clearly have a winner :) pls go on

@davidmallasen
Copy link
Member

Thank you all. @cousteaulecommandant could you:

  1. Remove all the commas in the hjsons to keep things consistent.
  2. Exclude the OpenTitan file from the PR

@davideschiavone
Copy link
Member

Thank you all. @cousteaulecommandant could you:

  1. Remove all the commas in the hjsons to keep things consistent.
  2. Exclude the OpenTitan file from the PR

if we leave the commas in the OpenTitan I also guess we do not have to change the python script

@davidmallasen
Copy link
Member

Exactly

Some hjson files in X-HEEP use trailing comma after a "quoteless string"
which isn't valid in the standard (https://hjson.github.io/syntax.html):

> Do not add commas or comments as they would become part of the string.

This includes "barewords" such as `active: low,` but also (surprisingly)
hexadecimal values such as `address: 0x10000000,` which is technically
also interpreted as a string and not an integer value.
(`42,`, `true,`, `false,` and `"string",` are OK though.)

This fixes issue esl-epfl#552.
It also deprecates certain workarounds in the code that were using
`.split(',')` or `.strip(',')` to address this issue (but the issue was
really that the file format was wrong, not that the parser failed).
- Followed by spaces
- There's a commented-out line that has a trailing comma as well
The Hjson format specification (https://hjson.github.io/syntax.html)
also discourages the use of trailing commas at the end of a line,
which are harmless (valid) but superfluous and confusing:

> You should omit optional commas to make your data more readable.

It has been decided that adopting this style will be less confusing
and avoid possible repetitions of issue esl-epfl#552 in the future.

This commit removes all the trailing commas from hjson files,
after a meticulous process to verify that none of these trailing commas
had a specific meaning; such as in this hypothetical scenario:
```
    description: '''This is the description's first line,
                    which has a trailing comma.'''
```
@cousteaulecommandant
Copy link
Contributor Author

cousteaulecommandant commented Jul 20, 2024

OK, done. I removed all trailing commas from all Hjson files, being careful that none of these commas are actually part of a comment or a '''...''' string or similar.
Also updated the documentation accordingly.
(And excluded the vendorized OpenTitan files as requested.)

I think I did it right. I tried opening all affected hjson files and they at least load without errors using Python's hjson module.

Just in case I messed up some corner case, I left each "logical step" of the substitutions on a separate commit in case you want to check them one by one (before squashing everything together in a single commit).

@cousteaulecommandant
Copy link
Contributor Author

On second thought, I'm still unsure if modifying these many files is a good idea re: other forks. If someone was working on a fork/branch that touched any hjson file, it's going to be merge conflicts everywhere.

At the very least, I think it would probably be a good idea to split this PR into the two main commits ("shall not include" and "should not include") rather than squashing everything into a single commit, so that people can choose not to merge the latter, or revert it before merging. And I think it would make sense since one PR is fixing an actual issue whereas the second is fixing a mere coding style.

@davidmallasen
Copy link
Member

Thanks @cousteaulecommandant. @davideschiavone I double-checked all the changes and if the CI passes I think it's good to merge.

The merge conflicts in forks of X-HEEP are almost inevitable, that's why Davide asked the main contributors. Since everyone was on board, and I believe only cherry-picking some commits when updating the base X-HEEP is not a maintainable practice, I don't think this should be an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants