Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug(json): re-enabling of parsing_* is possible #43

Open
Autoparallel opened this issue Aug 20, 2024 · 2 comments
Open

bug(json): re-enabling of parsing_* is possible #43

Autoparallel opened this issue Aug 20, 2024 · 2 comments
Labels
bug Something isn't working json

Comments

@Autoparallel
Copy link
Contributor

Autoparallel commented Aug 20, 2024

Idea

The current way we handle the parsing state does not adequately prevent re-enabling of the state flags parsing_* (currently just parsing_string and parsing_number). This means that invalid JSON can be provided and intended values can be obfuscated.

Example

Consider the (invalid) JSON:

{
    "k": "v" 123 "v",
}

At the moment, the parser will read up to the key "k" and upon reading: will write

[1,1]

to stack position 0.

We will then move through bytes and toggle: parsing_string on and off, then toggle parsing_number on then off, and finally again with parsing_string we see it toggle.

Solution

This can likely be solved by having these parsing states be stack variables in a new index and adding constraints in the following way.

Consider now the sequence:

read `:`  --> [1,1,0,0]
read `"` --> [1,1,1,0]
read `"` --> [1,1,2,0]
read `1` --> FAIL
read `"` --> FAIL

In this case, we write the additional 1 into position 2 of the stack at height 0 indicating we enter a string value. Upon reading the second " we increment position 2 of the stack at height 0 to [1,1,2,0] which indicates we have cleared the string value. At this point, we constrain to not allow any further value parsing.

The interaction with a comma would remain mostly unchanged in that we would require, for instance, seeing [1,1,2,0] before reading a comma and then we'd see:

read `:`  --> [1,1,0,0]
read `"` --> [1,1,1,0]
read `"` --> [1,1,2,0]
read `,` --> [1,0,0,0]

Also, we can repeat this same process for when we are parsing a key instead of a value, and enforce the key is a string. E.g.,

read `{` --> [1,0,0,0]
read `"` --> [1,0,1,0]
read `"` --> [1,0,2,0]
read `:` --> [1,1,0,0]

For instance, in the above explanation if we had instead:

read `:`  --> [1,1,0,0]
read `1` --> [1,1,0,1]
read ` ` --> [1,1,0,2]
read `1` --> FAIL
read `"` --> FAIL

we do the same, using stack position 3 to track parsing_number and a 2 written to this position to indicate clearing that value and allowing for no more value-types to be parsed.

Future Work

Given #32 and #33 this type of implementation described seems more pleasing and conducive to reducing passing invalid JSON through the parser. In those cases, we can add two more stack indices representing parsing_null and parsing_bool which, for example could go like so:

read `:` --> [1,1,0,0,0,0]
read `n` --> [1,1,0,0,1,0]
read `u` --> [1,1,0,0,2,0]
read `l` --> [1,1,0,0,3,0]
read `l` --> [1,1,0,0,4,0]

whereby reading any other non-whitespace ASCII after achieving [1,1,0,0,4,0] results in FAIL.

Easy differentiation in true and false could be handled upon filtering for [x,y,0,0,0,5] as only false can attain 5 in the final stack position here.


Edit: 8/30/24

I think we can compress the stack quite a bit, actually. We need only a stack 3 wide if we do the following encoding:

  • parsing_string --> [..,..,1] and [..,..,2]
  • parsing_number --> [..,..,10] and [..,..,20]
  • parsing_null --> [..,..,100], ... [..,..,400]
  • "parsing_true" --> [..,..,1000],...[..,..,4000],
  • "parsing_false" --> [..,..,10000],...[..,..,50000],

These numbers cannot ever overlap with each other in nominal conditions, so we can save 3 x STACK_HEIGHT numbers of field elements. Likely this same sort of compression could be used elsewhere.

@Autoparallel Autoparallel added bug Something isn't working json labels Aug 20, 2024
@Autoparallel Autoparallel pinned this issue Aug 20, 2024
@Autoparallel
Copy link
Contributor Author

Pinned issue as this is a rather invasive endeavor we should decide upon before integrating to many other changes.

@Autoparallel
Copy link
Contributor Author

@lonerapier i'm pinging you here because this has upstream effects into the fetcher/interpreter.

I can tackle these changes if need be, but maybe we discuss

@Autoparallel Autoparallel unpinned this issue Aug 23, 2024
@Autoparallel Autoparallel changed the title bug(disclosure): re-enabling of parsing_* is possible bug(json): re-enabling of parsing_* is possible Aug 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working json
Projects
None yet
Development

No branches or pull requests

1 participant