You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The current way we handle the parsing state does not adequately prevent re-enabling of the state flags parsing_* (currently just parsing_string and parsing_number). This means that invalid JSON can be provided and intended values can be obfuscated.
Example
Consider the (invalid) JSON:
{
"k": "v"123"v",
}
At the moment, the parser will read up to the key "k" and upon reading: will write
[1,1]
to stack position 0.
We will then move through bytes and toggle: parsing_string on and off, then toggle parsing_number on then off, and finally again with parsing_string we see it toggle.
Solution
This can likely be solved by having these parsing states be stack variables in a new index and adding constraints in the following way.
In this case, we write the additional 1 into position 2 of the stack at height 0 indicating we enter a string value. Upon reading the second " we increment position 2 of the stack at height 0 to [1,1,2,0] which indicates we have cleared the string value. At this point, we constrain to not allow any further value parsing.
The interaction with a comma would remain mostly unchanged in that we would require, for instance, seeing [1,1,2,0] before reading a comma and then we'd see:
we do the same, using stack position 3 to track parsing_number and a 2 written to this position to indicate clearing that value and allowing for no more value-types to be parsed.
Future Work
Given #32 and #33 this type of implementation described seems more pleasing and conducive to reducing passing invalid JSON through the parser. In those cases, we can add two more stack indices representing parsing_null and parsing_bool which, for example could go like so:
These numbers cannot ever overlap with each other in nominal conditions, so we can save 3 x STACK_HEIGHT numbers of field elements. Likely this same sort of compression could be used elsewhere.
The text was updated successfully, but these errors were encountered:
Idea
The current way we handle the parsing state does not adequately prevent re-enabling of the state flags
parsing_*
(currently justparsing_string
andparsing_number
). This means that invalid JSON can be provided and intended values can be obfuscated.Example
Consider the (invalid) JSON:
At the moment, the parser will read up to the key
"k"
and upon reading:
will writeto stack position 0.
We will then move through bytes and toggle:
parsing_string
on and off, then toggleparsing_number
on then off, and finally again withparsing_string
we see it toggle.Solution
This can likely be solved by having these parsing states be stack variables in a new index and adding constraints in the following way.
Consider now the sequence:
In this case, we write the additional
1
into position 2 of the stack at height 0 indicating we enter a string value. Upon reading the second"
we increment position 2 of the stack at height 0 to[1,1,2,0]
which indicates we have cleared the string value. At this point, we constrain to not allow any further value parsing.The interaction with a comma would remain mostly unchanged in that we would require, for instance, seeing
[1,1,2,0]
before reading a comma and then we'd see:Also, we can repeat this same process for when we are parsing a key instead of a value, and enforce the key is a string. E.g.,
For instance, in the above explanation if we had instead:
we do the same, using stack position 3 to track
parsing_number
and a 2 written to this position to indicate clearing that value and allowing for no more value-types to be parsed.Future Work
Given #32 and #33 this type of implementation described seems more pleasing and conducive to reducing passing invalid JSON through the parser. In those cases, we can add two more stack indices representing
parsing_null
andparsing_bool
which, for example could go like so:whereby reading any other non-whitespace ASCII after achieving
[1,1,0,0,4,0]
results inFAIL
.Easy differentiation in
true
andfalse
could be handled upon filtering for[x,y,0,0,0,5]
as onlyfalse
can attain 5 in the final stack position here.Edit: 8/30/24
I think we can compress the stack quite a bit, actually. We need only a stack 3 wide if we do the following encoding:
parsing_string
-->[..,..,1]
and[..,..,2]
parsing_number
-->[..,..,10]
and[..,..,20]
parsing_null
-->[..,..,100]
, ...[..,..,400]
parsing_true
" -->[..,..,1000]
,...[..,..,4000]
,parsing_false
" -->[..,..,10000]
,...[..,..,50000]
,These numbers cannot ever overlap with each other in nominal conditions, so we can save
3 x STACK_HEIGHT
numbers of field elements. Likely this same sort of compression could be used elsewhere.The text was updated successfully, but these errors were encountered: