-
Notifications
You must be signed in to change notification settings - Fork 783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC 8259, bitwise checks, next siblings, doxygen comments, and more #197
base: master
Are you sure you want to change the base?
Conversation
jsmnint_t typedef'd to allow easy changing of main int type jsmntype_t extended to allow more checks JSMN_STRICT now has simplified format checking via parser->expected JSMN_NEXT_SIBLING allow for easy parsing to a parents next sibling
After the parse, primitive tokens can be queried with JSMN_PRI_MINUS for negative numbers, JSMN_PRI_DECIMAL for fractions, and JSMN_PRI_EXPONENT for numbers with exponents. JSMN_PRI_LITERAL denotes 'true', 'false', or 'null'.
This undoes my previous comment preference, but I would rather keep c89 compliance. The Makefile not also compiles with the c89 standard. Removed some flags that have no use in this branch.
Updated jsmn.h to pass all available tests. Modified testutil.h to better support jsmnint_t. Remove a warning when compiling the tests because invalid characters are in the file for testing.
* Updated jsmn to allow for continuation of PRIMITIVEs after a JSMN_ERROR_PART error. Conditionals are split between lines for coverage tests. jsmn_defines.h documents and sets all of the possible build configurations. * jsmn_utils added for better examples. The explode example is in the example folder. * tests have been updated to correctly test different configurations. * README.md updated with c89 comments.
Hello, There seems to be an issue in this PR where a string like It can be fixed by changing |
@atomsnc - Sorry, I saw your post and then life hit me for a bit. Very good catch. I don't know how I wasn't testing for that case previously. I think your fix fixes the symptom but not the cause, so I solved the problem a little further up by ignoring Thank you, and I am also pleased that someone could get through my code. |
A massive pr |
I really like how strict mode is the default here with possibility of enabling specific relaxations. Especially ones that help cut corners just to squeeze extra performance by reducing conformance. I also think introduction of the sibling feature is a good idea, helps a lot when traversing a more complex JSON. Myself I mostly was thinking about trying to fix issues with JSMN without introducing additional fields to parser's state. But honesty, by just saving what's expected in the future should just result in more performant code, as bitwise checks ought to be faster and cleaner than deriving what should come next from current tokens and input string. I think I will try to mix your PR with @dominickpastore to deal with certain parts that just personally felt ugly to me. Throughout the next week or two I will try to battle with git in an attempt to do it properly and push the results onto a separate branch. |
@pt300 I appreciate the comments.
It certainly is. I swiped the next_sibling idea from #68 and shorter tokens and unsigned ints from #93 years ago when I first found jsmn.
I made sure to keep the original types and added new types to be able to check for after parsing, and these led into adding extra states into the same enum for deeper checks by knowing what is legal as the next types. I worked hard to keep everything in a 32-bit uint. I like the enums and bitwise checks because it is calculated at compile time but makes the code easier to read in places.
I look forward to it. I hope you can feel the spirit of jsmn's current release in my PR because it was a great starting point and I tried to keep as much of it as I could while make some significant changes. I am curious though; what parts feel ugly to you? I know I can write some incredibly dense code that isn't the easiest to follow. |
After the last sweep through your code it's been mostly few smaller things like the use of defined() in preprocessor macros instead of #ifdef and #ifndef. There was something else but I cannot remember at this time. I will be looking thoroughly once more and will let you know if I find any issues. Also, just a reminder that I am no expert. Feel free to say so if what I say feels wrong. |
@themobiusproject I feel like a worthwhile addition would be to have a coalesce strategy in the For tokens which have
Something like
Above code is untested. |
I actually also prefer
Could you better describe this use case? I am trying to figure out where it fits in so I can better understand the code snippet. |
@themobiusproject When the tokens array for storing tokens while parsing cannot be too big (especially in embedded devices). In that case when the array gets full, user can evict a few completed tokens to parse a bit more. |
I have picked most of the changes you've done in this PR and those are now visible in experimental branch. For now I have left out the utilities and the example utilising them. Definitely there are more changes coming to what is on the branch right. Mostly cleaning up the code and actually making sure there's no space for nasty surprises. Then, it will go back to adding features again. |
@pt300 @themobiusproject I just checked out the experimental branch that contains this patch. However, now example/simple.c doesn't parse the JSON in it anymore. This here still works:
but this here won't parse anymore:
This are the warnings during compilation and the (invalid) result of the run:
I also tried then PRs 194-196 on top of master. And with that, the example/simple.c still runs. |
@holgerschurig I just added the json from the simple.c to my tests and it parses. You can check 0560339. When @pt300 merged my code, he grabbed it before 9f7e9de which did leave an error in parsing. That commit needs to be applied to the experimental branch and it should fix the problem. As for the files in the example, I only ever touch explode.c because I wrote that one along with the |
Sorry again for the inactivity. This looks odd, as I remember the tests were passing. I will make sure to look into this tomorrow and put the patch in. I also really wanted to simplify some parts of the parsing, such as reparsing some tokens from the beginning instead of trying to continue parsing. Should cut down complexity without major performance hit + make the code easier to maintain and more difficult to break. But, you can see how well it's going. Hope you can forgive me. |
I have applied the changes, but there seem to be still an issue with parsing. I will have to revisit it some other time. |
This PR does a number of things, but the main points are:
jsmntype_t
to:tokens == NULL
JSMN_SHORT_TOKENS
flagJSMN_MULTIPLE_JSON
parses multiple json objects in a single bufferJSMN_MULTIPLE_JSON_FAIL
parses one object will fail if there is any non-whitespace character following it in the bufferJSMN_PERMISSIVE
flags have been extended to be more modularJSMN_PERMISSIVE_KEY
allows PRIMITIVEs to be OBJECT KEYsJSMN_PERMISSIVE_PRIMITIVE
allows PRIMITIVEs to be any contiguous valueCode-wise, the opening and closing of containers, colons, and commas have been pulled out of
jsmn_parse
and put into their own functions (jsmn_parse_
prefix withcontainer_open
,container_close
,colon
, andcomma
respectively).jsmn_parse
is extremely clean. Overall, the code is quite verbose but most of it reduces to bit checks and comparisons.For post processing, every token has multiple flags set.
JSMN_KEY
orJSMN_VALUE
JSMN_OBJECT
,JSMN_ARRAY
,JSMN_STRING
, orJSMN_PRIMITIVE
These base flags are combined for complex types such as:
JSMN_OBJ_VAL
for an object that is also a value, orJSMN_STR_KEY
for a string that is also a key.Primitives are further flagged:
JSMN_PRI_LITERAL
for true, false, or nullJSMN_PRI_MINUS
if it is a negative numberJSMN_PRI_DECIMAL
if it is a float or doubleJSMN_PRI_EXPONENT
if it has an exponentThis and a lot of validation is done with an enum that can be packed into two bytes. With the enum, vanilla jsmntok_t, parent, and sibling, the final size of the jsmntok_t can be reduced to 12 bytes, or 3 ints, per token.
I have also added over 300 validation tests courtesy of JSONTestSuite. The tests are set up in three groups: the first are the tests that should pass, second are the tests that shouldn't pass, and third are the tests that may pass based on the json parser implementation.
This PR should resolve #118, resolve #131, resolve #137, resolve #158, resolve #163, resolve #177, resolve #178, and resolve #179.
I believe this can also resolve #164 by being able to traverse the tokens by sibling.
This should be able to close PR #93, PR #102, PR #108, PR #166, PR #168, PR #172, PR #192, and PR #198.
I also plan on getting utf8 into here with a flag to enable or disable it as to avoid jsmn ballooning in size for small devices.
A strong set of permissive rules would be welcome to further refine my rules.