RFC 8259, bitwise checks, next siblings, doxygen comments, and more #197

themobiusproject · 2020-06-02T11:26:15Z

This PR does a number of things, but the main points are:

it is fully RFC-8259 compliant
enables strict mode by default and adds a permissive mode
extendeds jsmntype_t to:
- allow bitwise checks for better json validation while parsing
- add more information in post processing
- define the rules for both strict and permissive modes
adds extra validation when tokens == NULL
- the parser keeps track of depth and type of container down to 16 or 32 levels deep depending on the JSMN_SHORT_TOKENS flag
adds siblings to traverse through a single level
- if there are more than one json string in the buffer, the first element of the first json string links to the first element of the second json string through the sibling
allows three different modes for treating multiple json objects in the same buffer
- the default mode stops at the end of a json object allowing the user to parse the next json object if there is one
  - a new function is available to query if there is anything left in the buffer
- JSMN_MULTIPLE_JSON parses multiple json objects in a single buffer
- JSMN_MULTIPLE_JSON_FAIL parses one object will fail if there is any non-whitespace character following it in the buffer
  - this is probably the mode that follows the intent of RFC-8259 the best
JSMN_PERMISSIVE flags have been extended to be more modular
- JSMN_PERMISSIVE_KEY allows PRIMITIVEs to be OBJECT KEYs
- JSMN_PERMISSIVE_PRIMITIVE allows PRIMITIVEs to be any contiguous value
adds doxygen compatible comments and doxygen.cfg

Code-wise, the opening and closing of containers, colons, and commas have been pulled out of jsmn_parse and put into their own functions (jsmn_parse_ prefix with container_open, container_close, colon, and comma respectively). jsmn_parse is extremely clean. Overall, the code is quite verbose but most of it reduces to bit checks and comparisons.

For post processing, every token has multiple flags set.

JSMN_KEY or JSMN_VALUE
JSMN_OBJECT, JSMN_ARRAY, JSMN_STRING, or JSMN_PRIMITIVE

These base flags are combined for complex types such as:

JSMN_OBJ_VAL for an object that is also a value, or
JSMN_STR_KEY for a string that is also a key.

Primitives are further flagged:

JSMN_PRI_LITERAL for true, false, or null
JSMN_PRI_MINUS if it is a negative number
JSMN_PRI_DECIMAL if it is a float or double
JSMN_PRI_EXPONENT if it has an exponent

This and a lot of validation is done with an enum that can be packed into two bytes. With the enum, vanilla jsmntok_t, parent, and sibling, the final size of the jsmntok_t can be reduced to 12 bytes, or 3 ints, per token.

I have also added over 300 validation tests courtesy of JSONTestSuite. The tests are set up in three groups: the first are the tests that should pass, second are the tests that shouldn't pass, and third are the tests that may pass based on the json parser implementation.

This PR should resolve #118, resolve #131, resolve #137, resolve #158, resolve #163, resolve #177, resolve #178, and resolve #179.
I believe this can also resolve #164 by being able to traverse the tokens by sibling.

This should be able to close PR #93, PR #102, PR #108, PR #166, PR #168, PR #172, PR #192, and PR #198.

I also plan on getting utf8 into here with a flag to enable or disable it as to avoid jsmn ballooning in size for small devices.

A strong set of permissive rules would be welcome to further refine my rules.

jsmnint_t typedef'd to allow easy changing of main int type jsmntype_t extended to allow more checks JSMN_STRICT now has simplified format checking via parser->expected JSMN_NEXT_SIBLING allow for easy parsing to a parents next sibling

After the parse, primitive tokens can be queried with JSMN_PRI_MINUS for negative numbers, JSMN_PRI_DECIMAL for fractions, and JSMN_PRI_EXPONENT for numbers with exponents. JSMN_PRI_LITERAL denotes 'true', 'false', or 'null'.

This undoes my previous comment preference, but I would rather keep c89 compliance. The Makefile not also compiles with the c89 standard. Removed some flags that have no use in this branch.

Updated jsmn.h to pass all available tests. Modified testutil.h to better support jsmnint_t. Remove a warning when compiling the tests because invalid characters are in the file for testing.

* Updated jsmn to allow for continuation of PRIMITIVEs after a JSMN_ERROR_PART error. Conditionals are split between lines for coverage tests. jsmn_defines.h documents and sets all of the possible build configurations. * jsmn_utils added for better examples. The explode example is in the example folder. * tests have been updated to correctly test different configurations. * README.md updated with c89 comments.

atomsnc · 2020-08-20T21:23:52Z

Hello,

There seems to be an issue in this PR where a string like
char s[] = "{\"array\":[1,2]}";
fails, when an array is inside an object.

It can be fixed by changing
if (type & (JSMN_OBJECT | JSMN_INSD_OBJ))
to
if (type & (JSMN_OBJECT | JSMN_INSD_OBJ) && !(type & JSMN_ARRAY))
in jsmn_parse_comma

themobiusproject · 2020-09-08T21:12:12Z

@atomsnc - Sorry, I saw your post and then life hit me for a bit.

Very good catch. I don't know how I wasn't testing for that case previously. I think your fix fixes the symptom but not the cause, so I solved the problem a little further up by ignoring JSMN_INSD_OBJ altogether.

Thank you, and I am also pleased that someone could get through my code.

pt300 · 2020-10-24T16:42:31Z

A massive pr

pt300 · 2020-10-24T17:38:53Z

I really like how strict mode is the default here with possibility of enabling specific relaxations. Especially ones that help cut corners just to squeeze extra performance by reducing conformance. I also think introduction of the sibling feature is a good idea, helps a lot when traversing a more complex JSON.

Myself I mostly was thinking about trying to fix issues with JSMN without introducing additional fields to parser's state. But honesty, by just saving what's expected in the future should just result in more performant code, as bitwise checks ought to be faster and cleaner than deriving what should come next from current tokens and input string.

I think I will try to mix your PR with @dominickpastore to deal with certain parts that just personally felt ugly to me. Throughout the next week or two I will try to battle with git in an attempt to do it properly and push the results onto a separate branch.

themobiusproject · 2020-10-24T22:36:42Z

@pt300 I appreciate the comments.

A massive pr

It certainly is.

I swiped the next_sibling idea from #68 and shorter tokens and unsigned ints from #93 years ago when I first found jsmn.

Myself I mostly was thinking about trying to fix issues with JSMN without introducing additional fields to parser's state. But honesty, by just saving what's expected in the future should just result in more performant code, as bitwise checks ought to be faster and cleaner than deriving what should come next from current tokens and input string.

I made sure to keep the original types and added new types to be able to check for after parsing, and these led into adding extra states into the same enum for deeper checks by knowing what is legal as the next types. I worked hard to keep everything in a 32-bit uint.

I like the enums and bitwise checks because it is calculated at compile time but makes the code easier to read in places. expected = JSMN_PRI_INTEGER | JSMN_PRI_DECIMAL | JSMN_PRI_EXPONENT | JSMN_CLOSE; lets you be able to compare the code to the json rfc standard easily and it is all compile time code.

I think I will try to mix your PR with @dominickpastore to deal with certain parts that just personally felt ugly to me. Throughout the next week or two I will try to battle with git in an attempt to do it properly and push the results onto a separate branch.

I look forward to it. I hope you can feel the spirit of jsmn's current release in my PR because it was a great starting point and I tried to keep as much of it as I could while make some significant changes. I am curious though; what parts feel ugly to you? I know I can write some incredibly dense code that isn't the easiest to follow.

pt300 · 2020-10-25T00:21:40Z

After the last sweep through your code it's been mostly few smaller things like the use of defined() in preprocessor macros instead of #ifdef and #ifndef. There was something else but I cannot remember at this time. I will be looking thoroughly once more and will let you know if I find any issues. Also, just a reminder that I am no expert. Feel free to say so if what I say feels wrong.

atomsnc · 2020-10-25T01:11:14Z

@themobiusproject I feel like a worthwhile addition would be to have a coalesce strategy in the utils for open tokens when the tokens array cannot be large.

For tokens which have token.end != JSMN_NEG, they can either by dropped or moved out with a callback.

next_sibling will need to be computed again.

Something like

int coalesce_open_tokens(jsmntok_t *tokens, jsmn_parser *parser) {
    jsmnint_t i, index = 0;
    
    for (i = 0; i < parser->count; i++) {
        if (tokens[i].end == JSMN_NEG) {
            tokens[index++] = tokens[i];            
        }
#if defined(JSMN_NEXT_SIBLING)
        else {
            /* Relink siblings if any */
            for (jsmnint_t j = 0; j < i; j++) {
                if (tokens[j].next_sibling == i) {
                    tokens[j].next_sibling = tokens[i].next_sibling;
                    break;
                }
            }
        }
#endif
    }
    /* Return 0 if all the tokens are being worked on. We truly cannot add more. */ 
    if (index == parser->count) return 0;
    parser->count = index;
    parser->toknext = index;
    return 1;
}

Above code is untested.

themobiusproject · 2020-10-26T06:21:41Z

@pt300

After the last sweep through your code it's been mostly few smaller things like the use of defined() in preprocessor macros instead of #ifdef and #ifndef. There was something else but I cannot remember at this time. I will be looking thoroughly once more and will let you know if I find any issues. Also, just a reminder that I am no expert. Feel free to say so if what I say feels wrong.

I actually also prefer #ifdef and #ifndef, but the reason I went with #if defined() and #if !defined() is because of the occasional instance of #if defined() && defined() which looked horrible when surrounded by #if{,n}defs.

@atomsnc

I feel like a worthwhile addition would be to have a coalesce strategy in the utils for open tokens when the tokens array cannot be large.

Could you better describe this use case? I am trying to figure out where it fits in so I can better understand the code snippet.

atomsnc · 2020-10-26T18:48:42Z

@themobiusproject When the tokens array for storing tokens while parsing cannot be too big (especially in embedded devices). In that case when the array gets full, user can evict a few completed tokens to parse a bit more.

pt300 · 2020-11-04T16:28:53Z

I have picked most of the changes you've done in this PR and those are now visible in experimental branch. For now I have left out the utilities and the example utilising them.

Definitely there are more changes coming to what is on the branch right. Mostly cleaning up the code and actually making sure there's no space for nasty surprises. Then, it will go back to adding features again.

holgerschurig · 2021-11-14T17:51:53Z

@pt300 @themobiusproject I just checked out the experimental branch that contains this patch. However, now example/simple.c doesn't parse the JSON in it anymore. This here still works:

{"user": "johndoe", "admin": false, "uid": 1000, "groups": ["users"]}

but this here won't parse anymore:

{"user": "johndoe", "admin": false, "uid": 1000, "groups": ["users", "wheel", "audio", "video"]}

This are the warnings during compilation and the (invalid) result of the run:

holger@holger:~/jsmn$ gcc -Wall -Wextra -oa example/simple.c 
example/simple.c: In function ‘jsoneq’:
example/simple.c:16:50: warning: comparison of integer expressions of different signedness: ‘int’ and ‘jsmnint’ {aka ‘unsigned int’} [-Wsign-compare]
   16 |   if (tok->type == JSMN_STRING && (int)strlen(s) == tok->end - tok->start &&
      |                                                  ^~
example/simple.c: In function ‘main’:
example/simple.c:66:21: warning: comparison of integer expressions of different signedness: ‘int’ and ‘jsmnint’ {aka ‘unsigned int’} [-Wsign-compare]
   66 |       for (j = 0; j < t[i + 1].size; j++) {
      |                     ^
holger@holger:~/jsmn$ ./a
Failed to parse JSON: -3

I also tried then PRs 194-196 on top of master. And with that, the example/simple.c still runs.

themobiusproject · 2021-11-24T17:11:45Z

@holgerschurig I just added the json from the simple.c to my tests and it parses. You can check 0560339. When @pt300 merged my code, he grabbed it before 9f7e9de which did leave an error in parsing. That commit needs to be applied to the experimental branch and it should fix the problem.

As for the files in the example, I only ever touch explode.c because I wrote that one along with the jsmn_explodeJSON function associated with it. You can copy JSON_STRING from simple.c (or your unescaped version in your previous comment) into library.json in the root directory and run explode from the examples folder, it will parse and show more information than you will probably want to see.

pt300 · 2021-11-24T19:11:11Z

Sorry again for the inactivity.

This looks odd, as I remember the tests were passing. I will make sure to look into this tomorrow and put the patch in.

I also really wanted to simplify some parts of the parsing, such as reparsing some tokens from the beginning instead of trying to continue parsing. Should cut down complexity without major performance hit + make the code easier to maintain and more difficult to break. But, you can see how well it's going. Hope you can forgive me.

pt300 · 2021-11-25T23:55:04Z

I have applied the changes, but there seem to be still an issue with parsing. I will have to revisit it some other time.

themobiusproject added 2 commits June 2, 2020 05:04

bitwise checks on all things jsmntype_t

d6f6f1a

jsmnint_t typedef'd to allow easy changing of main int type jsmntype_t extended to allow more checks JSMN_STRICT now has simplified format checking via parser->expected JSMN_NEXT_SIBLING allow for easy parsing to a parents next sibling

Update README.md

14ecc27

themobiusproject changed the title ~~Bitwise checks~~ Bitwise checks, next siblings, doxygen compatible comments, and more Jun 2, 2020

themobiusproject added 13 commits June 2, 2020 07:01

README.md verbage (PR zserge#114)

d5d1a50

Order escapes and whitespace to match rfc8259.

025c4b3

Replace possibly confusing bitwise checks with inline functions.

8c715ec

Make strict mode default, add JSMN_PERMISSIVE for old default.

9895329

Extend the parser->expected functionality to PERMISSIVE mode.

5287a50

prepare for better primitive checks

b595be1

rfc 8259 primitive parsing

96d7221

After the parse, primitive tokens can be queried with JSMN_PRI_MINUS for negative numbers, JSMN_PRI_DECIMAL for fractions, and JSMN_PRI_EXPONENT for numbers with exponents. JSMN_PRI_LITERAL denotes 'true', 'false', or 'null'.

c89 compliance

3c00c19

This undoes my previous comment preference, but I would rather keep c89 compliance. The Makefile not also compiles with the c89 standard. Removed some flags that have no use in this branch.

Stronger parsing validation, break jsmn_parse apart

53e8d57

Validate closing containers even when tokens == NULL.

e213f2f

Pack jsmntype_t only when JSMN_SHORT_TOKENS is defined.

b5d5a90

c89 comments again, removed loop that validation takes care of.

51c8ce4

Adding 300+ validation checks from JSONTestSuite

0f792d7

Updated jsmn.h to pass all available tests. Modified testutil.h to better support jsmnint_t. Remove a warning when compiling the tests because invalid characters are in the file for testing.

themobiusproject mentioned this pull request Jun 10, 2020

Roadmap #159

Open

themobiusproject added 2 commits June 10, 2020 12:06

Add .gitignore

3135d95

themobiusproject changed the title ~~Bitwise checks, next siblings, doxygen compatible comments, and more~~ RFC 8259, bitwise checks, next siblings, doxygen comments, and more Jun 18, 2020

Final revisions.

fe07a15

Fix {"a":[1,2]} case

9f7e9de

dominickpastore mentioned this pull request Oct 19, 2020

Comply with RFC 8259 in strict mode #194

Open

Add simple.c test to tests/tests.c

0560339

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC 8259, bitwise checks, next siblings, doxygen comments, and more #197

RFC 8259, bitwise checks, next siblings, doxygen comments, and more #197

themobiusproject commented Jun 2, 2020 •

edited

Loading

atomsnc commented Aug 20, 2020

themobiusproject commented Sep 8, 2020

pt300 commented Oct 24, 2020

pt300 commented Oct 24, 2020

themobiusproject commented Oct 24, 2020

pt300 commented Oct 25, 2020

atomsnc commented Oct 25, 2020 •

edited

Loading

themobiusproject commented Oct 26, 2020

atomsnc commented Oct 26, 2020

pt300 commented Nov 4, 2020

holgerschurig commented Nov 14, 2021 •

edited

Loading

themobiusproject commented Nov 24, 2021 •

edited

Loading

pt300 commented Nov 24, 2021

pt300 commented Nov 25, 2021

RFC 8259, bitwise checks, next siblings, doxygen comments, and more #197

Are you sure you want to change the base?

RFC 8259, bitwise checks, next siblings, doxygen comments, and more #197

Conversation

themobiusproject commented Jun 2, 2020 • edited Loading

atomsnc commented Aug 20, 2020

themobiusproject commented Sep 8, 2020

pt300 commented Oct 24, 2020

pt300 commented Oct 24, 2020

themobiusproject commented Oct 24, 2020

pt300 commented Oct 25, 2020

atomsnc commented Oct 25, 2020 • edited Loading

themobiusproject commented Oct 26, 2020

atomsnc commented Oct 26, 2020

pt300 commented Nov 4, 2020

holgerschurig commented Nov 14, 2021 • edited Loading

themobiusproject commented Nov 24, 2021 • edited Loading

pt300 commented Nov 24, 2021

pt300 commented Nov 25, 2021

themobiusproject commented Jun 2, 2020 •

edited

Loading

atomsnc commented Oct 25, 2020 •

edited

Loading

holgerschurig commented Nov 14, 2021 •

edited

Loading

themobiusproject commented Nov 24, 2021 •

edited

Loading