What does the spec say about parentheses? #20

motet-a · 2017-06-17T16:36:46Z

The SPDX license expression spec can be interpreted in many different and incompatible ways about the parentheses.

Actually, the new parser considers LGPL-2.1 AND MIT as a valid expression, but many people are saying that complex expressions must be encapsulated with parentheses, and thus LGPL-2.1 AND MIT is invalid. For example:

Any license expression that consists of more than one license identifier and/or LicenseRef should be encapsulated by parentheses.

There are similar parentheses in the examples of the npm doc.

I wrote the new parser mostly according to the formal grammar in the spec. And this grammar allows to omit those parentheses. Here is the proof:

LGPL-2.1 is a simple-expression
MIT is also a simple-expression
A simple-expression is also a compound-expression
LGPL-2.1 AND MIT is a compound-expression
A compound-expression is a license-expression

It looks right.

However, it gets strange just below the grammar:

For the Tag:value format, any license expression that consists of more than one license identifier and/or LicenseRef, should be encapsulated by parentheses: "( )".

(I'm not here to blame the spec but it looks really strange. Why require parentheses? Or why not require parentheses in anyway? It would be much simpler.)

See also this in the SPDX wiki.

So:

Is the SPDX license in npm's package.json file in Tag:value format?
Should the parser accept a tagValueFormat option?
Should this tagValueFormat option be true by default?
In examples, should we boycott useless parentheses or always use them?
Should we ask The Linux Foundation to clarify the spec?
Am I totally wrong?

Thanks.

The text was updated successfully, but these errors were encountered:

motet-a · 2017-06-17T20:34:09Z

I just found out this nice Python library which also does not require encapsulating complex expressions with parentheses. There are a lot of tests. Note it is quite relaxed (and not so spec-compliant) since the AND, OR, and WITH operators are case-insensitive.

kemitchell · 2017-06-25T18:38:10Z

@motet-a: Great stuff, as always.

You are right that the spec is unclear and incomplete.

We could definitely go to SPDX and suggest improvements. I've corresponded with the leading force behind the expression syntax in the past. It would be nice to bring a concrete suggestion with us if we do. Like (E)BNF.

My own personal preference would be to require parentheses. That means more parentheses, but also less variation. Since they nest, the grammar must support expressions in parentheses. Simple expressions without parentheses are the "extra feature".

Side Note: Don't trust the package.json doc too much. I wrote it!

motet-a · 2017-06-26T20:57:14Z

So, commit e9bea12 is a quick draft of this change. Here is the new,
non left-recursive, ABNF grammar used by the parser:

license-ref = [DOCUMENTREF ":"] LICENSEREF

license-plus = LICENSE["+"]

postfixed-license = (license-ref / license-plus) ["WITH" EXCEPTION]

parenthesized-expression = "(" expression ")"

atom = parenthesized-expression / postfixed-license

and-expression = atom ["AND" and-expression]

or-expression = and-expression ["OR" or-expression]

expression = or-expression

tag-value-format-license-expression = parenthesized-expression / 
                                      license-id / 
                                      license-ref

Note:

It still requires a little additional hack to forbid spaces before the +.
These rule names don’t always match parser functions and don't
always match original SPDX rules. I would have liked to match
spec names but it is not possible. Feel free to rename
everything.
The compound-expression rule in the spec grammar is
splitted into many rules in order to kill left-recursivity and
disambiguate operator precedence.
We don’t care about AND and OR associativity since a AND b
is the same as b AND a and a OR b is the same as b OR a (Well,
at least semantically. I hope no user relies on associativity).
The default “entrypoint” is tag-value-format-license-expression in order to
require parentheses with complex expressions, but if you use expression
instead you get the actual behaviour where parentheses are
optional.

I think we need some kind of configuration object, like the one introduced in #16. Maybe it could be non-breaking. Do you have any thoughts on this?

motet-a · 2017-06-26T21:31:02Z

Moreover, the original license-ref rule makes absolutely no sense:

license-ref = ["DocumentRef-"1*(idstring)":"]"LicenseRef-"1*(idstring)

LicenseRef- abc def ghi matches license-ref because 1*(idstring) means “one or more idstring” and we assume that tokens are separated with zero or more spaces.

If we assume there are no spaces between tokens, 1*(idstring) still makes no sense because it is exactly the same as idstring.

kemitchell · 2017-06-26T22:16:45Z

I think we need some kind of configuration object, like the one introduced in #16. Maybe it could be non-breaking. Do you have any thoughts on this?

I think that since the spec is broken, we should try to fix the spec. If we succeed, it will be clear what we should do in the software: Follow the spec! That will help our friends doing Python and other implementations, too.

@goneall, any chance the Java SPDX tooling has an SPDX expression parser we should know about?

goneall · 2017-06-26T22:33:11Z

As background and a partial explanation, the use of parenthesis for SPDX license expressions is somewhat confusing and has been heavily discussed in the SPDX tech work group.

The conclusion of the discussions was that the parenthesis is not formally required a complex license expression EXCEPT when a complex license expression is used inside an SPDX tag/value file. The reason for this is the parsing is ambiguous and unimplementable in a tag/value file without some what of delineating the start and end of the license expression (simple expressions use white space as the delimiter). We agreed to use parentheses to delineate the beginning and ending of a license expression.

@kestewart You may want to review the comments in this issue for spec improvements.

goneall · 2017-06-26T22:35:25Z

@kemitchell The SPDX Java tools license expression parser can be found at https://github.com/spdx/tools/blob/master/src/org/spdx/rdfparser/license/LicenseExpressionParser.java

kemitchell · 2017-06-26T23:03:21Z

@motet-a, looks like they handle Identifier+ versus Identifier + by postprocessing lexer output:
https://github.com/spdx/tools/blob/3cb54000b9e9c37eaada3cd185053e86cd0b8a06/src/org/spdx/rdfparser/license/LicenseExpressionParser.java#L84
https://github.com/spdx/tools/blob/3cb54000b9e9c37eaada3cd185053e86cd0b8a06/src/org/spdx/rdfparser/license/LicenseExpressionParser.java#L112

kemitchell · 2017-06-26T23:07:10Z

@goneall: Any chance the Java implementer worked up an alternative grammar before coding?

kemitchell · 2017-06-26T23:12:37Z

@goneall: Do I understand correctly that per the tech group compromise, tag/value files can still have Apache-2.0, as opposed to (Apache-2.0)?

goneall · 2017-06-27T01:08:14Z

@kemitchell I don't believe an alternative grammar was used - I believe it was hand coded from the spec.

goneall · 2017-06-27T01:09:45Z

Correct @goneall <https://github.com/goneall> : Do I understand correctly that per the tech group compromise, tag/value files can still have Apache-2.0, as opposed to (Apache-2.0)? —

motet-a · 2017-06-27T08:45:46Z

@kemitchell I wrote a few tokenizers for programming languages where one does not simply use String.prototype.split() for many reasons (literal strings are the biggest problem) but in this case, it suprisingly seems to work.

In our case, the "no spaces before +" assertion is also a single ifstatement.

kemitchell · 2017-06-27T16:57:23Z

@motet-a I like your solution a lot. If I were going to port to another language, I'd start with your code.

I'd still like to propose a spec revision that encodes all the relevant parser logic in the grammar. Any issue with treating GPL-2.0 and GPL-2.0+ as distinct token types, say LICENSE and LICENSE-PLUS? Then:

license-plus = LICENSE / LICENSE-PLUS

motet-a · 2017-06-27T20:39:30Z

@kemitchell This is exactly how most programming languages work. Most of them have two grammars:

The lexical grammar describing how to build the tokens from a source string
The parser grammar describing how to build a tree from the tokens

I think the SPDX spec could be much clearer with such a separation.

Moreover, I think the parser grammar in the spec should specify the operator precedence. The actual one is ambiguous:

compound-expression =  simple-expression /
        simple-expression "WITH" license-exception-id /
        compound-expression "AND" compound-expression /
        compound-expression "OR" compound-expression /
        "(" compound-expression ")"

According to this, a AND b OR c could perfectly be parsed as (a AND b) OR c or a AND (b OR c). I know the operator precedence is specified in English in the spec, but I think it is clearer to specify it in the grammar as I did.

@kemitchell In the formal parser grammar, one could write "+" instead of LICENSE-PLUS if the lexical grammar is specified. It is clear that "+" implicitly means “token +” since any terminal is a token. For example, the Python grammar is written this way (since the lexical grammar is obviously separated).

motet-a · 2017-06-27T20:43:36Z

@kemitchell license-plus = LICENSE / LICENSE-PLUS looks wrong to me. Do you mean license-plus = LICENSE [LICENSE-PLUS]?

kemitchell · 2017-06-27T20:59:54Z

@motet-a: Forgive me. I was not clear.

I was proposing to emit different token types for GPL-2.0 and GPL-2.0+. GPL-2.0 would continue to be LICENSE. GPL-2.0+ (including +) would be token type LICENSE-PLUS. So:

license-plus =
    LICENSE      # e.g. `Apache-2.0`
  / LICENSE-PLUS # e.g. `Apache-2.0+`

Then address the difference in the lexical grammar.

By the way, any chance you typed out a lexical grammar for the recent patch to spdx-expression-parse that you could share? If we could start from there and refine to a full, formal spec, I'd take it to the SPDX tech group list immediately.

Thanks for your patience with me. It's been many years since I did any serious parser work. Pulled out my copy of Grune and Jacobs again, and realized suddenly how much I've forgotten. ☹️

motet-a · 2017-06-27T21:31:47Z

This single LICENSE-PLUS token type makes a lot of sense. Looks better to me than the current two separate tokens.

I think it is better to finish to tweak spdx-expression-parse in order to have a working prototype which complies to a potential future spec before sending any proposal. It is not perfect yet and we could write many more tests.

PS: Well, I have to learn English. My patience is also beneficial to me :-)

kemitchell · 2017-07-04T19:28:18Z

@motet-a: Agreed. Let's go ahead and release.

It would be nice to add the grammars we have implemented---parser and lexer---to README, with a note that they differ from the unclear specification in the standard.

motet-a · 2017-07-04T20:39:12Z

Oh, by the way, I wrote my thoughts in this gist.

motet-a · 2017-07-05T16:44:37Z

There is a related bug on the SPDX bugtracker.

kemitchell · 2017-07-06T06:03:39Z

@kestewart, @motet-a has done yeoman's work in his gist.

Run by Bill and Mark?

graingert · 2017-08-11T11:35:33Z

@motet-a the Tag:value format is specifically for RDFa. Not that RDFa can't handle missing brackets.

motet-a mentioned this issue Jun 17, 2017

Release 2.0.2 #19

Closed

motet-a changed the title ~~It seems no one really knows the SPDX spec~~ What does the spec says about parentheses? Jun 17, 2017

motet-a changed the title ~~What does the spec says about parentheses?~~ What does the spec say about parentheses? Jun 17, 2017

c-okelly mentioned this issue Aug 11, 2017

please correct your license to ne a vaild SPDX expression jrburke/amdefine#33

Closed

kemitchell mentioned this issue Aug 27, 2017

Add relaxed option to allow invalid licenses and exception names #16

Open

wking mentioned this issue Oct 12, 2017

appendix-IV: Refactor the license expression appendix spdx/spdx-spec#37

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What does the spec say about parentheses? #20

What does the spec say about parentheses? #20

motet-a commented Jun 17, 2017 •

edited

Loading

motet-a commented Jun 17, 2017

kemitchell commented Jun 25, 2017

motet-a commented Jun 26, 2017 •

edited

Loading

motet-a commented Jun 26, 2017

kemitchell commented Jun 26, 2017

goneall commented Jun 26, 2017

goneall commented Jun 26, 2017

kemitchell commented Jun 26, 2017

kemitchell commented Jun 26, 2017

kemitchell commented Jun 26, 2017

goneall commented Jun 27, 2017

goneall commented Jun 27, 2017 via email •

edited

Loading

motet-a commented Jun 27, 2017 •

edited

Loading

kemitchell commented Jun 27, 2017

motet-a commented Jun 27, 2017

motet-a commented Jun 27, 2017

kemitchell commented Jun 27, 2017

motet-a commented Jun 27, 2017

kemitchell commented Jul 4, 2017

motet-a commented Jul 4, 2017

motet-a commented Jul 5, 2017

kemitchell commented Jul 6, 2017

graingert commented Aug 11, 2017

What does the spec say about parentheses? #20

What does the spec say about parentheses? #20

Comments

motet-a commented Jun 17, 2017 • edited Loading

motet-a commented Jun 17, 2017

kemitchell commented Jun 25, 2017

motet-a commented Jun 26, 2017 • edited Loading

motet-a commented Jun 26, 2017

kemitchell commented Jun 26, 2017

goneall commented Jun 26, 2017

goneall commented Jun 26, 2017

kemitchell commented Jun 26, 2017

kemitchell commented Jun 26, 2017

kemitchell commented Jun 26, 2017

goneall commented Jun 27, 2017

goneall commented Jun 27, 2017 via email • edited Loading

motet-a commented Jun 27, 2017 • edited Loading

kemitchell commented Jun 27, 2017

motet-a commented Jun 27, 2017

motet-a commented Jun 27, 2017

kemitchell commented Jun 27, 2017

motet-a commented Jun 27, 2017

kemitchell commented Jul 4, 2017

motet-a commented Jul 4, 2017

motet-a commented Jul 5, 2017

kemitchell commented Jul 6, 2017

graingert commented Aug 11, 2017

motet-a commented Jun 17, 2017 •

edited

Loading

motet-a commented Jun 26, 2017 •

edited

Loading

goneall commented Jun 27, 2017 via email •

edited

Loading

motet-a commented Jun 27, 2017 •

edited

Loading