-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What does the spec say about parentheses? #20
Comments
I just found out this nice Python library which also does not require encapsulating complex expressions with parentheses. There are a lot of tests. Note it is quite relaxed (and not so spec-compliant) since the |
@motet-a: Great stuff, as always. You are right that the spec is unclear and incomplete. We could definitely go to SPDX and suggest improvements. I've corresponded with the leading force behind the expression syntax in the past. It would be nice to bring a concrete suggestion with us if we do. Like (E)BNF. My own personal preference would be to require parentheses. That means more parentheses, but also less variation. Since they nest, the grammar must support expressions in parentheses. Simple expressions without parentheses are the "extra feature". Side Note: Don't trust the |
So, commit e9bea12 is a quick draft of this change. Here is the new, license-ref = [DOCUMENTREF ":"] LICENSEREF
license-plus = LICENSE["+"]
postfixed-license = (license-ref / license-plus) ["WITH" EXCEPTION]
parenthesized-expression = "(" expression ")"
atom = parenthesized-expression / postfixed-license
and-expression = atom ["AND" and-expression]
or-expression = and-expression ["OR" or-expression]
expression = or-expression
tag-value-format-license-expression = parenthesized-expression /
license-id /
license-ref Note:
I think we need some kind of configuration object, like the one introduced in #16. Maybe it could be non-breaking. Do you have any thoughts on this? |
Moreover, the original license-ref = ["DocumentRef-"1*(idstring)":"]"LicenseRef-"1*(idstring)
If we assume there are no spaces between tokens, |
I think that since the spec is broken, we should try to fix the spec. If we succeed, it will be clear what we should do in the software: Follow the spec! That will help our friends doing Python and other implementations, too. @goneall, any chance the Java SPDX tooling has an SPDX expression parser we should know about? |
As background and a partial explanation, the use of parenthesis for SPDX license expressions is somewhat confusing and has been heavily discussed in the SPDX tech work group. The conclusion of the discussions was that the parenthesis is not formally required a complex license expression EXCEPT when a complex license expression is used inside an SPDX tag/value file. The reason for this is the parsing is ambiguous and unimplementable in a tag/value file without some what of delineating the start and end of the license expression (simple expressions use white space as the delimiter). We agreed to use parentheses to delineate the beginning and ending of a license expression. @kestewart You may want to review the comments in this issue for spec improvements. |
@kemitchell The SPDX Java tools license expression parser can be found at https://github.com/spdx/tools/blob/master/src/org/spdx/rdfparser/license/LicenseExpressionParser.java |
@motet-a, looks like they handle |
@goneall: Any chance the Java implementer worked up an alternative grammar before coding? |
@goneall: Do I understand correctly that per the tech group compromise, tag/value files can still have |
@kemitchell I don't believe an alternative grammar was used - I believe it was hand coded from the spec. |
Correct
@goneall <https://github.com/goneall> : Do I understand correctly that per the tech group compromise, tag/value files can still have Apache-2.0, as opposed to (Apache-2.0)?
—
|
@kemitchell I wrote a few tokenizers for programming languages where one does not simply use String.prototype.split() for many reasons (literal strings are the biggest problem) but in this case, it suprisingly seems to work. In our case, the "no spaces before +" assertion is also a single |
@motet-a I like your solution a lot. If I were going to port to another language, I'd start with your code. I'd still like to propose a spec revision that encodes all the relevant parser logic in the grammar. Any issue with treating
|
@kemitchell This is exactly how most programming languages work. Most of them have two grammars:
I think the SPDX spec could be much clearer with such a separation. Moreover, I think the parser grammar in the spec should specify the operator precedence. The actual one is ambiguous: compound-expression = simple-expression /
simple-expression "WITH" license-exception-id /
compound-expression "AND" compound-expression /
compound-expression "OR" compound-expression /
"(" compound-expression ")" According to this, @kemitchell In the formal parser grammar, one could write |
@kemitchell |
@motet-a: Forgive me. I was not clear. I was proposing to emit different token types for
Then address the difference in the lexical grammar. By the way, any chance you typed out a lexical grammar for the recent patch to Thanks for your patience with me. It's been many years since I did any serious parser work. Pulled out my copy of Grune and Jacobs again, and realized suddenly how much I've forgotten. |
This single I think it is better to finish to tweak PS: Well, I have to learn English. My patience is also beneficial to me :-) |
@motet-a: Agreed. Let's go ahead and release. It would be nice to add the grammars we have implemented---parser and lexer---to README, with a note that they differ from the unclear specification in the standard. |
Oh, by the way, I wrote my thoughts in this gist. |
There is a related bug on the SPDX bugtracker. |
@kestewart, @motet-a has done yeoman's work in his gist. Run by Bill and Mark? |
@motet-a the Tag:value format is specifically for RDFa. Not that RDFa can't handle missing brackets. |
The SPDX license expression spec can be interpreted in many different and incompatible ways about the parentheses.
Actually, the new parser considers
LGPL-2.1 AND MIT
as a valid expression, but many people are saying that complex expressions must be encapsulated with parentheses, and thusLGPL-2.1 AND MIT
is invalid. For example:There are similar parentheses in the examples of the npm doc.
I wrote the new parser mostly according to the formal grammar in the spec. And this grammar allows to omit those parentheses. Here is the proof:
LGPL-2.1
is asimple-expression
MIT
is also asimple-expression
simple-expression
is also acompound-expression
LGPL-2.1 AND MIT
is acompound-expression
compound-expression
is alicense-expression
It looks right.
However, it gets strange just below the grammar:
(I'm not here to blame the spec but it looks really strange. Why require parentheses? Or why not require parentheses in anyway? It would be much simpler.)
See also this in the SPDX wiki.
So:
package.json
file in Tag:value format?tagValueFormat
option?tagValueFormat
option betrue
by default?Thanks.
The text was updated successfully, but these errors were encountered: