Integers
We propose changing the language specification to support an integer type, in addition to strings, lists and functions.
Why?
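Mostly to increase code readability. Today, integer arithmetic is done with external tools like `expr`, and integer comparisons with `test`. We want to replace such code with something like:
```
for i = 0; $i < 10; $i++ {}
for i = 10; $i >= 0; $i-- {}
for ; $i < $max; {}
# and so on
```
Supporting this will affect almost every non-terminal production rule in the spec.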
Command execution should keep working only with quoted strings and plain arguments, unaffected by the addition of integers. This is mostly to avoid automatic type coercion, because the operating system only accepts strings as program arguments.
```
i = 1
echo $i+1 # results: runtime error: evaluated integer but command expects string
```
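Here is a minimal sketch of that rule. It is not nash's actual interpreter code. The Go snippet below assumes a hypothetical `Value` type that holds either a string or an int. It shows the check that would turn an evaluated integer argument into the runtime error above, instead of silently converting it:

```go
package main

import (
	"errors"
	"fmt"
)

// Kind is a hypothetical tag for the runtime types relevant here.
type Kind int

const (
	KindString Kind = iota
	KindInt
)

// Value is a hypothetical runtime value: either a string or an int.
type Value struct {
	Kind Kind
	Str  string
	Int  int
}

// ErrIntArg mirrors the runtime error in the example above.
var ErrIntArg = errors.New("evaluated integer but command expects string")

// commandArgs converts evaluated arguments into the []string handed to
// the operating system, refusing integers instead of coercing them.
func commandArgs(vals []Value) ([]string, error) {
	args := make([]string, 0, len(vals))
	for _, v := range vals {
		if v.Kind != KindString {
			return nil, ErrIntArg
		}
		args = append(args, v.Str)
	}
	return args, nil
}

func main() {
	i := Value{Kind: KindInt, Int: 1}
	if _, err := commandArgs([]Value{i}); err != nil {
		fmt.Println("runtime error:", err)
	}

	args, err := commandArgs([]Value{{Kind: KindString, Str: "/etc/passwd"}})
	fmt.Println(args, err)
}
```

Refusing the value rather than converting it keeps every string/integer conversion explicit. Explicit conversion is the job of the builtins described in the Interpreter section.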
Before going into the proposed changes, some background on nash's syntax for command arguments is needed. That syntax has ambiguities we will have to resolve to make this work.
Nash specification
Currently there are 3 (three) types of literals in the syntax, but they correspond to only 2 (two) runtime types in the interpreter.
The syntax literals are: string, list and arg.
The scanner emits the string and arg literals as the tokens token.String and token.Arg, respectively.
The literals and the tokens emitted by the lexer can be seen below:
However, STRING and ARG literals are parsed into the same node, ast.StringExpr, and the interpreter then treats them identically (both are strings).
Arguments (Arg) are a program's arguments: unquoted Unicode strings that can only be used in that position. So the syntax below is valid:
```
cat /etc/passwd
```
but this is invalid:
```
if $path == /etc/passwd {
    # do something
}
```
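The Go sketch below illustrates that distinction. It uses hypothetical types that only mimic `token.String`, `token.Arg` and `ast.StringExpr`, and it is not nash's parser. Both literal kinds produce the same string node in command position. A bare argument is rejected where an expression operand is expected, as in the `if` example.

```go
package main

import "fmt"

// TokenKind mimics the two lexer tokens discussed above (hypothetical).
type TokenKind int

const (
	String TokenKind = iota // quoted literal, e.g. "/etc/passwd"
	Arg                     // unquoted word, e.g. /etc/passwd
)

// Token is a hypothetical lexer token.
type Token struct {
	Kind TokenKind
	Val  string
}

// StringExpr stands in for ast.StringExpr: both literal kinds end up here.
type StringExpr struct{ Value string }

// parseCommandArg accepts both kinds: in command position a quoted
// string and a bare argument mean the same thing.
func parseCommandArg(t Token) StringExpr {
	return StringExpr{Value: t.Val}
}

// parseOperand is used inside expressions such as `if $path == ...`;
// bare arguments are rejected there, so only quoted strings pass.
func parseOperand(t Token) (StringExpr, error) {
	if t.Kind == Arg {
		return StringExpr{}, fmt.Errorf("unexpected argument %q in expression: quote it", t.Val)
	}
	return StringExpr{Value: t.Val}, nil
}

func main() {
	bare := Token{Kind: Arg, Val: "/etc/passwd"}
	quoted := Token{Kind: String, Val: "/etc/passwd"}

	// Same node for both forms in a command.
	fmt.Println(parseCommandArg(bare) == parseCommandArg(quoted))

	// Only the quoted form is accepted as an expression operand.
	if _, err := parseOperand(bare); err != nil {
		fmt.Println("error:", err)
	}
}
```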
The name token.Arg is misleading: the lexer also emits it for command path names, not only for arguments. I'll leave that for a future discussion.
What nash considers an argument is a trade-off between script readability and command-line (CLI) ease of use.
In theory, the only global rule for path names is an ASCII path separator (e.g. / on Linux/POSIX and \ on Windows). On Linux, for example, directory and file names can use any encoding and contain any byte except the separator and the terminating NUL byte, so \n and \t are valid. We therefore need to limit what nash understands as an argument. Otherwise there would be no syntax left for anything else.
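To make the Linux rule concrete, here is a small Go predicate for a single path component. It is a sketch and deliberately ignores filesystem-specific limits such as the maximum name length. Names containing `\n` or `\t` pass, which is why nash cannot treat every legal file name as an unquoted argument:

```go
package main

import (
	"fmt"
	"strings"
)

// validLinuxName reports whether name can be a single file or directory
// name on Linux: a non-empty byte sequence without '/' (the path
// separator) or NUL (the C string terminator). "." and ".." always refer
// to the current and parent directory, so they are excluded as well.
// Length limits such as NAME_MAX are not checked in this sketch.
func validLinuxName(name string) bool {
	if name == "" || name == "." || name == ".." {
		return false
	}
	return !strings.ContainsAny(name, "/\x00")
}

func main() {
	for _, n := range []string{"passwd", "has\nnewline", "tab\tname", "a/b", "nul\x00byte"} {
		fmt.Printf("%q -> %v\n", n, validLinuxName(n))
	}
}
```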
Our EBNF defines the argument only weakly, and unfortunately that definition is wrong: it only resembles the real implementation.
In code, a token.Arg is a run of non-blank Unicode characters, none of which is in the list below:
```
$ { } ( ) [ ] > < " , ; |
```
and which does not start with `=`, `+` or `!=`.
In other words, test+all is an argument, but +test+all is not. Using any of the rejected characters above in an argument requires quoting.
The current implementation emits a token.Plus only when '+' starts a word. When '+' appears in the middle of a word, it is emitted as part of a token.Arg. The same holds for '='.
'+', '=' and '!' aren't in the rejected list because they are valid in path names and commonly used in arguments. They are disallowed at the beginning of words because of ambiguities in the language (see #114). After revisiting the language, this restriction appears to be needed only for '='.
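The rules above can be summarized as a predicate. This Go sketch uses a hypothetical `isArg` function and treats "blank" as any Unicode space. It approximates the described behaviour and is not taken from nash's lexer:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// rejected holds the characters that cannot appear in an unquoted argument.
const rejected = "${}()[]><\",;|"

// isArg reports whether word would be lexed as a single token.Arg under
// the rules described above: no blanks, no rejected characters, and not
// starting with "=", "+" or "!=". Illustrative sketch only.
func isArg(word string) bool {
	if word == "" {
		return false
	}
	if strings.HasPrefix(word, "=") || strings.HasPrefix(word, "+") || strings.HasPrefix(word, "!=") {
		return false
	}
	for _, r := range word {
		if unicode.IsSpace(r) || strings.ContainsRune(rejected, r) {
			return false
		}
	}
	return true
}

func main() {
	for _, w := range []string{"test+all", "+test+all", "a=b", "=b", "!x", "!=x", "$HOME"} {
		fmt.Printf("%-10s %v\n", w, isArg(w))
	}
}
```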
Language changes
Variable declaration
The new EBNF for variable declarations must be (ignoring tuple assignment for simplicity):
Commands
Regarding commands, the new "-" operator must be disambiguated in the lexer. This is mostly because the two statements below are equivalent:
```
counter = n - 1 # IDENT ASSIGN IDENT MINUS INT
counter=n-1     # IDENT ASSIGN IDENT MINUS INT
```
but the two below aren't:
```
fzf -q -1   # IDENT ARG ARG
fzf - q - 1 # IDENT MINUS IDENT MINUS INT
```
There are only two ways I can think of to fix the parser for this.
The first is to make the lexer emit tokens individually, as it would for a conventional (non-shell) language. The parser would then look at each token's line/column in the source to turn it into "-1" or "- 1" in the arguments, depending on the case. However, this is hard to explain clearly in the specification and documentation.
The second is to require quoting '-' when it is adjacent to a number in commands. For example:
```
echo -1          # results: - 1 (note the space between - and 1)
echo "-1"        # results: -1
echo 1           # results: 1
echo -test       # results: -test
echo +1          # results: + 1 (note the space)
chmod +x test.sh # works as expected
```
That is, the lexer would emit Plus followed by the integer 1 for "+1", but a single Arg for "+x".
Apart from the problem above, almost everything must work as before for commands, but some inconsistencies need discussion.
The "+" string concatenation operator is allowed in command arguments, but the "+" integer sum operator isn't. Why?
As said before, arguments are always passed to the operating system as strings. It therefore makes no sense to support integer sums in arguments when there is no automatic coercion of integers into strings. The same applies to the "-" operator.
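Here is a minimal Go sketch of that rule, again assuming a hypothetical `Value` type. Inside an argument, `+` concatenates two strings and refuses integer operands, since there is no implicit conversion to produce the string the OS needs.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a hypothetical runtime value: Str is used when IsInt is
// false, Int when IsInt is true.
type Value struct {
	IsInt bool
	Str   string
	Int   int
}

// concatArg evaluates `a+b` inside a command argument. Only string
// concatenation is allowed there; integer operands are refused because
// the result would need an implicit int-to-string coercion before it
// could be passed to the OS.
func concatArg(a, b Value) (string, error) {
	if a.IsInt || b.IsInt {
		return "", errors.New("'+' in arguments only concatenates strings")
	}
	return a.Str + b.Str, nil
}

func main() {
	s, err := concatArg(Value{Str: "test"}, Value{Str: "all"})
	fmt.Println(s, err)

	_, err = concatArg(Value{IsInt: true, Int: 1}, Value{IsInt: true, Int: 1})
	fmt.Println(err)
}
```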
Interpreter
At runtime, integers will be stored as signed integers at least 32 bits in size (a Go int internally), with no way to change that.
Nash will remain strongly typed, with no automatic type coercion. Because of that, built-in support for converting integers to and from strings will be required.
Below are the two crucial builtins needed:
atoi(s)
Receives a string and returns an integer, or an error if the string is invalid.
itoa(i)
Receives an integer and returns the string representation of i.
Compatibility
This isn't backward compatible (see the discussion of commands above).
After talking with @katcipis, @vitorarins and @lborguetti, we decided not to allow expressions in arguments. Issues #200 and #199 were created to track other language changes that are prerequisites for adding integers.