language: add support for integers #181

i4ki · 2017-02-21T20:49:35Z

Integers

We propose changing the language spec to support an integer type in addition to string, list and function.

Why?

Mostly to increase code readability: today integer arithmetic is done with tools like expr, and integer comparisons with test. We want to change code like:

i <= expr $i "+" 1

-test $i -lt 2 

if $status == "0" {
    
}

To something like:

i = $i + 1 # or i++

if $i < 2 {
    
}

The change in the spec to add support for this will impact almost every non-terminal production rule.

Proposal

Below is the set of syntax snippets that would be valid if this were accepted:

Integer assignments

counter = 1
$counter++
$counter--
counter = 1+1
counter = 10 - 1
counter = $n + 1

Conditionals

if 0 < 1 {}
if $counter >= 10 {}
if $counter <= $maxValue {}
if $counter <= getMax() {}

Loops

for i = 0; $i < 10; $i++ {}
for i = 10; $i >= 0; $i-- {}
for ;$i < $max; {}
# and so on

Command execution should continue to work only with quoted strings and common arguments, unaffected by the addition of integers. This is mostly to avoid automatic type coercion, since the operating system only accepts strings as program arguments.

i = 1
echo $i+1	# results: runtime error: evaluated integer but command expects string

Before going into the proposed changes, some introduction to the nash syntax for command arguments is required, to address some ambiguities that we'll need to resolve to make this happen.

Nash specification

Currently there are 3 (three) types of literals in the syntax, but they correspond to only 2 (two) runtime types in the interpreter.
The syntax literals are: string, list and arg.

The string and arg literals are emitted by the scanner as the tokens token.String and token.Arg, respectively.

The literals and the tokens emitted by the lexer can be seen below:

"Dade Murphy"	# token.String
/etc/passwd	# token.Arg
bin/cmd		# token.Arg
/bin/cat	# token.Arg
search-engine		# token.Arg
("zero" "cool")	# token.LParen, token.String, token.String, token.RParen

But STRING and ARG literals are parsed into the same node, ast.StringExpr, and are then treated identically by the interpreter (they're both strings).

Arguments (Arg) are the program's arguments. They're unquoted unicode strings and can only be used there. Thus, the syntax below is valid:

cat /etc/passwd

but this is invalid:

if $path == /etc/passwd {
    # do something
}

The name token.Arg is not good because the lexer also emits it for command path names, not only for arguments. But I'll leave this for future discussion.

The definition of what an argument is to nash is a trade-off between script readability and command line (cli) ease of use.

In theory, the only universal rule for path names is that some ASCII character acts as the path separator (e.g. / on Linux/POSIX and \ on Windows). On Linux, for example, directory and file names can be encoded in any character set as long as they are terminated by a NUL byte (even \n and \t are valid in names). We therefore need to limit what nash understands as an argument; otherwise there would be no syntax left for the language itself.

In our EBNF the argument is weakly defined as:

arg  = ( unicode_char { unicode_char } )  .

But unfortunately this is wrong and only resembles the real implementation.

In code, a token.Arg is made of any non-blank unicode characters not in the list below:

$ { } ( ) [ ] > < " , ; |

and not starting with:

= + '!='

In other words, test+all is an argument, but +test+all is not. Using the rejected characters above in an argument requires quoting.

The current implementation emits a token.Plus only when '+' starts the word; when it appears in the middle of a word, it is emitted as part of a token.Arg. The same goes for '='.

~~The '+', '=' and '!' aren't in the rejected list because they're valid path names and common used in arguments, but avoided in the beginning of words because of ambiguities in the language (see #114).~~ After revisiting the language, apparently this is only true for '='.
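The rule described above can be sketched in Go as a small predicate. This is only an illustration of the description in this issue, not the actual nash lexer: the name isArg and both tables are hypothetical, and the rejected list is assumed to contain both parentheses.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// rejected holds the characters that (per the description above) can never
// appear in an unquoted argument. Assumption: both "(" and ")" are rejected.
const rejected = "${}()[]><\",;|"

// rejectedPrefixes holds the sequences an unquoted argument cannot start with.
var rejectedPrefixes = []string{"=", "+", "!="}

// isArg reports whether word could be lexed as one unquoted argument.
// It is a hypothetical helper, not a function of the nash code base.
func isArg(word string) bool {
	if word == "" {
		return false
	}
	for _, p := range rejectedPrefixes {
		if strings.HasPrefix(word, p) {
			return false
		}
	}
	for _, r := range word {
		if unicode.IsSpace(r) || strings.ContainsRune(rejected, r) {
			return false
		}
	}
	return true
}

func main() {
	for _, w := range []string{"test+all", "+test+all", "/etc/passwd", "a=b", "=a", "a|b"} {
		fmt.Printf("%-12s %v\n", w, isArg(w))
	}
}
```

Note that a=b passes while =a does not, which matches the statement that '=' is only avoided at the beginning of a word.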

Language changes

Variable declaration

The new EBNF for variable declaration must be (ignoring tuple assignment for simplicity):

/* Variable declaration */
varDecl      = assignValue | assignCmdOut .
assignValue  = identifier "=" varSpec .
varSpec      = ( listLit | valueExpr) .
valueExpr =  ( stringLit | intLit | variable) [ "+" valueExpr ] .
assignCmdOut = identifier "<=" ( command | fnInv ) .

stringLit   = "\"" { unicode_char | newline } "\"" .
intLit = unicode_digit { unicode_digit } .
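To make the intended semantics of valueExpr concrete, here is a minimal Go sketch of how an interpreter could evaluate a chain of operands joined by "+" without any type coercion. The Value type and the function names are assumptions made for illustration; they are not taken from the nash source.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a hypothetical runtime value: either a string or an integer.
type Value struct {
	IsInt bool
	Int   int
	Str   string
}

var errMismatch = errors.New("type mismatch: cannot add string and integer")

// add implements "+": sum for two integers, concatenation for two strings,
// and an error otherwise (no automatic coercion).
func add(a, b Value) (Value, error) {
	if a.IsInt != b.IsInt {
		return Value{}, errMismatch
	}
	if a.IsInt {
		return Value{IsInt: true, Int: a.Int + b.Int}, nil
	}
	return Value{Str: a.Str + b.Str}, nil
}

// evalValueExpr folds the operands of a valueExpr. The grammar is
// right-recursive, but sum and concatenation are associative, so a simple
// left-to-right fold gives the same result.
func evalValueExpr(operands []Value) (Value, error) {
	if len(operands) == 0 {
		return Value{}, errors.New("empty expression")
	}
	result := operands[0]
	for _, op := range operands[1:] {
		var err error
		result, err = add(result, op)
		if err != nil {
			return Value{}, err
		}
	}
	return result, nil
}

func main() {
	sum, _ := evalValueExpr([]Value{{IsInt: true, Int: 1}, {IsInt: true, Int: 1}})
	fmt.Println(sum.Int)

	_, err := evalValueExpr([]Value{{Str: "n="}, {IsInt: true, Int: 1}})
	fmt.Println(err)
}
```

Under this sketch, mixing a string and an integer fails at evaluation time, which is why conversion builtins become necessary.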

If

value = ( variable | string | integer | list | fnInv) .
comparison = ( "==" | "<" | ">" | "<=" | ">=" ) .
/* If-else-if */
ifDecl = "if" value comparison value "{" program "}"
         [ "else" "{" program "}" ]
         [ "else" ifDecl ] .
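As an illustration of the comparison rule, the Go sketch below applies each operator of the grammar above to two integers. The function name compareInts is hypothetical, and the sketch deliberately covers integers only; how strings and lists compare is not addressed here.

```go
package main

import "fmt"

// compareInts applies one of the comparison operators from the grammar to
// two integers. Hypothetical helper, not part of the nash code base.
func compareInts(a int, op string, b int) (bool, error) {
	switch op {
	case "==":
		return a == b, nil
	case "<":
		return a < b, nil
	case ">":
		return a > b, nil
	case "<=":
		return a <= b, nil
	case ">=":
		return a >= b, nil
	}
	return false, fmt.Errorf("unknown comparison operator %q", op)
}

func main() {
	counter, maxValue := 3, 10
	ok, err := compareInts(counter, "<=", maxValue)
	fmt.Println(ok, err)
}
```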

Increment / Decrement

inc = variable "+" "+" .
dec = variable "-" "-" .
incDec = (inc | dec) .

For loop

/* For loop */
forDecl = "for" [ forIn | forAdvanced ] "{" program "}" .
forIn     = identifier "in" ( list | variable | funcall) .
forAdvanced = [ assignValue ] ";" [ condition ] ";" [ valueExpr | incDec ] .

Commands

Regarding commands, the new "-" operator must be disambiguated in the lexer, mostly because the two lines below are equivalent:

counter = n - 1 # IDENT ASSIGN IDENT MINUS INT
counter=n-1 # IDENT ASSIGN IDENT MINUS INT

but the two lines below aren't:

fzf -q -1 # IDENT ARG ARG
fzf - q - 1 # IDENT MINUS IDENT MINUS INT

There are only two ways I can think of to fix the parser for that.
The first is to make the lexer emit the tokens individually, as it would for a common (non-shell) language, and then have the parser look at the line/column of each token in the source code to turn it into "-1" or "- 1" in the arguments, depending on the case. But this is hard to make clear in the specification/documentation.

The second way is to require quoting of '-' when it is next to numbers in commands. E.g.:

echo -1 # results: - 1  (see the space between - and 1)
echo "-1" # results: -1
echo 1 # results: 1
echo -test # results: -test
echo +1 # results: + 1 (see space)
chmod +x test.sh # works as expected

Here the lexer emits Plus and 1 for "+1", but emits a single Arg for "+x".
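A minimal Go sketch of this second approach follows, assuming the lexer decides per whitespace-delimited word. The function lexSigned and the token names as plain strings are made up for illustration. Words without a leading sign are out of scope and reported as Arg for simplicity, and quoted strings are not handled.

```go
package main

import (
	"fmt"
	"unicode"
)

// allDigits reports whether rs is non-empty and made only of digits.
func allDigits(rs []rune) bool {
	if len(rs) == 0 {
		return false
	}
	for _, r := range rs {
		if !unicode.IsDigit(r) {
			return false
		}
	}
	return true
}

// lexSigned returns the token names for one unquoted command word.
// A sign followed only by digits is split into an operator and an integer;
// anything else stays a single argument. Hypothetical helper.
func lexSigned(word string) []string {
	rs := []rune(word)
	if len(rs) >= 2 && allDigits(rs[1:]) {
		switch rs[0] {
		case '-':
			return []string{"Minus", "Int"}
		case '+':
			return []string{"Plus", "Int"}
		}
	}
	return []string{"Arg"}
}

func main() {
	for _, w := range []string{"-1", "+1", "-test", "+x", "-q"} {
		fmt.Println(w, lexSigned(w))
	}
}
```

With this rule, fzf -q -1 would no longer pass -1 as a single argument, which is the backward-incompatible part of the proposal.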

Apart from the problem above, almost everything must work as before for commands. But some inconsistencies require discussion.

The "+" (concat operator in strings) is allowed to be used in command arguments, but the "+" (the sum operator in integers) isn't. Why?

As said before, arguments are always passed as strings to the operating system. Thus, it doesn't make sense to support sums of integers in arguments if there's no automatic coercion of integers into strings. The same applies to the "-" operator.

Interpreter

For the runtime, integers will be stored as signed integers of at least 32 bits (internally, a Go int), with no support for changing that.
Nash will continue to be strongly typed, with no automatic type coercion allowed; because of that, some built-in support for converting integers to and from strings will be required.

Below are the two crucial builtins needed:

atoi(s)

Receives a string and returns an integer, or an error if the string is invalid.

itoa(i)

Receives an integer and returns the string representation of i.
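Since integers are meant to be a Go int internally, the two builtins could plausibly be thin wrappers over Go's strconv package. The sketch below shows only that core conversion; how the functions would be registered as nash builtins is not covered, and the wrapper names simply mirror the proposal.

```go
package main

import (
	"fmt"
	"strconv"
)

// atoi converts a decimal string to an int, returning an error for invalid
// input. It wraps strconv.Atoi from the Go standard library.
func atoi(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("atoi: invalid integer %q: %w", s, err)
	}
	return n, nil
}

// itoa returns the decimal string representation of i.
func itoa(i int) string {
	return strconv.Itoa(i)
}

func main() {
	n, err := atoi("41")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(itoa(n + 1)) // prints 42

	_, err = atoi("4x")
	fmt.Println(err != nil) // prints true
}
```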

Compatibility

This isn't backward compatible (see the Commands section above).

uisso · 2017-03-20T11:28:25Z

Well, it will become a welcomed resource to us.

i4ki · 2017-04-01T23:36:44Z

After talking with @katcipis, @vitorarins and @lborguetti, we decided not to allow expressions in arguments. The issues #200 and #199 were created to request other language changes that are requirements for adding integers.

I'll update this issue description soon.

katcipis · 2017-04-03T18:12:52Z

@tiago4orion it would be good to merge the new reference docs and already add the specs for integers on it :-)

https://github.com/NeowayLabs/nash/pull/162

i4ki assigned katcipis Mar 19, 2017

i4ki added the enhancement label Mar 19, 2017

i4ki assigned matheusvill, vitorarins and lborguetti Mar 19, 2017

i4ki mentioned this issue Aug 27, 2018

Arguments of expr sum are concatenated in nash #270

Closed

katcipis unassigned lborguetti Aug 1, 2019
