language: add support for integers #181

Open
i4ki opened this issue Feb 21, 2017 · 3 comments
@i4ki
Collaborator

i4ki commented Feb 21, 2017

Integers

We propose changing the language spec to support an integer type in addition to string, list and function.

Why?

Mostly to improve code readability: today integer arithmetic is done with tools like expr, and integer comparisons with test. We want to change code like:

i <= expr $i "+" 1

-test $i -lt 2 

if $status == "0" {
    
}

To something like:

i = $i + 1 # or $i++

if $i < 2 {
    
}

The change in the spec to add support for this will impact almost every non-terminal production rule.

Proposal

Below is the set of syntax snippets that would be valid if this were accepted:

Integer assignments

counter = 1
$counter++
$counter--
counter = 1+1
counter = 10 - 1
counter = $n + 1

Conditionals

if 0 < 1 {}
if $counter >= 10 {}
if $counter <= $maxValue {}
if $counter <= getMax() {}

Loops:

for i = 0; $i < 10; $i++ {}
for i = 10; $i >= 0; $i-- {}
for ;$i < $max; {}
# and so on

Command execution should continue to work only with quoted strings and common arguments, and should not be affected by the addition of integers. This is mostly to avoid automatic type coercion, since the operating system only accepts strings as program arguments.

i = 1
echo $i+1	# results: runtime error: evaluated integer but command expects string

Before going into the proposed changes, some introduction to the nash syntax for command arguments is required, because it has ambiguities that we'll need to resolve to make this happen.

Nash specification

Currently there are 3 (three) types of literals in the syntax, but they correspond to only 2 (two) runtime types in the interpreter.
The syntax literals are: string, list and arg.

The string and arg literals are emitted as the tokens token.String and token.Arg, respectively, by the scanner.

The literals and the tokens emitted by the lexer can be seen below:

"Dade Murphy"	# token.String
/etc/passwd	# token.Arg
bin/cmd		# token.Arg
/bin/cat	# token.Arg
search-engine		# token.Arg
("zero" "cool")	# token.LParen, token.String, token.String, token.RParen

But STRING and ARG literals are parsed into the same node, ast.StringExpr, and are then treated identically by the interpreter (they're both strings).

Arguments (Arg) are the program's arguments. They're unquoted unicode strings and can only be used in that position. So the syntax below is valid:

cat /etc/passwd

but this is invalid:

if $path == /etc/passwd {
    # do something
}

The name token.Arg is not good because the lexer also emits it for command path names, not only for arguments. But I'll leave this for future discussion.

The definition of what's an argument to nash is a trade off between script readability and command line (cli) ease of use.

In theory, the only universal rule for path names is that some ASCII character acts as the path separator (e.g. / on Linux/POSIX and \ on Windows). For example, on Linux directory and file names can use any character set as long as they are terminated by a NUL byte (even \n and \t are valid in names). We therefore need to restrict what nash understands as an argument, because otherwise there would be no syntax left for the language itself...

In our EBNF the argument is weakly defined as:

arg  = ( unicode_char { unicode_char } )  .

But this is unfortunately wrong and only resembles the real implementation.

In code, a token.Arg is any sequence of non-blank unicode characters that contains none of the characters in the list below:

$ { } ( ) [ ] > < " , ; |

and not starting with:

= + '!=' 

In other words, test+all is an argument, but +test+all is not. Using any of the rejected characters above in an argument requires quoting.

The current implementation emits a token.Plus only when '+' starts the word; when '+' appears in the middle of a word it is emitted as part of a token.Arg. The same goes for '='.

The '+', '=' and '!' aren't in the rejected list because they're valid in path names and commonly used in arguments, but they are disallowed at the beginning of words because of ambiguities in the language (see #114). After revisiting the language, apparently this is only true for '='.
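The argument rule described above can be sketched in Go roughly as follows. This is only an illustration of the rule as stated in this issue, not the actual nash scanner: the function name isArg and the constant rejected are hypothetical, and the closing parenthesis in the rejected set is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// rejected holds the characters that may not appear anywhere in an
// unquoted argument. The ')' entry is an assumption made for this sketch.
const rejected = "${}()[]><\",;|"

// isArg reports whether word would be treated as a single argument under
// the rule described in the text. Hypothetical helper, not nash code.
func isArg(word string) bool {
	if word == "" {
		return false
	}
	// Blank characters always terminate a word.
	if strings.ContainsAny(word, " \t\r\n") {
		return false
	}
	// Rejected characters are not allowed at any position.
	if strings.ContainsAny(word, rejected) {
		return false
	}
	// '=', '+' and "!=" are only rejected at the start of the word.
	for _, prefix := range []string{"=", "+", "!="} {
		if strings.HasPrefix(word, prefix) {
			return false
		}
	}
	return true
}

func main() {
	for _, w := range []string{"test+all", "+test+all", "/etc/passwd", "a|b"} {
		fmt.Printf("%q arg=%v\n", w, isArg(w))
	}
}
```

Note that "a=b" is accepted by this check, which matches the statement that '=' is only a problem at the beginning of a word.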

Language changes

Variable declaration

The new EBNF for variable declaration must be (ignoring tuple assignment for simplicity):

/* Variable declaration */
varDecl      = assignValue | assignCmdOut .
assignValue  = identifier "=" varSpec .
varSpec      = ( listLit | valueExpr ) .
valueExpr    = ( stringLit | intLit | variable ) [ "+" valueExpr ] .
assignCmdOut = identifier "<=" ( command | fnInv ) .

stringLit    = "\"" { unicode_char | newline } "\"" .
intLit       = unicode_digit { unicode_digit } .

If

value = ( variable | string | integer | list | fnInv) .
comparison = ( "==" | "<" | ">" | "<=" | ">=" ) .
/* If-else-if */
ifDecl = "if" value comparison value "{" program "}"
         [ "else" "{" program "}" ]
         [ "else" ifDecl ] .

Increment / Decrement

inc = variable "+" "+" .
dec = variable "-" "-" .
incDec = (inc | dec) .

For loop

/* For loop */
forDecl = "for" [ forIn | forAdvanced ] "{" program "}" .
forIn     = identifier "in" ( list | variable | funcall) .
forAdvanced = [ assignValue ] ";" [ condition ] ";" [ valueExpr | incDec ] .

Commands

Regarding commands, the new "-" operator must be disambiguated in the lexer, mostly because the two lines below are equivalent:

counter = n - 1 # IDENT ASSIGN IDENT MINUS INT
counter=n-1 # IDENT ASSIGN IDENT MINUS INT

but the two lines below aren't:

fzf -q -1 # IDENT ARG ARG
fzf - q - 1 # IDENT MINUS IDENT MINUS INT

There are only two ways I can think of to fix the parser for that.
The first is to make the lexer emit the tokens individually, as it would for a common (non-shell) language, and then have the parser look at the line/column of each token in the source code to turn them into "-1" or "- 1" in the arguments, depending on the case. This is hard to make clear in the specification/documentation, though.
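A minimal Go sketch of that first option, assuming the lexer records a starting column for each token: tokens that touch in the source are glued back into one argument, and tokens separated by blanks stay apart. The tok type and joinAdjacent function are hypothetical names for illustration, and columns are assumed to be byte offsets.

```go
package main

import "fmt"

// tok is a hypothetical token carrying its text and its starting column.
type tok struct {
	text string
	col  int // 0-based byte column of the first character
}

// joinAdjacent rebuilds command arguments from individually lexed tokens.
// A token that starts exactly where the previous one ended is appended to
// the previous argument; otherwise it starts a new argument.
func joinAdjacent(toks []tok) []string {
	var args []string
	end := -1 // column just past the previous token
	for _, t := range toks {
		if len(args) > 0 && t.col == end {
			args[len(args)-1] += t.text
		} else {
			args = append(args, t.text)
		}
		end = t.col + len(t.text)
	}
	return args
}

func main() {
	// Tokens after the command name in: fzf -q -1
	tight := []tok{{"-", 4}, {"q", 5}, {"-", 7}, {"1", 8}}
	// Tokens after the command name in: fzf - q - 1
	loose := []tok{{"-", 4}, {"q", 6}, {"-", 8}, {"1", 10}}

	fmt.Println(joinAdjacent(tight))
	fmt.Println(joinAdjacent(loose))
}
```

The sketch shows why this option is awkward to specify: the meaning of a token sequence depends on whitespace that the token stream itself no longer carries.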

The second way is to require quoting of '-' when it is next to numbers in commands. E.g.:

echo -1 # results: - 1  (see the space between - and 1)
echo "-1" # results: -1
echo 1 # results: 1
echo -test # results: -test
echo +1 # results: + 1 (see space)
chmod +x test.sh # works as expected

That is, the lexer emits Plus followed by 1 for "+1", but emits Arg for "+x".

Apart from the problem above, almost everything must work as before for commands. But some inconsistencies require discussion.

The "+" (concat operator in strings) is allowed to be used in command arguments, but the "+" (the sum operator in integers) isn't. Why?

As said before, arguments are always passed as strings to the operating system. Thus, it doesn't make sense to support integer sums in arguments if there's no automatic coercion of integers into strings. The same applies to the "-" operator.

Interpreter

For the runtime, integers will be stored as signed integers at least 32 bits in size (it will be a Go int type internally), with no support for changing that.
Nash will continue to be strongly typed, no automatic type coercion allowed, and because of that some built-in support for integer conversion to/from strings will be required.
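To illustrate what "no automatic type coercion" could mean for the "+" operator, here is a minimal Go sketch. The Value type, the add function and the error message are all hypothetical and are not taken from the nash interpreter.

```go
package main

import (
	"errors"
	"fmt"
)

// Value is a hypothetical runtime value: a Go string or a Go int.
type Value interface{}

// errTypeMismatch is returned when the operands of "+" differ in type.
var errTypeMismatch = errors.New("type mismatch: no automatic coercion between int and string")

// add implements "+" for two operands of the same type:
// concatenation for strings and sum for integers.
func add(a, b Value) (Value, error) {
	switch x := a.(type) {
	case string:
		if y, ok := b.(string); ok {
			return x + y, nil
		}
	case int:
		if y, ok := b.(int); ok {
			return x + y, nil
		}
	}
	return nil, errTypeMismatch
}

func main() {
	fmt.Println(add(1, 2))
	fmt.Println(add("zero", "cool"))
	fmt.Println(add("count: ", 1))
}
```

Mixing the two types is rejected rather than converted, which is why explicit conversion builtins become necessary.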

Below are the two crucial builtins needed:

  • atoi(s)

Receives a string and returns an integer, or an error if the string is invalid.

  • itoa(i)

Receives an integer and returns the string representation of i.
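Since integers are meant to be Go ints internally, the two builtins could plausibly be thin wrappers over the standard strconv package. The sketch below assumes that approach. The function signatures and the error wording are illustrative only, and how nash would surface the error to scripts is left open.

```go
package main

import (
	"fmt"
	"strconv"
)

// atoi converts a decimal string to an int, returning an error for
// invalid input. It relies on strconv.Atoi, which also accepts a
// leading sign.
func atoi(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("atoi: invalid integer %q", s)
	}
	return n, nil
}

// itoa returns the decimal string representation of i.
func itoa(i int) string {
	return strconv.Itoa(i)
}

func main() {
	n, err := atoi("41")
	fmt.Println(n+1, err)

	_, err = atoi("abc")
	fmt.Println(err)

	fmt.Println(itoa(-1))
}
```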

Compatibility

This isn't backward compatible (see the Commands section above).

@uisso

uisso commented Mar 20, 2017

Well, it will be a welcome feature for us.

@i4ki
Collaborator Author

i4ki commented Apr 1, 2017

After talking with @katcipis, @vitorarins and @lborguetti, we decided not to allow expressions in arguments. Issues #200 and #199 were created to track other language changes that are prerequisites for adding integers.

I'll update this issue description soon.

@katcipis
Member

katcipis commented Apr 3, 2017

@tiago4orion it would be good to merge the new reference docs and already add the specs for integers on it :-)

https://github.com/NeowayLabs/nash/pull/162
