From d1c0bf8cdd7e308d3d8af37f56cc3a2cd9401872 Mon Sep 17 00:00:00 2001
From: Victor Lei <victorlei@gmail.com>
Date: Sat, 27 Aug 2016 23:58:40 +0300
Subject: [PATCH] docs

---
 NEWS.rst |  65 +++++++++
 SMOP.rst | 423 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 488 insertions(+)
 create mode 100644 NEWS.rst
 create mode 100644 SMOP.rst

diff --git a/NEWS.rst b/NEWS.rst
new file mode 100644
index 00000000..74bcecc9
--- /dev/null
+++ b/NEWS.rst
@@ -0,0 +1,65 @@
+=============
+Release notes
+=============
+August 27, 2016
+    Version 0.29 is out
+
+    Test suite for regression testing is ready,
+        with about 1000 m-scripts.  Scripts are translated
+        to python, then the resulting py-files are loaded,
+        but not yet run.
+        
+
+August 11, 2016
+    Version 0.28 is out. 
+
+    Line numbering information is included in the output.
+        Enabled by default.  Disabled with ``--no-numbers``.
+
+    Block comments are preserved now.
+        Lines, containing anything but a comment, are
+        preserved. Enabled by default.
+        Disabled with ``--no-comments``.
+
+    New command-line options
+        ``--no-resolve,--no-backend`` useful mostly for
+        debugging. New shortcuts -R -B -T -C -N
+        
+    Special ``%!`` comments are partially supported now.
+        They are mostly useful in testing, and
+        are used by Octave library and test suite. Disabled
+        by default. Enabled with ``--testing-mode``.
+        
+
+June 19,2016
+   After a year-long vacation I am back to active development.
+   My first goal is adopting the Octave runtime and test suite.
+
+October 23, 2014
+   Downloaded ``mybench`` -- a collection of 20 or so
+   micro-benchmarks originally meant to compare matlab and
+   octave performance.  After succesfully running the first nine,
+   the geometric mean of the speedup is 0.36,  which is cool.
+
+    ==   ========   ======    ===========    =======
+    //   name       octave    smop           speedup
+    ==   ========   ======    ===========    =======
+    1    rand       2.58      0.36           0.14
+    2    randn      2.26      1.04           0.46
+    3    primes     0.35      0.17           0.49
+    4    fft2       2.75      1.13           0.41
+    5    square     4.24      0              
+    6    inv        4.38      2.26           0.53
+    7    eig        17.95     9.09           0.51
+    8    qr         3.06      1.83           0.60
+    9    shur       5.98      2.31           0.39
+    10   roots      8.31      2.02           0.24
+    ==   ========   ======    ===========    =======
+
+October 15, 2014
+   Version 0.26.3 is available for beta testing.
+   Next version 0.27 is planned to compile octave
+   ``scripts`` library, which contains over 120 KLOC in
+   almost 1,000 matlab files. There  are 13 compilation
+   errors with smop 0.26.3 .
+
diff --git a/SMOP.rst b/SMOP.rst
new file mode 100644
index 00000000..7478d28b
--- /dev/null
+++ b/SMOP.rst
@@ -0,0 +1,423 @@
+Lexical analysis and parsing
+============================
+
+A. Reserved words
+    Reserved words are not reserved when used as fields.
+    So ``return=1`` is illegal, but ``foo.return=1`` is fine.
+
+#. Round, square, and curly brackets
+    In Python and in C it's simple -- function are called
+    using parentheses, and array subscripted with square
+    brackets.  In Matlab and in Fortran both are done with
+    parentheses, so there is no visual difference between
+    function call ``foo(x)``  and indexing array ``foo``.
+    
+#. Handling the white space.
+    There are four kinds of ws, each recognized by a dedicated
+    rule: NEWLINE, COMMENT, ELLIPSIS, and SPACES.  Only NEWLINE
+    returns a token, the three others silently discard their input.
+    NEWLINE collects adjacent ``\n`` characters and returns a
+    single SEMI token::
+
+        def t_NEWLINE(t):
+            r'\n+'
+            t.lexer.lineno += len(t.value)
+            if not t.lexer.parens and not t.lexer.braces:
+                t.value = ";"
+                t.type = "SEMI"
+                return t
+
+    Comments come in two flavors -- regular and MULTILINE.
+        The regular comments are  discarded, while MULTILINE
+        become doc strings, and are handled as expr statements.
+        regular consume everything from % or # up to but not
+        including \n::
+
+            (%|\#).*
+
+
+    Multiline COMMENT rule drops leading blanks, then eats everything
+    from ``%`` or ``#`` up to the end of line, `including` newline
+    character.  COMMENT leaves one newline character, or else funny
+    things happen::
+
+	@TOKEN(r"(%|\#).*")
+        def t_COMMENT(t):
+            if not options.do_magic or t.value[-1] != "!":
+                t.lexer.lexpos = t.lexer.lexdata.find("\n",t.lexer.lexpos)
+    
+    Multiline comments::
+
+        (^[ \t](%|\#).*\n)+
+
+    The pattern for multi-line comments works only if re.MULTILINE flag
+    is passed to ctor.
+
+    Comments starting with ``%!`` have a special meaning TBD
+
+    ELLIPSIS is the matlab way for line continuation.  It discards
+    everything between ``...`` and the newline character, including
+    the trailing ``\n``::
+
+        def t_ELLIPSIS(t):
+            r"\.\.\..*\n"
+            t.lexer.lineno += 1
+            pass
+
+    SPACES discards one or more horisontal space characters.  Not clear
+    what escape sequence is supposed to do::
+
+        def t_SPACES(t):
+            r"(\\\n|[ \t\r])+"
+            pass
+
+#. ``p_TRANSPOSE``
+    a. A quote, immediately following a reserved word is always a
+       ``STRING``. Implemented using inclusive state `afterkeyword`::
+
+            t.type = reserved.get(t.value,"IDENT")
+            if t.type != "IDENT" and t.lexer.lexdata[t.lexer.lexpos]=="'":
+                t.lexer.begin("afterkeyword")
+
+    b. A quote, immediately following any of: (1) an alphanumeric
+       charater, (2) right bracket, parenthesis or brace, or (3)
+       another ``TRANSPOSE``, is a ``TRANSPOSE``.  Otherwise, it
+       starts a string.  If the quote is separated from the term by
+       line continuation (...), matlab starts a string, so these
+       rules still holds.::
+
+           def t_TRANSPOSE(t):
+               r"(?<=\w|\]|\)|\})((\.')|')+"
+               # <---context ---><-quotes->
+               # We let the parser figure out what that mix of quotes and
+               # dot-quotes, which is kept in t.value, really means.
+               return t
+
+
+#. Keyword ``end`` -- expression and statement 
+    Any of: ``endwhile``, etc. are ``END_STMT``.  Otherwise, if ``end``
+    appears inside parentheses of any kind, it's ``END_EXPR``.
+    Otherwise, ``end`` is illegal.
+
+#. Optional ``end`` statement as function terminator
+    Inconsistency between Matlab and Octave, easily solved
+    if the lexer effectively handles the whitespace:: 
+
+       function : FUNCTION
+                | END_STMT SEMI FUNCTION
+
+    This usage is consistent with the other cases -- (1) statements start
+    with a keyword and are terminated by the SEMI token, and (2) the
+    lexer combines several comments, blanks, and other junk as one
+    SEMI token.  Compare parse.py rule for RETURN statement.
+
+#. Semicolon as statement terminator, as column separator in matrices.
+    Comma, semicolon, and newline are statement terminators.  In
+    matrix expressiions, whitespace is significant and separates elements
+    just as comma does.
+
+#. Matrix state
+    In matrix state, consume whitespace separating two terms and
+    return a fake ``COMMA`` token.  This allows parsing ``[1 2 3]`` as
+    if it was ``[1,2,3]``.  Handle with care: ``[x + y]`` vs ``[x +y]``
+
+    Term T is::
+
+       (1) a name or a number
+       (2) literal string using single or doble quote
+       (3) (T) or [T] or {T} or T' or +T or -T
+
+    Terms end with::
+
+       (1) an alphanumeric charater \w
+       (2) single quote (in octave also double-quote)
+       (3) right parenthesis, bracket, or brace
+       (4) a dot (after a number, such as 3. 
+
+    The pattern for whitespace accounts for ellipsis as a
+    whitespace, and for the trailing junk.
+
+    Terms start with::
+
+        (1) an alphanumeric character
+        (2) a single or double quote,
+        (3) left paren, bracket, or brace and finally
+        (4) a dot before a digit, such as .3  .
+
+        TODO: what about curly brackets ???
+        TODO: what about dot followed by a letter, as in field
+        [foo  .bar]
+        
+        t.lexer.lineno += t.value.count("\n")
+        t.type = "COMMA"
+        return t
+
+
+Class matlabarray
+=================
+
+Matlab arrays differ from numpy arrays in many ways, and class
+matlabarray captures the following differences:
+
+A. Base-one indexing
+    Following Fortran tradition, matlab starts array indexing with
+    one, not with zero. Correspondingly, the last element of a
+    N-element array is N, not N-1.
+
+#. C_CONTIGUOUS and F_CONTIGUOUS data layout
+    Matlab matrix elements are ordered in columns-first, aka
+    F_CONTIGUOUS order.  By default, numpy arrays are C_CONTIGUOUS.
+    Instances of matlabarray are F_CONTIGUOUS, except if created
+    empty, in which case they are C_CONTIGUOUS.
+    
+    +-----------------------+--------------------------------------+
+    | matlab                | numpy                                |
+    +=======================+======================================+
+    |::                     |::                                    |
+    |                       |                                      |
+    |  > reshape(1:4,[2 2]) |   >>> a=matlabarray([1,2,3,4])       |
+    |  1 3                  |   >>> a.reshape(2,2,order="F")       |
+    |  2 4                  |   1 3                                |
+    |                       |   2 4                                |
+    |                       |                                      |
+    |                       |   >>> a.reshape(2,2,order="C")       |
+    |                       |   1 2                                |
+    |                       |   3 4                                |
+    +-----------------------+--------------------------------------+
+
+    >>> a=matlabarray([1,2,3,4])
+    >>> a.flags.f_contiguous
+    True
+    >>> a.flags.c_contiguous
+    False
+  
+    >>> a=matlabarray()
+    >>> a.flags.c_contiguous
+    True
+    >>> a.flags.f_contiguous
+    False
+
+#. Auto-expanding arrays
+    Arrays are auto-expanded on out-of-bound assignment. Deprecated,
+    this feature is widely used in legacy code.  In smop, out-of-bound
+    assignment is fully supported for row and column vectors, and for
+    their generalizations having shape
+    
+        [1 1 ... N ... 1 1 1]
+
+    These arrays may be resized along their only non-singular
+    dimension.  For other arrays, new columns can be added to
+    F_CONTIGUOUS arrays, and new rows can be added to C_CONTIGUOUS
+    arrays.
+
+    +----------------------------+----------------------------------+
+    | matlab                     | numpy                            |
+    +============================+==================================+
+    |::                          |::                                |
+    |                            |                                  |
+    |  > a=[]                    |   >>> a=matlabarray()            |
+    |  > a(1)=123                |   >>> a[1]=123                   |
+    |  > a                       |   >>> a                          |
+    |  123                       |   123                            |
+    |                            |                                  |
+    +----------------------------+----------------------------------+
+
+#. Create by update
+    In matlab, arrays can be created by updating a non-existent array,
+    as in the following example:
+
+    >>> clear a
+    >>> a(17) = 42
+
+    This unique feature is not yet supported by smop, but can be
+    worked around by inserting assignments into the original matlab
+    code:
+
+    >>> a = []
+    >>> a(17) = 42
+
+#. Assignment as copy
+    Array data is not shared by copying or slice indexing. Instead
+    there is copy-on-write.
+
+#. Everything is a matrix
+    There are no zero or one-dimensional arrays. Scalars are
+    two-dimensional rather than zero-dimensional as in numpy.
+
+#. Single subscript implies ravel.
+    TBD
+
+#. Broadcasting
+    Boadcasting rules are different
+
+#. Boolean indexing
+    TBD
+
+#. Character string constants and escape sequences [ffd52d5fc5]
+    In Matlab, character strings are enclosed in single quotes, like
+    ``'this'``, and escape sequences are not recognized::
+
+        matlab> size('hello\n')
+        1   7
+
+    There are seven (!) characters in ``'hello\n'``, the last two being
+    the backslash and the letter ``n``.
+
+    Two consecutive quotes are used to put a quote into a string::
+
+        matlab> 'hello''world'
+        hello'world
+
+    In Octave, there are two kinds of strings: octave-style (enclosed
+    in double quotes), and matlab-style (enclosed in single quotes).
+    Octave-style strings do understand escape sequences::
+
+        matlab> size("hello\n")
+        1   6
+
+    There are six characters in ``"hello\n"``, the last one being
+    the newline character.
+
+    Octave recognizes the same escape sequnces as C:: 
+
+        \"  \a  \b  \f  \r  \t  \0  \v  \n  \\ \nnn \xhh
+
+    where n is an octal digit and h is a hexadecimal digit.
+
+    Finally, two consecutive double-quote characters become a single
+    one, like here::
+
+        octave> "hello""world"
+        hello"world
+
+----------------------------------------------------------------------
+
+Data structures
+===============
+
+A. Empty vector [], empty string "", and empty cellarray {}
+    +----------------------------+----------------------------------+
+    | matlab                     | numpy                            |
+    +============================+==================================+
+    | ::                         | ::                               |
+    |                            |                                  |
+    |   > size([])               |   >>> matlabarray().shape        |
+    |   0 0                      |   (0, 0)                         |
+    |                            |                                  |
+    |   > size('')               |   >>> char().shape               |
+    |   0 0                      |   (0, 0)                         |
+    |                            |                                  |
+    |   > size({})               |   >>> cellarray().shape          |
+    |   0 0                      |   (0, 0)                         |
+    +----------------------------+----------------------------------+
+    
+
+#. Scalars are 1x1 matrices
+    +----------------------------+----------------------------------+
+    | matlab                     | numpy                            |
+    +============================+==================================+
+    | ::                         | ::                               |
+    |                            |                                  |
+    |   > a=17                   |   >>> a=matlabarray(17)          |
+    |   > size(a)                |   >>> a.shape                    |
+    |   1 1                      |   1 1                            |
+    |                            |                                  |
+    +----------------------------+----------------------------------+
+
+#. Rectangular char arrays
+    Class char inherits from class matlabarray the usual matlab array
+    behaviour -- base-1 indexing, Fortran data order, auto-expand on
+    out-of-bound assignment, etc.
+
+    +----------------------------+----------------------------------+
+    | matlab                     | numpy                            |
+    +============================+==================================+
+    | ::                         | ::                               |
+    |                            |                                  |
+    |   > s='helloworld'         |   >>> s=char('helloworld')       |
+    |   > size(s)                |   >>> print size_(s)             |
+    |   1 10                     |   (1,10)                         |
+    |   > s(1:5)='HELLO'         |   >>> s[1:5]='HELLO'             |
+    |   > s                      |   >>> print s                    |
+    |   HELLOworld               |   HELLOworld                     |
+    |   > resize(s,[2 5])        |   >>> print resize_(s,[2,5])     |
+    |   HELLO                    |   HELLO                          |
+    |   world                    |   world                          |
+    +----------------------------+----------------------------------+
+
+#. Row vector
+    +----------------------------+----------------------------------+
+    | matlab                     | numpy                            |
+    +============================+==================================+
+    | ::                         | ::                               |
+    |                            |                                  |
+    |  > s=[1 2 3]               |   >>> s=matlabarray([1,2,3])     |
+    |                            |                                  |
+    +----------------------------+----------------------------------+
+
+
+#. Column vector
+    +----------------------------+----------------------------------+
+    | matlab                     | numpy                            |
+    +============================+==================================+
+    |::                          |::                                |
+    |                            |                                  |
+    |  > a=[1;2;3]               |   >>> a=matlabarray([[1],        |
+    |                            |                      [2],        |
+    |                            |                      [2]])       |
+    |  > size(a)                 |   >>> a.shape                    |
+    |  3 1                       |   (3, 1)                         |
+    +----------------------------+----------------------------------+
+
+
+#. Cell arrays
+    Cell arrays subclass matlabarray and inherit the usual matlab
+    array behaviour -- base-1 indexing, Fortran data order, expand on
+    out-of-bound assignment, etc. Unlike matlabarray, each element of
+    cellarray holds a python object.
+
+    +----------------------------+----------------------------------+
+    | matlab                     | numpy                            |
+    +============================+==================================+
+    |::                          |::                                |
+    |                            |                                  |
+    |  > a = { 'abc', 123 }      |   >>> a=cellarray(['abc',123])   |
+    |  > a{1}                    |   >>> a[1]                       |
+    |  abc                       |   abc                            |
+    +----------------------------+----------------------------------+
+
+#. Cell arrays of strings
+    In matlab, cellstrings are cell arrays, where each cell contains a
+    char object.  In numpy, class cellstring derives from matlabarray,
+    and each cell contains a native python string (not a char
+    instance).
+
+    +----------------------------+----------------------------------+
+    | matlab                     | numpy                            |
+    +============================+==================================+
+    |::                          |::                                |
+    |                            |                                  |
+    |  > a = { 'abc', 'hello' }  |   >>> a=cellstring(['abc',       |
+    |                            |                     'hello'])    |
+    |  > a{1}                    |   >>> a[1]                       |
+    |  abc                       |   abc                            |
+    +----------------------------+----------------------------------+
+
+----------------------------------------------------------------------
+
+Data structures
+    All matlab data structures subclass from matlabarray
+
+Structs
+    TBD
+
+Function pointers
+    Handles @
+
+String concatenation
+    Array concatenation not implemented
+
+    >>> ['hello' 'world']
+    helloworld
+
+.. vim: tw=70:sw=2