Start bootstrap transpiler in Python
Currently supports if-else statements and recursion
thedavidchu committed Jan 7, 2024
Commit 340de91 (root commit, 0 parents)
Showing 29 changed files with 3,132 additions and 0 deletions.
24 changes: 24 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,24 @@
name: Lint Code Base

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  run-linter:
    runs-on: ubuntu-latest
    steps:
      - name: Check out Git repository
        uses: actions/checkout@v2

      - name: Run Black (Python)
        uses: psf/black@stable
        with:
          options: |
            --verbose
            --line-length=80
            --exclude /(\.github|\.git|\.venv|\.vscode)/
          src: "."
          version: "22.3.0"
27 changes: 27 additions & 0 deletions .github/workflows/linux_compiler_test.yml
@@ -0,0 +1,27 @@
name: Linux Compiler CI

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  run-linux-python3:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Compile
        run: |
          # For some reason, if we change to the compiler directory, Python complains.
          export PYTHONPATH="${PYTHONPATH}:/home/runner/work/lolc/lolc/src/"
          for x in fibonacci helloworld math_ops nested_if sum_three
          do
            python src/compiler/lol.py -i examples/$x.lol -o results
            gcc results/$x-*.c
            ./a.out
          done
68 changes: 68 additions & 0 deletions .gitignore
@@ -0,0 +1,68 @@
# Ignore PyCharm and VS Code directories
.idea
.vscode/
# Ignore Python Venv
.venv/
venv/

# Ignore compiler output and Python caches
/results/
__pycache__/

# Ignore CMake's build directory
build/
cmake/
test/output/*

# Prerequisites
*.d

# Object files
*.o
*.ko
*.obj
*.elf

# Linker output
*.ilk
*.map
*.exp

# Precompiled Headers
*.gch
*.pch

# Libraries
*.lib
*.a
*.la
*.lo

# Shared objects (inc. Windows DLLs)
*.dll
*.so
*.so.*
*.dylib

# Executables
*.exe
*.out
*.app
*.i*86
*.x86_64
*.hex

# Debug files
*.dSYM/
*.su
*.idb
*.pdb

# Kernel Module Compile Results
*.mod*
*.cmd
.tmp_versions/
modules.order
Module.symvers
Mkfile.old
dkms.conf
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 David Chu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
69 changes: 69 additions & 0 deletions README.md
@@ -0,0 +1,69 @@
# Light Object-Oriented Language (LOL)

## Project Naming

I am trying to come up with an appropriate name for this project.
I like "Light Object Language" (because lol), but it's not really object-based.
Maybe "Light Open-Source Language" is better.
I was thinking of something short and close to the beginning of the alphabet (e.g.
"Ah").
I was also musing over design objectives; my goals for this language are to be
(1) secure, (2) usable, and (3) performant, in that order (more or less).
This would lead to the acronym "sup", like "wassup".
As a mature individual (the "lol" notwithstanding), I'm not sure this would be
great; it would also put my language in the middle-toward-the-end of the alphabet,
i.e. the most forgettable place ever.

## Project Goals

1. Expose the internals of the transpiler to the user (if they so choose).
2. Provide a (1) secure, (2) usable, and (3) performant language, in that order.
3. Allow the programmer to dump as much information into the compiler as they wish.
How much is used by the compiler is another question. (Is this a good idea?)
4. Don't make silly features.

## Bootstrapping

This project is to be bootstrapped in Python. Since it targets C, I can choose any
language for the bootstrap. I chose Python because of its rich standard library,
and I am faster at writing Python than C.

## Language Features

Make safety and usability/intuitiveness the priority. Performance can then be
added with a little extra work, since most of the optimization effort will be
concentrated on a small fraction of the code.

- Drop-in replacement for C
  - Structs are in order
  - Public functions, structs, etc. are linked with C
  - Initially, I will emit C code (and then maybe target LLVM-IR or some other compiler framework's IR)
- Ability to give compile-time information (in square brackets)
  - This can include ranges on integers (which must be proven at compile time or, upon conversion, bounds-checked at runtime)
- Inspired by Rust
  - Memory safety, immutability, "reserved" (in C's context) by default
  - Mark unowned data as "unowned"; otherwise, borrow check everything (but then lifetimes would be annoying to implement... how do they even work?)
  - Enum type for errors -- but we could use unused portions of integer/other ranges (see above, compile-time information)
  - Traits are cool
  - I like the fact there is no inheritance
- Inspired by Zig
  - No macros/preprocessor; compile-time running of any function
  - I like the idea of having the memory allocator specified--efficient allocation is a huge problem, so it would be cool to specify something with the "allocator trait", to use Rust's terminology
- Inspired by Python
  - Types can store methods--but maybe use the "::" syntax for any compile-time namespace stuff
    - E.g. `int16::max` -- I guess this is a language feature and not in keeping with putting things in the standard library
  - Have namespaces like Python, where you inherit the importing namespace's name
    - E.g. `import math; math.sqrt(10)` instead of C++'s `#include <cmath>\n std::sqrt`
    - I actually want to go further and use Nix's import syntax of `math = import("math");` or something... it's been a while since I wrote Nix
  - Python's constructor syntax makes more intuitive sense to me than C++'s
    - E.g. `x: ClassName = ClassName(arg0, arg1)` rather than `ClassName x(arg0, arg1);`. I just feel like the '=' makes everything clearer.
- Inspired by C
  - Limited language features
  - Optional standard library (I'll need to write wrappers for the C standard library)
  - By default, struct layouts are like in C (members are laid out in the order the user declared them, plus any padding)
  - Transparent about what exactly is happening. No hidden function calls, no hidden operations (e.g. '<<' can be overloaded in C++)
  - Syntax is inspired by C-family
- Inspired by C++
  - Generics with templates. I'm going to use Python/Go's syntax
- Inspired by Java
  - The name "interface" instead of Rust's "trait" makes more sense to me... I may just be missing the full idea of Rust traits.
181 changes: 181 additions & 0 deletions docs/CODING_STYLE.md
@@ -0,0 +1,181 @@
# Coding Style

## Definitions

* Integral Value: any `char`, `int`, `enum`, etc.

## Compiler Version
The code in this project should be as compatible with C89 as possible, because
C89 remains the best-supported version of C (e.g. by MSVC). The compiler shall
be called with `-std=c99 -pedantic -Wall -Werror` (ohhh, this pains me. I wish
I could ask you to use C89, but it is so limiting).

However, C99 has some nice-to-have features. I will admit that I really want to
use these:

* IO Functionality: `snprintf()` (get the number of characters used) and
  `printf("%zu", (size_t)x);`
* Designated Initializers: `struct point p = {.x = 0, .y = 0};`
* New Headers: `<stdbool.h>` and `<inttypes.h>`
  * Make your own boolean library.
* Keywords: `inline`, `restrict`
  * In this project, `#define` them to nothing if you are compiling
    with C89.
* Declarations anywhere: `for (int i = 0; i < N; ++i) { ... }`
  * Try to avoid this in this project.
* Compound Literals: `f((struct point){.x = 0, .y = 0})`
  * Try to avoid this in this project.
* C++ Style Comments: `// This is an inline comment`
  * Don't use this in this project.

Needless to say, some of these are excellent additions to the language.
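
For the `inline`/`restrict` advice and the homemade boolean, a small compatibility header might look like the sketch below. It assumes only the standard `__STDC_VERSION__` macro to detect C99; the file name and the `lol_bool`, `LOL_TRUE`, `LOL_FALSE`, and `lol_is_zero` names are hypothetical, not part of this project.

```c
/* compat.h (hypothetical name): build the same source as C89 or C99. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
/* C89 has no inline or restrict keywords, so expand them to nothing. */
#define inline
#define restrict
#endif

/* Homemade boolean instead of <stdbool.h> (hypothetical names). */
typedef int lol_bool;
#define LOL_TRUE 1
#define LOL_FALSE 0

/* Uses both keywords; compiles under either standard. */
static inline lol_bool lol_is_zero(const int *restrict p) {
    if (*p == 0) {
        return LOL_TRUE;
    }
    return LOL_FALSE;
}
```

The function is `static inline` rather than plain `inline` because, in C99, a plain `inline` definition does not by itself provide an external definition, which can lead to link errors when the compiler decides not to inline the call.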

## Portability
While passing or returning structures by value is an established part of the
C89 language, how it is done at the binary level (the calling convention) is
compiler-dependent: some compilers use registers, others use memory. For this
reason, we will not pass structures between functions, but rather pass
_pointers_ to structures.
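
A minimal sketch of the pointer convention, assuming a hypothetical `struct point` and `point_add()`: the result goes out through a pointer and the return value is only a status code, so no structure crosses a function boundary by value.

```c
#include <stddef.h>

/* Hypothetical example type. */
struct point {
    int x;
    int y;
};

/* Writes a + b into *out instead of returning a struct by value.
 * Returns 0 on success, -1 if any argument is NULL.
 * (Overflow checks omitted for brevity.) */
int point_add(struct point *out, const struct point *a,
              const struct point *b) {
    if (out == NULL || a == NULL || b == NULL) {
        return -1;
    }
    out->x = a->x + b->x;
    out->y = a->y + b->y;
    return 0;
}
```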

Strictly speaking, external identifiers in C89 are only guaranteed to be
significant to 6 characters; internal identifiers are only guaranteed to be
significant to 31 characters. We will ignore this. This is simply a matter of
refactoring the code.

## Safety Conventions

### Initialization
All invalid pointers shall be set to NULL. My `mem_malloc()` function will
actually enforce this. The difficulty arises when we allocate a structure that
contains pointers: the freshly allocated memory is not initialized, so those
pointers may hold garbage rather than NULL. It is the user's job to set all
these pointers to NULL as soon as the memory is allocated; the function
`mem_malloc()` does not do this.

All numeric values shall be set to `0`.

All structures shall be initialized using `{ 0 }`.
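
The sketch below shows one way to meet these rules for a freshly allocated structure. Plain `malloc()` stands in for `mem_malloc()`, whose signature is not shown here, and `struct node`/`node_new()` are hypothetical. Copying a `{ 0 }`-initialized template, rather than calling `memset()`, guarantees that the pointer members are real null pointers, since the C standard does not promise that all-bits-zero is a null pointer.

```c
#include <stdlib.h>

/* Hypothetical structure containing pointers. */
struct node {
    int value;
    struct node *next;
    char *name;
};

/* Allocate one node with every member in a known state.
 * Returns NULL if the allocation fails. */
struct node *node_new(void) {
    static const struct node zero_node = { 0 }; /* value 0, pointers NULL */
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return NULL;
    }
    /* malloc() leaves the memory uninitialized; copying the template
     * sets value to 0 and next/name to genuine null pointers. */
    *n = zero_node;
    return n;
}
```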

### Bracing
All groups of code shall be braced (even empty ones). I believe this is in line
with MISRA's standard.

```c
while (x-- > 0) {
    /* no op */
}
```

Yes, I know that you can put a semicolon at the end of the while statement. But
don't.


### Switch Statements
Switch statements should not fall through unless the fall-through is explicitly
marked. Moreover, every switch should have a `default` case, even if it is
unreachable. If it is deemed impossible to hit the default, then it should
contain a failing assertion.
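
A sketch of both rules, using a hypothetical `enum access` where each level implies the ones below it: every fall-through carries a comment, and the unreachable `default` asserts. With `NDEBUG` defined the assertion disappears, so the default still returns an error value.

```c
#include <assert.h>

/* Hypothetical access levels; each one implies the levels after it. */
enum access { ACCESS_ADMIN, ACCESS_WRITE, ACCESS_READ, ACCESS_NONE };

/* Count the permissions a level grants: admin implies write implies read. */
int access_permission_count(enum access level) {
    int count = 0;
    switch (level) {
    case ACCESS_ADMIN:
        ++count;
        /* fall through */
    case ACCESS_WRITE:
        ++count;
        /* fall through */
    case ACCESS_READ:
        ++count;
        break;
    case ACCESS_NONE:
        break;
    default:
        /* Unreachable for valid enum values. */
        assert(0 && "invalid access level");
        count = -1;
        break;
    }
    return count;
}
```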

### Increasing Integral Values
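For any of the operators that grow an integral value, the bounds must be checked
before applying the operator. These include `x + y`, `x << y`, and `x * y`. One
idiom is to apply the operator, then apply its inverse, and check for equality.
This only works where the operation cannot silently cancel out: for unsigned `x`
and nonzero `y`, `(x * y) / y == x` implies a valid multiplication.

It does not work for addition: unsigned addition wraps, so `x == (x + y) - y`
always holds, and signed overflow is undefined behaviour. Compare against the
limits before adding instead, e.g. `x <= UINT_MAX - y` for `unsigned int`.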

It is for this reason that my memory functions take in two arguments, so that
the user does not have to check the bounds on multiplication themselves.

Admittedly, the `++i` and `i++` operators can also overflow. However, we will
ignore this for now: in a for-loop, the loop condition bounds the counter on
every iteration.
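
A minimal sketch of checks that run before the operation, so nothing ever overflows. The helper names are hypothetical; `(size_t)-1` is used because `SIZE_MAX` only arrived in C99's `<stdint.h>`. Each helper returns 1 on success and 0 if the result would not fit; `size_mul()` is the kind of check that the two-argument memory functions can do internally.

```c
#include <limits.h>
#include <stddef.h>

/* Each helper stores the result in *out and returns 1, or returns 0
 * (leaving *out untouched) if the operation would overflow. */

int size_add(size_t x, size_t y, size_t *out) {
    if (x > (size_t)-1 - y) {
        return 0;
    }
    *out = x + y;
    return 1;
}

int size_mul(size_t x, size_t y, size_t *out) {
    if (y != 0 && x > (size_t)-1 / y) {
        return 0;
    }
    *out = x * y;
    return 1;
}

int int_add(int x, int y, int *out) {
    /* Signed overflow is undefined, so compare against the limits first. */
    if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y)) {
        return 0;
    }
    *out = x + y;
    return 1;
}
```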

### Division
A check for zero must be performed before doing any division operation. These
include `x / y` and `x % y`.
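
A sketch of a guarded division, with a hypothetical `int_divmod()`. Besides the zero check, it also rejects `INT_MIN / -1`, the one other signed division whose result does not fit in an `int` on the usual two's complement machines.

```c
#include <limits.h>
#include <stddef.h>

/* Stores x / y in *quot and x % y in *rem and returns 1,
 * or returns 0 (writing nothing) if the division is invalid. */
int int_divmod(int x, int y, int *quot, int *rem) {
    if (quot == NULL || rem == NULL) {
        return 0;
    }
    if (y == 0) {
        return 0; /* division by zero */
    }
    if (x == INT_MIN && y == -1) {
        return 0; /* the quotient INT_MAX + 1 does not fit in an int */
    }
    *quot = x / y;
    *rem = x % y;
    return 1;
}
```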

### Right Bit Shifting with Signed Integrals
Don't right bit shift signed integrals. For negative values, the result is
implementation-defined: whether the sign bit is extended depends on the compiler.

Or, at the very least, never do this with a negative signed integral. For
non-negative signed integrals, a right bit shift is well defined and behaves
like it does for unsigned integrals (`x >> y` is `x` divided by 2 to the power
`y`, rounded down).
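
If you do need bits from a signed value, one option is to convert it to unsigned first, which is well defined, and shift that instead. A sketch with a hypothetical `extract_byte()` helper:

```c
#include <limits.h>

/* Return the 8 bits of value starting at bit position shift.
 * Converting to unsigned is well defined (it wraps modulo UINT_MAX + 1),
 * so the shift below never sign-extends.
 * Returns 0 if shift is not smaller than the size of unsigned int in bits,
 * since shifting by that much is undefined. */
unsigned int extract_byte(int value, unsigned int shift) {
    unsigned int bits = (unsigned int)value;
    if (shift >= sizeof bits * CHAR_BIT) {
        return 0;
    }
    return (bits >> shift) & 0xFFu;
}
```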

### Bitwise Operation Ordering
The order in which the operands of a bitwise operator are evaluated is
unspecified. Unlike the logical operators `&&` and `||`, there is no
short-circuit evaluation: both operands are always evaluated.

### Side Effects in Function Calls
No function call may use arguments with side-effects.

At the very least, do not rely on a particular order of side-effects when
calling a function. All of the side-effects will take place before the called
function is entered; however, their order is unspecified (compiler dependent).
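
A sketch of what goes wrong and the fix, using hypothetical `next_value()` and `difference()` helpers: the unsafe call in the comment could evaluate either argument first, while the temporaries force a single order.

```c
/* Hypothetical reader: returns the value at *cursor, then advances it. */
int next_value(int *cursor) {
    return (*cursor)++;
}

int difference(int a, int b) {
    return a - b;
}

/* Unsafe version, for comparison:
 *     return difference(next_value(cursor), next_value(cursor));
 * The two arguments may be evaluated in either order, so this could
 * return -1 or 1. The temporaries below fix the order explicitly. */
int first_minus_second(int *cursor) {
    int first = next_value(cursor);
    int second = next_value(cursor);
    return difference(first, second);
}
```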

### Strings
Unless the string appears directly as a literal (`"<this is a string>"`) in the
code, do not rely on it being null-terminated, especially if it was copied into a
buffer, unless you explicitly copied a null character to the end.

Where possible, store the length of a string along with the string.

Do not use unsafe standard library functions that rely on strings being
null-terminated (e.g. `strcpy()`). Only use ones that operate on a known number
of bytes (e.g. `memcpy()`).
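
One way to follow both rules is to carry the length with the data and copy with `memcpy()`, which takes an explicit byte count. The `struct lol_string` type and `lol_string_copy_to()` below are hypothetical sketches, not the project's string type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-tracked string. */
struct lol_string {
    char *data;    /* not guaranteed to be null-terminated */
    size_t length; /* number of valid bytes in data */
};

/* Copy src into dst (a buffer of dst_size bytes), truncating if needed.
 * Returns the number of bytes copied and always null-terminates dst.
 * Returns 0 without writing if dst is NULL, dst_size is 0, or src is NULL. */
size_t lol_string_copy_to(char *dst, size_t dst_size,
                          const struct lol_string *src) {
    size_t n;
    if (dst == NULL || dst_size == 0 || src == NULL) {
        return 0;
    }
    n = (src->length < dst_size - 1) ? src->length : dst_size - 1;
    if (n > 0) {
        memcpy(dst, src->data, n);
    }
    dst[n] = '\0';
    return n;
}
```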


## Conventions

### Braces and If-Else Chains
The structure of braces shall follow K&R. The braces for functions shall follow
this pattern (contrary to K&R's usage).

```c
int function(int x, int y) {
    if (x) {
        /* ... */
    } else if (y) {
        /* ... */
    } else {
        /* ... */
    }
}
```

### Labels and Switches
Labels shall be indented one level less than the surrounding code.
```c
int function(int x) {
    switch (x) {
    case 0:
        /* ... */
        break;
    case 1:
        /* ... */
        break;
    case 2:
        /* ... */
        break;
    default:
        /* ... */
        break;
    }
    goto cleanup;
cleanup:
    /* ... */
    return 0;
}
```

### Use of Assertions
Use assertions only for code that should be impossible to reach. Otherwise, use
a return statement for ease of testing. This means that assertions can be used
liberally as executable comments.

Assertions should not be used in deployment code to handle legitimate cases
(e.g. NULL error handling). This makes the code untestable (because a failed
assertion exits the test program).
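
A sketch of the split, using a hypothetical fixed-capacity `struct int_stack`: a NULL stack or a full stack is a legitimate runtime case and gets a return code that a test can check, while the assertion only documents an invariant that no correct caller can break.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-capacity stack. */
struct int_stack {
    int *items;
    size_t top;      /* number of items in use */
    size_t capacity; /* number of slots in items */
};

/* Returns 0 on success, -1 if the stack is NULL or full. */
int int_stack_push(struct int_stack *s, int value) {
    if (s == NULL) {
        return -1; /* legitimate error: report it */
    }
    /* Impossible state: document it with an assertion. */
    assert(s->top <= s->capacity && "stack invariant broken");
    if (s->top == s->capacity) {
        return -1; /* legitimate error: report it */
    }
    s->items[s->top] = value;
    ++s->top;
    return 0;
}
```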

### Use of macros as functions

If a macro behaves entirely like a function (i.e. arguments evaluated exactly
once each, no manipulating variables), then it can be named following the
function naming conventions.

Otherwise, it shall be upper-case to warn the user of potentially funky
behaviour.
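
Two hypothetical macros to illustrate the naming rule: `point_x()` evaluates its argument once and touches nothing else, so it gets a function-style name, whereas `MAX()` may evaluate one of its arguments twice, so it is upper-case.

```c
/* Hypothetical example type. */
struct point {
    int x;
    int y;
};

/* Evaluates p exactly once and modifies nothing: function-style name. */
#define point_x(p) ((p)->x)

/* Evaluates the larger argument twice, so MAX(i++, j) increments i twice
 * when i > j: upper-case name as a warning. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))
```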