Commit
Start bootstrap transpiler in Python
Currently supports if-else statements and recursion
0 parents
commit 340de91
Showing 29 changed files with 3,132 additions and 0 deletions.
@@ -0,0 +1,24 @@
name: Lint Code Base

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  run-linter:
    runs-on: ubuntu-latest
    steps:
      - name: Check out Git repository
        uses: actions/checkout@v2

      - name: Run Black (Python)
        uses: psf/black@stable
        with:
          options: |
            --verbose
            --line-length=80
            --exclude /(\.github|\.git|\.venv|\.vscode)/
          src: "."
          version: "22.3.0"
@@ -0,0 +1,27 @@
name: Linux Compiler CI

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  run-linux-python3:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Compile
        run: |
          # For some reason, if we change to the compiler directory, Python complains.
          export PYTHONPATH="${PYTHONPATH}:/home/runner/work/lolc/lolc/src/"
          for x in fibonacci helloworld math_ops nested_if sum_three
          do
            python src/compiler/lol.py -i examples/$x.lol -o results
            gcc results/$x-*.c
            ./a.out
          done
@@ -0,0 +1,68 @@
# Ignore PyCharm and VS Code directories
.idea
.vscode/
# Ignore Python Venv
.venv/
venv/

# Ignore compiler output and Python caches
/results/
__pycache__/

# Ignore CMake's build directory
build/
cmake/
test/output/*

# Prerequisites
*.d

# Object files
*.o
*.ko
*.obj
*.elf

# Linker output
*.ilk
*.map
*.exp

# Precompiled Headers
*.gch
*.pch

# Libraries
*.lib
*.a
*.la
*.lo

# Shared objects (inc. Windows DLLs)
*.dll
*.so
*.so.*
*.dylib

# Executables
*.exe
*.out
*.app
*.i*86
*.x86_64
*.hex

# Debug files
*.dSYM/
*.su
*.idb
*.pdb

# Kernel Module Compile Results
*.mod*
*.cmd
.tmp_versions/
modules.order
Module.symvers
Mkfile.old
dkms.conf
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 David Chu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,69 @@
# Light Object-Oriented Language (LOL)

## Project Naming

I am trying to come up with an appropriate name for this project.
I like "Light Object Language" (because lol), but it's not really object-based.
Maybe "Light Open-Source Language" is better.
I was thinking of something short and close to the beginning of the alphabet (e.g. "Ah").
I was musing about design objectives as well; my goals for this language are to be
(1) secure, (2) usable, and (3) performant, in that order (or maybe not).
This would lead to the acronym "sup", like "wassup".
As a mature individual (the "lol" notwithstanding), I'm not sure if this would be
great; it would also put my language in the middle-toward-the-end of the alphabet,
i.e. the most forgettable place ever.

## Project Goals

1. Expose the internals of the transpiler to the user (if they so choose).
2. Provide a (1) secure, (2) usable, and (3) performant language, in that order.
3. Allow the programmer to dump as much information into the compiler as they wish.
   How much is used by the compiler is another question. (Is this a good idea?)
4. Don't add silly features.

## Bootstrapping

This project is to be bootstrapped in Python. Since it targets C, I can choose any
language for the bootstrap. I chose Python because of its rich standard library,
and I am faster at writing Python than C.

## Language Features

Make safety and usability/intuitiveness the priority. Then, performance can be
added for a little extra work (since much of the optimization will be done on a
small amount of the code).

- Drop-in replacement for C
  - Struct fields are laid out in declaration order
  - Public functions, structs, etc. are linked with C
  - Initially, I will emit C code (and then maybe target LLVM-IR or some other compiler framework's IR)
- Ability to give compile-time information (in square brackets)
  - This can include ranges on integers (which must be proven at compile time or, upon conversion, bounds-checked at runtime)
- Inspired by Rust
  - Memory safety, immutability, "reserved" (in C's context) by default
  - Mark unowned data as "unowned"; otherwise, borrow-check everything (but then lifetimes would be annoying to implement... how do they even work?)
  - Enum type for errors, but we could use unused portions of integer/other ranges (see above, compile-time information)
  - Traits are cool
    - I like the fact that there is no inheritance
- Inspired by Zig
  - No macros/preprocessor; compile-time running of any function
  - I like the idea of having the memory allocator specified: efficient allocation is a huge problem, so it would be cool to specify something with the "allocator trait", to use Rust's terminology
- Inspired by Python
  - Types can store methods, but maybe use the "::" syntax for any compile-time namespace stuff
    - E.g. `int16::max`. I guess this is a language feature and not in keeping with putting things in the standard library
  - Have namespaces like Python, where you inherit the importing namespace's name
    - E.g. `import math; math.sqrt(10)` instead of C++'s `#include <cmath>\n std::sqrt`
    - I actually want to go further and use Nix's import syntax of `math = import("math");` or something... it's been a while since I wrote Nix
  - Python's constructor syntax makes more intuitive sense to me than C++'s
    - E.g. `x: ClassName = ClassName(arg0, arg1)` rather than `ClassName x(arg0, arg1);`. I just feel like the '=' makes everything clearer.
- Inspired by C
  - Limited language features
  - Optional standard library (I'll need to write wrappers for the C standard library)
  - By default, struct layouts are like in C (C places members in declaration order, with possible padding between them)
  - Transparency about what exactly is happening. No hidden function calls, no hidden operations (e.g. '<<' can be overloaded in C++)
  - Syntax is inspired by the C family
- Inspired by C++
  - Generics with templates. I'm going to use Python/Go's syntax
- Inspired by Java
  - The name "interface" instead of Rust's "trait" makes more sense to me... I may just be missing the full idea of Rust traits.
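Because the bootstrap emits C, a runtime bounds check on conversion to a ranged integer could lower to something like the sketch below. It is hypothetical: `to_ranged_int()`, its out-parameter, and the inclusive `[lo, hi]` range are illustrative assumptions, not part of the project's design.

```c
#include <stdbool.h>

/* Hypothetical lowering of a conversion into an integer type declared with
 * an inclusive range [lo, hi]. Returns false (leaving *out untouched) if the
 * value falls outside the range; the caller decides how to report it. */
bool to_ranged_int(long value, long lo, long hi, int *out) {
    if (value < lo || value > hi) {
        return false;
    }
    /* Assumes [lo, hi] fits in int; a compiler would need to verify that
     * at compile time before emitting this narrowing conversion. */
    *out = (int)value;
    return true;
}
```

When the range can be proven at compile time, the check (and the function call) could be dropped entirely.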
@@ -0,0 +1,181 @@
# Coding Style

## Definitions

* Integral Value: any `char`, `int`, `enum`, etc.

## Compiler Version
The code in this project should be as compatible with C89 as possible. This is
because C89 remains the best-supported version of C (e.g. by MSVC). The compiler
shall be called with `-std=c99 -pedantic -Wall -Werror` (ohhh, this pains me. I
wish I could ask you to use C89, but it is so limiting).

However, C99 has some nice-to-have features. I will admit that I really want to
use the following:

* IO Functionality: `snprintf()` (which returns the number of characters that
  would have been written) and `printf("%zu", (size_t)x);`
* Designated Initializers: `struct point p = {.x = 0, .y = 0};`
* New Headers: `<stdbool.h>` and `<inttypes.h>`
  * Under C89, make your own boolean type.
* Keywords: `inline`, `restrict`
  * In this project, `#define` them to nothing if you are compiling
    with C89.
* Variable declarations anywhere: `for (int i = 0; i < N; ++i) { ... }`
  * Try to avoid this in this project.
* Compound Literals: `f((struct point){.x = 0, .y = 0})`
  * Try to avoid this in this project.
* C++ Style Comments: `// This is an inline comment`
  * Don't use this in this project.

Needless to say, some of these are excellent additions to the language.
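A minimal sketch of the C89 fallbacks listed above, assuming the standard `__STDC_VERSION__` test (the macro is absent or below `199901L` before C99). The hand-rolled `bool` is one possible definition, and `is_even()` and `copy_ints()` are hypothetical helpers that compile either way.

```c
/* Use the real C99 features when available; otherwise fall back. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdbool.h>
#else
/* C89 has no `inline` or `restrict` keywords: compile them away. */
#define inline
#define restrict
/* C89 has no <stdbool.h>: a minimal hand-rolled boolean type. */
typedef int bool;
#define true 1
#define false 0
#endif

/* Hypothetical helper; `static inline` degrades to `static` under C89. */
static inline bool is_even(int x) {
    return (x % 2) == 0;
}

/* Hypothetical helper; `restrict` promises that dst and src do not
 * overlap, and simply disappears under C89. */
void copy_ints(int *restrict dst, const int *restrict src, unsigned int n) {
    unsigned int i;

    for (i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}
```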

## Portability
While passing or returning structures from functions is an established part of
the C89 language, the binary implementation (the calling convention) is
compiler-dependent: some compilers may use registers, others may use memory. For
this reason, we will not pass structures between functions, but rather pass
_pointers_ to structures.
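For example, with a hypothetical `struct point`, inputs are taken as `const` pointers and results are written through a pointer instead of being passed or returned by value:

```c
/* Hypothetical 2-D point, used only for illustration. */
struct point {
    int x;
    int y;
};

/* Instead of returning a `struct point` by value, write into the
 * caller's structure through a pointer. */
void point_set(struct point *p, int x, int y) {
    p->x = x;
    p->y = y;
}

/* Read-only inputs are passed as pointers to const, not by value. */
int point_equal(const struct point *a, const struct point *b) {
    return a->x == b->x && a->y == b->y;
}
```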

Strictly speaking, external identifiers in C89 are only guaranteed to be
significant to 6 characters; internal identifiers are only guaranteed to be
significant to 31 characters. We will ignore this; conforming would simply be a
matter of refactoring the code.

## Safety Conventions

### Initialization
All invalid pointers shall be set to NULL. My `mem_malloc()` function will
actually enforce this. Difficulty arises when we allocate a structure that
contains pointers: the newly allocated memory is not initialized, so those member
pointers are not guaranteed to be NULL. It is the user's job to set all these
pointers to NULL as soon as the memory is allocated; `mem_malloc()` does not do
this.

All numeric values shall be set to `0`.

All structures shall be initialized using `{ 0 }`.
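A sketch of these rules, using plain `malloc()` as a stand-in because `mem_malloc()`'s signature is not shown here; `struct node` and `node_new()` are hypothetical:

```c
#include <stdlib.h>

/* Hypothetical structure containing pointers. */
struct node {
    int value;
    struct node *next;
    char *name;
};

/* Allocates a node and immediately sets every pointer member to NULL,
 * because malloc() leaves the new memory uninitialized. */
struct node *node_new(int value) {
    struct node *n = malloc(sizeof *n);

    if (n == NULL) {
        return NULL;
    }
    n->value = value;
    n->next = NULL;
    n->name = NULL;
    return n;
}
```

A local `struct node local = { 0 };` gets the same guarantee without any allocation: members without an explicit initializer are zeroed, and pointer members become null pointers.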

### Bracing
All groups of code shall be braced (including empty ones). I believe this is in
line with MISRA's standard.

```c
while (x-- > 0) {
    /* no op */
}
```

Yes, I know that you can put a semicolon at the end of the while statement. But
don't.

### Switch Statements
Switch statements should not have fall-through unless it is explicitly marked.
Moreover, every switch should have a `default` case, even if it is unreachable.
If it is deemed impossible to hit the default, then the default case should
contain a failing assertion.
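A sketch of both rules with a hypothetical `enum stage`: each intentional fall-through carries a `/* FALLTHROUGH */` comment, and the default case asserts because every valid enumerator is already handled.

```c
#include <assert.h>

/* Hypothetical build stages, used only for illustration. */
enum stage { STAGE_FETCH, STAGE_BUILD, STAGE_INSTALL };

/* Returns how many stages remain, counting the current one. */
int stages_remaining(enum stage s) {
    int remaining = 0;

    switch (s) {
    case STAGE_FETCH:
        remaining++;
        /* FALLTHROUGH */
    case STAGE_BUILD:
        remaining++;
        /* FALLTHROUGH */
    case STAGE_INSTALL:
        remaining++;
        break;
    default:
        /* Every valid enum stage is handled above. */
        assert(0 && "unreachable: invalid enum stage");
        break;
    }
    return remaining;
}
```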
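
### Increasing Integral Values
For any of the operators that grow an integral value, the bounds must be checked
before applying the operator. These include `x + y`, `x << y`, and `x * y`.

An idiom to check the validity of multiplication on unsigned operands is to
apply the operator, then apply its inverse, before checking for equality:
`x == (x * y) / y` (with `y != 0`) implies a valid multiplication. This idiom
does not work for addition: unsigned arithmetic wraps modulo 2^N, so
`x == (x + y) - y` always holds, and signed overflow is undefined behaviour before
the check can even run. For addition, compare against the limit first, e.g.
`y <= UINT_MAX - x`.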
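A minimal sketch of pre-checks for `size_t` operands, assuming C99's `SIZE_MAX` from `<stdint.h>`; the function names are hypothetical. The last function shows the apply-then-invert idiom, which is valid only because unsigned multiplication wraps instead of overflowing.

```c
#include <stddef.h>
#include <stdint.h> /* SIZE_MAX (C99) */

/* Returns 1 if a + b fits in size_t, checking before adding. */
int size_add_ok(size_t a, size_t b) {
    return b <= SIZE_MAX - a;
}

/* Returns 1 if a * b fits in size_t, checking before multiplying. */
int size_mul_ok(size_t a, size_t b) {
    if (a == 0) {
        return 1;
    }
    return b <= SIZE_MAX / a;
}

/* Apply-then-invert: if the product wrapped, dividing it back by b
 * cannot recover a. */
int size_mul_ok_by_inverse(size_t a, size_t b) {
    size_t product = a * b;

    if (b == 0) {
        return 1;
    }
    return product / b == a;
}
```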

It is for this reason that my memory functions take in two arguments, so that
the user does not have to check the bounds on multiplication themselves.
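The signature of the project's memory functions is not shown here, so the following `alloc_array()` is a hypothetical two-argument allocator in the spirit of `calloc(nmemb, size)`. It shows the payoff: the `count * size` overflow check is written once, inside the allocator, instead of at every call site.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical two-argument allocator. Returns NULL if count * size would
 * overflow, if either argument is 0, or if malloc() fails. */
void *alloc_array(size_t count, size_t size) {
    if (size != 0 && count > SIZE_MAX / size) {
        return NULL; /* count * size would not fit in size_t */
    }
    if (count == 0 || size == 0) {
        return NULL; /* avoid malloc(0), whose result is implementation-defined */
    }
    return malloc(count * size);
}
```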

Admittedly, the `++i` and `i++` operators can overflow. However, we will ignore
this fact for now. In a for-loop, the loop condition bounds them on every
iteration.

### Division
A check for zero must be performed before doing any division operation. These
include `x / y` and `x % y`.
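A sketch of a checked division with a hypothetical `checked_div()`. Besides the zero check, signed division has one more trap: `INT_MIN / -1` is not representable in `int` (and C11 makes `INT_MIN % -1` undefined as well), so the sketch rejects that case too.

```c
#include <limits.h>

/* Hypothetical checked division: on success returns 0 and stores the
 * quotient and remainder; on failure returns -1 and leaves both untouched. */
int checked_div(int x, int y, int *quotient, int *remainder) {
    if (y == 0) {
        return -1; /* division by zero */
    }
    if (x == INT_MIN && y == -1) {
        return -1; /* the quotient, -INT_MIN, does not fit in int */
    }
    *quotient = x / y;
    *remainder = x % y;
    return 0;
}
```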

### Right Bit Shifting with Signed Integrals
Don't right bit shift signed integrals. For a negative value, the result is
implementation-defined: the compiler chooses whether to sign-extend.

Or at the very least, never do this with a negative signed integral. Non-negative
signed integrals behave like unsigned integrals for right bit shifts (the result
is the integral part of `x / 2^y`).
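One way to follow this rule is to convert to unsigned first, which is well defined for negative values (it wraps modulo `UINT_MAX + 1`), and shift that instead. A sketch with a hypothetical `shift_right_logical()`; the shift amount is checked too, since shifting by the type's width or more is undefined.

```c
#include <limits.h>

/* Shifts the bits of a signed value right, always filling with zeros,
 * without relying on implementation-defined sign extension. Returns 0
 * when the shift amount is at least the width of unsigned int. */
unsigned int shift_right_logical(int x, unsigned int amount) {
    if (amount >= sizeof(unsigned int) * CHAR_BIT) {
        return 0;
    }
    return (unsigned int)x >> amount;
}
```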

### Bitwise Operation Ordering
The order in which the operands of a bitwise operator are evaluated is
unspecified. Both operands are always evaluated: there is no short-circuit logic,
unlike the logical boolean operators `&&` and `||`.
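A small demonstration with hypothetical helpers that count how many operands get evaluated: `&` evaluates both sides (in an unspecified order), while `&&` stops once the left side is 0.

```c
/* Number of times touch() has run since the last reset. */
static int touch_calls = 0;

/* Records that it was evaluated, then returns its argument unchanged. */
int touch(int value) {
    touch_calls++;
    return value;
}

/* Both operands of `&` are evaluated. */
int calls_for_bitwise_and(void) {
    touch_calls = 0;
    (void)(touch(0) & touch(1));
    return touch_calls;
}

/* `&&` short-circuits: touch(1) never runs. */
int calls_for_logical_and(void) {
    touch_calls = 0;
    (void)(touch(0) && touch(1));
    return touch_calls;
}
```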

### Side Effects in Function Calls
No function call may use arguments with side-effects.

At the very least, do not rely on a particular order of side-effects when
calling a function. All of the side-effects will take place before the called
function is entered; however, the order in which the arguments are evaluated is
unspecified (it can differ between compilers).
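A sketch of the safe pattern with a hypothetical `next_value()` cursor: each side effect gets its own statement, so the order is fixed before the call.

```c
/* Hypothetical cursor: returns the current value, then advances it. */
int next_value(int *cursor) {
    int value = *cursor;

    *cursor = value + 1;
    return value;
}

int subtract(int a, int b) {
    return a - b;
}

/* Avoid subtract(next_value(c), next_value(c)): the two calls may run in
 * either order, so the result could be -1 or 1. Sequence them instead. */
int ordered_difference(int *cursor) {
    int first;
    int second;

    first = next_value(cursor);
    second = next_value(cursor);
    return subtract(first, second);
}
```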

### Strings
Unless the string appears directly as a literal (`"<this is a string>"`) in the
code, do not rely on it being null-terminated, especially if you copied it into a
buffer, unless you explicitly wrote a null character at the end.

Where possible, store the length of a string along with the string.

Do not use unsafe standard library functions that rely on strings being
null-terminated. Only use ones that take a known number of bytes to operate upon.
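A minimal sketch of these rules, assuming a hypothetical `struct str_view` that carries its length: the copy uses `memcpy()` with an explicit byte count and then writes the terminator itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string view: the length travels with the data, so nothing
 * depends on a terminating '\0' being present. */
struct str_view {
    const char *data;
    size_t len;
};

/* Copies at most dst_size - 1 bytes of src into dst and always writes an
 * explicit '\0'. Returns the number of bytes copied (0 if dst cannot hold
 * even the terminator). */
size_t str_copy_to_buffer(char *dst, size_t dst_size, struct str_view src) {
    size_t n;

    if (dst == NULL || dst_size == 0) {
        return 0;
    }
    n = (src.len < dst_size - 1) ? src.len : dst_size - 1;
    if (n > 0) {
        memcpy(dst, src.data, n);
    }
    dst[n] = '\0';
    return n;
}
```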

## Conventions

### Braces and If-Else Chains
The structure of braces shall follow K&R. The braces for functions shall also
follow this pattern (contrary to K&R's usage, which puts a function's opening
brace on its own line).

```c
int function(int x, int y) {
    if (x) {
        /* ... */
    } else if (y) {
        /* ... */
    } else {
        /* ... */
    }
}
```

### Labels and Switches
Labels shall be indented one level less than the surrounding code.

```c
int function(int x) {
    int result = 0;

    switch (x) {
    case 0:
        /* ... */
        break;
    case 1:
        /* ... */
        break;
    case 2:
        /* ... */
        break;
    default:
        /* ... */
        break;
    }
    goto cleanup;
cleanup:
    /* ... */
    return result;
}
```

### Use of Assertions
Use assertions only for code paths that should be impossible to reach. Otherwise,
use a return statement (e.g. an error code) for ease of testing. This means that
assertions can be used liberally as executable comments.

Assertions should not be used in deployed code to handle legitimate cases (e.g.
NULL error handling). This makes the code untestable (because a failed assertion
exits the test program).
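A sketch of the split, using a hypothetical `find_index()` and status enum: the NULL case is a legitimate caller error and returns a code a test can check, while the assertion only documents an invariant that always holds at that point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for testable error paths. */
enum status { STATUS_OK, STATUS_NULL_ARG, STATUS_NOT_FOUND };

/* Searches values[0..len) for target and stores its index on success. */
enum status find_index(const int *values, size_t len, int target,
                       size_t *out_index) {
    size_t i;

    if (values == NULL || out_index == NULL) {
        return STATUS_NULL_ARG; /* legitimate case: report, don't assert */
    }
    for (i = 0; i < len; ++i) {
        if (values[i] == target) {
            *out_index = i;
            return STATUS_OK;
        }
    }
    /* Documents the loop's exit condition; cannot fail. */
    assert(i == len);
    return STATUS_NOT_FOUND;
}
```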

### Use of Macros as Functions

If a macro behaves entirely like a function (i.e. each argument is evaluated
exactly once and no variables are manipulated behind the caller's back), then it
can be named following the function naming conventions.

Otherwise, it shall be upper-case to warn the user of potentially funky
behaviour.
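For example (both macros are hypothetical): `SQUARE()` evaluates its argument twice, so `SQUARE(i++)` would be undefined behaviour and the name is upper-case; `zero_bytes()` evaluates each argument exactly once and can use function-style naming.

```c
#include <string.h>

/* Evaluates `x` twice: not function-like, so upper-case. */
#define SQUARE(x) ((x) * (x))

/* Each argument is evaluated exactly once and nothing else is touched,
 * so function-style (lower-case) naming is allowed. */
#define zero_bytes(ptr, n) memset((ptr), 0, (n))
```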