Scopes & Identifiers #82

Draft: wants to merge 23 commits into base: rewrite
27 changes: 24 additions & 3 deletions Cargo.lock


27 changes: 22 additions & 5 deletions README.md
@@ -57,12 +57,29 @@ In the UI:
- [ ] User interface for runtime/compiler options
- [ ] Compiler errors/warnings pop up as messages in the editor

## Performance
- (?) Use laziness in the lexer to reduce memory consumption
- (?) Run scope computations in the parser
- (?) Build indices during parsing
- (?) Switch AST node data from `u64` to `u32`; anything that needs
64 bits would then span multiple nodes

## Resources
- Compiler Architecture - https://scholarworks.iu.edu/dspace/handle/2022/24749
- Preprocessor General Info - https://gcc.gnu.org/onlinedocs/cpp/index.html
- Macro Expansion Algo - https://gcc.gnu.org/onlinedocs/cppinternals/index.html
- Translation of C standard to AST types - https://github.com/vickenty/lang-c/blob/master/src/ast.rs
- Precedence climbing method - https://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing
- Monaco Editor Quick Fixes - https://stackoverflow.com/questions/57994101/show-quick-fix-for-an-error-in-monaco-editor
- Fuzzer to look into - https://github.com/rust-fuzz/afl.rs
- WASM interpreter to look into - https://github.com/paritytech/wasmi
- Copy-and-Patch Technique to look into - https://github.com/sillycross/wasmnow

## Credits
- [Aaron Hsu's PhD Thesis on Data-Parallel Compiler Architecture](https://scholarworks.iu.edu/dspace/handle/2022/24749) -
TCI's architecture is almost fully based on work done in 2019 by Aaron Hsu.
- [`lang-c` by `vickenty`](https://github.com/vickenty/lang-c) -
TCI used the source of `lang-c` as a reference and as inspiration when designing the AST.
`lang-c` source code was also very useful as a point of reference for the C specification,
because specifications are difficult to read.
- [Precedence Climbing Explanation](https://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing) -
An explanation for precedence climbing with pseudo-code in Python that I use
because I always forget the details.
- GNU GCC's [Documentation](https://gcc.gnu.org/onlinedocs/cpp/index.html)
and [Internals Documentation](https://gcc.gnu.org/onlinedocs/cppinternals/index.html) -
Documentation that acted as a reference when building the lexer/macro code
5 changes: 5 additions & 0 deletions compiler-tests/integration/variable.c
@@ -0,0 +1,5 @@
int main() {
int a;
a = 1;
return a;
}
4 changes: 1 addition & 3 deletions compiler/Cargo.toml
@@ -7,9 +7,7 @@ description = "C compiler for students"
license = "MIT"

[dependencies]
# Until 0.13.0 is released, gotta use the github version to get sorting and ToOwned
soa_derive = { git = "https://github.com/lumol-org/soa-derive" }

soa_derive = "0.13.0"
lazy_static = "1.4.0"
codespan-reporting = "0.11.1"
serde = { version = "1.0.59", features = ["derive"] }
79 changes: 77 additions & 2 deletions compiler/src/ast.rs
@@ -2,10 +2,14 @@
This module describes the AST created by the parser.
*/

use crate::api::*;
use crate::{api::*, parser::Scope};

pub trait AstInterpretData {
type AstData: From<u64> + Into<u64>;

fn read(&self, data: u64) -> Self::AstData {
return data.into();
}
}

#[derive(Debug, Clone, Copy, StructOfArray, serde::Serialize, serde::Deserialize)]
@@ -176,7 +180,7 @@ pub enum AstStatement {
Goto, // data: label ; children: statement that is being labelled
Expr, // children: expression
Branch, // children: condition, if_true body, if_false body
Block, // children: statements; maybe this is unnecessary
Block(AstBlock), // children: statements; maybe this is unnecessary
For, // children: start expression, condition, post expression, body
ForDecl, // children: declaration, condition, post expression, body
While, // children: condition expression, body
@@ -187,6 +191,19 @@ pub enum AstStatement {
Continue,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
pub struct AstBlock;

impl AstInterpretData for AstBlock {
type AstData = Scope;
}

impl Into<AstNodeKind> for AstBlock {
fn into(self) -> AstNodeKind {
AstNodeKind::Statement(AstStatement::Block(AstBlock))
}
}

/// A derived declarator. This is the `*const` part of
/// `int *const a`, or the `[3]` part of `int b[3]`
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize)]
@@ -426,4 +443,62 @@ impl AstNodeVec {
},
);
}

pub fn parent_chain<'a>(&'a self, start: usize) -> ParentIter<'a> {
return ParentIter {
ast: self,
index: Some(start),
};
}
}

// Eh maybe this isn't useful, not sure yet. It seemed useful but eh
pub struct ParentIter<'a> {
ast: &'a AstNodeVec,
index: Option<usize>,
}

impl<'a> Iterator for ParentIter<'a> {
type Item = AstNodeRef<'a>;

fn next(&mut self) -> Option<Self::Item> {
let index = self.index?;

let parent = self.ast.parent[index] as usize;
self.index = match parent == index {
true => None,
false => Some(parent),
};

return Some(self.ast.index(index));
}
}
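
The `ParentIter` above walks from a node up to the root and stops when a node is its own parent. A minimal, self-contained sketch of that same walk follows. The `Tree` struct is a hypothetical stand-in for `AstNodeVec` that models only the parent column, and the iterator yields plain indices instead of `AstNodeRef` so that it compiles on its own.

```rust
/// Hypothetical stand-in for `AstNodeVec`: only the parent column is modeled.
/// As in the diff, the root node is assumed to be its own parent.
pub struct Tree {
    pub parent: Vec<u32>,
}

/// Iterator over a node and all of its ancestors, ending at the root.
pub struct ParentIter<'a> {
    tree: &'a Tree,
    index: Option<usize>,
}

impl Tree {
    pub fn parent_chain(&self, start: usize) -> ParentIter<'_> {
        ParentIter {
            tree: self,
            index: Some(start),
        }
    }
}

impl<'a> Iterator for ParentIter<'a> {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        // Stop once the root has already been yielded.
        let index = self.index?;

        // A node that points at itself is the root, so the walk ends after it.
        let parent = self.tree.parent[index] as usize;
        self.index = if parent == index { None } else { Some(parent) };

        Some(index)
    }
}
```

Starting the walk at a leaf yields the leaf first, then each ancestor, and the root last, which is the order a scope lookup would want when searching from the innermost block outward.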

/// Split a mut slice by ranges
pub fn split_by_ranges<'a, T>(
mut values: &'a mut [T],
mut ranges: Vec<core::ops::Range<usize>>,
) -> Vec<(usize, &'a mut [T])> {
let mut out = Vec::new();
ranges.sort_by_key(|r| r.start);

let mut current_index = 0;
for range in ranges {
if range.start >= range.end {
// Range is empty
continue;
}

if current_index > range.start {
panic!("ranges overlapped");
}

let (left, right) = values.split_at_mut(range.end - current_index);
values = right;

out.push((range.start, &mut left[(range.start - current_index)..]));
current_index = range.end;
}

return out;
}
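
To illustrate what `split_by_ranges` gives a caller, here is a small usage sketch. The function body is restated from the diff so the example compiles on its own. The `fill_ranges` helper is hypothetical and not part of this PR; it only shows that each returned slice is paired with the absolute start index of its range, and that the slices are disjoint mutable borrows of the same buffer.

```rust
use core::ops::Range;

/// Restated from the diff above so that this example is self-contained.
pub fn split_by_ranges<'a, T>(
    mut values: &'a mut [T],
    mut ranges: Vec<Range<usize>>,
) -> Vec<(usize, &'a mut [T])> {
    let mut out = Vec::new();
    ranges.sort_by_key(|r| r.start);

    // Absolute index of the first element still held in `values`.
    let mut current_index = 0;
    for range in ranges {
        if range.start >= range.end {
            // Range is empty
            continue;
        }

        if current_index > range.start {
            panic!("ranges overlapped");
        }

        // Cut off everything up to the end of this range, then drop the
        // prefix that lies before the range start.
        let (left, right) = values.split_at_mut(range.end - current_index);
        values = right;

        out.push((range.start, &mut left[(range.start - current_index)..]));
        current_index = range.end;
    }

    out
}

/// Hypothetical helper (not in the PR): writes each element's absolute
/// index into every slot covered by one of the ranges.
pub fn fill_ranges(values: &mut [u32], ranges: Vec<Range<usize>>) {
    for (start, chunk) in split_by_ranges(values, ranges) {
        for (offset, slot) in chunk.iter_mut().enumerate() {
            *slot = (start + offset) as u32;
        }
    }
}
```

Calling `fill_ranges` on an eight-element zeroed buffer with the ranges `5..7`, `1..3`, and `4..4` touches only indices 1, 2, 5, and 6: the ranges are sorted first and the empty range is skipped. Because the returned slices never alias, they could presumably also be handed to separate workers, though whether the PR uses them that way is not shown here.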
2 changes: 1 addition & 1 deletion compiler/src/filedb.rs
@@ -205,7 +205,7 @@ impl<'a> Files<'a> for FileDb {
}
}

#[derive(Debug, Hash, Eq, PartialEq, Clone, Copy, Serialize, Deserialize)]
#[derive(Debug, Hash, Eq, PartialEq, PartialOrd, Ord, Clone, Copy, Serialize, Deserialize)]
#[repr(transparent)]
pub struct Symbol(u32);

26 changes: 7 additions & 19 deletions compiler/src/lib.rs
@@ -31,15 +31,17 @@ pub mod api {
pub use super::error::{Error, ErrorKind, FileStarts, TranslationUnitDebugInfo};
pub use super::filedb::{File, FileDb, Symbol, SymbolTable};
pub use super::format::display_tree;
pub use super::parser::{expand_macros, lex, parse, Token, TokenKind, TokenSlice, TokenVec};
pub use super::parser::{
expand_macros, lex, parse, Scopes, Token, TokenKind, TokenSlice, TokenVec,
};
pub use super::pass::types::{TyDb, TyId, TyQuals};

pub use super::run_compiler_test_case;

pub(crate) use itertools::{Either, Itertools};
pub(crate) use rayon::prelude::*;
pub(crate) use serde::{Deserialize, Serialize};
pub(crate) use std::collections::HashMap;
pub(crate) use std::collections::{BTreeMap, HashMap};

#[cfg(test)]
pub use ntest::*;
@@ -188,25 +190,11 @@ pub fn run_compiler_for_testing(files: &filedb::FileDb, file_id: u32) -> Pipelin
);
out.macro_expansion = StageOutput::Ok(macro_expansion_res.kind.clone());

let mut parsed_ast = run_stage!(parsed_ast, parse(&macro_expansion_res));
let (mut parsed_ast, scopes) = run_stage!(parsed_ast, parse(&macro_expansion_res));
out.parsed_ast = StageOutput::Ok(parsed_ast.iter().map(|n| n.to_owned()).collect());

let scopes =
match pass::declaration_scopes::validate_scopes(&mut parsed_ast, &lexer_res.symbols) {
Ok(s) => s,
Err(e) => {
out.ast_validation = StageOutput::Err(e);
return out;
}
};

if let Err(e) = pass::declaration_types::validate_declarations(&mut parsed_ast, &out.ty_db) {
out.ast_validation = StageOutput::Err(e);
return out;
}

if let Err(e) = pass::expr_types::validate_exprs(&mut parsed_ast) {
out.ast_validation = StageOutput::Err(e);
if let Err(mut e) = pass::validate(&mut parsed_ast, &out.ty_db, &scopes) {
out.ast_validation = StageOutput::Err(e.pop().unwrap());
return out;
}
