C_compiler

Project Overview

The modern C compiler replicates what current compilers(Klang, GDB, Swift) use to speed up C code. This is done through a process of translating C code into simpler languages, built with a specific optimization in mind, going from C->LB->LA->IR->L3->L2->L1 before being translated to assembly and machine code.

Languages

L1

L1 is a barebones language that serves as a middle ground between the back-end compiler and assembly. The language definition is very similar to assembly.

Figure 1: L1 Language Syntax

L2

In L2, the language supports both variables and registers. The focus of this language is to map variables to registers. This is done through graph coloring, where a connection between two variables represents that they must be colored with two different registers. This graph is built using liveness analysis. For every line, we create a list of variables read and stored. If a variable is read, previously stored variables cannot share the same register. If graph coloring is not possible with 16 registers, we will "spill" the variable name, allocating the variable's value onto the stack. L2's syntax is built to support this:

Figure 2: L2 Language Syntax

L3

L3 is about instruction selection, combining multiple lines of code into one for optimizations. To do so, L3 merges instructions into large tree expressions:

Figure 3: L3 Tree

Then, we merge the tree to simplify the total number of expressions and convert them to L2 instructions. Thus simplifying the overall number of instructions.

Figure 4: L3 merging trees

This is supported by L3's simpler syntax:

Figure 5: L3 language syntax

IR

IR is not part of the optimization, but rather a simplified view between the front-end and back-end. This is reflected in the syntax, as its forceful use of basic blocks makes programs easier to read during debugging.

Figure 6: IR language syntax

However, there is a problem going from IR to L3, due to IR's unique syntax. This is solved with control flow graph linearization. Naively we could have each false branch fall through to a force branch. However, we implemented CFG which utilized Markov chains which maximizes fall-through links while minimizing jumps.

LA

An underlying problem throughout the previous compilers was how to differentiate a data value from a memory location. Previously, data members ended with a "1" bit while memory locations ended with a "0" bit. In LA, values can only be data members or an array or tuple of data members. In addition to this quick fix, we also implemented automatic tensor errors and type declarations.

Figure 7: LA language syntax

LB

Finally, in LB, we added scopes, which delimit variable names and allow for the creation of "if" and "while" statements.

Figure 8: LB language syntax

Conclusion

In the end, our compiler was able to compete with up to 90% of the modern-day compilers. The reason why it was slower was due to modern compilers further optimizing spilling.

All pictures are credited to Professor Simone Campanoni

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
ir_compiler		ir_compiler
l1_compiler		l1_compiler
l2_compiler		l2_compiler
l3_compiler		l3_compiler
la_compiler		la_compiler
lb_compiler		lb_compiler
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

C_compiler

Project Overview

Languages

L1

L2

L3

IR

LA

LB

Conclusion

About

Releases

Packages

Languages

KevinzChen04/C_compiler

Folders and files

Latest commit

History

Repository files navigation

C_compiler

Project Overview

Languages

L1

L2

L3

IR

LA

LB

Conclusion

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages