COOL compiler by DiazRock

This is a compiler for the Classroom Object-Oriented Language (COOL), written in Python 3.8. It takes a .cl file and translates it into a .mips file, a set of MIPS32 low-level instructions that you can run with a simulator such as SPIM on Linux. To learn about COOL, you can read the manual. In one line: it is a fairly simple language that has all the features of an object-oriented, procedural programming language.

It consists of four phases or components:

  • The lexical phase
  • The syntactic phase
  • The semantic analysis
  • The generation of CIL and MIPS code

The lexical phase

The input to this phase is a .cl file. To get the set of COOL tokens, we use ply.lex, a framework that creates a data structure to store those tokens. After building it, we tokenize the input file: we split it into tokens, and this is the output the lexer passes to the next component.

    # This is the code that tokenizes a .cl file
    def tokenizer(stream_input):
        global readjust_col
        readjust_col = 0
        lexer.input(stream_input)
        token_list = []
        lexer.lineno = 1
        real_col = {}
        for tok in lexer:
            # Remember each token's real column in the source, for error reporting
            real_col.update({str(tok): find_column(stream_input, tok)})
            token_list.append(tok)

        return errors, token_list, real_col
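
For context, this is roughly how the ply.lex rules that feed this tokenizer are declared. The token set below is a minimal illustrative sketch, not the project's actual COOL rules:

    import ply.lex as lex

    # Minimal illustrative ply.lex specification (not the project's real token set)
    tokens = ('INT_CONST', 'PLUS', 'ID')

    t_PLUS = r'\+'    # simple tokens are declared as regex strings
    t_ignore = ' \t'  # characters to skip silently

    def t_INT_CONST(t):  # tokens that need an action are declared as functions
        r'\d+'
        t.value = int(t.value)
        return t

    def t_ID(t):
        r'[a-z][a-zA-Z0-9_]*'
        return t

    def t_newline(t):  # keep lexer.lineno accurate for error reporting
        r'\n+'
        t.lexer.lineno += len(t.value)

    def t_error(t):
        t.lexer.skip(1)

    lexer = lex.lex()  # build the lexer from the rules in this module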

The syntactic phase

The syntax analyzer receives the token set as input. Here we have a context-free grammar (CFG) that describes the syntactic structure of a sequence of COOL tokens. A CFG can be seen as a tree generator, where node symbols can be terminals (the tokens of the input) or non-terminals, and branches are created when we visit a node with a non-terminal in its structure. A rule that produces a new node from a non-terminal is called a production.

The CFG produces a derivation parse tree. This tree carries a lot of info that is useless in the following phases. So, to simplify the derivation tree, we can attach attributes to the productions of the CFG and define from them a compressed version of the derivation, called the Abstract Syntax Tree (AST).

    def p_feature_list(self, p):
        """
        features_list : features_list feature SEMICOLON
                      | feature SEMICOLON
        """
        if len(p) == 3:
            # Base case: a single feature starts the accumulator
            p[0] = {'methods': [], 'attributes': []}
            key = 'methods' if type(p[1]) is NodeClassMethod else 'attributes'
            p[0][key] += [p[1]]
        else:
            # Recursive case: append the new feature to the accumulated lists
            key = 'methods' if type(p[2]) is NodeClassMethod else 'attributes'
            p[1][key] += [p[2]]
            p[0] = p[1]

This is an example of translating the theory above into Python code, using ply.yacc, a tool that uses bottom-up parsing to build a correct derivation of the trees produced by the CFG defined for COOL. By the way, I wrote my own CFG analyzer, using an LR automaton and other fun things, which you can see here.
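
For reference, this is roughly how ply.yacc turns such rules into a parser that emits AST nodes instead of a raw derivation tree. It is only a sketch of the wiring, reusing the toy lexer from the lexical phase above, not the project's grammar:

    import ply.yacc as yacc
    # Reuses `tokens` and `lexer` from the toy ply.lex sketch in the lexical phase

    def p_expr_plus(p):
        """expr : expr PLUS term"""
        p[0] = ('plus', p[1], p[3])  # the attached attribute builds an AST node

    def p_expr_term(p):
        """expr : term"""
        p[0] = p[1]

    def p_term_leaf(p):
        """term : INT_CONST
                | ID"""
        p[0] = ('leaf', p[1])

    def p_error(p):
        print('Syntax error at', p)

    parser = yacc.yacc()  # builds the LALR tables from the p_* rules
    ast = parser.parse('1 + x + 2', lexer=lexer)  # ('plus', ('plus', ...), ('leaf', 2))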

The final output of the syntax analyzer component is the AST, which we use as input in the semantic and code generation phases.

The semantic phase

We could jump from syntax analysis straight to code generation, but then we wouldn't know whether the operations used in our input file are correct. For example, is it legal to execute 4 + 'dump'? If we have an object of type B that calls a function 'foo', we need to know whether that function is defined in B or in a parent of B for the call to be correct.

So, in this phase, we have four important concepts to check:

  • The existence of the types used in the AST
  • That the definitions of those types, and of their features, make sense under the COOL standards
  • The correctness of the inheritance hierarchy
  • The consistency and correctness of the defined and used operations (a sketch of this check follows the list)
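
As a hint of what the last check looks like, here is a minimal, standalone sketch of the idea: arithmetic in COOL is only defined on Int operands. The helper below is hypothetical, for illustration; the real check lives in the TypeCheckerVisitor described later:

    # Hypothetical helper (not the project's code): the fourth check, in miniature.
    # Arithmetic operators in COOL only accept Int operands.
    def check_arith(op: str, left_type: str, right_type: str, pos: tuple):
        if left_type != 'Int' or right_type != 'Int':
            return ('error', "operator '{}' expects Int operands, got {} and {} at {}"
                             .format(op, left_type, right_type, pos))
        return 'Int'  # the static type of the whole expression

    print(check_arith('+', 'Int', 'String', (1, 3)))  # rejects 4 + 'dump'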

Node Visitor pattern

To achieve the announced goals, we walk the AST using the Visitor pattern.

    class NodeVisitor:
        def __init__(self, programContext):
            self.programContext = programContext

        def visit(self, node: Node, **args):
            if isinstance(self, TypeCheckerVisitor):
                if issubclass(type(node), NodeBinaryOperation):
                    return self.visit_NodeBinaryOperation(node, **args)

            # Dispatch to visit_<clsname>, e.g. visit_NodeProgram
            visitor_method_name = 'visit_' + node.clsname
            visitor = getattr(self, visitor_method_name, self.not_implemented)
            return visitor(node, **args)  # Return the new context result from the visit

        def not_implemented(self, node: Node, **args):
            raise Exception('Not implemented visit_{} method'.format(node.clsname))

Each traversal of the AST is a class that inherits from NodeVisitor. We have 4 classes, i.e., 4 traversals, executed in the following order (a driver sketch follows the list):

  1. TypeCollectorVisitor
  2. TypeBuilderVisitor
  3. TypeInheritanceVisitor
  4. TypeCheckerVisitor
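
Running the whole semantic phase then amounts to something like the driver below. This is a hypothetical sketch of the orchestration, assuming the AST from the parser and a fresh programContext; the actual entry point may differ:

    # Hypothetical driver: each traversal validates and enriches the shared context
    def run_semantic_phase(ast, programContext):
        for VisitorClass in (TypeCollectorVisitor,
                             TypeBuilderVisitor,
                             TypeInheritanceVisitor,
                             TypeCheckerVisitor):
            errors = VisitorClass(programContext).visit(ast)
            if errors:  # stop at the first traversal that reports errors
                return errors
        return []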

Data structures for a traversal

ProgramContext: in each traversal, we update an object of type globalContext, named programContext, which stores all the info needed to keep the context of the AST consistent. That way, every time programContext changes, it does so through some method defined in its class, and each such method checks, before the update, whether the operation is valid.

    def createType(self, node: NodeClass):
        return interceptError(
            validationFunc=lambda: not node.idName in self.types,
            errorOption='repeated class' if not node.idName in self.basics
                        else 'repeated class basic',
            idName=node.idName,
            row_and_col=(node.line, node.column)
        ) or self.types.update({
            node.idName: Type(idName=node.idName,
                              attributes={},
                              methods={},
                              parent=node.parent,
                              builtIn=node.idName in ['String', 'Bool', 'Int'])
        })

This is the createType function from the globalContext class. It intercepts any mistake in the operation of creating a new type.

    class TypeCollectorVisitor(NodeVisitor):

        def visit_NodeProgram(self, node: NodeProgram):
            errors = []
            for nodeClass in node.class_list:
                result = self.visit(nodeClass)
                if type(result) is error:
                    # Early return: we only need one error, this is panic mode!
                    errors.append(result)
                    return errors
            return errors

        def visit_NodeClass(self, node: NodeClass):
            # When we create a type, we store it in the context, if there are no errors
            return self.programContext.createType(node)

And this is an example of how we use it in a visitor class. That's the idea behind all the validation the semantic phase does: we execute operations only after checking that they are legal.

The error class

To track errors, we keep a map of source positions built in the lex component, and we use it every time we detect an error. The structure of the errors lives inside the 'utils' folder. Let's reuse the call we showed before:

    interceptError(
        validationFunc=lambda: not node.idName in self.types,
        errorOption='repeated class' if not node.idName in self.basics
                    else 'repeated class basic',
        idName=node.idName,
        row_and_col=(node.line, node.column)
    )

If you read the interceptError function in the errors.py file, you will see its declaration. It's a pretty fun feature :).
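
To see why the `or` trick in createType works, here is a guess at the general shape of interceptError. This is an assumption for illustration, not the real declaration (read errors.py for that): it returns a falsy value when the validation passes, so the right-hand side of the `or` runs, and a truthy error object when it fails, which short-circuits the update.

    # Assumed shape, for illustration only; see errors.py for the real declaration
    def interceptError(validationFunc, errorOption, **info):
        if validationFunc():
            return None                    # falsy: lets `... or update(...)` run
        return error(errorOption, **info)  # truthy error object (constructor shape assumed)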

Environment

Environment is a dictionary where you can store every object name that appears in the AST, together with its type. We use it when evaluating an expression and its possible scopes.
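
Conceptually, it behaves like a chain of dictionaries, one per scope, where a lookup climbs to the enclosing scope when a name is not found locally. The class below is a generic sketch of that idea, not the project's actual implementation:

    # Generic sketch of a scoped environment (illustrative, not the real class)
    class Environment:
        def __init__(self, parent=None):
            self.names = {}        # object name -> type name, e.g. 'x' -> 'Int'
            self.parent = parent   # enclosing scope, if any

        def define(self, name, type_name):
            self.names[name] = type_name

        def lookup(self, name):
            if name in self.names:
                return self.names[name]
            if self.parent is not None:
                return self.parent.lookup(name)  # climb to the outer scope
            return None

    # Usage: a 'let' expression opens a child scope over the method's scope
    outer = Environment()
    outer.define('self', 'Main')
    inner = Environment(parent=outer)
    inner.define('x', 'Int')
    assert inner.lookup('self') == 'Main'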

The final output

This is a validation phase; the key task is to check whether the input file makes incorrect use of COOL. However, to help us in the following phase, I also produce a mapExprWithResult dictionary, which assigns each expression we find to its computed type. So, if there are no errors, this is the output to the next phase, besides the AST.

Code generation

After we check the consistency of the AST, we traverse it again to get an equivalent representation in MIPS32. But translating from the AST directly to MIPS32 is hard, so we go through an intermediate representation called Cool Intermediate Language (CIL): we map the AST to CIL, and then write the .mips file from CIL. In this phase, we need to be aware of the conventions of the MIPS32 assembly language and of how it communicates with the low-level architecture.
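
To make the two-step translation concrete, here is a toy lowering of one hypothetical CIL instruction, z = x + y, where the locals live at known offsets from the frame pointer. The instruction shape, the offsets, and the register choices are illustrative assumptions, not the project's actual CIL:

    # Toy lowering of a hypothetical CIL instruction z = x + y to MIPS32,
    # with x, y, z at fixed offsets from the frame pointer ($fp)
    def lower_plus(dest_off, left_off, right_off):
        return [
            'lw $t0, {}($fp)'.format(left_off),   # load left operand from the stack frame
            'lw $t1, {}($fp)'.format(right_off),  # load right operand
            'add $t2, $t0, $t1',                  # t2 = t0 + t1
            'sw $t2, {}($fp)'.format(dest_off),   # store the result into z's slot
        ]

    print('\n'.join(lower_plus(-12, -4, -8)))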

How to run the code

You must have Python 3.8 or later. Follow these steps:

  • pip install -r requirements.txt
  • ./coolc.sh, passing the .cl file to compile as a parameter

An important To Do

One of my todos is to create a smoother installation process: just make install, and you will have a CLI that compiles .cl files.

The tests

You can run the tests by executing make test. You will see how it works.
