HA-2 for the Compilers Construction course
Java Maven Project
Each token is represented by a token type and a lexeme, which is the string representation of the token. We divide all tokens into 12 categories:
- Literals can be strings or numbers. The only requirement for a string is that it starts and ends with the same delimiter, either a single quote or a triple quote. Numbers can take several forms: integer (a sequence of digits), hex (0x followed by a sequence of digits or letters A-F in either case), binary (0b followed by a sequence of 0s and 1s), and float (two sequences of digits separated by a dot). All of these except binary numbers can end with an exponent (like 4e3). Float numbers can also be marked explicitly with an 'f' or 'F' suffix; if an exponent is present, the suffix must come after it (like 4e3f).
- Delimiters are space, tab, or end-of-line characters.
- Keywords are the special words of the Kotlin language (hard keywords, soft keywords, modifier keywords, and special identifiers combined).
- Special symbols are the operators and other special symbols of the Kotlin language.
- Opening: the opening parenthesis '('.
- Opening curly: the opening curly brace '{'.
- Opening square: the opening square bracket '['.
- Closing: the closing parenthesis ')'.
- Closing curly: the closing curly brace '}'.
- Closing square: the closing square bracket ']'. Each bracket kind gets its own category because discrete bracket categories make the syntactic analyzer easier to write.
- Identifiers are the names of classes, variables, methods, or parameters.
- Comments.
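As a minimal sketch (not the project's actual code), the token model and the number-literal rules above could be expressed in Java 17 as follows. The enum constant names and the `Token`, `NumberKind`, `classifyNumber` and `bracketsBalanced` names are hypothetical. Two simplifications are assumed: the exponent and the `f`/`F` suffix are handled only for decimal integers and floats, since `e` and `f` are also valid hex digits, and a decimal number with an exponent but no suffix (like 4e3) is kept as an integer form. The bracket check illustrates the rationale for separate bracket categories: a closer can be matched to its opener by token type alone.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

public class LexerSketch {

    // The 12 token categories (constant names are illustrative).
    enum TokenType {
        LITERAL, DELIMITER, KEYWORD, SPECIAL_SYMBOL,
        OPENING, OPENING_CURLY, OPENING_SQUARE,
        CLOSING, CLOSING_CURLY, CLOSING_SQUARE,
        IDENTIFIER, COMMENT
    }

    // A token is a category plus its lexeme (the matched source text).
    record Token(TokenType type, String lexeme) { }

    enum NumberKind { INTEGER, HEX, BINARY, FLOAT }

    // Optional exponent such as e3, E-2 or e+10.
    private static final String EXP = "([eE][+-]?[0-9]+)?";

    private static final Pattern BINARY = Pattern.compile("0b[01]+");
    // Assumption: no exponent or f/F suffix for hex, since e and f are hex digits.
    private static final Pattern HEX = Pattern.compile("0x[0-9a-fA-F]+");
    // Either digits.digits (optional exponent, optional suffix) or digits with a
    // mandatory f/F suffix; in both cases the suffix comes after the exponent.
    private static final Pattern FLOAT = Pattern.compile(
            "[0-9]+\\.[0-9]+" + EXP + "[fF]?|[0-9]+" + EXP + "[fF]");
    private static final Pattern INTEGER = Pattern.compile("[0-9]+" + EXP);

    // Returns the number form of a lexeme, or empty if it is not a number literal.
    static Optional<NumberKind> classifyNumber(String lexeme) {
        if (BINARY.matcher(lexeme).matches()) return Optional.of(NumberKind.BINARY);
        if (HEX.matcher(lexeme).matches()) return Optional.of(NumberKind.HEX);
        if (FLOAT.matcher(lexeme).matches()) return Optional.of(NumberKind.FLOAT);
        if (INTEGER.matcher(lexeme).matches()) return Optional.of(NumberKind.INTEGER);
        return Optional.empty();
    }

    // Maps each closing bracket category to the opening category it must match.
    private static final Map<TokenType, TokenType> MATCHING = Map.of(
            TokenType.CLOSING, TokenType.OPENING,
            TokenType.CLOSING_CURLY, TokenType.OPENING_CURLY,
            TokenType.CLOSING_SQUARE, TokenType.OPENING_SQUARE);

    // Stack-based check that every closer matches the most recent unmatched opener.
    static boolean bracketsBalanced(List<Token> tokens) {
        Deque<TokenType> open = new ArrayDeque<>();
        for (Token t : tokens) {
            switch (t.type()) {
                case OPENING, OPENING_CURLY, OPENING_SQUARE -> open.push(t.type());
                case CLOSING, CLOSING_CURLY, CLOSING_SQUARE -> {
                    if (open.isEmpty() || open.pop() != MATCHING.get(t.type())) {
                        return false;
                    }
                }
                default -> { }
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        for (String s : List.of("42", "0x1F", "0b101", "3.14", "4e3", "4e3f", "1.5e-3F", "4fe3")) {
            System.out.println(s + " -> "
                    + classifyNumber(s).map(NumberKind::name).orElse("not a number"));
        }
        List<Token> tokens = List.of(
                new Token(TokenType.IDENTIFIER, "f"),
                new Token(TokenType.OPENING, "("),
                new Token(TokenType.OPENING_SQUARE, "["),
                new Token(TokenType.CLOSING_SQUARE, "]"),
                new Token(TokenType.CLOSING, ")"));
        System.out.println("balanced: " + bracketsBalanced(tokens));
    }
}
```

Note that "4fe3" is rejected because the suffix precedes the exponent, matching the rule that the suffix must follow the exponent.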
- Put the Kotlin source code to be analyzed into in.txt.
- Run one of the two lexer configurations:
  - console_lexer.jar: prints the tokens to the console, one token per press of Enter.
  - lexer_to_file.jar: writes all tokens to the out.txt file at once.
// To run: "java -jar console_lexer.jar" or "java -jar lexer_to_file.jar"
// You can also run the project from an IDE as a Java Maven project. The main class is "Main.java"