A compiler is a translator from a language that humans can understand (in our case Minijava+) to a language that only computers understand (Java bytecode). The compiler we built analyzes and parses a .java file given as an argument. It checks that the file is valid according to the language grammar and, once it has read the entire file and decided that it is valid, starts generating the bytecode. Our compiler also emits warnings when the user does something odd (such as initializing variables without using them later). Finally, we also perform a data-flow analysis to prove certain assertions. Here is a small example of how a program is compiled:
Writing a compiler is helpful because it shows us how the computer works. After writing one, we understand much better how to optimize code and how the computer will interpret it. We will explain step by step how we built this compiler, from the lexer through code generation to control flow graphs. This compiler is useful because it is dedicated to a specific language, Minijava+, in which we can still write nice programs.
This program would first be transformed into a list of tokens:
CLASS Rationnal LACCO PUBLIC
STATIC VOID MAIN … denom RPAREN SEMICOLON RACCO RETURN 0 SEMICOLON RACCO RACCO EOF
This is done as follows: we read the file character by character, recognizing reserved keywords such as class, static … Each time we recognize an allowed character or sequence of characters, we create the corresponding Token and add it to the list. When we recognize patterns such as comments, we do not add them to the list of Tokens.
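The character-by-character scan described above can be illustrated with a minimal sketch. It is not the project's actual lexer: the class name `LexerSketch`, the two lookup tables, and the choice to represent tokens as plain strings are assumptions made for illustration. Only the token names that appear in the listing above are used, and only `//` line comments are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LexerSketch {
    // Hypothetical keyword table; only a few reserved words are listed.
    private static final Map<String, String> KEYWORDS = Map.of(
        "class", "CLASS", "public", "PUBLIC", "static", "STATIC",
        "void", "VOID", "main", "MAIN", "return", "RETURN");

    // Hypothetical single-character tokens, named as in the listing above.
    private static final Map<Character, String> PUNCTUATION = Map.of(
        '{', "LACCO", '}', "RACCO", '(', "LPAREN", ')', "RPAREN", ';', "SEMICOLON");

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace produces no token
            } else if (src.startsWith("//", i)) {
                // Line comment: skip to the end of the line, emit nothing.
                while (i < src.length() && src.charAt(i) != '\n') i++;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                // Reserved words map to their token; anything else is an identifier.
                tokens.add(KEYWORDS.getOrDefault(word, word));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));   // integer literal
            } else if (PUNCTUATION.containsKey(c)) {
                tokens.add(PUNCTUATION.get(c));
                i++;
            } else {
                // The only error the lexer reports: a character it does not know.
                throw new IllegalArgumentException(
                    "Unexpected character '" + c + "' at offset " + i);
            }
        }
        tokens.add("EOF");
        return tokens;
    }
}
```

With these assumptions, `LexerSketch.tokenize("class Rationnal { return 0; }")` yields the list CLASS, Rationnal, LACCO, RETURN, 0, SEMICOLON, RACCO, EOF. A real lexer would also attach a position to every token.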
With this, the compiler knows the meaning of each word from the original file (.java). Then we parse those tokens and create the appropriate classes and methods. It is in this part that we find syntax errors. After this we have a complete AST of our program. We then analyze this AST and find errors such as calls to unknown methods or arguments, cycles in extends, and so on. We continue with type checking, which finds errors such as assigning a string value to an integer variable. Finally we generate the bytecode, and we can run the generated file with java. After this the compiler generates a control flow graph representing the program, in which we can verify that it behaves as we intended.
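As an illustration of one of the analysis checks mentioned above, here is a minimal sketch of detecting a cycle in extends. The representation is an assumption: the hypothetical map `parentOf` associates each class name with the name of the class it extends, and classes without a parent are absent from the map. The project's own data structures may differ.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ExtendsCheck {
    // Returns true if following "extends" links from some class
    // leads back to a class that was already visited.
    static boolean hasInheritanceCycle(Map<String, String> parentOf) {
        for (String start : parentOf.keySet()) {
            Set<String> seen = new HashSet<>();
            String current = start;
            while (current != null) {
                if (!seen.add(current)) {
                    return true;               // visited twice: cycle found
                }
                current = parentOf.get(current); // null when there is no parent
            }
        }
        return false;
    }
}
```

For example, a map containing A extends B and B extends A is reported as cyclic, while B extends A and C extends B is not.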
Here, we create the AST of our program. Using the output of the lexer, we create the nodes of the tree: classes, methods, arguments, etc. The grammar of Minijava+ tells us which syntax is valid and which is not. For example, when we see the keyword ‘CLASS’ and it is not followed by an identifier, we know this is not a valid Minijava+ program, so we can stop parsing right away. It is also during this phase that we record the position of each token, which is really helpful for debugging because it tells us on which line and in which column an error occurs. We also had to transform our grammar into:
E  ::= P4 || E | P4 && E | P4
P4 ::= P3 == P4 | P3 < P4 | P3
P3 ::= P2 + P3 | P2 - P3 | P2
P2 ::= P1 * P2 | P1 / P2 | P1
P1 ::= P0 | " " P0 | true P0 | false P0 | Identifier P0 | this P0 | new Int[E] P0 | new Identifier() P0 | (E) P0 | !P1
P0 ::= [E] P0 | .length P0 | .Identifier ( ( E ( , E )* )? ) P0 | ε
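To show how a layered grammar like the one above maps onto a recursive-descent parser, here is a minimal sketch with one method per rule. It is a simplification under stated assumptions, not the project's parser: tokens are plain strings, the output is a fully parenthesized string rather than an AST, and P1 is reduced to integer literals, true, false, parentheses and negation (the P0 suffixes, identifiers, this and new are omitted). The class name `ExprParserSketch` is hypothetical.

```java
import java.util.List;

final class ExprParserSketch {
    private final List<String> tokens;
    private int pos = 0;

    private ExprParserSketch(List<String> tokens) { this.tokens = tokens; }

    static String parse(List<String> tokens) {
        ExprParserSketch p = new ExprParserSketch(tokens);
        String tree = p.parseE();
        if (p.pos != tokens.size()) {
            throw new IllegalStateException("Unexpected token " + p.peek());
        }
        return tree;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }
    private String next() { String t = peek(); pos++; return t; }

    // E ::= P4 || E | P4 && E | P4
    private String parseE() {
        String left = parseP4();
        if (peek().equals("||") || peek().equals("&&")) {
            String op = next();
            return "(" + left + " " + op + " " + parseE() + ")";
        }
        return left;
    }

    // P4 ::= P3 == P4 | P3 < P4 | P3
    private String parseP4() {
        String left = parseP3();
        if (peek().equals("==") || peek().equals("<")) {
            String op = next();
            return "(" + left + " " + op + " " + parseP4() + ")";
        }
        return left;
    }

    // P3 ::= P2 + P3 | P2 - P3 | P2
    private String parseP3() {
        String left = parseP2();
        if (peek().equals("+") || peek().equals("-")) {
            String op = next();
            return "(" + left + " " + op + " " + parseP3() + ")";
        }
        return left;
    }

    // P2 ::= P1 * P2 | P1 / P2 | P1
    private String parseP2() {
        String left = parseP1();
        if (peek().equals("*") || peek().equals("/")) {
            String op = next();
            return "(" + left + " " + op + " " + parseP2() + ")";
        }
        return left;
    }

    // Simplified P1: integer literal | true | false | (E) | !P1
    private String parseP1() {
        String t = next();
        if (t.equals("(")) {
            String inner = parseE();
            if (!next().equals(")")) throw new IllegalStateException("Expected )");
            return inner;
        }
        if (t.equals("!")) return "(!" + parseP1() + ")";
        if (t.equals("true") || t.equals("false") || t.matches("[0-9]+")) return t;
        throw new IllegalStateException("Unexpected token " + t);
    }
}
```

With this sketch, `ExprParserSketch.parse(List.of("1", "+", "2", "*", "3"))` returns `(1 + (2 * 3))`, so multiplication binds tighter than addition. Because each rule recurses on its right-hand side, a direct transcription like this one nests to the right: `1 - 2 - 3` comes out as `(1 - (2 - 3))`. A parser that needs left associativity has to rebuild the tree or loop over the operators instead.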
As explained before, the lexer is the first part of the compiler. It translates the file into a list of tokens. At this stage it reports no errors unless it encounters an unexpected character.
There is only one place where we have to look two tokens ahead (LL(2) parsing): when we parse assignment statements or expressions.
Every other place needs only one token of lookahead. We have also written a pretty printer that takes a list of tokens with positions and prints the corresponding Minijava+ program in a readable way. Note that parsing a program and then giving the result to the pretty printer returns the original program.
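The two-token lookahead mentioned above can be sketched as follows. The exact rule that requires it is not spelled out here, so this example assumes the common case where both an assignment and another construct begin with an identifier, and only the second token tells them apart. The token names `IDENTIFIER`, `EQSIGN`, `DOT` and the class `LookaheadSketch` are hypothetical.

```java
import java.util.List;

final class LookaheadSketch {
    // Returns the token k positions after pos, or "EOF" past the end.
    static String peek(List<String> tokens, int pos, int k) {
        int i = pos + k;
        return i < tokens.size() ? tokens.get(i) : "EOF";
    }

    // The first token alone cannot decide: the parser must also
    // inspect the second token before choosing a production.
    static boolean startsAssignment(List<String> tokens, int pos) {
        return peek(tokens, pos, 0).equals("IDENTIFIER")
            && peek(tokens, pos, 1).equals("EQSIGN");
    }
}
```

A parser built this way would call `startsAssignment` before committing to the assignment rule, and fall back to the other alternative when it returns false.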
but it is a good base, as it will reject any invalid minijava+...