项目作者: Omar-Khaled2198

项目描述 :
A mini compiler for C language in Java
高级语言: Java
项目地址: git://github.com/Omar-Khaled2198/Mini-Compiler.git
创建时间: 2019-03-14T23:31:08Z
项目社区:https://github.com/Omar-Khaled2198/Mini-Compiler

开源协议:

下载


Mini C Compiler

Compilers can be decomposed into several phases, each of which converts one form of source program into another
There are different phases of compiler as follow:

  • Lexical analysis (Implemented)
  • Syntax analysis (Implemenetd)
  • Semantic analysis
  • Intermediate code generation
  • Code optimization
  • Code generation

Lexical analysis

Source program is scanned to read the stream of characters and those characters are grouped to form a sequence called lexemes which produces token as output.

  • Token: Token is a sequence of characters that represent lexical unit, which matches with the pattern, such as keywords, operators, identifiers etc.
  • Lexeme: Lexeme is instance of a token i.e., group of characters forming a token.
  • Pattern: Pattern describes the rule that the lexemes of a token takes. It is the structure that must be matched by strings.
    ```


    SINGLE_COMMENT \/\/.+
    MULTI_COMMENT \/*(.|\n)?\\/ ?
    INTEGRAL_LITERAL \b0|[1-9][0-9]\b
    FlOAT_LITERAL [0-9]+.[0-9]+
    CHAT_LITERAL \’[a-zA-z0-9]\’
    STRING_LITERAL \”.
    \”

```

Syntax Analysis

Syntax analysis is the second phase of compiler which is also called as parsing.

  • Parser converts the tokens produced by lexical analyzer into a tree like representation called parse tree.
  • A parse tree describes the syntactic structure of the input.

Parser follow production rules (Grammer) defined by means of context free grammar (CFG).


Grammer ex:

  1. program -> decl_list
  2. decl_list -> decl decl_list'
  3. decl_list' -> decl decl_list' | ϵ
  4. decl -> var_decl | fun_decl
  5. var_decl -> type_spec IDENT var_decl'
  6. var_decl' -> ; | [ ] ;
  7. type_spec -> VOID
  8. | BOOL
  9. | INT
  10. | FLOAT
  11. ...
  12. ...
  13. ...

The way the production rules are implemented divides parsing into two types : top-down parsing and bottom-up parsing. I use Recursive descent parser is a kind of top-down parser.

Example

Input

  1. int main(){
  2. int x;
  3. }

Genereted Tokens from Lexical analyzer

  1. <INT>: int
  2. <ID>: main
  3. <LEFT_ROUND_B>: (
  4. <RIGHT_ROUND_B>: )
  5. <LEFT_CURLY_B>: {
  6. <INT>: int
  7. <ID>: x
  8. <SEMICOLON>: ;
  9. <RIGHT_CURLY_B>: }

Genereted Parse Tree from Syntax Analyzer
Parse Tree