qccompiler

Design and implementation of a compiler from scratch, including:

  1. Lexical analysis
  2. Syntactic analysis
    1. Abstract syntax tree (AST) construction
  3. Semantic analysis
  4. Code generation

It’s written in C and it compiles input files written in qC, a small subset of the language ANSI C (C89/C90). The generated code is in C, but very close to Assembly.

Features of qC

  • use variables and literals of types character and integer (both with signal)
  • function declarations/calls, with recursion support
  • pointers to variables and literals and to other pointers
  • unidimensional arrays for integers, characters or pointers
  • literals of type string
  • arithmetic and logic expressions (check language grammar)
  • simple relational operations
  • pointer operations
  • assign operations
  • control operations (if-else and while)
  • output operations (simplified version of printf)
  • conversion between integers and strings - operations itoa and atoi
  • comments of type /* … */

Tokens

Token Meaning
ID alphameric case sensitive sequences beginning with a letter where ‘_’ is also allowed
INTLIT sequence of digits without unnecessary left pad zeros
CHRLIT single character (except newline or single quote) or escape sequence (\n, \t, \, \‘, \” and \0) between single quotes
STRLIT sequence of characters (except newline or single quote) and/or escape sequences between double quotes
AMP &
AND &&
ASSIGN =
AST *
ATOI atoi
CHAR char
COMMA ,
DIV /
ELSE else
EQ ==
GE >=
GT >
IF if
INT int
ITOA itoa
LBRACE {
LE <=
LPAR (
LSQ [
LT <
MINUS -
MOD %
NE !=
NOT !
OR `
PLUS +
PRINTF printf
RBRACE }
RETURN return
RPAR )
RSQ ]
SEMI ;
WHILE while
RESERVED C keywords not used in qC

Grammar (EBNF notation)

Start → (FunctionDefinition | FunctionDeclaration | Declaration) {FunctionDefinition | FunctionDeclaration | Declaration}
FunctionDefinition → TypeSpecifier FunctionDeclarator LBRACE {Declaration} {Statement} RBRACE
FunctionDeclaration → TypeSpecifier FunctionDeclarator SEMI
FunctionDeclarator → {AST} ID LPAR [ParameterList] RPAR
ParameterList → ParameterDeclaration {COMMA ParameterDeclaration}
ParameterDeclaration → TypeSpecifier {AST} ID
Declaration → TypeSpecifier Declarator {COMMA Declarator} SEMI
TypeSpecifier → CHAR | INT
Declarator → {AST} ID [LSQ INTLIT RSQ]
Statement → [Expression] SEMI
Statement → LBRACE {Statement} RBRACE
Statement → IF LPAR Expression RPAR Statement [ELSE Statement]
Statement → WHILE LPAR Expression RPAR Statement
Statement → RETURN Expression SEMI
Expression → Expression ASSIGN Expression
Expression → Expression (AND | OR) Expression
Expression → Expression (EQ | NE | LT | GT | LE | GE) Expression
Expression → Expression (PLUS | MINUS | AST | DIV | MOD) Expression
Expression → (AMP | AST | PLUS | MINUS | NOT) Expression
Expression → Expression LSQ Expression RSQ
Expression → ID LPAR [Expression {COMMA Expression}] RPAR
Expression → (PRINTF | ATOI) LPAR Expression RPAR
Expression → ITOA LPAR Expression COMMA Expression RPAR
Expression → ID | INTLIT | CHRLIT | STRLIT | LPAR Expression RPAR

Usage

$ make
$ ./qccompiler [OPTIONS] < input.qc

Options:

-t     print the abstract syntax tree and stop after syntatic analysis.
-s     print the symbol table and stop after semantic analisys.
-c     allways compile the program (unless errors occur).
-o     allways compile the program (unless errors occur) and print compiled program to file.

If both flags -s and -t are set, the proccess stops after the semantic analysis. The result of the compilation is written to the file `result.c

Dependencies

  • flex
  • yacc
  • gcc

Related