Due: 5:00pm, Tuesday, November 25. Value: 50 pts.
The following three files implement most of a compiler to translate a program written in a simple language called Minc into ARM assembly language.
main.c | Loads a file and builds an internal representation before calling generate_code. You should not modify this file. |
codegen.c | Defines generate_code, which, given a representation of a program using a C struct, should print the corresponding ARM assembly language. This is the file you should modify; you should not add or modify other files. |
codegen.h | A header file defining structures and functions useful for both main.c and codegen.c. |
Your assignment is to complete codegen.c to handle the entirety of the Minc language described below; you should not add or modify other files, and your submitted solution should be entirely in codegen.c.
The name Minc comes from minimal C, as Minc is a very basic C-style language. It supports just four statement types:
- assignment statements (ex. “i = 3;”)
- print statements (ex. “print cur;”)
- while statements (ex. “while (i < 5) { … }”)
- if statements with no else (ex. “if (i < 5) { … }”)
Each of these statements includes an arithmetic expression. Minc expressions are very constrained:
- Each expression involves at most one operation; for example, “a + b + c” is erroneous, since it involves two operations (both additions).
- The only operations supported are +, -, <<, >>, ==, <, >, <=, >=, and !=.
Despite these severe constraints, one can still write a reasonable program. The following two basic examples illustrate this.
Count down 10 to 1: | Greatest common divisor: |
cur = 10; | b = 40; |
The following links are to legal Minc programs:
count.mc: | Count down 10 to 1 (example above) |
gcd.mc: | Greatest common divisor (example above) |
hailstone.mc: | Iterate through hailstone sequence |
factor.mc: | Show all factors of a number |
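Of the operators listed above, the comparisons need the most thought when generating ARM code for while and if statements. One common approach is to compare the two operands with CMP and then branch past the body on the opposite condition. The sketch below is a hypothetical helper for codegen.c, not part of the distributed code: the OP_* values are placeholders for the real constants in codegen.h, the operands are assumed to be loaded into R0 and R1 already, and the label format L<n>_top / L<n>_end is an arbitrary choice. It appends to a buffer instead of calling printf so that its output is easy to check.

```c
#include <stdio.h>
#include <string.h>

/* Placeholder comparison codes; the real OP_* values come from codegen.h. */
enum { OP_EQ = 1, OP_NE, OP_LT, OP_GT, OP_LE, OP_GE };

/* ARM condition under which "left op right" is FALSE after CMP left, right.
   Signed conditions (LT, GE, GT, LE) assume signed integer values. */
const char *false_condition(int op) {
    switch (op) {
    case OP_EQ: return "NE";
    case OP_NE: return "EQ";
    case OP_LT: return "GE";
    case OP_GT: return "LE";
    case OP_LE: return "GT";
    case OP_GE: return "LT";
    default:    return NULL; /* not a comparison operator */
    }
}

/* Counter so that every loop gets its own pair of labels. */
static int next_label = 0;

/* Append a while loop to buf. cond_asm must load the left operand into R0
   and the right operand into R1; body_asm is the already-generated body. */
void gen_while(char *buf, size_t cap, int op,
               const char *cond_asm, const char *body_asm) {
    const char *cc = false_condition(op);
    size_t used = strlen(buf);
    int id;

    if (cc == NULL || used >= cap)
        return;
    id = next_label++;
    snprintf(buf + used, cap - used,
             "L%d_top:\n"
             "%s"
             "CMP R0, R1\n"
             "B%s L%d_end\n"   /* leave the loop when the test fails */
             "%s"
             "B L%d_top\n"     /* jump back to re-test the condition */
             "L%d_end:\n",
             id, cond_asm, cc, id, body_asm, id, id);
}
```

An if statement (which has no else in Minc) could reuse the same test with only the end label and no backward branch. In a real codegen.c, the same text would simply be printed with printf.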
The distributed code already converts a given file into an internal memory representation that is easier to process than the original program text. This internal representation uses a type named struct statement.
struct statement {
int stype; /* one of: STMT_PRINT, STMT_ASSN, STMT_IF, STMT_WHILE */
int line; /* line number in source file where statement starts */
int dest_var; /* used for STMT_ASSN only, indicating var id to save*/
int op_type; /* one of: OP_LT, OP_EQ, ..., OP_NONE, OP_ADD, OP_SUB */
int left_is_var; /* 0 if constant, 1 if variable */
int left_val; /* if constant, number; if variable, variable id */
int right_is_var; /* 0 if constant, 1 if variable */
int right_val; /* if constant, number; if variable, variable id */
struct statement *body; /* used for STMT_IF and STMT_WHILE only, */
/* pointing to first statement in body. */
struct statement *next; /* next statement to execute after this one, */
/* or NULL if final statement in list. */
};
As indicated by the final next field, this is a linked list of statements, each object in the list representing a single statement on the same level of the program. The other fields indicate the various parts of each individual statement, starting with stype, which indicates which of the four statement types it is.
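Because if and while statements hold a nested list in body, code that processes a program naturally follows the next chain along one level and recurses into each body. The sketch below counts all statements that way. The struct layout is copied from above; the STMT_* values are placeholders, since the real constants are defined in codegen.h, and count_statements is a hypothetical name. The same loop-plus-recursion shape would suit a generate_code implementation that dispatches on stype.

```c
#include <stddef.h>

/* Placeholder statement types; the real values are defined in codegen.h. */
enum { STMT_PRINT, STMT_ASSN, STMT_IF, STMT_WHILE };

/* Same layout as struct statement above. */
struct statement {
    int stype;
    int line;
    int dest_var;
    int op_type;
    int left_is_var;
    int left_val;
    int right_is_var;
    int right_val;
    struct statement *body;
    struct statement *next;
};

/* Count the statements in a list, following next along one level and
   recursing into the body of every if and while statement. */
int count_statements(const struct statement *s) {
    int total = 0;

    for (; s != NULL; s = s->next) {
        total++;
        if (s->stype == STMT_IF || s->stype == STMT_WHILE)
            total += count_statements(s->body);
    }
    return total;
}
```

Applied to the example program that follows, this would return 6: four top-level statements plus the two inside the while loop.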
This is probably easiest to comprehend using an example. Below is a Minc program accompanied by an illustration showing how the distributed program represents this Minc program in memory.
n = 10;
sum = 0;
while (n > 0) {
sum = sum + n;
n = n - 1;
}
print sum;
As it reads the program, the distributed code assigns a number between 0 and 9 to each unique variable; in this case, it chose 0 for n and 1 for sum.
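For concreteness, the same program's linked list can be built by hand in C, using the variable numbers just given (0 for n, 1 for sum) and the field values described in the walkthrough of each struct. This is only an illustration: the STMT_* and OP_* values are placeholders for the constants in codegen.h, build_sample is a hypothetical helper, fields that do not matter for a statement are left at 0, and the line numbers assume the program starts on line 1 with no blank lines. Hand-built lists like this can be handy for testing pieces of codegen.c without going through main.c.

```c
#include <stddef.h>

/* Placeholder constants; the real values are defined in codegen.h. */
enum { STMT_PRINT, STMT_ASSN, STMT_IF, STMT_WHILE };
enum { OP_NONE, OP_ADD, OP_SUB, OP_GT };

/* Variable ids assigned for this program. */
enum { VAR_N = 0, VAR_SUM = 1 };

/* Same field order as struct statement in the handout. */
struct statement {
    int stype, line, dest_var, op_type;
    int left_is_var, left_val, right_is_var, right_val;
    struct statement *body, *next;
};

/* Build the list for:
     n = 10; sum = 0; while (n > 0) { sum = sum + n; n = n - 1; } print sum;
   Static storage keeps the nodes alive after the function returns. */
struct statement *build_sample(void) {
    static struct statement n_gets_10, sum_gets_0, loop, add_n, dec_n, print_sum;

    /* Body of the while loop: lines 4 and 5. */
    dec_n = (struct statement){ .stype = STMT_ASSN, .line = 5, .dest_var = VAR_N,
        .op_type = OP_SUB, .left_is_var = 1, .left_val = VAR_N,
        .right_is_var = 0, .right_val = 1, .next = NULL };
    add_n = (struct statement){ .stype = STMT_ASSN, .line = 4, .dest_var = VAR_SUM,
        .op_type = OP_ADD, .left_is_var = 1, .left_val = VAR_SUM,
        .right_is_var = 1, .right_val = VAR_N, .next = &dec_n };

    /* Top-level list: lines 1, 2, 3 and 7. */
    print_sum = (struct statement){ .stype = STMT_PRINT, .line = 7,
        .op_type = OP_NONE, .left_is_var = 1, .left_val = VAR_SUM, .next = NULL };
    loop = (struct statement){ .stype = STMT_WHILE, .line = 3,
        .op_type = OP_GT, .left_is_var = 1, .left_val = VAR_N,
        .right_is_var = 0, .right_val = 0, .body = &add_n, .next = &print_sum };
    sum_gets_0 = (struct statement){ .stype = STMT_ASSN, .line = 2, .dest_var = VAR_SUM,
        .op_type = OP_NONE, .left_is_var = 0, .left_val = 0, .next = &loop };
    n_gets_10 = (struct statement){ .stype = STMT_ASSN, .line = 1, .dest_var = VAR_N,
        .op_type = OP_NONE, .left_is_var = 0, .left_val = 10, .next = &sum_gets_0 };

    return &n_gets_10;
}
```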
The first struct represents the first statement, “n = 10;”. It is an assignment statement, so stype is STMT_ASSN. The following field, line, indicates the line of the file where the statement was found. The field dest_var indicates that we are writing to variable 0 (representing n). The next five fields represent the expression to be computed: in this case, the expression is simply 10, with no operation to be performed, so op_type is OP_NONE; the value is not a variable, so left_is_var is 0, while left_val is the numeric constant from the statement. Since this is OP_NONE, the values of right_is_var and right_val are unimportant; and because the statement type is STMT_ASSN, the value of body is unimportant. Finally, next points to the statement to be executed following this one.
The second struct represents “sum = 0;” on line 2 of the program. It is very similar to the previous one, with the important difference that dest_var changes to 1, since this statement changes sum instead.
The third struct represents the program's while statement beginning on line 3. For a while statement, the value of dest_var is irrelevant. The expression to be evaluated involves a greater-than comparison, so op_type is OP_GT. In this case, we see a variable on the left side, so left_is_var is 1, while left_val is the variable's identifier, 0 for n. Notice also that body is a pointer to the first statement in the body.
Following the while statement's next field, we find the struct corresponding to the first statement after the while statement: “print sum;”. Its next value is NULL since no statement follows it.
Following the while statement's body field, we find a struct corresponding to “sum = sum + n;” on line 4. You can see that in this case op_type is OP_ADD, since the operation in the statement is addition.
Following that struct's next field, we find the struct corresponding to “n = n - 1;” on line 5. Its next field is NULL since no statement follows it within the while statement's body.
To test your work, first compile your program with a command such as “gcc main.c codegen.c”.
Ensure that you have a Minc program saved in a file, which in our examples we'll presume is named test.mc.
The distributed program includes a working interpreter that you can use if you simply want to execute a Minc program to see what it does. To execute the Minc program, enter “./a.out -x test.mc”.
But if you want the program to generate ARM assembly code, omit the -x flag: “./a.out test.mc”. That will display the ARM assembly translation to the screen. You can redirect this output to a file instead: “./a.out test.mc > test.s”.
Once you have saved the ARM assembly code, you can then load it within aas and execute it. (You'll want to ensure that io.s from the malloc assignment is available in the same directory.) You should find that using aas to execute the code generated by your program works identically to “./a.out -x test.mc”.
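Because the generated code must behave like “./a.out -x”, it can help to have a precise model of what each statement does. The sketch below is a hypothetical tree-walking interpreter over struct statement; it is not the distributed interpreter. It assumes C semantics for every operator (so a comparison produces 1 or 0, and >> behaves as C's >> does on an int), uses placeholder STMT_* and OP_* values in place of those in codegen.h, and records printed values in an array instead of writing them out, so that the result can be checked. When the two disagree, the behavior of ./a.out -x is what your generated code must match.

```c
#include <stddef.h>

/* Placeholder constants; the real values are defined in codegen.h. */
enum { STMT_PRINT, STMT_ASSN, STMT_IF, STMT_WHILE };
enum { OP_NONE, OP_ADD, OP_SUB, OP_SHL, OP_SHR,
       OP_EQ, OP_NE, OP_LT, OP_GT, OP_LE, OP_GE };

/* Same field order as struct statement in the handout. */
struct statement {
    int stype, line, dest_var, op_type;
    int left_is_var, left_val, right_is_var, right_val;
    struct statement *body, *next;
};

/* An operand is either a variable's current value or a constant. */
static int operand(const int vars[], int is_var, int val) {
    return is_var ? vars[val] : val;
}

/* Evaluate the single-operation expression of s using C semantics. */
static int eval(const struct statement *s, const int vars[]) {
    int a = operand(vars, s->left_is_var, s->left_val);
    int b;

    if (s->op_type == OP_NONE)
        return a;              /* right operand is unused and may be junk */
    b = operand(vars, s->right_is_var, s->right_val);
    switch (s->op_type) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_SHL: return a << b;
    case OP_SHR: return a >> b;
    case OP_EQ:  return a == b;
    case OP_NE:  return a != b;
    case OP_LT:  return a < b;
    case OP_GT:  return a > b;
    case OP_LE:  return a <= b;
    case OP_GE:  return a >= b;
    default:     return 0;     /* unknown operator */
    }
}

/* Execute a statement list; values printed by print statements are
   appended to out[], with *nout counting them. */
void run(const struct statement *s, int vars[], int out[], int *nout) {
    for (; s != NULL; s = s->next) {
        switch (s->stype) {
        case STMT_ASSN:
            vars[s->dest_var] = eval(s, vars);
            break;
        case STMT_PRINT:
            out[(*nout)++] = eval(s, vars);
            break;
        case STMT_IF:
            if (eval(s, vars))
                run(s->body, vars, out, nout);
            break;
        case STMT_WHILE:
            while (eval(s, vars))
                run(s->body, vars, out, nout);
            break;
        }
    }
}
```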
In case it's helpful, below is the sample program illustrated above along with the ARM program that a correct solution might generate. This is not the only correct solution, though: The real test is how the generated code behaves within aas, as explained in the previous section.
n = 10; | MOV SP, #0xFF00 |
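To give a flavor of what generate_code might print for one statement, here is a hypothetical sketch for an assignment whose operator is arithmetic (or OP_NONE). The register and memory layout are assumptions, not requirements of the assignment: it supposes the generated prologue leaves R11 pointing at a 40-byte block holding the ten variables, so variable k lives at [R11, #4*k], and that constants are small and non-negative enough to fit in a MOV immediate. The OP_* values are placeholders for those in codegen.h, and the code appends to a buffer instead of calling printf so its output can be checked. Whether aas accepts the register-shifted MOV forms used for << and >> is worth confirming in aas itself.

```c
#include <stdio.h>
#include <string.h>

/* Placeholder operator codes; the real OP_* values come from codegen.h. */
enum { OP_NONE, OP_ADD, OP_SUB, OP_SHL, OP_SHR };

/* Append one line of assembly (plus a newline) to buf. */
static void emit(char *buf, size_t cap, const char *line) {
    size_t used = strlen(buf);
    if (used < cap)
        snprintf(buf + used, cap - used, "%s\n", line);
}

/* Put one operand into register reg: load a variable from its assumed
   slot at [R11, #4*id], or move a small constant. */
static void load_operand(char *buf, size_t cap, int reg, int is_var, int val) {
    char line[64];
    if (is_var)
        snprintf(line, sizeof line, "LDR R%d, [R11, #%d]", reg, 4 * val);
    else
        snprintf(line, sizeof line, "MOV R%d, #%d", reg, val);
    emit(buf, cap, line);
}

/* Append code for "dest = left op right"; comparisons are not handled. */
void gen_assignment(char *buf, size_t cap, int dest, int op,
                    int left_is_var, int left_val,
                    int right_is_var, int right_val) {
    char line[64];

    load_operand(buf, cap, 0, left_is_var, left_val);
    if (op != OP_NONE) {
        load_operand(buf, cap, 1, right_is_var, right_val);
        switch (op) {
        case OP_ADD: emit(buf, cap, "ADD R0, R0, R1"); break;
        case OP_SUB: emit(buf, cap, "SUB R0, R0, R1"); break;
        case OP_SHL: emit(buf, cap, "MOV R0, R0, LSL R1"); break;
        case OP_SHR: emit(buf, cap, "MOV R0, R0, ASR R1"); break; /* signed shift */
        }
    }
    snprintf(line, sizeof line, "STR R0, [R11, #%d]", 4 * dest);
    emit(buf, cap, line);
}
```

Under these assumptions, “sum = sum + n;” (destination 1, left variable 1, right variable 0) comes out as LDR R0, [R11, #4] / LDR R1, [R11, #0] / ADD R0, R0, R1 / STR R0, [R11, #4].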
By the way, when you want a C program to display a backslash ('\') in the output, you will need to include a double-backslash in the string literal. For example, in the statement “printf("#'\\n'\n");”, the double-backslash translates to a single backslash in the output, whereas the lone backslash followed by n translates to a newline (with no backslash), so the statement prints #'\n' followed by a line break.
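To see the difference concretely, the hypothetical helper below formats the assembly line MOV R0, #'\n' into a buffer, as generated code might do when printing a newline after a value; the choice of MOV and R0 is an assumption about your own design, not something the assignment prescribes. Because the C literal spells the backslash as a double-backslash, the buffer holds a real backslash followed by the letter n, and the only actual newline character is the one at the very end.

```c
#include <stdio.h>
#include <string.h>

/* Build the text "MOV R0, #'\n'" (backslash and n as two separate
   characters) followed by one real newline character. */
void newline_instr(char *buf, size_t cap) {
    snprintf(buf, cap, "MOV R0, #'\\n'\n");
}
```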