Phase 3

In this phase, you will extend your compiler to generate correct x86-64 assembler code for all Decaf programs. By the end of code generation, you should have a fully working Decaf compiler. You’ll be able to write, compile, and execute real programs on a real machine!

There are two deliverables for this phase:

Your Decaf compiler, due Friday, April 4 at 11:59pm

A Project Design Document explaining the technical details of your implementation for phases 1, 2, and 3, due Saturday, April 5 at 11:59pm

Only one submission is needed for each team — please use Gradescope’s feature to add multiple people to a single submission.

This due date is also posted on the Class Schedule.

Table of contents

Overview
Things to keep in mind
Specifications
Design Document
Evaluation
Submission
1. Code
2. Design Document
Appendix

Overview

For Code Generation, your compiler will translate your high-level IR into a low-level IR. Your low-level IR will include structures that more closely match the machine instructions of a modern architecture. Your compiler will then translate your low-level IR into x86-64 assembly code. You should target the subset of the x86-64 ISA defined in the x86-64 architecture guide, in the reference section at the bottom of this handout.

Your generated code must include the runtime checks listed in the Decaf language specification. Additional checks such as integer overflow are not required.

The last two phases, Dataflow Analysis and Optimization, will focus on improving the efficiency of the target code generated by your compiler. For this phase, you are not expected to produce great assembly code. In fact, you are expected to produce bad code. When considering tradeoffs, always choose simplicity of implementation over performance.

You are not constrained as to how you go about generating your final assembly code listing. However, we suggest that you follow the general approach presented in lecture.

For this phase, you should focus your creative energies on designing your Control Flow Graph, familiarizing yourself with the target ISA, your machine-code representations of the run-time structures, and generating correct assembly code. Do not try to produce a register allocation scheme; you will be addressing these issues later.

Tip: Make sure you read the x86-64 manual (linked on the Resources page) to fully understand what each instruction does. Do not rely solely on the cheat sheet, especially if code isn’t doing what you expect it to.

Things to keep in mind

Testing Infrastructure

By this phase, you should take the time to write your own test cases, as the public test cases are nowhere near extensive enough to cover the entire breadth of possible Decaf programs. You are required to set up testing infrastructure that allows you to run your compiler automatically against your test cases, as well as the test cases we may provide you with. You may use any unit testing library to assist you with this, or write your own test scripts. Please tell us about your testing infrastructure in your design document.

Do not solely use the Gradescope autograder for your testing. The autograder will take longer and longer to run for later phases as it becomes more extensive.

Linking External Libraries

Decaf does not have any input/output functions. Part of this phase is to implement the standard x86-64 calling convention for import statements, so that you can interface with the outside world. Any function that is called using import needs to be linked in separately. gcc will link against any standard libraries, such as printf (you may need to use the -l argument for gcc to link some libraries). The testing files provided to you link against the standard C library. If you want to use functions that are not easy to use in Decaf (handle pointers, etc), you are welcome to write your own library calls in C, compile them to object files (using gcc -c) and then link them in by hand when compiling your assembly.

Memory Alignment

The concept of memory alignment plays a big role in hardware. To say that we require some address to be n-byte aligned is to say that we require address % n = 0. In other words, the numerical value of the address must be a multiple of n.

Some instructions require the stack pointer to be 16-byte aligned. You will not need to use these instructions yourself, but functions in the C standard library often do, including printf.

Consider the following assembly:

...
# assume that at this point, %rsp % 16 = 0
pushq $1     # %rsp % 16 = 8, stack pointer is  8 byte aligned, but not 16
pushq $2     # %rsp % 16 = 0, stack pointer is 16 byte aligned
pushq $3     # %rsp % 16 = 8, stack pointer is  8 byte aligned, but not 16
call printf  # we are entering the function while %rsp is NOT 16-byte aligned

Recall that every pushq decrements the stack pointer by 8 bytes. This maintains the invariant that %rsp is 8-byte aligned at all times, but not necessarily 16-byte aligned. So if you call a foreign functions that expects %rsp to be 16-byte aligned and your stack pointer is not 16-byte aligned when you call it, it may segfault.

For this project, you can just make sure that the stack pointer is always 16-byte aligned whenever you perform a method call.

Printf

Your compiler must support the printf function. There are two aspects of this function that are exceptions to the standard Decaf ABI:

The stack must be 16-byte aligned, as noted above.
Since printf takes a variable number of arguments, the System V ABI specifies that rax contains the number of floating point arguments; this should always be set to 0 before calling printf.

We have included various printf-related test cases in the public test cases to help you make sure that you are calling printf correctly.

macOS Specific Issues

Please see the macOS Infrastructure page for more information about differences in x86 assembly on macOS vs. Linux. You may want to implement a -m (--mac) flag in order to aid your debugging.

Specifications

You should be able to run your compiler from the command line with:

./run.sh -t assembly <INPUT FILE> -o <OUTPUT FILE>

Your compiler should then write a x86-64 assembly listing to the output file specified by -o, or to stdout if -o isn’t provided.

Nothing should be written to standard error for a syntactically and semantically correct program (or to standard output, if an output file is specified) unless the --debug flag is present. If the --debug flag is present, your compiler should still run and produce the same resulting assembly listing. Any debugging output is left to your own discretion. All errors must be written to standard error.

The public test cases for this phase have been added to the 6110-sp25/public-tests repository on Github. Please read the comments in the test cases to see what we expect your compiler to do.

Internally, the autograder will assemble the assembly file produced by your compiler using the following command:

gcc -O0 -no-pie <ASSEMBLY FILE> -o <EXECUTABLE>

As usual, you can run your executable with ./<EXECUTABLE> and print its return code with echo $?. Note that echo $? prints the return code of the most recent command, so you should run it only immediately after running your exectuable (e.g. don’t run it twice).

Design Document

In this phase you will also write a design document that explains the technical details of how you have integrated Phases 1, 2, and 3. This will be a much more substantial document than the reports for previous phases. You may write any length design document within reason as long as it addresses all the points below.

The TAs will review your design document as well as provide feedback on your code. Please make sure all code has been committed to your team’s repository. It must include the following sections:

Design (16 points): An overview of your design, an analysis of design alternatives you considered, and key design decisions for each of the parts below. This section should help us understand your code, and convince us that it is sufficiently general, and let us know anything that might not be reflected in your code. You must, at a minimum, answer the above questions for the following parts of your compiler:
- Your chosen scanner/parser implementation (1 point)
- Your high-level IR (1 point)
- Your semantic checker (2 points)
- Your CFG or low-level IR implementation (5 points)
- Your assembly code generator (5 points)
- Your testing infrastructure (2 points)
Extras (2 points): A list of any clarifications, assumptions, or additions to the problem assigned. This include any interesting debugging techniques/logging, additional build scripts, or approved libraries used in designing the compiler. The project specifications are fairly broad and leave many of the decisions to you. This is an aspect of real software engineering. If you think major clarifications are necessary, consult the TA.
Difficulties (1 point): A list of known problems with your project, and as much as you know about the cause. If there are any Phase 2 tests that you were not passing before the Phase 2 deadline, but fixed for Phase 3, you should also include this information in the write up.
Contribution Statement (1 point): A brief description of how your group divided the work. This will not affect your grade; it will be used to alert the TAs to any possible problems.

Evaluation

This phase is worth 15% of the overall grade in this class. Your grade in this phase (15% total) is allocated as follows:

Your Decaf compiler: 12%, from autograded public and private test cases.
- The TAs reserve the right to perform regression testing for test cases in Phase 1 and 2
Your Project Design Document: 3%

The public test cases are available at the 6110-sp25/public-tests repository. You should also make sure that your Phase 3 compiler also passes all of the legal phase 1 and 2 test cases. The private test cases for Phase 2 will be available at the 6110-sp25/public-tests repository after the phase 2 submission deadline.

Submission

Code

Please submit your phase 3 code on Gradescope via GitHub, following the steps below:

Push your code to your team GitHub repository (6110-sp25/<TEAM NAME>). We suggest making a separate branch for the submission, say, phase3-submission.
Go to the Phase 3 assignment on Gradescope, and select your GitHub repository and branch.
Add your team members to the submission on Gradescope.

Submitted repositories should have the following structure:

<repo name>
├── build.sh
├── run.sh
├── doc/
│   └── phase3.pdf  # phase 3 design document
└── ...

Make sure the ./build.sh and ./run.sh scripts are located at the root of your repository, otherwise the autograder will fail.

Please make sure that your compiler doesn’t get caught in an infinite loop when running any of the tests.

Design Document

Please submit your design document as a PDF in the Phase 3 Report assignment on Gradescope, and remember to add your team members to the submission. In addition, as indicated by the previous section, please also add the design document to your team GitHub repository at doc/phase3.pdf.

Appendix

References

The complete Intel x64 manual — official PDFs with all the gory details
x86 and amd64 instruction reference — navigable reference by Félix Cloutier derived from the Intel manual
x86 wiki — good primer on how x86 instructions work
x64 cheat sheet — lists and tables detailing registers and assembly commands from Brown University’s CS033
Agner Fog’s optimization page — this is a very useful reference page with manuals on how to optimize code for x86-64. In particular, take a look at the latency tables.
Godbolt — allows you to input C code and gives you line-by-line assembly output of various compilers (such as gcc and clang), very useful for figuring out how to translate certain operations to assembly.

Using GDB

GDB is a tool that allows you to step through your program and inspect memory contents for debugging. Run it with gdb prog, then type start to begin. This brings you to the start of the main function.

Before proceeding, type layout asm to bring up a view of your current position in the assembly code. You should be able to correlate this with your generated assembly in prog.s. You can navigate around the assembly code with the up and down buttons, though it doesn’t always work.

Sometimes, if your program writes to stdout or stderr, the interface can get messed up. When this happens, just type Ctrl-X A (Ctrl-X followed by A) twice, to switch out of assembly view and back again.

At any point, you can view register contents by typing info reg, and stack contents by typing x/16x $rsp. That last command is actually a general way to inspect memory. The command x/<num>x <addr> examines num words starting from address addr. The second x in the command means to print it in hexadecimal; you can change it to d for decimal, for example.

Finally, you can use si to step forward by one assembly instruction, finish to skip to and take the ret in a function call, and ni to jump over a function call. At any point, you can start the program over by typing start, or quit by typing quit.

Assembly Examples

Example 1

print 32 using the printf function in the standard C library

format_str_0:
    .string "%d\n"  # string constant
    .align 4

.globl main
main:
    # pre-call ritual
    pushq %rbp         # save base pointer
    movq  %rsp, %rbp   # save stack pointer

    # call function `printf`
    leaq format_str_0(%rip), %rdi  # 1st arg; load the address of str constant into %rdi
    movq $32, %rsi                 # 2nd arg; load int constant into %rsi
    movq $0, %rax                  # zero rax before printf
    call printf                    # call with the above args

    # analogous to 'return 0;' in C's main
    mov $0, %rax

    # post-call ritual
    movq %rbp, %rsp
    popq %rbp
    ret  # return to where the function was called

Example 2

a function calc(a, b, c, d, e) which returns (a*b - c/d)*e where c/d discards remainders; for example 3/2 = 1

format_str_0:
    .string "%d\n"  # string constant
    .align 4

calc:
    # pre-call ritual
    pushq %rbp
    movq  %rsp, %rbp

    # %rsi = %rdi * %rsi; calculate a * b
    imul %rdi, %rsi

    # %rax = %rdx / %rcx; calculate c / d
    movq %rdx, %rax  # move %rdx into %rax
    xor  %rdx, %rdx  # zero-out %rdx by xor'ing it with itself
    idiv %rcx        # %rax = %rdx:%rax / %rcx = %rax / %rcx since %rdx = 0

    # %rsi = %rsi - %rax; calculate a*b - c/d
    subq %rax, %rsi

    # %rsi = %rsi * %r8; calculate (a*b - c/d) * e
    imul %r8, %rsi

    # move the final value into %rax to return it
    movq %rsi, %rax

    # post-call ritual
    movq %rbp, %rsp
    popq %rbp
    ret

.globl main
main:
    # pre-call ritual
    pushq %rbp         # save base pointer
    movq  %rsp, %rbp   # save stack pointer

    # calculate (4*5 - 3/2)*1
    movq $4, %rdi  # 1st arg; a=4
    movq $5, %rsi  # 2nd arg; b=5
    movq $3, %rdx  # 3rd arg; c=3
    movq $2, %rcx  # 4th arg; d=2
    movq $1, %r8   # 5th arg; e=1
    call calc # retval = 19 is now in %rax

    # call function `printf`
    leaq format_str_0(%rip), %rdi  # 1st arg; load the address of str constant into %rdi
    movq %rax, %rsi                # 2nd arg; load result of calc into %rsi
    movq $0, %rax                  # zero rax before printf
    call printf                    # call with the above args

    # analogous to 'return 0;' in C's main
    mov $0, %rax

    # post-call ritual
    movq %rbp, %rsp
    popq %rbp
    ret  # return to where the function was called

Example 3

translate the following decaf code

import printf;

int a[10]; // global array

void main ( ) {
  int i;
  for (i = 0; i < len(a); i++) {
      printf("%d\n", a[i]);
  }
}

.data
format_str_0:
    .string  "%d\n"
    .align 4

    .comm array_0, 88, 16
    .align 4

.text
.globl main
main:
    pushq %rbp
    movq  %rsp, %rbp
    subq   $16, %rsp

    movq $0, -8(%rbp)  # i = 0

    movq $10, array_0       # length
    movq  $0, array_0 + 8
    movq  $0, array_0 + 16
    movq  $0, array_0 + 24
    movq  $0, array_0 + 32
    movq  $0, array_0 + 40
    movq  $0, array_0 + 48
    movq  $0, array_0 + 56
    movq  $0, array_0 + 64
    movq  $0, array_0 + 72
    movq  $0, array_0 + 80

cond_start:
    movq -8(%rbp), %rdi  # %rdi = i
    movq array_0,  %rsi  # %rsi = array_0 = length

stop:
    cmp %rsi, %rdi  # weird syntax
    jge loop_done   # if %rdi >= %rsi = i >= len, exit loop

loop_start:
    # save registers
    push %rdi
    push %rsi
    push %rdx
    push %rcx
    push %r8
    push %r9
    push %r10
    push %r11

    addq $1, %rdi                  # add 1 to compensate for the first element being length
    imul $8, %rdi                  # convert to bytes
    movq      array_0(%rdi), %rsi  # address of array_0 + value in %rdi = array_0 + %rdi = array_0 + i * 8 = array_0[i]
    leaq format_str_0(%rip), %rdi  # pointer to string
    movq $0, %rax                  # zero rax before printf
    call printf

    # restore registers
    popq %r11
    popq %r10
    popq %r9
    popq %r8
    popq %rcx
    popq %rdx
    popq %rsi
    popq %rdi

    addq $1, -8(%rbp)  # i++
    jmp cond_start

loop_done:
    mov    $0, %rax
    movq %rbp, %rsp
    popq %rbp
    ret