On this page:
11.1 Assignment Summary
11.2 Language Diagram
11.3 Related Reading
11.4 Preface: What’s wrong with Values-lang
11.5 Exposing Control-Flow Primitives
11.6 Blocks: Abstracting labeled instructions
11.6.1 Nested tails
11.7 Abstracting Low-level Control Flow
11.8 Register Allocation
11.8.1 Undead analysis, globally
11.8.2 Control-flow Analysis, and use/def chains
11.8.3 Conflict Analysis
11.8.4 Global Register Allocation
11.9 Thinking like a compiler

11 Compiler 4: Adding Control Flow (Corrected)

11.1 Assignment Summary

The goals of this assignment are to (1) practice exposing low-level features from the target language to the compiler, (2) introduce designing abstractions to tame low-level operations, (3) demonstrate the problems that control flow causes with program analysis, (4) introduce linking, and (5) practice thinking about programming at different levels of abstraction. We’ll expose labels and jmp instructions from x64, provide abstractions for using them in the source language, compile the latter abstraction to the former implementation, and practice reasoning about our new source language.

This assignment is due Friday February 14, 2020 at 11:59pm; you have 2 weeks this time.

By Friday February 7, 2020 at 11:59pm, you should have tests for all exercises, and should have finished implementing 1/3–1/2 of the exercises. We will provide interim feedback by running any tests you’ve pushed to your repository against the reference solution at this time.

Assignment Checklist

You should find a new repository in your https://github.students.cs.ubc.ca account named a4_<team ids> with a code skeleton and a support library. You should complete the assignment in that git repository and push it to the GitHub Students instance.

You should first copy your solution to Compiler 3: Register Allocation (Corrected) into the starter code provided.

If we patch the code skeleton or library after the release, the most recent version will be available here at a4-skeleton.rkt and a4-compiler-lib.rkt, and the functional graph library a4-graph-lib.rkt. The name of the skeleton is a4-skeleton.rkt to avoid accidentally overwriting your files, but your file in the Git repository should be named a4.rkt.

Per Organizing Racket Code, you are not required to keep all of your code in a single file, but a4.rkt must export at least the names exported by a4-skeleton.rkt, and you must follow the instructions on organizing your code from Organizing Racket Code.

Regardless of how your code is organized, to simplify grading, you must add a comment with the exercise number before the code for each of your exercises:
; Exercise 20
(define (values-lang-fib n) ...)
 
; Exercise 21
; When I compared the two implementations I found ...

For non-obvious test cases, add a comment explaining which language feature or edge case you are testing. For trivial tests, such as (check-exn exn:fail? (thunk (check-values-lang '()))), this is not necessary and will only make your code less readable.

11.2 Language Diagram

11.3 Related Reading

You can find additional reading material on register allocation and undead analysis at the resources below. These come from other courses that use a compiler that is similar to, but different from, your compiler. Their intermediate languages are different, and their compiler pipeline is different. There is no one universal truth of how these passes work for all compilers in all situations, so you will need to generalize from these lessons and apply them to your compiler.

Still, the lessons should be helpful if used with care:

11.4 Preface: What’s wrong with Values-lang

The language Values-lang has a significant limitation: we can only express simple, straight-line arithmetic computations. We’ll never be able to write any interesting programs!

In this assignment, we will expose a machine feature, control flow instructions in the form of labels and jmp instructions, and systematically abstract these. This requires changes to nearly every pass and every intermediate language.

We’ll proceed bottom-up, but first let’s take a look at our goal. We want to extend Values-lang with control-flow features: a form of if expression, top-level function definitions, and non-returning function calls (tail calls).

  p ::= (module b ... e)
     
  b ::= (define x (lambda (x ...) e))
     
  e ::= tail
  | (let ([x n]) e)
  | (if (cmp v v) e e)
     
  n ::= v
  | (binop v v)
     
  tail ::= n
  | (apply x v ...)
     
  x ::= name
     
  v ::= int64
  | x
     
  binop ::= *
  | +
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

Note that an if expression, (if (cmp v v) e e), cannot branch on an arbitrary expression. It is restricted so that a comparison between two values must appear in the predicate position. This is a necessary artifact inherited from x64. Even by the end of this assignment, we will not yet have enough support from the low-level language to add values that represent the result of a comparison, i.e., we won’t have booleans.

Note also that a function can only be called as the very last expression. This is because we will not have introduced enough to implement returning from a function, so all function calls are restricted to only jump forward.

Despite the limitations of Values-lang v2, the language is finally expressive enough to write interesting programs. We’ll therefore implement a type checking tool to help us program in Values-lang v2, and spend some time writing programs to understand its expressiveness and its limitations. If you prefer, you can jump to the last few exercises and complete those first.

11.5 Exposing Control-Flow Primitives

We’ll start by designing a new Paren-x64 v3 to expose the additional features of x64 necessary to implement control flow abstractions. This extends the previous Paren-x64 v2 with comparison operations, labels, and conditional and unconditional jump operations.

  p ::= (begin s ...)
     
  s ::= (set! loc triv)
  | (set! reg loc)
  | (set! reg_1 (binop reg_1 int32))
  | (set! reg_1 (binop reg_1 loc))
  | (define label s)
  | (jump trg)
  | (compare reg opand)
  | (jump-if cmp label)
     
  trg ::= reg
  | label
     
  triv ::= trg
  | int64
     
  opand ::= int64
  | reg
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r10
  | r11
  | r12
  | r13
  | r14
  | r15
     
  addr ::= (rbp - dispoffset)
  | (rbp + dispoffset)
     
  binop ::= *
  | +
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

(define (label? s)
  (and (symbol? s)
       (regexp-match-exact? #rx"L\\..+\\.[0-9]+" (symbol->string s))))

x64 exposes labels, written l: before some instruction, and jumps, written jmp trg where trg is either a label or a register containing a label. Labels can be chained in x64, as we’ve seen in prior assignments. For example, the following assigns two labels to the same instructions:

L1:

L2:

  mov rax, 42

In Paren-x64 v3, we model labels with the (define label s) instruction, which defines a label label at the instruction s in the instruction sequence. This corresponds to the x64 string label:\n s. Note that they can be nested, allowing the same behavior as chaining labels in x64. For convenience, we assume all labels are symbols of the form L.<name>.<number>, and are unique.
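As a sketch, the chained-label x64 listing above can be modeled by nesting defines; the label names below are illustrative, following the assumed L.&lt;name&gt;.&lt;number&gt; convention:

```racket
#lang racket
;; The two x64 labels L1 and L2 from the listing above, modeled in
;; Paren-x64 v3: nesting the defines attaches both labels to the same
;; (set! rax 42) instruction. Label names are illustrative.
(define chained-labels
  '(define L.one.1 (define L.two.2 (set! rax 42))))
```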

Note that (define label s) does not behave like a define in Racket or in Values-lang v2. The instruction s gets executed after the previous instruction in the sequence, even if the previous instruction was not a jump. All define does here is name the instruction so we can jump to it later, the same as a label in x64.

The new comparison instruction (compare reg opand) corresponds to the x64 instruction cmp reg, opand. This instruction compares reg to opand and sets some flags in the machine describing their relation, such as whether reg is less than opand, or whether they are equal. The flags are used by the next conditional jump instruction.

The conditional jump instructions in x64, in the same order as the cmps, are: jne label, je label, jl label, jle label, jg label, and jge label. Each corresponds to "jump to label if the comparison flag ___ is set". For example, the instruction je label jumps to label if the comparison flag "equal" is set.

In Paren-x64 v3, we abstract the various conditional jump instructions into a single instruction with multiple flags. The instruction je l corresponds to (jump-if eq? l). jl l jumps to l if comparison flag "less than" is set, and corresponds to (jump-if < l). The rest of the instructions follow this pattern.
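The correspondence above can be summarized in a small helper; this is only a sketch, and cmp->jump is a hypothetical name, not part of the skeleton:

```racket
#lang racket
;; Sketch: map a Paren-x64 v3 cmp flag to its x64 conditional-jump mnemonic,
;; following the correspondence described in the text.
(define (cmp->jump cmp)
  (match cmp
    ['neq? "jne"]
    ['eq?  "je"]
    ['<    "jl"]
    ['<=   "jle"]
    ['>    "jg"]
    ['>=   "jge"]))
```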

Exercise 1: Redesign and extend the implementation of generate-x64 to support control-flow primitives. The source language is Paren-x64 v3 and the target language is x64.

To compile halt to Paren-x64, it may help if you generate a labeled instruction that essentially does nothing. This way, you can insert a label at the end of your program. Otherwise, you’ll need to think carefully about how to arrange and generate blocks.

Labels and jumps are a small change to the language syntactically, but have a large effect on the semantics. You’ll notice this, for example, when writing the interpreter for a language with labels and jumps compared to a language without them.

We can no longer write the Paren-x64 v3 interpreter in one simple loop over the instructions. Instead, we need some way to resolve labels. That way, when running the interpreter, we can easily jump to any expression at any time—a possibility the language now allows. This process of resolving labels is called linking.

We’re going to write a simple linker for Paren-x64 v3 to give you a rough idea of how the operating system’s linker works. Our low-level linking implementation mimics it: we first resolve all labels to their addresses in memory (in our case, their indices in the instruction sequence), and then implement jumps by simply setting a program counter to the instruction’s address.

To do this, we design a new language Paren-x64-rt, which represents the run-time language used by the interpreter after linking.

  p ::= (begin s ...)
     
  s ::= (set! loc triv)
  | (set! reg loc)
  | (set! reg_1 (binop reg_1 int32))
  | (set! reg_1 (binop reg_1 loc))
  | (jump trg)
  | (compare reg opand)
  | (jump-if cmp pc-addr)
     
  trg ::= reg
  | pc-addr
     
  triv ::= trg
  | int64
     
  opand ::= int64
  | reg
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rax
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r10
  | r11
  | r12
  | r13
  | r14
  | r15
     
  addr ::= (rbp - dispoffset)
  | (rbp + dispoffset)
     
  binop ::= *
  | +
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

(define pc-addr? natural-number/c)

We remove the (define label s) instruction, and turn all label values into pc-addr, a representation of an address recognized by the interpreter. In our case, since programs are represented as lists of instructions, a pc-addr is a natural number representing the position of the instruction in the list.
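The first half of this process can be sketched as a single pass that maps each label to the index of its instruction; nested defines all map to the same index. The helper name and the plain-list input are assumptions of this sketch; the second half would rewrite label operands to these pc-addrs and strip the defines.

```racket
#lang racket
;; Sketch of the label-resolution pass: walk the instruction sequence,
;; recording each label's index. (define label s) may be nested, so we
;; unwrap until we reach a non-define instruction.
(define (label-addresses instrs)
  (for/fold ([env (hash)]) ([s instrs] [i (in-naturals)])
    (let unwrap ([s s] [env env])
      (match s
        [`(define ,label ,inner) (unwrap inner (hash-set env label i))]
        [_ env]))))
```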

Exercise 2: Design and implement link-paren-x64, which resolves all labels into the address of the instruction in the instruction sequence. The source language is Paren-x64 v3 and the target language is Paren-x64-rt.

Exercise 3: Redesign and extend the implementation of interp-paren-x64 to support control-flow primitives. The source language is Paren-x64 v3 and the output is a Racket integer.

It should use link-paren-x64 to resolve all labels, and should use a program counter to loop over the instruction sequence instead of folding over the sequence. This means some inner helper function accepts Paren-x64-rt programs.

You may want to refer to the code skeleton in a4-skeleton.rkt.
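The shape of the program-counter loop can be sketched on a toy subset of instructions (only register-integer set! and unconditional jumps to pc-addrs); the real interpreter covers the full Paren-x64-rt grammar, and the run helper and rax result convention here follow the course’s existing conventions as assumptions:

```racket
#lang racket
;; Stripped-down sketch of a fetch-execute loop with a program counter.
;; Only handles (set! reg int) and (jump pc-addr); everything else in
;; Paren-x64-rt is omitted.
(define (run instrs)
  (define prog (list->vector instrs))
  (let loop ([pc 0] [env (hash)])
    (if (>= pc (vector-length prog))
        (hash-ref env 'rax 0)              ; final answer convention: rax
        (match (vector-ref prog pc)
          [`(set! ,reg ,n)  (loop (add1 pc) (hash-set env reg n))]
          [`(jump ,pc-addr) (loop pc-addr env)]))))
```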

Next, we redesign Paren-asm v2, exposing the new instructions while abstracting away from the machine constraints about which instructions work on which physical locations. Now jumps can target arbitrary locations, and compare can compare arbitrary locations. We can also move labels into locations.

  p ::= (begin s ...)
     
  s ::= (set! loc triv)
  | (set! loc (binop loc opand))
  | (define label s)
  | (halt opand)
  | (jump trg)
  | (compare loc opand)
  | (jump-if cmp label)
     
  binop ::= *
  | +
     
  triv ::= opand
  | label
     
  opand ::= int64
  | loc
     
  trg ::= label
  | loc
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r10
  | r11
  | r12
  | r13
  | r14
  | r15
     
  addr ::= (rbp - dispoffset)
  | (rbp + dispoffset)
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

While halt is still an instruction, we assume that exactly one halt is executed in the program, and that it is the final instruction executed. We do not restrict the syntax to require this, since we now support jumps. Jumps mean our syntax does not give us a clear indication of which instruction is executed last. It might be the case that halt is the second instruction in the instruction sequence, but is always executed last because of the control flow of the program. It could also be that there are multiple halt instructions syntactically, but only one will ever be executed due to conditional jumps.

While we’re redesigning, we also lift some restrictions to allow our compiler to "not be stupid". We generalize Paren-asm v2 to allow literal values in some operands. We allow jumps to take either a location or a label directly, and allow binops and halt to take either a value or a location. Relaxing this restriction makes Paren-asm v2 less uniform, but makes it easier to work with, and allows us to produce better code.

Forcing ourselves to "be stupid" is a common hazard when designing abstraction boundaries. Previously, in our attempt to abstract away from annoying details of x64 and make a more uniform language, we actually forced our compiler to produce worse code. We required all operations in Paren-asm to work uniformly over locations only, even though x64 supports working over values for many instructions. This required all compiler passes that target Paren-asm to introduce extra instructions. While we could "optimize" those instructions away, it’s often better to "not be stupid"—to avoid generating bad code that you have to clean up, since optimizations are often incomplete.

Exercise 4: Redesign and extend the implementation of patch-instructions to implement the new jump and compare instructions that don’t exist in x64 using instruction sequences that are valid in x64. The source language is Paren-asm v2 and the target is Paren-x64 v3.

It will be tricky to implement the instruction (compare addr addr) with only one temporary register. You can do it in four Paren-x64 v3 instructions.

You should continue to use rax as a temporary register.

11.6 Blocks: Abstracting labeled instructions

In Paren-asm v2, control can jump to any instruction at any time. This makes giving semantics to programs, such as writing an interpreter or an analysis, very difficult. At any label, we must make assumptions about the state of the machine, since the state is not affected only by the sequence of instructions that came before the label in the instruction sequence, but potentially by arbitrary instructions that were executed prior to jumping to the label.

To simplify reasoning about programs, we can organize code into labeled blocks, assuming that control flow only enters the beginning of a block and exits the end of a block. This block abstraction simplifies reasoning about control flow. We have more structure on which to hang assumptions, and can make more assumptions about code when writing analyses. We want to introduce this abstraction before we have to extend the register allocator, which currently does the most analysis in our compiler.

We design Block-asm-lang, a basic-block-structured abstract assembly language in which sequences of statements are organized into basic blocks, and code can jump between blocks. Labels are no longer instructions that can happen anywhere; instead, each block is labeled. Jumps cannot appear just anywhere; instead, they happen only at the end of a block.

  p ::= (module b ... b)
     
  b ::= (define label info tail)
     
  info ::= (any ...)
     
  tail ::= (begin s ... tail)
  | (jump trg)
  | (if (cmp loc opand) (jump trg) (jump trg))
  | (halt opand)
     
  s ::= (set! loc triv)
  | (set! loc (binop loc opand))
     
  binop ::= *
  | +
     
  triv ::= opand
  | label
     
  opand ::= int64
  | loc
     
  trg ::= label
  | loc
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r10
  | r11
  | r12
  | r13
  | r14
  | r15
     
  addr ::= (rbp - dispoffset)
  | (rbp + dispoffset)
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

In Block-asm-lang, a program is a non-empty sequence of labeled blocks. We consider the first block in the sequence to be the start of the program. A tail represents a self-contained block of statements. Jumps can only appear at the end of blocks, and jumps only enter the beginning of blocks.

The halt instruction should only appear at the end of the final block; it cannot stop control flow, but only indicates that if the program has ended, the opand is the final value. Again, we cannot enforce this syntactically due to jumps. Instead, we require that halt appears at the end of a block, and assume only one halt instruction is ever executed during execution.

Note that now there is an info field for each block. Many analyses that previously worked over the entire program must now work over each block.

Exercise 5: Design and implement flatten-program to flatten all blocks into straight-line code. The source language is Block-asm-lang and the target language is Paren-asm v2.

Challenge 1: In Paren-asm v2, it’s unnecessary to have a jump when the target of the jump is the next instruction. Design and implement an optimization pass, called inline-jump, that eliminates these unnecessary jumps. The source language is Paren-asm v2 and target is Paren-asm v2

Challenge 2: Branches are often thought of as expensive, so real compilers often perform trace scheduling, an optimization that rearranges blocks and takes advantage of fall-through between blocks to reduce the number of jumps.

Implement a function trace-schedule to perform this optimization. The source language is Block-asm-lang and the target language is Block-asm-lang.

This optimization works in conjunction with the previous optimization.

11.6.1 Nested tails

Before we start modifying the register allocator, we will make a simplifying change: we will allow nested tails. This means nested if statements, and begins in the branches of an if statement.

It is not obvious why we would want to make this change, and this is one place where our back-to-front design fails us. In truth, compiler design, like most software design, is an iterative process. We must look ahead a few steps in the compiler to see the problem.

If we were following our nose and propagating our new features up to the next abstraction layer, the next step might be to simply start on the register allocator, extending it with support for blocks. As soon as we do that, we realize that analyzing arbitrary jump instructions is rather complicated. However, analyzing if is somewhat less complicated, as the structure of the control flow is more constrained. By allowing tails to be nested, we can allow more ifs and fewer apparent jumps in the source language of the analyses. The fewer jumps, the better job the analysis can do, and the better code it can produce. So we want to minimize jumps in the language that we analyze.

We introduce Block-nested-lang, which allows nesting some sequences of expressions that would otherwise require additional blocks and jumps. In particular, if and begin can be nested. This means we could have an if statement of the form (if (cmp loc loc) (begin s ... (jump loc)) (begin s ... (jump loc))). Compiling this requires introducing a fresh label. The nesting structure allows all earlier compiler passes to ignore jumps that are quite easy to eliminate. In general, jumps are difficult to deal with (as we will see), so avoiding them for as long as possible is generally a good design choice.
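For example, lifting the nested tails of an if into their own blocks might look like the following sketch; the fresh label names L.nested.2 and L.nested.3 are illustrative assumptions:

```racket
#lang racket
;; Block-nested-lang: the branches of the if are nested tails.
(define nested
  '(module
     (define L.start.1 ()
       (if (< rax 0)
           (begin (set! rbx 1) (halt rbx))
           (halt rax)))))

;; After lifting each nested tail into its own labeled block
;; (Block-asm-lang), the branches become jumps to fresh blocks.
(define lifted
  '(module
     (define L.start.1 ()
       (if (< rax 0) (jump L.nested.2) (jump L.nested.3)))
     (define L.nested.2 ()
       (begin (set! rbx 1) (halt rbx)))
     (define L.nested.3 ()
       (halt rax))))
```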

  p ::= (module b ... b)
     
  b ::= (define label info tail)
     
  info ::= (any ...)
     
  tail ::= (begin s ... tail)
  | (jump trg)
  | (if (cmp loc opand) tail tail)
  | (halt opand)
     
  s ::= (set! loc triv)
  | (set! loc (binop loc opand))
     
  binop ::= *
  | +
     
  triv ::= opand
  | label
     
  opand ::= int64
  | loc
     
  trg ::= label
  | loc
     
  loc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r10
  | r11
  | r12
  | r13
  | r14
  | r15
     
  addr ::= (rbp - dispoffset)
  | (rbp + dispoffset)
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

Exercise 6: Design and implement expose-basic-blocks, which creates new blocks of any nested tails and replaces those nested tails with jumps. The source language is Block-nested-lang and the target language is Block-asm-lang.

You may want to use fresh-label from a4-compiler-lib.rkt.

11.7 Abstracting Low-level Control Flow

We now switch to a top-down perspective. So far, our goal was merely to expose some of the low-level features of x64. But going bottom-up, the next step is register allocation. It’s not clear how to assign registers to variables until we know how variables behave in this language with function calls.

We define Block-lang, a basic-block-structured location-oriented language. This is essentially the same as Block-nested-lang, but without the info field, and with all locations replaced by abstract locations. We know the register allocator will be responsible for assigning the abstract locations to real locations.

We also know that the register allocator will need some information about programs. In particular, the undead analysis will need to know which locations are undead-out after a jump. Since it may be impossible to figure out the target of the jump (Rice’s Theorem), we annotate each jump with that information any time we generate a jump instruction. The annotations’ only purpose is to communicate to the analysis which locations should be considered undead; they are not actual arguments any more.

Notice that if we did not allow nested tails, the branches of each if expression would require these annotations. It’s pretty clear how to produce the arguments to a jump from a function call. However, trying to produce arguments for every if expression would not be so obvious. We would essentially need to perform undead analysis on the branches of the if, but the whole point of the annotation is to simplify undead analysis.

There are many design choices in a compiler like this. If we introduce an abstraction too late, our compiler may suffer. However, as we saw in the design of Paren-asm vs Paren-asm v2, premature abstraction boundaries can cause us to suffer as well.

  p ::= (module b ... b)
     
  b ::= (define label tail)
     
  tail ::= (begin s ... tail)
  | (jump trg aloc ...)
  | (if (cmp aloc opand) tail tail)
  | (halt opand)
     
  s ::= (set! aloc triv)
  | (set! aloc (binop aloc opand))
     
  binop ::= *
  | +
     
  triv ::= opand
  | label
     
  opand ::= int64
  | aloc
     
  trg ::= label
  | aloc
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

(define (aloc? s)
  (and (symbol? s)
       (not (register? s))
       (not (label? s))
       (regexp-match-exact? #rx".+\\.[0-9]+" (symbol->string s))))

Now we need to design value-oriented versions of if and jump.

In Values-lang v2, we extend expressions with an if expression that takes a comparison between two values. We have to restrict the predicate position because we have no explicit representation of the value that a comparison operation produces, i.e., we don’t have booleans. All the intermediate languages simply use the comparison instruction, which sets x64 flags that we do not yet have access to. The lower-level languages we’ve designed essentially treat compare-and-branch as an atomic statement.

We also add an apply expression that can call a declared function with some arguments. We must restrict calls to be in tail position, i.e., as the final expression in a block, since we do not know how to implement returning from a function call yet.

  p ::= (module b ... e)
     
  b ::= (define x (lambda (x ...) e))
     
  e ::= tail
  | (let ([x n]) e)
  | (if (cmp v v) e e)
     
  n ::= v
  | (binop v v)
     
  tail ::= n
  | (apply x v ...)
     
  x ::= name
     
  v ::= int64
  | x
     
  binop ::= *
  | +
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

Notice that we also modify the program syntax to be a list of function definitions followed by an expression. The final expression is where code begins executing.

Now that we have more than one kind of value (we have int64 and label), we begin to run into the problem of ill-typed operations. For example, our run-time system assumes that we halt with an integer, but the syntax of Values-unique-lang v2 allows the final expression to be a label. Similarly, a binop will only succeed if it is given two integers, but not if we try to add an integer and a label.

We have two options to make our language make sense: we can add dynamic type checking, or we can assume the problem never happens because the constraints of a higher language and its compilation strategy avoid this scenario. We’re still dealing with quite a low-level language, so we choose the latter option.

We further restrict the language by disallowing the result of any block or program to be a label. It is undefined in the target language to try to halt with a label, so we rule it out here. This is difficult to enforce without a type system, and hard to write a reasonable grammar for. We therefore assume a label on its own is not a valid tail. For example, the following is an invalid Values-unique-lang v2 program:

(module (define L.t.1 (lambda () 5)) L.t.1)

Similarly, we assume that labels are never used as arguments to binops or cmps. They can be used as arguments to apply, though.

Note that you won’t be able to write a checker to enforce these properties on all programs, so you must simply assume them.

We modify Values-unique-lang v2 to include the if and apply expressions.

  p ::= (module b ... e)
     
  b ::= (define label (lambda (aloc ...) e))
     
  e ::= tail
  | (let ([aloc n]) e)
  | (if (cmp v v) e e)
     
  n ::= v
  | (binop v v)
     
  tail ::= n
  | (apply x v ...)
     
  x ::= label
  | aloc
     
  v ::= int64
  | x
     
  binop ::= *
  | +
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

Exercise 7: Redesign and extend the implementation of uniquify. The source language is Values-lang v2 and the target language is Values-unique-lang v2.

You may find this implementation of alpha equivalence for Values-lang v2 helpful: a4-alpha-equal.rkt. The code is annotated to demonstrate idiomatic Racket style, and explain features you may be unfamiliar with.

Our target language does not provide any kind of function call abstractions. To compile function calls, we need to transform a function call into a series of set!s and a jump. That is, we need to transform `(apply ,x_f ,vs ...) into
`(begin
  (set! ,xs ,vs) ...
  (jump ,x_f ,xs ...))
where x_f is a reference to the function `(lambda (,xs ...) ,e).
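A concrete instance of this pattern (all names here are illustrative): a call (apply x.2 5) to a known one-parameter function whose parameter is n.1 would become:

```racket
#lang racket
;; Sketch of the compiled form of (apply x.2 5), assuming the callee's
;; parameter list is (n.1). The argument is moved into the parameter's
;; abstract location before the jump, and the jump is annotated with it.
(define compiled-call
  '(begin
     (set! n.1 5)
     (jump x.2 n.1)))
```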

The only way to pass information during a jump is to set the abstract locations that the function expects to contain values. This means we cannot implement a call to an arbitrary function; we can only call known functions, i.e., values that we can syntactically decide are functions whose list of parameters we know. For example, we can generate code for a function call when applying a label directly, `(module (define fact (lambda (x) ,e)) (apply fact 5)), or when the name is bound to a label, `(module (define fact (lambda (x) ,e)) (let ([x fact]) (apply x 5))). But we cannot handle indirect function calls, i.e., calls to functions whose label may have been passed to us as an argument. That is, the following must be an invalid program in Values-lang v2, since we do not know how to generate code for it:
`(module
    (define fact (lambda (x) ,e))
    (define f (lambda (g) (apply g 5)))
  (apply f g))

We therefore add as a requirement to Values-lang v2 that a program is only valid if it contains no indirect function calls.

Exercise 8: Redesign and extend the implementation of select-instructions to translate value-oriented operations into location-oriented instructions. The source language is Values-unique-lang v2 and the target language is Block-lang.

Examples:
> (select-instructions
    (uniquify
    `(module
       (define f (lambda (x) x))
       (define f (lambda (g) (apply g 5)))
       (apply f g))))

dict-ref: no value for key: 'g in: '((f . L.f.2))

> (select-instructions
    (uniquify
      `(module (define f (lambda (x) x))
         (let ([x f]) (apply x 5)))))

'(module
   (define L.main.4
     (begin (set! x.4 L.f.3) (set! tmp.5 5) (set! x.3 tmp.5) (jump x.4 x.3)))
   (define L.f.3 (halt x.3)))

11.8 Register Allocation

Next we need to extend the register allocator and analyses to jumps. We continue top-down.

Variables are now shared across blocks; essentially, our abstract locations represent global variables. The only way we can allocate registers correctly is to ensure the register allocator is aware of conflicts across blocks, and shares assignments across blocks. To do this, we need to do undead analysis across blocks, discover conflicts across blocks, and ensure register assignments are shared across blocks.

First, we easily extend uncover-locals to analyze if and jumps. We design Block-locals-lang below. We typeset the changes compared to Block-lang. The only change is the addition of an info field to each block, and to the module. Each locals set contains all variables used in its scope: either the block, or the entire module.

It’s a bit strange to call this field "locals" when it might contain all variables for the entire program. While so far we are writing entire programs, a real compiler supports separately compiling individual modules. The name module suggests this, even though we cannot support separate compilation yet.

  p ::= (module info b ... b)
     
  b ::= (define label info tail)
     
  info ::= ((locals (aloc ...)) any ...)
     
  tail ::= (begin s ... tail)
  | (jump trg aloc ...)
  | (if (cmp aloc opand) tail tail)
  | (halt opand)
     
  s ::= (set! aloc triv)
  | (set! aloc (binop aloc opand))
     
  binop ::= *
  | +
     
  triv ::= opand
  | label
     
  opand ::= int64
  | aloc
     
  trg ::= label
  | aloc
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

We can easily accomplish this by first uncovering locals on each block, then taking the union of the locals sets for the entire module.
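The union step can be sketched as follows; module-locals is a hypothetical helper name, and it assumes each block’s locals are given as a list of alocs:

```racket
#lang racket
;; Sketch: given the per-block locals lists, compute the module's locals
;; set as their union. Sorting makes the output deterministic.
(define (module-locals per-block-locals)
  (sort (set->list (apply set-union (set) (map list->set per-block-locals)))
        symbol<?))
```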

Exercise 9: Redesign and extend the implementation of uncover-locals to add a list of all variables used in each block to the info field of the block. The source language is Block-lang and the target language is Block-locals-lang.

11.8.1 Undead analysis, globally

Now that we have variables that are live across blocks, our undead analysis algorithm must change to follow the control flow of the program. This gives us a more general analysis that is more precise, but much more complex and difficult to implement efficiently.

First, we define Undead-block-lang below.

  p ::= (module info b ... b)
     
  b ::= (define label info tail)
     
  info ::= 
((locals (aloc ...))
 (undead-out undead-set-tree)
 any ...)
     
  tail ::= (begin s ... tail)
  | (jump trg aloc ...)
  | (if (cmp aloc opand) tail tail)
  | (halt opand)
     
  s ::= (set! aloc triv)
  | (set! aloc (binop aloc opand))
     
  binop ::= *
  | +
     
  triv ::= opand
  | label
     
  opand ::= int64
  | aloc
     
  trg ::= label
  | aloc
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

(define (undead-set? x)
  (and (list? x)
       (andmap aloc? x)
       (= (set-count (list->set x)) (length x))))
 
(define (undead-set-tree? ust)
  (match ust
    [(? undead-set?) #t]
    [(list (? undead-set?) (? undead-set-tree?) (? undead-set-tree?)) #t]
    [`(,(? undead-set?) ...) #t]
    [else #f]))

We will not need undead-out sets for the module; only for each block.

We add the undead-out sets to our info fields. The data structure of undead-out sets is more complex than in Compiler 3: Register Allocation (Corrected). Previously, programs were lists of instructions, so a list sufficed. Now, our programs are trees, because if introduces branching.

We design a new data structure, the Undead-set-tree, below.

Undead-set is (listof aloc)

interp. a set of undead alocs at a particular instruction

 

Undead-set-tree is one of:

- Undead-set

- (list Undead-set Undead-set-tree Undead-set-tree)

- (listof Undead-set)

WARNING: datatype is non-canonical since Undead-set-tree can be an

         Undead-set, so second and third case can overlap.

         An Undead-set-tree is meant to be traversed simultaneously with an

         Undead-block-lang/tail, so this ambiguity is not a problem.

interp. a tree of Undead-sets.  The structure of the tree mirrors the

  structure of a Block-locals-lang tail. There are three kinds of sub-trees:

(1) an instruction node is simply an undead set;

(2) an if node has an undead-set for the condition and two branch sub-trees;

(3) a begin node is a list of undead sets, culminating in a sub-tree.

For example, consider the following Undead-set-tree.
`((x.1)
  (x.1 y.2)
  ((x.1 y.2 b.3)
   (x.1 y.2 c.4 b.3)
   ((x.1 y.2 c.4)
    ()
    ((y.2 c.4)
     (x.1 y.2)))))
This corresponds to the following tail.
`(begin
   (set! x.1 5)
   (set! y.2 x.1)
   (begin
      (set! b.3 (+ x.1 y.2))
      (set! c.4 b.3)
      (if (eq? c.4 b.3)
          (halt c.4)
          (begin
             (set! x.1 c.3)
             (jump f x.1 y.2)))))
The nesting structure mirrors the nesting of tails. We can therefore process them by simultaneously traversing the tree and the tail, similar to what we did in Compiler 3: Register Allocation (Corrected).

11.8.2 Control-flow Analysis, and use/def chains

To compute the undead sets, we first need to perform control-flow analysis to build a control-flow graph. A control-flow graph is a directed graph in which each instruction is a node (vertex), and an edge from vertex v1 to v2 means that v1 may be executed immediately before v2. In general, a vertex may have multiple predecessors due to jumps, and multiple successors due to branching.

We generate the control-flow graph essentially by labeling each instruction with a unique identifier (such as a number, a symbol, or a pointer to the instruction’s representation), traversing the program, and adding an edge from each instruction’s ID to the ID of each of its successors.

While building the control-flow graph, we also collect all the variables defined at that node in the graph, and all the variables used at that node. A variable is defined if it gets a new value assigned to it, such as the variable x.1 in the instruction (set! x.1 (+ y.6 z.7)). A variable is used if it is referenced in the instruction, such as y.6 and z.7 in the above instruction. We associate these sets with the instruction’s ID.
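As a sketch of this bookkeeping (in Python rather than Racket, with an invented tuple encoding of set! instructions; this is an illustration, not the required implementation), computing the def and use sets of a single instruction might look like:

```python
# Sketch: def and use sets for one instruction, using an invented tuple
# encoding: ("set!", dest, triv) or ("set!", dest, (binop, aloc, opand)).

def defs_uses(instr):
    """Return (defs, uses) for a single set! instruction."""
    def is_aloc(x):
        return isinstance(x, str) and "." in x  # alocs look like "x.1"
    op, dest, rhs = instr
    assert op == "set!"
    if isinstance(rhs, tuple):                  # e.g. ("+", "y.6", "z.7")
        uses = {a for a in rhs[1:] if is_aloc(a)}
    else:                                       # a triv: int, label, or aloc
        uses = {rhs} if is_aloc(rhs) else set()
    return {dest}, uses

print(defs_uses(("set!", "x.1", ("+", "y.6", "z.7"))))
```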

For example, consider the following program.
`(module
   (define main
     (begin
       (set! x.1 5)
       (set! y.2 x.1)
       (begin
         (set! b.3 (+ x.1 y.2))
         (set! c.4 b.3)
         (if (eq? c.4 b.3)
             (halt c.4)
             (begin
               (set! x.1 c.3)
               (jump f x.1 y.2))))))
   (define f (begin (halt x.1))))

We can represent its control-flow graph as:
`(module
   (define main
     (begin
       (1 (successors (2))
          (defs (x.1))
          (uses ())
          (set! x.1 5))
       (2 (successors (3))
          (defs (y.2))
          (uses (x.1))
          (set! y.2 x.1))
       (begin
         (3 (successors (4))
            (defs (b.3))
            (uses (x.1 y.2))
            (set! b.3 (+ x.1 y.2)))
         (4 (successors (5))
            (defs (c.4))
            (uses (b.3))
            (set! c.4 b.3))
         (5 (successors (6 7))
            (defs ())
            (uses (c.4 b.3))
            (if (eq? c.4 b.3)
                (6 (successors ())
                   (uses (c.4))
                   (defs ())
                   (halt c.4))
                (begin
                  (7 (successors (8))
                     (defs (x.1))
                     (uses (c.3))
                     (set! x.1 c.3))
                   (8 (successors (9))
                     (defs ())
                     (uses (x.1 y.2))
                     (jump f x.1 y.2))))))))
     (define f
       (begin
         (9 (successors ())
            (defs ())
            (uses (x.1))
            (halt x.1)))))

Each instruction is replaced by a list whose first element is its ID, second element is the set of successors, third element is the set of defs, fourth element is the set of uses, and fifth element is the original instruction. This representation is quite inefficient, but useful for pedagogical purposes.

Actually constructing the control-flow graph is difficult. We need unique identifiers for every label before we analyze them to handle cycles in the graph.

As with the undead analysis, it is faster (and simpler) to compute the graph by traversing the program backwards, passing the successors up through a recursive call and then adding them to the graph.

We provide you the function block-locals-lang->cfg, which constructs a somewhat more efficient graph representation.


Once we have a control-flow graph, we compute the Undead-set-trees by iterating over the graph, updating the undead-in and undead-out sets until a fixed point is reached. We update the sets according to the following algorithm.
  • Suppose ins is a map from instruction IDs to undead-in sets, and outs is map from instruction IDs to undead-out sets. Initially, every id in the control-flow graph maps to the empty set.

  • Compute new maps ins_new and outs_new for each id: outs_new[id] is the union of ins[id'] over every successor id' of id (empty if id has no successors), and ins_new[id] is uses[id] unioned with the set difference outs_new[id] minus defs[id].

  • Repeat the loop, replacing ins with ins_new and outs with outs_new until ins is equal to ins_new and outs is equal to outs_new.

Once the algorithm terminates, outs is a map from instruction IDs to undead-out sets. We can easily construct an Undead-set-tree for each block by traversing the control-flow graph, replacing each node by its undead-out set.
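The fixed-point loop can be sketched as follows (in Python rather than Racket, over an invented dictionary encoding of the control-flow graph; an illustration of the algorithm, not the required implementation):

```python
# Sketch of the fixed-point undead (liveness) computation over a
# control-flow graph. Each node maps to (successors, defs, uses).
# The three-instruction graph below is invented for illustration.

def undead_analysis(cfg):
    ins = {n: set() for n in cfg}
    outs = {n: set() for n in cfg}
    while True:
        ins_new, outs_new = {}, {}
        for n, (succs, defs, uses) in cfg.items():
            # undead-out = union of undead-in over all successors
            out = set().union(*(ins[s] for s in succs)) if succs else set()
            outs_new[n] = out
            # undead-in = uses plus whatever is undead-out but not defined here
            ins_new[n] = uses | (out - defs)
        if ins_new == ins and outs_new == outs:   # fixed point reached
            return outs
        ins, outs = ins_new, outs_new

cfg = {
    1: ([2], {"x.1"}, set()),           # (set! x.1 5)
    2: ([3], {"y.2"}, {"x.1"}),         # (set! y.2 x.1)
    3: ([],  set(),   {"x.1", "y.2"}),  # final instruction using both
}
print(undead_analysis(cfg))
```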

We provide the function undead-analysis, which decorates each block with the undead-in and undead-out sets.

Now, undead-out sets will contain variables that may not exist in the block. The sets should still be associated with blocks, however, since each undead-out set corresponds to an instruction in a particular block.

Exercise 10: Redesign and extend the implementation of undead-analysis to collect undead-out sets for each block. The source language is Block-locals-lang and the target is Undead-block-lang.

Only the info field of each block of the output program is modified, with the addition of the undead-out set.

11.8.3 Conflict Analysis

Next we need to compute the conflict graph. Our conflict graph will be for the entire module, since assignment must be shared across the entire program.

Below, we design Conflict-lang v2 with blocks and control-flow primitives.
  p ::= (module info b ... b)
     
  b ::= (define label info tail)
     
  info ::= 
((locals (aloc ...))
 (conflicts ((aloc (aloc ...)) ...))
 any ...)
     
  tail ::= (begin s ... tail)
  | (jump trg aloc ...)
  | (if (cmp aloc opand) tail tail)
  | (halt opand)
     
  s ::= (set! aloc triv)
  | (set! aloc (binop aloc opand))
     
  binop ::= *
  | +
     
  triv ::= opand
  | label
     
  opand ::= int64
  | aloc
     
  trg ::= label
  | aloc
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

The conflict-analysis does not change significantly. We simply extend the algorithm from Compiler 3: Register Allocation (Corrected) to support the new instructions, and run it on each block.

After running on each block, conflict-analysis should merge all the local conflict graphs into a global conflict graph in the module’s info field. In the global conflict graph, a variable aloc_1 is in conflict with another variable aloc_2 if aloc_1 is in conflict with aloc_2 in any of the conflict graphs for each block.
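The merge step can be sketched as follows (in Python rather than Racket; the adjacency-map representation and variable names are invented for illustration):

```python
# Sketch: merge per-block conflict graphs (as adjacency maps from each
# aloc to the set of alocs it conflicts with) into one global graph.

def merge_conflicts(graphs):
    merged = {}
    for g in graphs:
        for v, neighbours in g.items():
            # a conflict in ANY block is a conflict in the global graph
            merged.setdefault(v, set()).update(neighbours)
    return merged

main_conflicts = {"x.1": {"y.2"}, "y.2": {"x.1"}}
f_conflicts = {"x.1": {"z.3"}, "z.3": {"x.1"}}
print(merge_conflicts([main_conflicts, f_conflicts]))
```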

Exercise 11: Redesign and extend the implementation of conflict-analysis to decorate each block with a conflict graph, then merge them and decorate the module with a single conflict graph. The source language is Undead-block-lang and the target language is Conflict-lang v2.

11.8.4 Global Register Allocation

We design Block-assigned-lang below, an extension of Loc-assigned-lang with blocks.

  p ::= (module info b ... b)
     
  b ::= (define label info tail)
     
  info ::= ((assignment ((aloc rloc) ...)) any ...)
     
  tail ::= (begin s ... tail)
  | (jump trg aloc ...)
  | (if (cmp aloc opand) tail tail)
  | (halt opand)
     
  s ::= (set! aloc triv)
  | (set! aloc (binop aloc opand))
     
  binop ::= *
  | +
     
  triv ::= opand
  | label
     
  opand ::= int64
  | aloc
     
  trg ::= label
  | aloc
     
  rloc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r10
  | r11
  | r12
  | r13
  | r14
  | r15
     
  addr ::= (rbp - dispoffset)
  | (rbp + dispoffset)
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

Conceptually, the allocator should now run the same algorithm as before. However, the conflict graph and the list of locals now come from the module’s info field. This means the allocator will assign each abstract location the same physical location, no matter which block it appears in.

Exercise 12: Redesign and extend the implementation of assign-registers to assign registers for each block. The source language is Conflict-lang v2 and the target language is Block-jump-live-lang.

Recall that you may not use the register rax, since it is used to patch instructions, or rbp, since it contains the frame pointer.

Now that register allocation is finished, we can discard the annotations on jumps. Below, we design Block-jump-live-lang, in which each jump target still carries its aloc ... list of live locations; the next pass removes that list.
  p ::= (module info b ... b)
     
  b ::= (define label info tail)
     
  info ::= ((assignment ((aloc rloc) ...)) any ...)
     
  tail ::= (begin s ... tail)
  | (jump trg aloc ...)
  | (if (cmp aloc opand) tail tail)
  | (halt opand)
     
  s ::= (set! aloc triv)
  | (set! aloc (binop aloc opand))
     
  binop ::= *
  | +
     
  triv ::= opand
  | label
     
  opand ::= int64
  | aloc
     
  trg ::= label
  | aloc
     
  rloc ::= reg
  | addr
     
  reg ::= rsp
  | rbp
  | rbx
  | rcx
  | rdx
  | rsi
  | rdi
  | r8
  | r9
  | r10
  | r11
  | r12
  | r13
  | r14
  | r15
     
  addr ::= (rbp - dispoffset)
  | (rbp + dispoffset)
     
  cmp ::= neq?
  | eq?
  | <
  | <=
  | >
  | >=

Exercise 13: Design and implement the function discard-call-live, which throws away the set of locations live across a jump, i.e., translates each (jump trg aloc ...) into (jump trg), discarding the undeadness annotations. The source language is Block-jump-live-lang and the target is Block-assigned-lang.
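The translation is a straightforward recursive walk. A sketch (in Python rather than Racket, over an invented nested-list encoding of tails; not the required implementation):

```python
# Sketch: discard the live-location annotations on jumps, i.e. rewrite
# ["jump", trg, aloc, ...] to ["jump", trg], leaving everything else alone.

def discard_call_live(tail):
    if isinstance(tail, list):
        if tail and tail[0] == "jump":
            return ["jump", tail[1]]            # drop the aloc ... list
        return [discard_call_live(t) for t in tail]
    return tail                                 # atoms pass through unchanged

example = ["begin", ["set!", "x.1", 5], ["jump", "f", "x.1", "y.2"]]
print(discard_call_live(example))  # ['begin', ['set!', 'x.1', 5], ['jump', 'f']]
```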

Exercise 14: Redesign and extend the implementation of replace-locations to support blocks and the control-flow primitives. The source language is Block-assigned-lang and the target language is Block-nested-lang.

You should use the assignment from the module’s info field.

11.9 Thinking like a compiler

We now, finally, have a source language in which we can write interesting programs. It is not like most programming languages in which you’re likely to have written programs. It has limitations designed into it because of what we know how to implement, and implement efficiently. This will always be the case in every language you ever write, and so it is useful to learn to think in a new layer of abstraction.

Exercise 15: Redesign and implement the function check-values-lang, which checks that its input is a valid Values-lang v2 program, or raises a helpful error if not. In this version, you must check that undefined variables are never used; it is now an error for a programmer to write a free variable in Values-lang v2.

Exercise 16: Design and implement a type checker for Values-lang v2 called type-check-values-lang. The input is a syntactically valid Values-lang v2 program, and the output is a Values-lang v2 program. However, it should perform the following checks and potentially raise an error:
  • If apply is called on a known function, then there are exactly as many arguments as declared by the function. Otherwise, raise a type error. A known function is a name that definitely refers to a function.

  • If apply is called on a name that refers to a number, then raise a type error.

  • If a binop is called with a known function as either argument, then raise a type error.

  • If a cmp is called with a known function as either argument, then raise a type error.

Exercise 17: You cannot implement a type checker for Values-lang v2 that can catch all type errors possible in the language. Briefly explain why not, and give three examples of type errors still possible in Values-lang v2. Explain what will happen when each of these examples is executed. Think through the program; don’t just try to compile and run them. Suggest how Values-lang v2 could be modified so it would be possible to catch each of your examples. Can you imagine any benefits for your compiler if you could guarantee that these examples were impossible?

Exercise 18: Design and implement the function interp-values-lang, an interpreter for Values-lang v2. It should receive a Values-lang v2 program as input and return a Racket number?. The interpreter may call type-check-values-lang on its input.

Exercise 19: Design and implement the function values-lang-fact, a Racket function that should take a Racket natural number as input and return a Values-lang v2 program. The result should be a program that calls the factorial function, implemented in Values-lang v2, on the number provided.

While factorial should be implemented in Values-lang v2, since Values-lang v2 cannot take input from the user, we simulate input by generating a new Values-lang v2 program with the input provided.

Make sure you use the function to test your interpreter against your compiler.

Examples:
> (interp-values-lang
   (values-lang-fact 1))

1

> (interp-values-lang
   (values-lang-fact 5))

120

> (interp-values-lang
   (values-lang-fact 6))

720

Exercise 20: Design and implement the function values-lang-fib, a Racket function that should take a Racket natural number as input and return a Values-lang v2 program. The result should be a program that calls the Fibonacci function, implemented in Values-lang v2, on the number provided.

While the Fibonacci function should be implemented in Values-lang v2, since Values-lang v2 cannot take input from the user, we simulate input by generating a new Values-lang v2 program with the input provided.

Make sure you use the function to test your interpreter against your compiler.

Exercise 21: Last week, you were asked to compare two versions of your compiler: one with assign-homes against one with assign-homes-opt. Now you can express programs with loops and indirection. Try the experiment again. What do you find?