13 Compiler 6: Calling Conventions

7.5.0.17

13 Compiler 6: Calling Conventions

13.1 Assignment Summary

The goal of this assignment is to introduce calling conventions, so that we can call functions that return values to non-tail contexts, and make indirect function calls, which need not reference a function by name. To do this, we will expose physical locations in some intermediate languages that previously had machine-independent descriptions.

This assignment is due Friday, March 6, 2020 at 11:59pm.

You can use the reference solution here, but if you hammer it it might go down and not come back up. https://www.students.cs.ubc.ca/~cs-411/2019w2/a6-interrogator.cgi

Learning Objectives

Students should be able to identify the advantages of calling conventions.
Students should be able to identify the disadvantages of calling conventions.
Students should be able to implement a compiler for a language that supports general function calls by imposing a calling convention for functions.
Students should be able to differentiate between a frame and the stack.
Students should be able to describe the purpose of a frame.
Students should be able to describe the purpose of a stack of frames.
Students should be able to construct a representation of a frame for a non-tail call relative to the caller.
Students should be able to construct a representation of a frame for a tail call relative to the caller.
Students should be able to explain how proper tail calls are implemented and why they do not use additional stack space compared to non-tail calls.

Assignment Checklist

Completely new passes

pre-assign-frame-variables, a compiler from Conflict-lang v6 to Pre-framed-lang v6.
assign-frames, a compiler from Pre-framed-lang v6 to Framed-block-lang v6
assign-frame-variables, a compiler from Spilled-lang v6 to Block-jump-live-lang v6.
implement-fvars, a compiler from Block-fvar-lang v6 to Block-nested-lang v6.

Major modifications to passes

select-instructions, a compiler from Values-unique-lang v6 to Block-lang v6.
Redesign and extend the implementation of undead-analysis from Compiler 3: Register Allocation (Corrected), a compiler from Block-locals-lang v6 to Undead-block-lang v6.
expose-basic-blocks, a compiler from Block-nested-lang v6 to Block-asm-lang v6.

Minor modifications to passes

uniquify, a compiler from Values-lang v6 to Values-unique-lang v6.
uncover-locals, a compiler from Block-lang v6 to Block-locals-lang v6.
conflict-analysis, a compiler from Undead-block-lang v6 to Conflict-lang v6.
Redesign and extend the implementation of assign-registers, a compiler from Framed-block-lang v6 to Spilled-Lang v6.
Design and implement discard-call-live, a compiler from Block-jump-live-lang v6 to Block-assigned-lang v6.
Redesign and extend the implementation of replace-locations, a compiler from Block-assigned-lang v6 to Block-fvar-lang v6.
Design and implement flatten-program, a compiler from Block-asm-lang v6 to Paren-asm v6.
Redesign and extend the implementation of patch-instructions, a compiler from Paren-asm v6 to Paren-x64 v6.
Redesign and extend the implementation of generate-x64, a compiler from Paren-x64 v6 to x64.

13.2 Language Diagram

13.3 New Source Language

This week, we shift perspective from bottom-up to top-down. The motivation for our new features comes from the source language, and we don’t need to expose any more x64 primitives to express it. Instead, we need to figure out how to generate what we want quite early in the pipeline, and this will tell us how we must modify intermediate languages later in the pipeline.

We’re changing the version numbering system to reflect which assignment we’re on; all languages now have v<assignment-number> as their version.

The new source language, Values-lang v6, is defined below. We typeset the changes with respect to Values-lang v2. New expressions and non-terminals are marked with a superscript "+" and a blue background, while expressions and non-terminals removed from the previous language are marked with a superscript "–" and a red background.

p	::=	(module b ... e)

b	::=	(define x (lambda (x ...) e))

e	::=	tail
	\|	n
	\|	(let ([x n]) e)
	\|	(if (cmp v v) e e)

n	::=	v
	\|	(binop v v)
	\|	(apply v v ...)

tail	::=	n
	\|	(apply x v ...)

x	::=	name

v	::=	int64
	\|	x

binop	::=	*
	\|	+
	\|	-

cmp	::=	neq?
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

Now we can have an (apply v v ...) bound in the right-hand side of a let. We also allow function calls to arbitrary values, and support indirect function calls: calls to functions whose name (label) is not statically apparent.

Note that we continue to assume well-typed programs. For example, while (apply 1 2) is valid syntactically, it is not well typed, so its run-time behavior is undefined.

This presents two challenges:

implementing calls to functions whose signature we do not know (as we saw in Compiler 4: Adding Control Flow (Corrected)), and
implementing returning from a function into the middle of a let.

We also add subtraction as a binop. This week, we start to need subtraction a lot more and it’s tiring to encode it when x64 supports it.

13.4 Calling Conventions Introduction

Introducing a calling convention solves the two problems mentioned above.

First, the calling convention gives us a pattern to set up a call to any function, even if we do not know its signature. Instead of assigning to the exact abstract locations it expects, we fix a set of physical locations and only set those. Both the caller and the callee agree that those locations are the only thing they need to know about. Every function call will first set those locations and then call a function. Every function definition will read from those locations on entry, and move the values into its own abstract locations. This way, no function needs to know about another’s abstract locations: they become truly local.

Second, the calling convention gives a pattern for ending, or returning from, a call to a function. It dictates in which physical location the return value is stored. We have already seen a version of this in our run-time system; now we extend the idea to arbitrary functions. The calling convention also dictates where a return address is stored, so we can jump back to the middle of a previous computation, or jump directly to the next computation. This requires that during a function call, we set a return address. The calling convention also requires that the run-time system sets up a default return address.

Design digression:
Strictly speaking, we don’t need to use physical locations for our calling convention. What we need is a set of global, shared locations, whose names are unique across the entire program, and any program that we might link with. The register allocator needs to assign all uses of these shared locations to the same physical locations, or we run into exactly the same problems we saw in Compiler 4: Adding Control Flow (Corrected).
The simplest implementation of these global shared location is to just use physical locations. Physical locations are automatically globally consistent and unique, and automatically known by all programs we might link with. If we allow those physical locations to be registers, we can generate very fast function calls by keeping memory out of the picture as much as possible. Unfortunately, using them requires that we expose physical locations through every layer of abstraction up to the point where we implement the calling convention. That’s every language from Paren-asm v2 to Block-lang. This makes all our abstractions only partial abstractions, and injects undefined behavior back into our intermediate languages.
We could use the stack to implement the calling convention. This is simpler to abstract, but slower.
We could try to create global abstract locations. However, then our register allocator will only work over whole programs; we won’t be able to support separate compilation, without implementing a separate, lower-level calling convention. This would create unnecessary indirection and be less efficient.
We could try to introduce abstract call-setup and return-from-call instructions, and leave it to the register allocator to implement these in terms of physical locations. This complicates the register allocator, an already complicated pass. It also mixes concerns: assigning abstract locations physical locations, and generating the instructions to implement call and return. At some level, the register allocator will have to perform the transformation we’re already suggesting: first setup physical locations for the call, create conflicts between those physical locations and abstract locations, then assign registers. At very least, we need to do this before conflict analysis, so we might as well do it before all the analyses... that is, do it in select-instructions. This leads us right back to the original design: expose physical locations up through Block-lang.

We’ll not be completely faithful to the x64 System V ABI, but will use a similar, but simplified, calling convention. The calling convention is defined by parameters in a6-compiler-lib.rkt.

Our calling convention passes the first n arguments as registers, using the set of registers defined in the parameter current-parameter-registers. The default value is '(rdi rsi rdx rcx r8 r9), which is defined by the x64 System V ABI to be where the first 6 arguments of any function are stored. To deal with an arbitrary number of arguments, we may need more than the n registers we have available for parameters. For the rest, we use fresh frame locations.

Unfortunately, the callee will not know the exact offset into the frame that we, the caller, might be using. We therefore need to agree a priori on which frame locations are used across the call. Since there might be arbitrary frame locations currently in use (for example, because the register allocator will be putting some abstract locations on the frame), there is no particular index that it’s safe to start from.

The solution is to introduce a stack of frames. Each function assumes that, prior to being called, it was given a brand-new frame from which it can start indexing at 0. The callee knows how many arguments it expected to receive, and can start counting from 0. The caller knows how many frame locations it has used, and can increment the frame base pointer beyond its own locations to a safe starting index. After the function call returns, the caller can decrement the frame base pointer by the same amount, restoring its own frame. These increment/decrement operations can be interpreted as "pushing" and "popping" a new frame on and off the stack (of frames).

We abstract the frame base pointer as a parameter stored in current-frame-base-pointer-register. The default value is 'rbp, the same register we’ve been using for our frame variables until now.

Prior to each call, the caller will store a return address in the parameter current-return-address-register. The default value is 'r15. Before a call, we store the return address in that register. When a call returns, the callee jumps to the return address stored in that register.

When returning from a call, we store the return value in current-return-value-register. The default value is 'rax, which we’ve been using as the final result of a program. With proper function calls, we will never quite know when the program is "complete". We must assume that we will always return to some prior call.

13.4.1 Designing a Calling Convention Translation

To implement calling conventions, we want to perform the following transformations.

When transforming a function `(lambda (,x_0 ... ,x_n-1 ,x_n ... ,x_n+k-1) ,body), we generate:

`(begin
   (set! ,tmp-rp ,(current-return-address-register))
   (set! ,x_0 ,r_0)
   ...
   (set! ,x_n-1 ,r_n-1)
   (set! ,x_n ,fv_0)
   ...
   (set! ,x_n+k-1 ,fv_k-1)
   ,body)

where:

fv_0 ... fv_k-1 are the first k frame variables, i.e., locations on this function’s frame. As displacement operands, they would be `(rbp + 0), ..., `(rbp + ,(* k 8)).
We begin using the addition form of displacement opands in this assignment for reasons that become apparent shortly.
r_0 ... r_n-1 are the n physical locations from current-parameter-registers.
tmp-rp is a fresh temporary storing the return address for this function.

The order of the set!s is not important for correctness, but we can influence optimizations. We want limit the live ranges of registers to help the register allocator makes better use of registers, so we should generate accesses to registers last to limit their live ranges. Loading the return address register last can also help limit its live range for simple functions.

When transforming a call in tail position, `(apply ,v ,v_0 ... ,v_n-1 ,v_n ... ,v_n+k-1), we generate:

`(begin
   (set! ,fv_0 ,v_n) ...
   (set! ,fv_k-1 ,v_n+k-1) ...
   (set! ,r_0 ,v_0) ...
   (set! ,r_n-1 ,v_n-1) ...
   (set! ,ra ,tmp-rp)
   (jump ,v ,fbp ,ra ,r_0 ... ,r_n-1 ,fv_0 ... ,fv_k-1))

where:

ra is the physical location storing the return address, current-return-address-register.
fbp is the physical location storing the frame base pointer, current-frame-base-pointer-register.
tmp-rp is the same fresh variable generated on entry to the current function.
all other meta-variables are the same as in the case for transforming lambda.

Note that form a tail call, we do not modify the stack for the caller. The caller is able to reuse the callee’s frame and return address, since there is no more computation left in this function.

Again, the order of the set!s is not important for correctness, but can enable better use of registers and optimizations.

When transforming a value v in tail position, we generate:
`(begin
(set! ,rv ,v)
(jump ,tmp-ra ,fbp ,rv))
where:
- rv is the physical location used to store the return value, current-return-value-register.
- fbp and tmp-rp are the same as in the case above for transforming a tail call.
We cannot, in general, figure out whether the program is returning a value to a caller or halting the program. This means halt will disappear from intermediate languages. Instead, the run-time system will setup the first return address, and the final result will be returned to the run-time system using the calling conventions.
Transforming a binary operation in tail position will be similar, but may involve several temporary abstract locations.

When transforming a call in non-tail position, `(apply ,v ,v_0 ... ,v_n-1 ,v_n ... ,v_n+k-1), we generate the following.

`(return-point ,rp-label
   (begin
     (set! ,nfv_0 ,v_n) ...
     (set! ,nfv_k-1 ,v_n+k-1) ...
     (set! ,r_0 ,v_0) ...
     (set! ,r_n-1 ,v_n-1) ...
     (set! ,ra ,rp-label)
     (jump ,v ,fbp ,ra ,r_0 ... ,r_n-1 ,nfv_0 ... ,nfv_k-1)))

where:

rp-label is a fresh label.
Intuitively, we need some way to introduce a label at this instruction, so when the call is complete we can return to where this apply happened. This return-point will be a new instruction in the target language that introduces a fresh label rp-label.
Note that our non-tail calls can only appear in a let-binding as `(let ([,x (apply ,v ,v_0 ... ,v_n-1 v_n ... v_n+k-1)]) ,e), so the above return-point will be followed by an instruction setting x to the (current-return-value-register), and the rest of the instructions for e.
`(begin
   ,translation-of-non-tail-call
   (set! ,x ,rv)
   ,translation-of-e)

nfv_0 ... nfv_k-1 should be the first k frame variables, i.e., locations on the callee’s frame. As displacement operands, they would be `(rbp - 0), ..., `(rbp - ,(* k 8)) to the callee.

However, we can’t make these frame variables yet. These variables are assigned in the caller, and the callee may not have the same frame base as the caller. Consider the following expression with a non-tail call using frame variables.

`(begin
   (set! fv0 1)
   (set! rax 42)
   (return-point L.rp.1
     (set! fv0 rax)
     (set! r15 L.rp.1)
     (jump L.label.1 rbp r15 fv0))
   (set! r9 fv0)
   (set! rax (+ rax r9)))

For this example, suppose we’ve exhausted our (current-parameter-registers), which we simulate by making it the empty set.

The frame location 'fv0 is live across the call; we use it after the call returns. However, we need to store 'rax in the callee’s 'fv0. If we try to use frame variables, or addrs, directly, we overwrite the original value of 'fv0. We first need to figure out how large the caller’s frame is, and then move 'rax to the index after the last frame location used by the caller.

Prior to undead analysis, we don’t know how much we need to increment the base frame pointer to save variables live after this non-tail call. This makes figuring the exact index for these new frame variables non-trivial. We need to know the base of the callee’s frame, while we only know the base of our frame, i.e., the caller’s frame.

But to do undead analysis, we really need both of those occurrences of 'fv0 to be unique. We’ve assumed all locations are uniquely identified by name, but the return point changes the interpretation of frame variables, so two occurrences of of the same frame variable are not necessarily the same. We want to avoid introducing such an ambiguous location.

Instead, we’ll generate something like this:

`(begin
   (set! fv0 1)
   (set! rax 42)
   (return-point L.rp.1
     (set! nfv.0 rax)
     (set! r15 L.rp.1)
     (jump L.label.1 rbp r15 nfv.0))
   (set! r9 fv0)
   (set! rax (+ rax r9)))

where nfv.0 must be assigned to frame location 0 in the callee’s frame. For now, we record these new-frame variables in an info field, and we’ll figure out how to assign frames later.

Note that this is only a problem in a non-tail call. In a tail call, we can reuse the caller’s frame as the callee’s frame, since we never return.

13.5 Designing New Intermediate Languages

The above desired implementation of calling conventions suggests a few changes to the target languages.

First, we need a return point instruction to describe returning from a function. Second, while not necessary, it will drastically simplify our pass if we had a better abstraction for working with frame locations directly, and leave some other pass to generate the particular displacement mode offsets. We will have other operations we must do on frames and frame variables, like computing the size of each frame, so a new abstraction will be useful. Third, halt is gone, and instead the run-time system must setup an initial function call. You won’t have to worry about this; the a6-compiler-lib.rkt does it already.

Design digression:
We might think we could pre-process Values-lang v6 to lift all non-tail calls into new functions and rewrite the program to have only tail calls. This would correspond to a CPS transformation and would be difficult to optimize correctly, particularly at this level of abstraction. Each non-tail call would introduce an entire function call setup to the "rest" of the body, as a continuation. Any parameters live across the non-tail call would need to be packaged and explicitly passed as arguments to the continuation. By creating a return point abstraction, later in the compiler when we have access to lower-level abstraction (basic blocks), we’ll instead be able to generate a single jump back to this return point, instead of a whole function call.

They also suggest it would be useful to have a syntactic distinction between terms in tail position, and those in non-tail position. We can expand the syntax of Values-lang v6 to reflect this syntax. We name this altered syntax Values-ctx-lang v6. It’s the same language as Values-lang v6, but expressed with apparent redundancy.

p	::=	(module b ... e)

b	::=	(define x (lambda (x ...) e))

e	::=	n
	\|	tail
	\|	(let ([x n]) e)
	\|	(if (cmp v v) e e)

n	::=	v
	\|	(binop v v)
	\|	(apply v v ...)

tail	::=	v
	\|	(binop v v)
	\|	(apply v v ...)

x	::=	name

v	::=	int64
	\|	x

binop	::=	*
	\|	+
	\|	-

cmp	::=	neq?
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

In Values-ctx-lang v6, we duplicate the n non-terminal as tail. We use tail to indicate a tail position, and n to indicate a non-tail position. The advantage of this is that, if we follow the template for this language, we are guided to write two separate processors for apply: one in tail, and one in non-tail position. By designing our data to build in a distinction, we don’t have to think hard about recovering the distinction.

We define the new target language, Block-lang v6, below. We typeset the differences compared to Block-lang:

p	::=	(module b ... b)

b	::=	(define label info tail)

info	::=	((new-frames ((aloc ...) ...)))

tail	::=	(begin s ... tail)
	\|	(jump trg aloc ... loc ...)
	\|	(if (cmp aloc loc opand) tail tail)
	\|	(halt opand)

s	::=	(return-point label tail)
	\|	(set! aloc loc triv)
	\|	(set! aloc loc (binop aloc loc opand))

binop	::=	*
	\|	+
	\|	-

triv	::=	opand
	\|	label

opand	::=	int64
	\|	aloc
	\|	loc

trg	::=	label
	\|	aloc
	\|	loc

loc	::=	rloc
	\|	aloc

rloc	::=	reg
	\|	fvar

reg	::=	rax
	\|	rsp
	\|	rbp
	\|	rbx
	\|	rcx
	\|	rdx
	\|	rsi
	\|	rdi
	\|	r8
	\|	r9
	\|	r10
	\|	r11
	\|	r12
	\|	r13
	\|	r14
	\|	r15

cmp	::=	neq?
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

(define (fvar? x)
  (and (symbol? x)
       (regexp-match-exact? #rx"^fv[0-9]+" (symbol->string x))))

(define (loc? loc)
  (or (aloc? loc) (rloc? loc)))

(define (rloc? loc)
  (or (register? loc) (fvar? loc)))

We further assume that a return-point cannot appear inside another return-point, i.e., there are no nested return-points. Our compiler can never generate this code, and there is no reason to support.

A frame variable is a symbol starting with the letters fv followed by a number. Frame variables are distinct from labels, alocs, and rlocs. The number represents the index into the frame for the current function. The a6-compiler-lib.rkt defines a few helpers for working with frame variables: fvar?, make-fvar, fvar->index, and fvar->addr.

We’re using the addition form of displacement mode operands to help us read the stack like an array.

Note that rax is allowed in this language, since halt has disappeared and our calling conventions require it.

Note that rbp is allowed in this language. We abstract which register implement the frame base pointer into a parameter, current-frame-base-pointer-register. While syntactically allowed, directly manipulating current-frame-base-pointer-register when not pushing or popping a new frame is undefined behavior, since the compiler expects to manage the stack.

To implement fvars later, we requires that current-frame-base-pointer-register is assigned only by incrementing or decrementing it by an integer literal. Other uses current-frame-base-pointer-register are invalid programs. Later passes will assume this in order to compute frame variable locations.

The info field records all the new frames created in the block, and will be used later to push new frames on to the stack, and assign new-frame variables to frame locations. The new-frame variables should be in order. Each new-frame variable must only appear in one list in the new-frames field. Recall that alocs are unique, and the new-frames field represents newly defined alocs. It would not make sense for the same aloc to appear in two frames.

All later intermediate languages will need to be adapted to expose the necessary lower-level features to make this new Block-lang v6 possible.

13.6 Implementing Calling Convention

Before we get started with implementing calling conventions, we want to uniquify names.

Values-unique-lang v6 is defined below.

p	::=	(module b ... e)

b	::=	(define label x (lambda (aloc x ...) e))

e	::=	n
	\|	tail
	\|	(let ([x aloc n]) e)
	\|	(if (cmp v v) e e)

n	::=	v
	\|	(binop v v)
	\|	(apply v v ...)

tail	::=	v
	\|	(binop v v)
	\|	(apply v v ...)

x	::=	name

v	::=	int64
	\|	label
	\|	aloc
	\|	x

binop	::=	*
	\|	+
	\|	-

cmp	::=	neq?
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

Exercise 1: Redesign and extend the implementation of uniquify. The source language is Values-lang v6 and the target language is Values-unique-lang v6.

Exercise 2: Redesign and extend the implementation of select-instructions. The source language is Values-unique-lang v6 and the target language is Block-lang v6.
To handle apply, you should implement the calling conventions described in Designing a Calling Convention Translation. You must have proper handling of tail calls and non-tail calls.
Note that you can reinterpret Values-unique-lang v6 to impose the same syntactic distinction as in Values-ctx-lang v6.

Examples:

> (pretty-display
   (select-instructions
    '(module
       (define L.swap.1
         (lambda (x.1 y.2)
           (if (< y.2 x.1)
               x.1
               (apply L.swap.1 y.2 x.1))))
       (apply L.swap.1 1 2))))
(module
  (define L.main.1
    ((new-frames ()))
    (begin
      (set! ra.1 r15)
      (set! rsi 2)
      (set! rdi 1)
      (set! r15 ra.1)
      (jump L.swap.1 rbp r15 rsi rdi)))
  (define L.swap.1
    ((new-frames ()))
    (begin
      (set! ra.2 r15)
      (set! x.1 rdi)
      (set! y.2 rsi)
      (if (< y.2 x.1)
        (begin (set! rax x.1) (jump ra.2 rbp rax))
        (begin
          (set! rsi x.1)
          (set! rdi y.2)
          (set! r15 ra.2)
          (jump L.swap.1 rbp r15 rsi rdi))))))
> (parameterize ([current-parameter-registers '()])
    (pretty-display
     (select-instructions
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (apply L.swap.1 y.2 x.1))))
         (apply L.swap.1 1 2)))))
(module
  (define L.main.2
    ((new-frames ()))
    (begin
      (set! ra.3 r15)
      (set! fv1 2)
      (set! fv0 1)
      (set! r15 ra.3)
      (jump L.swap.1 rbp r15 fv0 fv1)))
  (define L.swap.1
    ((new-frames ()))
    (begin
      (set! ra.4 r15)
      (set! x.1 fv0)
      (set! y.2 fv1)
      (if (< y.2 x.1)
        (begin (set! rax x.1) (jump ra.4 rbp rax))
        (begin
          (set! fv1 x.1)
          (set! fv0 y.2)
          (set! r15 ra.4)
          (jump L.swap.1 rbp r15 fv0 fv1))))))
> (pretty-display
   (select-instructions
    '(module
       (define L.swap.1
         (lambda (x.1 y.2)
           (if (< y.2 x.1)
               x.1
               (let ([z.3 (apply L.swap.1 y.2 x.1)])
                 z.3))))
       (apply L.swap.1 1 2))))
(module
  (define L.main.3
    ((new-frames ()))
    (begin
      (set! ra.5 r15)
      (set! rsi 2)
      (set! rdi 1)
      (set! r15 ra.5)
      (jump L.swap.1 rbp r15 rsi rdi)))
  (define L.swap.1
    ((new-frames ()))
    (begin
      (set! ra.6 r15)
      (set! x.1 rdi)
      (set! y.2 rsi)
      (if (< y.2 x.1)
        (begin (set! rax x.1) (jump ra.6 rbp rax))
        (begin
          (return-point L.rp.4
            (begin
              (set! rsi x.1)
              (set! rdi y.2)
              (set! r15 L.rp.4)
              (jump L.swap.1 rbp r15 rsi rdi)))
          (set! z.3 rax)
          (set! rax z.3)
          (jump ra.6 rbp rax))))))
> (parameterize ([current-parameter-registers '()])
    (pretty-display
     (select-instructions
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (apply L.swap.1 y.2 x.1)])
                   z.3))))
         (apply L.swap.1 1 2)))))
(module
  (define L.main.5
    ((new-frames ()))
    (begin
      (set! ra.7 r15)
      (set! fv1 2)
      (set! fv0 1)
      (set! r15 ra.7)
      (jump L.swap.1 rbp r15 fv0 fv1)))
  (define L.swap.1
    ((new-frames ((nfv.9 nfv.10))))
    (begin
      (set! ra.8 r15)
      (set! x.1 fv0)
      (set! y.2 fv1)
      (if (< y.2 x.1)
        (begin (set! rax x.1) (jump ra.8 rbp rax))
        (begin
          (return-point L.rp.6
            (begin
              (set! nfv.10 x.1)
              (set! nfv.9 y.2)
              (set! r15 L.rp.6)
              (jump L.swap.1 rbp r15 nfv.9 nfv.10)))
          (set! z.3 rax)
          (set! rax z.3)
          (jump ra.8 rbp rax))))))

Next we extend the uncover-locals. Below we define Block-locals-lang v6. The only change from Block-lang v6 is the addition of the locals info field.

info

::=

((new-frames ((aloc ...) ...))

(locals (aloc ...)))

Exercise 3: Redesign and extend the implementation of uncover-locals. The source language is Block-lang v6 and the target language is Block-locals-lang v6.
The new-frame variables should be listed in the locals set for the enclosing block.
It will help if you interpret the locals set as the set of unassigned variables, and remove variables from the set as they are assigned.

Undead-block-lang v6 adds undead-out as an info field. We’re requiring the use of undead-out sets from now on. It also adds an info field call-undead. This is the set of all locations that are live after any non-tail call in a block. We discuss it in more detail shortly.

info

::=

((new-frames ((aloc ...) ...))

(locals (aloc ...))

(undead-out undead-set-tree)

(call-undead (aloc-or-fvar ...)))

(define (undead-set? x)
  (and (list? x)
       (andmap loc? x)
       (= (set-count x) (length x))))

(define (undead-set-tree? ust)
  (match ust
    ; for an instruction
    [(? undead-set?) #t]
    ; for a return point
    [(list (? undead-set?) (? undead-set-tree?))]
    ; for an if
    [(list (? undead-set?) (? undead-set-tree?) (? undead-set-tree?)) #t]
    ; for a begin
    [`(,(? undead-set-tree?) ...) #t]
    [else #f]))

Note that to handle the return-point instruction, which includes a nested tail, the undead-set-tree data structure is updated. The undead-set-tree corresponding to a begin can have as elements either undead-sets, for set! instructions, or undead-set-trees for return-point instructions.

Undead-set-tree is one of:

- Undead-set

- (list Undead-set Undead-set-tree)

- (list Undead-set Undead-set-tree Undead-set-tree)

- (listof Undead-set-tree)

WARNING: datatype is non-canonical since Undead-set-tree can be an

Undead-set, so third and fourth case, and the second and fourth case can

overlap.

An Undead-set-tree is meant to be traversed simultaneously with an

Undead-block-lang/tail, so this ambiguity is not a problem.

interp. a tree of Undead-sets.

The structure of the tree mirrors the structure of a Block-locals-lang tail.

There are three kinds of sub-trees:

(1) a set! node is simply an undead set;

(2) a return-point node is a list whose first element is the undead-set

representing the undead-out, of the return-point (the locations undead after

the call), and whose second element is the Undead-set-tree of the nested

tail.

(3) an if node has an Undead-set for the predicate and two sub-trees for the

branches.

(4) a begin node is a list of Undead-sets-trees, each corresponding to an

instruction in a begin tail.

Exercise 4: Redesign and extend the implementation of undead-analysis. The source language is Block-locals-lang v6 and the target language is Undead-block-lang v6.
It should return undead-out sets now. To make this clear, the info field has been renamed. This is necessary to support the frame allocation for non-tail calls.
The analysis should also record undead registers and frame variables, not just abstract locations.
Note that return-point implicitly assigns the register current-return-value-register.
The pass should also record in the info field call-undead every abstract location or frame variable that is in the undead-out set of a return point. These must be allocated to the frame in the next pass, and it will save us a traversal if we stash them in the info field.
The simplest way to get the call-undead locations is to use a single mutable variable that is local to the helper function that processes bs. Since the helper for instructions will need access to it, essentially all the helpers must be locally defined in the helper for b.

Examples:

> (pretty-display
   ((compose
     undead-analysis
     uncover-locals
     select-instructions)
    '(module
       (define L.swap.1
         (lambda (x.1 y.2)
           (if (< y.2 x.1)
               x.1
               (let ([z.3 (apply L.swap.1 y.2 x.1)])
                 z.3))))
       (apply L.swap.1 1 2))))
(module
  (define L.main.7
    ((new-frames ())
     (locals (ra.11))
     (undead-out
      ((ra.11 rbp)
       (ra.11 rsi rbp)
       (ra.11 rdi rsi rbp)
       (rdi rsi r15 rbp)
       (rdi rsi r15 rbp)))
     (call-undead ()))
    (begin
      (set! ra.11 r15)
      (set! rsi 2)
      (set! rdi 1)
      (set! r15 ra.11)
      (jump L.swap.1 rbp r15 rsi rdi)))
  (define L.swap.1
    ((new-frames ())
     (locals (ra.12 y.2 z.3 x.1))
     (undead-out
      ((rdi rsi ra.12 rbp)
       (rsi x.1 ra.12 rbp)
       (y.2 x.1 ra.12 rbp)
       ((y.2 x.1 ra.12 rbp)
        ((ra.12 rax rbp) (rax rbp))
        (((rax ra.12 rbp)
          ((y.2 rsi rbp) (rdi rsi rbp) (rdi rsi r15 rbp) (rdi rsi r15 rbp)))
         (z.3 ra.12 rbp)
         (ra.12 rax rbp)
         (rax rbp)))))
     (call-undead (ra.12)))
    (begin
      (set! ra.12 r15)
      (set! x.1 rdi)
      (set! y.2 rsi)
      (if (< y.2 x.1)
        (begin (set! rax x.1) (jump ra.12 rbp rax))
        (begin
          (return-point L.rp.8
            (begin
              (set! rsi x.1)
              (set! rdi y.2)
              (set! r15 L.rp.8)
              (jump L.swap.1 rbp r15 rsi rdi)))
          (set! z.3 rax)
          (set! rax z.3)
          (jump ra.12 rbp rax))))))
> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       undead-analysis
       uncover-locals
       select-instructions)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (apply L.swap.1 y.2 x.1)])
                   z.3))))
         (apply L.swap.1 1 2)))))
(module
  (define L.main.9
    ((new-frames ())
     (locals (ra.13))
     (undead-out
      ((ra.13 rbp)
       (ra.13 fv1 rbp)
       (ra.13 fv1 fv0 rbp)
       (fv1 fv0 r15 rbp)
       (fv1 fv0 r15 rbp)))
     (call-undead ()))
    (begin
      (set! ra.13 r15)
      (set! fv1 2)
      (set! fv0 1)
      (set! r15 ra.13)
      (jump L.swap.1 rbp r15 fv0 fv1)))
  (define L.swap.1
    ((new-frames ((nfv.15 nfv.16)))
     (locals (ra.14 y.2 nfv.16 nfv.15 z.3 x.1))
     (undead-out
      ((fv0 fv1 ra.14 rbp)
       (fv1 x.1 ra.14 rbp)
       (y.2 x.1 ra.14 rbp)
       ((y.2 x.1 ra.14 rbp)
        ((ra.14 rax rbp) (rax rbp))
        (((rax ra.14 rbp)
          ((y.2 nfv.16 rbp)
           (nfv.16 nfv.15 rbp)
           (nfv.16 nfv.15 r15 rbp)
           (nfv.16 nfv.15 r15 rbp)))
         (z.3 ra.14 rbp)
         (ra.14 rax rbp)
         (rax rbp)))))
     (call-undead (ra.14)))
    (begin
      (set! ra.14 r15)
      (set! x.1 fv0)
      (set! y.2 fv1)
      (if (< y.2 x.1)
        (begin (set! rax x.1) (jump ra.14 rbp rax))
        (begin
          (return-point L.rp.10
            (begin
              (set! nfv.16 x.1)
              (set! nfv.15 y.2)
              (set! r15 L.rp.10)
              (jump L.swap.1 rbp r15 nfv.15 nfv.16)))
          (set! z.3 rax)
          (set! rax z.3)
          (jump ra.14 rbp rax))))))

The following example shows the output on an intermediate representation of non-tail recursive factorial, compiled with current-parameter-registers set to '() to force the compiler to generate frame variables and test edge cases.

Example:

> (pretty-display
   (undead-analysis
    '(module
       (define L.main.5
         ((new-frames ()) (locals (ra.12)))
         (begin
           (set! ra.12 r15)
           (set! fv0 5)
           (set! r15 ra.12)
           (jump L.fact.4 rbp r15 fv0)))
       (define L.fact.4
         ((new-frames (nfv.16))
          (locals (ra.13 x.9 tmp.14 tmp.15 new-n.10 nfv.16 factn-1.11 tmp.17)))
         (begin
           (set! x.9 fv0)
           (set! ra.13 r15)
           (if (= x.9 0)
               (begin (set! rax 1) (jump ra.13 rbp rax))
               (begin
                 (set! tmp.14 -1)
                 (set! tmp.15 (+ x.9 tmp.14))
                 (set! new-n.10 tmp.15)
                 (return-point
                     L.rp.6
                   (begin
                     (set! nfv.16 new-n.10)
                     (set! r15 L.rp.6)
                     (jump L.fact.4 rbp r15 nfv.16)))
                 (set! factn-1.11 rax)
                 (set! tmp.17 (* x.9 factn-1.11))
                 (set! rax tmp.17)
                 (jump ra.13 rbp rax))))))))
(module
  (define L.main.5
    ((new-frames ())
     (locals (ra.12))
     (undead-out ((ra.12 rbp) (ra.12 fv0 rbp) (fv0 r15 rbp) (fv0 r15 rbp)))
     (call-undead ()))
    (begin
      (set! ra.12 r15)
      (set! fv0 5)
      (set! r15 ra.12)
      (jump L.fact.4 rbp r15 fv0)))
  (define L.fact.4
    ((new-frames (nfv.16))
     (locals (ra.13 x.9 tmp.14 tmp.15 new-n.10 nfv.16 factn-1.11 tmp.17))
     (undead-out
      ((r15 x.9 rbp)
       (x.9 ra.13 rbp)
       ((x.9 ra.13 rbp)
        ((ra.13 rax rbp) (rax rbp))
        ((tmp.14 x.9 ra.13 rbp)
         (tmp.15 x.9 ra.13 rbp)
         (new-n.10 x.9 ra.13 rbp)
         ((rax x.9 ra.13 rbp) ((nfv.16 rbp) (nfv.16 r15 rbp) (nfv.16 r15 rbp)))
         (factn-1.11 x.9 ra.13 rbp)
         (tmp.17 ra.13 rbp)
         (ra.13 rax rbp)
         (rax rbp)))))
     (call-undead (x.9 ra.13)))
    (begin
      (set! x.9 fv0)
      (set! ra.13 r15)
      (if (= x.9 0)
        (begin (set! rax 1) (jump ra.13 rbp rax))
        (begin
          (set! tmp.14 -1)
          (set! tmp.15 (+ x.9 tmp.14))
          (set! new-n.10 tmp.15)
          (return-point L.rp.6
            (begin
              (set! nfv.16 new-n.10)
              (set! r15 L.rp.6)
              (jump L.fact.4 rbp r15 nfv.16)))
          (set! factn-1.11 rax)
          (set! tmp.17 (* x.9 factn-1.11))
          (set! rax tmp.17)
          (jump ra.13 rbp rax))))))

13.7 Assigning Frames

After undead analysis, we have enough information to compute the size of frames.

The size of a frame n (in slots) for a given non-tail call is one more than the maximum of:

the number of locations in the undead-out set for the non-tail call, or
the index of the largest frame location in the undead-out set for the non-tail call.

The frame for the call must save all location live across the call, since the caller might overwrite any register. Since we’re allowing physical locations in our source language, we could have a frame variable live after a call, and must preserve up to that index.

We can model this as follows, although our implementation will be simpler as we discuss. Prior to the call, we push all locations live across the all onto the frame, then increment the base frame pointer. After the call, we’ll decrement the base frame pointer, restoring the caller’s frame, and load all locations live after the call from the frame.

In practice, calling conventions distinguish between two sets of registers: callee-saved and caller-saved, to allow some registers to be live across a call. We ignore this for simplicity and assume all registers are callee-saved.

Intuitively, we want to transform `(return-point ,rp ,tail) into:

`(begin
   (set! ,nfv_0 ,x_0)
   ...
   (set! ,nfv_n-1 ,x_n-1)

   (set! ,fbp (+ ,fbp ,nb))
   (return-point ,rp ,tail)
   (set! ,fbp (- ,fbp ,nb))

   (set! ,x_0 ,nfv_0)
   ...
   (set! ,x_n-1 ,nfv_n-1))

where:

nb is the number of bytes required to save n slots on the frame, i.e., (* n word-size-bytes).
fbp is the value of the parameter current-frame-base-pointer-register.
x_0, ... x_n are the locations in the undead-out set for the non-tail call.
nfv_0, ... nfv_n-1 are n free frame variables.

Unfortunately, there are two problems with this desired transformation.

The first is we don’t know which frame locations are free. Since frame variables appear directly in the source program for this pass, we must do conflict analysis to figure this out. To assign frame to each non-tail call, we’ll need to do a conflict analysis looking for frame variables in particular.

This is simple to solve. We run conflict-analysis, but also collect conflicts between abstract locations and real locations.

The second is less simple. We want to avoid producing new set!s. First, they will invalidate the undead analysis we’ve just performed. Second, some or all the moves might be unnecessary. We don’t know whether those variables 'x_0 ... 'x_n-1 need to be in registers, so it would be potentially more efficient to assign them to frame locations in the first place and leave some other pass to move them into registers when necessary.

To solve this, instead of producing new moves, we will assign each location that is live across a call to frame location. A new pass, assign-frames will do the work of updating the frame base pointer, and produce a partial assignment of abstract locations to frame variables. The register allocator will start from this assignment, and patch-instructions will move locations into a temporary registers as required.

The language Conflict-lang v6 is the same as Undead-block-lang v6 but with an addition info field recording the conflicts.

info

::=

((new-frames ((aloc ...) ...))

(locals (aloc ...))

(undead-out (undead-set-tree? ...))

(conflicts ((aloc (loc ...)) ...))

(call-undead (aloc-or-fvar ...)))

Exercise 5: Redesign and extend the implementation of conflict-analysis. The source language is Undead-block-lang v6 and the target language is Conflict-lang v6.
Note that rax is assigned by a non-tail call. The interpretation of the conflict graph will be somewhat more difficult than in prior assignments. It might contain conflicts for physical locations, which will never matter since we don’t try to physical locations.
Note that 'rbp and 'rax will likely end up in conflict with everything, even though they’ve been removed from the current-assignable-registers set.

If we our frame allocation was more clever, we would need to adjust the conflict analysis to make all caller saved registers in conflict with a non-tail call. However, we just assign all call-live variables to the frame, we don’t need to do very much for non-tail calls.

Example:

> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (apply L.swap.1 y.2 x.1)])
                   z.3))))
         (apply L.swap.1 1 2)))))
(module
  (define L.main.11
    ((new-frames ())
     (locals (ra.17))
     (undead-out
      ((ra.17 rbp)
       (ra.17 fv1 rbp)
       (ra.17 fv1 fv0 rbp)
       (fv1 fv0 r15 rbp)
       (fv1 fv0 r15 rbp)))
     (call-undead ())
     (conflicts
      ((ra.17 (fv0 fv1 rbp))
       (rbp (r15 fv0 fv1 ra.17))
       (fv1 (r15 fv0 ra.17 rbp))
       (fv0 (r15 ra.17 fv1 rbp))
       (r15 (fv1 fv0 rbp)))))
    (begin
      (set! ra.17 r15)
      (set! fv1 2)
      (set! fv0 1)
      (set! r15 ra.17)
      (jump L.swap.1 rbp r15 fv0 fv1)))
  (define L.swap.1
    ((new-frames ((nfv.19 nfv.20)))
     (locals (ra.18 y.2 nfv.20 nfv.19 z.3 x.1))
     (undead-out
      ((fv0 fv1 ra.18 rbp)
       (fv1 x.1 ra.18 rbp)
       (y.2 x.1 ra.18 rbp)
       ((y.2 x.1 ra.18 rbp)
        ((ra.18 rax rbp) (rax rbp))
        (((rax ra.18 rbp)
          ((y.2 nfv.20 rbp)
           (nfv.20 nfv.19 rbp)
           (nfv.20 nfv.19 r15 rbp)
           (nfv.20 nfv.19 r15 rbp)))
         (z.3 ra.18 rbp)
         (ra.18 rax rbp)
         (rax rbp)))))
     (call-undead (ra.18))
     (conflicts
      ((rax (rbp ra.18))
       (ra.18 (y.2 x.1 fv0 fv1 rbp z.3 rax))
       (rbp (y.2 x.1 ra.18 z.3 r15 nfv.19 nfv.20 rax))
       (nfv.20 (r15 nfv.19 y.2 rbp))
       (y.2 (x.1 ra.18 rbp nfv.20))
       (nfv.19 (r15 nfv.20 rbp))
       (r15 (nfv.20 nfv.19 rbp))
       (z.3 (ra.18 rbp))
       (fv1 (x.1 ra.18))
       (fv0 (ra.18))
       (x.1 (y.2 fv1 ra.18 rbp)))))
    (begin
      (set! ra.18 r15)
      (set! x.1 fv0)
      (set! y.2 fv1)
      (if (< y.2 x.1)
        (begin (set! rax x.1) (jump ra.18 rbp rax))
        (begin
          (return-point L.rp.12
            (begin
              (set! nfv.20 x.1)
              (set! nfv.19 y.2)
              (set! r15 L.rp.12)
              (jump L.swap.1 rbp r15 nfv.19 nfv.20)))
          (set! z.3 rax)
          (set! rax z.3)
          (jump ra.18 rbp rax))))))

Pre-framed-lang v6 includes the assignment info field, and should preserve the frame conflicts, which might be useful for the register allocator.

info

::=

((new-frames ((aloc ...) ...))

(locals (aloc ...))

(undead-out (undead-set-tree? ...))

(conflicts ((aloc (aloc-or-fvar ...)) ...))

(call-undead (aloc-or-fvar ...))

(assignment ((aloc fvar) ...)))

Before setting up the frame for callee in each non-tail call, we assign all the locations that are undead across the call. This pass is similar to the register allocation algorithm, but drastically simpler since every abstract location to be assigned is low-degree.

Like assign-registers, the core algorithm is a straight-forward recursion over the call-undead set, and produces an assignment.

If the input set is empty, return the default assignment.
Choose a variable, x, from the input set of variables.
Recur with x removed from the input set and the conflict graph. The recursive call should return an assignment for all the remaining variables.
Select a compatible frame variable for x.
A variable x is compatible with a frame location fvar_i if it is not directly in conflict with fvar_i, and it is not in conflict with a variable y that has been assigned to fvar_i.
An easy way to find a compatible frame variable is to find the set of frame variables to which x cannot be assigned. Then, starting from fv0, assign the first frame variable that is not in the incompatible set.
You might want to use make-fvar from a6-compiler-lib.rkt.
Finally, add the assignment for the x to the result of the recursive call.

Since many frame variables will be assigned prior to register allocation, you will need to modify assign-registers to use a similar algorithm for spilling, instead of a naive algorithm that starts spilling at frame location 0.

Exercise 6: Design and implement pre-assign-frame-variables. The source language is Conflict-lang v6 and the target language is Pre-framed-lang v6.
pre-assign-frame-variables should assign each of the call-undead abstract locations to frame variables following the algorithm described above. If carefully designed, most of this code can be reused later when we modify assign-registers.
The default assignment for this pass is the empty assignment, since nothing has been assigned yet.
You should remove the assigned variables from the locals set, to allow later passes to assume the locals set are all unassigned.

Example:

> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       pre-assign-frame-variables
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (apply L.swap.1 y.2 x.1)])
                   z.3))))
         (apply L.swap.1 1 2)))))
(module
  (define L.main.13
    ((new-frames ())
     (locals (ra.21))
     (undead-out
      ((ra.21 rbp)
       (ra.21 fv1 rbp)
       (ra.21 fv1 fv0 rbp)
       (fv1 fv0 r15 rbp)
       (fv1 fv0 r15 rbp)))
     (call-undead ())
     (conflicts
      ((ra.21 (fv0 fv1 rbp))
       (rbp (r15 fv0 fv1 ra.21))
       (fv1 (r15 fv0 ra.21 rbp))
       (fv0 (r15 ra.21 fv1 rbp))
       (r15 (fv1 fv0 rbp))))
     (assignment ()))
    (begin
      (set! ra.21 r15)
      (set! fv1 2)
      (set! fv0 1)
      (set! r15 ra.21)
      (jump L.swap.1 rbp r15 fv0 fv1)))
  (define L.swap.1
    ((new-frames ((nfv.23 nfv.24)))
     (locals (x.1 z.3 nfv.23 nfv.24 y.2))
     (undead-out
      ((fv0 fv1 ra.22 rbp)
       (fv1 x.1 ra.22 rbp)
       (y.2 x.1 ra.22 rbp)
       ((y.2 x.1 ra.22 rbp)
        ((ra.22 rax rbp) (rax rbp))
        (((rax ra.22 rbp)
          ((y.2 nfv.24 rbp)
           (nfv.24 nfv.23 rbp)
           (nfv.24 nfv.23 r15 rbp)
           (nfv.24 nfv.23 r15 rbp)))
         (z.3 ra.22 rbp)
         (ra.22 rax rbp)
         (rax rbp)))))
     (call-undead (ra.22))
     (conflicts
      ((rax (rbp ra.22))
       (ra.22 (y.2 x.1 fv0 fv1 rbp z.3 rax))
       (rbp (y.2 x.1 ra.22 z.3 r15 nfv.23 nfv.24 rax))
       (nfv.24 (r15 nfv.23 y.2 rbp))
       (y.2 (x.1 ra.22 rbp nfv.24))
       (nfv.23 (r15 nfv.24 rbp))
       (r15 (nfv.24 nfv.23 rbp))
       (z.3 (ra.22 rbp))
       (fv1 (x.1 ra.22))
       (fv0 (ra.22))
       (x.1 (y.2 fv1 ra.22 rbp))))
     (assignment ((ra.22 fv2))))
    (begin
      (set! ra.22 r15)
      (set! x.1 fv0)
      (set! y.2 fv1)
      (if (< y.2 x.1)
        (begin (set! rax x.1) (jump ra.22 rbp rax))
        (begin
          (return-point L.rp.14
            (begin
              (set! nfv.24 x.1)
              (set! nfv.23 y.2)
              (set! r15 L.rp.14)
              (jump L.swap.1 rbp r15 nfv.23 nfv.24)))
          (set! z.3 rax)
          (set! rax z.3)
          (jump ra.22 rbp rax))))))

Now we can allocate frames for each non-tail call. For each block, we compute the size of the frames for all non-tail call, and adjust each. The size n is one plus the maximum of the index of all frame variables in the call-undead set for the enclosing block.

To adjust the callee’s frame, we transform `(return-point ,rp ,tail) into

`(begin
   (set! ,fbp (+ ,fbp ,nb))
   (return-point ,rp ,tail)
   (set! ,fbp (- ,fbp ,nb)))

where:

nb is the number of bytes required to save n slots on the frame, i.e., (* n word-size-bytes).
fbp is (current-frame-base-pointer-register).

We could allocate a different sized frame for each call, but this would require associating call-undead sets with each return point, and complicate the assignment of new-frame variables.

We also assign each of the abstract locations from the new-frame sets. In order, each new-frame variable is assigned to a frame variable starting with (make-fvar n). These assignments should be added to the assignment field the enclosing block.

The output will be Framed-block-lang v6, which only changes in its info fields compared to Pre-framed-lang v6.

info

::=

((new-frames ((aloc ...) ...))

(locals (aloc ...))

(undead-out (undead-set-tree? ...))

(conflicts ((aloc (aloc-or-fvar ...)) ...))

(call-undead (aloc-or-fvar ...))

(assignment ((aloc fvar) ...)))

Exercise 7: Design and implement assign-frames. The source language is Pre-framed-lang v6 and the target language is Framed-block-lang v6.
You should remove the assigned variables from the locals set, to allow later passes to assume the locals set are all unassigned.

Example:

> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       assign-frames
       pre-assign-frame-variables
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (apply L.swap.1 y.2 x.1)])
                   z.3))))
         (apply L.swap.1 1 2)))))
(module
  (define L.main.15
    ((locals (ra.25))
     (undead-out
      ((ra.25 rbp)
       (ra.25 fv1 rbp)
       (ra.25 fv1 fv0 rbp)
       (fv1 fv0 r15 rbp)
       (fv1 fv0 r15 rbp)))
     (conflicts
      ((ra.25 (fv0 fv1 rbp))
       (rbp (r15 fv0 fv1 ra.25))
       (fv1 (r15 fv0 ra.25 rbp))
       (fv0 (r15 ra.25 fv1 rbp))
       (r15 (fv1 fv0 rbp))))
     (assignment ()))
    (begin
      (set! ra.25 r15)
      (set! fv1 2)
      (set! fv0 1)
      (set! r15 ra.25)
      (jump L.swap.1 rbp r15 fv0 fv1)))
  (define L.swap.1
    ((locals (y.2 z.3 x.1))
     (undead-out
      ((fv0 fv1 ra.26 rbp)
       (fv1 x.1 ra.26 rbp)
       (y.2 x.1 ra.26 rbp)
       ((y.2 x.1 ra.26 rbp)
        ((ra.26 rax rbp) (rax rbp))
        (((rax ra.26 rbp)
          ((y.2 nfv.28 rbp)
           (nfv.28 nfv.27 rbp)
           (nfv.28 nfv.27 r15 rbp)
           (nfv.28 nfv.27 r15 rbp)))
         (z.3 ra.26 rbp)
         (ra.26 rax rbp)
         (rax rbp)))))
     (conflicts
      ((rax (rbp ra.26))
       (ra.26 (y.2 x.1 fv0 fv1 rbp z.3 rax))
       (rbp (y.2 x.1 ra.26 z.3 r15 nfv.27 nfv.28 rax))
       (nfv.28 (r15 nfv.27 y.2 rbp))
       (y.2 (x.1 ra.26 rbp nfv.28))
       (nfv.27 (r15 nfv.28 rbp))
       (r15 (nfv.28 nfv.27 rbp))
       (z.3 (ra.26 rbp))
       (fv1 (x.1 ra.26))
       (fv0 (ra.26))
       (x.1 (y.2 fv1 ra.26 rbp))))
     (assignment ((ra.26 fv2) (nfv.27 fv3) (nfv.28 fv4))))
    (begin
      (set! ra.26 r15)
      (set! x.1 fv0)
      (set! y.2 fv1)
      (if (< y.2 x.1)
        (begin (set! rax x.1) (jump ra.26 rbp rax))
        (begin
          (set! rbp (+ rbp 24))
          (return-point L.rp.16
            (begin
              (set! nfv.28 x.1)
              (set! nfv.27 y.2)
              (set! r15 L.rp.16)
              (jump L.swap.1 rbp r15 nfv.27 nfv.28)))
          (set! rbp (- rbp 24))
          (set! z.3 rax)
          (set! rax z.3)
          (jump ra.26 rbp rax))))))

13.8 Adjusting the Register Allocator

Frames are now implemented and all new-frame variables and variables live across a call are assigned to frame locations. We need to adjust our register allocator so that it does not try to spill variables into frame variables that are already taken.

To do this, we essentially remove spilling from assign-registers. Instead, it will produce a list of variables to spill, and a separate pass (which looks suspiciously like pre-assign-frame-variables) will handle spilling.

We must also remove an extra register from the current-assignable-registers set (which is now added to a6-compiler-lib.rkt). Since we now have non-tail calls and indirect calls, rax cannot be used both as a temporary register and as the return value. There are situations where we need a temporary register, but rax is live. Consider this expression:

`(begin
   (return-point L.rp.1
     ...)
   (set! factn-1.11 rax)
   (set! tmp.17 (* x.9 factn-1.11))
   (set! rax tmp.17)
   (jump ra.13 rbp rax))

The return address is stored in ra.13, but because it is call-undead, it is assigned to a frame location. patch-instructions will need to move the value to a register, since jmp in x64 cannot jump to a memory address. However, it cannot use rax as a temporary register, since it would be in conflict with anything assigned directly before the jump. Suddenly, patch-instructions has a register allocation problem!

The simplest solution is to reserve extra registers, forbidding the register allocator from using them, and using these temporary registers, as we did before. In fact, we need two temporary registers now, due to an edge case introduced in Compiler 4: Adding Control Flow (Corrected) when we changed Paren-asm v2. Consider an instruction like (set! (rbp + 0) (+ (rbp + 8) 2147483648)). We define two temporary registers in the parameter current-patch-instructions-registers in a6-compiler-lib.rkt, which defaults to '(r10 r11).

Design digression:
A more general solution that will produce better code in general is to run the undead and conflict analysis, register allocator, spilling, and patch instructions in a giant fixed point iteration. Patch instruction would generate "unspillable" variables instead of using temporary registers, directing the register allocator that these must be assigned registers. Since it generates new instructions, the undead and conflict analyses must be run again. Then register allocation can run, possibly producing spills. The spills are allocated, then instructions are patched, possibly producing new unspillables. This algorithm terminates with a small number of registers available because the generated unspillables have very small live ranges.

Spilled-lang v6, defined below, only changed by allowing assignments to arbitrary rlocs.

info

::=

((locals (aloc ...))

(undead-out (undead-set-tree? ...))

(conflicts ((aloc (aloc-or-fvar ...)) ...))

(assignment ((aloc fvar

rloc) ...)))

Exercise 8: Redesign and extend the implementation of assign-registers from Compiler 3: Register Allocation (Corrected). The source language is Framed-block-lang v6 and the target language is Spilled-lang v6.
In the output, the locals set should include only spilled locations.
You may be able to reuse a lot of code from pre-assign-frame-variables, if abstracted properly. It might help to re-imagine the "pick a low-degree variable" progress in the allocation algorithm as a sorting problem. If the variables are sorted in degree order, then the core assignment algorithm doesn’t need to know about degrees—it always picks the first variable in the sorted list. See sort.
Be careful when computing which registers cannot be used; your conflict graph contains a mix of abstract locations and registers.
You may still find it useful to use a single mutable variable to collect spilled variables.
Note that unlike in Compiler 4: Adding Control Flow (Corrected), register allocation can and should be done per block. Calling conventions mean we no longer need to know about abstract locations across blocks.
You should use the parameter current-assignable-registers from a6-compiler-lib.rkt.

It’s possible to reuse the more general cross-block allocation strategy to optimize register use across blocks, but it requires that we distinguish between functions that "escape" (might be called outside of our unit of compilation), and those that don’t. Escaping functions must strictly adhere to calling conventions. We assume all functions escape.

The final change to the register allocator is to assign spilled variables to frame locations.

The new language, Block-jump-live-lang v6, is only a small change over Spilled-lang v6.

info

::=

((locals (aloc ...))

(undead-out (undead-set-tree? ...))

(conflicts ((aloc (aloc-or-fvar ...)) ...))

(call-undead (aloc-or-fvar ...))

(assignment ((aloc rloc) ...)))

After assigning frame variables, we can discard all the assorted info fields and keep only the assignment.

Exercise 9: Design and implement assign-frame-variables. The source language is Spilled-lang v6 and the target language is Block-jump-live-lang v6.
Be careful when computing which frame variables cannot be used; your conflict graph contains a mix of abstract locations and frame locations.

Block-assigned-lang v6

tail	::=	(begin s ... tail)
	\|	(jump trg loc ...)
	\|	(if (cmp loc opand) tail tail)

Exercise 10: Redesign and extend the implementation of discard-call-live. The source language is Block-jump-live-lang v6 and the target language is Block-assigned-lang v6.

Exercise 11: Redesign and extend the implementation of replace-locations. The source language is Block-assigned-lang v6 and the target language is Block-fvar-lang v6.

The entire definition of Block-fvar-lang v6 is given below, with changes from Block-assigned-lang v6.

p	::=	(module b ... b)

b	::=	(define label info tail)

info	::=	((assignment ((aloc rloc) ...)) any ...)

tail	::=	(begin s ... tail)
	\|	(jump trg)
	\|	(if (cmp aloc rloc opand) tail tail)

s	::=	(return-point label tail)
	\|	(set! aloc rloc triv)
	\|	(set! aloc rloc (binop aloc rloc opand))

binop	::=	*
	\|	+
	\|	-

triv	::=	opand
	\|	label

loc	::=	rloc
	\|	aloc

opand	::=	int64
	\|	aloc
	\|	rloc

trg	::=	label
	\|	aloc
	\|	rloc

rloc	::=	reg
	\|	fvar

reg	::=	rax
	\|	rsp
	\|	rbp
	\|	rbx
	\|	rcx
	\|	rdx
	\|	rsi
	\|	rdi
	\|	r8
	\|	r9
	\|	r10
	\|	r11
	\|	r12
	\|	r13
	\|	r14
	\|	r15

cmp	::=	neq?
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

Challenge 1: Integrate the following optimizations into replace-locations.
Never generate an instruction of the form (set! rloc_1 rloc_1). You may include a new instruction, (nop), in all subsequent languages. This instruction should generate an empty string in x64.
Rewrite instructions of the form (set! rloc_1 (binop rloc_2 rloc_1)) to (set! rloc_1 (binop rloc_1 rloc_2)), where binop is commutative. This will help patch-instructions produce better code.

Example:

> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       replace-locations
       discard-call-live
       assign-frame-variables
       assign-registers
       assign-frames
       pre-assign-frame-variables
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (apply L.swap.1 y.2 x.1)])
                   z.3))))
         (apply L.swap.1 1 2)))))
(module
  (define L.main.17
    ()
    (begin (nop) (set! fv1 2) (set! fv0 1) (nop) (jump L.swap.1)))
  (define L.swap.1
    ()
    (begin
      (set! fv2 r15)
      (set! r15 fv0)
      (set! r14 fv1)
      (if (< r14 r15)
        (begin (set! rax r15) (jump fv2))
        (begin
          (set! rbp (+ rbp 24))
          (return-point L.rp.18
            (begin
              (set! fv4 r15)
              (set! fv3 r14)
              (set! r15 L.rp.18)
              (jump L.swap.1)))
          (set! rbp (- rbp 24))
          (set! r15 rax)
          (set! rax r15)
          (jump fv2))))))

13.9 Implementing New Abstractions

To accommodate non-tail calls, we introduced two new abstractions: return points and frame variables. We must now implement these abstractions.

First, we compile fvars to addrs. The compilation is non-trivial, as it must be aware of increments and decrements to the current-frame-base-pointer-register. On entry to a block, frame variables start indexing from the base of the frame (offset 0). So, fv3 corresponds to (rbp + 24) ((* 3 word-size-bytes)). After an increment operation, such as (set! rbp (+ rbp 24)), fv3 corresponds to (rbp + 0). After a decrement, such as (set! rbp (- rbp 24)) fv3 corresponds to (rbp + 24) again.

After implementing fvars, we’re back to a similar level of abstraction as Block-nested-lang.

Below we define Block-nested-lang v6, with changes compared to Block-nested-lang.

p	::=	(module b ... b)

b	::=	(define label info tail)

info	::=	(any ...)

tail	::=	(begin s ... tail)
	\|	(jump trg)
	\|	(if (cmp rloc opand) tail tail)

s	::=	(return-point label tail)
	\|	(set! rloc triv)
	\|	(set! rloc (binop rloc opand))

binop	::=	*
	\|	+
	\|	-

triv	::=	opand
	\|	label

opand	::=	int64
	\|	rloc

trg	::=	label
	\|	rloc

rloc	::=	reg
	\|	addr

reg	::=	rax
	\|	rsp
	\|	rbp
	\|	rbx
	\|	rcx
	\|	rdx
	\|	rsi
	\|	rdi
	\|	r8
	\|	r9
	\|	r10
	\|	r11
	\|	r12
	\|	r13
	\|	r14
	\|	r15

addr	::=	(rbp - dispoffset)
	\|	(rbp + dispoffset)
	\|	(fbp + dispoffset)

cmp	::=	neq?
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

(define (fbp? r) (eq? r (current-frame-base-pointer-register)))

We remove one form of addr since we do not want to allow a called function to access parts of the caller’s stack.

Recall that you can assume that current-frame-base-pointer-register is only incremented or decremented by integer literal values, like those generated by assign-frames. Other assignments to current-frame-base-pointer-register are still invalid programs.

You shouldn’t assume that the current-frame-base-pointer-register is rbp, and should use the parameter instead.

Exercise 12: Design and implement the function implement-fvars. The source language is Block-fvar-lang v6 and the target language is Block-nested-lang v6.
You should use an accumulator representing the current offset from the current block’s frame. You may want to use fvar->addr from a6-compiler-lib.rkt.

Example:

> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       implement-fvars
       replace-locations
       discard-call-live
       assign-frame-variables
       assign-registers
       assign-frames
       pre-assign-frame-variables
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (apply L.swap.1 y.2 x.1)])
                   z.3))))
         (apply L.swap.1 1 2)))))
(module
  (define L.main.19
    ()
    (begin (nop) (set! (rbp + 8) 2) (set! (rbp + 0) 1) (nop) (jump L.swap.1)))
  (define L.swap.1
    ()
    (begin
      (set! (rbp + 16) r15)
      (set! r15 (rbp + 0))
      (set! r14 (rbp + 8))
      (if (< r14 r15)
        (begin (set! rax r15) (jump (rbp + 16)))
        (begin
          (set! rbp (+ rbp 24))
          (return-point L.rp.20
            (begin
              (set! (rbp + 8) r15)
              (set! (rbp + 0) r14)
              (set! r15 L.rp.20)
              (jump L.swap.1)))
          (set! rbp (- rbp 24))
          (set! r15 rax)
          (set! rax r15)
          (jump (rbp + 16)))))))

To implement return points, we need to lift all the instructions following the return point in to a new block, and merge the tail implementing the call into the begin of the caller. Essentially, we transform:

`(begin
   ,ss1 ...
   ,(return-point ,rp ,tail)
   ,ss2 ...)

into:

`(define ,rp () (begin ,ss2 ...))
`(begin
,ss1 ...
,tail)

The target language of the transformation is Block-asm-lang v6, defined below as a change over Block-nested-lang v6.

p	::=	(module b ... b)

b	::=	(define label info tail)

info	::=	(any ...)

tail	::=	(begin s ... tail)
	\|	(jump trg)
	\|	(if (cmp rloc opand) tail tail)
	\|	(if (cmp rloc opand) (jump label) (jump label))

s	::=	(return-point label tail)
	\|	(set! rloc triv)
	\|	(set! rloc (binop rloc opand))

binop	::=	*
	\|	+
	\|	-

triv	::=	opand
	\|	label

opand	::=	int64
	\|	rloc

trg	::=	label
	\|	rloc

rloc	::=	reg
	\|	addr

reg	::=	rax
	\|	rsp
	\|	rbp
	\|	rbx
	\|	rcx
	\|	rdx
	\|	rsi
	\|	rdi
	\|	r8
	\|	r9
	\|	r10
	\|	r11
	\|	r12
	\|	r13
	\|	r14
	\|	r15

addr	::=	(fbp + dispoffset)

cmp	::=	neq?
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

We restrict the syntax of the jumps in the branches of an if statement. Going top down, we know these are the only possibilities and adding additional structure to the language means later passes can make additional assumptions.

Exercise 13: Redesign and implement the function expose-basic-blocks. The source language is Block-nested-lang v6 and the target language is Block-asm-lang v6.

Example:

> (parameterize ([current-parameter-registers '()])
    (pretty-display
     ((compose
       expose-basic-blocks
       implement-fvars
       replace-locations
       discard-call-live
       assign-frame-variables
       assign-registers
       assign-frames
       pre-assign-frame-variables
       conflict-analysis
       undead-analysis
       uncover-locals
       select-instructions)
      '(module
         (define L.swap.1
           (lambda (x.1 y.2)
             (if (< y.2 x.1)
                 x.1
                 (let ([z.3 (apply L.swap.1 y.2 x.1)])
                   z.3))))
         (apply L.swap.1 1 2)))))
(module
  (define L.main.21
    ()
    (begin (nop) (set! (rbp + 8) 2) (set! (rbp + 0) 1) (nop) (jump L.swap.1)))
  (define L.swap.1
    ()
    (begin
      (set! (rbp + 16) r15)
      (set! r15 (rbp + 0))
      (set! r14 (rbp + 8))
      (if (< r14 r15) (jump L.nest_t.23) (jump L.nest_f.24))))
  (define L.rp.22
    ()
    (begin
      (set! rbp (- rbp 24))
      (set! r15 rax)
      (set! rax r15)
      (jump (rbp + 16))))
  (define L.nest_t.23 () (begin (set! rax r15) (jump (rbp + 16))))
  (define L.nest_f.24
    ()
    (begin
      (set! rbp (+ rbp 24))
      (set! (rbp + 8) r15)
      (set! (rbp + 0) r14)
      (set! r15 L.rp.22)
      (jump L.swap.1))))

13.10 Filling in the details

We’ve now implemented all the hard parts of this assignment. You should redesign Paren-asm v6.

Note that x64 requires a label as the target of a conditional jump.

Exercise 14: Redesign and extend the implementation of the function flatten-program. The source is Block-asm-lang v6 and the target is Paren-asm v6.

Exercise 15: Redesign and extend the implementation of the function patch-instructions. The source is Paren-asm v6 and the target is Paren-x64 v6.
Remember that you should now use the parameter current-patch-instructions-registers, which defines two temporary registers. And consider edge cases carefully, such as (set! (rbp + 0) (+ (rbp + 8) 2147483648)).

Exercise 16: Redesign and extend the implementation of the function generate-x64. The source is Paren-x64 v6 and the target is sequence of x64 instructions, as a string.

Paren-x64 v6 only slightly changed from Paren-x64 v3, but reproduced below anyway.

p	::=	(begin s ...)

s	::=	(set! rloc triv)
	\|	(set! reg rloc)
	\|	(set! reg_1 (binop reg_1 int32))
	\|	(set! reg_1 (binop reg_1 rloc))
	\|	(define label s)
	\|	(jump trg)
	\|	(compare reg opand)
	\|	(jump-if cmp label)

trg	::=	reg
	\|	label

triv	::=	trg
	\|	int64

opand	::=	int64
	\|	reg

rloc	::=	reg
	\|	addr

reg	::=	rsp
	\|	rbp
	\|	rax
	\|	rbx
	\|	rcx
	\|	rdx
	\|	rsi
	\|	rdi
	\|	r8
	\|	r9
	\|	r10
	\|	r11
	\|	r12
	\|	r13
	\|	r14
	\|	r15

addr	::=	(rbp - dispoffset)
	\|	(rbp fbp + dispoffset)

binop	::=	*
	\|	+
	\|	-

cmp	::=	neq?
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

(define (label? s)
(and (symbol? s)
(regexp-match-exact? #rx"L\\..+\\.[0-9]+" (symbol->string s))))

Some additional restrictions from NASM and x64, some of which we’ve discussed already.

A mov instruction to an addr can only move 32-bit integers. (set! (rbp + 0) 9223372036854775807) is invalid.
The only valid characters in labels are letters, numbers, _, $, #, @, ~, ., and ?. The only characters which may be used as the first character of an identifier are letters, . (with has a special meaning and shouldn’t be used), _ and ?.
Labels cannot be longer than 4095 characters.

← prev up next →

1	CPSC 411 Syllabus
2	Course Calendar
3	The Compiler Design Recipe
4	Organizing Racket Code
5	Compiler 0: x64 to elf64
6	Compiler 1: Paren-x64 to x64
7	Compiler 2: Values-lang to x64
8	Compiler 3: Register Allocation (Original)
9	Compiler 3: Register Allocation (Corrected)
10	Compiler 4: Adding Control Flow (Original)
11	Compiler 4: Adding Control Flow (Corrected)
12	Assignment 5: The CPSC411 Compiler Recovery and Reinvestment Assignment of 2020.
13	Compiler 6: Calling Conventions
14	Compiler 7: Immediate Data Types
15	Compiler 8: Structured Data Types
16	Compiler 9: First-class Functions
17	Compiler 10: Recursive Data and Syntactic Sugar
18	Assignment 11: The CPSC411 Compiler Encore

13.1	Assignment Summary
13.2	Language Diagram
13.3	New Source Language
13.4	Calling Conventions Introduction
13.5	Designing New Intermediate Languages
13.6	Implementing Calling Convention
13.7	Assigning Frames
13.8	Adjusting the Register Allocator
13.9	Implementing New Abstractions
13.10	Filling in the details