16 Compiler 9: First-class Functions

7.5.0.17

16 Compiler 9: First-class Functions

16.1 Assignment Summary

The goal of this assignment is to add first-class functions to our language. This extends the procedure data types to include the values of free variables at the procedure’s definition. We add lambda as a first-class value in the source language, which can create a procedure at any point in the program.

This assignment is due Friday, April 3, 2020 at 11:59pm.

You can use the reference solution here, but if you hammer it it might go down and not come back up. https://www.students.cs.ubc.ca/~cs-411/2019w2/a9-interrogator.cgi

Assignment Checklist

You should find a new repository in your https://github.students.cs.ubc.ca account named a9_<team ids> with a code skeleton and a support library. You should complete the assignment in that git repository and push it to the GitHub Students instance.

You should first merge your solution to Compiler 8: Structured Data Types with the starter code provided. The new starter code has the correct provides and includes a submodule to help you run your compiler on the command line if you want. The name of the skeleton is a9-skeleton.rkt to avoid accidentally overwriting your files, but your file in the Git repository should be named a9.rkt.

16.2 Language Diagram

16.3 Preface: What’s wrong with Exprs-Lang v8

Exprs-lang v8 gave us the ability to implement structured, heap-allocated data. The language is now as expressive as many general-purpose programming languages. You could start building large programs and standard libraries, with only a few small modifications to the run-time system to enable dynamic input and output.

However, it is missing a useful feature for software development: the ability to easily abstract over computations at any point via first-class functions. Many languages provide some version of this—Python, JavaScript, Ruby, Racket, Scheme, Java, and many more. They enable the programmer to create a suspended computation, and pass it around as a value. The procedure closes over the environment in which it was created, essentially creating an object with private fields.

This week, we’ll add first-class functions as values in Exprs-lang v9:

p	::=	(module b ... e)

b	::=	(define x (lambda (x ...) e))

e	::=	v
	\|	(apply e e ...)
	\|	(let ([x e]) e)
	\|	(if e e e)

v	::=	fixnum
	\|	prim-f
	\|	x
	\|	#t
	\|	#f
	\|	()
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (x ...) e)

prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

Now, lambda can appear in any expression. We can still define procedures at the top-level using define, although the semantics will change slightly.

This is a syntactically small change, but it has massive implications.

Every instance of lambda will compile to a procedure. The procedure now has three pieces of information: it’s arity, the label to its code, the computation it executes when invoked, and its environment, the values of the free variables used in the definition of the procedure. We compile each application of a procedure to dereference and apply the label of the procedure, but also to pass a reference to the procedure itself as a parameter. Essentially, the procedure is an object, and receives itself as an argument. Each "free variable" x is a field of that object, and are compiled to references to self.x.

We already have the low-level abstractions in place to deal with closures, so we design this assignment top-down.

16.4 Administrative Passes

Allowing procedures to be bound in two different ways is great for programmer convenience, but annoying for a compiler writer. Before we get to implementing procedures, we simplify and regularize how procedures appear in our language.

16.4.1 uniquify

As usual with uniquify, the only change is that all names x are replaced by abstract locations aloc.

Unlike normal, there are no labels. All of our functions are procedures, not merely code, and cannot easily be lifted to the top level, so it is now the job of a later pass to introduce labels.

Below we define Impure-Exprs-safe-lang v9. We typeset the changes with respect to Exprs-lang v9.

p	::=	(module b ... e)

b	::=	(define x aloc (lambda (x aloc ...) e))

e	::=	v
	\|	(apply e e ...)
	\|	(let ([x aloc e]) e)
	\|	(if e e e)

v	::=	fixnum
	\|	prim-f
	\|	x
	\|	#t
	\|	#f
	\|	()
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (x aloc ...) e)

prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

Exercise 1: Redesign and extend the implementation of uniquify. The source language is Exprs-lang v9 and the target language is Impure-Exprs-safe-lang v9.

16.4.2 define->letrec

Some procedures now appear in local expressions, and some appear defined at the top-level. This presents two problems. First, we have to look for procedures in two different places to transform them: that’s annoying. Second, our compiler later assumes that all data (as opposed to code) is locally defined—we have no way to define top-level, labeled data. Since procedures are data, we need to transform top-level bindings of procedures into local bindings, so the rest of the compiler will "just work".

To do this, we elaborate define into a local binding form letrec, which will be used to bind all procedures.

letrec, unlike let, supports multiple bindings in a single form, and each bound expression can refer to any variable in the set of bindings for the letrec. This is important to capture mutually-recursive functions, and has the same binding structure as our top-level defines.

Design digression:
A real language would impose additional semantics on define, such as allowing defined data to be exported and imported at module boundaries. This would require additional handling of define, and the ability to generate labeled data in the back-end of the compiler. We continue to ignore separate compilation and linking, so we treat define as syntactic sugar for letrec.

Below we define Just-Exprs-lang v9. We typeset the changes with respect to Impure-Exprs-safe-lang v9.

p	::=	(module b ... e)

b	::=	(define aloc (lambda (x ...) e))

e	::=	v
	\|	(apply e e ...)
	\|	(letrec ([aloc (lambda (aloc ...) e)] ...) e)
	\|	(let ([aloc e]) e)
	\|	(if e e e)

v	::=	fixnum
	\|	prim-f
	\|	aloc
	\|	#t
	\|	#f
	\|	()
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (aloc ...) e)

prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

Exercise 2: Design and implement define->letrec. The source language is Impure-Exprs-safe-lang v9 and the target language is Just-Exprs-lang v9.

16.4.3 optimize-direct-calls

Before we start compiling lambdas, we should try to get rid of them. Direct calls to lambdas, such as (apply (lambda (x) x) 1), are simple to rewrite to a let binding, such a (let ([x 1]) x). A human programmer may not write this kind of code much, but most programs are not written by humans—compilers write far more programs. This optimization will speed-up compile time and run time for such simple programs.

Challenge 1: Design and implement the function optimize-direct-calls. The source and target language are Just-Exprs-lang v9.

16.4.4 dox-lambdas

The source language supports anonymous procedures, that is, first-class procedure values that are not necessarily bound to names. For example, we can write the following in Racket, creating and using procedures without ever binding them to names in a letrec or let form.

Example:

> ((lambda (x f) (f x x)) 1 (lambda (x y) (+ x y)))
2

The equivalent in Exprs-lang v9 is:

(apply (lambda (x f) (apply f x x)) 1 (lambda (x y) (apply + x y)))

This is great for functional programmers, who value freedom, but bad for compilers who feel it is their job to keep track of everything so they can make good decisions.

Before we closure convert, we want to bind all procedures to names. This will simplify lifting code to the top-level and assigning labels later.

We transform each `(lambda (,alocs ...) ,e) into `(letrec ([,tmp (lambda (,alocs ...) ,e)]) ,tmp), where tmp is a fresh aloc.

We define Lam-opticon-lang v9, in which we know the name of every procedure. We typeset the changes with respect to Just-Exprs-lang v9.

p	::=	(module e)

e	::=	v
	\|	(apply e e ...)
	\|	(letrec ([aloc (lambda (aloc ...) e)] ...) e)
	\|	(let ([aloc e]) e)
	\|	(if e e e)

v	::=	fixnum
	\|	prim-f
	\|	aloc
	\|	#t
	\|	#f
	\|	()
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal
	\|	(lambda (aloc ...) e)

prim-f	::=	*
	\|	+
	\|	-
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	car
	\|	cdr
	\|	make-vector
	\|	vector-length
	\|	vector-set!
	\|	vector-ref
	\|	procedure-arity

Exercise 3: Design and implement dox-lambdas. The source language Just-Exprs-lang v9 and the target language is Lam-opticon-lang v9.

16.4.5 implement-safe-primops

Not much changes in implement-safe-primops. We need to adjust its language definition to remove define and support letrec. We change the pass a little since it generates procedures definitions, which should now be bound using letrec.

The target language of the pass, Safe-apply-lang v9, is defined below. We typeset the differences compared to Impure-Exprs-data-lang v8

p	::=	(module b ... e)

b	::=	(define label (lambda (aloc ...) e))

c	::=	(begin c ...)
	\|	(primop e ...)

e	::=	v
	\|	(primop e ...)
	\|	(apply e e ...)
	\|	(let ([aloc e]) e)
	\|	(letrec ([aloc (lambda (aloc ...) e)] ...) e)
	\|	(if e e e)
	\|	(begin c ... e)

v	::=	fixnum
	\|	aloc
	\|	label
	\|	#t
	\|	#f
	\|	()
	\|	(void)
	\|	(error uint8)
	\|	ascii-char-literal

primop	::=	unsafe-fx*
	\|	unsafe-fx+
	\|	unsafe-fx-
	\|	eq?
	\|	unsafe-fx<
	\|	unsafe-fx<=
	\|	unsafe-fx>
	\|	unsafe-fx>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	unsafe-car
	\|	unsafe-cdr
	\|	unsafe-make-vector
	\|	unsafe-vector-length
	\|	unsafe-vector-set!
	\|	unsafe-vector-ref
	\|	make-procedure
	\|	unsafe-procedure-arity
	\|	unsafe-procedure-label

Other than the difference with letrec and define, we also remove make-procedure and unsafe-procedure-label. These are introduced by a later pass that is responsible for implementing procedures safely. This means that apply can be safely applied to arbitrary data—a later pass will implement dynamic checking for application.

Exercise 4: Redesign and extend the implementation of implement-safe-primops. The source language is Lam-opticon-lang v9 and the target language is Safe-apply-lang v9.
You might reduce the changes to your code by reusing define->letrec.

16.5 Closure Conversion

The rest of our compiler expects procedures to be little more than labeled blocks of code.

If you completed the challenge exercise last week, implement-safe-apply, they’re only slightly more complicated. Procedures are essentially tagged pointers to a simple data structures that contain a label as the procedure code, and its arity for error checking. If you completed the challenge exercise last assignment, implement-safe-apply, our procedures this week are only slightly more complicated.

Unfortunately, now our procedures can contain references to free-variables in their lexical scope. This means we cannot simply lift procedure definitions to the top-level, stick on a label, and generate a procedure. Before we can generate procedures, we collect the free-variable information and create an explicit closure data structure.

We do this in two steps.

16.5.1 uncover-free

First, we uncover the free variables in each lambda. We add these as an annotation on the lambda, which the next pass will use to generate closures.

Below we define Lambda-free-lang v9. We typeset the differences compared to Safe-apply-lang v9. We elide the definitions of v and primop, which are unchanged.

p	::=	(module e)

c	::=	(begin c ...)
	\|	(primop e ...)

e	::=	v
	\|	(primop e ...)
	\|	(apply e e ...)
	\|	(let ([aloc e]) e)
	\|	(letrec ([aloc (lambda (aloc ...) (free (aloc ...)) e)] ...) e)
	\|	(if e e e)
	\|	(begin c ... e)

v	::=	_...

primop	::=	_...

To find the free abstract locations, we traverse the body of each lambda remembering any abstract locations that have been bound (by let, lambda, or letrec), and return the set of abstract locations that have been used but were not in the defined set. On entry to the (lambda (aloc ...) e), only the formal parameters aloc ... are considered bound, initially.

The only complicated case is for letrec. Even a variable bound in a letrec is considered free in the body of a lambda.

Example:

> (uncover-free
   `(module
      (letrec ([x.1 (lambda () (apply x.1))])
        x.1)))
'(module (letrec ((x.1 (lambda () (free (x.1)) (apply x.1)))) x.1))

However, the letrec does bind those variables, so they do not contribute to the free variable set for the context surrounding the letrec.

Example:

> (uncover-free
   `(module
      (letrec ([f.1 (lambda ()
                      (letrec ([x.1 (lambda () (apply x.1))])
                        x.1))])
        f.1)))
'(module
   (letrec ((f.1
             (lambda ()
               (free ())
               (letrec ((x.1 (lambda () (free (x.1)) (apply x.1)))) x.1))))
     f.1))

Exercise 5: Design and implement of uncover-free. The source language is Safe-apply-lang v9 and the target language is Lambda-free-lang v9.
You may find the map2 function from the a9-skeleton.rkt helpful, depending on how you write your code. You shouldn’t feel like you must use it, nor try to use it if it doesn’t occur to you that you want such a function.

16.5.2 convert-closures

Now, we convert closures. Strictly speaking, all the previous languages had closures—procedures that (implicitly) close over their lexical environment. Closure conversion makes these explicit as a new data type.

Below, we define Closure-lang v9. We typeset changes with respect to Lambda-free-lang v9.

p	::=	(module e)

c	::=	(begin c ...)
	\|	(primop e ...)

e	::=	v
	\|	(primop e ...)
	\|	(apply unsafe-apply e e ...)
	\|	(let ([aloc e] ...) e)
	\|	(letrec ([aloc label (lambda (aloc ...) (free (aloc ...)) e)] ...) e)
	\|	(cletrec ([aloc (make-closure label e ...)] ...) e)
	\|	(if e e e)
	\|	(begin c ... e)

v	::=	_...

primop	::=	_...
	\|	closure-ref
	\|	closure-apply

Closure conversion changes letrec to bind labels to procedure code. After this pass, the body of lambda will not contain any free variables, and will not be a procedure data type—it is just like a function from Values-lang v6.

To encode closures, we temporarily add a new data type for closures. We add a new form, cletrec, which only binds closures. Closures can, in general, have recursive self-references, so this is a variant of the letrec form. We also add a new primop form for dereferencing the value of lexical variables from the closure (closure-ref e e). The next pass implements closures using the procedure data type.

We assume that the cletrec form only ever appears as the body of a letrec form, but we do not make this explicit in the syntax for readability. This assumption is not necessary for correctness, but simplifies an optimization presented later as a challenge exercise.

We represent a closure essentially as a vector containing a label to the code and the values of each free variable in its environment. Closures support two operations. First, you can apply a closure with closure-apply, which essentially extracts the label from the closure and calls the procedure at that label. Second, you can dereference an environment variable from the closure with closure-ref, extracting the value of a closure at an index.

Because we want to implement safe procedure application, we add a third field to the closure: it’s arity, the number of arguments expected by the code of the closure.

The closure interface is described below:

(make-closure e_label e_arity e_i ...)
Creates a closure whose code is at label e_label, which expects e_arity number of arguments, and has the values e_i in its environment.
(closure-apply e_c es ...)
Safely apply the closure e_c, invoking its code, to the arguments es ....
(closure-ref e_c e_i)
Deference the value at index e_i in the environment of the closure e_c. Since this dereference is only generated by the compiler, it always succeeds and performs no dynamic checks. The environment is 0-indexed.

There are two parts to closure conversion:

Transform each lambda. Each lambda is transformed to take a new formal parameter, which is its closure, and to be bound to a label in its enclosing letrec.

The abstract location to which the the lambda was previously bound must now be bound to a closure. The closure has n + 1 fields, where n is the number of free variables in the lambda. The first field is the label to which the closure’s code is bound. The remaining fields are references to the lexical variables in the environment of the closure.

In essence, we transform

`(letrec ([,x (lambda (,xs ...) (free (,ys ...)) ,es)] ...)
   ,e)
=>
`(letrec ([,l (lambda (,c ,xs ...)
                (let ([,ys (closure-ref ,c ,i)] ...)
                  ,es))] ...)
    (cletrec ([,x (make-closure ,l ,(length xs) ,ys ...)] ...)
      ,e))

where l is a fresh label and c is a fresh abstract location. To support the translation, we allow let to bind multiple abstract locations at once. We add the number of arguments as a field in the closure to implement safe application later.

Transform each apply. Every procedure now takes an extra argument, its closure, so we have to expand each apply. The essence of the translation is:
`(apply ,e ,es ...)
=>
`(let ([,x ,e])
(closure-apply ,e ,e ,es ...))
We use closure-apply to apply the (label of the) closure to the closure itself and its usual arguments. We need to bind the operator to avoid duplicating code.
However, if the operator is already a aloc, we should instead avoid introducing an extra let:
`(apply ,aloc ,es ...)
=>
`(closure-apply ,aloc ,aloc ,es ...)
This also simplifies the optimization optimize-known-calls.
We add unsafe-apply to the language to enable optimizing closures, an important optimization in functional languages. This unsafe-apply directly applies a label to arguments, without performing any checks. closure-apply will get translated into the safe, dynamically checked apply.

Exercise 6: Design and implement convert-closures. The source language is Lambda-free-lang v9 and the target language is Closure-lang v9.
You may find the map-n function from the a9-skeleton.rkt helpful, depending on how you write your code. You shouldn’t feel like you must use it, nor try to use it if it doesn’t occur to you that you want such a function.

16.5.3 Challenge: optimize-known-call

Closures can cause a lot of indirection, and thus performance penalty, in a functional language. We essentially transform all call into indirect calls. This causes an extra memory dereference and indirect jump, both of which can have performance penalties.

Many calls, particularly to named functions, can be optimized to direct calls. We essentially perform the following transformation on all calls where we can determine the label of the operator:

`(closure-apply ,e ,es ...)
=>
`(unsafe-apply ,l ,es ...)

where l is known to be the label of the closure e. Because e is already an aloc, we can safely discard it; we do not need to force evaluation to preserve any side-effects.

Because this transforms into an unsafe-apply, we need to inline the arity check that implement-safe-apply would insert. Something like:

`(closure-apply ,e ,es ...)
=>
`(if (eq? (procedure-arity e) ,(sub1 (length es)))
(unsafe-apply ,l ,es)
,bad-arity-error)

Remember the the procedure-arity will be one more than the closure arguments, since the closure takes itself as a hidden argument.

We could further optimize this, since we should know the arity statically when this optimization would apply.

We do this by recognizing letrec and cletrec as a single composite form:

`(letrec ([,label_l ,lam])
(cletrec ([,aloc_c (make-closure ,label_c ,es ...)])
,e))

All references uses of (closure-apply ,aloc_c ,es ...) in e and lam can be transformed into (unsafe-apply ,label_c ,es ...). We have to recognize these as a single composite form to optimize recursive calls inside lam, which will benefit the most from the optimization. This relies on the name aloc_c being bound in two places: once to define the closure, and once when dereferenced in a recursive closure.

Challenge 2: Design and implement the function optimize-known-calls. The source and target language are Closure-lang v9.

16.5.4 hoist-lambdas

Now that all lambdas are closed and labeled, we can lift them to top-level defines.

We define Hoisted-lang v9 below. We typeset differences with respect to Closure-lang v9.

p	::=	(module b ... e)

b	::=	(define label (lambda (aloc ...) e))

c	::=	(begin c ...)
	\|	(primop e ...)

e	::=	v
	\|	(primop e ...)
	\|	(unsafe-apply e e ...)
	\|	(let ([aloc e] ...) e)
	\|	(letrec ([label (lambda (aloc ...) e)] ...) e)
	\|	(cletrec ([aloc (make-closure label e e ...)] ...) e)
	\|	(if e e e)
	\|	(begin c ... e)

v	::=	_...

primop	::=	_...

The only difference is the letrec is remove and define blocks are re-added.

Exercise 7: Design and implement the function hoist-lambdas. The source language is Closure-lang v9 and the target language is Hoisted-lang v9.

16.5.5 implement-closures

Now we implement closures as procedures.

Our procedure object is going to be extended compared to last assignment. Previously, we only had a label and an arity as part of a procedure. All procedures were defined at the top-level and could not have lexical variables.

Now, a procedure will look like an extension of a vector. It will have at least three fields: the label, the arity, and a size. The size indicates how large the environment of the procedure is. The environment will be uninitialized after make-procedure, and instead the environment will be initialized manually using unsafe-procedure-set!, similar to vector initialization. As before, unsafe-procedure-label and unsafe-procedure-arity dereference the label and arity of a procedure. However, we now also have unsafe-procedure-ref which dereferences a value from the procedure’s environment, given an index into the environment, similar to unsafe-vector-ref. We still have a safe version of apply, procedure-apply.

The language Proc-apply-lang v9 is defined below. The changes are typeset with respect to Hoisted-lang v9.

p	::=	(module b ... e)

b	::=	(define label (lambda (aloc ...) e))

c	::=	(begin c ...)
	\|	(primop e ...)

e	::=	v
	\|	(primop e ...)
	\|	(unsafe-apply e e ...)
	\|	(let ([aloc e] ...) e)
	\|	(cletrec ([aloc (make-closure label e ...)] ...) e)
	\|	(if e e e)
	\|	(begin c ... e)

v	::=	_...

primop	::=	_...
	\|	make-procedure
	\|	unsafe-procedure-ref
	\|	unsafe-procedure-set!
	\|	procedure-apply
	\|	closure-ref
	\|	closure-apply

For reference, the procedure interface is described below:

(make-procedure e_label e_arity e_size)
Creates a procedure whose label is e_label, which expects e_arity number of arguments, and has an environment of size e_size.
make-procedure does not perform any error checking; it must be applied to a label and two fixnum ptrs. This is safe because no user can access make-procedure directly. Only the compiler generates uses of this operator, and surely our compiler uses it correctly.
(unsafe-procedure-ref e_proc e_index)
Return the value at index e_index in the environment of the procedure e_proc.
As with all unsafe operators, this does not perform any checking.
(unsafe-procedure-set! e_proc e_index e_val)
Set the value at index e_index in the environment of the procedure e_proc to be e_val.
(procedure-apply e_proc es ...)
Safely apply the procedure e_proc to its arguments es. Some later pass will implement this primop to check that e_proc is a procedure that expects exactly (length es) arguments.

To transform closures into procedures, we do a three simple translations:

Transform make-closure

`(cletrec ([,aloc (make-closure ,label ,arity ,es ...)] ...)
   ,e)
=>
`(let ([,aloc (make-procedure ,label ,arity ,n)] ...)
   (begin
     (unsafe-procedure-set! ,aloc 0 ,(list-ref es 0))
     ...
     (unsafe-procedure-set! ,aloc ,n ,(list-ref es n))
     ,e))

where n is (length es), the number of values in the environment.

Transform closure-ref.
`(closure-ref ,c ,i)
=>
`(unsafe-procedure-ref ,c ,i)
We can use unsafe-procedure-ref since we generate all uses of closure-ref.
Transform closure-apply.
`(closure-apply ,c ,es ...)
=>
`(procedure-apply ,c ,es ...)
procedure-apply must still be dynamically checked, since procedure applications came from user programs.

Exercise 8: Design and implement the function implement-closures. The source language is Hoisted-lang v9 and the target language is Proc-apply-lang v9.

16.5.6 sequentialize-let

Next we simplify the language once more by sequentializing let, so each let binds exactly one abstract location. It’s not terribly important when we do this, but the rest of our compiler assumes unary let, so we might as well do it now.

We define Unary-let-lang v9, typesetting the differences with respect to Proc-apply-lang v9.

p	::=	(module b ... e)

b	::=	(define label (lambda (aloc ...) e))

c	::=	(begin c ...)
	\|	(primop e ...)

e	::=	v
	\|	(primop e ...)
	\|	(unsafe-apply e e ...)
	\|	(let ([aloc e] ...) e)
	\|	(if e e e)
	\|	(begin c ... e)

v	::=	_...

primop	::=	_...
	\|	make-procedure
	\|	unsafe-procedure-ref
	\|	unsafe-procedure-set!
	\|	procedure-apply

The translation is straightforward.

Exercise 9: Design and implement the function sequentialize-let. The source language is Proc-apply-lang v9 and the target language is Unary-let-lang v9.

16.5.7 implement-safe-apply

Now we implement procedure-apply in terms of unsafe-apply unsafe-procedure-label, and unsafe-procedure-arity.

Below we define Exprs-data-lang v9. We typeset changes with respect to Unary-let-lang v9.

p	::=	(module b ... e)

b	::=	(define label (lambda (aloc ...) e))

c	::=	(begin c ...)
	\|	(primop e ...)

e	::=	v
	\|	(primop e ...)
	\|	(unsafe-apply e e ...)
	\|	(apply e e ...)
	\|	(let ([aloc e]) e)
	\|	(if e e e)
	\|	(begin c ... e)

v	::=	_...

primop	::=	unsafe-fx*
	\|	unsafe-fx+
	\|	unsafe-fx-
	\|	eq?
	\|	unsafe-fx<
	\|	unsafe-fx<=
	\|	unsafe-fx>
	\|	unsafe-fx>=
	\|	fixnum?
	\|	boolean?
	\|	empty?
	\|	void?
	\|	ascii-char?
	\|	error?
	\|	not
	\|	pair?
	\|	procedure?
	\|	vector?
	\|	cons
	\|	unsafe-car
	\|	unsafe-cdr
	\|	unsafe-make-vector
	\|	unsafe-vector-length
	\|	unsafe-vector-set!
	\|	unsafe-vector-ref
	\|	make-procedure
	\|	unsafe-procedure-arity
	\|	unsafe-procedure-label
	\|	unsafe-procedure-ref
	\|	unsafe-procedure-set!
	\|	procedure-apply

We implement procedure-apply in terms of procedure?, unsafe-procedure-label, and unsafe-procedure-arity. The essence of the transformation is:

`(procedure-apply ,e ,es ...)
=>
`(if (procedure? ,e)
     (if (eq? (unsafe-procedure-arity ,e) ,(sub1 (length es)))
         (apply (unsafe-procedure-label ,e) ,es ...)
         ,bad-arity-error)
     ,bad-proc-error)

We subtract one from the length of the parameter list to account for the closure parameter. We could equivalently add one to the procedure arity, but since the length of the parameter list is known at compile-time, this saves us at least one run-time instruction.

Design digression:
This pass assumes the closure argument must always be there. This design prevents us from optimizing away the closure parameter easily, a slight annoyance that is due to your professor missing this design mistake before releasing the assignment. A better design would place this pass before closure conversion, exposing unsafe-apply earlier. Then we would have access to the correct arity count without the closure argument, and closure conversion modify the unsafe-apply form without disrupting procedure-arity.
We do not want to capture the closure parameter in the procedure-arity value, since this value is exposed to a user, and we do not want the user to know about the internal closure paraemter. This internal parameter is not part of their code, so we should not burden them with it.

We change the name of unsafe-apply to apply, since that’s what the rest of the compiler uses.

Note that we cannot simply define procedure-apply as a procedure, like we did with other safe wrappers, since it must count its arguments, and we must support a variable number of arguments to the procedure.

Exercise 10: Design and implement the function implement-safe-apply. The source language is Unary-let-lang v9 and the target language is Exprs-data-lang v9.

16.5.8 specify-representation

Finally, we need to modify the procedure data type slightly. It was intentionally designed to be similar to the vector data type.

We define Impure-Exprs-bits-lang v9 below. There are no differences with respect to Impure-Exprs-bits-lang v8.

p	::=	(module b ... e)

b	::=	(define label (lambda (aloc ...) e))

c	::=	(begin c ...)
	\|	(mset! e e e)

e	::=	v
	\|	(let ([aloc e]) e)
	\|	(if (cmp e e) e e)
	\|	(begin c ... e)
	\|	(binop e e)
	\|	(apply e e ...)
	\|	(alloc e)
	\|	(mref e e)

v	::=	int64
	\|	label
	\|	aloc

binop	::=	*
	\|	+
	\|	-
	\|	bitwise-and
	\|	bitwise-ior
	\|	bitwise-xor
	\|	arithmetic-shift-right

cmp	::=	neq?
	\|	eq?
	\|	<
	\|	<=
	\|	>
	\|	>=

When implementing make-procedure, you may assume the size of the environment is a fixnum constant.

Exercise 11: Redesign and extend the implementation of the function specify-representation. The source language is Exprs-data-lang v9 and the target language is Impure-Exprs-bits-lang v9.

No other passes should need to be updated.

← prev up next →

1	CPSC 411 Syllabus
2	Course Calendar
3	The Compiler Design Recipe
4	Organizing Racket Code
5	Compiler 0: x64 to elf64
6	Compiler 1: Paren-x64 to x64
7	Compiler 2: Values-lang to x64
8	Compiler 3: Register Allocation (Original)
9	Compiler 3: Register Allocation (Corrected)
10	Compiler 4: Adding Control Flow (Original)
11	Compiler 4: Adding Control Flow (Corrected)
12	Assignment 5: The CPSC411 Compiler Recovery and Reinvestment Assignment of 2020.
13	Compiler 6: Calling Conventions
14	Compiler 7: Immediate Data Types
15	Compiler 8: Structured Data Types
16	Compiler 9: First-class Functions
17	Compiler 10: Recursive Data and Syntactic Sugar
18	Assignment 11: The CPSC411 Compiler Encore

16.1	Assignment Summary
16.2	Language Diagram
16.3	Preface: What’s wrong with Exprs-Lang v8
16.4	Administrative Passes
16.5	Closure Conversion