17 Compiler 10: Recursive Data and Syntactic Sugar
17.1 Assignment Summary
The goal of this assignment is to add recursive data to our language. This allows us to create streams or cyclic graphs, which are useful in a variety of algorithms (such as writing control-flow graph algorithms for compilers). With this feature, we’ll finally be able to completely collapse distinctions in our surface syntax.
We’ll also add a little syntactic sugar to the surface language. This isn’t really related to recursive data, but the assignment is small if we don’t merge the two separate goals into a single assignment.
This assignment is due Friday, April 10, 2020 at 11:59pm.
You can use the reference solution here: https://www.students.cs.ubc.ca/~cs-411/2019w2/a10-interrogator.cgi
You should first merge your solution to Compiler 9: First-class Functions with the starter code provided. The new starter code has the correct provides and includes a submodule to help you run your compiler on the command line if you want. The name of the skeleton is a10-skeleton.rkt to avoid accidentally overwriting your files, but your file in the Git repository should be named a10.rkt.
17.2 Language Diagram
17.3 Preface: What’s wrong with Exprs-Lang v9
Exprs-lang v9 gave us the ability to write first-class lexically scoped local functions, a powerful tool for abstracting over computation. However, without the ability to write recursive data, we could only write simple non-recursive local functions. This makes implementing local loops annoying; we had to go through top-level definitions instead. It also means we can’t directly implement other cyclic data structures, like streams or control-flow graphs! By adding recursive data, we can implement arbitrary cyclic data structures, and get first-class lexically-scoped recursive functions (generalized loops) as a happy side effect.
Many languages support cyclic data through imperative mutation, but as we’ll see in this assignment, that’s a low-level implementation technique that the user need not be required to use.
Most of Racket is implemented using its macro system, thereby extending Racket in Racket, which all elaborates to a simple core language.
p | ::= | (module b ... e) | ||
b | ::= | (define x e) | ||
e | ::= | v | ||
| | (apply e e ...) | |||
| | (let ([x e]) e) | |||
| | (letrec ([x e] ...) e) | |||
| | (if e e e) | |||
v | ::= | fixnum | ||
| | prim-f | |||
| | x | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x ...) e) | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity |
We add letrec, which we saw in the last assignment, to the surface language. Unlike last week, it is not restricted to bind variables only to procedures. This apparently simple change has deep consequences.
Further, since we have names instead of abstract locations, letrec is restricted to only bind each name once in a single letrec.
17.4 Preliminary Passes
17.4.1 uniquify
First we modify uniquify. Below we define Racketish-Unique.
p | ::= | (module b ... e) | ||
b | ::= | (define x aloc e) | ||
e | ::= | v | ||
| | (apply e e ...) | |||
| | (let ([x aloc e]) e) | |||
| | (letrec ([x aloc e] ...) e) | |||
| | (if e e e) | |||
v | ::= | fixnum | ||
| | prim-f | |||
| | x | |||
| | aloc | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x aloc ...) e) | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity |
As usual with uniquify, the only change is that all names x are replaced by abstract locations aloc.
Exercise 1: Redesign and extend the implementation of uniquify. The source language is Racketish and the target language is Racketish-Unique.
Be careful with the binding for letrec.
17.4.2 define->letrec
Next we elaborate define into letrec, as before. Below we define Just-Exprs-lang v10.
p | ::= | (module b ... e) | ||
b | ::= | (define aloc e) | ||
e | ::= | v | ||
| | (apply e e ...) | |||
| | (letrec ([aloc e] ...) e) | |||
| | (let ([aloc e]) e) | |||
| | (if e e e) | |||
v | ::= | fixnum | ||
| | prim-f | |||
| | aloc | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) e) | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity |
Exercise 2: Redesign and extend the implementation of define->letrec. The source language is Racketish-Unique and the target language is Just-Exprs-lang v10.
17.5 Resolving Recursive Binding
We’ve actually already seen this transformation; we did it when implementing
closures, although its details were somewhat obscured.
Closures are a form of recursive data—
17.5.1 purify-letrec
We define Landin-knot-lang v10.
p | ::= | (module e) | ||
c | ::= | (set! aloc e) | ||
e | ::= | v | ||
| | (apply e e ...) | |||
| | (letrec ([aloc e] ...) e) | |||
| | (let (info ((assigned (aloc ...)))) ([aloc e] ...) e) | |||
| | (if e e e) | |||
| | (begin c ... e) | |||
v | ::= | fixnum | ||
| | prim-f | |||
| | aloc | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) e) | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity |
Now letrec only binds procedures. We also allow n-ary binding in let, to simplify the compiler pass. We add a single computation, (set! aloc e). Each let form is annotated with an info structure containing the list of alocs that are assigned in its body, and is used to keep track of mutable alocs.
`(let (info ((assigned ()))) ([,xs_s ,es_s] ...) (let (info ((assigned (,xs_c ...)))) ([,xs_c (void)] ...) (letrec ([,xs_l ,es_l] ...) (let (info ((assigned ()))) ([,xs_t ,es_c] ...) (begin (set! ,xs_c ,xs_t) ... ,e)))))
The restriction on applications is to avoid problems if we ever add control-flow constructs, such as call/cc, to our language.
A binding `[,x ,e] is lambda if e is literally a lambda value.
Otherwise, the binding is complex.
Complex bindings `[,x ,e] are compiled as follows. First, we set up an initial let-binding to x to (void). Then, we introduce an auxiliary binding `[,xt ,e]. In it, e now has the correct binding, referring to x. But x does not have the right value. So we finally mutate x to have the value of xt, so now both xt and x have the same value, and the right recursive binding structure. This implements recursion using back-patching mutation, a technique called Landin’s Knot.
Exercise 3: Design and implement purify-letrec. The source language is Just-Exprs-lang v10 and the target language is Landin-knot-lang v10.
The reference implementation takes an optional second parameter specifying an optimization level. (curryr purify-letrec 0) (the default) does not treat nested letrec as simple. (curryr purify-letrec 1) tries to detect simple nested letrecs.
17.5.2 convert-assigned
Previously, we did not expose set! beyond the backend of the compiler, so we must implement set!. We cannot simply expose set! from the backend of the compiler, because this version of set! behaves differently. It acts much more like Racket’s set-box!.
(let ([x.1 (void)]) (let ([x.2 (cons 1 (lambda () x.1))]) (begin (set! x.1 x.2) (+ (car x.1) (car ((cdr x.1)))))))
(let ([x.1 (box (void))]) (let ([x.2 (cons 1 (lambda () (unbox x.1)))]) (begin (set-box! x.1 x.2) (+ (car (unbox x.1)) (car ((cdr (unbox x.1))))))))
So now, we need to implement set! and the assigned locations to give them a box-like semantics, rather than a move semantics.
Vectors use more memory than we need. We could reduce memory usage by adding a primitive box data type. However, this requires adding a new tag, modifying complex passes, and has little pedagogical value.
Below we define Landin-vec-lang v10. We no longer have set! or impure computation; everything must be let-bound once more.
p | ::= | (module e) | ||
c | ::= | (set! aloc e) | ||
e | ::= | v | ||
| | (apply e e ...) | |||
| | (letrec ([aloc (lambda (aloc ...) e)] ...) e) | |||
| | (let (info ((assigned (aloc ...)))) ([aloc e] ...) e) | |||
| | (if e e e) | |||
| | (begin c ... e) | |||
v | ::= | fixnum | ||
| | prim-f | |||
| | aloc | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (aloc ...) e) | |||
prim-f | ::= | _... |
`(let (info ((assigned (x.1)))) ([x.1 (void)]) (let (info ((assigned ()))) ([tmp.2 (apply cons 1 (lambda () x.1))]) (begin (set! x.1 tmp.2) x.1))) => `(let ([x.1 (apply make-vector 1)]) (let ([tmp.2 (apply cons 1 (lambda () (apply vector-ref x.1 0)))]) (let ([tmp.3 (apply vector-set! x.1 0 tmp.2)]) (apply vector-ref x.1 0))))
Exercise 4: Design and implement the function convert-assigned. The source language is Landin-knot-lang v10 and the target language is Landin-vec-lang v10.
17.5.3 dox-lambdas, implement-safe-primops, uncover-free, convert-closures
Now we just need to make interfaces match. We’ve moved multi-arity let up a few passes, so we need to adapt the languages between here and sequentialize-let.
Exercise 5: Redesign and extend the implementation of
dox-lambdas, whose source language is Landin-vec-lang v10. You need to add support for multi-arity let.
implement-safe-primops, you need to add support for multi-arity let in the source language.
uncover-free, you need to add support for multi-arity let in the source language.
convert-closures, you need to add support for multi-arity let in the source language.
17.6 Syntactic Sugar
17.6.1 Preface: What’s wrong with Racketish?
Racketish is a good start at a core language. You could start building standard libraries and get to work. But it’s missing a handful of nice features. Most languages support various complex data literals, such as Racket’s quote form for lists. They also support features you cannot encode as functions, like short-circuiting boolean and and or.
These get elaborated very early. In Racket, these are mostly implemented as macros, a user-extensible system for expressing syntactic rewrites. We’ll design a compiler pass that isn’t quite a macro system, but should give you an idea of how a macro system works, and give you the ability to add new syntactic sugar to the language.
Below, we define Racketish-Surface.
p | ::= | (module b ... e) | ||
b | ::= | (define x e) | ||
e | ::= | v | ||
| | ’s-expr | |||
| | #(e ...) | |||
| | (apply e e ...) | |||
| | (e e ...) | |||
| | (macro-id e ...) | |||
| | (let ([x e]) e) | |||
| | (letrec ([x e] ...) e) | |||
| | (if e e e) | |||
v | ::= | fixnum | ||
| | prim-f | |||
| | x | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x ...) e) | |||
s-expr | ::= | #t | ||
| | #f | |||
| | fixnum | |||
| | ascii-char-literal | |||
| | () | |||
| | (s-expr ...) | |||
macro-id | ::= | and | ||
| | or | |||
| | quote | |||
| | vector | |||
| | begin | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity |
We add two special syntaxes, ’s-expr for quoted lists, and #(e ...) for vector literals. Note that our language does not have symbols, so our s-expr differs from Racket’s. ’s-expr is a reader macro in Racket and is implicitly elaborated to (quote s-expr). #(e ...) is the notation for vector literals, and is equivalent to (vector e ...), but not syntactically identical in the same way ’s-expr and (quote s-expr) are.
Parsing is often described as a solved problem, but it is not, and is easy to get wrong, and doing a bad job can summon the dark gods to our realm. We do not want to summon dark gods, so we worked directly with data in the host language, and it was fine.
I can’t typeset these examples using my normal REPL typesetting feature since Racket implicitly renders them identically.
(match '(quote (1 2 3)) [`(quote ,es) es]) (match ''(1 2 3) [`(quote ,es) es]) (match '#(1 2 3) [`#(,es ...) es]) (match '#(1 2 3) [(vector es ...) es]) ;This one will fail to match (match '#(1 2 3) [`(vector ,es ...) es])
We also introduce a new form for applying macros, and add a few macros. We also make function application implicit; essentially, anything that looks like an application form is either a macro application, or, is implicitly a function application.
17.6.2 uniquify
Macros can often introduce new variable binding, so it’s still important to have some mechanism for handling fresh variables. Before we expand, we once more uniquify.
Below we define Racketish-Surface-Unique, updating it to include the new forms and macros. We typeset changes with respect to Racketish-Surface.
p | ::= | (module b ... e) | ||
b | ::= | (define x aloc e) | ||
e | ::= | v | ||
| | ’s-expr | |||
| | #(e ...) | |||
| | (e e ...) | |||
| | (macro-id e ...) | |||
| | (let ([x aloc e]) e) | |||
| | (letrec ([x aloc e] ...) e) | |||
| | (if e e e) | |||
v | ::= | fixnum | ||
| | prim-f | |||
| | x | |||
| | aloc | |||
| | #t | |||
| | #f | |||
| | () | |||
| | (void) | |||
| | (error uint8) | |||
| | ascii-char-literal | |||
| | (lambda (x aloc ...) e) | |||
s-expr | ::= | #t | ||
| | #f | |||
| | fixnum | |||
| | ascii-char-literal | |||
| | () | |||
| | (s-expr ...) | |||
macro-id | ::= | and | ||
| | or | |||
| | quote | |||
| | vector | |||
| | begin | |||
prim-f | ::= | * | ||
| | + | |||
| | - | |||
| | eq? | |||
| | < | |||
| | <= | |||
| | > | |||
| | >= | |||
| | fixnum? | |||
| | boolean? | |||
| | empty? | |||
| | void? | |||
| | ascii-char? | |||
| | error? | |||
| | not | |||
| | pair? | |||
| | procedure? | |||
| | vector? | |||
| | cons | |||
| | car | |||
| | cdr | |||
| | make-vector | |||
| | vector-length | |||
| | vector-set! | |||
| | vector-ref | |||
| | procedure-arity |
Exercise 6: Redesign and extend the implementation of uniquify. The source language is Racketish-Surface and the target language is Racketish-Surface-Unique.
To support implicit application, you’ll probably want predicates for determining when something is a macro id, and when something is a value keyword.
To handle ’s-expr, remember that it is implicitly elaborated to (quote s-expr) by Racket.
17.6.3 expand-macros
Now we can expand macros. Ideally, this is designed to be extensible, so we can add new macros at will, or even allow the user to define macros. We don’t force you to design your compiler this way, though.
Our target language is the (uniquified) core language: Racketish-Unique. All macros will be elaborated to existing features. We also elaborate implicit function application into explicit apply.
- (and v ...)
`(and) => #t `(and ,e) => e `(and ,e ,es ...) => `(if ,e (and ,es ...) #f) When and is given no arguments, we expand to #t, since #t is the multiplicative identity element on booleans (like 1 is for integers and multiplication). When given a single argument, we return it, since Racket is falsey. Otherwise, we elaborate to branch on the first argument, and recursively expand and applied to the rest of its arguments.
- (or v ...)
`(or) => #f `(or ,e) => e `(or ,e ,es ...) => `(let ([,tmp ,e]) (if ,tmp ,tmp (or ,es ...))) When or is given no arguments, we expand to #f, since #f is the additive identity element on booleans (like 0 is for integers and addition). When given a single argument, we return it, since Racket is falsey. Otherwise, we bind the first argument to an auxiliary variable, return it if its true, and otherwise return the or of the rest of the arguments.
Because we introduce an auxiliary variable, we see why it’s important for macros to work with unique variables. Otherwise, this could accidentally capture any use of a free variable in es ..., just like any compiler pass that introduces auxiliary variables.
- (vector e ...) and #(e ...)
- (quote s-expr)
`',v => v `'(,s-expr ,s-exprs ...) => `(cons ',s-expr '(,s-exprs ...)) - (begin e ...)
`(begin) => '(void) `(begin ,e) => e `(begin ,e ,es ...) => `(let ([,tmp ,e]) (if (error? ,tmp) ,tmp `(begin ,es ...))) We add a surface-level begin form that handles propagating errors correctly. This is essentially an implementation of do for the monad.
These rewrites are essentially syntax-case macros, recursive pattern-based macros that can perform computation at compile time.
Notice that each macro can generate more uses of macros. As a result, the recursive structure of this elaborate pass is going to be slightly different from other passes.
Exercise 7: Design and implement the function expand-macros. The source language is Racketish-Surface-Unique and the target language is Racketish-Unique.
Try designing the pass so that each of the above macros is defined in its own helper function, and make expand-macros handle macros completely generically. You’ll be part of the way to an implementation of a macro system that supports user-defined macros. If you exposed an interface in Racketish-Surface and used eval...
17.7 )
The end.
Exercise 8: Good luck. Have fun.