letrec* in R5RS

By Bjarno Oeyen | Posted on July 15, 2023

I have been using Scheme for a while now. To be more precise, I have been using R5RS (The Revised5 Report on the Algorithmic Language Scheme).

One thing thas has irked me over the years is how internal definitions are defined. It has troubled both myself and the students that I have been teaching. In this blog post, I discuss the semantics of internal definitions and explain why certain programs — that at first sight should work — fail to do so. Finally, I discuss a solution that fixes this problem. This solution is not new and I do not ownership of the idea, as it is already available in more recent variants of Scheme (R6RS and R7RS). But I hope that by describing the issues of internal definitions in this post helps other people — like my students — to not only know how to fix these issues plagueing internal definitions, but also why exactly they occur.

Semantics of Internal Definitions

The semantics of internal defintions (with define) in R5RS is not what most beginner programmers expect. Take the following (artificial) code snippet as an example:

(define (foo)
  (define a 9)
  (define (f x) (if (odd? x) (+ a x) (f (- x 1))))
  (define b (f a))

Most programmers would presume that the code above would simply work. I.e. that it would return 18 (as a is an odd number, it should return (+ 9 9) = 18). At least, that's what any sane programmer would expect, no? That is not what R5RS does. Entering this code in a compliant R5RS interpreter will result in an error. An error that complains that the reference to f is Line 4 is not yet defined. This is caused by how R5RS handles internal definitions. Let's see how the specification of the language describes internal definitions:

A body containing internal definitions can always be converted into a completely equivalent letrec expression.   – Section 5.2.2 R5RS

The code above is thus converted into a letrec form. The semantics of letrec are similar to those of let (with one minor, though important, twist that will be explained further down below): the binding expressions for the new variables are not evaluated in the same environment that the letrec (or let) constructs. Instead, each expression is evaluated without any of the other variables being in scope. I.e. the variables a, f and b are only bound, in an environment, after their values have been evaluated. It will be this environment in which the body (and only the body and any of its subexpressions) is evaluated in. This is different from top-level definitions. Indeed, executing the code above leaving out the definition of foo (i.e. just lines 2–5 as top-level expressions) will work as expected.

Binding Forms in R5RS

Scheme (R5RS) has three binding forms: let, letrec and let*.

let is the basic one. It evaluates the binding expressions in isolation (as explained before with letrec) and then evaluates the body in a new environment where these variables are assigned to these evaluation results. The difference between let and letrec is subtle, but quite fundamental in Scheme. Like its name suggests, letrec is used for recursive program. We explain its use with an example program.

(let ((fac (lambda (n)
             (if (= n 0)
                 (* n (fac (- n 1)))))))
  (fac 5))

The program above will fail to work. When evaluating the lambda expression, the resulting closure has the environment of the let expression as its lexical scope: not the environment where fac is defined. Thus the variable fac in the body of the lambda will not be defined. By using letrec instead of let, the lambda expression is evaluated in the same environment as the one in which the body will be evaluated.

An important property of letrec is that it has support for mutual recursion. An example can be found in the language specification, and I will therefore not repeat it here as it is not relevant to the present discussion.

The let* special form is unlike both let and letrec. Instead of first evaluating all binding expressions and then creating a new environment in which these are assigned, let* creates a new environment for each variable, extending it one-by-one for each variable: evaluating each succesive binding expression in the newly-extended environment.

The Missing Binding Form: letrec*

The program that we started with doesn't work with letrec (nor let) as the definition of b makes use of both a and f as defined above. This could be solved with let* but as f is a recursive procedure, replacing the internal definitions with a let* will just cause another, unrelated, error. What is needed, to fix the program above is a combination of letrec and let*. Let's give it a decent name like letrec* that makes it clear that it is a let that combines the semantics of letrec and let*.

Unfortunately, letrec* is not part of R5RS, but luckily we can create one ourselves by using macros. Macros allows us to extend the language with new syntactic forms. A possible macro definition that makes letrec* available is shown below:

;; Makes letrec* available in R5RS
(define-syntax letrec*
  (syntax-rules ()
    ((_ ((var expr) ...) body ...)
     (let* ((var #f) ...)
       (set! var expr)
       body ...))))

This macro combines the power of letrec and let* it creates one environment where all variables are initially bound to #f (line 5), and then evaluates each binding expression (in sequential order) in this same environment, assigning each result one-by-one instead of only at the end when all defining expressions have been evaluated.

We can now use letrec* to rewrite our artificial example program into...

(define (foo)
  (letrec* ((a 9)
            (f (lambda (x) (if (odd? x) (+ a x) (f (- x 1)))))
            (b (f a)))

... and we get the exact same output as we would get if these expressions would be evaluated in the top-level environment. To wit, 18. Hurray!

Does this all actually matter?

letrec* is available in R6RS and R7RS (and both dialects use it for internal definitions instead of letrec) but are considerably more complex than plain old R5RS. So this macro can be used as a patch to easily fix a faulty program, without needing to translate your code base into a different Scheme dialect.