Introduction to λ-calculus
In this post we try to give a really short overview of what λ-calculus is and why you might want to know about it. In parts two and three we will see a basic implementation in Scala. Let's start with a short description (quoted from Wikipedia):
“Lambda calculus (also written as λ-calculus or called “the lambda calculus”) is a formal system in mathematical logic and computer science for expressing computation by way of variable binding and substitution. First formulated by Alonzo Church, lambda calculus found early successes in the area of computability theory, such as a negative answer to Hilbert’s Entscheidungsproblem.”
Did you know that Alonzo Church, who invented λ-calculus, would be 110 years old next week? Sadly, he died at the age of 92, but his contributions to theoretical computer science made much of what we do as developers possible today.
Lambda calculus is important in programming language theory, and the symbol λ has even been adopted as an unofficial symbol for the field. It can be considered the smallest universal programming language: any computable function can be evaluated in the context of λ-calculus and evaluating programs in the language consists of a single transformation rule: variable substitution.
Why would it be useful to know? Maybe you want a better understanding of programming languages and computation? λ-calculus forms the basis of functional programming, and as the world embraces functional programming more and more, it can be useful to know its roots.
An expression in lambda calculus can be a:
- variable: x
- lambda abstraction (function): λx.x (the . separates the function argument and body)
- function application: x y
Variables are just names that are bound as arguments to lambdas. Each lambda takes a single argument, and multiple arguments are obtained by nesting lambdas (λx.(λy. x y)) – this is similar to currying in e.g. Haskell (did you know that Haskell and currying are both named after Haskell Curry, another important mathematician, also known for the Curry-Howard correspondence, which relates programs to proofs?). A function application consists of two expressions: the left hand side defines the function to apply and the right hand side gives the argument to apply the function to.
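Since the next posts implement the calculus in Scala, here is a rough sketch of how these three kinds of expression could be modelled as an algebraic data type (the names here are illustrative assumptions, not necessarily the ones the later posts use):

```scala
// A λ-calculus term is a variable, a lambda abstraction, or an application.
sealed trait Term
case class Var(name: String) extends Term             // x
case class Lam(arg: String, body: Term) extends Term  // λx.body
case class App(fn: Term, arg: Term) extends Term      // fn arg

// λx.(λy. x y): two arguments obtained by nesting lambdas
val curried = Lam("x", Lam("y", App(Var("x"), Var("y"))))
```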
Compared to a Turing machine (to which lambda calculus is equivalent in computing capability), lambda calculus puts the emphasis on software, not caring about the details of the machine evaluating it. The lambda calculus in its pure form is untyped and has no concern about the types of expressions at all – it’s all about computation in the form of variable substitution. In this post we only look at this untyped λ-calculus.
It’s worth mentioning that the lambdas in lambda calculus are pretty much the same lambdas that many programming languages have (including Java 8, Scala, Kotlin etc.). Some of those languages might treat them a bit differently (e.g. non-local returns in Ruby), but mostly they are very similar to the lambda-calculus lambdas.
Programming in λ-calculus
Functions are all there is in pure λ-calculus, so every value we can pass around must be a function. The simplest function is the identity function λx.x – it takes something (another function) as an argument and returns the same thing. When we want to represent “things” in our program, those must all be represented as functions.
So if we want to have numbers, we have to encode them as functions. Thankfully, Alonzo Church already came up with such an encoding, where the value of a numeral is equivalent to the number of times a function is applied to an argument. This can be written as λs.λz. sⁿ z, where n is the natural number represented and sⁿ means the function s composed with itself n times (we'll say “applied n times” for short).
Thus the first few natural numbers are encoded as follows:
0 = λs.λz. z – the function s is applied to the argument z zero times
1 = λs.λz. s z – the function s is applied once
2 = λs.λz. s (s z) – the function s is applied twice
(the names s and z are short for successor and zero)
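As a quick sanity check, here is a sketch of these numerals in Scala, the implementation language of the next posts. Pure λ-calculus is untyped, so fixing s to Int => Int is an artificial simplification made purely for illustration:

```scala
// Church numerals: a number n applies s to z n times.
type Church = (Int => Int) => Int => Int

val zero: Church = s => z => z        // s applied zero times
val one:  Church = s => z => s(z)     // s applied once
val two:  Church = s => z => s(s(z))  // s applied twice

// Decode a numeral by counting: apply "+ 1" to 0 n times.
def toInt(n: Church): Int = n(_ + 1)(0)
```

toInt(two) evaluates to 2, because two applies the increment function twice to 0.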
Let's also quickly look at how we can write simple functions with numbers.
succ = λn.λs.λz. s (n s z) – (n + 1)
succ adds 1 to a number. Since a number n is defined as a function that applies its 1st argument to its 2nd argument n times, the result should be a function that applies the 1st argument one more time. When you substitute the variable n in succ with a value x, that's what you get: a function that applies s one more time than x does:
(λn.λs.λz. s (n s z)) x → λs.λz. s (x s z)
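In the simplified Scala encoding from above (again, the fixed Int => Int type for s is an illustrative assumption; the calculus itself is untyped), succ is a direct transcription:

```scala
type Church = (Int => Int) => Int => Int
val two: Church = s => z => s(s(z))
def toInt(n: Church): Int = n(_ + 1)(0)

// succ = λn.λs.λz. s (n s z): run n's applications of s, then apply s once more
val succ: Church => Church = n => s => z => s(n(s)(z))
```

toInt(succ(two)) gives 3, as expected.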
Similarly, we define a + b so that it returns a function applying s b times and then a times more:
add = λa.λb.λs.λz. a s (b s z) – (a + b)
To understand this more clearly, let's substitute values x and y:
(λa.λb.λs.λz. a s (b s z)) x y →
(λb.λs.λz. x s (b s z)) y →
λs.λz. x s (y s z)
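The same substitution can be checked in the simplified Scala sketch (types fixed for illustration, as before):

```scala
type Church = (Int => Int) => Int => Int
val one: Church = s => z => s(z)
val two: Church = s => z => s(s(z))
def toInt(n: Church): Int = n(_ + 1)(0)

// add = λa.λb.λs.λz. a s (b s z): apply s b times, then a times more
val add: Church => Church => Church = a => b => s => z => a(s)(b(s)(z))
```

For example, toInt(add(one)(two)) gives 3.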
As we can see, after applying x and y, the result is still a function that looks similar in shape to our original definition of numbers. It applies s y times, then applies s to the result x more times. Multiplication can be defined in a way that looks even simpler than the addition above:
mul = λa.λb.λs. a (b s) – (a * b)
Substituting x and y gives us:
(λa.λb.λs. a (b s)) x y →
(λb.λs. x (b s)) y →
λs. x (y s)
It looks simpler, but is perhaps harder to grasp immediately: we used a trick and left out the z argument. Now the shape of the resulting function is different – it applies x to (y s), but what does it mean to apply a number y to just one argument? Remember that if y is a number, it must perform a computation of the shape λs.λz. sʸ z. If we apply it to the s in the multiplication, we get a function that applies s y times, and x then applies that function x more times. Renaming the outer s to s’ to distinguish it from the inner one, the term looks like this:
λs’. x ((λs.λz. sʸ z) s’)
So in this case we see that the 2nd argument z is still there in y, but we apply y only to s. This is similar to partial application in many languages. Let's see what happens if x = y = 1:
λs’. (λs.λz. s z) ((λs.λz. s z) s’) →
λs’. (λs.λz. s z) (λz’. s’ z’) →
λs’. λz. (λz’. s’ z’) z →
λs’. λz. s’ z
(this is equivalent to the definition of 1)
Let's check 2 * 2 as well, where the substitutions become more complex and we rename some variables to distinguish similarly-named ones:
λs’. (λs.λz. s² z) ((λs.λz. s² z) s’) =
λs’. (λs.λz. s (s z)) ((λs.λz. s (s z)) s’) =
λs’. (λs.λz. s (s z)) ((λs’’.λz’’. s’’ (s’’ z’’)) s’) →
λs’. (λs.λz. s (s z)) (λz’’. s’ (s’ z’’)) →
λs’. λz. (λz’’. s’ (s’ z’’)) ((λz’’. s’ (s’ z’’)) z) →
λs’. λz. (λz’’. s’ (s’ z’’)) (s’ (s’ z)) →
λs’. λz. s’ (s’ (s’ (s’ z)))
(this is equivalent to the definition of 4)
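The same trick carries over to the simplified Scala sketch: mul never mentions z, it just composes functions. Picking two different numbers (2 and 3) is a useful check that we really get x * y:

```scala
type Church = (Int => Int) => Int => Int
val two:   Church = s => z => s(s(z))
val three: Church = s => z => s(s(s(z)))
def toInt(n: Church): Int = n(_ + 1)(0)

// mul = λa.λb.λs. a (b s): b s applies s y times; a repeats that x times
val mul: Church => Church => Church = a => b => s => a(b(s))
```

toInt(mul(two)(three)) gives 6, and toInt(mul(two)(two)) gives 4, matching the derivation above.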
To recap, we defined multiplication using partial application of lambdas, without mentioning the second argument z that we usually had in numbers, but the end result still has the correct shape. We could also have defined multiplication in a more verbose way that includes the z argument. Give it a try, or try to come up with an alternative definition of multiplication that uses addition.
Booleans can also be encoded as functions:
true = λt.λf.t – a 2-argument function returning the 1st argument
false = λt.λf.f – a 2-argument function returning the 2nd argument
Basically, the booleans represent if-then-else expressions. They both take two arguments; true returns the first (then) and false returns the second (else). We'll take a look at some example programs with booleans in the next posts.
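Church booleans translate just as directly into the Scala sketch. The names tru and fls are used because true and false are Scala keywords, and fixing the result type to String is again an artificial simplification:

```scala
// Church booleans: two-argument functions that select one of their arguments.
type Bool = String => String => String

val tru: Bool = t => f => t  // true  = λt.λf.t, picks the "then" branch
val fls: Bool = t => f => f  // false = λt.λf.f, picks the "else" branch

// A Church boolean is its own if-then-else
def ifThenElse(b: Bool, thn: String, els: String): String = b(thn)(els)
```

ifThenElse(tru, "then", "else") yields "then".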
Substitution & Evaluation
Ok, so now we can represent some values and programs in λ-calculus, and we used some examples of variable substitution without defining the exact rules. What do the final results of evaluating programs look like? As evaluation happens by variable substitution, it seems the results should be programs where some substitution has happened, as we saw in the examples of multiplication. But when does the substitution begin and end?
Variable substitution is defined by a few rules; the most important is called β-reduction and means that if we have an application with a lambda abstraction on the left hand side, we substitute the right hand side for the argument of the lambda. Such a reducible expression is also called a “redex”, and such a reduction corresponds to a single computation step. For example:
(λt.λf.t) x → λf.x – substituting x for t
There are different strategies for picking the next redex to reduce in a more complex expression. We will look at those in the next post, and implement one of the strategies. We can continue applying β-reduction until there are no more redexes. However, we must consider that it’s possible to write programs for which β-reduction does not terminate. For example, the following expression:
(λx.x x) (λx.x x)
reduces to itself when we apply a single step of β-reduction.
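We can watch this happen with a small sketch that represents terms as data. The substitution here is deliberately naive and ignores variable capture, which is safe for this particular term since no names collide:

```scala
sealed trait Term
case class Var(name: String) extends Term
case class Lam(arg: String, body: Term) extends Term
case class App(fn: Term, arg: Term) extends Term

// Naive substitution [x := value]; ignores variable capture
def subst(t: Term, x: String, value: Term): Term = t match {
  case Var(`x`)            => value
  case v: Var              => v
  case Lam(a, b) if a == x => Lam(a, b)  // x is shadowed here, stop
  case Lam(a, b)           => Lam(a, subst(b, x, value))
  case App(f, a)           => App(subst(f, x, value), subst(a, x, value))
}

// One step of β-reduction at the top of the term, if it is a redex
def betaStep(t: Term): Option[Term] = t match {
  case App(Lam(x, body), arg) => Some(subst(body, x, arg))
  case _                      => None
}

val selfApply = Lam("x", App(Var("x"), Var("x")))  // λx.x x
val omega     = App(selfApply, selfApply)          // (λx.x x) (λx.x x)
```

betaStep(omega) returns Some(omega): a single step of reduction yields the very same term, so evaluation never terminates.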
An obvious thing we have to be careful of when substituting variables is that we cannot change the semantics of the program by naively replacing all variables with the same name. Consider the following expression:
λx.(λx.x). Here the inner λx.x is the identity function, and the argument of the outer lambda is never used. If we substituted all occurrences of the name x, we would incorrectly get an expression with different semantics:
(λx.(λx.x)) y → λx.y
We must assume name collisions will happen and distinguish between bound and free variables. For example, in λx. x y, x is bound and y appears free (and must not be substituted). In the λx.(λx.x) example, the outer x – which is bound to the outer lambda – should be substituted (however, there are no occurrences); and the inner x should be left as is.
One way of making this distinction properly is to rename bound variables during substitution, making sure to always give them unique names. This is called α-conversion, and expressions that differ only in the names of their bound variables are considered α-equivalent, or often simply equivalent.
But how we implement the renaming technically is not always very important. We can also do the substitutions without any renaming if we keep track of the lexical scopes of the variables – each lambda abstraction defining its own scope. Then we substitute x only if its defining scope is the scope of the left hand side lambda in the application.
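As an illustrative sketch of the renaming approach, here is a capture-avoiding substitution over a simple term representation. The fresh-name choice (appending a prime) is deliberately naive; a careful implementation would also check the new name against the free variables of the body:

```scala
sealed trait Term
case class Var(name: String) extends Term
case class Lam(arg: String, body: Term) extends Term
case class App(fn: Term, arg: Term) extends Term

// Free variables of a term: everything not bound by an enclosing lambda
def freeVars(t: Term): Set[String] = t match {
  case Var(n)    => Set(n)
  case Lam(a, b) => freeVars(b) - a
  case App(f, a) => freeVars(f) ++ freeVars(a)
}

// Capture-avoiding substitution [x := value] using α-conversion
def subst(t: Term, x: String, value: Term): Term = t match {
  case Var(`x`)            => value
  case v: Var              => v
  case Lam(a, _) if a == x => t  // inner binder shadows x: leave it alone
  case Lam(a, b) if freeVars(value).contains(a) =>
    // α-convert: rename the bound variable before substituting
    val fresh = a + "'"
    subst(Lam(fresh, subst(b, a, Var(fresh))), x, value)
  case Lam(a, b)           => Lam(a, subst(b, x, value))
  case App(f, a)           => App(subst(f, x, value), subst(a, x, value))
}
```

For example, substituting y for x in λx.x leaves the inner identity untouched (x is shadowed), while substituting y for x in λy.x renames the bound y first, yielding λy'.y instead of the incorrect λy.y.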
Well, this turned out to be quite a lengthy introduction, as we spent quite a lot of time on Church numerals, which are not really useful for day-to-day programming. But on the other hand, the fact that functions can be used to represent the basic building blocks of programs – such as numbers and booleans – is quite interesting. In fact, any data structure could be represented as functions.
I think in general knowing λ-calculus isn’t necessarily going to make you a better programmer in any specific language, but it may give you a different perspective on some problems. It’s also useful if you are interested in reading programming language theory research, which often assumes some knowledge of λ-calculus.
If you want to know more, stay tuned for the next posts, where we look more deeply into parsing, evaluation and substitution strategies and implement the untyped λ-calculus in Scala. If you want a deeper understanding of the underlying theory, I'd recommend reading Benjamin Pierce's “Types and Programming Languages”.