WORK-IN-PROGRESS: - this material is still under development
Last significant update: 19 Nov 07
Closures are a language feature that, despite having been around for a long time, has only recently begun to make its way onto the radar of many software developers. This is probably because the languages that have and use closures, such as Lisp and Smalltalk, weren't part of the C culture that drove the development of the current mainstream languages.
I use the term Closure in this book, but naturally there is no standard term for this language element. You also see them referred to as lambdas, anonymous functions, and blocks. Each language that uses them usually has its own term for them (Lispers use 'lambda', Smalltalkers and Rubyists use 'block'). Although they are called blocks in Smalltalk and Ruby, they aren't the same thing as blocks in C-based languages.
Now that I've got the terminological babble out of the way, I can actually say what they are. My short definition is that a closure is a code fragment that can be treated as an object. To get serious about this we need an example.
Let's consider the problem of getting a subset of data from a collection. Let's imagine we have a list of employees and we want all employees who are heavy travelers.
int threshold = ComputeThreshold();
var heavyTravellers = new List<Employee>();
foreach (Employee e in employeeList)
if (e.MilesOfCommute > threshold) heavyTravellers.Add(e);
Somewhere else in the code we need to get a list of employees who are managers.
var managerList = new List<Employee>();
foreach (Employee e in employeeList)
if (e.IsManager) managerList.Add(e);
These two code fragments contain a lot of duplication. In both cases we want a list that is formed by taking the members of the original list, running a boolean function against each element, and returning those for which the function returns true. It's a simple thing to envisage, but difficult to write in many languages because the thing that varies between the different code fragments is a chunk of behavior - which is often not easy to parametrize.
The most obvious way to parametrize something like this is to turn it into an object. What I need is a method on a list that will allow me to select from the list based on a separate object that I pass in.
class MyList<T> {
private List<T> contents;
public MyList(List<T> contents) {
this.contents = contents;
}
public List<T> Select(FilterFunction<T> p) {
var result = new List<T>();
foreach (T candidate in contents)
if (p.Passes(candidate)) result.Add(candidate);
return result;
}
}
interface FilterFunction<T> {
Boolean Passes(T arg);
}
I can then use it to select managers like this.
var managers = new MyList<Employee>(employeeList).Select(new ManagersPredicate());
class ManagersPredicate : FilterFunction<Employee> {
public Boolean Passes(Employee e) {
return e.IsManager;
}
}
There's a certain programming satisfaction in doing this, but there's so much code in setting up the predicate object that the cure is worse than the disease. This is especially true when we look at the heavy travelers case. Here I need to pass a parameter into the predicate object, which means I need a constructor in my predicate.
var threshold = ComputeThreshold();
var heavyTravellers = new MyList<Employee>(employeeList).Select(new HeavyTravellerPredicate(threshold));
class HeavyTravellerPredicate : FilterFunction<Employee> {
private int threshold;
public HeavyTravellerPredicate(int threshold) {
this.threshold = threshold;
}
public Boolean Passes(Employee e) {
return e.MilesOfCommute > threshold;
}
}
Essentially a closure is a more elegant solution to this problem, one that makes it much easier to create a hunk of code and pass it around like an object.
You'll notice I've made my examples in C#. I did this because C# has evolved steadily towards a more convenient use of closures in the past few years. C# 2.0 introduced the notion of anonymous delegates, which are a big step in this direction. Here's the heavy traveler example again using anonymous delegates.
var threshold = ComputeThreshold();
var heavyTravellers = employeeList.FindAll(delegate(Employee e) { return e.MilesOfCommute > threshold; });
The first thing to notice here is that there's now much less code involved. The duplication between this expression and a similar one for finding managers is vastly reduced. The second is that, to make this work, I've used a library function on the C# list class, similar to the select function I wrote myself for the hand-written predicate. C# 2 introduced a number of changes to the libraries that took advantage of delegates. This is an important point - for closures to be really useful in a language the libraries need to be written with closures in mind.
A third point that this fragment illustrates is how easy it is to use the threshold parameter - I just use it in my boolean expression. I can put any local variable that's in scope into this expression, which saves all the faffing around with parameters that the predicate object version needed.
This reference to variables in scope is what formally makes this expression a closure. The delegate is said to close over the lexical scope in which it's defined. Even if we take the delegate and store it somewhere for later execution, those variables are still visible and usable. Essentially the system needs to keep the relevant part of the stack frame alive so that the closure still has access to everything it should see. Both the theory and the implementation around this are quite tricky - but the result is very natural to use.
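The same behavior can be sketched in a few lines of Ruby, the other language used in this book. This is only an illustration: make_counter and its two lambdas are hypothetical names, not part of the employee example. The point it tries to show is that the local variable outlives the method that declared it, and that both closures see the same variable rather than separate snapshots.

```ruby
# Hypothetical helper: creates a local variable and returns two
# closures that both close over it.
def make_counter
  count = 0                          # local to this call of make_counter
  increment = lambda { count += 1 }  # closes over count
  read      = lambda { count }       # closes over the same count
  [increment, read]
end

# make_counter has returned by now, yet count is still reachable
# through the closures it handed back.
increment, read = make_counter
increment.call
increment.call
current = read.call                  # sees the updates made via increment
```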
C# 3 went a step further. Here's the heavy travelers expression again.
var threshold = ComputeThreshold();
var heavyTravellers = employeeList.FindAll(e => e.MilesOfCommute > threshold);
You'll notice there's really very little change here - the main factor is that the syntax is much more compact. This may be a small difference but it's a vital one. The usefulness of closures is directly proportional to how terse they are to use, and the terseness makes them far more readable.
There is a second difference, which is an important part of making the syntax terser. In the delegate example I needed to specify the type of the parameter, Employee e. I don't need to indicate that type with the lambda because C# 3.0 has a type inference capability: FindAll on a List<Employee> expects a Predicate<Employee>, so the compiler can work out that e must be an Employee. The same capability is behind var: since the compiler can figure out the type of the right hand side of an assignment, you don't have to tell it on the left.
The consequence of all this is that I can create closures and treat them just like any other object. I can store them in variables and execute them whenever I wish. To illustrate this I can make a club class that has a field for a selector.
class Club...
Predicate<Employee> selector;
internal Club(Predicate<Employee> selector) {
this.selector = selector;
}
internal Boolean IsEligable(Employee arg) {
return selector(arg);
}
and use it like this
public void clubRecognizesMember() {
var rebecca = new Employee { MilesOfCommute = 5000 };
var club = createHeavyTravellersClub();
Assert.IsTrue(club.IsEligable(rebecca));
}
private Club createHeavyTravellersClub() {
var threshold = 1000;
Club club = new Club(e => e.MilesOfCommute > threshold);
return club;
}
This code creates a club in one function using a local variable to set the threshold. The club contains the closure, including the link to the now-out-of-scope local variable. I can then use the club to execute the closure at any future time.
In this case the selector closure isn't actually evaluated when it's created. Instead we create it, store it, and evaluate it later (possibly multiple times). This ability to create a block of code for later execution is what makes closures so useful for Adaptive Models.
Another language that I'm using in this book that uses Closures heavily is Ruby. Ruby was built with closures from its early days, so most Ruby programs and libraries use them extensively. Defining a club class looks like this in Ruby.
class Club...
def initialize &selector
@selector = selector
end
def eligable? anEmployee
@selector.call anEmployee
end
and we use it like this
def test_club
rebecca = Employee.new(5000)
club = create_heavy_travellers_club
assert club.eligable?(rebecca)
end
def create_heavy_travellers_club
threshold = 1000
return Club.new {|e| e.miles_of_commute > threshold}
end
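For readers who want to run the fragments above, here is one way they might be assembled into a single script. It is a sketch under assumptions: Employee is stood in for by a one-field Struct, homebody is a made-up second employee, and the elided parts of Club are simply left out.

```ruby
# Assumed stand-in for the book's Employee class; only the one
# attribute the selector needs.
Employee = Struct.new(:miles_of_commute)

class Club
  # The & converts the block given to Club.new into a Proc object
  # that can be stored in an instance variable.
  def initialize(&selector)
    @selector = selector
  end

  def eligable?(an_employee)
    @selector.call(an_employee)
  end
end

def create_heavy_travellers_club
  threshold = 1000
  Club.new { |e| e.miles_of_commute > threshold }
end

club = create_heavy_travellers_club  # threshold's method has already returned
rebecca  = Employee.new(5000)
homebody = Employee.new(10)          # hypothetical second employee

# The stored closure can be evaluated as many times as we like.
results = [club.eligable?(rebecca), club.eligable?(homebody), club.eligable?(rebecca)]
```

Nothing is evaluated when the club is created; the comparison against threshold only runs on each call to eligable?.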
In Ruby we can define a closure either with curly braces, as above,
or with a do..end pair.
threshold = 1000
return Club.new do |e|
e.miles_of_commute > threshold
end
The two syntaxes are almost entirely equivalent. In practice people use the curlies for one-liners and the do..end for multi-line blocks.
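The "almost" is about precedence: when a call without parentheses is nested inside another, curly braces attach the block to the nearest method, while do..end attaches it to the outermost one. The sketch below uses two hypothetical methods, outer and inner, that do nothing except report which of them received the block.

```ruby
# Hypothetical methods that only report who received the block.
def outer(value)
  block_given? ? :outer_got_block : value
end

def inner
  block_given? ? :inner_got_block : :inner_no_block
end

# Braces bind tightly: the block goes to inner, whose result is
# then passed to outer as an ordinary argument.
with_braces = outer inner { :unused }

# do..end binds loosely: the block goes to outer, and inner is
# called with no block at all.
with_do_end = outer inner do
  :unused
end
```

Adding parentheses around the arguments removes the ambiguity in either form.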
The sad part about this nice Ruby syntax is that you can only use it to pass a single closure into a function. If you want to pass multiple closures you have to use a less elegant syntax. [TBD: Maybe add example ]
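One form the less elegant syntax can take is to build each closure explicitly with lambda and pass them as ordinary arguments. The following is a sketch, not the book's own example: the on_reject callback, the admit method, and the Struct-based Employee are all assumptions introduced for illustration.

```ruby
# Assumed stand-in for the book's Employee class.
Employee = Struct.new(:miles_of_commute)

class Club
  # Both closures arrive as ordinary arguments, so the caller has to
  # create explicit Proc objects rather than use block syntax.
  def initialize(selector, on_reject)
    @selector = selector
    @on_reject = on_reject
  end

  # Hypothetical method: runs the selector, and falls back to the
  # second closure when the employee does not qualify.
  def admit(an_employee)
    return :admitted if @selector.call(an_employee)
    @on_reject.call(an_employee)
  end
end

threshold = 1000
rejected = []
club = Club.new(
  lambda { |e| e.miles_of_commute > threshold },
  lambda { |e| rejected << e; :rejected }
)

first  = club.admit(Employee.new(5000))
second = club.admit(Employee.new(10))
```

Both lambdas still close over local variables (threshold and rejected), so nothing is lost except the lighter block notation.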
Closures play a couple of useful roles in DSLs. Most obviously they are an essential element for Nested Closure. They can also make it easier to define an Adaptive Model.