
CollectionClosureMethod

design · 1 August 2005

When I first started programming in Smalltalk, one of the things I liked right from the start was the collection classes. They allowed you to perform a range of common and powerful operations on collections with very little code. When Java appeared, I missed these kinds of methods - the Java (and C#) collections were very limited compared to Smalltalk. The main reason for this limitation is that Java doesn't have any convenient implementation for a Closure. The powerful Smalltalk methods for collections all relied on closures.

In recent years I've done a lot of Ruby programming. One of the main things that got me hooked on Ruby was that it has these powerful collection methods, which it can offer because Ruby has closures in the language.

So what are these closure-using methods that are so good for collections? The centerpiece method is 'each', which in design patterns terminology is an internal iterator. (In Smalltalk this method is called 'do'.)

  employees.each do |e|
    e.doSomething
  end

The each method takes a one-argument block (Ruby and Smalltalk both refer to closures as blocks). It then executes the block on each element in the collection. It is essentially the same as the foreach statement you find in many modern languages (and which recently arrived in Java with 1.5). In those languages the foreach statement is all you get, but with collections and closures the each method is just the start.
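To see why it matters that the block is a closure and not just a loop body, here's a minimal sketch. The Employee struct and the sample names are made up for illustration; the point is only that the block can read and update variables defined outside it.

```ruby
# Hypothetical Employee type, for illustration only.
Employee = Struct.new(:name)

employees = [Employee.new("Ann"), Employee.new("Bob")]

names = []   # defined outside the block
count = 0

# The block closes over `names` and `count`, so it can update them
# even though it is each that decides when to call the block.
employees.each do |e|
  names << e.name
  count += 1
end

puts "#{count} employees: #{names.join(', ')}"
# prints "2 employees: Ann, Bob"
```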

A common thing you need to do with collections is to find all the elements of a collection that satisfy some boolean condition. For this you might write code like this.

managers = []
for e in employees
  if e.manager?
    managers << e 
  end
end

Although this is legal Ruby (<< adds to a collection), no decent Rubyist would write this. Instead they would write the following.

managers = employees.select {|e| e.manager?}

Like each, select takes a block as an argument. In this case the block is delimited by curlies rather than do/end (both are legal, but curlies are better for one-liners). It applies the block to each element of the list. If the block returns true, it puts that element into the result collection, which it returns at the end. As you can see, having this kind of method can really simplify this common case. Smalltalk also called this method 'select', and Ruby has an alias for it called 'find_all'. There's also a sibling method called 'reject', which returns all the elements for which the block returns false.
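Here's a small runnable sketch of select, find_all and reject side by side. The Employee struct and its is_manager field are assumptions made up for the example; only the three collection methods come from Ruby itself.

```ruby
# Hypothetical Employee type; `manager?` mirrors the method used in the text.
Employee = Struct.new(:name, :is_manager) do
  def manager?
    is_manager
  end
end

employees = [
  Employee.new("Ann", true),
  Employee.new("Bob", false),
  Employee.new("Cy",  true),
]

managers     = employees.select   { |e| e.manager? }  # keeps Ann and Cy
same_again   = employees.find_all { |e| e.manager? }  # same result under the other name
non_managers = employees.reject   { |e| e.manager? }  # keeps Bob

puts managers.map { |e| e.name }.inspect      # ["Ann", "Cy"]
puts non_managers.map { |e| e.name }.inspect  # ["Bob"]
```

Note that none of these touch the original list: employees still holds all three elements afterwards.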

The next most common closure method I use with collections is collect. It's similar to select, but you use it when you need to gather the results of a method call on each element. Here's the traditional code:

  offices = []
  for e in employees
    offices << e.office
  end

Again the closure method allows you to use a one-liner.

offices = employees.collect {|e| e.office}

You can see what this does: it's similar to select, but instead of filtering it puts the result of the block for each element into the returned collection. Smalltalk also called this 'collect'. Lisp has a similar function called 'map', and in Ruby 'map' is an alias for 'collect'.
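As a quick sketch of the difference from select (again with a made-up Employee struct and sample offices), collect always returns one result per element, duplicates included:

```ruby
# Hypothetical Employee type, for illustration only.
Employee = Struct.new(:name, :office)

employees = [
  Employee.new("Ann", "Boston"),
  Employee.new("Bob", "London"),
  Employee.new("Cy",  "Boston"),
]

offices = employees.collect { |e| e.office }
via_map = employees.map     { |e| e.office }  # same method under its other name

puts offices.inspect  # ["Boston", "London", "Boston"]
```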

There's a concept that's come out of modern functional programming languages that's similar to the two preceding closure methods - it's called a list comprehension. List comprehensions have made their way into the Python language. They provide a syntactic approach to getting the kinds of benefits we've seen so far. Here are the two examples again using Python list comprehensions.

  managers = [e for e in employees if e.isManager]
  offices = [e.office for e in employees]
  

List comprehensions make it easy to combine the two.

managersOffices = [e.office for e in employees if e.isManager]

You can also do this by chaining block methods together.

managersOffices = employees.select{|e| e.manager?}.
                            map   {|m| m.office}

List comprehensions are nice, but they really only handle select/collect cases. Blocks allow much more. Here are some more things you can do with collections and blocks.

Similar to select are tests to see if all the elements of a collection match a condition or if any of them do.

   allManagers = employees.all? {|e| e.manager?}
   noManagers = ! employees.any? {|e| e.manager?}
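Here's a minimal runnable version of those two tests, using a made-up Employee struct. It also shows an edge case that's easy to forget: on an empty collection all? answers true and any? answers false, because there is no element to contradict the first or satisfy the second.

```ruby
# Hypothetical Employee type; `manager?` mirrors the method used in the text.
Employee = Struct.new(:name, :is_manager) do
  def manager?
    is_manager
  end
end

employees = [Employee.new("Ann", true), Employee.new("Bob", false)]

all_managers = employees.all? { |e| e.manager? }    # false: Bob is not a manager
no_managers  = !employees.any? { |e| e.manager? }   # false: Ann is a manager

# Edge case: with no elements, all? is true and any? is false.
nobody    = []
empty_all = nobody.all? { |e| e.manager? }
empty_any = nobody.any? { |e| e.manager? }

puts [all_managers, no_managers, empty_all, empty_any].inspect
# prints [false, false, true, false]
```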
  

The partition method combines select and reject. It works well with the multiple assignment feature in Ruby.

managers, plebs = employees.partition{|e| e.manager?}
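To make the "combines select and reject" claim concrete, this sketch (with a made-up Employee struct) checks that partition hands back the same two lists you'd get from calling select and reject separately, in that order.

```ruby
# Hypothetical Employee type; `manager?` mirrors the method used in the text.
Employee = Struct.new(:name, :is_manager) do
  def manager?
    is_manager
  end
end

employees = [
  Employee.new("Ann", true),
  Employee.new("Bob", false),
  Employee.new("Cy",  false),
]

# partition returns a two-element array: [matching, non-matching].
# Multiple assignment unpacks it into two variables.
managers, plebs = employees.partition { |e| e.manager? }

puts managers.map { |e| e.name }.inspect  # ["Ann"]
puts plebs.map { |e| e.name }.inspect     # ["Bob", "Cy"]
```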

You aren't limited to single-argument blocks. Another method I use a lot is sort, whose block (in its classic form) takes two arguments to compare when sorting a list.

sortedEmployees = employees.sort {|a,b| a.lastname <=> b.lastname}

The sort method returns a list sorted using the code that's in the block. The <=> operator is the comparison operator, generally known as the spaceship operator. It returns -1 if a is less than b, +1 if greater, and 0 if the same.

Ruby 1.8 allows me to sort more easily with a single-argument block:

sortedEmployees = employees.sort_by {|e| e.lastname}
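The two forms can be compared directly. In this sketch the Employee struct and the names are invented for the example; it shows what <=> returns for a pair of strings, that sort and sort_by agree here, and that the two-argument form is still handy when you want something like a descending order.

```ruby
# Hypothetical Employee type, for illustration only.
Employee = Struct.new(:firstname, :lastname)

employees = [
  Employee.new("Ann", "Smith"),
  Employee.new("Bob", "Jones"),
  Employee.new("Cy",  "Adams"),
]

comparison = "Adams" <=> "Jones"   # -1, since "Adams" sorts before "Jones"

by_block   = employees.sort    { |a, b| a.lastname <=> b.lastname }
by_key     = employees.sort_by { |e| e.lastname }
descending = employees.sort    { |a, b| b.lastname <=> a.lastname }  # swap a and b

puts by_block.map { |e| e.lastname }.inspect    # ["Adams", "Jones", "Smith"]
puts descending.map { |e| e.lastname }.inspect  # ["Smith", "Jones", "Adams"]
```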

Another method whose block takes two arguments is each_with_index, which iterates through a list in the same way as each but passes the index as well as the element to the block.
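A short sketch of each_with_index, using a made-up Employee struct. The block receives the element first and the index second, and the index starts at zero, so the example adds one to get a human-friendly numbering.

```ruby
# Hypothetical Employee type, for illustration only.
Employee = Struct.new(:name)

employees = [Employee.new("Ann"), Employee.new("Bob")]

lines = []
employees.each_with_index do |e, i|   # element first, zero-based index second
  lines << "#{i + 1}. #{e.name}"
end

puts lines
# prints:
# 1. Ann
# 2. Bob
```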

Find (aliased to the Smalltalk name 'detect') looks for the first element that matches a condition.

volunteer = employees.find {|e| e.steppedForward?}

Often with something like find, you want to do something if no element matches the condition. Find will return nil if nothing matches, so you can test the result. However, you can also pass a second block, as an ordinary argument, which is called if nothing matches.

volunteer = employees.find(lambda{self.pickVictim}) {|e| e.steppedForward?}

Ruby's syntax is nice for methods that take a single block as an argument, but I find it rather clunkier for multiple blocks. In Smalltalk (according to my questionable memory) it would look like this.

volunteer := employees 
               detect: [:each| each hasSteppedForward]
               ifNone: [self pickVictim]

As usual Smalltalk's keyword parameters make it much easier to read a multi-argument method.
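Going back to the Ruby form, here's a runnable sketch of find with a fallback. The Employee struct, the pick_victim lambda and the call counter are all made up for illustration. It shows that the fallback is only called when nothing matches.

```ruby
# Hypothetical Employee type; `steppedForward?` mirrors the method in the text.
Employee = Struct.new(:name, :stepped_forward) do
  def steppedForward?
    stepped_forward
  end
end

employees = [Employee.new("Ann", false), Employee.new("Bob", false)]

# Hypothetical fallback: picks the first employee and counts its own calls.
fallback_calls = 0
pick_victim = lambda do
  fallback_calls += 1
  employees.first
end

# Nobody stepped forward, so find calls the lambda and returns its result.
volunteer = employees.find(pick_victim) { |e| e.steppedForward? }

# Once someone matches, the lambda is not called at all.
employees << Employee.new("Cy", true)
second_volunteer = employees.find(pick_victim) { |e| e.steppedForward? }

puts volunteer.name         # Ann
puts second_volunteer.name  # Cy
puts fallback_calls         # 1
```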

The last closure method I'll mention is one that people often find hard to understand: inject. Inject is good when you want a cumulative result on a collection. Let's say we want the total of all our employees' salaries. The traditional way would be like this:

  total = 0
  for e in employees
    total += e.salary
  end

Inject does it like this.

total = employees.inject(0) {|result, e| result + e.salary}

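At each element in the collection, inject passes the running result and the element to the block, then uses the block's return value as the result for the next element. The argument to inject (0 here) is the initial result, and the result of the final execution gets returned from inject.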
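Since inject is the one people find hard, it may help to watch it run. This sketch uses a made-up Employee struct and salaries, and records what the block receives on each call.

```ruby
# Hypothetical Employee type, for illustration only.
Employee = Struct.new(:name, :salary)

employees = [
  Employee.new("Ann", 100),
  Employee.new("Bob", 80),
  Employee.new("Cy",  120),
]

steps = []
total = employees.inject(0) do |result, e|
  steps << [result, e.salary]   # record the running result and this salary
  result + e.salary             # becomes `result` on the next call
end

puts steps.inspect  # [[0, 100], [100, 80], [180, 120]]
puts total          # 300
```

The first call sees the initial value 0, each later call sees the previous block's return value, and the last return value is the answer.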

An important point about collection closure methods is how they can be composed easily to create complex expressions. Imagine you want to find the oldest programmer in those of a product's teams that are based in a given country.

aProduct.teams.
    select {|t| t.country == aCountry}.
    map{|t| t.members}.
    flatten.
    select{|e| e.isProgrammer}.
    max{|a,b| a.age <=> b.age}

Now there's a serious question here about a method that complex. (And I must admit I used to be wary of chaining like this.) But it's a good illustration of how useful it is to string these methods together. (And in a language that supports the right kind of concurrency, it has important implications for multi-processor performance.)
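If the long chain makes you nervous, one option is to name the intermediate results. The sketch below does that with a made-up data model (the Employee, Team and Product structs and the sample data are assumptions, not anything from a real system), running the same steps as the chain above.

```ruby
# Hypothetical data model, for illustration only.
Employee = Struct.new(:name, :age, :isProgrammer)
Team     = Struct.new(:country, :members)
Product  = Struct.new(:teams)

aCountry = "UK"
aProduct = Product.new([
  Team.new("UK", [Employee.new("Ann", 41, true), Employee.new("Bob", 55, false)]),
  Team.new("US", [Employee.new("Cy", 60, true)]),
  Team.new("UK", [Employee.new("Dee", 47, true)]),
])

# Same steps as the chained expression, one name per stage.
teams_in_country = aProduct.teams.select { |t| t.country == aCountry }
members          = teams_in_country.map { |t| t.members }.flatten
programmers      = members.select { |e| e.isProgrammer }
oldest           = programmers.max { |a, b| a.age <=> b.age }

puts oldest.name  # Dee
```

Cy is older but works in a US team, and Bob is older but isn't a programmer, so Dee is the answer.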

My purpose here isn't to say how wonderful Ruby (and Smalltalk, Lisp etc) are because they have these methods. My point is that the combination of closures and collections leads to some very nice things. Languages that don't have closures really miss out on these things. If you do get the chance to program in a closureable language, get used to using these kinds of methods on your collections. I find they make a big difference.

(Thanks to Masanori Kado, Rik Hemsley, Christian Neukirchen and Stanislav Karchebny for helping me fix a couple of errors with the first posting of this article)



© Copyright Martin Fowler, all rights reserved