| EAA-dev Home |

WORK-IN-PROGRESS: - this material is still under development

Literal Collection Expression

Form language expressions using literal collection syntax

Forming a language involves composing expressions into larger structures. Many languages provide convenient mechanism to express collections, such as lists and maps, with a concise literal syntax. Literal Collection Expression uses this mechanism to express language constructs.

[TBD: I intend to split this into separate patterns for Literal List and Lieral Map.]

How it Works

Much of computer programming involves manipulating collections of things. As a result almost every computer language provides some data structure for representing a collection. The most common historical collection is the array, which ends up being a particularly awkward structure to use. As object-oriented systems have grown in popularity, we've seen libraries emerge which handle a better range of collections. Most OO systems now provide lists, and maps (also called dictionaries, hashes, associative arrays). Although the ease of manipulating these structures varies greatly with the platform, they allow you to build handy structures quite easily

Scripting languages, in particular, provide not just handy libraries for collections, but also special syntax for using them. If you have a syntax to easily express lists and maps, you can use this syntax as a tool for a DSL.

Lists are, unsurprisingly, particularly good for multiple items of the same thing (parent ::= child*). Maps work well for an unordered argument list, particularly when there are a lot of optional arguments.

You can nest lists to create all sorts of structures, this is what Lispers have been doing for a long time. I don't think I would do this outside of lisp, however. A list of function calls is usually a better fit with more mainstream languages.

Although many languages don't provide a good literal collection syntax, you can often get a literal list by using a variable argument function call. Similarly literal maps can be implemented by named parameters - although these are much rarer.

When to use it

There are two forms of Literal Collection Expression, lists and maps and they differ in when best to use them.

Lists work well when you have multiple items nested inside another element, usually a function call. A logical grammar such as (parent ::= child*). In a statically typed language they all need to be of the same type (if only supertype), but in a dynamically typed language you may mix completely different things. If you do nest a list inside a function call, usually the best way to do it is with a varargs argument.

Maps work best when you have multiple optional arguments that can appear in any order. Remember that for this to work each argument can only appear once. If you want multiple values for an argument you'll need to nest a list into the value of the map's element. Method Chaining is a good alternative approach for this situation. If you need to fix the order of the arguments, then Nested Function or Method Chaining are usually better choices.

Literal Collection Expression doesn't work very well at the top level of your language. The various statemments of a DSL are usually better handled with Function Sequence, or possibly Nested Function.

You can only use Literal Collection Expression if your language supports it, sadly many popular languages don't.

The contra-arguments to Literal Collection Expression that I've just gone through don't hold for lisp, which is designed with literal lists in mind, so you should use them much more widely.

Example: The computer configuration using lists and maps. (Ruby)

Following its scripting language tradition, Ruby provides very good literal syntax for lists and maps. Here's an example of using this syntax for the computer configuration example.

computer(processor(:cores => 2, :type => :i386),
         disk(:size => 150),
         disk(:size => 75, :speed => 7200, :interface => :sata))

I'm not using entirely Literal Collection Expression here, as usual it's good to mix Literal Collection Expression with other techniques. (I will explore using Literal Collection Expression exclusively later on.) Here I have three functions: computer, processor, and disk. Each of these functions takes a collection as an argument: computer takes a list, the others take a map. I'm using Object Scoping with a builder class that implements the functions. Since this is Ruby I can use instance_eval to evaluate the DSL script in the context of an instance of the builder, which saves me from having to make a subclass.

I'll start with processor.

class MixedLiteralBuilder...
  def processor map
    check_keys map, [:cores, :type]
    return Processor.new(map[:cores], map[:type])
  end

Making use of a map is simple, I just pick the necessary items out of the map using the keys. The danger with using a map like this is that it's easy for the caller to introduce an incorrect key by accident, so it's worth a little checking here.

class MixedLiteralBuilder...
  def check_keys map, validKeys
    bad_keys = map.keys - validKeys
    raise IncorrectKeyException.new(bad_keys) unless bad_keys.empty?
  end
class IncorrectKeyException < Exception
  def initialize bad_keys
    @bad_keys = bad_keys
  end
  def to_s
    "unrecognized keys: #{@bad_keys.join(', ')}"
  end
end

I use the same approach for the disk

class MixedLiteralBuilder...
  def disk map
    check_keys map, [:size, :speed, :interface]
    return Disk.new(map[:size], map[:speed], map[:interface])
  end

Because everything is a simple value, I can create the domain object and return it within each Nested Function. The computer function can create the computer object, using a vararg for the multiple disks.

class MixedLiteralBuilder...
  def computer proc, *disks
    @result = Computer.new(proc, *disks)
  end

(Using a "*" in the argument list helps in working with variable arguments in ruby. In the argument list *disks indicates a vararg. I can then refer to all the disks passed in as an array named "disks". If I call another function with "*disks" the elements of the disks array are passed in as separate arguments.)

To process the DSL script, I get the builder to evaluate the script using instance_eval.

class MixedLiteralBuilder...
  def load aStream
    instance_eval aStream
  end

Example: Evolving to Greenspun Form (Ruby)

As with other elements in an internal DSL, a good DSL uses several different techniques together. So in the previous example I used Nested Function as well as Literal Collection Expression. Sometimes, however, it's interesting to push a single technique as far as it can go just to get a sense of how it turns out. It's quite possible to write even a fairly complex DSL expression using only Literal Collection Expression, let's see what that might look like.

[:computer, 
 [:processor, {:cores => 2, :type => :i386}],
 [:disk, {:size => 150}],
 [:disk, {:size => 75, :speed => 7200, :interface => :sata}]
]

In this version I've replaced all the function calls with literal lists, where the first element in the list is the name of the item to be processed and the rest of the list contains the arguments. I can process this array by first evaluating the ruby code and then passing it to a method that interprets the computer expression.

class LiteralOnlyBuilder...
  def load aStream
    @result = handle_computer(eval(aStream))
  end

I handle each expression by checking the first element of the array and then processing the other elements.

class LiteralOnlyBuilder...
  def handle_computer anArray
    check_head :computer, anArray
    processor = handle_processor(anArray[1])
    disks = anArray[2..-1].map{|e| handle_disk e}
    return Computer.new(processor, *disks)
  end
  def check_head expected, array
    raise "error: expected #{expected}, got #{array.first}" unless 
      array.first == expected
  end

As you can see this is essentially following the form of a Recursive Descent Parser. I say that the computer clause has a processor and multiple disks and I call the methods to process them returning a newly created computer.

Handling a processor is straightforward, just unpicking the arguments out of the provided map.

class LiteralOnlyBuilder...
  def handle_processor anArray
    check_head :processor, anArray
    check_arg_keys anArray, [:cores, :type]
    args = anArray[1]
    return Processor.new(args[:cores], args[:type])
  end
  def check_arg_keys array, validKeys
    bad_keys = array[1].keys - validKeys
    raise IncorrectKeyException.new(bad_keys) unless bad_keys.empty?    
  end

Handling the disks works the same way

class LiteralOnlyBuilder...
  def handle_disk anArray
    check_head :disk, anArray
    check_arg_keys anArray, [:size, :speed, :interface]
    args = anArray[1]
    return Disk.new(args[:size], args[:speed], args[:interface])
  end    

One thing to notice about this approach is that it gives me complete control over the order of evaluation of the elements of the language. I choose here to evaluate the processor and disk expressions before creating the computer object, but I can do things pretty much any way I wish. In many ways the DSL script is like an external DSL encoded in internal literal collection syntax instead of a string.

This form mixes lists and maps, but it's also possible to do this using only lists, which might appropriately be called Greenspun Form.

[:computer, 
 [:processor, 
  [:cores, 2,], 
  [:type, :i386]],
 [:disk, 
  [:size, 150]],
 [:disk, 
  [:size, 75], 
  [:speed, 7200], 
  [:interface, :sata]]]

All I've really done here is replace each map with a list of two element sub-lists, where each sub-list is a key and value.

The main loading code is the same, breaking down handling the symbolic expression (sexp) for the computer into a processor and several disks.

class ListOnlyBuilder...
  def load aStream
    @result = handle_computer(eval(aStream))
  end
  def handle_computer sexp
    check_head :computer, sexp
    processor = handle_processor(sexp[1])
    disks = sexp[2..-1].map{|e| handle_disk e}
    return Computer.new(processor, *disks)
  end

The difference comes with the sub-clauses, which need some extra code that's the equivalent to looking up things in a map.

class ListOnlyBuilder...
  def handle_processor sexp
    check_head :processor, sexp
    check_arg_keys sexp, [:cores, :type]
    return Processor.new(select_arg(:cores, sexp),
                         select_arg(:type, sexp))
  end
  def handle_disk sexp
    check_head :disk, sexp
    check_arg_keys sexp, [:size, :speed, :interface]
    return Disk.new(select_arg(:size, sexp),
                    select_arg(:speed, sexp), 
                    select_arg(:interface, sexp))
  end
  def select_arg key, list
    assoc = list.tail.assoc(key)
    return assoc ? assoc[1] : nil    
  end

Using only lists does result in a more regular DSL script, but using a list of pairs as a map doesn't fit in so well with ruby's style. Either case isn't as good as the earlier example that mixed function calls with Literal Collection Expression.

Yet this approach of nested lists does lead us to another place where this style is natural. As many readers will have long ago realized, this is essentially what lisp looks like. In lisp the DSL script might look like this.

(computer 
 (processor 
  (cores 2) 
  (type i386))
 (disk 
  (size 150))
 (disk 
  (size 75) 
  (speed 7200) 
  (interface sata)))

The list structure is a lot clearer in lisp. Bare words are symbols by default, and since expressions are either atoms or lists there's no need for commas.

Significant Revisions

06 Mar 08: First Draft