WORK-IN-PROGRESS: - this material is still under development
Form language expressions using literal collection syntax
Forming a language involves composing expressions into larger structures. Many languages provide convenient mechanism to express collections, such as lists and maps, with a concise literal syntax. Literal Collection Expression uses this mechanism to express language constructs.
[TBD: I intend to split this into separate patterns for Literal List and Lieral Map.]Much of computer programming involves manipulating collections of things. As a result almost every computer language provides some data structure for representing a collection. The most common historical collection is the array, which ends up being a particularly awkward structure to use. As object-oriented systems have grown in popularity, we've seen libraries emerge which handle a better range of collections. Most OO systems now provide lists, and maps (also called dictionaries, hashes, associative arrays). Although the ease of manipulating these structures varies greatly with the platform, they allow you to build handy structures quite easily
Scripting languages, in particular, provide not just handy libraries for collections, but also special syntax for using them. If you have a syntax to easily express lists and maps, you can use this syntax as a tool for a DSL.
Lists are, unsurprisingly, particularly good for multiple items
of the same thing (parent
::= child*). Maps work well for an unordered argument list,
particularly when there are a lot of optional arguments.
You can nest lists to create all sorts of structures, this is what Lispers have been doing for a long time. I don't think I would do this outside of lisp, however. A list of function calls is usually a better fit with more mainstream languages.
Although many languages don't provide a good literal collection syntax, you can often get a literal list by using a variable argument function call. Similarly literal maps can be implemented by named parameters - although these are much rarer.
There are two forms of Literal Collection Expression, lists and maps and they differ in when best to use them.
Lists work well when you have multiple items nested inside another
element, usually a function call. A logical grammar such as (parent
::= child*). In a statically typed language they all need to
be of the same type (if only supertype), but in a dynamically typed
language you may mix completely different things. If you do nest a
list inside a function call, usually the best way to do it is with a
varargs argument.
Maps work best when you have multiple optional arguments that can appear in any order. Remember that for this to work each argument can only appear once. If you want multiple values for an argument you'll need to nest a list into the value of the map's element. Method Chaining is a good alternative approach for this situation. If you need to fix the order of the arguments, then Nested Function or Method Chaining are usually better choices.
Literal Collection Expression doesn't work very well at the top level of your language. The various statemments of a DSL are usually better handled with Function Sequence, or possibly Nested Function.
You can only use Literal Collection Expression if your language supports it, sadly many popular languages don't.
The contra-arguments to Literal Collection Expression that I've just gone through don't hold for lisp, which is designed with literal lists in mind, so you should use them much more widely.
Following its scripting language tradition, Ruby provides very good literal syntax for lists and maps. Here's an example of using this syntax for the computer configuration example.
computer(processor(:cores => 2, :type => :i386),
disk(:size => 150),
disk(:size => 75, :speed => 7200, :interface => :sata))
I'm not using entirely Literal Collection Expression here, as usual it's good to mix
Literal Collection Expression with other techniques. (I will explore using Literal Collection Expression
exclusively later on.) Here I have three functions:
computer, processor, and
disk. Each of these functions takes a collection as an
argument: computer takes a list, the others take a
map. I'm using Object Scoping with a builder class
that implements the functions. Since this is Ruby I can use
instance_eval to evaluate the DSL script in the context
of an instance of the builder, which saves me from having to make a
subclass.
I'll start with processor.
class MixedLiteralBuilder...
def processor map
check_keys map, [:cores, :type]
return Processor.new(map[:cores], map[:type])
end
Making use of a map is simple, I just pick the necessary items out of the map using the keys. The danger with using a map like this is that it's easy for the caller to introduce an incorrect key by accident, so it's worth a little checking here.
class MixedLiteralBuilder...
def check_keys map, validKeys
bad_keys = map.keys - validKeys
raise IncorrectKeyException.new(bad_keys) unless bad_keys.empty?
end
class IncorrectKeyException < Exception
def initialize bad_keys
@bad_keys = bad_keys
end
def to_s
"unrecognized keys: #{@bad_keys.join(', ')}"
end
end
I use the same approach for the disk
class MixedLiteralBuilder...
def disk map
check_keys map, [:size, :speed, :interface]
return Disk.new(map[:size], map[:speed], map[:interface])
end
Because everything is a simple value, I can create the domain object and return it within each Nested Function. The computer function can create the computer object, using a vararg for the multiple disks.
class MixedLiteralBuilder...
def computer proc, *disks
@result = Computer.new(proc, *disks)
end
(Using a "*" in the argument list helps in working with variable arguments in ruby. In the argument list *disks indicates a vararg. I can then refer to all the disks passed in as an array named "disks". If I call another function with "*disks" the elements of the disks array are passed in as separate arguments.)
To process the DSL script, I get the builder to evaluate the script
using instance_eval.
class MixedLiteralBuilder...
def load aStream
instance_eval aStream
end
As with other elements in an internal DSL, a good DSL uses several different techniques together. So in the previous example I used Nested Function as well as Literal Collection Expression. Sometimes, however, it's interesting to push a single technique as far as it can go just to get a sense of how it turns out. It's quite possible to write even a fairly complex DSL expression using only Literal Collection Expression, let's see what that might look like.
[:computer,
[:processor, {:cores => 2, :type => :i386}],
[:disk, {:size => 150}],
[:disk, {:size => 75, :speed => 7200, :interface => :sata}]
]
In this version I've replaced all the function calls with literal lists, where the first element in the list is the name of the item to be processed and the rest of the list contains the arguments. I can process this array by first evaluating the ruby code and then passing it to a method that interprets the computer expression.
class LiteralOnlyBuilder...
def load aStream
@result = handle_computer(eval(aStream))
end
I handle each expression by checking the first element of the array and then processing the other elements.
class LiteralOnlyBuilder...
def handle_computer anArray
check_head :computer, anArray
processor = handle_processor(anArray[1])
disks = anArray[2..-1].map{|e| handle_disk e}
return Computer.new(processor, *disks)
end
def check_head expected, array
raise "error: expected #{expected}, got #{array.first}" unless
array.first == expected
end
As you can see this is essentially following the form of a Recursive Descent Parser. I say that the computer clause has a processor and multiple disks and I call the methods to process them returning a newly created computer.
Handling a processor is straightforward, just unpicking the arguments out of the provided map.
class LiteralOnlyBuilder...
def handle_processor anArray
check_head :processor, anArray
check_arg_keys anArray, [:cores, :type]
args = anArray[1]
return Processor.new(args[:cores], args[:type])
end
def check_arg_keys array, validKeys
bad_keys = array[1].keys - validKeys
raise IncorrectKeyException.new(bad_keys) unless bad_keys.empty?
end
Handling the disks works the same way
class LiteralOnlyBuilder...
def handle_disk anArray
check_head :disk, anArray
check_arg_keys anArray, [:size, :speed, :interface]
args = anArray[1]
return Disk.new(args[:size], args[:speed], args[:interface])
end
One thing to notice about this approach is that it gives me complete control over the order of evaluation of the elements of the language. I choose here to evaluate the processor and disk expressions before creating the computer object, but I can do things pretty much any way I wish. In many ways the DSL script is like an external DSL encoded in internal literal collection syntax instead of a string.
This form mixes lists and maps, but it's also possible to do this using only lists, which might appropriately be called Greenspun Form.
[:computer, [:processor, [:cores, 2,], [:type, :i386]], [:disk, [:size, 150]], [:disk, [:size, 75], [:speed, 7200], [:interface, :sata]]]
All I've really done here is replace each map with a list of two element sub-lists, where each sub-list is a key and value.
The main loading code is the same, breaking down handling the symbolic expression (sexp) for the computer into a processor and several disks.
class ListOnlyBuilder...
def load aStream
@result = handle_computer(eval(aStream))
end
def handle_computer sexp
check_head :computer, sexp
processor = handle_processor(sexp[1])
disks = sexp[2..-1].map{|e| handle_disk e}
return Computer.new(processor, *disks)
end
The difference comes with the sub-clauses, which need some extra code that's the equivalent to looking up things in a map.
class ListOnlyBuilder...
def handle_processor sexp
check_head :processor, sexp
check_arg_keys sexp, [:cores, :type]
return Processor.new(select_arg(:cores, sexp),
select_arg(:type, sexp))
end
def handle_disk sexp
check_head :disk, sexp
check_arg_keys sexp, [:size, :speed, :interface]
return Disk.new(select_arg(:size, sexp),
select_arg(:speed, sexp),
select_arg(:interface, sexp))
end
def select_arg key, list
assoc = list.tail.assoc(key)
return assoc ? assoc[1] : nil
end
Using only lists does result in a more regular DSL script, but using a list of pairs as a map doesn't fit in so well with ruby's style. Either case isn't as good as the earlier example that mixed function calls with Literal Collection Expression.
Yet this approach of nested lists does lead us to another place where this style is natural. As many readers will have long ago realized, this is essentially what lisp looks like. In lisp the DSL script might look like this.
(computer (processor (cores 2) (type i386)) (disk (size 150)) (disk (size 75) (speed 7200) (interface sata)))
The list structure is a lot clearer in lisp. Bare words are symbols by default, and since expressions are either atoms or lists there's no need for commas.