WORK-IN-PROGRESS: - this material is still under development
Compose functions by nesting function calls as arguments of other calls.
computer(
processor(
cores(2),
Processor.Type.i386
),
disk(
size(150)
),
disk(
size(75),
speed(7200),
Disk.Interface.SATA
)
);
The most notable property of Nested Function is the way it affects the
evaluation order of its arguments. Function Sequence and Method Chaining both
evaluate the functions in a left-to-right sequence. Nested Function evaluates
the arguments of a function before the enclosing function itself. I
find this most memorable with the "Old Macdonald" example: to sing the
chorus you type o(i(e(i(e())))). This evaluation order has
an impact on both how to use Nested Function and when to choose it compared to
alternatives.
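The evaluation order is easy to see if each function records its name when it runs. This is a minimal Java sketch; the class OldMacdonald and its sung list are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: each function records its name at the moment it runs.
class OldMacdonald {
    static final List<String> sung = new ArrayList<>();

    // The innermost call takes no argument; the others receive the nested call's result.
    static String e() { sung.add("e"); return "e"; }
    static String e(String inner) { sung.add("e"); return "e"; }
    static String i(String inner) { sung.add("i"); return "i"; }
    static String o(String inner) { sung.add("o"); return "o"; }

    public static void main(String[] args) {
        // Written outside-in as o, i, e, i, e but evaluated inside-out.
        o(i(e(i(e()))));
        System.out.println(String.join("-", sung)); // prints e-i-e-i-o
    }
}
```

The script reads "o i e i e" from left to right, yet the innermost call runs first, so the recorded order is the reverse of the written one.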
Evaluating the enclosing function last can be very handy in that it provides a built-in context to work with the arguments. Consider defining a computer processor configuration.
processor( cores(2), speed(2100), type(i386) )
The nice thing here is that the argument functions can return fully formed values which the processor function can then assemble into its return value. Since the processor function evaluates last, we don't need to worry about the stopping problem of Method Chaining, nor do we need to have the Context Variable that we need for Function Sequence.
With mandatory elements in the grammar, along the lines of
parent ::= first second, Nested Function works particularly
well. A parent function can define exactly the arguments required in
the child functions and, with a statically typed language, can also
define the return types, which enables IDE completion.
One issue with function arguments is how to label them so as to
make them readable. Consider indicating the size and speed of a
disk. The natural programming response is disk(150, 7200)
but this isn't terribly readable as there's no indication what the
numbers mean unless you have a language with keyword arguments. A way
to deal with this is to use a wrapping function that does nothing
other than provide a name: disk(size(150),
speed(7200)). In the simplest form of this the wrapping
function just returns the argument value - as a result it's pure
syntactic sugar. It also means that there's no enforcement of the
meaning of these functions - a call to disk(speed(7200),
size(150)) could easily result in a very slow disk. You can
avoid this by making the nested functions return intermediate data
such as a builder or token - although that is more effort to set
up.
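To make the hazard concrete, here is a small Java sketch of the pure-sugar form. The classes SugarDisk and SugarBuilder are hypothetical stand-ins, not part of any framework described here.

```java
// Hypothetical framework class, for illustration only.
class SugarDisk {
    final int size;
    final int speed;
    SugarDisk(int size, int speed) { this.size = size; this.speed = speed; }
}

class SugarBuilder {
    // Pure syntactic sugar: each wrapper just hands back its argument.
    static int size(int value)  { return value; }
    static int speed(int value) { return value; }

    // The parent still identifies its arguments by position alone.
    static SugarDisk disk(int size, int speed) { return new SugarDisk(size, speed); }

    public static void main(String[] args) {
        SugarDisk intended = disk(size(150), speed(7200));
        SugarDisk swapped  = disk(speed(7200), size(150)); // compiles without complaint
        System.out.println(intended.size + " at " + intended.speed); // prints 150 at 7200
        System.out.println(swapped.size + " at " + swapped.speed);   // prints 7200 at 150
    }
}
```

Because both wrappers return a plain int, the compiler cannot tell the two calls apart, and the swapped script quietly builds a disk of size 7200 with a speed of 150.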
Optional arguments can also present problems. If the base language supports default arguments for functions, you can use these for the optional case. If you don't have this, one approach is to define different functions for each combination of the optional arguments. If you only have a couple of cases this is tedious but reasonable. As the number of optional arguments increases, so does the tediousness (but not the reasonableness). One way out of this problem is to use intermediate data again - tokens can be a particularly effective choice.
If your language supports it, a Literal Map is often a good way out of these quandaries. In this case you get just the right data structure to deal with the issue. The only problem is that C-like languages don't usually support Literal Map.
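Java has no map literal, but since Java 9 the Map.of factory gets reasonably close. The sketch below is only an approximation of a Literal Map under that assumption; MapDisk, MapBuilder, the string keys, and the UNKNOWN sentinel are all hypothetical.

```java
import java.util.Map;

// Hypothetical framework class, for illustration only.
class MapDisk {
    final int size;
    final int speed;
    MapDisk(int size, int speed) { this.size = size; this.speed = speed; }
}

class MapBuilder {
    static final int UNKNOWN = -1; // hypothetical sentinel for an omitted property

    // The map carries the argument names, so order and optionality stop mattering.
    static MapDisk disk(Map<String, Integer> args) {
        for (String key : args.keySet()) {
            if (!key.equals("size") && !key.equals("speed"))
                throw new IllegalArgumentException("unknown disk property: " + key);
        }
        return new MapDisk(args.getOrDefault("size", UNKNOWN),
                           args.getOrDefault("speed", UNKNOWN));
    }

    public static void main(String[] args) {
        MapDisk full = disk(Map.of("speed", 7200, "size", 150));
        MapDisk partial = disk(Map.of("size", 75));
        System.out.println(full.size + " at " + full.speed);       // prints 150 at 7200
        System.out.println(partial.size + " at " + partial.speed); // prints 75 at -1
    }
}
```

The price is that the keys are strings, so a misspelled property is only caught at run time rather than by the compiler.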
With multiple arguments of the same thing, a varargs parameter is the best choice if the host language supports it. You can also think of this as a nested Literal List. Multiple arguments of different kinds end up being like optional arguments, with the same complications.
The worst case of this is a grammar like parent::= (this |
that)*. The issue here is that, unless you have keyword
arguments, the only way to identify the arguments is through their
position and type. This can make picking out which argument is which
messy - and downright impossible if this and
that have the same types. Once this happens you are
forced into either returning intermediate results, or using a
Context Variable. Using a Context Variable is particularly difficult here since the
parent function isn't evaluated till the end, forcing you to use the
broader context of the language to properly set up the Context Variable.
In order to keep the DSL readable, you usually want Nested Functions to be
bare function calls. This implies you either need to make them global
functions or use Object Scoping. Since global
functions are problematic, I usually look to use Object Scoping if I can. However, global functions are often
much less problematic in Nested Function, because the biggest problem with
global functions is when they come with global parsing state. A global
function that just returns a value, or a static constant like
DayOfWeek.MONDAY, is often a good choice.
One of the great strengths, and weaknesses, of Nested Function is the order of evaluation. With Nested Function the arguments are evaluated before the parent function (unless you use closures for arguments). This is very useful for building up a hierarchy of values because you can have the arguments create fully formed framework objects which can be assembled by the parent function. This can avoid much of the mucking about with replacements and intermediate data that you get with Function Sequence and Method Chaining.
Conversely this evaluation order causes problems in a sequence of
commands leading to the Old Macdonald problem:
o(i(e(i(e())))). So for a sequence that you want to read
left to right, Function Sequence or Method Chaining are usually a better bet. For precise control
of when to evaluate multiple arguments, use Nested Closure.
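As a rough illustration of how closures change the picture, the Java sketch below passes each step as a Runnable so that the parent decides when, and in what order, the steps run. DeferredSong, sequence, and sing are hypothetical names, and this is only a pointer towards Nested Closure rather than a full treatment of it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: arguments are closures, so evaluation is deferred.
class DeferredSong {
    static final List<String> sung = new ArrayList<>();

    // Building the closure records nothing; only running it does.
    static Runnable sing(String letter) {
        return () -> sung.add(letter);
    }

    // The parent function is in charge: here it runs the steps left to right.
    static void sequence(Runnable... steps) {
        for (Runnable step : steps) step.run();
    }

    public static void main(String[] args) {
        sequence(sing("e"), sing("i"), sing("e"), sing("i"), sing("o"));
        System.out.println(String.join("-", sung)); // prints e-i-e-i-o
    }
}
```

The script now reads in the same order that it executes, at the cost of wrapping every step in a closure.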
Nested Function also often struggles with optional arguments and multiple varied arguments. Nested Function very much expects to say what you want and in the precise order, so if you need greater flexibility you'll need to look to Method Chaining or a Literal Map. A Literal Map is often a good choice as it allows you to get the arguments sorted out before calling the parent while giving you the flexibility of ordering and optionality of the arguments, particularly with a hash argument.
Another disadvantage of Nested Function is the punctuation, which usually relies on matching brackets and putting commas in the right place. At its worst this can look like a disfigured lisp, with all the parentheses and added warts. This is less of an issue for DSLs aimed at programmers, who get more used to these warts.
Name clashes are less of a problem here than with Function Sequence, since the parent function provides context to interpret the nested function call. As a result you can happily use "speed" for processor speed and disk speed, and use the same function as long as the types are compatible.
Here's the script for the common running example of stating the configuration of a simple computer.
computer(
processor(
cores(2),
Processor.Type.i386
),
disk(
size(150)
),
disk(
size(75),
speed(7200),
Disk.Interface.SATA
)
);
For this case each clause in the script returns a framework object, so I can use the nested evaluation order to build up the entire expression without using Context Variables. I'll start from the bottom, looking at the processor clause.
class Builder...
static Processor processor(int cores, Processor.Type type) {
return new Processor(cores, type);
}
static int cores(int value) {
return value;
}
I've defined the builder functions as static functions on a builder
class. By using Java's static import feature I can use bare function
calls to invoke the functions. (Is it only me who finds it confusing
that we call them "static imports" but have to declare them with
import static?) I also use static imports to bring in
enum types defined by the framework which I can easily use directly
here. In case you skipped dessert before reading this I've included a
pure sugar (sucratic?) cores function for readability.
The disk clause has optional arguments. Since there are only a couple, I'll nap for a while as I write out the combinations of functions.
class Builder...
static Disk disk(int size, int speed, Disk.Interface iface) {
return new Disk(size, speed, iface);
}
static Disk disk(int size) {
return disk(size, Disk.UNKNOWN_SIZE, null);
}
static Disk disk(int size, int speed) {
return disk(size, speed, null);
}
static Disk disk(int size, Disk.Interface iface) {
return disk(size, Disk.UNKNOWN_SIZE, iface);
}
For the top level computer clause, I use a varargs parameter to handle the multiple disks.
class Builder...
static Computer computer(Processor p, Disk... d) {
return new Computer(p, d);
}
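Putting the fragments together, the whole script can be run as below. This is a sketch under assumptions: the framework classes Processor, Disk, and Computer are not shown in the text, so the versions here are hypothetical stand-ins, as are the ComputerScript class, the extra enum values, and the sentinel value. The script sits inside the builder class so the bare calls resolve without static imports.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the framework classes.
class Processor {
    enum Type { i386, amd64 }
    final int cores;
    final Type type;
    Processor(int cores, Type type) { this.cores = cores; this.type = type; }
}

class Disk {
    enum Interface { SATA, IDE }
    static final int UNKNOWN_SIZE = -1; // sentinel value is an assumption
    final int size;
    final int speed;
    final Interface iface;
    Disk(int size, int speed, Interface iface) { this.size = size; this.speed = speed; this.iface = iface; }
}

class Computer {
    final Processor processor;
    final List<Disk> disks;
    Computer(Processor processor, Disk... disks) {
        this.processor = processor;
        this.disks = Arrays.asList(disks);
    }
}

class ComputerScript {
    // Sugar functions that only label their argument.
    static int cores(int value) { return value; }
    static int size(int value)  { return value; }
    static int speed(int value) { return value; }

    static Processor processor(int cores, Processor.Type type) { return new Processor(cores, type); }
    static Disk disk(int size, int speed, Disk.Interface iface) { return new Disk(size, speed, iface); }
    static Disk disk(int size) { return disk(size, Disk.UNKNOWN_SIZE, null); }
    static Computer computer(Processor p, Disk... d) { return new Computer(p, d); }

    // Each nested call returns a finished object; computer(...) runs last and assembles them.
    static Computer build() {
        return computer(
            processor(cores(2), Processor.Type.i386),
            disk(size(150)),
            disk(size(75), speed(7200), Disk.Interface.SATA)
        );
    }

    public static void main(String[] args) {
        Computer c = build();
        System.out.println(c.processor.cores + " cores, " + c.disks.size() + " disks"); // prints 2 cores, 2 disks
    }
}
```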
One of the trickier areas to use Nested Function is where you have multiple arguments of different kinds. Consider a language for defining properties of an onscreen box.
box(
topBorder(2),
bottomBorder(2),
leftMargin(3),
transparent
);
box(
leftMargin(2),
rightMargin(5)
);
In this situation we can have any number of a wide combination of properties to set. There's no strong reason to force an order in declaring the properties, so the usual style of argument identification in C# (position) doesn't work too well. For this example I'll explore using tokens to identify the arguments to compose them into the structure.
Here's a look at the target framework object.
class Box {
public bool IsTransparent = false;
public int[] Borders = { 1, 1, 1, 1 }; //TRouBLe - top right bottom left
public int[] Margins = { 0, 0, 0, 0 }; //TRouBLe - top right bottom left
The various contained functions all return a token data type, which looks like this
class BoxToken {
public enum Types { TopBorder, BottomBorder, LeftMargin, RightMargin, Transparent }
public readonly Types Type;
public readonly Object Value;
public BoxToken(Types type, Object value) {
Type = type;
Value = value;
}
I'm using Object Scoping and have defined the clauses of the DSL as functions on the builder supertype.
class Builder...
protected BoxToken topBorder(int arg) {
return new BoxToken(BoxToken.Types.TopBorder, arg);
}
protected BoxToken transparent {
get {
return new BoxToken(BoxToken.Types.Transparent, true);
}
}
I'm only showing a couple of them, but I'm sure you can deduce from these what the rest look like.
The parent function now just runs through the argument results and assembles a box.
class Builder...
protected void box(params BoxToken[] args) {
Box newBox = new Box();
foreach (BoxToken t in args) updateAttribute(newBox, t);
boxes.Add(newBox);
}
List<Box> boxes = new List<Box>();
private void updateAttribute(Box box, BoxToken token) {
switch (token.Type) {
case BoxToken.Types.TopBorder:
box.Borders[0] = (int)token.Value;
break;
case BoxToken.Types.BottomBorder:
box.Borders[2] = (int)token.Value;
break;
case BoxToken.Types.LeftMargin:
box.Margins[3] = (int)token.Value;
break;
case BoxToken.Types.RightMargin:
box.Margins[1] = (int)token.Value;
break;
case BoxToken.Types.Transparent:
box.IsTransparent = (bool)token.Value;
break;
default:
throw new InvalidOperationException("Unreachable");
}
}
Most languages differentiate between different function arguments
by their position. So in the above example, we might set the size
and speed of a disk with a function like disk(150,
7200). That bare function isn't too readable, so in the above
example I wrapped the numbers with simple functions to get
disk(size(150), speed(7200)). In the earlier code
example the functions just return their argument, which aids
readability but doesn't prevent someone typing the erroneous
disk(speed(7200), size(150)).
Using simple tokens, like in the Box example, provides a mechanism for error checking. By returning a token of [size, 150] you can use the token type to check you have the right argument in the right position, or indeed make the arguments work in any order.
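Here is one way the run-time checking might look in Java. It is a sketch only: DiskToken, CheckedDisk, and TokenBuilder are hypothetical names, and the choice to reject duplicates and missing values is my assumption about what sensible checking would be.

```java
// Hypothetical token: a kind plus a value, such as [SIZE, 150].
class DiskToken {
    enum Kind { SIZE, SPEED }
    final Kind kind;
    final int value;
    DiskToken(Kind kind, int value) { this.kind = kind; this.value = value; }
}

// Hypothetical framework class.
class CheckedDisk {
    final int size;
    final int speed;
    CheckedDisk(int size, int speed) { this.size = size; this.speed = speed; }
}

class TokenBuilder {
    static DiskToken size(int value)  { return new DiskToken(DiskToken.Kind.SIZE, value); }
    static DiskToken speed(int value) { return new DiskToken(DiskToken.Kind.SPEED, value); }

    // The parent sorts the arguments out by token kind, so position no longer matters.
    static CheckedDisk disk(DiskToken... tokens) {
        Integer size = null;
        Integer speed = null;
        for (DiskToken t : tokens) {
            switch (t.kind) {
                case SIZE:
                    if (size != null) throw new IllegalArgumentException("size given twice");
                    size = t.value;
                    break;
                case SPEED:
                    if (speed != null) throw new IllegalArgumentException("speed given twice");
                    speed = t.value;
                    break;
            }
        }
        if (size == null || speed == null)
            throw new IllegalArgumentException("disk needs both size and speed");
        return new CheckedDisk(size, speed);
    }

    public static void main(String[] args) {
        CheckedDisk d = disk(speed(7200), size(150)); // swapped order is now harmless
        System.out.println(d.size + " at " + d.speed); // prints 150 at 7200
    }
}
```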
Checking is all very well, but in a statically typed language with a modern IDE you want to go further. You want code completion pop-ups to force you to put size before speed. By using subclasses you can pull this off.
The tokens I used above carried the token type as a property of the token. The alternative is to create a different subtype for each token; I can then use the subtype in the parent function definition.
Here's the short script I want to support.
disk(
size(150),
speed(7200)
);
Here's the target framework object
public class Disk {
private int size, speed;
public Disk(int size, int speed) {
this.size = size;
this.speed = speed;
}
public int getSize() {
return size;
}
public int getSpeed() {
return speed;
}
}
To handle size and speed I create a general integer token with subclasses for the two kinds of clause
public class IntegerToken {
private final int value;
public IntegerToken(int value) {
this.value = value;
}
public int getValue() {
return value;
}
}
public class SpeedToken extends IntegerToken {
public SpeedToken(int value) {
super(value);
}
}
public class SizeToken extends IntegerToken {
public SizeToken(int value) {
super(value);
}
}
I can then define static functions in a builder that define the right arguments.
class Builder...
public static Disk disk(SizeToken size, SpeedToken speed){
return new Disk(size.getValue(), speed.getValue());
}
public static SizeToken size (int arg) {
return new SizeToken(arg);
}
public static SpeedToken speed (int arg) {
return new SpeedToken(arg);
}
With these set up the IDE will only suggest the right functions in the right places and I'll see comforting red squigglies should I do any reckless typing.
I used to live in the South End of Boston. There was much to like about living in a downtown area of the city, close to restaurants and other ways to pass the time and spend my money. There were irritations, however, and one of them was street cleaning. On the first and third Monday of the month between April and October they would clean the streets near my apartment and I had to be sure I didn't leave my car there. Often I forgot and I got a ticket.
The rule for my street was that the cleaning occurred on the first and third Monday of the month between April and October. I could write a DSL expression for this:
Schedule.First(DayOfWeek.Monday)
.And(Schedule.Third(DayOfWeek.Monday))
.From(Month.April)
.Till(Month.October);
This example combines Method Chaining with Nested Function. Usually when I use Nested Function I prefer to combine it with Object Scoping, but in this case the functions that I'm nesting just return a value so I don't really feel a strong need to use Object Scoping.
Recurring events are a recurring event in software systems. You often want to schedule things on particular combinations of dates like that. The way I think of them these days is that they are a Specification of dates. We want code that can tell us if a given date is included on a schedule. We do this by defining a general specification interface, which we can make generic as specifications are useful in all sorts of situations.
internal interface Specification<T> {
bool Includes(T arg);
}
When building a specification model for a particular type, I like to identify small building blocks that I can combine together. One small building block is the notion of a particular period in a year, such as between April and October.
internal class PeriodInYear : Specification<DateTime>
{
private readonly int startMonth;
private readonly int endMonth;
public PeriodInYear(int startMonth, int endMonth) {
this.startMonth = startMonth;
this.endMonth = endMonth;
}
public bool Includes(DateTime arg) {
return arg.Month >= startMonth && arg.Month <= endMonth;
}
Another element is the notion of the first Monday in the month. This class is a little more tricky as I have to walk through sample dates in the month to see which one is the first.
[TBD: Move index check]
internal class DayInMonth : Specification<DateTime> {
private readonly int index;
private readonly DayOfWeek dayOfWeek;
public DayInMonth(int index, DayOfWeek dayOfWeek) {
this.index = index;
this.dayOfWeek = dayOfWeek;
}
public bool Includes(DateTime arg) {
if (index <= 0) throw new NotSupportedException("index must be positive");
int currentMatch = 0;
foreach (DateTime d in new MonthEnumerator(arg.Month, arg.Year)) {
if (d > arg) return false;
if (d.DayOfWeek == dayOfWeek) {
currentMatch++;
if (currentMatch == index) return (d == arg);
}
}
return false;
}
}
To walk through the days in a month, this specification makes use of a special enumerator. I set the enumerator with a particular month and year.
internal class MonthEnumerator : IEnumerator<DateTime>, IEnumerable<DateTime> {
private int year;
private Month month;
public MonthEnumerator(int month, int year) {
this.month = new Month(month);
this.year = year;
Reset();
}
It implements the IEnumerator methods.
class MonthEnumerator...
private DateTime current;
DateTime IEnumerator<DateTime>.Current { get { return current; } }
public object Current { get { return current; } }
public void Reset() {
current = new DateTime(year, month.Number, 1).AddDays(-1);
}
public void Dispose() {}
public bool MoveNext() {
current = current.AddDays(1);
return month.Includes(current);
}
And also implements IEnumerable to allow it to be used in a foreach loop.
class MonthEnumerator...
IEnumerator<DateTime> IEnumerable<DateTime>.GetEnumerator() {
return this;
}
public IEnumerator GetEnumerator() {
return this;
}
Also taking part is a very simple Month class, which also acts as a specification.
class Month...
private readonly int number;
public int Number { get { return number; } }
public Month(int number) {
this.number = number;
}
public bool Includes(DateTime arg) {
return number == arg.Month;
}
These are useful building blocks, but can't do much on their own. To really make them sing and dance I need to be able to combine them into logical expressions, which I do with a couple more specifications.
abstract class CompositeSpecification<T> : Specification<T> {
protected IList<Specification<T>> elements = new List<Specification<T>>();
public CompositeSpecification(params Specification<T>[] elements) {
this.elements = elements;
}
public abstract bool Includes(T arg);
}
internal class AndSpecification<T> : CompositeSpecification<T> {
public AndSpecification(params Specification<T>[] elements)
: base(elements) {}
public override bool Includes(T arg) {
foreach (Specification<T> s in elements)
if (! s.Includes(arg)) return false;
return true;
}
}
internal class OrSpecification<T> : CompositeSpecification<T> {
public OrSpecification(params Specification<T>[] elements)
: base(elements) {}
public override bool Includes(T arg) {
foreach (Specification<T> s in elements)
if (s.Includes(arg)) return true;
return false;
}
}
I trust you can figure out how to implement a NotSpecification.
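For completeness, here is one plausible shape for the negation, sketched in Java rather than C#. The Specification interface is re-declared with Java casing so the block stands alone, and the field name negated is my own choice.

```java
// Java rendering of the Specification interface, re-declared so this block is self-contained.
interface Specification<T> {
    boolean includes(T arg);
}

// Wraps a single specification and inverts its answer.
class NotSpecification<T> implements Specification<T> {
    private final Specification<T> negated;

    NotSpecification(Specification<T> negated) {
        this.negated = negated;
    }

    @Override
    public boolean includes(T arg) {
        return !negated.includes(arg);
    }
}

class NotSpecificationDemo {
    public static void main(String[] args) {
        Specification<Integer> even = n -> n % 2 == 0;
        Specification<Integer> odd = new NotSpecification<>(even);
        System.out.println(odd.includes(3)); // prints true
        System.out.println(odd.includes(4)); // prints false
    }
}
```

Unlike the and/or composites it takes exactly one element, so it has no need for the varargs constructor.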
One thing I don't like about this framework is my usage of the DateTime class. The problem is that DateTime has sub-second precision, but I'm only working at day precision. Using over-precise temporal data types is very common, because libraries usually push us in that direction. However, they can easily result in awkward bugs when you compare two DateTimes that differ below the level of precision you care about. If I were doing this on a real project I'd make a proper Date class with the correct precision.
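To show what day precision buys, here is a sketch of the two building blocks in Java using java.time.LocalDate, which carries no time of day. The class names DayInMonthSpec and PeriodInYearSpec are hypothetical, and I have swapped the enumerator walk for a little arithmetic: the nth occurrence of a weekday always falls on days (n-1)*7+1 to n*7 of the month.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

// Matches the nth given weekday of any month, e.g. the third Monday.
class DayInMonthSpec {
    private final int index;
    private final DayOfWeek dayOfWeek;

    DayInMonthSpec(int index, DayOfWeek dayOfWeek) {
        if (index <= 0) throw new IllegalArgumentException("index must be positive");
        this.index = index;
        this.dayOfWeek = dayOfWeek;
    }

    boolean includes(LocalDate date) {
        // Days 1-7 hold the first occurrence of each weekday, days 8-14 the second, and so on.
        int occurrence = (date.getDayOfMonth() - 1) / 7 + 1;
        return date.getDayOfWeek() == dayOfWeek && occurrence == index;
    }
}

// Matches any date whose month lies in an inclusive range of month numbers.
class PeriodInYearSpec {
    private final int startMonth;
    private final int endMonth;

    PeriodInYearSpec(int startMonth, int endMonth) {
        this.startMonth = startMonth;
        this.endMonth = endMonth;
    }

    boolean includes(LocalDate date) {
        return date.getMonthValue() >= startMonth && date.getMonthValue() <= endMonth;
    }
}

class DayPrecisionDemo {
    public static void main(String[] args) {
        DayInMonthSpec thirdMonday = new DayInMonthSpec(3, DayOfWeek.MONDAY);
        // 1 April 2024 was a Monday, so the third Monday was 15 April.
        System.out.println(thirdMonday.includes(LocalDate.of(2024, 4, 15))); // prints true
        System.out.println(thirdMonday.includes(LocalDate.of(2024, 4, 8)));  // prints false
    }
}
```

Since two LocalDate values for the same day are always equal, the comparison bugs described above cannot arise.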
Here's the DSL text for my old street cleaning schedule.
Schedule.First(DayOfWeek.Monday)
.And(Schedule.Third(DayOfWeek.Monday))
.From(Month.April)
.Till(Month.October);
Like most realistic DSLs it uses a combination of internal DSL techniques, here a mix of Method Chaining and Nested Function. I'm not going to worry too much about the Method Chaining here; instead I'll concentrate on the way that Nested Function is used. Since each Nested Function returns a simple value I don't find a strong need for Object Scoping as they won't need any Context Variables. As a result I'll use static methods. As I'm in C# this means all the static methods need to be prefixed with their class name. This reads pretty well, although it does add noise compared to an Object Scoping approach.
Two of the Nested Functions are calls to return a simple
value. DayOfWeek.Monday is actually built into the
.NET libraries. I added Month.April and friends myself.
class Month...
public static readonly Month January = new Month(1);
public static readonly Month February = new Month(2);
// I don't need to show more do I?
The calls on Schedule are a bit different. The initial use of
Schedule.First is an example of a common feature in
these languages - using a bare function to create a starting object
to begin the chaining. Schedule here is an Expression Builder. It's not called "builder" because I
think it reads better as just "schedule".
class Schedule...
public static Schedule First(DayOfWeek dayOfWeek) {
return new Schedule(new DayInMonth(1, dayOfWeek));
}
Like most Expression Builders, the schedule builds up a content, which is a specification.
class Schedule...
private Specification<DateTime> content;
public Specification<DateTime> Content { get { return content; } }
public Schedule(Specification<DateTime> content) {
this.content = content;
}
Notice how the initial call returns a schedule that wraps the
first element in the specification. The later call to
Third is the same (except for the parameter). I
would usually argue against writing different methods for
something that would be better handled as a parameter, but this
is yet another example where you have different rules of good
programming when you use an Expression Builder.
It's the Method Chaining that actually builds up the composite structure. Here's the interestingly named "and" method.
class Schedule...
public Schedule And(Schedule arg) {
content = new OrSpecification<DateTime>(content, arg.content);
return this;
}
We say "first and third Monday" in our language, but in terms of the specification it's the first or third Monday that matches the boolean condition. It's an interesting example of where the DSL is opposite to the model in order for both to read naturally.
The period at the end is similarly assembled using Method Chaining calls.
class Schedule...
public Schedule From(Month m) {
Debug.Assert(null == periodStart);
periodStart = m;
return this;
}
public Schedule Till(Month m) {
Debug.Assert(null != periodStart);
PeriodInYear period = new PeriodInYear(periodStart.Number, m.Number);
content = new AndSpecification<DateTime>(content, period);
return this;
}
Here I use a Context Variable to properly build up the period.
This example uses simple static methods for the Nested Functions. Would it benefit from getting rid of the class names? I think it would read better to say "Monday" rather than "DayOfWeek.Monday". Object Scoping would provide this at the cost of requiring the inheritance relationship. In Java I could use static imports. The gain isn't huge but would probably be worthwhile.
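Here is roughly how the Java version with static imports might read. It is a hypothetical port, not the C# code above: JSchedule is my own name, java.util.function.Predicate stands in for the specification classes, and the weekday and month constants come from java.time rather than from a hand-written Month class.

```java
import static java.time.DayOfWeek.MONDAY;
import static java.time.Month.APRIL;
import static java.time.Month.OCTOBER;

import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;
import java.util.function.Predicate;

// Hypothetical Java port of the Schedule Expression Builder.
class JSchedule {
    private Predicate<LocalDate> content;
    private Month periodStart; // Context Variable remembered between from() and till()

    private JSchedule(Predicate<LocalDate> content) { this.content = content; }

    static JSchedule first(DayOfWeek day) { return nth(1, day); }
    static JSchedule third(DayOfWeek day) { return nth(3, day); }

    // The nth occurrence of a weekday falls on days (n-1)*7+1 to n*7 of the month.
    private static JSchedule nth(int index, DayOfWeek day) {
        return new JSchedule(d -> d.getDayOfWeek() == day
                && (d.getDayOfMonth() - 1) / 7 + 1 == index);
    }

    // "and" in the DSL is a logical or in the model.
    JSchedule and(JSchedule other) {
        content = content.or(other.content);
        return this;
    }

    JSchedule from(Month m) {
        periodStart = m;
        return this;
    }

    JSchedule till(Month m) {
        Month start = periodStart;
        content = content.and(d -> d.getMonthValue() >= start.getValue()
                && d.getMonthValue() <= m.getValue());
        return this;
    }

    boolean includes(LocalDate date) { return content.test(date); }

    // With static imports the nested values lose their class-name prefixes.
    static JSchedule streetCleaning() {
        return first(MONDAY).and(third(MONDAY)).from(APRIL).till(OCTOBER);
    }

    public static void main(String[] args) {
        System.out.println(streetCleaning().includes(LocalDate.of(2024, 4, 15))); // prints true
    }
}
```

The bare first and third calls work here only because the script lives inside JSchedule; from another class they would need their own static import or the class-name prefix.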