Dsl bliki
| MDSDandDSL |
dsl |
14 July 2008 |
|
What is the connection between
ModelDrivenSoftwareDevelopment (MDSD) and
DomainSpecificLanguages (DSLs)?
It's pretty common to see the term "DSL" crop up in the context
of MDSD. Indeed some MDSD people seem to think that DSLs only exist
within the MDSD world. I've been writing a lot on DSLs recently for
my book, but so far I haven't really touched on the MDSD angle much.
Instead I've concentrated on DSLs' role in more conventional
programming. DSLs exist in both the textual language and MDSD worlds
and play pretty much the same role for both.

In an MDSD context a DSL is again a language targeted at a
specific kind of problem as opposed to general purpose languages
such as the UML. As a result they can have the same kind of
relationship: build a system in the general purpose modeling
language and use DSLs for various specific aspects. Since MDSD
hasn't caught on that much, however, you also see a different
approach where modeling DSLs are used in the context of a
traditional language environment. Here you might use several
modeling DSLs that generate Java code to be combined in a Java
project. In this case there's no general purpose MDSD model around -
you use MDSD for each DSL relatively independently.

In order to use model-oriented DSLs you need a different,
RepositoryBasedCode,
approach to tooling. This introduces quite a few pragmatic issues as
the general support environment for such tools is less
established. In order to define your own DSLs you need more
specialized tooling - something I call a Language Workbench.

DSLs seem to have a proportionately higher emphasis in the MDSD
world than they do in the mainstream programming world. Cynics think
this is a result of the MDSD community desperately searching for a
way to remain relevant; fans of MDSD regard it as a sign of MDSD's
superior sophistication. I think this is mainly due to the fact that
the MDSD community is smaller and has far less in the form of
established practice.

A particularly visible sub-community of MDSD is centered around
ModelDrivenArchitecture (MDA). I'm not much of a fan of MDA in
particular, and am especially skeptical of MDA DSLs.

There is much that model-oriented DSLs share with textual DSLs. I
put a lot of emphasis with textual DSLs on basing work around a
Semantic Model. MDSD, as its name indicates, is very much about driving a
system from that kind of a model. A difference is that most MDSD
people assume that you'll want to generate code from that model
rather than executing the model directly.

As I write this, I'm not sure how much I'm going to cover
language workbenches in my book. Certainly I'll at least discuss the
overall concept behind them, but the coverage may not be that
deep. This will be partly due to the large amount of material I seem
to be generating on textual DSLs and partly due to the fact that
language workbenches are much newer and thus more volatile and less
mature.
|
| SyntacticNoise |
dsl |
9 June 2008 |
|
A common phrase that's bandied about when talking about
DomainSpecificLanguages (or indeed any computer language) is that of
noisy syntax. People may say that Ruby is less noisy than Java, or
that external DSLs are less noisy than internal DSLs. By Syntactic
Noise, what people mean is extraneous characters that aren't part of
what we really need to say, but are there to satisfy the language
definition. Noise characters are bad because they obscure the meaning
of our program, forcing us to puzzle out what it's doing.

Like many concepts, syntactic noise is both loose and subjective,
which makes it hard to talk about. A while ago Gilad Bracha tried to
illustrate his perception of syntactic noise during a talk at
JAOO. Here I'm going to have a go at a similar approach and apply it
to several formulations of a DSL that I'm using in my current
introduction in my DSL book. (I'm using a subset of the example state
machine, to keep the text a reasonable size.)

In his talk Bracha illustrated noise by coloring what he considered to
be noise characters. A problem with this, of course, is that this requires
us to define what we mean by noise characters. I'm going to side-step
that and make a different distinction. I'll distinguish between what
I'll call domain text and punctuation. The DSL scripts I'm looking at
define a state machine, and thus talk about states, events, and
commands. Anything that describes information about my particular
state machine - such as the names of states - I'll define as domain
text. Anything else is punctuation and I'll highlight the latter in
red.

I'll start with the custom syntax of an external DSL.

    events
      doorClosed D1CL
      drawOpened D2OP
      lightOn L1ON
    end

    commands
      unlockDoor D1UL
      lockPanel PNLK
    end

    state idle
      actions {unlockDoor lockPanel}
      doorClosed => active
    end

    state active
      drawOpened => waitingForLight
      lightOn => waitingForDraw
    end

A custom syntax tends to minimize noise, so as a result you see
a relatively small amount of punctuation here. This text also makes
clear that we need some punctuation. Both events and commands are
defined by giving their name and their code - you need the punctuation
in order to tell them apart. So punctuation isn't the same as noise; I
would say that the wrong kind of punctuation is noise, or too much
punctuation is noise. In particular I don't think it's a good idea to
try to reduce punctuation to the absolute minimum; too little
punctuation also makes a DSL harder to comprehend.

Let's now look at an internal DSL for the same domain information
in Ruby.

    event :doorClosed, "D1CL"
    event :drawOpened, "D2OP"
    event :lightOn, "L1ON"

    command :lockPanel, "PNLK"
    command :unlockDoor, "D1UL"

    state :idle do
      actions :unlockDoor, :lockPanel
      transitions :doorClosed => :active
    end

    state :active do
      transitions :drawOpened => :waitingForLight,
                  :lightOn => :waitingForDraw
    end

Now we see a lot more punctuation. Certainly I could have made some
choices in my DSL to reduce punctuation, but I think most people would
still agree that a Ruby DSL has more punctuation than a custom
one. The noise here, at least for me, is the little things: the ":" to
mark a symbol, the "," to separate arguments, the '"' to quote
strings.

One of the main themes in my DSL thinking is that a DSL is a way to
populate a framework. In this case the framework is one that describes
state machines. As well as populating a framework with a DSL you can
also do it with a regular push-button API. Let's color the punctuation
on that.

    Event doorClosed = new Event("doorClosed", "D1CL");
    Event drawOpened = new Event("drawOpened", "D2OP");
    Event lightOn = new Event("lightOn", "L1ON");

    Command lockPanelCmd = new Command("lockPanel", "PNLK");
    Command unlockDoorCmd = new Command("unlockDoor", "D1UL");

    State idle = new State("idle");
    State activeState = new State("active");

    StateMachine machine = new StateMachine(idle);

    idle.addTransition(doorClosed, activeState);
    idle.addCommand(unlockDoorCmd);
    idle.addCommand(lockPanelCmd);

    activeState.addTransition(drawOpened, waitingForLightState);
    activeState.addTransition(lightOn, waitingForDrawState);

Here's a lot more punctuation. All sorts of quotes and brackets as
well as method keywords and local variable declarations. The latter
present an interesting classification question. I've counted the
declaring of a local variable as punctuation (as it duplicates the
name) but its later use as domain text.

Java can also be written in a fluent way, so here's the fluent
version from the book.

    Events doorClosed, drawOpened, lightOn;
    Commands lockPanel, unlockDoor;
    States idle, active;

    protected void defineStateMachine() {
      doorClosed.code("D1CL");
      drawOpened.code("D2OP");
      lightOn.code("L1ON");

      lockPanel.code("PNLK");
      unlockDoor.code("D1UL");

      idle
        .actions(unlockDoor, lockPanel)
        .transition(doorClosed).to(active)
        ;

      active
        .transition(drawOpened).to(waitingForLight)
        .transition(lightOn).to(waitingForDraw)
        ;
    }

Whenever two or three are gathered together to talk about syntactic
noise, XML is bound to come up.

    <stateMachine start = "idle">
      <event name="doorClosed" code="D1CL"/>
      <event name="drawOpened" code="D2OP"/>
      <event name="lightOn" code="L1ON"/>

      <command name="lockPanel" code="PNLK"/>
      <command name="unlockDoor" code="D1UL"/>

      <state name="idle">
        <transition event="doorClosed" target="active"/>
        <action command="unlockDoor"/>
        <action command="lockPanel"/>
      </state>

      <state name="active">
        <transition event="drawOpened" target="waitingForLight"/>
        <transition event="lightOn" target="waitingForDraw"/>
      </state>
    </stateMachine>

I don't think we can read too much into this particular example,
but it does provide some food for thought. Although I don't think we
can make a rigorous separation between useful punctuation and noise,
the distinction between domain text and punctuation can help us focus
on the punctuation and consider what punctuation serves us best. And I
might add that having more characters of punctuation than you
do of domain text in a DSL is a smell. (Mikael Jansson has put out a Lisp
version of this example. Mihailo Lalevic did one in JavaScript.)
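
The split between domain text and punctuation can be made roughly measurable. The sketch below is a hypothetical helper, not something from the book: `noise_ratio` is an assumed name, the caller supplies the list of domain words, and every other non-whitespace character (keywords included) is counted as punctuation. It is crude: a domain word that also appears inside a keyword would be miscounted.

```ruby
# Rough measure of punctuation versus domain text in a DSL script.
# Assumption: the caller lists the domain words; every other
# non-whitespace character counts as punctuation (keywords included).
def noise_ratio(script, domain_words)
  total = script.gsub(/\s+/, '').length
  # Try longer words first so a short word never splits a longer one.
  pattern = Regexp.union(domain_words.sort_by { |word| -word.length })
  domain = script.scan(pattern).join.length
  { domain: domain, punctuation: total - domain }
end

DOMAIN_WORDS = %w[idle active doorClosed].freeze

# One state from the custom syntax and the same state in XML.
CUSTOM_SCRIPT = "state idle\n  doorClosed => active\nend\n"
XML_SCRIPT = %(<state name="idle">\n) +
             %(  <transition event="doorClosed" target="active"/>\n) +
             %(</state>\n)

p noise_ratio(CUSTOM_SCRIPT, DOMAIN_WORDS)
p noise_ratio(XML_SCRIPT, DOMAIN_WORDS)
```

Comparing the two hashes gives a quick check of the "more punctuation than domain text" smell for a given formulation.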
|
| ParserFear |
dsl |
20 May 2008 |
|
I talk quite a bit with people about DomainSpecificLanguages
these days and a common reaction I get to external DSLs is that it's
hard to write a parser. Indeed one of the justifications for using
XML as the carrier syntax for an external DSL is that "you get the
parser for free". This doesn't jibe with my experience - I think
parsers are much easier to write than most people think, and they
aren't really any harder than parsing XML.

I even have evidence. Well it's actually only one case, but I'll
quote it anyway as it supports my argument. When I wrote the introductory
example for my book I developed multiple external DSLs to
populate a simple state machine. One of these was using XML (using
it as a gateway drug); another was a custom syntax which I parsed
with the help of Antlr. Writing
the code to fully parse these formats took about the same amount of
time.

Although you get an XML parser for free (I used Elliotte Rusty Harold's
excellent XOM framework) the output of an XML parser is effectively
a parse tree in the form of an XML DOM. In order to do anything
useful with that you have to process it
further. My practice with DSLs is to base them around a clear
Semantic Model, so the true output of parsing in this case is a populated
state machine model. In order to do this I have to write code that
walks its way through the XML DOM. This isn't especially difficult,
particularly since I can use XPath expressions to pick out the bits
of the DOM I'm interested in. Indeed I'm not walking the DOM tree at
all - for each thing I'm interested in I have a method that issues
an XPath query, iterates through the resulting nodes and populates the
state machine model.

So the XML processing is easy, but it isn't non-existent - around
a hundred lines of code. It took me a couple of hours. I hadn't used
XOM in a while, so there was some familiarization required, but
it's a very easy library to learn and use.

The Antlr processing was remarkably similar. Antlr has a notation
that allows you to put some simple rules in the grammar file to
populate an AST. The code to process the AST and populate the
semantic model was very similar to the XML code - query for the
right nodes in the tree and then process them. Including the grammar
file the resulting code is around 250 lines, but took me about the
same amount of time to write. I was familiar with most of Antlr
before this, having used it a few times, but I hadn't actually used
the tree construction stuff before. (If you're interested you can
find a description of this example in my book's work in progress.)

Although my explorations of parser generators have got me used to
the fact that parsers are much easier to write than many people think,
I was surprised when I realized the Antlr case was actually no slower than the
XML case. In a more carefully controlled example, I would still
expect it to take longer because I did the Antlr example second and as
any programmer knows, things always go much faster with a second
implementation. Even so, the difference is much less than what many
people seem to expect - when the word "parser" seems to mean "too
complicated".

I can't deny there is certainly a learning curve to get used to
parser generators. You have to get used to grammar files and how
they interact with code samples. There are different strategies you
can use (what I currently refer to as Tree Construction, Embedded
Translation and Embedded Interpretation). You also have to think
about the syntax of your custom syntax, which involves more decisions
than wondering whether to make something an attribute or an element
in XML. But that curve isn't really that high. Modern tools make it
much easier. Antlr is my current default choice; it comes with a
very nice IDE which helps in exploring grammar expressions and
seeing how they get parsed into an AST. But once you've got used to
how one parser generator works, it's not hard to pick up others.

So why is there an unreasonable fear of writing parsers for DSLs?
I think it boils down to two main reasons.

- You didn't do the compiler class at university and therefore
  think parsers are scary.
- You did do the compiler class at university and are therefore
  convinced that parsers are scary.

The first is easy to understand: people are naturally nervous of
things they don't know about. The second reason is the one that's
interesting. What this boils down to is how people come across parsing
in universities. Parsing is usually only taught in a compiler class,
where the context is to parse a full general purpose
language. Parsing a general purpose language is much harder than
parsing a Domain Specific Language, if nothing else because the
grammar will be much bigger and often contain nasty wrinkles which you
can avoid with a DSL.

This problem is compounded by encouraging code that
tangles up parsing with output processing and code generation. For me
the key to keeping things straight is to use a Semantic Model, so that
your parser does no more than populate that model. Most of the time I
can then do what I need by just executing that semantic model like any
other OO framework. Most of the time I don't need to do code
generation, and when I do I base it off the semantic model so it's
independent of the parser. I think that if you've got code generation
statements inside your grammars, things are way too coupled together.

For people to work effectively with external DSLs they have to be
taught about it quite differently to how you'd teach parsing a general
purpose language. The small size of both the language and the scripts
in the language changes many of the concerns that people typically
have with parsing. Avoiding code generation unless you really need it
can remove a big hunk of the complexity. Using a clear semantic model
can separate out the steps into much more tractable chunks.

The problem, of course, is that there isn't much written that
follows these guidelines. (Which is one of the triggers for me to be
spending so much time on it.) You're hard put to find any
documentation out there on parser generator tools. When you do get
some really nice documentation (like Terence Parr's Antlr book) it's still usually written
with a general purpose language mindset. Don't get me wrong, I find the
Antlr book very helpful (it's a big reason why Antlr is my default
choice of parser generator) but I believe that there's an assumption
there of parsing general purpose languages rather than domain specific
languages that makes it harder to approach than it could be.

The nice thing, however, with all this is that you can still mount
that learning curve. If you haven't tried working with a parser
generator I'd certainly suggest giving it a try. Try writing a simple
DSL of your own. Don't worry about code generation when you start,
just create a domain model as you normally would and get the DSL to
populate it. Start with something really silly (like I did with
HelloAntlr) and gradually work it up from there. Poke
around some open source projects that use a DSL and see what they
do.
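
The advice to keep a plain domain model and let the DSL do no more than populate it can be sketched in a few lines. To stay self-contained this sketch uses a hand-rolled line reader rather than a parser generator, and a toy two-rule syntax; `StateMachine`, `parse_machine` and the syntax itself are assumptions made for illustration, not the book's example.

```ruby
# Semantic model: knows nothing about syntax.
class StateMachine
  attr_reader :current

  def initialize(start)
    @current = start
    @transitions = Hash.new { |hash, key| hash[key] = {} }
  end

  def add_transition(from, event, to)
    @transitions[from][event] = to
  end

  # Executes the model directly; no code generation involved.
  # Unknown events leave the current state unchanged.
  def handle(event)
    @current = @transitions[@current].fetch(event, @current)
  end
end

# Parser: does no more than populate the model.
# Assumed toy syntax: "state NAME", then "EVENT => TARGET" lines, then "end".
def parse_machine(text)
  machine = nil
  state = nil
  text.each_line do |line|
    case line.strip
    when /\Astate (\w+)\z/
      state = $1
      machine ||= StateMachine.new(state) # first state is the start state
    when /\A(\w+) => (\w+)\z/
      raise "transition outside a state: #{line.strip}" unless state
      machine.add_transition(state, $1, $2)
    when 'end'
      state = nil
    when ''
      # skip blank lines
    else
      raise "can't parse <#{line.strip}>"
    end
  end
  machine
end

SCRIPT = <<~DSL
  state idle
    doorClosed => active
  end
  state active
    lightOn => waitingForDraw
  end
DSL

machine = parse_machine(SCRIPT)
machine.handle('doorClosed')
puts machine.current
```

Because the model is independent of the reader, swapping the line reader for a generated parser later would leave `StateMachine` untouched.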
What we're trying to do is introduce the tools that are often
used in compilers but are much more general than that to an audience
that associates the tools only with compilers, because that's how
they've always been taught.
--Rebecca Parsons
|
| InstallingOpenArchitectureWare |
dsl |
27 July 2007 |
|
Update: the procedure and complaints here are no longer
valid. Open ArchitectureWare has released a new version with Eclipse
3.3 that looks like it will install much more easily than what I
just went through. There's also now a packaged distribution
that includes Eclipse and all the OAW stuff.

There are few things more frustrating than spending hours trying
to install a piece of software and then having to delete everything
and start again. Today at 9.30 I began installing
openArchitectureWare; I finally had it installed (I think) at
15.30. So I thought I'd write this to help someone else do it more quickly.

OpenArchitectureWare is a set of tools, based on Eclipse, to
support Model Driven Development. I'm interested in exploring some of
its tools that are oriented towards
DomainSpecificLanguages. (Xtext - which helps you develop
textual languages - is something that's specifically been pointed out
to me as worth looking at.) I don't know how worthwhile these tools
are yet, after all it took me most of the day just to install the
dratted thing, but we'll see.

One of my problems with the installation was that I'm not an
Eclipse user - my usual Java IDE is IntelliJ. To install
openArchitectureWare you need to know how to deal with the plugin
system in Eclipse - and I'd never done anything with Eclipse before
so that was new to me.

The first step was the easiest one - install Eclipse. I installed
it on my Ubuntu machine, so all I had to do was wajig install
eclipse (wajig is a unified command-line for various debian
packaging and sysadmin tools). Then all hell broke loose. Rather
than go through my miserable morning, I'll explain what I would do now.

The trouble with OpenArchitectureWare is that it has
dependencies, other Eclipse plugins that need to be installed before
it can work. As anyone with experience in these things knows,
sorting out dependencies can be a right pain without a good
tool. apt-get for Debian and gem for Ruby are examples of
good tools that resolve dependencies. When I installed Eclipse,
apt-get knew it had to pull down a whole host of dependencies and
installed them for me. The situation in Eclipse is not so good.

To install openArchitectureWare you need a bunch of plugins: EMF,
UML2, ATL, and GMF. I couldn't see from the web pages exactly how to
get these things, or if they had their own dependencies.

There are several ways of installing plugins in Eclipse, although
I had to hunt a bit for instructions. The easiest way is a menu option
in Eclipse itself. In the menus pick [Help -> Software Updates -> Find
and Install] (no I don't know why it's on the help menu). With a bit
of button pushing you can get it to download a list of packages - the
relevant source is the Callisto Discovery Site. Once you have that
list downloaded look in the Models and Model Development section and
select Eclipse Modeling Framework (EMF) and Graphical Modeling
Framework (GMF). You'll get an error message saying that these have an
unresolved dependency. Take note of the button on the right that says
'select required'. Hit it and it will find the dependency to GEF and
its dependency on Batik. If you don't see that button and hit it
you'll have a frustrating time trying to find them (believe me, I
know).

That gets two of openArchitectureWare's dependencies. The others,
and openArchitectureWare itself need to be done the harder
way. Digging around the eclipse site I found the relevant web pages
for UML2 and ATL. These need to be downloaded as zip files as does openArchitectureWare itself.

When you unzip the UML2 and openArchitectureWare folders they
unzip into a folder called eclipse that contains subfolders for
plugins and features. You can take the contents of these folders and
put them into corresponding folders on your load environment (in
my case /usr/local/lib/eclipse). As that didn't work for me when I
tried it first, I found another way.

The way to tell if stuff has installed properly is to go to [Help ->
Software Updates -> Manage Configuration]. When you open that you
have the option of "Add an Extension Location". An extension
location is (almost) any directory that contains an eclipse folder
with subfolders for plugins and features. I say almost because the
eclipse folder also needs a file called
.eclipseextension. This is just an empty file so you
can create it with touch .eclipseextension. What I did
was create folders in /usr/local/lib for
openArchitectureWare and uml2-eclipse,
move the unzipped eclipse folders in there, run touch
.eclipseextension inside each of them and then add them
using "Add an Extension Location". ATL just produces a plugin
directory so I copied the contents of it into the plugin directory
for openArchitectureWare.

It's important that you do this after you use the Find and
Install tool because if you do it first, the Find and Install tool
will tell you that you have an unresolved dependency and refuse to do
anything until you fix it. When it was all installed it told me
"UML2 End-User Features (2.1.1.v200707181556) requires plug-in
"org.eclipse.emf.ecore.xmi (2.3.0)". I don't know how to fix this
and I have a bunch of emf.ecore jars present in EMF. However the
rest of eclipse seems to
work so far, so I'm carrying on regardless.
|
| DslReadings |
dsl |
13 July 2007 |
|
(See my note on DomainSpecificLanguage for a quick
intro to this topic and my terminology on it.)

Update: David Laribee has written a post contrasting what
he calls ordered and unordered fluent interfaces. The distinction is
that ordered fluent interfaces force a particular flow on how you
compose your DSL sentence. He provides an example where he uses
multiple interfaces on a single ExpressionBuilder - the same
technique that's used by JMock.

Anders Norås has written two interesting articles on writing
internal DSLs in C#. The first article gives a sample of the DSL and
a discussion against Chromatic's cynical check-list. The second
article goes into details about its implementation.

Piers Cawley makes the point that a key characteristic of DSLs
is their narrow focus on a domain.
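
The ordered fluent interface mentioned in the update above can be pictured as each step handing back an object that exposes only the methods allowed next. The sketch below is a hypothetical Ruby illustration, not code from Laribee's post or from JMock. Ruby has no interfaces, so separate small classes stand in for the multiple interfaces on a single ExpressionBuilder, and every class and method name here is an assumption.

```ruby
# Hypothetical names throughout; this only illustrates the ordering idea.
Transition = Struct.new(:event, :target)

# First step of the sentence: only `on` is available.
class TransitionStart
  def initialize(sink)
    @sink = sink
  end

  def on(event)
    TransitionTarget.new(@sink, event)
  end
end

# Second step: only `to` is available, and it completes one transition.
class TransitionTarget
  def initialize(sink, event)
    @sink = sink
    @event = event
  end

  def to(target)
    @sink << Transition.new(@event, target)
    TransitionStart.new(@sink) # back to the start for the next sentence
  end
end

transitions = []
TransitionStart.new(transitions)
  .on(:doorClosed).to(:active)
  .on(:lightOn).to(:waitingForDraw)

p transitions.map(&:to_a)
```

Calling `to` before `on` fails with a NoMethodError, which is how this version enforces the order; an unordered fluent interface would put both methods on one object and accept them in any sequence.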
|
| HelloRacc |
dsl |
30 May 2007 |
|
When I said HelloCup I was looking at a yacc based parser in a
language that didn't require me to handle my dirty pointers. Another
alternative to play with is Ruby which now has a yaccish parser
built in to the standard library - inevitably called racc.

Racc has an interesting interplay between ruby and grammar
syntax. You define the grammar with a racc file which will generate
a parser class.

Again I'll do my simple hello world case. The input text is

    item camera
    item laser

I'll populate item objects inside a catalog, using the following
model classes.

    require 'forwardable'

    class Item
      attr_reader :name
      def initialize name
        @name = name
      end
    end

    class Catalog
      extend Forwardable
      def initialize
        @items = []
      end
      def_delegators :@items, :size, :<<, :[]
    end

Forwardable is a handy library that allows me to
delegate methods to an instance variable. In this case I delegate a
bunch of methods to the @items list.
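
A short usage sketch shows what the delegation buys. The Catalog class is repeated from above so the snippet runs on its own; the sample values are arbitrary.

```ruby
require 'forwardable'

# Same shape as the Catalog above, repeated so this runs standalone.
class Catalog
  extend Forwardable
  def_delegators :@items, :size, :<<, :[]

  def initialize
    @items = []
  end
end

catalog = Catalog.new
catalog << 'camera'
catalog << 'laser'

puts catalog.size
puts catalog[1]
# Only the delegated methods are exposed; the rest of Array stays hidden.
puts catalog.respond_to?(:delete)
```

The catalog behaves like a list for the three delegated operations while keeping the underlying array private.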
I test what I read with this.

    require 'test/unit'

    class Tester < Test::Unit::TestCase
      def testReadTwo
        parser = ItemParser.new
        parser.parse "item camera\nitem laser\n"
        assert_equal 2, parser.result.size
        assert_equal 'camera', parser.result[0].name
        assert_equal 'laser', parser.result[1].name
      end
      def testReadBad
        parser = ItemParser.new
        # "xitem" lexes as a WORD, so the grammar should reject the input
        assert_raise(Racc::ParseError) do
          parser.parse "xitem camera"
        end
      end
    end

To build the file and run the tests I use a simple rake file.

    # rakefile...
    task :default => :test
    file 'item.tab.rb' => 'item.y.rb' do
      sh 'racc item.y.rb'
    end
    task :test => 'item.tab.rb' do
      require 'rake/runtest'
      Rake.run_tests 'test.rb'
    end

The racc command needs to be installed on your
system. I did it the easy way on Ubuntu with
apt-get. It takes the input file and creates one named
inputFileName.tab.rb.

The parser grammar class is a special format, but one that's
pretty familiar to yaccish people. For this simple example it looks
like this:

    #file item.y.rb...
    class ItemParser
      token 'item' WORD
      rule
        catalog: item | item catalog;
        item: 'item' WORD {@result << Item.new(val[1])};
    end

The tokens clause declares the tokens we get from the lexer. I
use the string 'item' and WORD as a
symbol. The rule clause starts the production rules which are in the
usual BNF form for yacc. As you might expect I can write actions
inside curlies. To refer to the elements of the rule I use the
val array, so val[1] is the equivalent to
$2 in yacc (ruby uses 0 based array indexes, but I've
forgiven it). Should I wish to return a value from the rule
(equivalent to yacc's $$) I assign
it to the variable result.

The most complicated part of using racc is to sort out the lexer.
Racc expects to call a method that yields tokens, where each token is a
two-element array with the first element being the type of token
(matching the token declaration) and the second element the value
(what shows up in val - usually the text). You mark the
end of the token stream with [false, false]. The sample
code with racc uses regular expression matching on a string. A better
choice for most cases is to use StringScanner, which is
in the standard ruby library.

I can use this scanner to convert a string into an array of tokens.

    #file item.y.rb....
    ---- inner
    def make_tokens str
      require 'strscan'
      result = []
      scanner = StringScanner.new str
      until scanner.eos?
        case
        when scanner.scan(/\s+/)
          #ignore whitespace
        when match = scanner.scan(/item/)
          result << ['item', nil]
        when match = scanner.scan(/\w+/)
          result << [:WORD, match]
        else
          raise "can't recognize <#{scanner.peek(5)}>"
        end
      end
      result << [false, false]
      return result
    end

To integrate the scanner into the parser, racc allows you to
place code into the generated parser class. You do this by adding code
to the grammar file. The declaration ---- inner marks the
code to go inside the generated class (you can also put code at the
head and foot of the generated file). I'm calling a parse
method in my test, so I need to implement that.

    #file item.y.rb....
    ---- inner
    attr_accessor :result
    def parse(str)
      @result = Catalog.new
      @tokens = make_tokens str
      do_parse
    end

The do_parse method initiates the generated
parser. This will call next_token to get at the next
token, so we need to implement that method and include it in the
inner section.

    #file item.y.rb....
    ---- inner
    def next_token
      @tokens.shift
    end

This is enough to make racc work with the file. However as I play
with it I find the scanner more messy than I would like. I really
just want to tell the lexer what patterns to match and what to
return with them. Something like this.

    #file item.y.rb....
    ---- inner
    def make_lexer aString
      result = Lexer.new
      result.ignore /\s+/
      result.keyword 'item'
      result.token /\w+/, :WORD
      result.start aString
      return result
    end

To make this work I write my own lexer wrapper over the base
functionality provided by StringScanner. Here's the code to set up
the lexer and handle the above configuration.

    class Lexer...
      require 'strscan'
      def initialize
        @rules = []
      end
      def ignore pattern
        @rules << [pattern, :SKIP]
      end
      def token pattern, token
        @rules << [pattern, token]
      end
      def keyword aString
        @rules << [Regexp.new(aString), aString]
      end
      def start aString
        @base = StringScanner.new aString
      end

To perform the scan I need to use StringScanner to compare the
rules against the input stream.

    class Lexer...
      def next_token
        return [false, false] if @base.eos?
        t = get_token
        return (:SKIP == t[0]) ? next_token : t
      end
      def get_token
        @rules.each do |key, value|
          m = @base.scan(key)
          return [value, m] if m
        end
        raise "unexpected characters <#{@base.peek(5)}>"
      end

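
The Lexer fragments can be gathered into one class and exercised without racc, which makes the token protocol easy to inspect. This is a sketch under a few assumptions: the method names follow the fragments above, `eos?` is used to detect the end of input, and `tokens_for` is a hypothetical helper added only to drain the lexer into an array.

```ruby
require 'strscan'

# The Lexer fragments collected into one runnable class.
class Lexer
  def initialize
    @rules = []
  end

  def ignore(pattern)
    @rules << [pattern, :SKIP]
  end

  def token(pattern, type)
    @rules << [pattern, type]
  end

  def keyword(text)
    @rules << [Regexp.new(text), text]
  end

  def start(text)
    @base = StringScanner.new(text)
  end

  # Returns [type, value] pairs, then [false, false] at end of input.
  def next_token
    return [false, false] if @base.eos?
    t = get_token
    t[0] == :SKIP ? next_token : t
  end

  # Tries each rule in declaration order at the current scan position.
  def get_token
    @rules.each do |pattern, type|
      m = @base.scan(pattern)
      return [type, m] if m
    end
    raise "unexpected characters <#{@base.peek(5)}>"
  end
end

# Hypothetical helper: collect every token, including the end marker.
def tokens_for(text)
  lexer = Lexer.new
  lexer.ignore(/\s+/)
  lexer.keyword('item')
  lexer.token(/\w+/, :WORD)
  lexer.start(text)

  tokens = []
  loop do
    tok = lexer.next_token
    tokens << tok
    break if tok == [false, false]
  end
  tokens
end

p tokens_for("item camera\nitem laser\n")
```

Rules are tried in the order they were declared, so the keyword has to be registered before the general `\w+` rule or it would be swallowed as a WORD.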
I can then alter the code in the parser to call this lexer
instead.

    #file item.y.rb....
    ---- inner
    def parse(arg)
      @result = Catalog.new
      @lexer = make_lexer arg
      do_parse
    end
    def next_token
      @lexer.next_token
    end

As well as giving me a better way to define the rules, this also
allows the grammar to control the lexer because it's only grabbing
one token at a time - this would give me a mechanism to implement
lexical states later on.

On the whole racc is pretty easy to set up and use - providing
you know yacc. The documentation is on the minimal side of
sketchy. There's a simple manual on the website and some sample
code. There's also a very helpful presentation on racc. I also
got a few tips from our Mingle team who've used it for a nifty customization language inside Mingle.
|