Martin Fowler's Bliki
A cross between a blog and wiki of my partly-formed ideas on software development
| UpcomingTalks |
writing |
18 July 2008 |
|
On July 28th I'll be doing a webinar with my colleague Jez Humble on "Lean Release
Management". Jez spent several years as one of our build-dudes
helping our clients get their builds in shape. Recently he's been
working on Cruise, our new Continuous Integration tool which
concentrates on helping people manage staged builds.

We'll be talking about how we see principles from Lean applying to
release management, the practices they lead to, and how the new Cruise
product can support the practices. While we'll talk about the new
Cruise, we'll also spend some time on general practices - so we
hope it will be more than a regular sales pitch.
|
| MDSDandDSL |
dsl |
14 July 2008 |
|
What is the connection between
ModelDrivenSoftwareDevelopment (MDSD) and
DomainSpecificLanguages (DSLs)?
It's pretty common to see the term "DSL" crop up in the context
of MDSD. Indeed some MDSD people seem to think that DSLs only exist
within the MDSD world. I've been writing a lot on DSLs recently for
my book, but so far I haven't really touched on the MDSD angle much.
Instead I've concentrated on DSLs' role in more conventional
programming. DSLs exist in both the textual language and MDSD worlds
and play pretty much the same role for both.

In an MDSD context DSLs are again languages targeted at a
specific kind of problem as opposed to general purpose languages
such as the UML. As a result they can have the same kind of
relationship: build a system in the general purpose modeling
language and use DSLs for various specific aspects. Since MDSD
hasn't caught on that much, however, you also see a different
approach where modeling DSLs are used in the context of a
traditional language environment. Here you might use several
modeling DSLs that generate Java code to be combined in a Java
project. In this case there's no general purpose MDSD model around -
you use MDSD for each DSL relatively independently.

In order to use model-oriented DSLs you need a different,
RepositoryBasedCode,
approach to tooling. This introduces quite a few pragmatic issues as
the general support environment for such tools is less
established. In order to define your own DSLs you need more
specialized tooling - something I call a Language
Workbench.

DSLs seem to have a proportionately higher emphasis in the MDSD
world than they do in the mainstream programming world. Cynics think
this is a result of the MDSD community desperately searching for a
way to remain relevant; fans of MDSD regard it as a sign of MDSD's
superior sophistication. I think this is mainly due to the fact that
the MDSD community is smaller and has far less in the form of
established practice.

A particularly visible sub-community of MDSD is centered around
ModelDrivenArchitecture (MDA). I'm not much of a fan of MDA in
particular, but am particularly
skeptical of MDA DSLs.

There is much that model-oriented DSLs share with textual DSLs. I
put a lot of emphasis with textual DSLs in basing work around a
Semantic
Model. MDSD, as its name indicates, is very much about driving a
system from that kind of a model. A difference is that most MDSD
people assume that you'll want to generate code from that model
rather than executing the model directly.

As I write this, I'm not sure how much I'm going to cover
language workbenches in my book. Certainly I'll at least discuss the
overall concept behind them, but the coverage may not be that
deep. This will be partly due to the large amount of material I seem
to be generating on textual DSLs and partly due to the fact that
language workbenches are much newer and thus more volatile and less
mature.
|
| ModelDrivenSoftwareDevelopment |
design |
14 July 2008 |
|
Model Driven Software Development (MDSD) is a style of software
development that considers itself as an alternative to the
traditional style of programming. The approach centers itself on
building models of a software system. These models are typically
made manifest through diagrammatic design notations - the UML is one
option. The idea is that you use these diagrams to specify your
system to a modeling tool and then you generate code in a
conventional programming language.

The MDSD vision evolved from the development of graphical design
notations and CASE tools. Proponents of these techniques saw
graphical design notations as a way to raise the abstraction level
above programming languages - thus improving development
productivity. While these techniques and tools never caught on
widely, the basic core ideas still live on and there is an ongoing
community of people still developing them.

Although I've been involved, to some extent, in MDSD for most of
my career, I'm rather skeptical of its future. Most fans of MDSD
base their enthusiasm on the belief that models are ipso facto a
higher level abstraction than programming languages. I don't agree
with that argument - sometimes graphical notations can be a better
abstraction, but not always - it depends on the specific
cases. Furthermore, to use MDSD you need tools that support
RepositoryBasedCode, and these tools currently introduce
a number of pragmatic issues - of which source control is
the canonical example.

MDSD is surrounded by a terminological mess. One particular vision
of MDSD is ModelDrivenArchitecture (MDA) which is an OMG
initiative based on the UML. Many people in the MDSD community,
however, don't think that MDA or UML is the right vision for
MDSD.

For a long time I would hear people talking about Model Driven
Development (MDD) as the general concept and MDA as the OMG's
specific vision. However the OMG has trademarks on several "Model
Driven *" and "Model Based *" phrases - including MDD. As a
consequence people have to be careful about what model driven phrase
they use. I'm using MDSD as that is the title of a useful book on the topic.
|
| IncrementalMigration |
agile |
7 July 2008 |
|
Like any profession, software development has its share of
oft-forgotten activities that are usually ignored but have a habit
of biting back at just the wrong moment. One of these is data migration.

Most new software projects involve data that's lived somewhere
else and now needs to be moved into the new system once it's live. A
system replacement might have to move all the old data; new
functionality may lead to data being loaded from some other system.

It's common to not take this task very seriously. After all, it's
just reading some data, munging it a bit, and loading it into the new
system. Furthermore the code only ever has to be run once, so
there's no point making it particularly fast or pretty. Once the
migration is done the code can be safely chucked away.

And of course there's no need to worry about it till the end of
the project since you only want to run the migration just before the
new system goes live.

I have a high opinion of my readers, if only for their taste in
software writing, so I'm sure I can see the wistful smiles. Data
migration often looks easy from the safety of whiteboard abstractions, but is
usually full of nasty details to trip you up.

- You may suspect that
the existing data is somewhat messy, but everyone is usually taken
aback at how dirty the data really is. As a result the whole
exercise is often far more complicated than it ought to be.
- Because it's single-use, throw-away code, people don't tend to put
much design effort into migration code; they assume it's below
the DesignPayoffLine. That assumption is often
wrong, especially given the previous bullet point.
- Doing an activity that balloons into something harder than you
expected is never fun, but when you leave it till close to the ship
date you're offering trouble a big signing bonus.

There's a soundbite I like to use in an agile context: if it
hurts do it more often. Its surface illogicality makes it
memorable, and there's a real truth in there. Many difficult
activities can be made much more straightforward by doing them more
frequently. XPers are particularly well known for applying this
principle to testing, integration, design, and planning - so it
shouldn't surprise anyone to see it applied to data migration.

I first saw this done by my colleague Josh Mackenzie on a
moderately sized project (dozen developers for one year) with two
failed attempts in its recent past. He decided he would migrate data with
every two-week iteration. Each iteration the team figured out what
data they needed to add to support the new functionality that was
being built and updated the data migration system to migrate that
extra data from the live system.

As is often the case with these things it ended up being much
less impossible than people feared and the resulting reduction of risk
and stress made it a worthwhile choice. They appreciated the obvious
benefits, which boiled down to a distinct lack of hasty panic close to
going live.

The most interesting benefit, however, was the one they didn't
expect. Incremental migration made a significant improvement in
communication with the domain experts. Usually when you want to talk
about use cases with domain experts, you make up some pretend
scenario. By using incremental data migration the team got into the
habit of using real examples, which were much easier for the domain
experts to relate to. Furthermore when the development team made builds
available for the domain experts to look at, it included a copy of
the live data. As a result the domain experts could investigate how
the new system worked with tricky cases they had run into
recently. Particularly juicy predicaments could easily be copied
over into the test environment.

Even without the improved communication it's worth the effort
to do incremental migration. If you do, be prepared to take
advantage of the opportunity to use real data to talk to domain
experts.
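As a rough illustration of migrating with every iteration, here is a minimal Ruby sketch. It assumes the migration is kept as a growing list of steps that is re-run in full against a fresh copy of the live data each iteration. The class name `IncrementalMigration`, the legacy column names `CUST_NM` and `CR_LIM`, and the sample records are all hypothetical and do not come from the project described above.

```ruby
# Hypothetical sketch: one migration step is added per iteration, and the
# whole list is re-run against a fresh snapshot of the live data each time.
class IncrementalMigration
  Step = Struct.new(:iteration, :description, :transform)

  def initialize
    @steps = []
  end

  # Registers the extra data this iteration's functionality needs.
  def add_step(iteration, description, &transform)
    @steps << Step.new(iteration, description, transform)
    self
  end

  # Returns the migrated records plus a list of problems found, so dirty
  # data is reported every iteration instead of surprising us at go-live.
  def run(legacy_records)
    problems = []
    migrated = legacy_records.map do |legacy|
      @steps.each_with_object({}) do |step, target|
        begin
          step.transform.call(legacy, target)
        rescue StandardError => e
          problems << "iteration #{step.iteration} (#{step.description}): " \
                      "#{e.message} in #{legacy.inspect}"
        end
      end
    end
    [migrated, problems]
  end
end

migration = IncrementalMigration.new
migration.add_step(1, "customer names") do |legacy, target|
  target[:name] = legacy.fetch("CUST_NM").strip
end
migration.add_step(2, "credit limits") do |legacy, target|
  target[:credit_limit] = Integer(legacy.fetch("CR_LIM"))
end

# Invented stand-in for a copy of the live data.
live_snapshot = [
  { "CUST_NM" => " Ada ", "CR_LIM" => "500" },
  { "CUST_NM" => "Bob", "CR_LIM" => "n/a" } # dirty value surfaces early
]
migrated, problems = migration.run(live_snapshot)
```

Collecting problems rather than aborting on the first one is a design choice for the sketch: it lets each iteration's run double as a report on how dirty the data is, and the migrated real records are what can be shown to domain experts.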
|
| AgileVersusLean |
agile |
26 June 2008 |
|
I'm thinking of using agile software development - but
should I use Lean software development instead?
This question is one I've run into a few times recently. It's not
a question I can answer quickly as the question is based on a false
premise about the relationship between lean and agile. So first I
need to go into some history to help explain that relationship.

"Lean" fundamentally refers to an approach in the manufacturing
world that was originally developed by Toyota in the 1950s. At this
time Japanese industry was recovering from the ravages of the Second
World War. This approach, often called the Toyota Production System,
is mostly credited to Taiichi Ohno,
although he was influenced by various western thinkers -
particularly Deming. The
Toyota Production System became well known in the rest of the world
in the 1990s when westerners started writing books to explain why
the Japanese were beating the US in so many industries. The western
writers called this approach Lean Manufacturing. Although Japanese
industry in general has run into stickier times since then, Toyota
continues to outperform most western auto companies.

Agile software development is an umbrella term for several
software development methods (including Extreme Programming and
Scrum) that were developed in the 1990s. These methods share a
common philosophy which was described as values and principles in
the Manifesto for Agile Software Development. (My essay "The New
Methodology" goes into this in more depth.)

There was a connection between lean manufacturing and agile
software from the beginning in that many of the developers of the
various agile methods were influenced by the ideas of lean
manufacturing. This connection was made more explicit by Mary and Tom Poppendieck. Mary had
worked in a manufacturing plant that had adopted lean manufacturing
and her husband Tom is an experienced software developer. They have
written a couple of books on the application of Lean ideas in the
software world. When people talk about Lean Software they are
usually referring to the ideas in these books, although others have
been making similar links.

Lean manufacturing and agile software methods have a very similar
philosophy. Both place a lot of stress on adaptive planning and a
people focused approach. As a result lean's ideas fit in very well
with the agile software story. Mary and Tom have both been very
active in the agile community - indeed I'd credit Mary with an
important role in forming the Agile Alliance. (Like me, she was a
founding board member of the Agile Alliance, but she was far more
effective in it than I was.)

I've already mentioned that lean manufacturing was an influence
on agilists from the beginning, and in the last few years more ideas
have appeared in the agile world with a clear lean manufacturing
heritage. These range from concrete techniques like Value Stream
Maps, to articulations of existing agile concepts such as the Last
Responsible Moment for making design decisions. The idea of thinking
of analysis and design documentation as inventory came from the
Poppendiecks. Several agilists I know emphasize the importance of
reducing cycle time - another strongly lean-influenced idea. My
colleague Richard Durnall has a nice summary of how lean knowledge
can blend in with agile thinking.

A particularly strong appeal to many people about lean ideas is
that they provide a way of explaining agile software development to
people - particularly senior people both within IT and senior
customers. This explanation ranges from a crude appeal to emulate
Toyota to detailed discussions of how lean manufacturing's benefits
translate to the ideas in agile software development.

So as you can see, lean and agile are deeply intertwined in the
software world. You can't really talk about them being alternatives:
if you are doing agile you are doing lean and vice-versa. Agile was
always meant as a very broad concept, a core set of values and
principles that was shared by processes that look superficially
different. You don't do agile or lean; you do agile and
lean. The only question is how explicitly you use ideas that draw
directly from lean manufacturing.

The Poppendiecks didn't introduce lean as a separate idea, nor
did they introduce lean as a published process in the style of Scrum
or XP. Rather they introduced lean as a set of thinking tools that
could easily blend in with any agile approach. I think of lean as a
strand of thinking within the agile community, like a pattern in a
rich carpet.

There is a distinct lean software community, as in a mailing list
calling itself lean and people who label themselves as lean
thinkers. But this is no different to the fact that there are also
strong XP, Scrum, and other communities. Most people in these
communities consider themselves part of the broader agile movement
and many people are active in more than one of these agile
communities. The whole point of coining the word 'agile' comes from
a recognition that we share a core set of values and principles and
this common core means what we have in common is greater than our
differences.
|
| SegmentationByFreshness |
design |
24 June 2008 |
|
One of the biggest issues with media websites is dealing with
high amounts of traffic. Media is all about getting eyeballs, but if
you get too many hits at once, slow performance can cause problems
and damage your reputation. This problem is exacerbated by the
bursty nature of this web traffic. You can be cruising along at a
manageable rate, then get hit with a big news story which causes a
big spike. One of our clients has seen spikes of two orders of
magnitude in a matter of a couple of minutes.

The general solution in computing to speed up access to the same
information is to use caches. If you keep requesting my home page
the web server will build up a cache in memory so repeated requests
avoid touching the disk.

It's easy to keep a cache for my website, because this page, like my
entire site, is entirely static. Most media sites, however, contain a
lot of dynamic content. You might not think there's much business
logic on your average newspaper website, but once you start looking
at advertising links, related stories, special features and the
like, things get a good bit more interesting. A travel story about
France might link to articles on French food, and to advertising that
knows that a reader browsing from Canada is interested in a holiday in
the Loire Valley. Personalization makes this even worse: my personalized
preferences should generate a personalized feature list on heavy red
wines. Such logic is complex in its own right, it makes
for a lot of computation with each request, and crucially it ruins
most caching strategies.

The way to deal with this is to divide a page up into segments
where each segment has a similar determination of freshness. The
article on Loire travel can be relatively static, changing only to
correct errors. A related article list which feeds off tags for
"France" and "Loire" will change more often, but maybe only every
few days. If we arrange this properly a request for a page with
these two items may be able to gather everything from caches.

The most common way of doing this that I've seen is to form
caches on the web server and assemble the page segments when the
page gets hit. Tools like Sitemesh are a good option for this
approach. As you write the page for 18th century Loire delights, you
include call-outs for sections like related articles. When you get
the actual web request the web server takes the page and assembles
the page from the separate pieces. Much of this can be cached in the
web server, which avoids hitting the back-end domain logic and database.

An interesting possibility is to go even further and use the many
caches that exist in the web itself. Most calls for this web page
don't even reach my web server since my page gets cached many times
along the way. If you build a web page dynamically and assemble it on
the server, you have to take the hit to deliver the page. An
alternative is to assemble the page on the client and then draw each
segment from its own URL. Each segment could be cached in different
places with different caching policies.

How might this work? We might store the static article content as
XHTML at an URL like
http://gallifreyTimes/travel/18-century-loire-delights. Inside that
file we want to insert some related articles by looking up articles
tagged with "loire" and "france". In the static page we put in a
simple "a" tag.
<a class = "relatedLinks" href = "relatedLinks/france+loire">Related Links</a>
In the header for the static page we link it to some javascript
in a separate library file. When we download the Loire article the
javascript runs and scans the article for elements with the right
class: in this case an "a" element with the "relatedLinks"
class. (The behavior
library is a good way to do this.) When it finds the element it
uses the information in the element to synthesize an URL for that
segment. In this case it would use what's in the element's href
attribute to come up with an URL like
http://gallifreyTimes/relatedArticles/france+loire. Once
it's got that URL it then gets the content and uses it to
replace the original "a" element. Since the related articles list is
handled through an URL, other gets on that URL cause caches through
the Internet to warm up, so there's a good chance that retrieving
the page may never cause a hit on the original server.

This technique of using Javascript to replace a placeholder
element with more content is a form of Progressive
Enhancement. The descriptions I've found for Progressive
Enhancement focus on adding features for accessibility with limited
browsers. This example also has that benefit. If I browse the page with a
browser that has no javascript, I'll get a useful link. The general
idea behind Progressive Enhancement is that the basic page served is
useful on basic browsers, then we use techniques such as javascript
to add in more fancy features.

In the context of caching, the value is that each progressive
enhancement weaves in a lump of HTML with different freshness
rules. The original page is static, the related links change daily,
but both can be cached independently and woven together. I can add
all sorts of additional elements, as long as I take care to
segment the page by the freshness rules. So I could add a
personalized weather forecast based on the user's profile to every
page by having the javascript pick up the user id from the http
session, using it to construct an URL like
http://gallifreyTimes/personalWeather/martinfowler,
retrieving the content (which would often be cached on my hard
drive) and weaving it into the page.
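The core idea of segmenting by freshness can be sketched independently of where the caches live. The Ruby below is a minimal, hedged illustration of server-side assembly: each segment carries its own time to live, and a page is assembled from whatever is still fresh. The names `Segment`, `SegmentCache` and `assemble_page`, the cache keys, and the 30-day and 1-day lifetimes are assumptions made for the example, not part of any tool mentioned above.

```ruby
# A segment is a piece of a page with its own freshness rule (time to live).
# renderer stands in for the back-end domain logic and database work.
Segment = Struct.new(:key, :ttl_seconds, :renderer)

class SegmentCache
  Entry = Struct.new(:html, :expires_at)

  attr_reader :renders

  # clock is injectable so freshness can be demonstrated without waiting.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @entries = {}
    @renders = Hash.new(0) # how often each segment reached the back end
  end

  # Returns cached HTML while it is still fresh, otherwise re-renders it.
  def fetch(segment)
    now = @clock.call
    entry = @entries[segment.key]
    return entry.html if entry && entry.expires_at > now

    @renders[segment.key] += 1
    html = segment.renderer.call
    @entries[segment.key] = Entry.new(html, now + segment.ttl_seconds)
    html
  end
end

# Assembles a page from its segments, each with independent freshness.
def assemble_page(cache, segments)
  segments.map { |segment| cache.fetch(segment) }.join("\n")
end

day = 24 * 3600
now = Time.at(0)
cache = SegmentCache.new(clock: -> { now })

# Assumed lifetimes: the article rarely changes, related links change daily.
article = Segment.new("travel/18-century-loire-delights", 30 * day,
                      -> { "<div>Loire article</div>" })
related = Segment.new("relatedArticles/france+loire", day,
                      -> { "<ul>related links</ul>" })

assemble_page(cache, [article, related])
now += 2 * day # two days later only the related list has gone stale
assemble_page(cache, [article, related])
# cache.renders now shows the article rendered once, the related list twice
```

The client-side variant described above pushes the same per-segment lifetimes out to caches across the web by giving each segment its own URL; the sketch only shows why independent freshness rules keep most requests away from the back end.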
|
| SyntacticNoise |
dsl |
9 June 2008 |
|
A common phrase that's bandied about when talking about
DomainSpecificLanguages (or indeed any computer language) is that of
noisy syntax. People may say that Ruby is less noisy than Java, or
that external DSLs are less noisy than internal DSLs. By Syntactic
Noise, what people mean is extraneous characters that aren't part of
what we really need to say, but are there to satisfy the language
definition. Noise characters are bad because they obscure the meaning
of our program, forcing us to puzzle out what it's doing.

Like many concepts, syntactic noise is both loose and subjective,
which makes it hard to talk about. A while ago Gilad Bracha tried to
illustrate his perception of syntactic noise during a talk at
JAOO. Here I'm going to have a go at a similar approach and apply it
to several formulations of a DSL that I'm using in the current
introduction of my DSL book. (I'm using a subset of the example state
machine, to keep the text a reasonable size.)

In his talk he illustrated noise by coloring what he considered to
be noise characters. A problem with this, of course, is that this requires
us to define what we mean by noise characters. I'm going to side-step
that and make a different distinction. I'll distinguish between what
I'll call domain text and punctuation. The DSL scripts I'm looking at
define a state machine, and thus talk about states, events, and
commands. Anything that describes information about my particular
state machine - such as the names of states - I'll define as domain
text. Anything else is punctuation and I'll highlight the latter in
red.

I'll start with the custom syntax of an external DSL.

events
  doorClosed D1CL
  drawOpened D2OP
  lightOn L1ON
end

commands
  unlockDoor D1UL
  lockPanel PNLK
end

state idle
  actions {unlockDoor lockPanel}
  doorClosed => active
end

state active
  drawOpened => waitingForLight
  lightOn => waitingForDraw
end

A custom syntax tends to minimize noise, so as a result you see
a relatively small amount of punctuation here. This text also makes
clear that we need some punctuation. Both events and commands are
defined by giving their name and their code - you need the punctuation
in order to tell them apart. So punctuation isn't the same as noise; I
would say that the wrong kind of punctuation is noise, or too much
punctuation is noise. In particular I don't think it's a good idea to
try to reduce punctuation to the absolute minimum; too little
punctuation also makes a DSL harder to comprehend.

Let's now look at an internal DSL for the same domain information
in Ruby.

event :doorClosed, "D1CL"
event :drawOpened, "D2OP"
event :lightOn, "L1ON"

command :lockPanel, "PNLK"
command :unlockDoor, "D1UL"

state :idle do
  actions :unlockDoor, :lockPanel
  transitions :doorClosed => :active
end

state :active do
  transitions :drawOpened => :waitingForLight,
              :lightOn => :waitingForDraw
end

Now we see a lot more punctuation. Certainly I could have made some
choices in my DSL to reduce punctuation, but I think most people would
still agree that a Ruby DSL has more punctuation than a custom
one. The noise here, at least for me, is the little things: the ":" to
mark a symbol, the "," to separate arguments, the '"' to quote
strings.

One of the main themes in my DSL thinking is that a DSL is a way to
populate a framework. In this case the framework is one that describes
state machines. As well as populating a framework with a DSL you can
also do it with a regular push-button API. Let's color the punctuation
on that.

Event doorClosed = new Event("doorClosed", "D1CL");
Event drawOpened = new Event("drawOpened", "D2OP");
Event lightOn = new Event("lightOn", "L1ON");

Command lockPanelCmd = new Command("lockPanel", "PNLK");
Command unlockDoorCmd = new Command("unlockDoor", "D1UL");

State idle = new State("idle");
State activeState = new State("active");

StateMachine machine = new StateMachine(idle);

idle.addTransition(doorClosed, activeState);
idle.addCommand(unlockDoorCmd);
idle.addCommand(lockPanelCmd);

activeState.addTransition(drawOpened, waitingForLightState);
activeState.addTransition(lightOn, waitingForDrawState);

Here's a lot more punctuation: all sorts of quotes and brackets as
well as method keywords and local variable declarations. The latter
present an interesting classification question. I've counted the
declaring of a local variable as punctuation (as it duplicates the
name) but its later use as domain text.

Java can also be written in a fluent way, so here's the fluent
version from the book.

Events doorClosed, drawOpened, lightOn;
Commands lockPanel, unlockDoor;
States idle, active;

protected void defineStateMachine() {
  doorClosed.code("D1CL");
  drawOpened.code("D2OP");
  lightOn.code("L1ON");

  lockPanel.code("PNLK");
  unlockDoor.code("D1UL");

  idle
    .actions(unlockDoor, lockPanel)
    .transition(doorClosed).to(active)
    ;

  active
    .transition(drawOpened).to(waitingForLight)
    .transition(lightOn).to(waitingForDraw)
    ;
}
 
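To make the idea of a DSL populating a framework more concrete, here is a minimal Ruby sketch of how the internal Ruby DSL shown earlier might fill in a state machine model. The `event`, `command`, `state`, `actions` and `transitions` vocabulary comes from that script; the classes `StateMachine`, `State` and `StateMachineBuilder`, and the use of `instance_eval`, are assumptions for illustration rather than the implementation from the book.

```ruby
# Model objects: they know nothing about the DSL syntax that populates them.
Event = Struct.new(:name, :code)
Command = Struct.new(:name, :code)

class State
  attr_reader :name, :actions, :transitions

  def initialize(name)
    @name = name
    @actions = []      # Command objects attached to this state
    @transitions = {}  # event name => target state name
  end
end

class StateMachine
  attr_reader :events, :commands, :states

  def initialize
    @events = {}
    @commands = {}
    @states = {}
  end
end

# Builder: its public methods are the DSL vocabulary.
class StateMachineBuilder
  attr_reader :machine

  # Evaluates the script with the builder as self, so bare calls such as
  # `event :doorClosed, "D1CL"` land on the methods below.
  def self.build(&script)
    builder = new
    builder.instance_eval(&script)
    builder.machine
  end

  def initialize
    @machine = StateMachine.new
    @current_state = nil
  end

  def event(name, code)
    @machine.events[name] = Event.new(name, code)
  end

  def command(name, code)
    @machine.commands[name] = Command.new(name, code)
  end

  def state(name, &body)
    @current_state = @machine.states[name] = State.new(name)
    instance_eval(&body)
  ensure
    @current_state = nil
  end

  # Looks commands up by name; fetch raises if a command was never declared.
  def actions(*command_names)
    command_names.each do |command_name|
      @current_state.actions << @machine.commands.fetch(command_name)
    end
  end

  def transitions(pairs)
    @current_state.transitions.merge!(pairs)
  end
end

DOOR_MACHINE = StateMachineBuilder.build do
  event :doorClosed, "D1CL"
  command :unlockDoor, "D1UL"
  command :lockPanel, "PNLK"

  state :idle do
    actions :unlockDoor, :lockPanel
    transitions :doorClosed => :active
  end
end
```

Transition targets are stored as plain names here so that a state can refer to one declared later in the script; resolving them to `State` objects would be a separate pass in a fuller version.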
Whenever two or three are gathered together to talk about syntactic
noise, XML is bound to come up.

<stateMachine start = "idle">
  <event name="doorClosed" code="D1CL"/>
  <event name="drawOpened" code="D2OP"/>
  <event name="lightOn" code="L1ON"/>

  <command name="lockPanel" code="PNLK"/>
  <command name="unlockDoor" code="D1UL"/>

  <state name="idle">
    <transition event="doorClosed" target="active"/>
    <action command="unlockDoor"/>
    <action command="lockPanel"/>
  </state>

  <state name="active">
    <transition event="drawOpened" target="waitingForLight"/>
    <transition event="lightOn" target="waitingForDraw"/>
  </state>
</stateMachine>

I don't think we can read too much into this particular example,
but it does provide some food for thought. Although I don't think we
can make a rigorous separation between useful punctuation and noise,
the distinction between domain text and punctuation can help us focus
on the punctuation and consider what punctuation serves us best. And I
might add that having more characters of punctuation than you
do of domain text in a DSL is a smell. (Mikael Jansson has put out a Lisp
version of this example. Mihailo Lalevic did one in JavaScript.)
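The comparison of punctuation to domain text can be approximated mechanically. The Ruby sketch below is a rough illustration under stated assumptions: domain text is taken to be any token that exactly matches a known name or code from the example machine, everything else that isn't whitespace counts as punctuation, and characters are simply counted. The helper `punctuation_profile` and the shortened sample scripts are made up for the example, and the sketch does not handle the subtler case noted above where a declared Java variable counts as punctuation but its later use counts as domain text.

```ruby
require "set"

# Domain text for the example machine: names of events, commands and states,
# plus their codes. Any other non-whitespace character is punctuation.
DOMAIN_WORDS = Set.new(%w[
  doorClosed drawOpened lightOn unlockDoor lockPanel
  idle active waitingForLight waitingForDraw
  D1CL D2OP L1ON D1UL PNLK
]).freeze

# Returns [domain_chars, punctuation_chars] for a script.
def punctuation_profile(script, domain_words = DOMAIN_WORDS)
  domain = 0
  punctuation = 0
  # Tokens are whole words (names, keywords) or single symbol characters.
  script.scan(/\w+|[^\w\s]/) do |token|
    if domain_words.include?(token)
      domain += token.length
    else
      punctuation += token.length
    end
  end
  [domain, punctuation]
end

# Shortened versions of the custom-syntax and XML scripts from this entry.
CUSTOM_SCRIPT = <<~SCRIPT
  events
    doorClosed D1CL
  end
  state idle
    doorClosed => active
  end
SCRIPT

XML_SCRIPT = <<~SCRIPT
  <stateMachine start="idle">
    <event name="doorClosed" code="D1CL"/>
    <state name="idle">
      <transition event="doorClosed" target="active"/>
    </state>
  </stateMachine>
SCRIPT

[["custom", CUSTOM_SCRIPT], ["xml", XML_SCRIPT]].each do |label, script|
  domain, punctuation = punctuation_profile(script)
  smell = punctuation > domain ? " (more punctuation than domain text)" : ""
  puts "#{label}: domain=#{domain} punctuation=#{punctuation}#{smell}"
end
```

Treating keywords such as `events` and `state` as punctuation follows the definition used in this entry, where anything that does not describe the particular state machine is punctuation; that does not make them noise.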
|