
Martin Fowler's Bliki

A cross between a blog and wiki of my partly-formed ideas on software development


UpcomingTalks writing 18 July 2008

On July 28th I'll be doing a webinar with my colleague Jez Humble on "Lean Release Management". Jez spent several years as one of our build-dudes helping our clients get their builds in shape. Recently he's been working on Cruise, our new Continuous Integration tool which concentrates on helping people with managing staged builds.

We'll be talking about how we see principles from Lean applying to release management, the practices they lead to, and how the new Cruise product can support those practices. While we'll talk about the new Cruise, we'll also spend some time on general practices - so we hope it will be more than a regular sales pitch.


MDSDandDSL dsl 14 July 2008

What is the connection between ModelDrivenSoftwareDevelopment (MDSD) and DomainSpecificLanguages (DSLs)?

It's pretty common to see the term "DSL" crop up in the context of MDSD. Indeed some MDSD people seem to think that DSLs only exist within the MDSD world. I've been writing a lot on DSLs recently for my book, but so far I haven't really touched on the MDSD angle much. Instead I've concentrated on DSLs' role in more conventional programming. DSLs exist in both the textual language and MDSD worlds and play pretty much the same role for both.

In an MDSD context a DSL is again a language targeted at a specific kind of problem, as opposed to a general purpose language such as the UML. As a result they can have the same kind of relationship: build a system in the general purpose modeling language and use DSLs for various specific aspects. Since MDSD hasn't caught on that much, however, you also see a different approach where modeling DSLs are used in the context of a traditional language environment. Here you might use several modeling DSLs that generate Java code to be combined in a Java project. In this case there's no general purpose MDSD model around - you use MDSD for each DSL relatively independently.

In order to use model-oriented DSLs you need a different, RepositoryBasedCode, approach to tooling. This introduces quite a few pragmatic issues as the general support environment for such tools is less established. In order to define your own DSLs you need more specialized tooling - something I call a Language Workbench.

DSLs seem to have a proportionately higher emphasis in the MDSD world than they do in the mainstream programming world. Cynics think this is a result of the MDSD community desperately searching for a way to remain relevant; fans of MDSD regard it as a sign of MDSD's superior sophistication. I think this is mainly due to the fact that the MDSD community is smaller and has far less in the form of established practice.

A particularly visible sub-community of MDSD is centered around ModelDrivenArchitecture (MDA). I'm not much of a fan of MDA in particular, and am especially skeptical of MDA DSLs.

There is much that model-oriented DSLs share with textual DSLs. I put a lot of emphasis with textual DSLs on basing work around a Semantic Model. MDSD, as its name indicates, is very much about driving a system from that kind of a model. A difference is that most MDSD people assume that you'll want to generate code from that model rather than executing the model directly.
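To make that difference concrete, here is a minimal Ruby sketch, not taken from any MDSD tool, in which the same Semantic Model is either executed directly or used to generate code. The model borrows state and event names from the state machine example in SyntacticNoise; the hash layout and the method names interpret, generate, and generated_next_state are assumptions made purely for illustration.

```ruby
# Hypothetical semantic model: state => { event => target state }.
model = {
  idle:   { doorClosed: :active },
  active: { drawOpened: :waitingForLight, lightOn: :waitingForDraw }
}

# Option 1: execute the model directly by looking transitions up at run time.
# Unknown events leave the state unchanged.
def interpret(model, state, event)
  model.fetch(state, {}).fetch(event, state)
end

# Option 2: generate source text from the model, to be compiled or loaded later.
def generate(model)
  lines = ["def generated_next_state(state, event)", "  case [state, event]"]
  model.each do |state, transitions|
    transitions.each do |event, target|
      lines << "  when [#{state.inspect}, #{event.inspect}] then #{target.inspect}"
    end
  end
  lines << "  else state" << "  end" << "end"
  lines.join("\n")
end

source = generate(model)
eval(source) # stands in for a separate compile step

direct    = interpret(model, :idle, :doorClosed)
generated = generated_next_state(:idle, :doorClosed)
```

Both routes give the same answer here. What changes is when the model is consulted: at run time in the first case, at generation time in the second.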

As I write this, I'm not sure how much I'm going to cover language workbenches in my book. Certainly I'll at least discuss the overall concept behind them, but the coverage may not be that deep. This will be partly due to the large amount of material I seem to be generating on textual DSLs and partly due to the fact that language workbenches are much newer and thus more volatile and less mature.


ModelDrivenSoftwareDevelopment design 14 July 2008

Model Driven Software Development (MDSD) is a style of software development that considers itself as an alternative to the traditional style of programming. The approach centers itself on building models of a software system. These models are typically made manifest through diagrammatic design notations - the UML is one option. The idea is that you use these diagrams to specify your system to a modeling tool and then you generate code in a conventional programming language.

The MDSD vision evolved from the development of graphical design notations and CASE tools. Proponents of these techniques saw graphical design notations as a way to raise the abstraction level above programming languages - thus improving development productivity. While these techniques and tools never caught on too far, the basic core ideas still live on and there is an ongoing community of people still developing them.

Although I've been involved, to some extent, in MDSD for most of my career, I'm rather skeptical of its future. Most fans of MDSD base their enthusiasm on the belief that models are ipso facto a higher level abstraction than programming languages. I don't agree with that argument - sometimes graphical notations can be a better abstraction, but not always - it depends on the specific cases. Furthermore, to use MDSD you need tools that support RepositoryBasedCode, and these tools currently introduce a number of pragmatic issues in tooling - of which source control is the canonical example.

MDSD is surrounded by a terminological mess. One particular vision of MDSD is ModelDrivenArchitecture (MDA) which is an OMG initiative based on the UML. Many people in the MDSD community, however, don't think that MDA or UML is the right vision for MDSD. For a long time I would hear people talking about Model Driven Development (MDD) as the general concept and MDA as the OMG's specific vision. However the OMG has trademarks on several "Model Driven *" and "Model Based *" phrases - including MDD. As a consequence people have to be careful about what model driven phrase they use. I'm using MDSD as that is the title of a useful book on the topic.


IncrementalMigration agile 7 July 2008

Like any profession, software development has its share of oft-forgotten activities that are usually ignored but have a habit of biting back at just the wrong moment. One of these is data migration.

Most new software projects involve data that's lived somewhere else and now needs to be moved into the new system once it's live. A system replacement might have to move all the old data; new functionality may lead to data being loaded from some other system.

It's common to not take this task very seriously. After all, it's just reading some data, munging it a bit, and loading it into the new system. Furthermore the code only ever has to be run once, so there's no point making it particularly fast or pretty. Once the migration is done the code can be safely chucked away.

And of course there's no need to worry about it till the end of the project since you only want to run the migration just before the new system goes live.

I have a high opinion of my readers, if only for their taste in software writing, so I'm sure I can see the wistful smiles. Data migration often looks easy from the safety of whiteboard abstractions, but is usually full of nasty details to trip you up.

  • You may suspect that the existing data is somewhat messy, but everyone is usually taken aback at how dirty the data really is. As a result the whole exercise is often far more complicated than it ought to be.
  • Because it's single-use, throw-away code, people don't tend to put much design effort into migration code, assuming it's below the DesignPayoffLine. That assumption is often wrong, especially given the previous bullet point.
  • Doing an activity that balloons into something harder than you think is never fun, but when you leave it till close to the ship date you're offering trouble a big signing bonus.

There's a soundbite I like to use in an agile context: if it hurts do it more often. Its surface illogicality makes it memorable, and there's a real truth in there. Many difficult activities can be made much more straightforward by doing them more frequently. XPers are particularly well known for applying this principle to testing, integration, design, and planning - so it shouldn't surprise anyone to see it applied to data migration.

I first saw this done by my colleague Josh Mackenzie on a moderately sized project (dozen developers for one year) with two failed attempts in its recent past. He decided he would migrate data with every two-week iteration. Each iteration the team figured out what data they needed to add to support the new functionality that was being built and updated the data migration system to migrate that extra data from the live system.

As is often the case with these things it ended up being much less impossible than people feared and the resulting reduction of risk and stress made it a worthwhile choice. They appreciated the obvious benefits, which boiled down to a distinct lack of hasty panic close to going live.

The most interesting benefit, however, was the one they didn't expect. Incremental migration made a significant improvement in communication with the domain experts. Usually when you want to talk about use cases with domain experts, you make up some pretend scenario. By using incremental data migration the team got into the habit of using real examples, which were much easier for the domain experts to relate to. Furthermore when the development team made builds available for the domain experts to look at, those builds included a copy of the live data. As a result the domain experts could investigate how the new system worked with tricky cases they had run into recently. Particularly juicy predicaments could easily be copied over into the test environment.

Even without the improved communication it's worth the effort to do incremental migration. If you do, be prepared to take advantage of the opportunity to use real data to talk to domain experts.
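One way to picture the per-iteration routine is as a mapping table that grows a little each iteration and is re-run against a copy of the live data every time. The Ruby sketch below is only an illustration of that shape. The legacy field names, the cleaning rules, and the migrate method are invented for the example and do not describe the project above.

```ruby
# Hypothetical mapping table: legacy field => [new field, cleaning rule].
# Each iteration adds the fields its new functionality needs.
mappings = {
  "CUST_NM"    => [:name,  ->(v) { v.to_s.strip }],
  "CUST_EMAIL" => [:email, ->(v) { v.to_s.strip.downcase }]
}

# Runs the whole migration; rows that clean up to an empty value are set
# aside so dirty data shows up every iteration rather than at go-live.
def migrate(legacy_rows, mappings)
  migrated = []
  rejected = []
  legacy_rows.each do |row|
    record = mappings.each_with_object({}) do |(legacy_field, (target, clean)), out|
      out[target] = clean.call(row[legacy_field])
    end
    if record.values.any?(&:empty?)
      rejected << row
    else
      migrated << record
    end
  end
  [migrated, rejected]
end

live_copy = [
  { "CUST_NM" => " Ada Lovelace ", "CUST_EMAIL" => "ADA@Example.com" },
  { "CUST_NM" => "",               "CUST_EMAIL" => nil }
]

migrated, rejected = migrate(live_copy, mappings)
```

The rejected list is the useful part. It turns "how dirty is the data?" into something the team and the domain experts can look at after every iteration.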


AgileVersusLean agile 26 June 2008

I'm thinking of using agile software development - but should I use Lean software development instead?

This question is one I've run into a few times recently. It's not a question I can answer quickly as the question is based on a false premise about the relationship between lean and agile. So first I need to go into some history to help explain that relationship.

"Lean" fundamentally refers to an approach in the manufacturing world that was originally developed by Toyota in the 1950s. At this time Japanese industry was recovering from the ravages of the second world war. This approach, often called the Toyota Production System, is mostly credited to Taiichi Ohno, although he was influenced by various western thinkers - particularly Deming. The Toyota Production System became well known in the rest of the world in the 1990s when westerners started writing books to explain why the Japanese were beating the US in so many industries. The western writers called this approach Lean Manufacturing. Although Japanese industry in general has run into stickier times since then, Toyota continues to outperform most western auto companies.

Agile software development is an umbrella term for several software development methods (including Extreme Programming and Scrum) that were developed in the 1990s. These methods share a common philosophy which was described as values and principles in the Manifesto for Agile Software Development. (My essay "The New Methodology" goes into this in more depth.)

There was a connection between lean manufacturing and agile software from the beginning in that many of the developers of the various agile methods were influenced by the ideas of lean manufacturing. This connection was made more explicit by Mary and Tom Poppendieck. Mary had worked in a manufacturing plant that had adopted lean manufacturing and her husband Tom is an experienced software developer. They have written a couple of books on the application of Lean ideas in the software world. When people talk about Lean Software they are usually referring to the ideas in these books, although others have been making similar links.

Lean manufacturing and agile software methods have a very similar philosophy. Both place a lot of stress on adaptive planning and a people focused approach. As a result lean's ideas fit in very well with the agile software story. Mary and Tom have both been very active in the agile community - indeed I'd credit Mary with an important role in forming the Agile Alliance. (Like me, she was a founding board member of the Agile Alliance, but she was far more effective in it than I was.)

I've already mentioned that lean manufacturing was an influence on agilists from the beginning, and in the last few years more ideas have appeared in the agile world with a clear lean manufacturing heritage. These range from concrete techniques like Value Stream Maps, to articulations of existing agile concepts such as the Last Responsible Moment for making design decisions. The idea of thinking of analysis and design documentation as inventory came from the Poppendiecks. Several agilists I know emphasize the importance of reducing cycle time - another strongly lean-influenced idea. My colleague Richard Durnall has a nice summary of how lean knowledge can blend in with agile thinking.

A particularly strong appeal to many people about lean ideas is that they provide a way of explaining agile software development to people - particularly senior people both within IT and senior customers. This explanation ranges from a crude appeal to emulate Toyota to detailed discussions of how lean manufacturing's benefits translate to the ideas in agile software development.

So as you can see, lean and agile are deeply intertwined in the software world. You can't really talk about them being alternatives; if you are doing agile you are doing lean and vice-versa. Agile was always meant as a very broad concept, a core set of values and principles that was shared by processes that look superficially different. You don't do agile or lean, you do agile and lean. The only question is how explicitly you use ideas that draw directly from lean manufacturing.

The Poppendiecks didn't introduce lean as a separate idea, nor did they introduce lean as a published process in the style of Scrum or XP. Rather they introduced lean as a set of thinking tools that could easily blend in with any agile approach. I think of lean as a strand of thinking within the agile community, like a pattern in a rich carpet.

There is a distinct lean software community, as in a mailing list calling itself lean and people who label themselves as lean thinkers. But this is no different to the fact that there are also strong XP, Scrum, and other communities. Most people in these communities consider themselves part of the broader agile movement and many people are active in more than one of these agile communities. The whole point of coining the word 'agile' comes from a recognition that we share a core set of values and principles and this common core means what we have in common is greater than our differences.


SegmentationByFreshness design 24 June 2008

One of the biggest issues with media websites is dealing with high amounts of traffic. Media is all about getting eyeballs, but if you get too many hits at once, slow performance can cause problems and damage your reputation. This problem is exacerbated by the bursty nature of this web traffic. You can be cruising along at a manageable rate, then get hit with a big news story which causes a big spike. One of our clients has seen spikes of two orders of magnitude in a matter of a couple of minutes.

The general solution in computing to speed up access to the same information is to use caches. If you keep requesting my home page the web server will build up a cache in memory so repeated requests avoid touching the disk.

It's easy to keep a cache for my website, because this page, like my entire site, is entirely static. Most media sites, however, contain a lot of dynamic content. You might not think there's much business logic on your average newspaper website, but once you start looking at advertising links, related stories, special features and the like, things get a good bit more interesting. A travel story to France might link to articles on French food, and advertising that knows that a web browser in Canada is interested in a holiday in the Loire Valley. Personalization makes this even worse: my personalized preferences should generate a personalized feature list on heavy red wines. Such logic is complex in its own right, it makes for a lot of computation with each request, and crucially it ruins most caching strategies.

The way to deal with this is to divide a page up into segments where each segment has a similar determination of freshness. The article on Loire travel can be relatively static, changing only to correct errors. A related article list which feeds off tags for "France" and "Loire" will change more often, but maybe only every few days. If we arrange this properly a request for a page with these two items may be able to gather everything from caches.
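As a rough sketch of the idea, each segment can be cached under its own key with its own time-to-live, so a page request only recomputes the segments that have gone stale. The Ruby below is an illustration under stated assumptions: the SegmentCache class, the injected clock, the segment keys, and the 30-day and 3-day lifetimes are all hypothetical choices, not a description of any particular site.

```ruby
# Hypothetical segment cache: every segment is stored with its own time-to-live.
class SegmentCache
  Entry = Struct.new(:content, :expires_at)

  # The clock is injected so the example can move time forward by hand.
  def initialize(clock = -> { Time.now.to_i })
    @clock = clock
    @entries = {}
  end

  # Returns the cached segment while it is fresh; otherwise runs the block
  # (the expensive domain logic) and keeps the result for ttl seconds.
  def fetch(key, ttl)
    now = @clock.call
    entry = @entries[key]
    return entry.content if entry && entry.expires_at > now

    content = yield
    @entries[key] = Entry.new(content, now + ttl)
    content
  end
end

day = 24 * 3600
now = 0
cache = SegmentCache.new(-> { now })
renders = Hash.new(0) # counts how often each segment is really rebuilt

render_page = lambda do
  article = cache.fetch("article/loire", 30 * day) do
    renders[:article] += 1
    "<article>Loire</article>"
  end
  related = cache.fetch("related/france+loire", 3 * day) do
    renders[:related] += 1
    "<ul>related</ul>"
  end
  article + related
end

render_page.call   # first hit builds both segments
now += 4 * day     # four days later only the related list has expired
render_page.call
```

After the second request the article has still been built only once, while the related list has been rebuilt, which is the payoff of segmenting by freshness.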

The most common way of doing this that I've seen is to form caches on the web server and assemble the page segments when the page gets hit. Tools like Sitemesh are a good option for this approach. As you write the page for 18th century Loire delights, you include call-outs for sections like related articles. When you get the actual web request the web server takes the page and assembles the page from the separate pieces. Much of this can be cached in the web server, which avoids hitting the back-end domain logic and database.

An interesting possibility is to go even further and use the many caches that exist in the web itself. Most calls for this web page don't even reach my web server since my page gets cached many times along the way. If you build a web page dynamically and assemble it on the server, you have to take the hit to deliver the page. An alternative is to assemble the page on the client and then draw each segment from its own URL. Each segment could be cached in different places with different caching policies.

How might this work? We might store the static article content as XHTML at a URL like http://gallifreyTimes/travel/18-century-loire-delights. Inside that file we want to insert some related articles by looking up articles tagged with "loire" and "france". In the static page we put in a simple "a" tag.

  <a class = "relatedLinks" href = "relatedLinks/france+loire">Related Links</a>

In the header for the static page we link it to some JavaScript in a separate library file. When we download the Loire article the JavaScript runs and scans the article for elements with the right class: in this case an "a" element with the "relatedLinks" class. (The behavior library is a good way to do this.) When it finds the element it uses the information in the element to synthesize a URL for that segment. In this case it would use what's in the element's href attribute to come up with a URL like http://gallifreyTimes/relatedArticles/france+loire. Once it's got that URL it then gets the content and uses it to replace the original "a" element. Since the related articles list is handled through a URL, other gets on that URL cause caches through the Internet to warm up, so there's a good chance that retrieving the page may never cause a hit on the original server.

This technique of using JavaScript to replace a placeholder element with more content is a form of Progressive Enhancement. The descriptions I've found for Progressive Enhancement focus on adding features for accessibility with limited browsers. This example also has that benefit. If I browse the page with a browser that has no JavaScript, I'll get a useful link. The general idea behind Progressive Enhancement is that the basic page served is useful on basic browsers, then we use techniques such as JavaScript to add in more fancy features.
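In the scheme described here the weaving runs as JavaScript in the browser. The Ruby sketch below only models the weaving step itself, so the logic can be read in one place: find each placeholder "a" element, build the segment URL from its href, fetch the segment, and fall back to the original link when nothing comes back. The weave_segments method, the fetch callable (standing in for an HTTP GET that may be answered by any cache along the way), and the rule of simply appending the href to a base URL are all assumptions for illustration.

```ruby
# Replaces every relatedLinks placeholder in the page with its fetched segment.
# fetch is any callable taking a URL and returning HTML, or nil when unavailable.
def weave_segments(page, base_url, fetch)
  placeholder = /<a\s+class\s*=\s*"relatedLinks"\s+href\s*=\s*"([^"]+)"\s*>.*?<\/a>/m
  page.gsub(placeholder) do
    match = Regexp.last_match
    segment = fetch.call("#{base_url}/#{match[1]}")
    # Progressive enhancement: keep the plain link if the segment is missing.
    segment || match[0]
  end
end

page = '<p>Loire delights</p>' \
       '<a class = "relatedLinks" href = "relatedLinks/france+loire">Related Links</a>'

# A hash stands in for the web and its caches in this example.
fake_web = {
  "http://gallifreyTimes/relatedLinks/france+loire" => "<ul><li>French food</li></ul>"
}
fetch = ->(url) { fake_web[url] }

woven     = weave_segments(page, "http://gallifreyTimes", fetch)
unchanged = weave_segments(page, "http://elsewhere", fetch)
```

The second call shows the fallback. When the segment cannot be retrieved, the reader still gets the ordinary link, just as a browser without JavaScript would.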

In the context of caching, the value is that each progressive enhancement weaves in a lump of HTML with different freshness rules. The original page is static, the related links change daily, but both can be cached independently and woven together. I can add all sorts of additional elements, as long as I take care to keep segmenting the page by the freshness rules. So I could add a personalized weather forecast based on the user's profile to every page by having the JavaScript pick up the user id from the http session, using it to construct a URL like http://gallifreyTimes/personalWeather/martinfowler, retrieving the content (which would often be cached on my hard drive) and weaving it into the page.


SyntacticNoise dsl 9 June 2008

A common phrase that's bandied about when talking about DomainSpecificLanguages (or indeed any computer language) is that of noisy syntax. People may say that Ruby is less noisy than Java, or that external DSLs are less noisy than internal DSLs. By Syntactic Noise, what people mean is extraneous characters that aren't part of what we really need to say, but are there to satisfy the language definition. Noise characters are bad because they obscure the meaning of our program, forcing us to puzzle out what it's doing.

Like many concepts, syntactic noise is both loose and subjective, which makes it hard to talk about. A while ago Gilad Bracha tried to illustrate his perception of syntactic noise during a talk at JAOO. Here I'm going to have a go at a similar approach and apply it to several formulations of a DSL that I'm using in the current introduction of my DSL book. (I'm using a subset of the example state machine, to keep the text a reasonable size.)

In his talk he illustrated noise by coloring what he considered to be noise characters. A problem with this, of course, is that it requires us to define what we mean by noise characters. I'm going to side-step that and make a different distinction. I'll distinguish between what I'll call domain text and punctuation. The DSL scripts I'm looking at define a state machine, and thus talk about states, events, and commands. Anything that describes information about my particular state machine - such as the names of states - I'll define as domain text. Anything else is punctuation and I'll highlight the latter in red.

I'll start with the custom syntax of an external DSL.

events
  doorClosed  D1CL
  drawOpened  D2OP
  lightOn     L1ON
end
  
commands
  unlockDoor D1UL
  lockPanel   PNLK
end
   
state idle
  actions {unlockDoor lockPanel}
  doorClosed => active
end
   
state active
  drawOpened => waitingForLight
  lightOn    => waitingForDraw
end

A custom syntax tends to minimize noise, so as a result you see a relatively small amount of punctuation here. This text also makes clear that we need some punctuation. Both events and commands are defined by giving their name and their code - you need the punctuation in order to tell them apart. So punctuation isn't the same as noise; I would say that the wrong kind of punctuation is noise, or too much punctuation is noise. In particular I don't think it's a good idea to try to reduce punctuation to the absolute minimum; too little punctuation also makes a DSL harder to comprehend.
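A small parser shows why that punctuation earns its keep. In the Ruby sketch below, which is a deliberately naive line-based reader and not the parser from the book, the keywords events, commands, state, and end are the only thing telling the parser whether a line like "doorClosed D1CL" declares an event or a command. The parse method and the shape of the resulting hash are assumptions for illustration.

```ruby
# Naive line-oriented reader for the custom syntax; names are illustrative only.
def parse(script)
  model = { events: {}, commands: {}, states: {} }
  section = nil
  state = nil
  script.each_line do |line|
    words = line.split
    next if words.empty?

    case words.first
    when "events", "commands"
      section = words.first.to_sym          # punctuation picks the section
    when "state"
      section = :state
      state = model[:states][words[1].to_sym] = { actions: [], transitions: {} }
    when "end"
      section = state = nil
    when "actions"
      state[:actions] = line[/\{(.*)\}/, 1].split.map(&:to_sym)
    else
      raise ArgumentError, "unexpected line: #{line.strip}" if section.nil?
      if section == :state
        event, _arrow, target = words       # e.g. doorClosed => active
        state[:transitions][event.to_sym] = target.to_sym
      else
        model[section][words[0].to_sym] = words[1]
      end
    end
  end
  model
end

script = <<~DSL
  events
    doorClosed  D1CL
    drawOpened  D2OP
    lightOn     L1ON
  end

  commands
    unlockDoor  D1UL
    lockPanel   PNLK
  end

  state idle
    actions {unlockDoor lockPanel}
    doorClosed => active
  end

  state active
    drawOpened => waitingForLight
    lightOn    => waitingForDraw
  end
DSL

model = parse(script)
```

Strip out the section keywords and the name and code pairs become ambiguous, which is the sense in which some punctuation is necessary rather than noise.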

Let's now look at an internal DSL for the same domain information in Ruby.

event :doorClosed, "D1CL"  
event :drawOpened,  "D2OP"  
event :lightOn, "L1ON"  

command  :lockPanel,   "PNLK" 
command  :unlockDoor,  "D1UL" 

state :idle do 
  actions :unlockDoor, :lockPanel
  transitions :doorClosed => :active
end 

state :active do 
  transitions :drawOpened => :waitingForLight, 
              :lightOn => :waitingForDraw
end 

Now we see a lot more punctuation. Certainly I could have made some choices in my DSL to reduce punctuation, but I think most people would still agree that a Ruby DSL has more punctuation than a custom one. The noise here, at least for me, is the little things: the ":" to mark a symbol, the "," to separate arguments, the '"' to quote strings.
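For readers curious what has to sit behind such a script, here is a minimal sketch of a builder that would let the Ruby text above run as written. It is an illustration under assumptions rather than the code from the book: the StateMachineBuilder class, its build method, and the hashes it fills are hypothetical, and it leans on instance_eval so the bare words event, command, state, actions, and transitions resolve to builder methods.

```ruby
# Hypothetical builder that the internal DSL script populates.
class StateMachineBuilder
  attr_reader :events, :commands, :states

  def initialize
    @events = {}     # event name => code
    @commands = {}   # command name => code
    @states = {}     # state name => { actions: [...], transitions: {...} }
    @current = nil   # state whose block is being evaluated
  end

  # Evaluates the DSL block with the builder as self.
  def self.build(&script)
    builder = new
    builder.instance_eval(&script)
    builder
  end

  def event(name, code)
    @events[name] = code
  end

  def command(name, code)
    @commands[name] = code
  end

  def state(name, &body)
    @current = @states[name] = { actions: [], transitions: {} }
    instance_eval(&body)
  ensure
    @current = nil
  end

  def actions(*names)
    @current[:actions].concat(names)
  end

  def transitions(mapping)
    @current[:transitions].merge!(mapping)
  end
end

machine = StateMachineBuilder.build do
  event :doorClosed, "D1CL"
  event :drawOpened, "D2OP"
  event :lightOn, "L1ON"

  command :lockPanel, "PNLK"
  command :unlockDoor, "D1UL"

  state :idle do
    actions :unlockDoor, :lockPanel
    transitions :doorClosed => :active
  end

  state :active do
    transitions :drawOpened => :waitingForLight,
                :lightOn => :waitingForDraw
  end
end
```

Every ":" and "," in the script is there because the host language requires it, which is why an internal DSL can trim punctuation only so far.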

One of the main themes in my DSL thinking is that a DSL is a way to populate a framework. In this case the framework is one that describes state machines. As well as populating a framework with a DSL you can also do it with a regular push-button API. Let's color the punctuation on that.

Event doorClosed = new Event("doorClosed", "D1CL"); 
Event drawOpened = new Event("drawOpened", "D2OP"); 
Event lightOn = new Event("lightOn", "L1ON"); 
 
Command lockPanelCmd = new Command("lockPanel", "PNLK"); 
Command unlockDoorCmd = new Command("unlockDoor", "D1UL"); 

State idle = new State("idle"); 
State activeState = new State("active"); 
 
StateMachine machine = new StateMachine(idle); 

idle.addTransition(doorClosed, activeState);
idle.addCommand(unlockDoorCmd);
idle.addCommand(lockPanelCmd);

activeState.addTransition(drawOpened, waitingForLightState);
activeState.addTransition(lightOn, waitingForDrawState);

Here's a lot more punctuation. All sorts of quotes and brackets as well as method keywords and local variable declarations. The latter present an interesting classification question. I've counted the declaring of a local variable as punctuation (as it duplicates the name) but its later use as domain text.

Java can also be written in a fluent way, so here's the fluent version from the book.

  Events doorClosed, drawOpened, lightOn; 
  Commands lockPanel, unlockDoor; 
  States idle, active; 

  protected void defineStateMachine() { 
    doorClosed. code("D1CL"); 
    drawOpened. code("D2OP"); 
    lightOn.    code("L1ON"); 

    lockPanel.  code("PNLK"); 
    unlockDoor. code("D1UL"); 
 
    idle 
        .actions(unlockDoor, lockPanel) 
        .transition(doorClosed).to(active) 
        ; 
 
    active 
        .transition(drawOpened).to(waitingForLight) 
        .transition(lightOn).   to(waitingForDraw) 
        ; 
 } 
 

Whenever two or three are gathered together to talk about syntactic noise, XML is bound to come up.

<stateMachine start = "idle"> 
    <event name="doorClosed" code="D1CL"/>  
    <event name="drawOpened" code="D2OP"/> 
    <event name="lightOn" code="L1ON"/> 

    <command name="lockPanel" code="PNLK"/> 
    <command name="unlockDoor" code="D1UL"/> 

  <state name="idle"> 
    <transition event="doorClosed" target="active"/> 
    <action command="unlockDoor"/> 
    <action command="lockPanel"/> 
  </state> 

  <state name="active"> 
    <transition event="drawOpened" target="waitingForLight"/> 
    <transition event="lightOn" target="waitingForDraw"/> 
  </state>
</stateMachine> 

I don't think we can read too much into this particular example, but it does provide some food for thought. Although I don't think we can make a rigorous separation between useful punctuation and noise, the distinction between domain text and punctuation can help us focus on the punctuation and consider what punctuation serves us best. And I might add that having more characters of punctuation than you do of domain text in a DSL is a smell.
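The comparison can be made mechanical with a rough counter. The Ruby sketch below is one possible way to do it, under assumptions of my own: whitespace is ignored, anything in a supplied list of domain words is domain text, and every other token (keywords included) is punctuation. The count_characters method and the two sample fragments, cut down from the idle state above, are illustrative only.

```ruby
# Splits a script into word tokens and single symbol characters, then totals
# the characters that are domain text versus everything else (punctuation).
def count_characters(script, domain_words)
  tokens = script.scan(/\w+|[^\w\s]/)
  domain, punctuation = tokens.partition { |token| domain_words.include?(token) }
  { domain: domain.sum(&:length), punctuation: punctuation.sum(&:length) }
end

domain_words = %w[idle active unlockDoor lockPanel doorClosed]

external = <<~DSL
  state idle
    actions {unlockDoor lockPanel}
    doorClosed => active
  end
DSL

internal = <<~DSL
  state :idle do
    actions :unlockDoor, :lockPanel
    transitions :doorClosed => :active
  end
DSL

external_counts = count_characters(external, domain_words)
internal_counts = count_characters(internal, domain_words)
```

On this fragment both forms carry 39 characters of domain text. The custom syntax adds 19 characters of punctuation and the Ruby form adds 36, so neither crosses the line into more punctuation than domain text, though the Ruby form comes close.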

(Mikael Jansson has put out a Lisp version of this example. Mihailo Lalevic did one in JavaScript.)



© Copyright Martin Fowler, all rights reserved