
Martin Fowler's Bliki

A cross between a blog and wiki of my partly-formed ideas on software development


HumaneRegistry design 1 December 2008

One of the features of the new world of services that SOA-gushers promoted was the notion of registries. Often this was described in terms of automated systems that would allow systems to automatically look up useful services in a registry and bind and consume those services all by themselves.

Well computers may look clever occasionally, but I didn't particularly buy that idea. While there might be the odd edge case for automated service lookup, I reckon nineteen times out of twenty it'll be a human programmer who is doing the looking up.

I was chatting recently to my colleague Erik Dörnenburg about a project he did with Halvard Skogsrud to build a service registry that was designed for humans to use and maintain. The organization was already using ServiceCustodians to manage the development on the project, so the registry needed to work in that context. This led to the following principles:

  • People develop and use services, so orient it around people (sorry UDDI, thank you for playing).
  • Don't expect people to enter stuff to keep it up to date, people are busy enough as it is.
  • Make it easy for people to read and contribute.

The heart of the registry is a wiki that allows people to easily enter information on a particular service. Not just the builders of the service, but also people who've used it. After all, users' opinions are often more useful than providers' (I'm guessing product review sites get more traffic than the vendors' sites).

A wiki makes it easy for people to describe the service, but that relies on people having time to contribute. A wiki helps make that easy as you can just click and go, but there's still time involved. So they backed up the human entry with some useful information gathered automatically.

  • A tool that interrogates the source code control systems and displays who has committed to a service, when, and how much. This helps human readers find out who are the other humans who they should talk to. Someone who did most of the commits, even if a while ago, probably knows a lot about the core design and purpose of the service. People who made a few recent commits might know more about the recent usage and quirks.
  • RSS feeds from CI servers and source code control systems.
  • Task and bug information from issue tracking systems.
  • Traffic data from the message bus indicating how much the service is used, and when. Also the message bus gives some clues about the consumers of the service.
  • Interceptors in the EJB container that captured consumer application names - again to get a sense of who is consuming the service. These were on the consumer side to capture consumer application names, and on the service to get a sense of the usage patterns.
  • Information from the Ivy dependencies.
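As a rough illustration of the first item in the list above, here is a minimal Python sketch of summarizing who committed to a service, when, and how much. It is not the tool Erik and Halvard built: the `Commit` record, the `summarize_committers` function, and the sample history are all hypothetical, and a real version would read its records from the source code control system's log.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Commit:
    # Hypothetical record; a real tool would fill this from the VCS log
    author: str
    day: date
    lines_changed: int


def summarize_committers(commits):
    """Return (author, stats) pairs, most frequent committer first."""
    summary = defaultdict(lambda: {"commits": 0, "lines": 0, "first": None, "last": None})
    for c in commits:
        entry = summary[c.author]
        entry["commits"] += 1
        entry["lines"] += c.lines_changed
        entry["first"] = c.day if entry["first"] is None else min(entry["first"], c.day)
        entry["last"] = c.day if entry["last"] is None else max(entry["last"], c.day)
    return sorted(summary.items(), key=lambda item: item[1]["commits"], reverse=True)


# Made-up history for one service
history = [
    Commit("alice", date(2007, 3, 1), 400),
    Commit("alice", date(2007, 3, 9), 250),
    Commit("alice", date(2007, 4, 2), 120),
    Commit("bob", date(2008, 10, 20), 15),
    Commit("bob", date(2008, 11, 3), 30),
]

for author, stats in summarize_committers(history):
    print(author, stats["commits"], stats["lines"], stats["first"], stats["last"])
```

In this made-up history alice did most of the commits a while ago, so she is the person to ask about core design, while bob's few recent commits suggest he knows the current quirks.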

Much of this functionality was inspired by ohloh.net, in particular this view.

The point of a registry like this is that it does a lot of automated work to get information, but presents it in a way that expects a human reader. Furthermore it understands that the most important questions the human reader has are about the humans who have worked on the project: who are they, when did they work on this, who should I email, and where do I go for a really good caipirinha?


DatabaseThaw design 24 November 2008

A few years ago I heard programming language people talk about the "Nuclear Winter" in languages caused by Java. The feeling was that everyone had so converged on Java's computational model (C# at that point seen as little more than a rip-off) that creativity in programming languages had disappeared. That feeling is now abating, but perhaps there's a more important thaw that might be beginning - in the longer and deeper freeze in thinking about databases.

Tim Bray's thought-provoking keynote talked about storage; including highlighting several alternatives to the conventional database world

When I started in the software development profession, I worked with several people who had evangelized relational databases. I came across them in the object-oriented world. Many people at that time expected OO databases to be the next evolutionary step for databases. As we now know, that didn't happen. These days relational databases are so deeply embedded that most projects assume an RDBMS right out of the gate.

At QCon last week, there was a strong thread of talks that questioned this assumption. Certainly one that struck me was Tim Bray's keynote, which took a journey through several aspects of data management. In doing so he highlighted a number of interesting projects.

  • Drizzle is a form of relational database, but one that eschews much of the machinery of modern relational products. I think of it as a RISC RDBMS - supporting only the bare bones of the relational feature set.
  • Couch DB is one of many forays into a distributed key-value pair model. Although a sharply simple data-model (nothing more than a hashmap really) this kind of approach has become quite popular in high-volume websites.
  • Gemstone was one of the object database crowd, and I found the Gemstone-Smalltalk combination a very powerful development environment (superior to most of its successors). Gemstone is still around as a niche player, but may gain more traction through Maglev - a project to bring its approach (essentially a fusion of database and virtual machine) to the Ruby world.

As well as this talk, there was a whole track on alternative databases hosted by Kresten Krab Thorup. One of the additional tools mentioned there was Neo4J - a graph (network) database tool that earned some rare praise from Jim Webber.

The natural question to ask about these products is why they should prevail when the ODBMSs failed. What's changed in the environment that could thaw the relational grip? There are many hypotheses about why relational has been so dominant - my opinion is that their dominance is due less to their role in data management than their role in integration.

Kresten Krab Thorup does a great job as a leader of the technical content of the JAOO and QCon conferences.

For many organizations today, the primary pattern for integration is Shared Database Integration - where multiple applications are integrated by all using a common database. When you have these IntegrationDatabases, it's important that all these applications can easily get at this shared data - hence the all important role of SQL. The role of SQL as a mostly-standard query language has been central to the dominance of databases.

The heating of the database space comes from the presence of alternatives to integration - in particular the rise of web services. Under various banners there's a growing movement for applications to talk to each other by passing text (mostly XML) documents over HTTP. The web, both in internet and intranet forms, has made this integration mode even more prevalent than SQL. This is a good thing; I've never liked the approach of multiple applications tightly coupled through a common database - you can't get a bigger breach of encapsulation than that.

If you switch your integration protocol from SQL to HTTP, it now means you can change databases from being IntegrationDatabases to ApplicationDatabases. This change is profound. In the first step it supports a much simpler approach to object-relational mapping - such as the approach taken by Ruby on Rails. But furthermore it breaks the vice-like grip of the relational data model. If you integrate through HTTP it no longer matters how an application stores its own data, which in turn means an application can choose a data model that makes sense for its own needs.

I don't think this means that relational databases will disappear - after all they are the right choice for many situations. But it does mean that now application developers should think about what the right option is for their needs. As non-relational projects grow in popularity and maturity, more and more will go for other options.


ServiceCustodian design 14 November 2008

Let's imagine a pretty world of SOA-happiness where the computing needs of an enterprise are split into many small applications that provide services to each other to allow effective collaboration. One fine morning a consumer service needs some information from a supplier service. The twist is that although the supplier service has the necessary data and processing logic to get this information, it doesn't yet expose that information through a service interface. The supplier has a potential service, but it isn't actually there yet.

In an ideal world the developers of the consumer service just ask the supplier service to develop the potential service and all is dandy. But life is not ideal - the sticking point here is that the developers of the supplier service have other things to do, usually things that are more important to their customer and management than helping out the consumer service team.

Recently I was chatting with my colleague Erik Dörnenburg and he told me about an approach he saw a client use to deal with this problem. They took a leaf out of the open source play-book and made all their services into internal open source systems. This allows consumer service developers to write the service themselves.

I'm sure many readers are rolling their eyes at the visions of chaos this would cause, but just as open source projects don't allow just anyone to edit anything, this client uses open-source-style control mechanisms. In particular each service has a couple of custodians - people whose responsibility it is to keep the service in a healthy state. In the normal course of events the consumer developer wouldn't actually commit changes to the supplier source tree directly; instead they send a patch to the custodian. Just like an open-source maintainer, the custodian receives the patch and reviews it to see if it's good enough to commit. If not there's a dialog with the consumer developer.

As Erik knows well from his own open source work, reviewing a patch is much less effort than making a change yourself. So although the custodian approach doesn't entirely eliminate the problem of consumer developers needing to wait on supplier developers, it does a lot to reduce the difficulty. And again following the open-source model, a consumer developer can be made a committer once the custodians are comfortable. This still means that commits can get reviewed by the custodians, but avoids the custodians becoming a bottleneck.

Related to this was their approach to a service registry. We've seen a lot of fancy products being sold to provide service registry capabilities so that people can look up services and see how to use them. This client discarded them and used a HumaneRegistry instead.


EstimatedInterest agile 6 November 2008

TechnicalDebt is a very useful concept, but it raises the question of how do you measure it? Sadly technical debt isn't like financial debt, so it's not easy to tell how far you are in hock (although we seem to have had some trouble with measuring the financial kind recently).

Here's one idea to consider. When a team completes a feature ask them to tell you how long it took them (the actual effort) and how long they think it would have taken if the system were properly clean. The difference between the two is the interest of the technical debt. (So if it actually took them 5 days but they think it would have taken them 3 days with a clean system, then you paid 2 days of effort as interest on your technical debt.)

There are certainly some serious flaws with this technique. The statement of how long it would have taken on a clean system is an estimate based on an imaginary state - so is difficult to make objective. There's the effort in capturing this information, which is easy to get out of hand. But the result may help project a picture of the state of the code-base in a way that's visible to non-technical staff.

Furthermore it may also help with decisions about whether to pay the principal. Some teams like to add technical debt stories to their product backlog - with estimates on how long it would take to remove them. Such technical debt stories are also estimates, but also provide a picture of how much debt has built up. You could get a bit more clever with the estimated interest payments by apportioning them to these debt stories (I spent an extra day on this feature because of the bad state of the flipper module). Comparing interest payments with the principal may help inform a decision about whether to pay off the principal.
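The arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Feature` class, the `interest_by_debt_story` function, and every number except the 5-day and 3-day figures from the example above are made up, and the principal estimates stand in for whatever the team has put on its debt stories.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    actual_days: float          # effort the feature really took
    clean_estimate_days: float  # team's guess for a properly clean system
    blamed_on: str              # debt story the team blames for the extra effort

    @property
    def interest(self):
        # Interest paid on technical debt for this feature
        return self.actual_days - self.clean_estimate_days


def interest_by_debt_story(features):
    """Apportion the interest paid on each feature to its debt story."""
    totals = {}
    for f in features:
        totals[f.blamed_on] = totals.get(f.blamed_on, 0) + f.interest
    return totals


# First feature uses the 5-day / 3-day example; the rest is hypothetical
features = [
    Feature("export report", 5, 3, "flipper module"),
    Feature("bulk upload", 4, 3, "flipper module"),
    Feature("audit log", 6, 4.5, "legacy scheduler"),
]
# Hypothetical estimates (in days) for paying off each debt story
principal = {"flipper module": 8, "legacy scheduler": 10}

paid = interest_by_debt_story(features)
for story, days in paid.items():
    print(f"{story}: paid {days} days interest, principal {principal[story]} days")
```

With these invented figures the flipper module has cost 3 days of interest against an 8-day principal, which is the kind of comparison that might inform whether to pay it off.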

I ran into someone recently who tried something a little like this and found it handy, but it's not something I've run into a lot. Certainly there are flaws with doing it - but it may be worth a try for a few iterations.


EarlyPain agile 4 November 2008

A few years ago I was talking with a client who told me something he didn't like about the agile approach we were using: "it doesn't feel right to have these difficulties this early in the project". Contrary to his reaction, in my mind this early pain is one of the great benefits of an agile or indeed any iterative development process.

I have many complaints about the waterfall process, but probably my greatest problem with it is how it tends to defer discovery of problems till late in the project, at which point there's little time or energy to deal with them effectively. Iterative cycles try to flush out as many problems as possible as early as possible. This gives you more time to cope, or at least raises the problems early enough to cancel before investing too much money and effort in a problematic project.

A useful exercise is to reflect on past projects and think about where problems cropped up late. Now ask yourself how you could make those problems crop up earlier. The more pain you get earlier, the better.


UpcomingTalks writing 4 November 2008

Neal Ford and Rebecca Parsons will be working with me again on a DSL tutorial at QCon San Francisco. Rebecca and I will do a keynote.

My last scheduled talking arrangement this year is to return to QCon San Francisco. QCon is organized by the JAOO team in collaboration with InfoQ. It brings the conference formula that I like so well from JAOO to the US.

I've got two talks planned. One is an all day tutorial on DSLs with Neal Ford and Rebecca Parsons. This tutorial is the one we've done at a number of JAOO/QCon conferences in the last year.

The second talk is a keynote with Rebecca on the relationship between agile thinking and enterprise architecture groups. In our work we often have to bridge the gap between our approaches and somewhat traditional enterprise architecture teams - and Rebecca is usually working right on that boundary. So this talk brings our thoughts on how enterprise architecture fits in with an agile mind-set.


Oslo dsl 28 October 2008

Oslo is a project at Microsoft, of which various things have been heard but with few details until this week's PDC conference. What we have known is that it has something to do with ModelDrivenSoftwareDevelopment and DomainSpecificLanguages.

A couple of weeks ago I got an early peek behind the curtain as I, and my language-geek colleague Rebecca Parsons, went through a preview of the PDC coming-out talks with Don Box, Gio Della-Libera and Vijaye Raji. It was a very interesting presentation, enough to convince me that Oslo is a technology to watch. It's broadly a Language Workbench. I'm not going to attempt a comprehensive review of the tool here, just give my scattered impressions from the walk-through. With the public release at the PDC I'm sure you'll be hearing a lot more about it in the coming weeks. As I describe my thoughts I'll use a lot of the language I've been developing for my book, so you may find the terminology a little dense.

Oslo has three main components:

  • a modeling language (currently code-named M) for textual DSLs
  • a design surface (named Quadrant) for graphical DSLs
  • a repository (without a name) that stores semantic models in a relational database.

(All of these names are current code names. The marketing department will still use the same smarts that replaced "Avalon and Indigo" with "WPF and WCF". I'm just hoping they'll rename "Windows" to "Windows Technology Foundation".)

The textual language environment is bootstrapped and provides three base languages:

  • MGrammar: defines grammars for Syntax Directed Translation.
  • MSchema: defines schemas for a Semantic Model
  • MGraph: is a textual language for representing the population of a Semantic Model. So while MSchema represents types, MGraph represents instances. Lispers might think of MGraph as s-expressions with an ugly syntax.

You can represent any model in MGraph, but the syntax is often not too good. With MGrammar you can define a grammar for your own DSL which allows you to write scripts in your own DSL and build a parser to translate them into something more useful.

Using the state machine example from my book introduction, you could define a state machine semantic model with MSchema. You could then populate it (in an ugly way) with MGraph. You can build a decent DSL to populate it using MGrammar to define the syntax and to drive a parser.

There is a grammar compiler (called mg) that will take an input file in MGrammar and compile it into what they call an image file, or .mgx file. This is different to most parser generator tools. Most parser generator tools take the grammar and generate code which has to be compiled into a parser. Instead Oslo's tools compile the grammar into a binary form of the parse rules. There's then a separate tool (mgx) that can take an input script and a compiled grammar and output the MGraph representation of the syntax tree of the input script.

More likely you can take the compiled grammar and add it to your own code as a resource. With this you can call a general parser mechanism that Oslo provides as a .NET framework, supply the reference to the compiled grammar file, and generate an in-memory syntax tree. You can then walk this syntax tree and use it to do whatever you will - the parsing strategy I refer to as Tree Construction.

The parser gives you a syntax tree, but that's often not the same as a semantic model. So usually you'll write code to walk the tree and populate a semantic model defined with MSchema. Once you've done this you can easily take that model and store it in the repository so that it can be accessed via SQL tools. Their demo showed entering some data via a DSL and accessing corresponding tables in the repository, although we didn't go into complicated structures.
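To make the tree-walking step concrete, here is a minimal Python sketch of turning a syntax tree into a state machine semantic model. None of this is Oslo's API or M syntax: the nested-tuple tree shape, the `State` and `StateMachine` classes, the `populate` function, and the state and event names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    name: str
    transitions: dict = field(default_factory=dict)  # event name -> target State


@dataclass
class StateMachine:
    states: dict = field(default_factory=dict)  # state name -> State

    def state(self, name):
        # Create states on demand so forward references resolve
        return self.states.setdefault(name, State(name))


# Hypothetical syntax tree: (node_type, children...) tuples, as a parser might emit
tree = ("machine",
        ("state", "idle", ("transition", "doorClosed", "active")),
        ("state", "active", ("transition", "lightOn", "unlocked")),
        ("state", "unlocked"))


def populate(tree):
    """Walk the syntax tree and build the semantic model."""
    kind, *state_nodes = tree
    if kind != "machine":
        raise ValueError(f"expected a machine node, got {kind!r}")
    machine = StateMachine()
    for _, name, *transition_nodes in state_nodes:
        source = machine.state(name)
        for _, event, target in transition_nodes:
            source.transitions[event] = machine.state(target)
    return machine


machine = populate(tree)
print(machine.states["idle"].transitions["doorClosed"].name)  # prints "active"
```

The point of the separation is that the tree mirrors the syntax of the script, while the model holds real object references between states, which is the form you would want to execute or store.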

You can also manipulate the semantic model instance with Quadrant. You can define a graphical notation for a schema and then the system can project the model instance creating a diagram using that notation. You can also change the diagram which updates the model. They showed a demo of two graphical projections of a model; updating one updated the other using Observer Synchronization. In that way using Quadrant seems like a similar style of work to a graphical Language Workbench such as MetaEdit.
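The idea of two projections kept in step through the model can be sketched in Python as follows. This says nothing about how Quadrant is implemented: the `Model` and `Projection` classes and their method names are hypothetical, and the sketch only shows the general shape of Observer Synchronization, where views edit the model and the model notifies every view.

```python
class Model:
    """Holds the data and notifies subscribers after every change."""

    def __init__(self):
        self._states = []
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    def add_state(self, name):
        self._states.append(name)
        for notify in self._observers:
            notify(tuple(self._states))


class Projection:
    """A view that refreshes itself whenever the model changes."""

    def __init__(self, label, model):
        self.label = label
        self.shown = ()
        self._model = model
        model.subscribe(self._refresh)

    def _refresh(self, states):
        self.shown = states

    def user_adds_state(self, name):
        # Edits go to the model, never directly to another view
        self._model.add_state(name)


model = Model()
diagram = Projection("diagram", model)
table = Projection("table", model)

diagram.user_adds_state("idle")
print(table.shown)  # prints ('idle',)
```

Neither projection knows the other exists, so adding a third one needs no change to the first two.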

As they've been developing Oslo they have been using it on other Microsoft projects to gain experience in its use. Main ones so far have been with ASP, Workflow, and web services.

More on M

We spent most of the time looking at the textual environment. They have a way of hooking up a compiled grammar to a text editing control to provide a syntax-aware text editor with various completion and highlighting goodness. Unlike tools such as MPS, however, it is still a text editor. As a result you can cut and paste stretches of text and manipulate text freely. The tool will give you squigglies if there's a problem parsing what you've done, but it preserves the editing text experience.

I think I like this. When I first came across it, I rather liked the MPS notion of: "it looks like text, but really it's a structured editor". But recently I've begun to think that we lose a lot that way, so the Oslo way of working is appealing.

Another nice text language tool they have is an editor to help write MGrammars. This is a window divided into three vertical panes. The center pane contains MGrammar code, the left pane contains some input text, and the right pane shows the MGraph representation of parsing the input text with the MGrammar. It's very example driven. (However it is transient, unlike tests.) The tool resembles the capability in Antlr to process sample text right away with a grammar. In the conversation Rebecca referred to this style as "anecdotal testing" which is a phrase I must remember to steal.

The parsing algorithm they use is a GLR parser. The grammar syntax is comparable to EBNF and has notation for Tree Construction expressions. They use their own variant of regex notation in the lexer to be more consistent with their other tools, which will probably throw people like me more used to ISO/Perl regexp notation. It's mostly similar, but different enough to be annoying.

One of the nice features of their grammar notation is that they have provided constructs to easily make parameterized rules - effectively allowing you to write rule subroutines. Rules can also be given attributes (aka annotations), in a similar way to .NET's language attributes. So you can make a whole language case insensitive by marking it with an attribute. (Interestingly they use "@" to mark an attribute, as in the Java syntax.)

The default way a grammar is run is to do tree construction. As it turns out the tree construction is the behavior of the default class that gets called by the grammar while it's processing some input. This class has an interface and you can write your own class that implements this. This would allow you to do embedded translation and embedded interpretation. It's not the same as code actions, as the action code isn't in the grammar, but in this other class. I reckon this could well be better since the code inside actions often swamps grammars.
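The difference between a default tree-building class and one you supply yourself can be shown with a toy parser in Python. This is not Oslo's interface: the `TreeBuilder` and `Interpreter` classes, the `parse_sum` function, and the tiny "n + n + ..." language are all invented for the sketch. What it shares with the description above is that the parsing code stays free of actions and simply reports each construct to whichever handler it is given.

```python
class TreeBuilder:
    """Default behaviour: build a syntax tree."""

    def number(self, text):
        return ("num", text)

    def add(self, left, right):
        return ("add", left, right)


class Interpreter:
    """Alternative handler: evaluate while parsing (embedded interpretation)."""

    def number(self, text):
        return int(text)

    def add(self, left, right):
        return left + right


def parse_sum(source, handler):
    """Parse 'n + n + ...' and report each construct to the handler."""
    tokens = source.split()
    result = handler.number(tokens[0])
    # Remaining tokens come in ('+', number) pairs
    for operator, operand in zip(tokens[1::2], tokens[2::2]):
        if operator != "+":
            raise ValueError(f"unexpected token {operator!r}")
        result = handler.add(result, handler.number(operand))
    return result


print(parse_sum("1 + 2 + 3", TreeBuilder()))
print(parse_sum("1 + 2 + 3", Interpreter()))
```

The same parse yields a nested tuple tree with the first handler and the number 6 with the second, without touching the parsing code.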

They talked a bit about the ability to embed one language in another and switch the parsers over to handle this gracefully - heading into territory that's been explored by Converge. We didn't look at this deeply but that would be interesting.

An interesting tidbit they mentioned was that originally they intended to only have the tools for graphical languages. However they found that graphical languages just didn't work well for many problems - including defining schemas. So they developed the textual tools.

(Here's a thought for the marketing department. If you stick with the name "M" you could use this excellent film for marketing inspiration ;-))

Comparisons

Plainly this tool hovers in the same space as tools like Intentional Software and JetBrains MPS that I dubbed as Language Workbenches in 2005. Oslo doesn't exactly fit the definition for a language workbench that I gave back then. In particular the textual component isn't a projectional editor and you don't have to use a storage representation based on the abstract representation (semantic model), instead you can store the textual source in a more conventional style. This lesser reliance on a persistent abstract representation is similar to Xtext. At some point I really need to rethink what I consider the defining elements of a Language Workbench to be. For the moment let's just say that Xtext and Oslo feel like Language Workbenches and until I revisit the definition I'll treat them as such.

One particularly interesting point in this comparison is comparing Oslo with Microsoft's DSL tools. They are different tools with a lot of overlap, which makes you wonder if there's a place for both of them. I've heard vague "they fit together" phrases, but am yet to be convinced. It could be one of those situations (common in big companies) where multiple semi-competing projects are developed. Eventually this could lead to one being shelved. But it's hard to speculate about this as much depends on corporate politics and it's thus almost impossible to get a straight answer out of anyone (and even if you do, it's even harder to tell if it is a straight answer).

The key element that Oslo shares with its cousins is that it provides a toolkit to define new languages, integrate them together, and define tooling for those languages. As a result you get the freedom of syntax of external DomainSpecificLanguages with decent tooling - something that deals with one of the main disadvantages of external DSLs.

Oslo supports both textual and graphical DSLs and seems to do so reasonably evenly (although we spent more time on the textual). In this regard it seems to provide more variety than MPS and Intentional (structured textual) and MetaEdit/Microsoft's DSL tools (graphical). It's also different in its textual support in that it provides real free text input not the highly structured text input of Intentional/MPS.

Using a compiled grammar that plugs into a text editor strikes me as a very nice route for supporting entering DSL scripts. Other tools either require you to have the full language workbench machinery or to use code generation to build editors. Passing around a representation of the grammar that I could plug into an editor strikes me as a good way to do it. Of course if that language workbench is Open Source (as I'm told MPS will be), then that may make this issue moot.

One of the big issues with storing stuff like this in a repository is handling version control. The notion that we can all collaborate on a single shared database (the moral equivalent of a team editing one copy of its code on a shared drive) strikes me as close to irresponsible. As a result I tend to look askance at any vendors who suggest this approach. The Oslo team suggests, wisely, that you treat the text files as the authoritative source which allows you to use regular version control tools. Of course the bad news for many Microsoft shops would be that this tool is TFS (or, god-forbid, VSS), but the great advantage of using plain text files as your source is that you can use any of the multitude of version control systems to store it.

A general thing I liked was most of the tools leant towards run-time interpretation rather than code generation and compilation. Traditionally parser generators and many language workbenches assume you are going to generate code from your models rather than interpreting them. Code generation is all very well, but it always has this messy feel to it - and tends to lead to all sorts of ways to trip you up. So I do prefer the run-time emphasis.

It was only a couple of hours, so I can't make any far-reaching judgements about Oslo. I can, however, say it looks like some very interesting technology. What I like about it is that it seems to provide a good pathway to using language workbenches. Having Microsoft behind it would be a big deal although we do need to remember that all sorts of things were promised about Longhorn that never came to pass. But all in all I think this is an interesting addition to the Language Workbench scene and a tool that could make DSLs much more prevalent.



© Copyright Martin Fowler, all rights reserved