
Design bliki




ServiceCustodian design 14 November 2008

Let's imagine a pretty world of SOA-happiness where the computing needs of an enterprise are split into many small applications that provide services to each other to allow effective collaboration. One fine morning a consumer service needs some information from a supplier service. The twist is that although the supplier service has the necessary data and processing logic to get this information, it doesn't yet expose that information through a service interface. The supplier has a potential service, but it isn't actually there yet.

In an ideal world the developers of the consumer service just ask the developers of the supplier service to build the potential service and all is dandy. But life is not ideal - the sticking point here is that the developers of the supplier service have other things to do, usually things that are more important to their customer and management than helping out the consumer service team.

Recently I was chatting with my colleague Erik Dörnenburg and he told me about an approach he saw a client use to deal with this problem. They took a leaf out of the open source play-book and made all their services into internal open source systems. This allows consumer service developers to write the service themselves.

I'm sure many readers are rolling their eyes at the visions of chaos this would cause, but just as open source projects don't allow just anyone to edit anything, this client uses open-source-style control mechanisms. In particular, each service has a couple of custodians - people whose responsibility it is to keep the service in a healthy state. In the normal course of events the consumer developer wouldn't actually commit changes to the supplier source tree directly; instead they send a patch to the custodian. Just like an open-source maintainer, the custodian receives the patch and reviews it to see if it's good enough to commit. If not, there's a dialog with the consumer developer.

As Erik knows well from his own open source work, reviewing a patch is much less effort than making a change yourself. So although the custodian approach doesn't entirely eliminate the problem of consumer developers needing to wait on supplier developers, it does a lot to reduce the difficulty. And again following the open-source model, a consumer developer can be made a committer once the custodians are comfortable. This still means that commits can get reviewed by the custodians, but avoids the custodians becoming a bottleneck.

Related to this was their approach to a service registry. We've seen a lot of fancy products being sold to provide service registry capabilities so that people can look up services and see how to use them. This client discarded them and used an approach that combined wikis with some interesting data mining (more on that soon).


ModelDrivenSoftwareDevelopment design 14 July 2008

Model Driven Software Development (MDSD) is a style of software development that considers itself an alternative to the traditional style of programming. The approach centers itself on building models of a software system. These models are typically made manifest through diagrammatic design notations - the UML is one option. The idea is that you use these diagrams to specify your system to a modeling tool and then you generate code in a conventional programming language.

The MDSD vision evolved from the development of graphical design notations and CASE tools. Proponents of these techniques saw graphical design notations as a way to raise the abstraction level above programming languages - thus improving development productivity. While these techniques and tools never caught on very widely, the core ideas still live on and there is an ongoing community of people still developing them.

Although I've been involved, to some extent, in MDSD for most of my career, I'm rather skeptical of its future. Most fans of MDSD base their enthusiasm on the belief that models are ipso facto a higher level abstraction than programming languages. I don't agree with that argument - sometimes graphical notations can be a better abstraction, but not always - it depends on the specific cases. Furthermore, to use MDSD you need tools that support RepositoryBasedCode, and these tools currently introduce a number of pragmatic issues in tooling - of which source control is the canonical example.

MDSD is surrounded by a terminological mess. One particular vision of MDSD is ModelDrivenArchitecture (MDA) which is an OMG initiative based on the UML. Many people in the MDSD community, however, don't think that MDA or UML is the right vision for MDSD. For a long time I would hear people talking about Model Driven Development (MDD) as the general concept and MDA as the OMG's specific vision. However the OMG has trademarks on several "Model Driven *" and "Model Based *" phrases - including MDD. As a consequence people have to be careful about what model driven phrase they use. I'm using MDSD as that is the title of a useful book on the topic.


SegmentationByFreshness design 24 June 2008

One of the biggest issues with media websites is dealing with high amounts of traffic. Media is all about getting eyeballs, but if you get too many hits at once, slow performance can cause problems and damage your reputation. This problem is exacerbated by the bursty nature of this web traffic. You can be cruising along at a manageable rate, then get hit with a big news story which causes a big spike. One of our clients has seen spikes of two orders of magnitude in a matter of a couple of minutes.

The general solution in computing to speed up access to the same information is to use caches. If you keep requesting my home page the web server will build up a cache in memory so repeated requests avoid touching the disk.

It's easy to keep a cache for my website, because this page, like my entire site, is entirely static. Most media sites, however, contain a lot of dynamic content. You might not think there's much business logic on your average newspaper website, but once you start looking at advertising links, related stories, special features and the like, things get a good bit more interesting. A travel story about France might link to articles on French food, and to advertising that knows that a web browser in Canada is interested in a holiday in the Loire Valley. Personalization makes this even worse: my personalized preferences should generate a personalized feature list on heavy red wines. Such logic is complex in its own right, it makes for a lot of computation with each request, and crucially it ruins most caching strategies.

The way to deal with this is to divide a page up into segments where each segment has a similar determination of freshness. The article on Loire travel can be relatively static, changing only to correct errors. A related article list which feeds off tags for "France" and "Loire" will change more often, but maybe only every few days. If we arrange this properly a request for a page with these two items may be able to gather everything from caches.

The most common way of doing this that I've seen is to form caches on the web server and assemble the page segments when the page gets hit. Tools like Sitemesh are a good option for this approach. As you write the page for 18th century Loire delights, you include call-outs for sections like related articles. When you get the actual web request the web server takes the page and assembles it from the separate pieces. Much of this can be cached in the web server, which avoids hitting the back-end domain logic and database.
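
As a rough illustration of server-side assembly, here is a minimal javascript sketch in which each segment is cached with its own freshness window. The class and function names, the segment keys and the lifetimes are all assumptions made up for this example; it does not show how Sitemesh works internally, only the general shape of the idea.

```javascript
// Hypothetical sketch: each page segment is cached with its own freshness window.
// Segment keys, lifetimes and render functions are illustrative assumptions.
class SegmentCache {
  constructor(now = () => Date.now()) {
    this.now = now;            // injectable clock, handy for testing
    this.entries = new Map();  // key -> { html, expiresAt }
  }

  // Return the cached HTML if it is still fresh, otherwise re-render and store it.
  get(key, maxAgeMs, render) {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.html;
    const html = render();
    this.entries.set(key, { html, expiresAt: this.now() + maxAgeMs });
    return html;
  }
}

const HOUR = 60 * 60 * 1000;

// Assemble a page from two segments that go stale at different rates.
function assemblePage(cache, renderers) {
  const article = cache.get("article:loire", 30 * 24 * HOUR, renderers.article);
  const related = cache.get("related:france+loire", 24 * HOUR, renderers.related);
  return `<div>${article}</div><div>${related}</div>`;
}
```

With this arrangement a burst of requests for the same page only reaches the back-end render functions when a segment's own window has expired, so the slow-changing article and the faster-changing related list are refreshed independently.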

An interesting possibility is to go even further and use the many caches that exist in the web itself. Most calls for this web page don't even reach my web server since my page gets cached many times along the way. If you build a web page dynamically and assemble it on the server, you have to take the hit to deliver the page. An alternative is to assemble the page on the client and then draw each segment from its own URL. Each segment could be cached in different places with different caching policies.

How might this work? We might store the static article content as XHTML at an URL like http://gallifreyTimes/travel/18-century-loire-delights. Inside that file we want to insert some related articles by looking up articles tagged with "loire" and "france". In the static page we put in a simple "a" tag.

  <a class = "relatedLinks" href = "relatedLinks/france+loire">Related Links</a>

In the header for the static page we link it to some javascript in a separate library file. When we download the Loire article the javascript runs and scans the article for elements with the right class: in this case an "a" element with the "relatedLinks" class. (The behavior library is a good way to do this.) When it finds the element it uses the information in the element to synthesize an URL for that segment. In this case it would use what's in the element's href attribute to come up with an URL like http://gallifreyTimes/relatedArticles/france+loire. Once it's got that URL it then gets the content and uses it to replace the original "a" element. Since the related articles list is handled through an URL, other gets on that URL cause caches through the Internet to warm up, so there's a good chance that retrieving the page may never cause a hit on the original server.
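
The weaving step described above could be sketched roughly as follows. This is only an illustration: the function names, the mapping from the `relatedLinks` href to the `relatedArticles` URL, and the choice to leave the plain link in place on failure are assumptions, and it uses the standard DOM calls rather than any particular library.

```javascript
// Hypothetical sketch of the client-side weaving step.
// The host name comes from the example in the text; the function names and the
// relatedLinks -> relatedArticles mapping are assumptions.

// Turn a placeholder href such as "relatedLinks/france+loire" into the segment URL.
function segmentUrl(href, origin = "http://gallifreyTimes") {
  const tags = href.split("/").pop();   // e.g. "france+loire"
  return `${origin}/relatedArticles/${tags}`;
}

// Replace every <a class="relatedLinks"> with the HTML fetched from its segment URL.
// `doc` is the browser document; `fetchText` resolves a URL to an HTML string.
async function weaveRelatedLinks(doc, fetchText) {
  const placeholders = doc.querySelectorAll("a.relatedLinks");
  for (const link of placeholders) {
    try {
      const html = await fetchText(segmentUrl(link.getAttribute("href")));
      link.outerHTML = html;            // swap the placeholder for the segment
    } catch (err) {
      // On failure the plain link stays in place, so the page still works.
    }
  }
}

// In a browser this could be wired up roughly as:
// document.addEventListener("DOMContentLoaded", () =>
//   weaveRelatedLinks(document, url => fetch(url).then(r => r.text())));
```

Because each segment is fetched with an ordinary GET on its own URL, any cache between the browser and the server can answer for it.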

This technique of using Javascript to replace a placeholder element with more content is a form of Progressive Enhancement. The descriptions I've found for Progressive Enhancement focus on adding features for accessibility with limited browsers. This example also has that benefit. If I browse the page with a browser that has no javascript, I'll get a useful link. The general idea behind Progressive Enhancement is that the basic page served is useful on basic browsers, then we use techniques such as javascript to add in more fancy features.

In the context of caching, the value is that each progressive enhancement weaves in a lump of HTML with different freshness rules. The original page is static, the related links change daily, but both can be cached independently and woven together. I can do all sorts of additional elements, as long as I take care to segment the page by the freshness rules. So I could add a personalized weather forecast based on the user's profile to every page by having the javascript pick up the user id from the http session, using it to construct an URL like http://gallifreyTimes/personalWeather/martinfowler, retrieving the content (which would often be cached on my hard drive) and weaving it into the page.
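
One way to express different freshness rules per segment is through the HTTP Cache-Control header that the server sends with each segment. The sketch below is an assumption-laden illustration: the segment type names and the lifetimes are invented, and only the `public`, `private`, `max-age` and `no-store` directives are standard HTTP.

```javascript
// Hypothetical freshness rules per segment type; the lifetimes are assumptions.
const FRESHNESS = new Map([
  ["article",         { scope: "public",  maxAgeSeconds: 7 * 24 * 3600 }], // effectively static
  ["relatedLinks",    { scope: "public",  maxAgeSeconds: 24 * 3600 }],     // changes daily
  ["personalWeather", { scope: "private", maxAgeSeconds: 3600 }],          // per user, browser cache only
]);

// Build the Cache-Control header value for a segment type.
function cacheControlFor(segmentType) {
  const rule = FRESHNESS.get(segmentType);
  if (!rule) return "no-store";   // unknown segments are not cached at all
  return `${rule.scope}, max-age=${rule.maxAgeSeconds}`;
}
```

Marking the personalized segment `private` keeps it out of shared caches while still letting the reader's own browser cache it, which matches the idea of the weather forecast often being served from the local disk.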


CheaperTalentHypothesis design 8 February 2008

One of the commonly accepted beliefs in the software world is that talented programmers are more productive. Since we CannotMeasureProductivity this is a belief that cannot be proven, but it seems reasonable. After all, just about every human endeavor shows some people better than others, often markedly so. It's also commonly observed by programmers themselves, although it always seems to be remarked on by those who consider themselves to be in the more talented category.

Naturally better programmers cost more, either as full-time hires or in contracting. But the interesting question is, despite this, are more expensive programmers actually cheaper?

On the face of it, this seems a silly question. How can a more expensive resource end up being cheaper? The trick, as it is so often, is to think about the broader picture of cost and value.

Although the technorati generally agree that talented programmers are more productive than the average, the impossibility of measurement means they cannot come up with an actual figure. So let's invent one for argument's sake: 2. If you can find a factor-2 talented programmer for less than twice the salary of an average programmer, then that programmer ends up being cheaper. To state this more generally: if the cost premium for a more productive developer is less than the higher productivity of that developer, then it's cheaper to hire the more expensive developer. The cheaper talent hypothesis is that the cost premium is indeed less, and thus it's cheaper to hire more productive developers even if they are more expensive.

In case anyone hasn't noticed, this hypothesis is a key part of our philosophy at ThoughtWorks and is one of the main reasons why I ended up switching from being an independent consultant to joining. We believe we actually end up cheaper for our clients, even though our rates are higher. Of course, we do have difficulty persuading many clients that this is true - that lack of objective productivity measures strikes again. I still remember a meeting with one prospective client complaining about how our rates were higher than those of a company who had made a previous, failed, attempt at the system we were bidding on. We had to politely point out that paying lower rates for a project that delivered no value was hardly a financially prudent strategy.

There are some notable consequences to the cheaper talent hypothesis. The most notable is that it has a positive scaling effect - the bigger the team, the bigger the benefits of cheaper talent. Let's assume we have put together a team of ten talented developers to run a project in some alternative universe where we have actually measured that they are twice as productive as the average - and thus cost exactly twice as much to hire. In this case you might naturally assume that a rival team of average programmers would be a team of twenty.

The trouble is that this assumes productivity scales linearly with team size, which observation again indicates isn't the case. Software development depends very much on communication between team members. The biggest issue on software teams is making sure everyone understands what everyone else is doing. As a result productivity scales a good bit less than linearly with team size. As usual we have no clear measure, but I'm inclined to guess at it being closer to the square root. If we use my evidence-free guess as the basis then to get double the productivity we need to quadruple the team size. So our average talent team needs to have forty people to match our ten talented people - at which point it costs twice as much.
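
The arithmetic behind the forty-person team can be checked with a few lines of javascript. The square-root scaling and the factor of 2 are the evidence-free guesses from the text, not measurements, and the function names are made up for this sketch.

```javascript
// Worked version of the guess in the text: team output grows with the square
// root of team size. Both the model and the factor are assumptions.

// If output = productivity * sqrt(size), an average team (productivity 1) matches
// a talented team when sqrt(N) = factor * sqrt(talentedSize),
// which gives N = factor^2 * talentedSize.
function matchingAverageTeamSize(talentedSize, factor) {
  return factor * factor * talentedSize;
}

// Compare costs, measured in units of one average programmer's salary.
function compareTeams(talentedSize, factor, costPremium) {
  const averageSize = matchingAverageTeamSize(talentedSize, factor);
  const talentedCost = talentedSize * costPremium;
  const averageCost = averageSize;
  return { averageSize, talentedCost, averageCost, costRatio: averageCost / talentedCost };
}

// Ten talented developers, twice as productive, paid twice as much:
// averageSize 40, talentedCost 20, averageCost 40, costRatio 2
console.log(compareTeams(10, 2, 2));
```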

Another factor that plays a role here is time-to-market. Let's assume two teams of four people, one talented and one average. To stack the deck of our argument against our talented team, discount the previous paragraphs, and assume the talented team is only twice as productive as the average team. If the talented team charges twice as much then can we assume that it doesn't matter financially which team we pick?

I'm afraid the talented team wins again. They'll complete the project in half the time of the average team, which means that the customer will start getting value from the delivered software earlier. This earlier value, compounded by the time value of money, represents a financial gain for picking the talented team, even though their cost per output is the same.
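
To make the time-to-market effect concrete, here is a small sketch that discounts monthly cash flows for the two teams. Every figure in the usage example (build times, monthly costs, monthly value, the two year horizon and the 1% monthly rate) is a hypothetical chosen so that both teams cost the same in total; none of them come from real projects.

```javascript
// Net present value of a project: costs while building, value once delivered.
// All parameters are hypothetical inputs supplied by the caller.
function netPresentValue({ buildMonths, monthlyCost, monthlyValue, horizonMonths, monthlyRate }) {
  let npv = 0;
  for (let month = 1; month <= horizonMonths; month++) {
    // During the build the team is a cost; afterwards the software yields value.
    const cashFlow = month <= buildMonths ? -monthlyCost : monthlyValue;
    npv += cashFlow / (1 + monthlyRate) ** month;
  }
  return npv;
}

// Hypothetical comparison: same total cost (6 * 200 = 12 * 100), different delivery dates.
const shared = { monthlyValue: 300, horizonMonths: 24, monthlyRate: 0.01 };
const talented = netPresentValue({ ...shared, buildMonths: 6, monthlyCost: 200 });
const average = netPresentValue({ ...shared, buildMonths: 12, monthlyCost: 100 });
console.log(talented > average); // true: the earlier value outweighs the earlier spending
```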

Agile development further accelerates this effect. A talented team has a faster cycle time than an average team. This allows the full team to explore options faster: building, evaluating, optimizing. This accelerates producing better software, thus generating higher value. This compounds the time-to-market effect. (And it's natural to assume that a talented team is more likely to produce better software in any case.)

Faster cycle time leads to a better external product, but perhaps the greatest contribution a talented team can make is to produce software with greater internal quality. It strikes me that the productivity difference between a talented programmer and an average programmer is probably less than the productivity difference between a good code-base and an average code-base. Since talented programmers tend to produce good code-bases, this implies that the productivity advantages compound over time due to internal quality too.

All this sounds, at least to me, like a highly compelling argument. It's also one that's widely accepted (at least by programmers who consider themselves talented). But it's far from being accepted by the software industry as a whole. We can tell this because the premium for talented developers (in terms of salary/contracting fees) is less than the productivity difference. Probably the major reason for this is the inability to objectively measure productivity. A hirer cannot have objective proof that a more expensive programmer is actually more productive. Only the higher cost is objective. As a result a hirer has to match a subjective judgment of higher value against an objective higher cost. Many hirers, even if they personally believe the talented programmer is worthwhile, aren't prepared to justify the full higher cost to managers, HR, and purchasing.

This effect is compounded by the difficulty in making even a subjective assessment. At ThoughtWorks we rely on peer assessment - developers' abilities are assessed by fellow team members. The result is hardly pinpoint precision, but it's the best anyone can do.

Which all points out that hiring and retaining talented programmers is hard work. Hiring and assessment are hard work. You have to deal with people with very individual desires, which are even more important to track as they are effectively underpaid. So a hirer is faced with certain extra work and higher costs versus only a judgment call for higher productivity.

So I understand the situation but don't accept it. I believe that if the software industry is to fulfill its potential it needs to recognize the cheaper talent hypothesis and close the gap between high productivity and higher compensation.


RepositoryBasedCode design 14 January 2008

An alternative to SourceBasedCode is the idea that the core definition of a system should be held in a model and edited through projections.

To talk about this style of environment I find it handy to think in terms of multiple representations of the system:

  • editable representation: what you edit in order to change the system.
  • storage representation: the persistent record of the system definition.
  • executable representation: what is executed to make the system run - the executable.
  • abstract representation: used to manipulate and reason about system definition.
  • visualization representation: a non-editable view of the system definition.

A source based system combines the editable and storage representations in the source file. It executes the source by transforming the source into an executable representation either in one observable step (interpretation) or multiple steps via a compiler. In order to do this it usually transforms the source into an abstract representation as an intermediate step, but this abstract representation is transitory and only around during compilation. The source is seen as the core definition of the system.

With a repository based system the abstract representation is the core definition of the system. A tool manipulates the abstract representation and projects multiple editable representations for the programmer to change the definition of the system. The tool persists the abstract representation in a storage representation, but this is entirely separated from any of the editable representations that it projects. The relationship to the executable representation is pretty much the same - the executable is produced through a series of transformations from the abstract representation.

An important difference between repository and source based environments is the split between persistent storage and editing. Repositories can use any persistence mechanism they choose, while source systems need to have some universal storage mechanism - which is why they are almost always text files.

The abstract representation may be edited through multiple projections; each projection can show a limited amount of the total information, and isn't tied to the actual structure of the abstract representation. Repository systems thus usually show a wider range of editing environments - including graphical and tabular structures - rather than just a textual form.
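
A toy javascript sketch may help to picture the relationship between the representations. Everything in it (the shape of the model, the function names, the use of JSON for storage) is an assumption for illustration and is not taken from any real repository based tool.

```javascript
// Hypothetical sketch: one abstract representation, several projections.

// Abstract representation: the core definition of a (tiny) system.
const model = {
  classes: [
    { name: "Order", methods: [
      { name: "total", access: "public" },
      { name: "applyDiscount", access: "private" },
    ] },
  ],
};

// Textual projection: shows everything for one class.
function projectAsText(model, className) {
  const cls = model.classes.find(c => c.name === className);
  const lines = cls.methods.map(m => `  ${m.access} ${m.name}()`);
  return [`class ${cls.name}`, ...lines].join("\n");
}

// Tabular projection: a limited slice, only the public methods of all classes.
function projectPublicMethodTable(model) {
  return model.classes.flatMap(cls =>
    cls.methods
      .filter(m => m.access === "public")
      .map(m => ({ class: cls.name, method: m.name })));
}

// An edit made through a projection changes the model, not a source file.
function renameMethod(model, className, oldName, newName) {
  const cls = model.classes.find(c => c.name === className);
  cls.methods.find(m => m.name === oldName).name = newName;
}

// Storage representation: whatever the tool chooses, here JSON.
function store(model) {
  return JSON.stringify(model);
}
```

Neither projection is the definition of the system. Both are regenerated from the model after an edit, and the stored form is free to look nothing like either of them.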

Sophisticated source based IDEs also show multiple projections - for instance a side pane showing a list of methods for a class with graphical annotations to indicate their AccessModifiers. However these projections are usually very much secondary to a source editor, and often the projections can't be edited directly - you have to change the source and see the projection update.

Such PostIntelliJ IDEs do this by creating an abstract representation when they load the source files (which is why they can take a while to start up). They also use the abstract representation to perform lots of other code-assistance features such as contextual code completion and refactoring.

A significant pragmatic problem with repository based systems is the fact that there is no generally accepted format for the storage representation. The fact that programmer-readable text is the universal choice for source files means that a whole slew of tools can be built to process them: editors, source-code control, difference visualizers etc. Repositories have to do all this themselves, which is why these things are often lacking. In particular many repository based environments suffer greatly because they don't have a decent configuration control system, which makes it much harder for multiple people to collaborate on the same system definition. This is a big contrast to source based environments that have a plethora of source code control systems to do this task.

Repository based systems are closely connected with Model-Driven Development (MDD), although I don't think the two are entirely synonymous. In an MDD context the abstract representation is usually referred to as the model. Certainly almost all MDD tools are repository based, but many repository based tools, e.g. Microsoft Access, would not consider themselves to be MDD.

(I first explored this way of looking at environments in my essay on Language Workbenches. I've described it here because I think the notion of repository based environments is broader than just in Language Workbenches.)


TestCancer design 6 December 2007

As my career has turned into full-time authorship, I often worry about distancing myself from the realities of day-to-day software development. I've seen other well-known figures lose contact with reality, and I fear the same fate. My greatest source of resistance to this is ThoughtWorks, which acts as a regular dose of reality to keep my feet on the ground.

ThoughtWorks also acts as a source of ideas from the field, and I enjoy writing about useful things that my colleagues have discovered and developed. Usually these are helpful ideas that I hope some of my readers will be able to use. My topic today isn't such a pleasant one. It's a problem, and one that we don't have an answer for.

The scenario runs like this. We carry out a project for a client and hand over a shiny new piece of software. As is our habit these days, we also hand over a bevy of automated tests for this software (typically there are as many lines of code of tests as there are of functional code). These tests are usually a mix of unit tests and broader ranging functional and acceptance tests. Either way the tests act as an active description of what the software does and a bug detector to quickly find problems as we evolve the software. We treasure these tests; they are a key to our success in building software systems.

Some months later the happy customer calls us back to do some further work on the software, adding new features and capabilities. We come in, keen to work on a code base that may have faults - but at least they are our faults. Then we make an unpleasant discovery.

The tests no longer run.

Sometimes the tests are excluded from the build scripts, and haven't been run in months. Sometimes the "tests" are run, but a good proportion of them are commented out. Either way our precious tests are afflicted with a nasty cancer that is time-consuming and frustrating to eradicate.

We ask what happened and are told things like "we made a change and some tests broke, so we removed the tests". You can look at this as our failing - we haven't managed to fully teach the client teams about the value of the tests. We need to do more to pass on that failing tests need to be investigated, not simply ignored. But whatever anyone says, we've discovered that cancer of the tests is a common disease.

We don't think that the fact that Test Cancer appears is a reason against writing tests. Even if a particularly virulent strain wipes them all out the day after we leave, we still got value from them while we were building the system. And tests don't always get cancer. We recently spoke to a developer who had become a convert to TDD after maintaining a system we'd handed over a few years ago. The tests made our code much easier to work with than code that other firms had added later.



© Copyright Martin Fowler, all rights reserved