William E. Caputo
ThoughtWorks
Oren Miller
ThoughtWorks
July 2002
The Continuous Integration principles are applicable in many different development environments. While the principles remain largely unchanged from project to project, the implementation of this practice can vary considerably. Variables such as language, platform, team size, and team location present unique implementation challenges. Here we outline how we implemented CI in a COM/Windows environment for a project developing primarily in Visual C++.
These are the actions we identified that needed to happen for us to have a successful build: get the latest source from the repository, compile and link it, register the COM components, build the database, run the tests, and report the results.
One issue that presented additional difficulty with Visual C++ was the location of SDK libraries and binaries. These files must be installed into the correct folders, discovered by the build process in a consistent order, and registered properly. Because of this, we found there was a minimum standard setup that a target machine needed in addition to being "clean."
We found that this baseline -- Visual Studio itself, plus the Microsoft SDKs and Service Packs our code depended on -- was best installed before introducing a machine to our project.
Since these SDK files are technically part of the "source," we had considered placing them with the rest of the code base in the repository. For pragmatic reasons, it was easier to make all machines conform to this standard than to extract the library code we needed from these SDKs. This also ensured that as Microsoft issued updates and Service Packs, they could be installed consistently across all development and build machines.
We did keep other third-party libraries in the repository. These were checked out and built just like our project-developed code.
As noted above, the first step in getting a successful build is actually building the code. When we began implementing CI for this project, several constraints confronted us immediately: the code base was already organized around DevStudio workspace (.dsw) and project (.dsp) files, the project settings and dependencies were maintained through the IDE, and the developers needed to be able to keep building from within it.
These constraints led to the decision to write scripts that drove MSDEV via its command-line interface. MSDEV is the name of the DevStudio IDE executable; it uses generated Make-like files in a proprietary format to manage its build environment.
MSDEV has a simple yet flexible command-line syntax:
    msdev FileName [/MAKE "ProjectName - ConfigName | ALL"] [/REBUILD /CLEAN /NORECURSE /OUT LogFile /USEENV]

Where FileName is the name of your project (.dsp) or workspace (.dsw) file. For example, the following syntax deletes all intermediate files and then builds a project called MyProject:

    msdev MyProject.dsp /MAKE "MyProject - Win32 Debug" /REBUILD
-- From "Building a Project from the Command Line" [MSDN Library, MSDEV]
We quickly created a Windows Scripting Host (WSH) script that could traverse the source tree, construct and execute MSDEV command lines, and monitor the success or failure of each execution.
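A minimal sketch of such a script, written in JScript (the source root and configuration name are illustrative, and real build-order handling is omitted):

    // buildall.js -- run with: cscript buildall.js
    // Recursively builds every .dsp under the source root,
    // stopping on the first failure.
    var fso   = new ActiveXObject("Scripting.FileSystemObject");
    var shell = new ActiveXObject("WScript.Shell");

    function buildFolder(folder)
    {
        for (var files = new Enumerator(folder.Files); !files.atEnd(); files.moveNext())
        {
            var file = files.item();
            if (/\.dsp$/i.test(file.Name))
            {
                var project = file.Name.replace(/\.dsp$/i, "");
                var cmd = "msdev \"" + file.Path + "\" /MAKE \"" + project +
                          " - Win32 Debug\" /OUT \"" + file.Path + ".log\"";
                // Run synchronously; msdev returns non-zero on a failed build.
                if (shell.Run(cmd, 1, true) != 0)
                {
                    WScript.Echo("BUILD FAILED: " + file.Path);
                    WScript.Quit(1);
                }
            }
        }
        for (var subs = new Enumerator(folder.SubFolders); !subs.atEnd(); subs.moveNext())
            buildFolder(subs.item());
    }

    buildFolder(fso.GetFolder("C:\\dev\\src"));
    WScript.Echo("BUILD SUCCEEDED");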
Using MSDEV allowed us to build the entire source tree with a single script, while still using the IDE to configure project settings and project dependencies (thus allowing the developers to continue building from within the IDE).
We had satisfied our initial build requirements!
This was a quick solution, but there were issues, the largest being the management of dependencies and project settings. MSDEV uses two files to manage its projects: a workspace file (.dsw) and a project file (.dsp). Unfortunately, the workspace file defines how projects depend on one another, so one cannot simply include a project as a dependency in a workspace and have that project's dependencies managed without adding them to the workspace as well. Worse, each .dsp carries its own build settings, creating a large maintenance burden.
Here are some strategies that deal with these issues: consolidate the code into fewer, larger workspaces; replace the workspace and project files with Make; replace them with Ant; or keep the existing dsw/dsp structure and drive MSDEV from the command line.
We chose the final option because it fit within our constraints. What we found, however, was that while this worked well in the short term, as we refactored our code we started migrating more toward the first option (fewer, larger workspaces). As time went on, we found the Ant and Make options much more attractive. Given the option, I would recommend using Ant and/or Make from the beginning. However, if you are faced (as we were) with an existing file structure centered on the dsw/dsp setup, the MSDEV command-line option can ease your migration headaches.
So, as the code base grew we migrated to using Ant and CruiseControl to call MSDEV. This gave us more flexibility than the simple script, and the developers could continue using the MS Visual Studio environment as they had been.
We evolved several standards that supported CI as the project progressed. One of the most important was a common post-build step for every COM project: each component copied its outputs to a shared location and registered itself, so that a freshly built system was immediately usable. The post-build settings looked like this:
Commands:

    copy $(InputPath) $(DEVROOT)\Bin
    copy $(ProjDir)\*.tlb $(DEVROOT)\Include
    copy $(ProjDir)\*.h $(DEVROOT)\Include\
    regsvr32 /s "$(DEVROOT)\Bin\$(InputName).dll"
    echo regsvr32 exec. time > "$(OutDir)\install.trg"

Outputs:

    $(DEVROOT)\Bin\$(TargetName).dll
    $(OutDir)\install.trg

Where DEVROOT is the environment variable defined as part of our standard.
All of these things are necessary for any project. The key point to keep in mind is that the build must be able to take a clean machine to a fully built, registered, and testable system without manual intervention.
We chose CPPUnit as our unit testing framework. Initially we simply used the existing text-UI TestRunner that comes with CPPUnit, as it works well for single Visual Studio projects. As our code base grew, we wrote a new runner that allowed us to better manage dependencies and provided more flexible test discovery, thus facilitating its integration into the CI framework we had built.
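For illustration, here is a minimal sketch of such a runner (assuming a CPPUnit version that provides the TestFactoryRegistry; our actual custom runner added dependency management and test discovery on top of this):

    #include <cppunit/extensions/TestFactoryRegistry.h>
    #include <cppunit/ui/text/TestRunner.h>

    int main()
    {
        // Pick up every suite that registered itself via
        // CPPUNIT_TEST_SUITE_REGISTRATION in the linked test code.
        CppUnit::TextUi::TestRunner runner;
        runner.addTest(CppUnit::TestFactoryRegistry::getRegistry().makeTest());

        // run() returns true only if all tests pass, so the exit code
        // tells the CI build script whether the tests succeeded.
        return runner.run() ? 0 : 1;
    }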
As mentioned above we used Ant and CruiseControl to provide us with automated builds. This tied together monitoring the code repository, building our components, building our database, testing the code, and communicating the status via a web site and email. Here is how we did it:
First off, keep in mind what Ant is not: Ant is not a complete substitute for Make. Specifically, Ant does not provide the time-stamp checking that Make provides. As a result, if you are going to build C++ code with Ant, you will need to call a tool from your task that does this checking for you. Since we were using MSDEV (which is a Make-like tool in its own right) to do our actual building, what we ended up using Ant for was the management of the various build steps, and to efficiently walk the directory tree, calling MSDEV where appropriate.
In case you are wondering, this is not ideal: it means managing both an Ant script and the .dsp files. However, our Ant script was very simple, consisting only of the high-level build steps: building the code, building the database, stopping and starting services, registering COM DLLs, and so on. The careful reader will have noted that above we mentioned registering the DLLs from within the .dsp files; we have begun migrating that into the Ant scripts to reduce the maintenance burden on the .dsp files. Since Ant concerns itself only with the high-level steps, MSDEV manages the file-level dependencies. This decentralizes the responsibility and keeps the Ant configuration very stable, which was important to us, as almost all of the developers on this project were new to Ant and Java but had worked with VC++ for a long time.
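As a rough sketch, one of those high-level Ant targets might have looked like the following (the target, file, and property names are illustrative, not our actual script; Ant's exec task shells out to MSDEV and regsvr32):

    <target name="build.server" depends="get.latest">
        <!-- MSDEV does the file-level dependency checking itself,
             so Ant only sequences the high-level steps. -->
        <exec executable="msdev" failonerror="true">
            <arg line='Server.dsw /MAKE "Server - Win32 Debug" /OUT server.log'/>
        </exec>
        <!-- Registration, migrated here out of the .dsp post-build steps. -->
        <exec executable="regsvr32" failonerror="true">
            <arg line="/s ${devroot}/Bin/Server.dll"/>
        </exec>
    </target>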
As our code base grew we considered migrating some of the build responsibility up into the Ant scripts. CL (the Visual C++ compiler) provides time-stamp versioning so it can be called from an Ant target and provide the same results as Make. Such a change would have increased the maintenance of the Ant scripts, but also would have provided a reduction in the number of workspaces, and projects that had to be maintained.
CruiseControl provided our project with several benefits. Among these were monitoring changes in the code base, organizing the activities we wanted to happen every time someone checked in, and reporting the results. Because CruiseControl is open source, we were able to customize it as our needs dictated.
Organizing the build activities was only part of the benefit CruiseControl provided. Perhaps even more significant was the continuous testing, reporting, and deployment that we were able to do. Our unit tests and our customer-written acceptance tests were run after each successful build. Deployment can be automated with CruiseControl as well, taking the guesswork out of this critical task.
If you are unfamiliar with CruiseControl I suggest you check it out. It makes these central activities of Continuous Integration much easier to manage.
As mentioned above, we chose StarTeam as our source repository. One of the reasons for this was because the StarTeam API, while sometimes awkward, allows complete scripting of its functionality and yet can still be integrated with Dev Studio. The modification set we developed to interface with StarTeam has been folded into the default CruiseControl package.
Because of several differences between the debug and release builds of our application, we chose to run two separate build machines, one for each configuration. Both instances of CruiseControl monitored the repository for changes and notified developers when builds were complete. When the release build ran correctly, a final task copied the binaries to a deployment staging area so the application could be deployed into any one of several test and production environments.
Finally, since CruiseControl produces XML logs of its output and uses XSLT to create its reports, it was easy for us to group information by story card, showing which tests were passing and which were not. Thus we were able to give the customer exactly the kind of information that was most useful to him.
More than just providing build status information, the CruiseControl data related to all aspects of our project, including project status and feature completion status. Up-to-date class diagrams, thanks to documentation auto-generated with Doxygen, were also maintenance-free. Far from being of interest only to developers, the data generated and displayed by our CruiseControl installation provided much-needed communication to project managers, customers, testers, and even the executives -- continuously and accurately.
[Figure: example acceptance test results after CruiseControl ran the tests]
[Figure: example build page -- note that the builds are named, not numbered]
Like most things, CI is harder to manage when you are using C++. The main reason is the need to manage physical dependencies (which you should be doing anyway). If you haven't done so, I recommend you pick up a copy of Large-Scale C++ Software Design by John Lakos. This book provides good coverage of how to manage the physical design of your application.
One thing we didn't foresee when we began setting up Continuous Integration on this project was how the physical layout of our then relatively small code base would become a bottleneck as the project grew. But then an interesting thing happened: by working to make our system build continuously, cleanly, and automatically, we were forced to confront many of the issues and pitfalls that Lakos outlines as problems for larger projects.
Even more importantly, when we needed to change our physical structure, we could do so continuously, because we were never more than about 30 minutes from feedback on our changes. Also, the pain of an increasing build time, right there in front of us, gave us ample warning when our physical design needed refactoring, before the problem became critical.
You must pay attention to your code's physical layout when implementing Continuous Integration in C++, but you have to manage this anyway for an application with more than a few files and components. By implementing CI, you address this issue early in the project and get constant feedback on the health of your physical design.
As Martin points out in his article "Is Design Dead?", some XP practices provide direct benefit only if you also do the enabling XP practices. CI enables the XP principle of evolutionary design: Do The Simplest Thing That Could Possibly Work. Since you have constant feedback on the "health" of your physical design, you can limit the upfront thinking on this topic to what is necessary to get Continuous Integration working, and then refactor (a third XP practice) your physical layout as your design needs change and grow. As I mentioned above, CI encourages you to adopt good habits early, so this can be a big win for your project.
There is one dependency pitfall specific to Visual C++ and COM that I should mention: COM can cause compile-time dependencies in your code beyond those covered in Lakos' book. One of the benefits of COM, and one of its big selling points, is that clients of your COM interfaces remain blissfully ignorant of your implementation -- including its physical location. So how does this present a compile-time dependency? The answer is the #import statement.
Microsoft provides a language extension to C++: #import. When the pre-processor encounters this directive, it locates the given type library and generates two files: a .tli file and a .tlh file. These two files give the consuming component a calling syntax similar to VB for the methods on the contained interfaces, the CLSIDs and other GUIDs needed to create the resources in the type library, support for IntelliSense, and other symbols exported from the DLL (e.g., enumerations).
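As an illustration (the server, interface, and method names here are hypothetical):

    // The pre-processor must find MyServer.dll (or its .tlb) on disk --
    // this is where the compile-time dependency creeps in.
    #import "MyServer.dll" named_guids no_namespace

    void UseServer()   // assumes COM has been initialized on this thread
    {
        // IMyServerPtr is a smart-pointer typedef generated into the .tlh
        // file; CLSID_MyServer is emitted because of named_guids.
        IMyServerPtr server;
        if (SUCCEEDED(server.CreateInstance(CLSID_MyServer)))
            server->DoWork();   // VB-like call syntax on the generated wrapper
    }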
In short, it provides a lot of syntactic sugar that can make working with COM easier -- or more complex. So why use it? If you want the syntactic assistance, it's nice, but there are two more insidious reasons why the statement can be difficult to avoid: first, ATL uses it a lot, and second, there is a small but measurable performance gain in using the generated CLSIDs, because you avoid a trip to the COM subsystem when creating instances.
Because the physical file must be found at compile time, and because COM objects are often used by high-level components to access lower-level components, it's easy to get long dependency chains in your code: each client cannot compile until the server it #imports has been built.
So, what do you do?
The purported benefits of #import are convenience and performance. Convenience is a personal choice; personally, I think the #import statement introduces almost as many complexities as it removes, so convenience is a minimal benefit. As for performance, premature optimization is the source of much wasted time on a project. Use CoCreateInstance with ProgIDs when building your system. If you follow the old adage of "make it work, make it right, make it fast," you will turn to performance tuning after you implement a feature. When performance tuning, measure your system, identify your bottlenecks, and then, if you find that the extra API call is causing significant performance loss, resort to the #import statement.
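A minimal sketch of that approach (the ProgID and server are hypothetical; note there is no #import, and therefore no compile-time dependency on the server):

    #include <windows.h>

    void CreateWithoutImport()   // assumes COM has been initialized
    {
        // Resolve the ProgID to a CLSID through the registry at run time --
        // the extra "trip to the COM subsystem" mentioned above.
        CLSID clsid;
        if (FAILED(CLSIDFromProgID(L"MyApp.MyServer", &clsid)))
            return;

        IUnknown* pUnk = NULL;
        if (SUCCEEDED(CoCreateInstance(clsid, NULL, CLSCTX_ALL,
                                       IID_IUnknown,
                                       reinterpret_cast<void**>(&pUnk))))
        {
            // QueryInterface for the interface you actually need, use it...
            pUnk->Release();
        }
    }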
If you need the benefits of the #import statement, my recommendation is that you apply the Dependency Inversion Principle. Pull your .idl files up high in your physical hierarchy, perhaps keeping them all in one workspace for easy reference, and have your low-level components implement those interfaces by bringing them in with another MS language extension, the importlib statement. You can use this statement inside your IDL library definitions to pull in the high-level interface definitions. Now your high-level module and your low-level interface are physical peers, equally dependent on a higher-level abstraction.
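Sketched in IDL (all names are hypothetical and the GUIDs are placeholders):

    // Abstractions.idl -- lives high in the physical hierarchy
    import "unknwn.idl";

    [ object, uuid(00000000-0000-0000-0000-000000000001) ]
    interface IWorker : IUnknown
    {
        HRESULT DoWork();
    };

    [ uuid(00000000-0000-0000-0000-000000000002), version(1.0) ]
    library Abstractions
    {
        importlib("stdole32.tlb");
        interface IWorker;              // publish the interface in the type library
    };

    // Worker.idl -- the low-level component now depends *upward* on the
    // abstraction's type library, not the other way around
    [ uuid(00000000-0000-0000-0000-000000000003), version(1.0) ]
    library WorkerLib
    {
        importlib("Abstractions.tlb");  // bring in the high-level definitions

        [ uuid(00000000-0000-0000-0000-000000000004) ]
        coclass Worker
        {
            [default] interface IWorker;
        };
    };

A client that wants the #import optimization can now import the high-level Abstractions type library, leaving the low-level implementation free to change and move.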
No project is perfect. Why? Because every time we humans think we have everything figured out, reality steps in and reminds us of how imperfect we are. Or, as a great poet once said: "The best-laid schemes o' mice an' men / Gang aft agley." In short, we make do with what we've got.
This project was no exception. Implementing Continuous Integration presented us with several challenges, and some of our solutions would not necessarily be considered ideal. Keep in mind, however, that this configuration worked well enough to give a project that began with no intention of practicing Continuous Integration the benefits of CI all the way through its first release.
The key to making that happen was recognizing that perfect solutions unattained are of less value than imperfect solutions that work right now. In other words, shoot for perfection, but know when good enough is good enough.
That said, reflecting on what one can do better is a valuable exercise both for the improvement of an existing project, and as a means of moving toward more general solutions. This section attempts to do that.
For starters, things would have been much easier if we had been continuously integrating from the first line of code. This can't be stressed enough, so I will say it again: Start implementing your CI infrastructure on day 0 of your project. It will be faster and easier to do. The best part is you can start simply, and refactor your CI implementation along with your physical layout and your code.
Another thing we felt hindered us was never being able to completely migrate to a build configuration tool such as Make or Ant. The workspace/project file setup in Visual Studio makes CI more difficult, particularly with more than a couple of developers. We settled on a hybrid approach because people were more familiar with the IDE way of doing things, but CI is simpler if you use a single build-management solution, and dsp's don't cut it.
Finally, while we did a decent job managing dependencies, we could have benefited from more detailed measurements of compile times. A good dependency analyzer and some enhancements to our Ant scripts could have given us this information. Judicious selection of physical design best practices, and aggressive refactoring toward them, can go a long way toward making dependency management continuous and painless.
As you can see from this article, there is little, if any, conceptual difference between doing CI on a COM/C++ project and doing CI on a Java project. It all comes down to automating check-in, check-out, build, test, and report in as tight a feedback loop as you can manage. There are, however, several implementation items unique to COM and VC++; this article has tried to cover those that might cause hang-ups the first time you apply Continuous Integration to your project.
A successful build includes more than just compiling and linking. You must also ensure that COM components get registered correctly, your tests are passing, and external dependencies like services and databases are in their proper configuration.
It is imperative that all the files needed to set up your system are kept in a single place; a source code repository is a vital part of Continuous Integration. There may, however, be a minimal configuration that a machine must have prior to building your system; items like Service Packs and SDKs may be better installed once on the build machine.
When compiling and linking your code, you need to be able to manage your files effectively and yet still be able to build from an automated script. Tools such as Make and Ant offer this functionality, but present problems when used in conjunction with the MS DevStudio IDE. Msdev.exe is a viable alternative: you can drive the executable from batch or WSH scripts, or call it from a simple Ant task. MSDEV provides a workable middle ground when you need to balance the usability of the Microsoft tool set against the needs of your CI implementation.
Tools like Ant and CruiseControl can significantly reduce the ramp-up time for getting Continuous Integration going on your project. Using CruiseControl to coordinate our builds gave us ample room for customization, and freed us from much of the detailed mechanics of getting CI up and running.
The key to successfully implementing CI in C++ is dependency management. Continuous Integration forces you to consider your physical layout early in the project. At the same time, CI frees you from spending too much time on this critical aspect of your design - as you will have constant feedback regarding your build times, and the impact your physical changes have on them. An additional caveat in COM development is the impact of the #import statement on your physical design. A good strategy is to reserve this statement as an optimization strategy.
Finally, keep in mind that no project is perfect, nor will any two CI implementations be identical. What counts are the principles of Continuous Integration and the benefits they can bring to your project. Always consider specific solutions to be a result of the forces at work in a specific context. Keep an eye out for improvements, and use them to eliminate your biggest issues.
As you implement CI in Visual C++ I hope that you find the pitfalls and solutions outlined here useful. Suggestions on ways to improve this article, obvious errors and omissions, and general feedback are always welcome.
Good luck!