For couple of weeks now, I’ve been looking into Continuous Integration with multiple branches with Jenkins/Hudson. I’ve searched the intertubes, talked to friends, colleagues and other nerds out there.
My greatest discovery was that although CI is about committing, pushing and getting feedback often (i.e. the CI cluster provides you with feedback that your workstation could never give you in the same amount of time), the true purist CI actually has one more requirement — that the team needs to work on the same baseline.
I wish there was some kind of tooling for not-so-purist CI and I’ll try to explain why…
First, the non-technical view of the problem
I work on products that are distributed as downloads. After a person downloads, it is up to them to update to a newer version if they want to. So I can’t push out quick fixes that get deployed to clients automatically — the software will inform the user when a new version is available, but there are no guarantees that they will upgrade.
This means we need to take extra steps in our quality assurance (QA). In contrast, if your product is offered as a service then most of the time you can push out changes and they are visible to all clients out there. No such luck for me. This limitation puts extra attention on QA.
We have created two main branches, DEV and STABLE. DEV is the mainline that we share. The CI cluster provides extra quick feedback on this branch. It runs a test-suite of our software, testing it on a handful of JEE containers out there (e.g. Tomcat, JBoss and Websphere series). If the tests succeed, then the code is pushed to the stable branch. The CI cluster then runs the test-suite on 50 container versions.
This approach would take the pressure to have a machine or OS that supports so many test environments away from devs. They get quick feedback if their push has broken any container that they did not test locally (e.g. “Did my Resin changes break anything on Glassfish?”)
Okay, I seem to have stuff figured out, so what’s the problem then? Well, if developers want to live in their own branch for a while and still get the benefits of automated testing on a large variety of environments, then they won’t be able to do that. They are not CI engineers.
Technical view of the problem
Jenkins/Hudson uses a notion of jobs. A job is something that is usually tied to a VCS URL and then gets built (run) when there is a change in the VCS repository. The build can do anything, from executing shell scripts to sending out tweets of the status of the build.
The build can produce all kinds of results. In our case, we use Maven and Maven artefacts are the results. The artefacts are stored in one repository for the DEV branch, and in another for the STABLE branch. This is because multiple DEV branch jobs can re-use prebuilt snapshots that won’t conflict with STABLE snapshots.
There are two set of jobs. The DEV jobs and the STABLE jobs. There are two because the VCS urls differ and the maven repositories differ and now we have a problem. The two set of jobs needs to stay in sync. So if we change a job in DEV (e.g. add a new step), we’d want to change a job in STABLE and vice versa.
If devs want to live in their own branch, they need to sync any jobs that they duplicate. And now the job management is just getting out of hand:
- They start the duplication of jobs
- They try to keep them in sync
- They start debugging the jobs (e.g. “Why is there a port race between SAP NetWeaver and OC4J9?”)
- The delete the jobs after integration
Instead of developing the feature, the dev is now becoming a CI engineer. Shouldn’t he just get fast feedback from a large variety of environments without any hassle?
The many posts that I’ve read through suggest writing the tooling for branch management. Shell scripts and ANT tasks that duplicate jobs on the filesystem level are popular. Scripts using the Jenkins/Hudson API are also a good choice.
But what if I don’t want to write another set of tooling? This is something that should be provided by the CI software I’m using. I’m sure I’m not the only one out there using branching and wanting CI support for it.
The CI purists will say that I’m doing it wrong and that I should either drop the feature branches or live with the problems. But does it have to be that way?
I’d really like to see a Jenkins plugin that will understand multiple branches, multiple Maven repositories and just deal with the problem. If CI necessarily implies a single branch, maybe we can change the name of these servers to Continuous Integration and Build Automation servers and we could throw out the implication?
We’ll have to wait and see…
- Multiple Branches Best Practices – post to Jenkins list
- Is it possible to handle multiple branches where some jobs should run on each one without duplicating – post to Jenkins list
- Parallel Development with Branches – post to Jenkins list
- Configure or Create Hudson Job Automatically – stackoverflow.com
- Continuous Integration with Multiple Branch Development – stackoverflow.com
- Handling Multiple Branches in Continuous Integration – stackoverflow.com
- Continuous integration – Martin Fowler
- Continuous integration – Wikipedia