Git[/SVN/Mercurial] and Growing a FOSS Community
4/11/2013 9:58 AM
Question: How do you get more developers to contribute to a free and open source software project? Contribution is the lifeblood of a FOSS community. Without contributions the community can’t grow beyond the initial project founders. People don’t just show up ready to work. They very likely start as users, often of the fledgling software itself, before it takes shape as the robust solution it could become.
Let’s approach the question of getting more developers involved as “software engineers” instead of as “community organizers” by asking a different question: Why do we use software versioning a.k.a. software configuration management tools?
Inconsistent version control is one of the strangest things I’ve seen in thirty years of software development. Teams in the IT world went a long time without rigorously using versioning tools (and in some places may still not). Good software teams -- whether developing sophisticated enterprise applications in-house, building products for sale, or collaborating on open source software projects on the Web -- all use them without a second thought.
At first blush, we use versioning tools because without them we could never build the complex software solutions we build today (and indeed have built over the past several decades). Without some form of software versioning, we couldn’t track experiments, maintain alternative lines of development for particular situations like a different chipset or screen size, or have two developers try alternative approaches to a problem.
At this level, software versioning and configuration management tools make it simple to keep source code “in sync” without complex manual tree-cloning that could easily lose track of all the differences introduced during a divergent set of experiments. They always let us answer the question: which source artifacts were assembled into this running software?
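A small runnable sketch of what this buys, using Git (the file names and branch name here are hypothetical): two lines of development coexist with no manual tree-cloning, and both remain fully recoverable.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name "Dev"

echo 'render(small_screen)' > display.src
git add display.src
git commit -q -m "baseline renderer"
base=$(git symbolic-ref --short HEAD)  # works whether the default branch is master or main

# One developer tries a large-screen variant on its own branch...
git checkout -q -b large-screen-experiment
echo 'render(large_screen)' > display.src
git commit -q -am "experiment: large-screen layout"

# ...while the baseline stays intact and both versions stay queryable.
git checkout -q "$base"
cat display.src
```

Nothing about this is Git-specific; Subversion and Mercurial offer the same guarantee, which is the point: the tool, not the developer’s memory, tracks the divergence.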
Versioning tools bring scalability
Without such knowledge we could never scale software to more than a couple of users. That is the real reason good developers rigorously use such tools. One needs to know what software is executing in order to know what changes to make, to answer questions about unexpected behaviour, to fix bugs, and to grow the software. Software is remarkably dynamic: it invariably changes through use as people find issues and bugs and want to extend it in new ways. Without software versioning engines, software can’t grow.
Without such tool support, we could never reliably build a known executable instance of software, whether it’s running a website or it’s a binary for broad distribution. The only way to know is to rigorously manage the configuration of software versions, and the recipes that turn source files and other artifacts into executable software. Tools like Git, Subversion, and Mercurial do this for us.
And this hints at the next requirement: the recipes. Whether you’re using simple makefiles or an integrated development environment’s binary representation of the recipe, the options set, the order of libraries and headers, and indeed the order of steps all critically define the working instance of a software executable. Without knowing how the software was constructed, one cannot answer questions about its behaviour.
Scaling Products to Scaling Projects
That would mean one could never offer support cost-effectively. If you had to inspect a binary to determine its source provenance, build structure, and configuration, support costs would soar. If you can’t support the software, it can’t evolve. This is true of in-house application development, ISV products in wide distribution, mobile apps, and most especially widely collaborative development projects, a.k.a. open source software.
[Understand, anyone who says “update, because the new version should solve that problem” had better know that it does -- for good reasons having everything to do with running a rigorous configuration management environment.]
Some folks go so far as to manage the configuration of the tool chains used to deliver the binaries. Switching versions of a compiler can introduce all sorts of differences, from the higher-order structures of libraries and preprocessor dependencies down to default command options and the actual binary outputs, regardless of whether the language itself remains backwards compatible across compiler versions.
Every artifact that leads to the running executable instance of software needs to be captured and cataloged so that we can reliably rebuild the software to a known state.
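One common way to make that catalogue concrete is to stamp every build with the exact source state and tool versions that produced it. This is a sketch under assumed names (the `BUILDINFO` file and the single-source repo are hypothetical):

```shell
set -e
dir=$(mktemp -d) && cd "$dir"
git init -q
git config user.email dev@example.com
git config user.name "Dev"

echo 'source v1' > core.src
git add core.src
git commit -q -m "known state"

# Record exactly which commit and which tool produced this build:
commit=$(git rev-parse HEAD)
toolchain=$(git --version)   # stand-in for also recording compiler versions
printf 'commit: %s\ntool: %s\n' "$commit" "$toolchain" > BUILDINFO
cat BUILDINFO
```

With a stamp like this shipped inside the executable or alongside it, "which sources built this?" becomes a lookup rather than forensic work.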
So why is this even vaguely important to open source software and community development?
Without reliably being able to build software to a known state, no one can enhance it. If getting to a known state takes too much time because a FOSS project hasn’t made it easy, then a user who wants to contribute can’t without spending unreasonable amounts of time just building the software to a known state. This is before they even make the change, test the change, and then package the change as a contribution that the project may (no guarantees) choose to adopt. If there’s ANY FRICTION in this pipeline, the project risks losing the slim opportunity of an outside developer’s attention and time.
Without making change repeatably and reliably easy, the project can’t grow beyond the few people that know the magic incantation in all its complexity to turn a collection of source and build artifacts into a working executable instance of the software.
Best Practices are Universal
Good developers know this. It’s why software vendors adopted such tools; they would otherwise never have been able to scale support to their customer base. It’s also why software configuration management practices were so lax in many enterprise IT shops -- they didn’t see the connection between such front-end development tools, adopted at the start of the life cycle, and the support costs at the end of it, and historically such tools were themselves expensive and had to be justified. Git might be free as in beer today, but Aide-de-Camp (the first proprietary tool set to express the version management problem as change sets instead of versions) was serious money.
In today’s world, where there is a wealth of such configuration management tools available as free and open source software -- from versioning engines (CVS, Subversion, Mercurial, Git), to build engines and automated test frameworks (xUnit, JUnit), to fully integrated development and deployment environments (Eclipse) -- there is no excuse for not solving the problem of generating known software. The open source world is lucky enough that all the key components are available for free on forge sites such as GitHub, codeplex.com, SourceForge, and Google Code.
Making it easy for other people to “make” your software to a known starting state makes it easy for them to fix and enhance it. Making it easy to reliably get to a known state lets people experiment with it and contribute. It’s not enough to make it “easy to fork”. It needs to be easy to build. It needs to be easy to test to a known state. (The test tree, harness, and cases are part of the configuration -- right?) These tool platforms are the only way a FOSS community can scale its community of users and developers with the success of the project, just as a software product team scales development and support with the success of the product.
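The whole argument compresses into a sketch like this (the repo layout and `check.sh` entry point are hypothetical stand-ins for a real project’s `make check` or equivalent): an upstream repo carries its own verification recipe, so a newcomer reaches a known, passing state in two commands.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# The project maintainers check in a self-verifying build recipe:
mkdir upstream && cd upstream
git init -q
git config user.email dev@example.com
git config user.name "Dev"
cat > check.sh <<'EOF'
#!/bin/sh
# stand-in for "make check": build plus tests, proving a known state
echo 'all tests passed'
EOF
chmod +x check.sh
git add check.sh
git commit -q -m "project ships its own verification recipe"
cd ..

# The prospective contributor's entire setup:
git clone -q upstream contributor-copy
cd contributor-copy && ./check.sh
```

Everything after the clone is friction the project either removes for the contributor or pays for in lost contributions.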
This is how developers on a FOSS project can get more developers involved. Make it easy to contribute by making the software easy to configure, build, and test to a known state. The more time you save outside developers who might be interested in contributing, the more time they have to work on the contribution they want to make, rather than losing time, and possibly interest, in trying to get past building the software.
If you're looking for more information, Kohsuke Kawaguchi who leads the Jenkins community gives a great talk on this whole idea of making it easy to participate. [Slides and video below.]