Git Subtrees and Dependency Management

Component-based development has always seemed difficult to manage directly in Git. Legacy systems like ClearCase UCM have the idea of formal baselines to manage the dependencies in a project, and of course Subversion uses externals to capture the same concept. By contrast, Git started life as a single-project repository system, and the submodule and subtree concepts seemed clunky and difficult to manage. A lot of software teams worked around that problem by delegating component manifests to their build or CI systems. The latest incarnation of Git subtrees is significantly improved, however, and worth a second look for dependency management.

The latest version of git subtree ships with Git 1.7.11 and later. (If you need the most recent version of Git for your platform, WANdisco offers certified binaries.) It offers a much simpler workflow for importing dependencies, updating an imported dependency to a new version, and making small fixes to a dependency.

For example, let’s say we have three components in our software library, and we have two teams working on different sets of those components.

[Figure: Component Architecture]

With subtrees, we can easily create new ‘super project’ repositories containing different sets of components. To get started, we add component repos as new remotes in the super project, then define the subtree.

git remote add modA git@repohost:modA
git fetch modA
git subtree add --prefix=modA --squash modA/master
git remote add modB git@repohost:modB
git fetch modB
git subtree add --prefix=modB --squash modB/master
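When a super project pulls in more than a couple of components, the three commands per component can be wrapped in a loop. This is a minimal sketch, not part of git subtree itself: it assumes (hypothetically) that every component lives at git@repohost:<name>, is tracked on master, and is imported under a directory of the same name. The helper names add_components and run are illustrative, and setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# Hypothetical host; override by exporting REPOHOST before running.
REPOHOST="${REPOHOST:-git@repohost}"

# run: execute a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_components: for each component name, add a remote of the same name,
# fetch it, and import its master branch as a squashed subtree under a
# directory of the same name. Stops at the first failing command.
add_components() {
    for mod in "$@"; do
        run git remote add "$mod" "$REPOHOST:$mod" || return 1
        run git fetch "$mod" || return 1
        run git subtree add --prefix="$mod" --squash "$mod/master" || return 1
    done
}
```

For the layout shown in the tree that follows, super1 would run add_components modA modB and super2 would run add_components modB modC.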

We repeat this process with a different set of components in the second super project, yielding a directory tree that looks like this:

├───super1
│   ├───modA
│   └───modB
└───super2
    ├───modB
    └───modC

As the architect, I've determined the set of components used in each super project, and the rest of the team gets the right set of components through ordinary clones and pulls. To update a component to its latest code, I just run:

git subtree pull --prefix=modB --squash modB master

Or, if I want to peg a component to a specific branch:

git subtree pull --prefix=modB --squash modB r1.1
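If you track several components, these pulls can be driven from a small version manifest kept in the super project. A minimal sketch, assuming a hypothetical manifest format of one "prefix remote ref" triple per line (blank lines and lines starting with # are ignored); the manifest format and the function names are illustrative, not a git subtree feature. As before, DRY_RUN=1 only prints the commands.

```shell
# run: execute a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pull_pinned: read "prefix remote ref" lines on stdin, e.g.
#   modB modB r1.1
#   modC modC master
# and pull each ref into its prefix as a squashed subtree update.
pull_pinned() {
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while read -r prefix remote ref || [ -n "$prefix" ]; do
        case "$prefix" in
            ''|'#'*) continue ;;   # skip blank and comment lines
        esac
        # </dev/null keeps git (and ssh) from consuming the manifest on stdin.
        run git subtree pull --prefix="$prefix" --squash "$remote" "$ref" </dev/null || return 1
    done
}
```

Usage would be pull_pinned < components.txt, where components.txt is a hypothetical file name. Keeping the pins in a file means a version change shows up as a reviewable diff next to the squashed commit it produces.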

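By using --squash, I collapse the component's upstream history into a single squashed commit, merged into the super project, each time I add or update a subtree. That amounts to one commit every time I change the version of a component, which is usually the right granularity for tracking this activity. Keep in mind that it is very easy to create a new branch off a specific tag or commit in a component repository at any time, so any version you want to peg to can be given a branch name.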
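Because each squashed update is one commit, the super project's history records which component version is in use. As far as I know, git subtree writes "git-subtree-dir:" and "git-subtree-split:" lines into the messages of the commits it creates, the latter naming the upstream commit that was imported. The sketch below relies on that convention to list the most recent recorded version per subtree; the function names and the "==== " separator are illustrative choices, not part of git subtree.

```shell
# parse_subtree_log: read commit messages (newest first, each preceded by a
# "==== <sha>" line) and print "<dir> <upstream commit>" for the first,
# i.e. most recent, git-subtree-split seen for each subtree directory.
parse_subtree_log() {
    awk '
        /^==== /              { dir = "" }   # new commit: reset state
        /^git-subtree-dir:/   { dir = $2 }
        /^git-subtree-split:/ {
            if (dir != "" && !(dir in seen)) { seen[dir] = 1; print dir, $2 }
            dir = ""
        }
    '
}

# subtree_versions: run inside the super project.
subtree_versions() {
    git log --grep='git-subtree-dir:' --format='==== %H%n%B' | parse_subtree_log
}
```

Running subtree_versions in super1 would print one line per subtree, pairing each directory with the upstream commit it was last squashed from.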

To contribute a bug fix, I just commit the change inside the component's directory and push it back upstream:

echo "modB change from super 1" >> modB/readme.txt
git commit -am "change to modB from super 1"
git subtree push --prefix=modB modB master

There are a couple of good rules to follow when using subtrees. First, don't make changes to a subtree unless you really want to contribute a bug fix or patch back upstream. Second, don't make commits that span multiple subtrees, or a subtree and the super project code. Both rules can be enforced with hooks if necessary, and if a commit does break one of them, you can use an interactive rebase to rework or split it before pushing.
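A pre-commit hook for these two rules might look like the minimal sketch below. It is an illustration, not an official hook: SUBTREE_PREFIXES must be edited to list your own subtree directories, and ALLOW_SUBTREE_EDIT is a hypothetical opt-in variable for commits that really are upstream fixes.

```shell
#!/bin/sh
# Hypothetical list of subtree directories in this super project.
SUBTREE_PREFIXES="modA modB"

# classify: print the subtree a path belongs to, or "super" otherwise.
classify() {
    for p in $SUBTREE_PREFIXES; do
        case "$1" in
            "$p"/*) echo "$p"; return 0 ;;
        esac
    done
    echo super
}

# check_paths: read staged paths on stdin. Fail if they span more than one
# area (two subtrees, or a subtree plus super project code), or if they
# touch a subtree without ALLOW_SUBTREE_EDIT=1.
check_paths() {
    areas=$(while IFS= read -r path || [ -n "$path" ]; do
                classify "$path"
            done | sort -u)
    if [ -z "$areas" ]; then
        return 0                       # nothing staged
    fi
    set -- $areas                      # one positional parameter per area
    if [ "$#" -gt 1 ]; then
        echo "pre-commit: commit spans $*; split it up" >&2
        return 1
    fi
    if [ "$1" != super ] && [ "${ALLOW_SUBTREE_EDIT:-0}" != 1 ]; then
        echo "pre-commit: commit touches subtree $1;" \
             "set ALLOW_SUBTREE_EDIT=1 if this is an upstream fix" >&2
        return 1
    fi
}

# Run the check only when executed as the hook itself, so the functions
# above can also be sourced and tested on their own.
case "$0" in
    *pre-commit) git diff --cached --name-only | check_paths || exit 1 ;;
esac
```

To try it, save it as .git/hooks/pre-commit in the super project and make it executable; an intentional fix like the modB change above would then be committed with ALLOW_SUBTREE_EDIT=1 git commit -am "change to modB from super 1". Hooks are local and can be bypassed with git commit --no-verify, so if a mixed commit does slip through, git rebase -i lets you mark it as edit, run git reset HEAD^, recommit the pieces separately, and finish with git rebase --continue before pushing.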

Git subtrees are now a very effective and convenient tool for component and dependency management. Combined with the power of modern build and CI systems, they can manage a reasonably complex development project.

Questions about how to take advantage of Git subtrees? WANdisco is here to help with Professional Git Support and Services.

1 Response to “Git Subtrees and Dependency Management”


  • Hi there!

    First of all, thanks for the article; it was helpful.

    Would you mind extending the last paragraph to explain more about the good rules, and what you mean by rebasing before pushing as an alternative?

    Thanks again!
