Git Alternates And Sparse Trees

I was working on a project that was kept in a large repository. Unfortunately I was working on two parts of it, each on a different branch. I wanted, therefore, two working directories, but I didn’t really want to pay the disk space cost of two checkouts, nor to have the entire project checked out in each directory. The answer:

  • Shared repositories
  • Sparse checkouts

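You begin with a repository that points at upstream. It’s better not to run git clone to do this, since clone would check out the whole tree before we’ve had a chance to configure anything. Instead, initialise an empty repository and just point it at upstream: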

$ mkdir project-branch1; cd project-branch1
$ git init
$ git remote add origin UPSTREAM_URL

Now we’ll configure a sparse checkout of selected directories. Note that the leading / anchors each path to the repository root. The patterns follow the same rules as .gitignore, so a pattern with no slash in it at all would match at any depth:

$ echo "/sub/one" > .git/info/sparse-checkout
$ echo "/sub/two" >> .git/info/sparse-checkout
$ git config core.sparseCheckout true

Now let’s say you already have a full local clone of the project (project-local in the example below, sitting alongside project-branch1), and we don’t want to pay for the repository twice. We can make the new repository borrow objects from that existing clone, saving us the need to hold the same set of objects twice. Note carefully the number of parent directory markers used; relative paths in the alternates file are resolved from the .git/objects directory, so you need at least three ../ markers to get to another repository.

$ echo "../../../project-local/.git/objects" > .git/objects/info/alternates
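Since it’s easy to miscount the ../ markers, it can be worth checking that each entry really resolves to a directory. The following is only a sketch: check_alternates is a hypothetical helper (not a git command), and it assumes one path per line in the alternates file.

```shell
# check_alternates REPO
# Hypothetical helper: reports whether each entry in REPO's alternates file
# resolves to an existing directory. Relative entries are resolved against
# REPO/.git/objects.
check_alternates() {
    objects="$1/.git/objects"
    status=0
    while IFS= read -r alt; do
        [ -n "$alt" ] || continue            # skip blank lines
        case $alt in
            /*) target=$alt ;;               # absolute entry, use as-is
            *)  target="$objects/$alt" ;;    # relative entry
        esac
        if [ -d "$target" ]; then
            echo "ok: $alt"
        else
            echo "missing: $alt"
            status=1
        fi
    done < "$objects/info/alternates"
    return "$status"
}
```

Running check_alternates . from the top of project-branch1 should print one line per entry and return non-zero if any of them is missing.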

The magic sauce here is the use of “.git/objects/info/alternates” to set up the reference. You have to be a little careful when using alternates, because objects that are unreachable in one repository are not necessarily unreachable in the other. If the referenced repository garbage-collects an object that only the borrowing repository still uses, the borrowing repository is left with missing objects. However, provided you make sure you copy all the branch references between the two, you should be okay.
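If the referenced repository ever has to be deleted, one way out is to copy the borrowed objects into the borrowing repository first and only then drop the alternates file. A minimal sketch, assuming it is run from the top of the borrowing repository; dissociate is a hypothetical function name, and the approach is the general repack-then-remove recipe rather than anything specific to this setup.

```shell
# Hypothetical helper: make the current repository self-contained.
dissociate() {
    # Repack everything reachable into a local pack. Without -l, objects
    # borrowed through alternates are included in the new pack.
    git repack -a -d -q &&
    # Only once the local pack exists is it safe to stop borrowing.
    rm -f .git/objects/info/alternates
}
```

After this the repository no longer saves any disk space, so it is a last resort rather than something to do routinely.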

$ git fetch
$ git checkout -b sparse-branch origin/branch
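To see the whole recipe working without touching a real project, it can be replayed in a scratch directory. This is a sketch under stated assumptions: the stand-in upstream, its three subdirectories, the branch name and the demo identity are all invented for the illustration, and the remote is a local path rather than a real UPSTREAM_URL.

```shell
#!/bin/sh
# Throwaway demonstration; every path and name below is hypothetical.
set -e
scratch=$(mktemp -d)
cd "$scratch"

# 1. A stand-in "upstream" with three subdirectories on a branch named "branch".
git init -q upstream
(
    cd upstream
    git symbolic-ref HEAD refs/heads/branch
    mkdir -p sub/one sub/two sub/three
    echo one > sub/one/file
    echo two > sub/two/file
    echo three > sub/three/file
    git add .
    git -c user.name=demo -c user.email=demo@example.com commit -q -m "initial"
)

# 2. The existing full clone whose objects will be borrowed.
git clone -q upstream project-local

# 3. The sparse working directory, set up as in the steps above.
mkdir project-branch1
cd project-branch1
git init -q
git remote add origin "$scratch/upstream"
mkdir -p .git/info                      # in case the init template lacks it
echo "/sub/one" > .git/info/sparse-checkout
echo "/sub/two" >> .git/info/sparse-checkout
git config core.sparseCheckout true
echo "../../../project-local/.git/objects" > .git/objects/info/alternates
git fetch -q
git checkout -q -b sparse-branch origin/branch

ls sub                  # lists the directories that were checked out
git count-objects -v    # should include an "alternate:" line
```

Only the directories named in sparse-checkout should appear under sub, while sub/three remains in the commit but not on disk.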

In my case, I saved about 670M on the clone-cost and 1.7G on the working-directory-checkout-cost. Not to mention keeping the working directories nicely focussed on their purposes.

It’s very easy to forget you’re working in a sparse tree though. If you add a new directory in that sparse tree, but forget to add it to sparse-checkout, then the next time git tries to check out that directory (probably on a rebase or similar), git will actually remove that directory. It will never destroy anything permanently, since it only removes what has been checked in, but it will give you a nasty surprise.
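One way to make that mistake harder is to register a new directory in sparse-checkout at the moment you create it. A minimal sketch, assuming it is run from the top of the working tree; sparse_add is a hypothetical helper name, and it only edits the pattern file without touching the index or working tree.

```shell
# sparse_add DIR
# Hypothetical helper: append DIR to .git/info/sparse-checkout as a
# root-anchored pattern, unless that exact pattern is already listed.
sparse_add() {
    file=.git/info/sparse-checkout
    pattern="/${1#/}"           # ensure exactly one leading slash
    pattern="${pattern%/}"      # drop a trailing slash, if any
    grep -qxF -- "$pattern" "$file" 2>/dev/null || echo "$pattern" >> "$file"
}
```

For example, mkdir -p sub/three followed by sparse_add sub/three records /sub/three alongside the existing entries.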

Once you’ve made a sparse tree, you might find it hard to work out what directories are actually available, since most of them are no longer on disk. To help with that you’ll want the commands:

$ git ls-tree HEAD
$ git ls-tree HEAD path/to/sub

Then just pick and choose which paths you want from those lists and write them in .git/info/sparse-checkout.
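To turn those listings into lines that are ready to paste, something like the following may help. It is only a sketch: list_sparse_candidates is a hypothetical name, and it relies on the -r, -d and --name-only options of git ls-tree to print every directory recorded in a commit. Paths containing unusual characters may come out quoted, which this sketch does not handle.

```shell
# list_sparse_candidates [COMMIT]
# Hypothetical helper: print every directory in COMMIT (default HEAD)
# as a root-anchored pattern, one per line.
list_sparse_candidates() {
    # -r recurses into subtrees, -d restricts output to trees (directories),
    # --name-only drops the mode, type and hash columns.
    git ls-tree -r -d --name-only "${1:-HEAD}" | sed 's|^|/|'
}
```

The output can be filtered by hand or with grep before being appended to .git/info/sparse-checkout.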
