I've been using Git for nearly ten years now. Ten years is a long time, and I've been able to try different approaches and evaluate how effective they are in my workflow. I've also had the opportunity to teach Git to others, both to colleagues in an informal environment and to students in the more structured environment of the Casimir graduate school. This experience has given me the chance to reflect on the Git workflow and how best to use the tool.
There's one question in particular which often comes up among people who have used Git for a while, and
there never seems to be any real consensus on the answer: how do you use git rebase properly?
Let's start with a quick recap of what
git rebase does for us. Let's say that we're developing a new
feature on an aptly-named branch:
      ◯—◯  ← feature
     ╱
◯—◯—◯  ← master
We then pull in some changes from master, so that the histories for the master and feature branches are now divergent:
      ◯—◯  ← feature
     ╱
◯—◯—◯—◯—◯  ← master
Now, if the changes made on
master were made to the same places in the same files as the
feature, then we know that when we finally merge our feature branch we're going
to get conflicts. It's a general rule that the longer that you leave a branch un-merged, the
more likely it is that you are going to get conflicts. Generally, while we're developing on
feature we're going to want to incorporate the changes from
master every so often, so
that we don't have to deal with all the merge conflicts at once during the final merge.
At this point we have two options for incorporating the changes from master, merge or rebase:
      ◯—◯—◯  ← feature  ╮
     ╱   ╱              │ merge
◯—◯—◯—◯—◯  ← master     ╯

          ◯—◯  ← feature  ╮
         ╱                │ rebase
◯—◯—◯—◯—◯  ← master       ╯
See what we did? Rebase allows us to "chop" the link attaching the base of the
feature branch and re-attach it (re-base, geddit?) to the commit that
master is pointing to now.
Then we add a couple more commits and merge:
      ◯—◯—◯—◯—◯  ← feature  ╮
     ╱   ╱     ╲            │ merge
◯—◯—◯—◯—◯———————◯  ← master ╯

          ◯—◯—◯—◯  ← feature  ╮
         ╱       ╲            │ rebase
◯—◯—◯—◯—◯—————————◯  ← master ╯
Using rebase in this way allows us to maintain an almost-linear history (i.e. we could
always fast-forward when merging instead of creating an explicit merge commit), which makes
it easier to understand what we've done.
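To make the "chop and re-attach" idea concrete, here is a minimal toy model in Python. It is only a sketch: the `Commit` class and the `history`, `is_ancestor`, `merge_base` and `rebase` helpers are hypothetical names of my own, not anything Git provides, and real Git replays diffs between snapshots rather than copying labels. The point is just that rebase builds new commits on top of master, after which master is an ancestor of feature and a fast-forward becomes possible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False: commits compare and hash by identity
class Commit:
    change: str                        # stand-in for the changeset
    parent: Optional["Commit"] = None  # pointer to the parent commit


def history(tip):
    """List the changes from the root commit up to `tip`."""
    changes = []
    while tip is not None:
        changes.append(tip.change)
        tip = tip.parent
    return changes[::-1]


def is_ancestor(a, b):
    """True if `a` is reachable from `b` by following parent pointers."""
    while b is not None:
        if b is a:
            return True
        b = b.parent
    return False


def merge_base(a, b):
    """Most recent commit shared by both (linear) histories."""
    seen = set()
    while a is not None:
        seen.add(a)
        a = a.parent
    while b is not None and b not in seen:
        b = b.parent
    return b


def rebase(branch, onto):
    """Replay the commits unique to `branch` on top of `onto`."""
    base = merge_base(branch, onto)
    todo = []
    commit = branch
    while commit is not base:
        todo.append(commit)
        commit = commit.parent
    new_tip = onto
    for commit in reversed(todo):  # oldest first
        new_tip = Commit(commit.change, new_tip)  # a new commit, same change
    return new_tip


# Three commits on master, then feature and master diverge.
m3 = Commit("m3", Commit("m2", Commit("m1")))
feature = Commit("f2", Commit("f1", m3))
master = Commit("m5", Commit("m4", m3))

print(is_ancestor(master, feature))  # diverged, so no fast-forward yet
rebased = rebase(feature, master)
print(history(rebased))
print(is_ancestor(master, rebased))  # master can now fast-forward to feature
```

Note that the original `feature` commits are left untouched; `rebased` points at fresh copies of them.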
The above usage of rebase is pretty uncontentious; you start to get divided opinions when you start talking about interactive rebase, which allows us to rewrite history in more exotic ways. For example, we can use interactive rebase to re-order commits or squash them together:
          A B C D
          ◯—◯—◯—◯  ← feature
         ╱
◯—◯—◯—◯—◯  ← master

          C' B' A+D
          ◯——◯———◯  ← feature
         ╱
◯—◯—◯—◯—◯  ← master
Developing is an inherently iterative process; your understanding of a problem evolves as you work on the solution. This means that the logical separation of ideas may not become apparent until after the fact. Git rebase can help us express the logical set of changes, rather than the (convoluted) set of changes as they actually happened.
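The re-order and squash from the diagram can be sketched with another toy model. Everything here is an assumption made for illustration: a "commit" is just a label plus a patch, a patch is a dict mapping file names to new contents, and the file names are made up. Real Git stores snapshots and has to deal with conflicts, which this ignores. What it shows is that the rewritten history tells a tidier story while ending up with exactly the same files.

```python
from functools import reduce


def apply(tree, patch):
    """Return a new tree with the patch applied (later writes win)."""
    new_tree = dict(tree)
    new_tree.update(patch)
    return new_tree


def squash(first, second):
    """Combine two commits into one; the second one's changes win."""
    return (first[0] + "+" + second[0], {**first[1], **second[1]})


def checkout(base, commits):
    """Apply every commit in order and return the resulting tree."""
    return reduce(lambda tree, commit: apply(tree, commit[1]), commits, base)


base = {"README": "hello"}

# The "lab book": A is a first draft that D later finishes off.
A = ("A", {"parser.py": "draft"})
B = ("B", {"docs.md": "usage"})
C = ("C", {"tests.py": "cases"})
D = ("D", {"parser.py": "final"})

original = [A, B, C, D]
rewritten = [C, B, squash(A, D)]  # re-ordered, with A and D squashed

print([label for label, _ in rewritten])
print(checkout(base, original) == checkout(base, rewritten))
```

The re-ordering is only safe here because B and C touch different files from A and D; when commits overlap, an interactive rebase will stop and ask you to resolve the conflicts.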
So what's the problem?
Rebase rewrites history. Each git commit contains a pointer to the parent commit(s), so when we rebase a set of commits they won't hash to the same values as they did before the rebase, even though the changeset may be the same.
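You can see why the hashes change by computing a commit ID by hand. The sketch below assumes the classic SHA-1 object format, where a commit's ID is the SHA-1 of a `commit <size>` header followed by the commit text, and that text names the parent. The `commit_id` helper, the author string and the messages are made up for illustration, and I use Git's well-known empty-tree ID for every commit to keep the changeset identical.

```python
import hashlib

# The ID Git gives to an empty tree; used here so that every commit
# has exactly the same content.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(tree, parent, message,
              who="A U Thor <author@example.com> 0 +0000"):
    """Hash a commit object the way SHA-1 based Git repositories do."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")  # the pointer to the parent
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    body = "\n".join(lines).encode()
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


old_base = commit_id(EMPTY_TREE, None, "base on old master")
new_base = commit_id(EMPTY_TREE, old_base, "new work on master")

# Same tree, same author, same message: only the parent differs.
before = commit_id(EMPTY_TREE, old_base, "add feature")
after = commit_id(EMPTY_TREE, new_base, "add feature")

print(before)
print(after)
print(before != after)
```

In a real rebase the tree and the committer timestamp usually change as well, but the parent pointer alone is enough to give the commit a new ID.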
This rewriting of history makes it problematic to use rebase on branches that are also being worked on by other people, and it's the generally accepted wisdom not to use rebase with any branch that you've pushed to a remote repository (i.e. made public).
My Git workflow
When conducting scientific experiments, one will typically keep a lab book, which contains notes, observations and key results as they occur. The goal of keeping a lab book is to make sure that you don't forget what you were doing. The goal of a lab book is, however, not to communicate results to a wider community. A lab book — despite being an accurate record — requires context to understand; it is messy, and does not present information in a way that someone without the relevant context can easily understand. A scientific article — on the other hand — is designed to disseminate information to a wide audience, and to give the necessary context to understand any conclusions. When doing science, both of these ways of working are necessary: an accurate recollection of what has been done, and then a reorganisation and reinterpretation of what was done.
In my daily work I use Git as both a lab book and a scientific article. When I am developing
a new feature or fixing a bug I will create a new branch, and then start experimenting; committing
whenever I make incremental progress towards my goal. This incremental progress will certainly include
many dead-ends and false starts, and that's fine. By committing early and committing often I can ensure
that any work I do won't be lost. However, when it's time to explain to other people what I've done, it's
time to make sense of that history. This is when I'll go through my lab book of commits and use interactive
rebase to sequence everything into logical changes. When my changes are reviewed there will
typically be small fixups (refactoring, naming fixes etc.). During the review I make these changes
as separate commits, which makes it easier for the reviewer to see that I have applied their suggestions.
Once the reviewer is happy I do one final pass with interactive rebase to incorporate the changes
into the commits where they make the most sense. I then rebase on top of the branch into which I'm
merging and perform the merge using the
--no-ff option (to ensure that an explicit merge commit is made).
Enforcing this strategy for merging in changes has a few nice features. Firstly, the history is essentially
linear — any merges could have been "fast-forward" — which makes it easier to visualise in tools like
gitk. Secondly, preserving the individual commits from each merge means that anyone looking back in
history can see the logical set of changes that went into implementing a particular feature or bugfix.
Finally, cleaning up the commits (i.e. not merging the "lab book" into the master branch) means that
anyone looking back in history will not have to sift through endless trivia to get to the meat of a changeset.