I've been looking closely at the git subtree merge strategy for tracking remote projects. Ideally, our remote projects would be tracked with submodules. The issue is that we develop our own libraries for use in our own internal projects, and we commonly do major development on the libraries and on the projects that use them at the same time. While submodules work, the overhead of repeatedly committing in each submodule and then again in the parent project gets in the way of being productive. A good source code management system should help you, not get in your way. Git manages that for the most part, but we found this not to be the case when developing our own submodules. (They are much better suited to tracking a 3rd party library.)
That being said, the subtree merge strategy lets you use your repository more naturally. When you commit, everything is committed, and when you pull, you don't have to worry about updating the submodules to the correct state, switching to the master branch, or whether you just broke the state of your working directory. On the other hand, subtree merges carry a little more overhead when you set them up and when you want to push changes back to the library. Referring to the link above, it takes a number of commands to pull in the library, and you need to understand what is happening.
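To give a feel for that setup overhead, here is a rough sketch of the usual manual subtree merge sequence, run in throwaway repositories. The names `lib`, `app`, `libremote` and `vendor/lib` are made up for illustration, and the `--allow-unrelated-histories` flag assumes a reasonably recent git (2.9 or later); older versions do not need or accept it.

```shell
#!/bin/sh
# Sketch of a manual subtree merge, run entirely in throwaway repositories.
# Hypothetical names: lib, app, libremote, vendor/lib.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the library repository.
git init -q lib
cd lib
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hello" > lib.txt
git add lib.txt
git commit -q -m "library: initial commit"
lib_branch=$(git symbolic-ref --short HEAD)
cd ..

# Stand-in for the parent project.
git init -q app
cd app
git config user.name "Demo"
git config user.email "demo@example.com"
echo "app" > app.txt
git add app.txt
git commit -q -m "app: initial commit"

# 1. Register the library as a remote and fetch it.
git remote add libremote ../lib
git fetch -q libremote

# 2. Start a merge that records the library history but leaves our files alone.
#    (--allow-unrelated-histories assumes git 2.9 or later.)
git merge -q -s ours --no-commit --allow-unrelated-histories "libremote/$lib_branch"

# 3. Read the library's tree into a subdirectory of the index and working tree.
git read-tree --prefix=vendor/lib/ -u "libremote/$lib_branch"

# 4. Conclude the merge.
git commit -q -m "Merge library as vendor/lib"

# Show what the project now tracks.
git ls-files
```

Later updates from the library would then typically be pulled in with something like `git pull -s subtree libremote <branch>`.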
The braid project was set up to make managing subtree merges a little easier. It has been criticized for being too complex and for being replaceable by plain subtree merges (see the 40 lines of Rake post).
Here are a few reasons I think braid is not too complex.
- It does diffs correctly.
To find the diff of your merged-in subtree against the library, you can use git diff-tree. This is quite daunting to do correctly in a shell. To get a patch that really represents the difference, you have to diff against just the subtree, not the project root or path. You get this by using git rev-parse to find the pulled-in tree:
    # You need the base revision from the last time you pulled the library in (via fetch)
    # and did the merge. This is not the revision you see in gitk or by browsing the parent
    # of the merge; you find it with rev-parse. The trailing colon selects the root tree.
    $ git rev-parse <parent library revision of last merge from library>:
    # Next, you need the sha1 of just the subtree directory of the commit that you want to
    # get the diff for (HEAD or a previous commit will work too).
    # This can be retrieved with ls-tree.
    $ git ls-tree HEAD path/to/lib
    # After parsing out those two sha1 ids, you can use diff-tree to get the correct diff.
    # -p prints a patch instead of raw output; the prefixes end in a slash so that
    # file names are joined to them correctly.
    $ git diff-tree -p <lib sha1> <subtree sha1> --src-prefix=path/to/lib/ --dst-prefix=path/to/lib/
Part of the problem with a shell script is finding the sha1 ids. When was the last time you imported the library? What if there are multiple parents? Braid takes care of these things by storing some meta information about the subtrees it manages in a .braids file. It will save you a lot of time later.
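For comparison, this is roughly what a hand-rolled helper for the diff looks like once the library revision is known. It is only a sketch: `subtree_diff`, `vendor/lib` and the demo repositories are hypothetical, and the caller still has to supply the library commit, which is exactly the piece of bookkeeping described above.

```shell
#!/bin/sh
set -e

# subtree_diff LIB_COMMIT PREFIX [COMMIT]
# Print a patch of the PREFIX directory in COMMIT (default: HEAD)
# against the root tree of LIB_COMMIT. Hypothetical helper.
subtree_diff() {
    lib_tree=$(git rev-parse --verify "$1^{tree}")
    sub_tree=$(git rev-parse --verify "${3:-HEAD}:$2")
    git diff-tree -r -p --src-prefix="$2/" --dst-prefix="$2/" "$lib_tree" "$sub_tree"
}

# Demo in throwaway repositories.
work=$(mktemp -d)
cd "$work"

git init -q lib
cd lib
git config user.name "Demo"
git config user.email "demo@example.com"
echo "one" > lib.txt
git add lib.txt
git commit -q -m "library: initial commit"
lib_commit=$(git rev-parse HEAD)
cd ..

git init -q app
cd app
git config user.name "Demo"
git config user.email "demo@example.com"
echo "app" > app.txt
git add app.txt
git commit -q -m "app: initial commit"

# Pull the library tree into vendor/lib (history is skipped to keep the demo short).
git remote add libremote ../lib
git fetch -q libremote
git read-tree --prefix=vendor/lib/ -u "$lib_commit"
git commit -q -m "Import library as vendor/lib"

# Make a local change inside the subtree.
echo "two" >> vendor/lib/lib.txt
git commit -q -a -m "Local change to the library"

# Diff the subtree against the library revision it came from.
patch=$(subtree_diff "$lib_commit" vendor/lib)
printf '%s\n' "$patch"
```

The patch covers only the files under `vendor/lib`, with paths written relative to the project root.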
- Braid stores the merge strategy for you.
When you script your own solution in the shell, there is no metadata about your merge strategy. You can manually add additional scripts to merge with a squash or a full merge, but you'll have to remember to use the correct script when you update your library. Braid takes care of this with its metadata again.
- Braid caches for you.
There is a problem with fetching huge remote repositories in git. The creator of braid pointed me to this thread, which outlines the problem. Braid can maintain a cache of the remote repository for you, which lets you use the subtree merge strategy effectively without worrying about git's fetch problem.
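To illustrate the general idea of a local cache by hand, here is a small sketch that keeps a bare mirror of the remote and lets the project fetch from that mirror. This is not a description of how braid implements its cache; the names `upstream`, `cache.git` and `libcache` are invented for the example.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for a large remote repository.
git init -q upstream
cd upstream
git config user.name "Demo"
git config user.email "demo@example.com"
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "library: v1"
branch=$(git symbolic-ref --short HEAD)
cd ..

# One-time step: keep a bare mirror of the remote as a local cache.
git clone -q --mirror upstream cache.git

# The project fetches from the cache rather than from the remote itself.
git init -q app
cd app
git remote add libcache ../cache.git
git fetch -q libcache
cd ..

# Later, the remote gains a commit.
cd upstream
echo "v2" >> lib.txt
git commit -q -a -m "library: v2"
upstream_head=$(git rev-parse HEAD)
cd ..

# Refresh the cache first, then fetch from it in the project.
git -C cache.git fetch -q origin
cd app
git fetch -q libcache
cached_head=$(git rev-parse "libcache/$branch")
echo "project sees $cached_head"
```

Several projects on the same machine could share one such mirror, so the expensive fetch from the real remote happens in only one place.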