How to remove or edit a commit in your git repository

So you just committed 15 things to your git repository and now you want to push your changes. Oops, commit #2 added your password file. Or perhaps you misspelled words in the commit message. Now, being a git expert, you think to yourself, I’ll just create a temporary branch, cherry-pick the commits that are correct, re-commit correctly the incorrect commit, and then merge/push my temporary branch instead of this flawed branch I have. That all works fine, but luckily there is a much easier way to do it.

> git rebase -i 

When you use ‘-i’ with rebase, you get an interactive edit screen that gives you an option for each commit _after_ the sha1 you enter. For each commit you can either delete the entry entirely or replace the word “pick” with “edit”. Deleting an entry causes that commit to lost. Editing a commit gives you the option to unstage some of the files, edit the commit message, or really, to do whatever you want to change it how you intended. If you edit, or the rebase has a conflict, after setting things how you like, just use rebase –continue to go on your way.

A few reasons braid is better than 40 lines of Rake.

I’ve been doing a lot of looking at the git subtree merge strategy for tracking remote projects. Ideally, our remote projects would be tracked with submodules. The issue is, we are developing our own libraries for use in our own internal projects. We commonly do major development on the libraries and the submodules at the same time. While submodules work, we find the overhead of having to repeatedly commit in each submodule and then in the parent project somewhat prohibitive to being productive. A good source code management system should help, not get in your way of being productive. While git helps and is out of the way for the most part, we found this not to be the case when developing our own submodules. (They are much better suited when you’re tracking a 3rd party library.)

That being said, the subtree merge strategy lets you use your repository more naturally. When you commit, everything is committed and when you pull, you don’t have to worry about updating the submodules to the correct state, switching to the master branch, or spending time worrying about whether or not you just broke the state of your working directory. On the other hand, subtree merges have a little bit more overhead to deal with when you set them up and when you want to push back to the library you’ve made changes to. Referring to the link above, it takes a number of commands to pull in the library and you need to understand what is happening.

The braid project has been set up to make managing subtree merges a little easier. It has been criticized for being too complex and able to be replaced with subtrees by themselves (see the 40 lines of Rake post.)

Here are a few reasons I think braid is not too complex.

  • It does diff’s correctly.

    In order to find the diff of your merged in subtree against the library, you can use git diff-tree. This is quite daunting to do correctly in a shell. In order to get a patch that really represents what the diff is, you have to do a diff against just the subtree, not the project root or path. You get this by using git rev-parse to find the pulled in tree:

    # you need the base revision of the last time you pulled the library in (via fetch) 
    # and did the merge.  The revision is not what you see in gitk or by browsing the parent
    # of the merge, you find it with rev-parse.
    $ git rev-parse :
    # next, you need the sha1 of just the subtree directory of the commit that you want to 
    # get the diff for (HEAD or a previous commit will work too)
    # this can be retrieved with ls-tree
    $ git ls-tree HEAD path/to/lib
    # after parsing out those two sha1 ids, you can use diff-tree to get the correct diff
    $ git diff-tree   --src-prefix=path/to/lib --dst-prefix=path/to/lib
    

    Part of the problem with a shell script is finding the sha1 ids. When was the last time you imported the library? What if there are multiple parents? Braid takes care of these things by storing some meta information about the subtrees it manages in a .braids file. It will save you a lot of time later.

  • Braid stores the merge strategy for you.

    When you shell script your solution, there is no meta information about your merge strategy. You can manually add additional scripts to merge with a squash or full type merge, but you’ll have to remember to use the correct script when you update your library. Braid takes care of this with it’s meta data again.

  • Braid caches for you.

    There is a problem with fetching huge remote repositories in git. The creator of braid pointed this thread out to me that outlines the problem. Braid can maintain a cache of the remote repository for you which allows you to use the subtree merge strategy effectively without worrying about git’s fetch problem.

Sharing git branches

I’ve been learning git lately. Here are a few tips for sharing branches I’ve collected during the past few weeks.

Create a branch

In git, branches are stored on your local machine. Even the commonly named “master”, is just a branch on your local machine. Master is simply set up to track a branch on a remote machine by default. To create a new local branch, you can do one of two things. Create the branch and then check it out, or do a checkout with a branch switch.

# using the branch tag
$ git branch newbranch
$ git checkout newbranch
# or just do it with checkout first
$ git checkout -b newbranch
# you can look at your local branches
$ git branch

Pushing a branch to the server

This is only applicable for using git with a central server to track changes. You can of course, use the same method to push a branch to another machine that isn’t being used as a central server (like another development machine) but I haven’t done that as much.

$ git push origin newbranch
# You can look at the new branch on the server
$ git branch -r

Checkout a remote branch

If someone else has pushed a branch to the server, you can check it out on your machine.

$ git checkout -b newbranch origin/newbranch

NOTE There is a –track switch on the checkout command you might need to add. For me, it has been doing that by default though.

NOTE After you push a branch to a server, your local branch isn’t set up to track changes on that branch. There may be another way to accomplish getting that set up, but I just check out the branch again.

$ git push origin newbranch
$ git checkout -b newbranch_renamed origin/newbranch

You could of course, delete “newbranch” and then re-checkout the branch to the same name. It’s important to realize that the names don’t matter really. Newbranch can be named differently on the local and remote machine.

Deleting branches

To delete a branch, just use the -d switch on branch.

$ git checkout master # or another branch
# do your merging before you do the delete 
$ git branch -d newbranch
# you can delete the remote branch after you have deleted the local one like this:
# the syntax is slightly confusing, but it works
$ git push origin :newbranch