Autarchy of the Private Cave

Tiny bits of bioinformatics, [web-]programming etc

    How to truncate git history (sample script included)

    28th March 2011

    Under a few assumptions (most importantly, that you do not have any unmerged branches), it is very easy to throw away git repository commits older than an arbitrarily chosen commit.

    Here’s a sample script (call it e.g. git-truncate and put it into your ~/bin or any other directory listed in your PATH).


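    #!/bin/bash
    # Usage: git-truncate <commit-or-tag>
    # Start an orphan (parentless) branch whose index and working tree match commit $1.
    git checkout --orphan temp "$1"
    # Record that snapshot as the new root commit.
    git commit -m "Truncated history"
    # Replay everything after $1 on master onto the new root.
    git rebase --onto temp "$1" master
    # The temporary branch is no longer needed.
    git branch -D temp
    # The following 2 commands are optional - they keep your git repo in good shape.
    git prune --progress # delete unreachable objects (anything the reflog still lists is kept)
    git gc --aggressive # aggressively collect garbage; may take a lot of time on large repos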

    Invocation: cd to your repository, then git-truncate refspec, where refspec is either a commit’s SHA1 hash-id, or a tag.

    Expected result: a git repository starting with a “Truncated history” initial commit, and continuing to the tip of master (the script always rebases master, whichever branch you were on when calling it).

    If you truncate repositories often, then consider adding an optional 2nd argument (truncate-commit message) and also some safeguards against improper use – currently, even if refspec is wrong, the script will not abort after a failed checkout.
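    The safeguards and the optional message argument suggested above could look roughly like the sketch below. It is not a tested replacement for the script in this post: the function name git_truncate, the error messages and the choice to rewrite the currently checked-out branch (instead of a hard-coded master) are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical safer variant of git-truncate, written as a shell function.
# Usage: git_truncate <commit-or-tag> [commit message]
git_truncate() {
    local commit="$1"
    local message="${2:-Truncated history}"   # optional 2nd argument
    local branch

    if [ -z "$commit" ]; then
        echo "usage: git_truncate <commit-or-tag> [message]" >&2
        return 1
    fi

    # Abort early if the argument does not name a commit.
    if ! git rev-parse --verify --quiet "${commit}^{commit}" >/dev/null; then
        echo "git_truncate: not a commit: $commit" >&2
        return 1
    fi

    # Remember which branch to rewrite; this fails on a detached HEAD.
    if ! branch=$(git symbolic-ref --quiet --short HEAD); then
        echo "git_truncate: HEAD is not on a branch" >&2
        return 1
    fi

    # Refuse to run while tracked files have uncommitted changes.
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        echo "git_truncate: commit or stash your changes first" >&2
        return 1
    fi

    # Same steps as the original script, chained so that a failure stops the rest.
    git checkout --quiet --orphan temp "$commit" &&
    git commit --quiet -m "$message" &&
    git rebase --quiet --onto temp "$commit" "$branch" &&
    git branch -D temp
}
```

    Example: source the file that defines the function, cd to the repository, then run git_truncate v1.0 "History before v1.0 removed" (v1.0 being any tag or commit id of yours). Because the steps are chained with &&, a failed checkout no longer lets the later commands run.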

    Thanks for posting any improvements you may have.

    Source: Tekkub’s post on github discussions.
    See also: how to remove a single file from all of git’s commits.

    9 Responses to “How to truncate git history (sample script included)”

    1. Sean Flanigan Says:

      Thanks for the script!

      Git 1.7.1 doesn’t support “checkout --orphan”, so I found I had to replace
      git checkout --orphan temp $1
      with
      git checkout $1
      git symbolic-ref HEAD refs/heads/temp

      which may or may not be correct, but it /seems/ to work.

    2. Jimmy Soho Says:

      This doesn’t really truncate. It detaches all the commits up till refspec. But the commits remain present in the git tree. Any idea how to delete all the detached objects?

    3. Bogdan Says:

      If I remember it right, objects with no references to them are cleaned up during garbage collection (`git gc`); maybe there is a need to supply an extra option to gc (like `--prune`?).
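      One likely reason the old commits survive is that the reflog still lists them after the rebase, and git keeps anything a reflog entry points to. A minimal sketch of a more thorough cleanup follows; the function name purge_unreachable is made up for this example, and the commands permanently discard every unreachable object, so try it on a copy first. Tags or other branches that still point into the old history will keep it alive regardless.

```shell
#!/bin/bash
# Hypothetical helper: drop everything that no branch or tag can reach.
purge_unreachable() {
    # Expire all reflog entries so they stop protecting the old commits.
    git reflog expire --expire=now --all &&
    # Repack and delete unreachable objects immediately, without the
    # default grace period.
    git gc --quiet --prune=now
}
```

      Run purge_unreachable from inside the repository after truncating.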

    4. Mikael Lundin Says:

      Thank you. This helped me a lot. I needed to truncate history to get rid of secrets I had stored in the code, such as passwords and app keys, before pushing the whole repository to a public github repo.

    5. Tom Says:

      Based on notes at the bottom of “http://git-scm.com/docs/git-filter-branch”, I tried the technique in the original post above, then cloned into a new repo, then gc pruned, and then it was much smaller.

      I haven’t tried the “graft then filter-branch then clone (and prune)” strategy. I presume it would give similar results?

      I also haven’t tried the “destructive” reflog shrinking technique mentioned in the Git docs.
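      The clone step described in this comment can be sketched as below. The function name clone_reachable_only and both path arguments are placeholders. The --no-local option makes git use its normal transport even for a path on the same disk, so the copy should receive only objects reachable from the source's branches and tags, rather than a hard-linked copy of the whole object store.

```shell
#!/bin/bash
# Hypothetical helper: copy a repository without its unreachable objects.
# Usage: clone_reachable_only <source-repo> <destination-dir>
clone_reachable_only() {
    # --no-local forces the regular fetch machinery for a local path,
    # so only objects reachable from refs are transferred.
    git clone --quiet --no-local "$1" "$2"
}
```

      The new clone starts with an empty reflog, which is why it comes out smaller even before any gc run.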

    6. LaboDJ Says:

      It works modified like so in latest git version:

      #!/bin/bash
      git checkout --orphan temp $1
      git commit -m 'Truncated history'
      git rebase --onto temp $1 master
      git branch -D temp

    7. Bogdan Says:

      LaboDJ, I don’t see any differences to my code in the post…

      P.S. I’ve also edited both my post and your comment to properly show double-dashes.

    8. cy Says:

      Two things to add to that.

      git prune --progress
      git gc --aggressive

      the last might not be necessary. git prune will delete all the unreffed objects.

    9. Bogdan Says:

      @Cy – thanks, added both.

      P.S. Nice email address! :)
