We took on a new client yesterday, and I had the privilege of pushing their git repository to a remote server for the first time. I quickly discovered that the repository was over 800 MB. The code base itself was definitely smaller than 20 MB, so I figured there had to be some large binary files in the history.
A quick search on the intertubes revealed a command called git-filter-branch, which lets you effectively rewrite history. But I still needed to know which files were the culprits clogging up my repository. I found this answer on Stack Overflow, which included a handy script that goes through the history and lists all blobs that are above a specified size. It revealed an assets folder that had contained some large videos.
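The Stack Overflow script itself is not reproduced here, but the idea can be sketched roughly as follows. This is my own minimal version, not the original answer: the function name list_large_blobs and the 1 MiB default threshold are assumptions made for illustration. It walks every object reachable from any ref, asks git for each object's type and size, and keeps only the blobs at or above the threshold.

```shell
#!/bin/sh
# list_large_blobs [MIN_BYTES]
# Hypothetical helper: prints "<size-in-bytes> <path>" for every blob in the
# history of the current repository that is at least MIN_BYTES large,
# biggest first. MIN_BYTES defaults to 1 MiB (an arbitrary choice).
list_large_blobs() {
    min_bytes="${1:-1048576}"
    # rev-list prints "<sha> <path>" for each reachable object;
    # cat-file replaces the sha with the object's type and size and
    # passes the path through unchanged via %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v min="$min_bytes" \
            '$1 == "blob" && $2 + 0 >= min + 0 { print substr($0, 6) }' |
        sort -rn
}
```

Running list_large_blobs 10485760 inside a repository would list every blob of 10 MiB or more, which is usually enough to spot a folder full of videos.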
I decided to nuke the entire assets folder from history with this command:
git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch assets' HEAD
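This effectively removes the references to those blobs from history, but the actual blobs are still there. git-filter-branch keeps a backup of the original refs under refs/original/, the reflogs still point at the old commits (by default, entries for unreachable commits are kept for 30 days), and git does not garbage collect objects while anything still references them.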
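A quick way to confirm both halves of that statement is sketched below. The helper name commits_touching is made up for this post; the git commands it wraps are standard. After the rewrite, the count for the removed folder should drop to zero, while the on-disk size reported by git count-objects should stay roughly where it was.

```shell
#!/bin/sh
# commits_touching PATH
# Hypothetical helper: prints how many commits reachable from HEAD
# touch PATH. Zero means the path no longer appears in that history.
commits_touching() {
    git rev-list HEAD -- "$1" | wc -l | tr -d ' '
}

# Example usage inside the rewritten repository:
#   commits_touching assets     # 0 once the folder is gone from history
#   git count-objects -vH       # size-pack shows the space still in use
```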
The next few lines are lifted directly from the git-filter-branch manpage at http://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html.
WARNING: These commands are DESTRUCTIVE. Make a backup first.
DISCLAIMER: We are not responsible for any damages caused by this blog post.
- Remove the original refs backed up by git-filter-branch: say git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d.
- Expire all reflogs with git reflog expire --expire=now --all.
- Garbage collect all unreferenced objects with git gc --prune=now (or if your git-gc is not new enough to support arguments to --prune, use git repack -ad; git prune instead).
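The three steps above can be strung together in a small script. This is only a sketch under a couple of assumptions: the function name purge_unreferenced is mine, and I swapped the manpage's xargs pipeline for a while loop so that nothing runs when there are no refs under refs/original/. It is just as destructive as the individual commands, so the backup warning still applies.

```shell
#!/bin/sh
# purge_unreferenced
# Hypothetical wrapper around the three cleanup steps. Run it from inside
# the repository, and only after making a backup.
purge_unreferenced() {
    # 1. Delete the backup refs that git-filter-branch left behind.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done

    # 2. Expire every reflog entry so the old commits become unreachable.
    git reflog expire --expire=now --all

    # 3. Repack and drop all unreachable objects immediately.
    git gc --prune=now
}
```

Comparing git count-objects -vH before and after the call is an easy way to see how much space was actually reclaimed.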
After all this, I managed to trim the repository down from 800 MB to a respectable 80 MB. I win.











