Have you ever committed a big file, credentials for some service, or any other sensitive data that you don’t want to be publicly available?
Step 1: Remove unwanted files
Use the 🚀 BFG Repo-Cleaner project:
curl -L -o ~/bfg-1.14.0.jar https://repo1.maven.org/maven2/com/madgag/bfg/1.14.0/bfg-1.14.0.jar
alias bfg='java -jar ~/bfg-1.14.0.jar'
git clone --mirror git://example.com/some-big-repo.git
cd some-big-repo.git
bfg --strip-blobs-bigger-than 50M .
git reflog expire --expire=now --all && git gc --prune=now --aggressive
BFG can also perform other cleanups, such as removing passwords, credentials and other private data.
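As a rough sketch of those other cleanups, the commands below use two options described in the BFG documentation, `--delete-files` and `--replace-text`. The secret values, the `id_{dsa,rsa}` file names and the `BFG_JAR` and `secrets_file` variables are made-up examples, and the jar location is assumed to match the alias above. The BFG calls are skipped when the jar is not present.

```shell
#!/bin/bash
# Sketch only: the secrets and file names below are hypothetical examples.
BFG_JAR="$HOME/bfg-1.14.0.jar"   # assumed download location, same as the alias above

# One expression per line. BFG replaces each match with ***REMOVED*** by default;
# "old==>new" sets an explicit replacement and "regex:" switches to regex matching.
secrets_file=$(mktemp)
cat > "$secrets_file" <<'EOF'
hunter2
regex:api_key=\w+==>api_key=
EOF

if [ -f "$BFG_JAR" ]; then
    # Run from inside the mirror clone, like the commands above.
    java -jar "$BFG_JAR" --delete-files 'id_{dsa,rsa}' .
    java -jar "$BFG_JAR" --replace-text "$secrets_file" .
    git reflog expire --expire=now --all && git gc --prune=now --aggressive
else
    echo "BFG jar not found at $BFG_JAR, skipping" >&2
fi
```

Keep in mind that BFG leaves the contents of the latest commit alone by default, so a secret that is still present there has to be removed with a normal commit first.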
Step 2: Check nothing has changed
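Clone the original, untouched repo into one folder and the repo cleaned by BFG into another, then run this script with both paths as arguments. Both must be regular (non-bare) clones because the script checks out each branch; a --mirror clone is bare, so make a normal clone of it first.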
#!/bin/bash
# PURPOSE: Check whether two directories hold the same source code in all branches
# WHY?: Ensure that BFG has not altered the code after running
# HOW?: Easy! dir1 is the original untouched repo, dir2 is the one that has been altered by BFG.
# 1. From the untouched directory, grab all the branches and iterate over them
# 2. Magic happens in compare_branch_and_sha: check out that branch and get an md5sum of all files for each directory.
# 3. The output is self-explanatory!
# Thanks to https://rtyley.github.io/bfg-repo-cleaner/ for that awesome and magic project :)
dir1=$1
dir2=$2

if [[ ! -d "$dir1" || ! -d "$dir2" ]]; then
    echo "Usage: $0 <original-repo> <cleaned-repo>" >&2
    exit 1
fi

# Print "<checked-out branch>:<md5 over the md5sums of all files>" for one directory.
# Runs in a subshell so the caller's working directory never changes.
branch_fingerprint() {
    (
        cd "$1" || exit 1
        git checkout "$2" > /dev/null 2>&1
        sha_of_all_files=$(find . -type f -not -path "./.git/*" -exec md5sum {} \; | sort -k 2 | md5sum - | cut -d' ' -f1)
        actual_branch=$(git rev-parse --abbrev-ref HEAD)
        echo "$actual_branch:$sha_of_all_files"
    )
}

compare_branch_and_sha() {
    src_resume=$(branch_fingerprint "$1" "$3")
    dest_resume=$(branch_fingerprint "$2" "$3")
    if [[ "$src_resume" == "$dest_resume" ]]; then
        echo "Same $1:$2 branch $3"
    else
        echo "Different $1:$2 branch $3 ($src_resume!=$dest_resume)"
    fi
}

# Remote branches of the original repo, e.g. refs/remotes/origin/feature/x -> feature/x.
# origin/HEAD is skipped because it only points at another branch.
list_of_branches=$(git -C "$dir1" branch -a -l --format "%(refname)" | grep '^refs/remotes/origin/' | grep -v '/HEAD$' | cut -d'/' -f4-)

while IFS= read -r branch; do
    compare_branch_and_sha "$dir1" "$dir2" "$branch"
done <<< "$list_of_branches"
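To see why the md5 comparison works, here is a small self-contained sketch that applies the same `find | md5sum | sort | md5sum` pipeline to two throwaway directories instead of real repos. The `directory_fingerprint` function name, the temp directories and the `app.py` file are invented for this illustration, and it assumes `md5sum` is available (as on most Linux systems).

```shell
#!/bin/bash
# Hypothetical helper: one md5 over the sorted per-file md5 list of a directory.
directory_fingerprint() {
    (
        cd "$1" || exit 1
        find . -type f -not -path "./.git/*" -exec md5sum {} \; | sort -k 2 | md5sum - | cut -d' ' -f1
    )
}

# Two throwaway directories standing in for the original and the cleaned repo.
original=$(mktemp -d)
cleaned=$(mktemp -d)
echo "print('hello')" > "$original/app.py"
echo "print('hello')" > "$cleaned/app.py"

before_a=$(directory_fingerprint "$original")
before_b=$(directory_fingerprint "$cleaned")
if [ "$before_a" = "$before_b" ]; then echo "Same"; else echo "Different"; fi

# Any content change in one tree changes its fingerprint.
echo "# edited" >> "$cleaned/app.py"
after_b=$(directory_fingerprint "$cleaned")
if [ "$before_a" = "$after_b" ]; then echo "Same"; else echo "Different"; fi
```

This prints "Same" for the identical trees and "Different" after the edit. The real script does the same per branch: assuming it is saved as compare.sh (a hypothetical name), it would be run as `./compare.sh original-repo cleaned-repo`.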
Resources
- https://gist.github.com/ssaid/1b95350441192bffbf4b2d422442c9b4
- https://rtyley.github.io/bfg-repo-cleaner/
