git@vger.kernel.org mailing list mirror (one of many)
* git filter-branch --filter-renames ?
@ 2018-10-25 21:30 Jason Cooper
From: Jason Cooper @ 2018-10-25 21:30 UTC
  To: git

All,

I recently needed to extract the git history of a portion of an existing
repository.  My initial attempts using --subdirectory-filter, subtrees,
etc. weren't as successful as I'd hoped.

The primary reason for my failures is that this particular source
repository has seen a lot of code movement and in-place renames.  As a
result, filters such as --subdirectory-filter failed to keep commits
made prior to the renames.
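
To illustrate the rename problem described above, here is a minimal sketch
that builds a throwaway repository, renames one file, and compares a plain
path-limited log with one that uses --follow.  The repository and the names
old/a.txt and new/b.txt are made up for the demo; they are not from the
source repository discussed in this message.

```shell
#!/bin/bash
# Hypothetical demo repo; old/a.txt and new/b.txt are invented names.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo

mkdir old
printf 'line 1\nline 2\nline 3\n' >old/a.txt
git add old/a.txt
git commit -q -m "add a.txt"

mkdir new
git mv old/a.txt new/b.txt
git commit -q -m "move a.txt to new/b.txt"

# A path-limited log only knows the current name ...
plain=$(git log --format=format: --name-only -- new/b.txt |
	sed -e '/^$/d' | sort -u)
# ... while --follow also reports the name the file had before the rename.
followed=$(git log --format=format: --follow --name-only -- new/b.txt |
	sed -e '/^$/d' | sort -u)

echo "without --follow:"
echo "$plain"
echo "with --follow:"
echo "$followed"
```

A filter keyed only on the current path would therefore miss the commit
that created old/a.txt.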

So, long story short, I've attached below a hacked-together script (yes,
it's sad when one writes a script to call a script :-/) that solves the
problem for me.

My hope is that some other poor sob in my position discovers this
script, uses it and moves on.  If enough people think it's useful
despite the corner cases [1], I'd be happy to work on integrating it into
filter-branch.

thx,

Jason.

[1] Namely that if two different files held the same full-path name at
different times in the source repo, you'll get some errant commits in
the history.
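
The corner case in [1] can be sketched without running filter-branch at
all, since it comes down to matching on path names alone.  The listing
below is a simplified stand-in for the filtering step, assuming whole-line
path matching; all paths in it are hypothetical.

```shell
#!/bin/bash
# Hypothetical data: every path below is made up for illustration.
set -e
work=$(mktemp -d)

# Names the file of interest has had over time (what a rename trace yields).
printf '%s\n' 'src/util.c' 'lib/util.c' >"$work/trace_list.txt"

# File listing of an imagined old commit in which lib/util.c was a
# different, unrelated file that merely lived at the same path.
printf '%s\n' 'lib/util.c' 'docs/README' 'Makefile' >"$work/old_commit_files.txt"

# Whole-line, fixed-string matching on paths: what is kept, what is dropped.
kept=$(grep -xFf "$work/trace_list.txt" "$work/old_commit_files.txt")
dropped=$(grep -vxFf "$work/trace_list.txt" "$work/old_commit_files.txt")

echo "kept:"
echo "$kept"
echo "dropped:"
echo "$dropped"
rm -rf "$work"
```

The unrelated lib/util.c survives because only its path is compared, so
commits touching it end up in the extracted history.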

------------------->8--------------------------------------------------
#!/bin/bash
#
# git-filter-renames: Similar to --subdirectory-filter but tracks renames
#
# Basic use:
#  $ git clone path/to/source_repo dest_repo
#  $ cd dest_repo
#  $ git tag | xargs git tag -d # ours are signed, so would fail to verify
#  $ git remote remove origin
#  $ git gc --aggressive --prune=now --force
#  $ git fsck
#  $ git-filter-renames.sh "[PREFIX] " fileA subdirB/ fileC subdirD/subdirE ...
#  $ rm -rf .git/refs/original
#  $ git gc --aggressive --prune=now --force
#  $ git fsck

DEBUG=1

if [ $# -le 1 ]; then
	echo >&2 "Usage:"
	echo >&2 "    ${0##*/} '[subj prefix] ' fileA fileB dir1 sub/dir2"
	echo >&2 ""
	exit 1
fi

if [ $DEBUG == 1 ]; then
	rm -rf /tmp/git-filter-renames-*
fi

TMP_DIR="$(mktemp -d /tmp/git-filter-renames-XXXXXX)" || exit 1

PREFIX="${1}"
shift

# take in the list of files to preserve
# note: directories are recursed
: >"$TMP_DIR/user_list.txt"
# "$@" (not $*) keeps arguments containing whitespace intact
for arg in "$@"; do
	if [ -d "$arg" ]; then
		find "$arg" -type f >>"$TMP_DIR/user_list.txt"
	elif [ -f "$arg" ]; then
		echo "$arg" >>"$TMP_DIR/user_list.txt"
	else
		echo >&2 "What the hell is '$arg'?"
	fi
done

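# collect every path each requested file has had over its history
# note: names are stored verbatim and matched as fixed strings below, so
# regex metacharacters in filenames (e.g. '.') cannot match other paths
: >"$TMP_DIR/trace_list.txt"
while IFS= read -r fn <&4; do
	git log --format=format: --follow --name-only -- "$fn"
done 4<"$TMP_DIR/user_list.txt" | \
	sed -e '/^$/d' | sort -u >>"$TMP_DIR/trace_list.txt"

# stage the filter script
# note: --cached, since an index filter has no checked-out work tree
cat >"$TMP_DIR/filter.sh" <<EOF
git ls-files | \\
	grep -vxFf "$TMP_DIR/trace_list.txt" | \\
	xargs -r -d '\n' git rm --cached -qrf --ignore-unmatch
EOF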
chmod +x $TMP_DIR/filter.sh

# stage the msg filter script
# note: PREFIX is pasted into a sed replacement, so it must not contain
# '/', '&' or '\'
cat >"$TMP_DIR/msg_filter.sh" <<EOF
sed -e "1 s/^/$PREFIX/"
EOF
chmod +x $TMP_DIR/msg_filter.sh

# do the filtering
echo >&2 "Doing filtering"
git filter-branch --prune-empty -f --index-filter "$TMP_DIR/filter.sh" \
	--msg-filter "$TMP_DIR/msg_filter.sh" \
	HEAD
# cleanup
if [ $DEBUG == 0 ]; then
	rm -rf "$TMP_DIR"
fi
