git@vger.kernel.org mailing list mirror (one of many)
* git filter-branch --subdirectory-filter not working as expected, history of other folders is preserved
From: Seaders Oloinsigh @ 2016-10-10 13:42 UTC
  To: git

From

https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History

> Now your new project root is what was in the trunk subdirectory each time.
> Git will also automatically remove commits that did not affect the
> subdirectory.

We have a git repository that looks like

sdk/
android/
ios/
unity/
windows/

Which we'd like to split into 4 repositories, 1 for each platform.  To
start this process (for splitting android out), I ran,

git filter-branch -f --prune-empty --subdirectory-filter android -- --all

Which rewrote a ton of history and commits, and looked like it worked, but
on closer inspection had left a ton of history behind.

If I run

git log --all -- unity/

It returns a list of commits that happened in the unity/ subfolder of the
original root.

commit c4ea2797...
Author: tom... <tom@...ve.com>
Date:   Thu Feb 25 14:20:59 2016 +0000

    kick off build

...


That commit contains only an edit to the file "unity/tom" (relative to the
root *before* the filter-branch). That file doesn't exist any more, and from
my understanding of the docs, the commit shouldn't have been carried across.

It's also not an isolated instance. If I run the same checks against
"ios/" or "windows/", the history of any file that lived in a folder other
than "android" under the old root is also preserved.

I've just about resorted to running multiple other, explicit filters to
remove all references to those other folders, but it seems like this would
be redoing the job that I understood git filter-branch should have been
doing.

Any help in this regard is greatly appreciated.

seaders.


* Re: git filter-branch --subdirectory-filter not working as expected, history of other folders is preserved
From: Jeff King @ 2016-10-10 15:30 UTC
  To: Seaders Oloinsigh; +Cc: git

On Mon, Oct 10, 2016 at 02:42:36PM +0100, Seaders Oloinsigh wrote:

> We have a git repository that looks like
> 
> sdk/
> android/
> ios/
> unity/
> windows/
> 
> Which we'd like to split into 4 repositories, 1 for each platform.  To
> start this process (for splitting android out), I ran,
> 
> git filter-branch -f --prune-empty --subdirectory-filter android -- --all

OK, so that should rewrite each ref to have only the contents of the
"android" directory at the top-level.

Note that filter-branch saves a copy of the old refs in refs/original.

> Which rewrote a ton of history and commits, and looked like it worked, but
> on closer inspection had left a ton of history behind.
> 
> If I run
> 
> git log --all -- unity/
> 
> It returns a list of commits that happened in the unity/ subfolder of the
> original root.

Here you asked for "--all", which includes refs/original. So you are
seeing the original, un-rewritten commits (and none of your new ones, of
course, because they do not have a unity/ directory!).

Try:

  git log --all --source -- unity

to see which ref each commit is coming from.

Or try:

  git log --branches --tags -- unity

to confirm that your branches and tags do not include that path.

Or just:

  git for-each-ref --format='delete %(refname)' refs/original |
  git update-ref --stdin

to get rid of the backup refs entirely.
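
The effect of the backup refs can be reproduced in a throwaway
repository. What follows is a minimal sketch and not something from the
original report: the branch name "main", the file names, and the demo
identity are all made up, and FILTER_BRANCH_SQUELCH_WARNING is only set
to skip the startup warning that newer git versions print.

```shell
#!/bin/sh
# Throwaway repo; every name here (main, android/a.txt, unity/u.txt) is hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the startup warning on newer git
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com

mkdir android unity
echo a >android/a.txt
echo u >unity/u.txt
git add .
git commit -q -m "add android and unity"

git filter-branch -f --prune-empty --subdirectory-filter android -- --all >/dev/null 2>&1

# --all also walks refs/original, so the pre-rewrite commit is still counted.
with_backups=$(git rev-list --count --all -- unity)
# Limiting the walk to branches and tags leaves the backup refs out.
without_backups=$(git rev-list --count --branches --tags -- unity)
echo "commits touching unity/ via --all: $with_backups"
echo "commits touching unity/ via --branches --tags: $without_backups"
```

In this setup the first count should be 1 (the commit kept alive by
refs/original) and the second 0.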

-Peff


* Re: git filter-branch --subdirectory-filter not working as expected, history of other folders is preserved
From: Seaders Oloinsigh @ 2016-10-10 16:12 UTC
  To: Jeff King; +Cc: git

Thanks for the reply, Jeff.

Clearing the backups of the branches (those starting with
"refs/original") has gotten me closer, but I also needed to do the same
for "refs/tags", or change my filter-branch command to

  git filter-branch -f --prune-empty --tag-name-filter cat --subdirectory-filter android -- --all

I still have remnants of that other history, though.

Due to the structure of this repo, it looks like there are some
branches that never had anything to do with the android/ subdirectory,
so they're not getting wiped out.  My branch is in a better state to
how I want it, but still, if I run your suggestion,

  git log --all --source -- unity/

I get output like

  commit 4853c... refs/heads/unity-sdk-3_1_3
  Author: serg... <serg...@...ve.com>
  Date:   Thu Sep 11 16:30:01 2014 +0100

      Started 3.1.3

These are logs from branches that contain only edits within the unity/
subdirectory of the original root.  There are other branches like that
for the other platforms / subdirectories of the original root.  If that
is the case, I would say filter-branch with the subdirectory-filter
isn't acting as expected, since it doesn't get rid of all the history
you want it to get rid of.




* Re: git filter-branch --subdirectory-filter not working as expected, history of other folders is preserved
From: Jeff King @ 2016-10-10 18:19 UTC
  To: Seaders Oloinsigh; +Cc: git

On Mon, Oct 10, 2016 at 05:12:25PM +0100, Seaders Oloinsigh wrote:

> Due to the structure of this repo, it looks like there are some
> branches that never had anything to do with the android/ subdirectory,
> so they're not getting wiped out.  My branch is in a better state to
> how I want it, but still, if I run your suggestion,
> [...]

Hmm. Yeah, I think this is an artifact of the way that filter-branch
works with pathspec limiting. It keeps a mapping of commits that it has
rewritten (including ones that were rewritten only because their
ancestors were), and realizes that a branch ref needs to be updated when the
commit it points to was rewritten.

But if we don't touch _any_ commits in the history reachable from a
branch (because they didn't even show up in our pathspec-limited
rev-list), then it doesn't realize we touched the branch's history at
all.

I agree that the right outcome is for it to delete those branches
entirely. I suspect the fix would be pretty tricky, though.
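
The behaviour described above can be seen in a small throwaway
repository. This is only a sketch under stated assumptions: the branch
names "main" and "unity-only" and the file names are hypothetical
fixtures, and it relies on filter-branch leaving a ref alone when no
commit reachable from it touches the filtered path.

```shell
#!/bin/sh
# Hypothetical fixture: main touches android/ and unity/, unity-only never touches android/.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the startup warning on newer git
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com

mkdir -p android unity
echo a >android/a.txt
echo u >unity/u.txt
git add .
git commit -q -m "android and unity on main"

# An orphan branch shares no history with main.
git checkout -q --orphan unity-only
git rm -rqf .
mkdir -p unity
echo only >unity/only.txt
git add .
git commit -q -m "unity-only work"
git checkout -q main

git filter-branch -f --prune-empty --subdirectory-filter android -- --all >/dev/null 2>&1

# Rewritten refs get a backup under refs/original; untouched refs do not.
backups=$(git for-each-ref --format='%(refname)' refs/original)
leftover=$(git rev-list --count refs/heads/unity-only -- unity)
echo "backup refs: $backups"
echo "unity/ commits still reachable from unity-only: $leftover"
```

Only main should end up with a backup ref here, while unity-only keeps
its original commit and therefore its unity/ history.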

In the meantime, I think you can work around it by either:

  1. Make a pass beforehand for refs that do not touch your desired
     paths at all, like:

       path=android ;# or whatever
       git for-each-ref --format='%(refname)' |
       while read ref; do
         if test "$(git rev-list --count "$ref" -- "$path")" = 0; then
           echo "delete $ref"
         fi
       done |
       git update-ref --stdin

     and then filter what's left:

       git filter-branch --subdirectory-filter "$path" -- --all

or

  2. Do the filter-branch, and because you know you specified --all and
     that your filters would touch all histories, any ref which _wasn't_
     touched can be deleted. That list is anything which didn't get a
     backup entry in refs/original. So something like:

       git for-each-ref --format='%(refname)' |
       perl -lne 'print $1 if m{^refs/original/(.*)}' >backups

       git for-each-ref --format='%(refname)' |
       grep -v ^refs/original >refs

       comm -23 refs backups |
       sed "s/^/delete /" |
       git update-ref --stdin
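
Since both workarounds delete refs, it may be worth printing the
"delete" lines for review before feeding them to update-ref. The sketch
below does that for the first workaround in a throwaway repository; the
branch names "main" and "unity-only", the file names, and the variable
"plan" are hypothetical and only there to make the example
self-contained.

```shell
#!/bin/sh
# Dry run of the pre-pass: collect the update-ref commands instead of executing them.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com

mkdir -p android
echo a >android/a.txt
git add .
git commit -q -m "android work"

git checkout -q --orphan unity-only
git rm -rqf .
mkdir -p unity
echo u >unity/u.txt
git add .
git commit -q -m "unity work"
git checkout -q main

path=android
plan=$(
  git for-each-ref --format='%(refname)' |
  while read -r ref; do
    # A count of 0 means no commit reachable from this ref touches $path.
    if test "$(git rev-list --count "$ref" -- "$path")" = 0; then
      echo "delete $ref"
    fi
  done
)
echo "$plan"
```

Once the list looks right, the same text can be piped into
"git update-ref --stdin" to perform the deletions.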

-Peff


* Re: git filter-branch --subdirectory-filter not working as expected, history of other folders is preserved
From: Seaders Oloinsigh @ 2016-10-11 13:56 UTC
  To: Jeff King; +Cc: git

On Mon, Oct 10, 2016 at 7:19 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Oct 10, 2016 at 05:12:25PM +0100, Seaders Oloinsigh wrote:
>
>> Due to the structure of this repo, it looks like there are some
>> branches that never had anything to do with the android/ subdirectory,
>> so they're not getting wiped out.  My branch is in a better state to
>> how I want it, but still, if I run your suggestion,
>> [...]
>
> Hmm. Yeah, I think this is an artifact of the way that filter-branch
> works with pathspec limiting. It keeps a mapping of commits that it has
> rewritten (including ones that were rewritten only because their
> ancestors were), and realizes that a branch ref needs updated when the
> commit it points to was rewritten.
>
> But if we don't touch _any_ commits in the history reachable from a
> branch (because they didn't even show up in our pathspec-limited
> rev-list), then it doesn't realize we touched the branch's history at
> all.
>
> I agree that the right outcome is for it to delete those branches
> entirely. I suspect the fix would be pretty tricky, though.
>
> In the meantime, I think you can work around it by either:
>
>   1. Make a pass beforehand for refs that do not touch your desired
>      paths at all, like:
>
>        path=android ;# or whatever
>        git for-each-ref --format='%(refname)' |
>        while read ref; do
>          if test "$(git rev-list --count "$ref" -- "$path")" = 0; then
>            echo "delete $ref"
>          fi
>        done |
>        git update-ref --stdin
>
>      and then filter what's left:
>
>        git filter-branch --subdirectory-filter $path -- --all

This is the perfect solution for me.  Running the branch-deletion pass
first also sped up the filter-branch command, and I'm left with a much
more complete version of where I want to be.

I would still contend that either filter-branch doesn't work as
expected, or the docs need updating to include extra steps like the
ones you've given.  With a large repo like ours, running multiple
filter-branch commands and trying different combinations is quite a
time sink when you're left with the same incorrect result again and
again.

