git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [BUG] Symbolic links break "git fast-export"?
@ 2019-06-24 11:03 Lars Schneider
  2019-06-24 12:33 ` Elijah Newren
  2019-06-24 12:55 ` Elijah Newren
  0 siblings, 2 replies; 9+ messages in thread
From: Lars Schneider @ 2019-06-24 11:03 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Jeff King

Hi folks,

Is my understanding correct, that `git fast-export | git fast-import` 
should not modify the repository? If yes, then we might have a bug in 
`git fast-export` if symbolic directory links are removed and converted 
to a real directory.

Consider this test case:

    # Create test repo
    git init .
    mkdir foo
    echo "foo" >foo/baz
    git add .
    git commit -m "add foo dir"
    ln -s foo bar
    git add .
    git commit -m "add bar dir as link"
    rm bar
    mkdir bar
    echo "bar" >bar/baz
    git add .
    git commit -m "remove link and make bar dir real"

    printf "BEFORE: "
    git rev-parse HEAD

    # Fast export, import ... that should not change anything!
    git fast-export --no-data --all --signed-tags=warn-strip \
        --tag-of-filtered-object=rewrite | git fast-import --force --quiet

    printf "AFTER: "

I would assume that the BEFORE/AFTER hashes match. Unfortunately, with 
Git 2.22.0 they do no. The problem is this export output I think:

    remove link and make bar dir real
    from :2
    M 100644 5716ca5987cbf97d6bb54920bea6adde242d87e6 bar/baz
    D bar

The new file in the `bar` directory is added to the repo first and
afterwards the path `bar` is deleted. I think that deletes the entire 
directory `bar`?

If you confirm that this is a bug, then I will try to provide a fix.

Thanks,
Lars

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] Symbolic links break "git fast-export"?
  2019-06-24 11:03 [BUG] Symbolic links break "git fast-export"? Lars Schneider
@ 2019-06-24 12:33 ` Elijah Newren
  2019-06-24 18:58   ` Jeff King
  2019-06-30 14:01   ` Lars Schneider
  2019-06-24 12:55 ` Elijah Newren
  1 sibling, 2 replies; 9+ messages in thread
From: Elijah Newren @ 2019-06-24 12:33 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Git Mailing List, Jeff King

On Mon, Jun 24, 2019 at 5:05 AM Lars Schneider <larsxschneider@gmail.com> wrote:
>
> Hi folks,
>
> Is my understanding correct, that `git fast-export | git fast-import`
> should not modify the repository? If yes, then we might have a bug in
> `git fast-export` if symbolic directory links are removed and converted
> to a real directory.
>
> Consider this test case:
>
>     # Create test repo
>     git init .
>     mkdir foo
>     echo "foo" >foo/baz
>     git add .
>     git commit -m "add foo dir"
>     ln -s foo bar
>     git add .
>     git commit -m "add bar dir as link"
>     rm bar
>     mkdir bar
>     echo "bar" >bar/baz
>     git add .
>     git commit -m "remove link and make bar dir real"
>
>     printf "BEFORE: "
>     git rev-parse HEAD
>
>     # Fast export, import ... that should not change anything!
>     git fast-export --no-data --all --signed-tags=warn-strip \
>         --tag-of-filtered-object=rewrite | git fast-import --force --quiet
>
>     printf "AFTER: "
>
> I would assume that the BEFORE/AFTER hashes match. Unfortunately, with
> Git 2.22.0 they do no. The problem is this export output I think:
>
>     remove link and make bar dir real
>     from :2
>     M 100644 5716ca5987cbf97d6bb54920bea6adde242d87e6 bar/baz
>     D bar
>
> The new file in the `bar` directory is added to the repo first and
> afterwards the path `bar` is deleted. I think that deletes the entire
> directory `bar`?
>
> If you confirm that this is a bug, then I will try to provide a fix.

My first reaction was, "we regressed on this again?", but it looks
like my original fix for directory/file changes only handled one
direction.  Thus, my commit 060df6242281 ("fast-export: Fix output
order of D/F changes", 2010-07-09) probably *caused* this bug.  We
should probably just sort not based on filename, but on changetype --
send all the deletes to fast-import before we send the modifies.

We should probably also make a corresponding improvement to
fast-import; it also makes some attempts to be smart about handling
order of modifies and deletes, but misses this case.  See commit
253fb5f8897d ("fast-import: Improve robustness when D->F changes
provided in wrong order", 2010-07-09).  It'd be nice if fast-import
could go through the list of changes, apply the deletes first, then
the modifies -- although I'm not sure where renames go in the order
off the top of my head.

Thanks for flagging this and working on it.

Elijah

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] Symbolic links break "git fast-export"?
  2019-06-24 11:03 [BUG] Symbolic links break "git fast-export"? Lars Schneider
  2019-06-24 12:33 ` Elijah Newren
@ 2019-06-24 12:55 ` Elijah Newren
  1 sibling, 0 replies; 9+ messages in thread
From: Elijah Newren @ 2019-06-24 12:55 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Git Mailing List, Jeff King

On Mon, Jun 24, 2019 at 5:05 AM Lars Schneider <larsxschneider@gmail.com> wrote:
>
> Hi folks,
>
> Is my understanding correct, that `git fast-export | git fast-import`
> should not modify the repository?

I forgot to respond to this part.  The answer is mostly yes, but actually no:

* fast-export strips any extended commit headers, if any (it only
looks for expected headers and handles those)
* fast-export omits any tags of trees
* fast-export sometimes rewrites tags of tags of commits to tags of
commits (depends on options passed; it either mistakenly does this, or
dies with an error)
* fast-export defaults to re-encoding commits to be in UTF-8, with no
option in git <= 2.22.0 to leave the encoding alone
* annotated tags outside of the refs/tags/ namespace will have their
location mangled
* references that include commit cycles in their history (which can be
created with git-replace(1)) will not be flagged to the user as an
error but will be silently deleted by fast-export as though the branch
or tag contained no interesting files
* fast-export or fast-import may die on some repositories (e.g. if
there are tags of blobs, if there are signed tags and no
--signed-tags= option passed, if a commit has improperly formatted
info such as a timezone with more than four digits (which exist in the
wild, e.g. in the rails repo, and google will find you more), or in
latest git master if the commit has a recorded encoding and no
--reencode option is passed)
* ...and the bug you just found.

It's a valid question whether these are bugs or not; I'd say that some
definitely are, but not all.  I had to document all of these up at
https://github.com/newren/git-filter-repo#inherited-limitations, among
one or two others based on the options I default to passing to
fast-export.

Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] Symbolic links break "git fast-export"?
  2019-06-24 12:33 ` Elijah Newren
@ 2019-06-24 18:58   ` Jeff King
  2019-06-30 13:05     ` Lars Schneider
  2019-06-30 14:01   ` Lars Schneider
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2019-06-24 18:58 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Lars Schneider, Git Mailing List

On Mon, Jun 24, 2019 at 06:33:38AM -0600, Elijah Newren wrote:

> We should probably also make a corresponding improvement to
> fast-import; it also makes some attempts to be smart about handling
> order of modifies and deletes, but misses this case.  See commit
> 253fb5f8897d ("fast-import: Improve robustness when D->F changes
> provided in wrong order", 2010-07-09).  It'd be nice if fast-import
> could go through the list of changes, apply the deletes first, then
> the modifies -- although I'm not sure where renames go in the order
> off the top of my head.

You'd have to split the renames into separate delete/adds, since they
can have a circular dependency. E.g. renaming "foo" to "bar" and "bar"
to "foo", you must remove "foo" and "bar" both, and then add them back
in.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] Symbolic links break "git fast-export"?
  2019-06-24 18:58   ` Jeff King
@ 2019-06-30 13:05     ` Lars Schneider
  2019-06-30 18:28       ` Johannes Sixt
  0 siblings, 1 reply; 9+ messages in thread
From: Lars Schneider @ 2019-06-30 13:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren, Git Mailing List



> On Jun 24, 2019, at 11:58 AM, Jeff King <peff@peff.net> wrote:
> 
> On Mon, Jun 24, 2019 at 06:33:38AM -0600, Elijah Newren wrote:
> 
>> We should probably also make a corresponding improvement to
>> fast-import; it also makes some attempts to be smart about handling
>> order of modifies and deletes, but misses this case.  See commit
>> 253fb5f8897d ("fast-import: Improve robustness when D->F changes
>> provided in wrong order", 2010-07-09).  It'd be nice if fast-import
>> could go through the list of changes, apply the deletes first, then
>> the modifies -- although I'm not sure where renames go in the order
>> off the top of my head.
> 
> You'd have to split the renames into separate delete/adds, since they
> can have a circular dependency. E.g. renaming "foo" to "bar" and "bar"
> to "foo", you must remove "foo" and "bar" both, and then add them back
> in.

@peff: Can you give me a hint how one would perform this circular
dependency in a single commit? I try to write a test case for this.

Thank you,
Lars


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] Symbolic links break "git fast-export"?
  2019-06-24 12:33 ` Elijah Newren
  2019-06-24 18:58   ` Jeff King
@ 2019-06-30 14:01   ` Lars Schneider
  2019-07-01 18:02     ` Elijah Newren
  1 sibling, 1 reply; 9+ messages in thread
From: Lars Schneider @ 2019-06-30 14:01 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List, Jeff King



> On Jun 24, 2019, at 5:33 AM, Elijah Newren <newren@gmail.com> wrote:
> 
> On Mon, Jun 24, 2019 at 5:05 AM Lars Schneider <larsxschneider@gmail.com> wrote:
>> 
>> Hi folks,
>> 
>> Is my understanding correct, that `git fast-export | git fast-import`
>> should not modify the repository? If yes, then we might have a bug in
>> `git fast-export` if symbolic directory links are removed and converted
>> to a real directory.
>> 
>> ...
> 
> My first reaction was, "we regressed on this again?", but it looks
> like my original fix for directory/file changes only handled one
> direction.  Thus, my commit 060df6242281 ("fast-export: Fix output
> order of D/F changes", 2010-07-09) probably *caused* this bug.  We
> should probably just sort not based on filename, but on changetype --
> send all the deletes to fast-import before we send the modifies.

060df6242281 is interesting! If I revert the changes in builtin/fast-export.c,
then the "t9350:directory becomes symlink" test still passes nowadays. 

Plus, my my new test case passes too:

	test_expect_success 'when transforming a symlink to a directory' '
		test_create_repo src &&

	   	mkdir src/foo &&
		echo a_line >src/foo/file.txt &&
		git -C src add foo/file.txt &&
		git -C src commit -m 1st_commit &&

		ln -s src/foo src/bar &&
		git -C src add bar &&
		git -C src commit -m 2nd_commit &&

		rm src/bar &&
	   	mkdir src/bar &&
	   	echo b_line >src/bar/b_file.txt &&
		git -C src add . &&
		git -C src commit -m 3rd_commit &&

		test_create_repo dst &&
		git -C src fast-export --all &&
		git -C src fast-export --all | git -C dst fast-import &&
		git -C src show >expected &&
		git -C dst show >actual &&
		test_cmp expected actual
	'

Do you think it would make sense to revert the qsort change
in fast-export? I haven't bisected yet which other change made
the qsort change obsolete.

Thanks,
Lars

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] Symbolic links break "git fast-export"?
  2019-06-30 13:05     ` Lars Schneider
@ 2019-06-30 18:28       ` Johannes Sixt
  2019-07-01 17:45         ` Elijah Newren
  0 siblings, 1 reply; 9+ messages in thread
From: Johannes Sixt @ 2019-06-30 18:28 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Jeff King, Elijah Newren, Git Mailing List

Am 30.06.19 um 15:05 schrieb Lars Schneider:
>> On Jun 24, 2019, at 11:58 AM, Jeff King <peff@peff.net> wrote:
>> You'd have to split the renames into separate delete/adds, since they
>> can have a circular dependency. E.g. renaming "foo" to "bar" and "bar"
>> to "foo", you must remove "foo" and "bar" both, and then add them back
>> in.
> 
> @peff: Can you give me a hint how one would perform this circular
> dependency in a single commit? I try to write a test case for this.

git mv Makefile foo
git mv COPYING Makefile
git mv foo COPYING
git diff -B HEAD

-- Hannes

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] Symbolic links break "git fast-export"?
  2019-06-30 18:28       ` Johannes Sixt
@ 2019-07-01 17:45         ` Elijah Newren
  0 siblings, 0 replies; 9+ messages in thread
From: Elijah Newren @ 2019-07-01 17:45 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Lars Schneider, Jeff King, Git Mailing List

On Sun, Jun 30, 2019 at 12:28 PM Johannes Sixt <j6t@kdbg.org> wrote:
>
> Am 30.06.19 um 15:05 schrieb Lars Schneider:
> >> On Jun 24, 2019, at 11:58 AM, Jeff King <peff@peff.net> wrote:
> >> You'd have to split the renames into separate delete/adds, since they
> >> can have a circular dependency. E.g. renaming "foo" to "bar" and "bar"
> >> to "foo", you must remove "foo" and "bar" both, and then add them back
> >> in.
> >
> > @peff: Can you give me a hint how one would perform this circular
> > dependency in a single commit? I try to write a test case for this.
>
> git mv Makefile foo
> git mv COPYING Makefile
> git mv foo COPYING
> git diff -B HEAD
>
> -- Hannes

Interestingly, fast-export has special code to handle cases like this;
possibly due to the understanding of how all
filemodify/filedelete/filerename commands take effect immediately (see
below for more on that).  If I make the above changes in git.git and
commit them, then:

$ git diff --name-status -B HEAD~1 HEAD
R100    Makefile        COPYING
R100    COPYING Makefile

BUT:

$ git fast-export -B -M --no-data HEAD~2..HEAD | tail -n 10
commit refs/heads/master
mark :7
author Elijah Newren <newren@gmail.com> 1562000065 -0600
committer Elijah Newren <newren@gmail.com> 1562000065 -0600
data 8
Testing
from :6
R COPYING Makefile
M 100644 8a7e2353520ddd7e0c8074d2b32d0441d97c1597 COPYING

I.e. fast-export breaks the rename and translates it into a modify
instead.  This comes from here:

        case DIFF_STATUS_COPIED:
        case DIFF_STATUS_RENAMED:
                /*
                 * If a change in the file corresponding to ospec->path
                 * has been observed, we cannot trust its contents
                 * because the diff is calculated based on the prior
                 * contents, not the current contents.  So, declare a
                 * copy or rename only if there was no change observed.
                 */
                if (!string_list_has_string(changed, ospec->path)) {
                        <snipped code for handling rename/copy>
                }
                /* fallthrough */
        case DIFF_STATUS_TYPE_CHANGED:
        case DIFF_STATUS_MODIFIED:
        case DIFF_STATUS_ADDED:

There is a question of whether fast-import should try to handle
different exporters that aren't as careful; e.g. if one gets a stream
like:

commit refs/heads/master
mark :4
author Me My <self@and.eye> 110000000 -0700
committer Me My <self@and.eye> 110000000 -0700
data 11
correction
R letters numbers
R numbers letters

Should git-fast-import attempt to divine the user's intent to swap
these two files (though it's not clear if that is the intent; see
below), or would that violate the documented behavior:

           A filerename command takes effect immediately. Once the source
           location has been renamed to the destination any future commands
           applied to the source location will create new files there and not
           impact the destination of the rename.

(I'm pretty sure Shawn would have said the latter; see e.g.
https://public-inbox.org/git/20100706193455.GA19476@spearce.org/ and
the follow-ups.)  I think the view of "immediately taking effect"
implies that this is a rename of 'letters' to 'numbers' which deletes
'numbers', and that the subsequent entry just renames the file back,
making it an expensive almost no-op; almost because it has the
side-effect of deleting the original 'numbers' file.  This is
certainly what fast-import does right now.

Elijah

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [BUG] Symbolic links break "git fast-export"?
  2019-06-30 14:01   ` Lars Schneider
@ 2019-07-01 18:02     ` Elijah Newren
  0 siblings, 0 replies; 9+ messages in thread
From: Elijah Newren @ 2019-07-01 18:02 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Git Mailing List, Jeff King

On Sun, Jun 30, 2019 at 8:01 AM Lars Schneider <larsxschneider@gmail.com> wrote:
> > On Jun 24, 2019, at 5:33 AM, Elijah Newren <newren@gmail.com> wrote:
> >
> > On Mon, Jun 24, 2019 at 5:05 AM Lars Schneider <larsxschneider@gmail.com> wrote:
> >>
> >> Hi folks,
> >>
> >> Is my understanding correct, that `git fast-export | git fast-import`
> >> should not modify the repository? If yes, then we might have a bug in
> >> `git fast-export` if symbolic directory links are removed and converted
> >> to a real directory.
> >>
> >> ...
> >
> > My first reaction was, "we regressed on this again?", but it looks
> > like my original fix for directory/file changes only handled one
> > direction.  Thus, my commit 060df6242281 ("fast-export: Fix output
> > order of D/F changes", 2010-07-09) probably *caused* this bug.  We
> > should probably just sort not based on filename, but on changetype --
> > send all the deletes to fast-import before we send the modifies.
>
> 060df6242281 is interesting! If I revert the changes in builtin/fast-export.c,
> then the "t9350:directory becomes symlink" test still passes nowadays.
>
> Plus, my my new test case passes too:
>
>         test_expect_success 'when transforming a symlink to a directory' '
>                 test_create_repo src &&
>
>                 mkdir src/foo &&
>                 echo a_line >src/foo/file.txt &&
>                 git -C src add foo/file.txt &&
>                 git -C src commit -m 1st_commit &&
>
>                 ln -s src/foo src/bar &&
>                 git -C src add bar &&
>                 git -C src commit -m 2nd_commit &&
>
>                 rm src/bar &&
>                 mkdir src/bar &&
>                 echo b_line >src/bar/b_file.txt &&
>                 git -C src add . &&
>                 git -C src commit -m 3rd_commit &&
>
>                 test_create_repo dst &&
>                 git -C src fast-export --all &&
>                 git -C src fast-export --all | git -C dst fast-import &&
>                 git -C src show >expected &&
>                 git -C dst show >actual &&
>                 test_cmp expected actual
>         '
>
> Do you think it would make sense to revert the qsort change
> in fast-export? I haven't bisected yet which other change made
> the qsort change obsolete.

No need to bisect; it was the other commit I pointed out in my last
email, commit 253fb5f8897d ("fast-import: Improve robustness when D->F
changes
provided in wrong order", 2010-07-09).  The original bug was
fixed/worked around in two places, because Shawn specifically said the
fast-import side being more robust was okay but that fast-export was
buggy and needed fixing.  See
https://public-inbox.org/git/20100706193455.GA19476@spearce.org/ and
the thread around there.  (It'd be really nice to be able to cc Shawn
on this...sigh.)  Reverting the fast-export change would be okay if we
replaced it with a better change (something like sorting deletes
before modifies, though maybe with extra work around renames as
discussion elsewhere in this thread is touching on), but if we
straight up revert that change it would leave fast-export in violation
of the stream format.

Hope that helps,
Elijah

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-07-01 18:02 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-24 11:03 [BUG] Symbolic links break "git fast-export"? Lars Schneider
2019-06-24 12:33 ` Elijah Newren
2019-06-24 18:58   ` Jeff King
2019-06-30 13:05     ` Lars Schneider
2019-06-30 18:28       ` Johannes Sixt
2019-07-01 17:45         ` Elijah Newren
2019-06-30 14:01   ` Lars Schneider
2019-07-01 18:02     ` Elijah Newren
2019-06-24 12:55 ` Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).