git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* merge-recursive thinks symlink's child dirs are "real"
@ 2019-09-16 21:47 Jonathan Tan
  2019-09-16 22:15 ` SZEDER Gábor
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Jonathan Tan @ 2019-09-16 21:47 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan

This was raised by a coworker at $DAYJOB. I run the following script:

  $GIT init test && cd test
  ln -s . foo
  mkdir bar && touch bar/file
  $GIT add foo bar/file
  $GIT commit -m "foo symlink"
  
  $GIT checkout -b branch1
  $GIT commit --allow-empty -m "empty commit"
  
  $GIT checkout master
  $GIT rm foo
  mkdir foo
  (cd foo; ln -s ../bar bar)
  $GIT add foo/bar
  $GIT commit -m "replace foo symlink with real foo dir and foo/bar symlink"
  
  $GIT checkout branch1
  $GIT cherry-pick master

The cherry-pick must be manually resolved, when I would expect it to
happen without needing user intervention.

You can see that at the point of the cherry-pick, in the working
directory, ./foo is a symlink and ./foo/bar is a directory. I traced the
code that ran during the cherry-pick to process_entry() in
merge-recursive.c. When processing "foo/bar", control flow correctly
reaches "Case B: Added in one", but the dir_in_way() invocation returns
true, since lstat() indeed reveals that "foo/bar" is a directory. If I
hardcode dir_in_way() to return false, then the cherry-pick happens
without needing user intervention. I checked with "ls-tree -r" and the
resulting tree is as I expect (foo is a real dir, foo/bar is a symlink).

Is this use case something that Git should be able to handle, and if
yes, is the correct solution to teach dir_in_way() that dirs reachable
from symlinks are not really in the way (presumably the implementation
would climb directories until we reach the root or we reach a filesystem
boundary, similar to what we do when we search for the .git directory)?
Also, my proposed solution would work in the specific use case outlined
in the script above, but can anyone think offhand of a case that it
would make worse?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: merge-recursive thinks symlink's child dirs are "real"
  2019-09-16 21:47 merge-recursive thinks symlink's child dirs are "real" Jonathan Tan
@ 2019-09-16 22:15 ` SZEDER Gábor
  2019-09-17 15:54   ` Elijah Newren
  2019-09-17  0:09 ` Jonathan Nieder
  2019-09-17 15:48 ` Elijah Newren
  2 siblings, 1 reply; 13+ messages in thread
From: SZEDER Gábor @ 2019-09-16 22:15 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Elijah Newren

On Mon, Sep 16, 2019 at 02:47:07PM -0700, Jonathan Tan wrote:
> This was raised by a coworker at $DAYJOB. I run the following script:
> 
>   $GIT init test && cd test
>   ln -s . foo
>   mkdir bar && touch bar/file
>   $GIT add foo bar/file
>   $GIT commit -m "foo symlink"
>   
>   $GIT checkout -b branch1
>   $GIT commit --allow-empty -m "empty commit"
>   
>   $GIT checkout master
>   $GIT rm foo
>   mkdir foo
>   (cd foo; ln -s ../bar bar)
>   $GIT add foo/bar
>   $GIT commit -m "replace foo symlink with real foo dir and foo/bar symlink"
>   
>   $GIT checkout branch1
>   $GIT cherry-pick master
> 
> The cherry-pick must be manually resolved, when I would expect it to
> happen without needing user intervention.
> 
> You can see that at the point of the cherry-pick, in the working
> directory, ./foo is a symlink and ./foo/bar is a directory. I traced the
> code that ran during the cherry-pick to process_entry() in
> merge-recursive.c. When processing "foo/bar", control flow correctly
> reaches "Case B: Added in one", but the dir_in_way() invocation returns
> true, since lstat() indeed reveals that "foo/bar" is a directory. If I
> hardcode dir_in_way() to return false, then the cherry-pick happens
> without needing user intervention. I checked with "ls-tree -r" and the
> resulting tree is as I expect (foo is a real dir, foo/bar is a symlink).
> 
> Is this use case something that Git should be able to handle,

FWIW, Git used to handle this case, but it broke with edd2faf52e
(merge-recursive: Consolidate process_entry() and process_df_entry(),
2011-08-11).

Cc-ing Elijah for insights...

> and if
> yes, is the correct solution to teach dir_in_way() that dirs reachable
> from symlinks are not really in the way (presumably the implementation
> would climb directories until we reach the root or we reach a filesystem
> boundary, similar to what we do when we search for the .git directory)?
> Also, my proposed solution would work in the specific use case outlined
> in the script above, but can anyone think offhand of a case that it
> would make worse?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: merge-recursive thinks symlink's child dirs are "real"
  2019-09-16 21:47 merge-recursive thinks symlink's child dirs are "real" Jonathan Tan
  2019-09-16 22:15 ` SZEDER Gábor
@ 2019-09-17  0:09 ` Jonathan Nieder
  2019-09-17 16:02   ` Elijah Newren
  2019-09-17 15:48 ` Elijah Newren
  2 siblings, 1 reply; 13+ messages in thread
From: Jonathan Nieder @ 2019-09-17  0:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, Elijah Newren, szeder.dev

Jonathan Tan wrote:

> This was raised by a coworker at $DAYJOB. I run the following script:
[reproduction recipe from (*) snipped]
> The cherry-pick must be manually resolved, when I would expect it to
> happen without needing user intervention.
>
> You can see that at the point of the cherry-pick, in the working
> directory, ./foo is a symlink and ./foo/bar is a directory. I traced the
> code that ran during the cherry-pick to process_entry() in
> merge-recursive.c. When processing "foo/bar", control flow correctly
> reaches "Case B: Added in one", but the dir_in_way() invocation returns
> true, since lstat() indeed reveals that "foo/bar" is a directory.

Gábor covered the "what happened", so let me say a little more about
the motivation.

The "foo" symlink is being replaced by a "foo" directory containing a
"bar" file.  We're pretty far along now: we want to write actual files
to disk.  Using the index we know where we were going from and to, but
not everything in the world is tracked in the index: there could be
build outputs under "foo/bar" blocking the operation from moving
forward.

So we check whether there's a directory there.  Once we are writing
things out, there won't be, but the symlink confuses us.  Nice find.

[...]
> Is this use case something that Git should be able to handle, and if
> yes, is the correct solution to teach dir_in_way() that dirs reachable
> from symlinks are not really in the way (presumably the implementation
> would climb directories until we reach the root or we reach a filesystem
> boundary, similar to what we do when we search for the .git directory)?

The crucial detail here is that "foo" is going to be removed before we
write "foo/bar".  We should be able to notice that and skip the
dir_in_way check.

Just my uninformed guess --- I haven't tried it. :)

Thoughts?

Thanks,
Jonathan

(*) https://public-inbox.org/git/20190916214707.190171-1-jonathantanmy@google.com/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: merge-recursive thinks symlink's child dirs are "real"
  2019-09-16 21:47 merge-recursive thinks symlink's child dirs are "real" Jonathan Tan
  2019-09-16 22:15 ` SZEDER Gábor
  2019-09-17  0:09 ` Jonathan Nieder
@ 2019-09-17 15:48 ` Elijah Newren
  2019-09-17 21:50   ` [RFC PATCH] merge-recursive: symlink's descendants not in way Jonathan Tan
  2 siblings, 1 reply; 13+ messages in thread
From: Elijah Newren @ 2019-09-17 15:48 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List

On Mon, Sep 16, 2019 at 4:09 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> This was raised by a coworker at $DAYJOB. I run the following script:
>
>   $GIT init test && cd test
>   ln -s . foo
>   mkdir bar && touch bar/file
>   $GIT add foo bar/file
>   $GIT commit -m "foo symlink"
>
>   $GIT checkout -b branch1
>   $GIT commit --allow-empty -m "empty commit"
>
>   $GIT checkout master
>   $GIT rm foo
>   mkdir foo
>   (cd foo; ln -s ../bar bar)
>   $GIT add foo/bar
>   $GIT commit -m "replace foo symlink with real foo dir and foo/bar symlink"
>
>   $GIT checkout branch1
>   $GIT cherry-pick master
>
> The cherry-pick must be manually resolved, when I would expect it to
> happen without needing user intervention.
>
> You can see that at the point of the cherry-pick, in the working
> directory, ./foo is a symlink and ./foo/bar is a directory. I traced the
> code that ran during the cherry-pick to process_entry() in
> merge-recursive.c. When processing "foo/bar", control flow correctly
> reaches "Case B: Added in one", but the dir_in_way() invocation returns
> true, since lstat() indeed reveals that "foo/bar" is a directory. If I
> hardcode dir_in_way() to return false, then the cherry-pick happens
> without needing user intervention. I checked with "ls-tree -r" and the
> resulting tree is as I expect (foo is a real dir, foo/bar is a symlink).

Nice catch, and good digging.

> Is this use case something that Git should be able to handle, and if
> yes, is the correct solution to teach dir_in_way() that dirs reachable
> from symlinks are not really in the way

Yes, and mostly yes.  I'd phrase it instead as "if any leading path
within the repository of the given filename is a symlink, then it is
not in the way".  More on that below.

> (presumably the implementation
> would climb directories until we reach the root or we reach a filesystem
> boundary, similar to what we do when we search for the .git directory)?

The .git directory handling is a bit different because you don't have
a repository toplevel yet.  Here you are trying to add a path
"foo/bar"; it could just as well have been named "a/b/c/d/e/f", but
regardless, in all cases it's not an absolute path but a path relative
to the toplevel of the repository.  We'll have to check all paths up
to the repository toplevel and if any of the leading directories are a
symlink (just "foo/" in the first case, five checks for "a/" and
"a/b/" and ... "a/b/c/d/e" in the second case), then have dir_in_way()
return false.

It might be interesting to test out the performance afterward; we may
need to cache the check-paths-for-symlink status to avoid re-stating
the same leading directories repeatedly.  (But then again, we only
need to do this for files that unpack_trees() couldn't resolve and
marked as conflicted, so maybe it's not that big of an issue.)

> Also, my proposed solution would work in the specific use case outlined
> in the script above, but can anyone think offhand of a case that it
> would make worse?

Well, merge-recursive is hairy and slightly brittle based on its
update-as-you-go-and-do-a-four-way-merge design, so there's a small
chance this could introduce problems.  I've tried to come up with some
and haven't been able to.  It is the right idea implementationally,
though.  Let me explain:

In merge_trees() we loop over the index entries (from entries->nr-1
down to 0) calling process_entry(), and process_entry() updates the
index and working copy as it goes.  The thing about this reverse order
is that in the case of directory/file conflicts, directories will be
handled before the corresponding file, giving us a chance to check if
the directory has been removed (i.e. the resolution of the merge
removes that directory and there were no untracked files left in it)
before we handle the file. Let's walk through how this works:

After unpack_trees, we end up with a result that looks like:

Index:
  * bar/file (not conflicted; same in all three commits)
  * foo (deleted on one side -- conflict (could be part of rename),
also, part of D/F conflict)
  * foo/bar (added on one side -- conflict (could be part of rename),
also, leading path part of D/F conflict)

Then we check for renames.  There are none.  Then we process all the
index entries in reverse lexicographic order, skipping the ones that
have already been processed (i.e. which weren't marked as conflicted
by unpack_trees).

foo/bar is first; it was added on one side, so we should keep it.  Is
there a foo/bar/ directory in the way in the index?  No.  Is there a
foo/bar/ directory in the way in the working copy?  The current code
got tricked, but as you point out the answer should be no because foo
is a symlink.  Note that the code is already smart enough to say "no"
if "foo" is present as a regular file.  In particular, we don't care
that there is a file in the way of a leading path at all.
dir_in_way() simply says there is nothing _critical_ in the way (which
is defined to only be subdirectories), and then make_room_for_path()
will nuke any leading files in the way of creating directories.

Once we've processed foo/bar, the next entry we process is "foo".  In
the case, it was unmodifed on one side and deleted on the other so we
can just delete it.  We delete it from the index, note it was already
nuked from the working copy, and we're happy.  (By way of comparison,
if you changed your empty commit to modify the location of the foo
symlink, then we would instead have a modify/delete conflict for foo.
While foo will have been removed from the working directory, we can
re-instate it at an alternate location.  For D/F conflicts, we make
the assumption that it's always better to move the file of the D/F
conflict aside to a different path than to move the whole directory
aside.)

So, in short, all your idea does is treat symlinks like regular files
in terms of removability, which is exactly what we want.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: merge-recursive thinks symlink's child dirs are "real"
  2019-09-16 22:15 ` SZEDER Gábor
@ 2019-09-17 15:54   ` Elijah Newren
  0 siblings, 0 replies; 13+ messages in thread
From: Elijah Newren @ 2019-09-17 15:54 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Jonathan Tan, Git Mailing List

On Mon, Sep 16, 2019 at 3:15 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Mon, Sep 16, 2019 at 02:47:07PM -0700, Jonathan Tan wrote:
> > This was raised by a coworker at $DAYJOB. I run the following script:
> >
> >   $GIT init test && cd test
> >   ln -s . foo
> >   mkdir bar && touch bar/file
> >   $GIT add foo bar/file
> >   $GIT commit -m "foo symlink"
> >
> >   $GIT checkout -b branch1
> >   $GIT commit --allow-empty -m "empty commit"
> >
> >   $GIT checkout master
> >   $GIT rm foo
> >   mkdir foo
> >   (cd foo; ln -s ../bar bar)
> >   $GIT add foo/bar
> >   $GIT commit -m "replace foo symlink with real foo dir and foo/bar symlink"
> >
> >   $GIT checkout branch1
> >   $GIT cherry-pick master
> >
> > The cherry-pick must be manually resolved, when I would expect it to
> > happen without needing user intervention.
> >
> > You can see that at the point of the cherry-pick, in the working
> > directory, ./foo is a symlink and ./foo/bar is a directory. I traced the
> > code that ran during the cherry-pick to process_entry() in
> > merge-recursive.c. When processing "foo/bar", control flow correctly
> > reaches "Case B: Added in one", but the dir_in_way() invocation returns
> > true, since lstat() indeed reveals that "foo/bar" is a directory. If I
> > hardcode dir_in_way() to return false, then the cherry-pick happens
> > without needing user intervention. I checked with "ls-tree -r" and the
> > resulting tree is as I expect (foo is a real dir, foo/bar is a symlink).
> >
> > Is this use case something that Git should be able to handle,
>
> FWIW, Git used to handle this case, but it broke with edd2faf52e
> (merge-recursive: Consolidate process_entry() and process_df_entry(),
> 2011-08-11).
>
> Cc-ing Elijah for insights...

Thanks for the heads up.

Yeah, git used to get _extremely_ confused with D/F conflicts, even
worse than here.  Interesting that it took 8 years to find this case.
Anyway, the problem here is that a symlink involved in a D/F conflict
could be a symlink to a directory instead of to a file, and then make
it look like we have some weird nested D/F conflicts in cases when we
don't (i.e. make it look like not only "foo" exists as both a file
(symlink) and directory, but that "foo/bar" does too).

Anyway, being more careful to treat symlinks in D/F conflicts more
like how we treat files in such conflicts, using the way Jonathan Tan
suggested, should fix this issue up.  I put more details in my email
to him.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: merge-recursive thinks symlink's child dirs are "real"
  2019-09-17  0:09 ` Jonathan Nieder
@ 2019-09-17 16:02   ` Elijah Newren
  0 siblings, 0 replies; 13+ messages in thread
From: Elijah Newren @ 2019-09-17 16:02 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Jonathan Tan, Git Mailing List, SZEDER Gábor

On Mon, Sep 16, 2019 at 5:09 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Jonathan Tan wrote:
>
> > This was raised by a coworker at $DAYJOB. I run the following script:
> [reproduction recipe from (*) snipped]
> > The cherry-pick must be manually resolved, when I would expect it to
> > happen without needing user intervention.
> >
> > You can see that at the point of the cherry-pick, in the working
> > directory, ./foo is a symlink and ./foo/bar is a directory. I traced the
> > code that ran during the cherry-pick to process_entry() in
> > merge-recursive.c. When processing "foo/bar", control flow correctly
> > reaches "Case B: Added in one", but the dir_in_way() invocation returns
> > true, since lstat() indeed reveals that "foo/bar" is a directory.
>
> Gábor covered the "what happened", so let me say a little more about
> the motivation.
>
> The "foo" symlink is being replaced by a "foo" directory containing a
> "bar" file.  We're pretty far along now: we want to write actual files
> to disk.  Using the index we know where we were going from and to, but
> not everything in the world is tracked in the index: there could be
> build outputs under "foo/bar" blocking the operation from moving
> forward.
>
> So we check whether there's a directory there.  Once we are writing
> things out, there won't be, but the symlink confuses us.  Nice find.

Yep.

>
> [...]
> > Is this use case something that Git should be able to handle, and if
> > yes, is the correct solution to teach dir_in_way() that dirs reachable
> > from symlinks are not really in the way (presumably the implementation
> > would climb directories until we reach the root or we reach a filesystem
> > boundary, similar to what we do when we search for the .git directory)?
>
> The crucial detail here is that "foo" is going to be removed before we
> write "foo/bar".  We should be able to notice that and skip the
> dir_in_way check.

I know what you're getting at from a high level view, but this view is
incompatible with the machinery's internals.  In particular,
merge-recursive's design provides no way to "notice" that something is
"*going* to be removed" (every path is updated on the fly as it
processes it); the dir_in_way() check is there precisely to determine
if it's safe to write to the given path -- which basically means if
there is no directory in the way (a "foo/bar/" directory, in this
case).  So we definitely do not want to skip the dir_in_way() check,
we want to modify it to be aware that a leading path being a symlink
doesn't count as in the way (much as a the existence of a file on disk
corresponding to one of our leading paths doesn't count as in the way
for our purposes).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH] merge-recursive: symlink's descendants not in way
  2019-09-17 15:48 ` Elijah Newren
@ 2019-09-17 21:50   ` Jonathan Tan
  2019-09-17 22:23     ` Junio C Hamano
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Jonathan Tan @ 2019-09-17 21:50 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, newren

When the working tree has:
 - foo (symlink)
 - foo/bar (directory)

and the user merges a commit that deletes the foo symlink and instead
contains:
 - foo (directory)
 - foo/bar (file)

the merge should happen without requiring user intervention. However,
this does not happen.

In merge_trees(), process_entry() will be invoked first for "foo/bar",
then "foo" (in reverse lexicographical order). process_entry() correctly
reaches "Case B: Added in one", but dir_in_way() states that "bar" is
already present as a directory, causing a directory/file conflict at the
wrong point.

Instead, teach dir_in_way() that directories under symlinks are not "in
the way", so that symlinks are treated as regular files instead of
directories containing other directories and files. Thus, the "else"
branch will be followed instead: "foo/bar" will be added to the working
tree, make_room_for_path() being indirectly called to unlink the "foo"
symlink (just like if "foo" were a regular file instead). When
process_entry() is subsequently invoked for "foo", process_entry() will
reach "Case A: Deleted in one", and will handle it as "Add/delete" or
"Modify/delete" appropriately (including reinstatement of the previously
unlinked symlink with a new unique filename if necessary, again, just
like if "foo" were a regular file instead).

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Thanks to Elijah for his help. Some of the commit message is based on
his explanation [1].

I'm finding this relatively complicated, so I'm sending this as RFC. My
main concern is that whether all callers of dir_in_way() are OK with its
behavior change, and if yes, how to explain it. I suspect that this is
correct because dir_in_way() should behave consistently for all its
callers, but I might be wrong.

The other thing is whether the commit message is clear enough. In
particular, whether it needs more detail (I didn't mention the index,
for example), or whether it should be more concise (for example, I
thought of just stating that we treat symlinks as regular files and this
would be backed up by the fact that I've updated the only part of
merge-recursive.c that calls lstat() and then S_ISDIR).

[1] https://public-inbox.org/git/CABPp-BHpXWF+1hKUTfn8s-y4MJZXz+jUVS_K10eKyD6PGwo=gg@mail.gmail.com/
---
 merge-recursive.c          | 40 +++++++++++++++++++++++++++++++++-----
 t/t3030-merge-recursive.sh | 27 +++++++++++++++++++++++++
 2 files changed, 62 insertions(+), 5 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 6b812d67e3..a2029a4e94 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -747,7 +747,7 @@ static int dir_in_way(struct index_state *istate, const char *path,
 {
 	int pos;
 	struct strbuf dirpath = STRBUF_INIT;
-	struct stat st;
+	int ret = 0;
 
 	strbuf_addstr(&dirpath, path);
 	strbuf_addch(&dirpath, '/');
@@ -758,13 +758,43 @@ static int dir_in_way(struct index_state *istate, const char *path,
 		pos = -1 - pos;
 	if (pos < istate->cache_nr &&
 	    !strncmp(dirpath.buf, istate->cache[pos]->name, dirpath.len)) {
-		strbuf_release(&dirpath);
-		return 1;
+		ret = 1;
+		goto cleanup;
 	}
 
+	if (check_working_copy) {
+		struct stat st;
+
+		strbuf_trim_trailing_dir_sep(&dirpath);
+		if (lstat(dirpath.buf, &st))
+			goto cleanup; /* doesn't exist, OK */
+		if (!S_ISDIR(st.st_mode))
+			goto cleanup; /* not directory, OK */
+		if (empty_ok && is_empty_dir(dirpath.buf))
+			goto cleanup;
+
+		/*
+		 * The given path is a directory, but this should not
+		 * be treated as "in the way" if one of this
+		 * directory's ancestors is a symlink. Check for this
+		 * case.
+		 */
+		while (1) {
+			char *slash = strrchr(dirpath.buf, '/');
+
+			if (!slash) {
+				ret = 1;
+				goto cleanup;
+			}
+			*slash = '\0';
+			if (!lstat(dirpath.buf, &st) && S_ISLNK(st.st_mode))
+				goto cleanup;
+		}
+	}
+
+cleanup:
 	strbuf_release(&dirpath);
-	return check_working_copy && !lstat(path, &st) && S_ISDIR(st.st_mode) &&
-		!(empty_ok && is_empty_dir(path));
+	return ret;
 }
 
 /*
diff --git a/t/t3030-merge-recursive.sh b/t/t3030-merge-recursive.sh
index ff641b348a..dfd617a845 100755
--- a/t/t3030-merge-recursive.sh
+++ b/t/t3030-merge-recursive.sh
@@ -452,6 +452,33 @@ test_expect_success 'merge-recursive d/f conflict result' '
 
 '
 
+test_expect_success 'dir in working tree with symlink ancestor does not produce d/f conflict' '
+	git init sym &&
+	(
+		cd sym &&
+		ln -s . foo &&
+		mkdir bar &&
+		touch bar/file &&
+		git add foo bar/file &&
+		git commit -m "foo symlink" &&
+
+		git checkout -b branch1 &&
+		git commit --allow-empty -m "empty commit" &&
+
+		git checkout master &&
+		git rm foo &&
+		mkdir foo &&
+		touch foo/bar &&
+		git add foo/bar &&
+		git commit -m "replace foo symlink with real foo dir and foo/bar file" &&
+
+		git checkout branch1 &&
+
+		# Exercise to make sure that this works without errors
+		git cherry-pick master
+	)
+'
+
 test_expect_success 'reset and 3-way merge' '
 
 	git reset --hard "$c2" &&
-- 
2.23.0.237.gc6a4ce50a0-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] merge-recursive: symlink's descendants not in way
  2019-09-17 21:50   ` [RFC PATCH] merge-recursive: symlink's descendants not in way Jonathan Tan
@ 2019-09-17 22:23     ` Junio C Hamano
  2019-09-17 22:32       ` Jonathan Tan
  2019-09-17 23:02     ` SZEDER Gábor
  2019-09-18  0:35     ` Elijah Newren
  2 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2019-09-17 22:23 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, newren

Jonathan Tan <jonathantanmy@google.com> writes:

> When the working tree has:
>  - foo (symlink)
>  - foo/bar (directory)

Whoa, wait.  I assume, since this is about merge, the assumption is
that the working tree is clean with respect to the index, so 'foo'
is a symbolic link that is in the index.  Now, if foo is a symlink,
how can foo/bar (whether it is a directory or something else) exist,
which requires foo to be a directory in the first place?

> and the user merges a commit that deletes the foo symlink and instead
> contains:
>  - foo (directory)
>  - foo/bar (file)

This side is possible.  If foo is a directory, then there can be
foo/bar.  But I do not get the initial setup you start with.

In any case, if the working tree has 'foo' as a symlink, Git should
not look at or get affected by what 'foo' points at.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] merge-recursive: symlink's descendants not in way
  2019-09-17 22:23     ` Junio C Hamano
@ 2019-09-17 22:32       ` Jonathan Tan
  2019-09-17 22:37         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Tan @ 2019-09-17 22:32 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, newren

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > When the working tree has:
> >  - foo (symlink)
> >  - foo/bar (directory)
> 
> Whoa, wait.  I assume, since this is about merge, the assumption is
> that the working tree is clean with respect to the index, so 'foo'
> is a symbolic link that is in the index.  Now, if foo is a symlink,
> how can foo/bar (whether it is a directory or something else) exist,
> which requires foo to be a directory in the first place?

I was trying to be concise but I guess I omitted too much detail. This is
what's happening:

Working tree:
 - foo (symlink pointing to .)
 - bar (directory)
 - bar/file (file)

And the new commit:
 - foo (directory)
 - foo/bar (file)
 - bar (directory)
 - bar/file (file)

I'll update the commit message when I send out another version.

> In any case, if the working tree has 'foo' as a symlink, Git should
> not look at or get affected by what 'foo' points at.

Git should not, but it does - there is a call in process_entry() that calls
lstat() on "foo/bar", which indeed reports that "foo/bar" is a directory. This
patch adds a check that none of its ancestors are symlinks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] merge-recursive: symlink's descendants not in way
  2019-09-17 22:32       ` Jonathan Tan
@ 2019-09-17 22:37         ` Junio C Hamano
  2019-09-17 22:49           ` Jonathan Tan
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2019-09-17 22:37 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, newren

Jonathan Tan <jonathantanmy@google.com> writes:

>> In any case, if the working tree has 'foo' as a symlink, Git should
>> not look at or get affected by what 'foo' points at.
>
> Git should not, but it does - there is a call in process_entry() that calls
> lstat() on "foo/bar", which indeed reports that "foo/bar" is a directory. This
> patch adds a check that none of its ancestors are symlinks.

Yeah, I recall having to add has_symlink_leading_path() long time
ago in different codepaths (including "apply").  It is not surprising
to see a similar glitch remaining in merge-recursive (it's a tricky
issue, and it's a tricky code).


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] merge-recursive: symlink's descendants not in way
  2019-09-17 22:37         ` Junio C Hamano
@ 2019-09-17 22:49           ` Jonathan Tan
  0 siblings, 0 replies; 13+ messages in thread
From: Jonathan Tan @ 2019-09-17 22:49 UTC (permalink / raw)
  To: gitster; +Cc: jonathantanmy, git, newren

> Yeah, I recall having to add has_symlink_leading_path() long time
> ago in different codepaths (including "apply").  It is not surprising
> to see a similar glitch remaining in merge-recursive (it's a tricky
> issue, and it's a tricky code).

Thanks for the pointer to has_symlink_leading_path() - I tried it and it
seems to work too. I'll wait for more replies and send an updated
version tomorrow.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] merge-recursive: symlink's descendants not in way
  2019-09-17 21:50   ` [RFC PATCH] merge-recursive: symlink's descendants not in way Jonathan Tan
  2019-09-17 22:23     ` Junio C Hamano
@ 2019-09-17 23:02     ` SZEDER Gábor
  2019-09-18  0:35     ` Elijah Newren
  2 siblings, 0 replies; 13+ messages in thread
From: SZEDER Gábor @ 2019-09-17 23:02 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, newren

On Tue, Sep 17, 2019 at 02:50:40PM -0700, Jonathan Tan wrote:
> diff --git a/t/t3030-merge-recursive.sh b/t/t3030-merge-recursive.sh
> index ff641b348a..dfd617a845 100755
> --- a/t/t3030-merge-recursive.sh
> +++ b/t/t3030-merge-recursive.sh
> @@ -452,6 +452,33 @@ test_expect_success 'merge-recursive d/f conflict result' '
>  
>  '
>  
> +test_expect_success 'dir in working tree with symlink ancestor does not produce d/f conflict' '

This test needs the SYMLINKS prereq.

> +	git init sym &&
> +	(
> +		cd sym &&
> +		ln -s . foo &&
> +		mkdir bar &&
> +		touch bar/file &&

Nit: >bar/file (though because of the symlink this test won't be run
on the most fork()-challenged platform).

> +		git add foo bar/file &&
> +		git commit -m "foo symlink" &&
> +
> +		git checkout -b branch1 &&
> +		git commit --allow-empty -m "empty commit" &&
> +
> +		git checkout master &&
> +		git rm foo &&
> +		mkdir foo &&
> +		touch foo/bar &&
> +		git add foo/bar &&
> +		git commit -m "replace foo symlink with real foo dir and foo/bar file" &&
> +
> +		git checkout branch1 &&
> +
> +		# Exercise to make sure that this works without errors
> +		git cherry-pick master

Working without errors is great, but I think this test should also
check that the results are as expected, e.g. 'foo' is now a directory,
etc.

> +	)
> +'
> +
>  test_expect_success 'reset and 3-way merge' '
>  
>  	git reset --hard "$c2" &&
> -- 
> 2.23.0.237.gc6a4ce50a0-goog
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] merge-recursive: symlink's descendants not in way
  2019-09-17 21:50   ` [RFC PATCH] merge-recursive: symlink's descendants not in way Jonathan Tan
  2019-09-17 22:23     ` Junio C Hamano
  2019-09-17 23:02     ` SZEDER Gábor
@ 2019-09-18  0:35     ` Elijah Newren
  2 siblings, 0 replies; 13+ messages in thread
From: Elijah Newren @ 2019-09-18  0:35 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Git Mailing List

Hi Jonathan,

On Tue, Sep 17, 2019 at 2:50 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> When the working tree has:
>  - foo (symlink)
>  - foo/bar (directory)
>
> and the user merges a commit that deletes the foo symlink and instead
> contains:
>  - foo (directory)
>  - foo/bar (file)
>
> the merge should happen without requiring user intervention. However,
> this does not happen.
>
> In merge_trees(), process_entry() will be invoked first for "foo/bar",
> then "foo" (in reverse lexicographical order). process_entry() correctly
> reaches "Case B: Added in one", but dir_in_way() states that "bar" is
> already present as a directory, causing a directory/file conflict at the
> wrong point.

I don't think the notes about hitting the "Case B: Added in one"
codepath help; that's only one codepath that calls dir_in_way(), and
I'm pretty sure with a little work we could trigger the same bug with
the other ones.

> Instead, teach dir_in_way() that directories under symlinks are not "in
> the way", so that symlinks are treated as regular files instead of
> directories containing other directories and files. Thus, the "else"
> branch will be followed instead: "foo/bar" will be added to the working
> tree, make_room_for_path() being indirectly called to unlink the "foo"
> symlink (just like if "foo" were a regular file instead). When
> process_entry() is subsequently invoked for "foo", process_entry() will
> reach "Case A: Deleted in one", and will handle it as "Add/delete" or
> "Modify/delete" appropriately (including reinstatement of the previously
> unlinked symlink with a new unique filename if necessary, again, just
> like if "foo" were a regular file instead).

I was trying to think of a way to summarize it a bit, and then Junio
later in the thread came in and provided a different and compatible
way to view the issue that summarizes it quite nicely:

"In any case, if the working tree has 'foo' as a symlink, Git should
not look at or get affected by what 'foo' points at."

We can probably make the commit message pretty concise using that
wording or something similar.  Maybe adding something like "In
particular, the presence of a symlink should be treated much the same
as the presence of a file; since dir_in_way() is only checking to see
if there is a directory in the way, we don't want symlinks in leading
paths to sometimes cause dir_in_way() to return true."

>
> Helped-by: Elijah Newren <newren@gmail.com>
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
> Thanks to Elijah for his help. Some of the commit message is based on
> his explanation [1].
>
> I'm finding this relatively complicated, so I'm sending this as RFC. My
> main concern is that whether all callers of dir_in_way() are OK with its
> behavior change, and if yes, how to explain it. I suspect that this is
> correct because dir_in_way() should behave consistently for all its
> callers, but I might be wrong.

Yes, we want all callers of dir_in_way() to get this change; if they
don't, I'm pretty sure with some work we could devise special
scenarios that exhibit the same bug.



Thanks for working on this,
Elijah

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-09-18  0:35 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-16 21:47 merge-recursive thinks symlink's child dirs are "real" Jonathan Tan
2019-09-16 22:15 ` SZEDER Gábor
2019-09-17 15:54   ` Elijah Newren
2019-09-17  0:09 ` Jonathan Nieder
2019-09-17 16:02   ` Elijah Newren
2019-09-17 15:48 ` Elijah Newren
2019-09-17 21:50   ` [RFC PATCH] merge-recursive: symlink's descendants not in way Jonathan Tan
2019-09-17 22:23     ` Junio C Hamano
2019-09-17 22:32       ` Jonathan Tan
2019-09-17 22:37         ` Junio C Hamano
2019-09-17 22:49           ` Jonathan Tan
2019-09-17 23:02     ` SZEDER Gábor
2019-09-18  0:35     ` Elijah Newren

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).