git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] difftool: avoid symlinks when reusing worktree files
@ 2015-10-27 21:24 David Aguilar
  2015-10-27 22:24 ` Junio C Hamano
  2015-10-27 22:41 ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: David Aguilar @ 2015-10-27 21:24 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Ismail Badawi, John Keeping, Tim Henigan, Git Mailing List

difftool's dir-diff should never reuse a symlink, regardless of
what it points to.  Tighten use_wt_file() so that it rejects all
symlinks.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
---
 git-difftool.perl | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/git-difftool.perl b/git-difftool.perl
index 1abe647..873db57 100755
--- a/git-difftool.perl
+++ b/git-difftool.perl
@@ -70,13 +70,13 @@ sub use_wt_file
 	my ($repo, $workdir, $file, $sha1) = @_;
 	my $null_sha1 = '0' x 40;
 
-	if (! -f "$workdir/$file") {
-		return (0, $null_sha1);
+	my $workfile = "$workdir/$file";
+	if (-f $workfile && ! -l $workfile) {
+		my $wt_sha1 = $repo->command_oneline('hash-object', $workfile);
+		my $use = ($sha1 eq $null_sha1) || ($sha1 eq $wt_sha1);
+		return ($use, $wt_sha1);
 	}
-
-	my $wt_sha1 = $repo->command_oneline('hash-object', "$workdir/$file");
-	my $use = ($sha1 eq $null_sha1) || ($sha1 eq $wt_sha1);
-	return ($use, $wt_sha1);
+	return (0, $null_sha1);
 }
 
 sub changed_files
-- 
2.6.2.282.gfefd36e

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] difftool: avoid symlinks when reusing worktree files
  2015-10-27 21:24 [PATCH] difftool: avoid symlinks when reusing worktree files David Aguilar
@ 2015-10-27 22:24 ` Junio C Hamano
  2015-10-29  1:55   ` David Aguilar
  2015-10-27 22:41 ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2015-10-27 22:24 UTC (permalink / raw
  To: David Aguilar; +Cc: Ismail Badawi, John Keeping, Tim Henigan, Git Mailing List

David Aguilar <davvid@gmail.com> writes:

> difftool's dir-diff should never reuse a symlink, regardless of
> what it points to.  Tighten use_wt_file() so that it rejects all
> symlinks.
>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: David Aguilar <davvid@gmail.com>
> ---

Sorry.  I do recall saying "it is wrong to feed the contents of a
file that a symlink points at to hash-object" but other than that,
I completely lost track.

What purpose does this function play in its callchain?  What does
its caller wants it to compute?  Is use of the entity in the working
tree completely optional?  Would the caller happily produce correct
result even if we changed this function to unconditionally return
($use=0, $wt_sha1='0'x40) regardless of the result of lstat(2) on
"$workdir/$file"?

The conclusion of the thought process that starts from "it is wrong
to feed the contents of a file that a symlink points at to
hash-object" may not be "so let's return $use=0 for all symlinks",
which is this patch. Depending on what its caller wants it to
compute, the right conclusion may be "we need to call hash-object
correctly by first running readlink and then feeding the result to
it".

And if the answer is "the caller wants us to compute the hash for a
symbolic link and say $use=1", then we would instead need to do
an equivalent of

	wt_sha1=$(readlink "$workdir/$file" | hash-object --stdin)

I cannot quite tell which from the patch and explanation.

Perhaps an additional test or two would help illustrate what issues
are being addressed better?

Thanks.

>  git-difftool.perl | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/git-difftool.perl b/git-difftool.perl
> index 1abe647..873db57 100755
> --- a/git-difftool.perl
> +++ b/git-difftool.perl
> @@ -70,13 +70,13 @@ sub use_wt_file
>  	my ($repo, $workdir, $file, $sha1) = @_;
>  	my $null_sha1 = '0' x 40;
>  
> -	if (! -f "$workdir/$file") {
> -		return (0, $null_sha1);
> +	my $workfile = "$workdir/$file";
> +	if (-f $workfile && ! -l $workfile) {
> +		my $wt_sha1 = $repo->command_oneline('hash-object', $workfile);
> +		my $use = ($sha1 eq $null_sha1) || ($sha1 eq $wt_sha1);
> +		return ($use, $wt_sha1);
>  	}
> -
> -	my $wt_sha1 = $repo->command_oneline('hash-object', "$workdir/$file");
> -	my $use = ($sha1 eq $null_sha1) || ($sha1 eq $wt_sha1);
> -	return ($use, $wt_sha1);
> +	return (0, $null_sha1);
>  }
>  
>  sub changed_files

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] difftool: avoid symlinks when reusing worktree files
  2015-10-27 21:24 [PATCH] difftool: avoid symlinks when reusing worktree files David Aguilar
  2015-10-27 22:24 ` Junio C Hamano
@ 2015-10-27 22:41 ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2015-10-27 22:41 UTC (permalink / raw
  To: David Aguilar; +Cc: Ismail Badawi, John Keeping, Tim Henigan, Git Mailing List

David Aguilar <davvid@gmail.com> writes:

> -	if (! -f "$workdir/$file") {
> -		return (0, $null_sha1);
> +	my $workfile = "$workdir/$file";
> +	if (-f $workfile && ! -l $workfile) {

I still don't know if return (0, $null) is the right thing to do,
but in any case, I find the original flow easier to read, i.e. "we
notice a few cases we cannot do the main 'hash-object' thing this
function is meant to do and return early".  I.e.

	if (-l "$workdir/$file" || ! -f _) {
        	return (0, $null_sha1);
	}
	my $wt_sha1 = $repo->command_oneline('hash-object', "$workdir/$file");
	my $use = ($sha1 eq $null_sha1) || ($sha1 eq $wt_sha1);
	return ($use, $wt_sha1);

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] difftool: avoid symlinks when reusing worktree files
  2015-10-27 22:24 ` Junio C Hamano
@ 2015-10-29  1:55   ` David Aguilar
  2015-10-29 17:59     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: David Aguilar @ 2015-10-29  1:55 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Ismail Badawi, John Keeping, Tim Henigan, Git Mailing List

On Tue, Oct 27, 2015 at 03:24:49PM -0700, Junio C Hamano wrote:
> David Aguilar <davvid@gmail.com> writes:
> 
> > difftool's dir-diff should never reuse a symlink, regardless of
> > what it points to.  Tighten use_wt_file() so that it rejects all
> > symlinks.
> >
> > Helped-by: Junio C Hamano <gitster@pobox.com>
> > Signed-off-by: David Aguilar <davvid@gmail.com>
> > ---
> 
> Sorry.  I do recall saying "it is wrong to feed the contents of a
> file that a symlink points at to hash-object" but other than that,
> I completely lost track.
> 
> What purpose does this function play in its callchain?  What does
> its caller wants it to compute?  Is use of the entity in the working
> tree completely optional?  Would the caller happily produce correct
> result even if we changed this function to unconditionally return
> ($use=0, $wt_sha1='0'x40) regardless of the result of lstat(2) on
> "$workdir/$file"?
> 
> The conclusion of the thought process that starts from "it is wrong
> to feed the contents of a file that a symlink points at to
> hash-object" may not be "so let's return $use=0 for all symlinks",
> which is this patch. Depending on what its caller wants it to
> compute, the right conclusion may be "we need to call hash-object
> correctly by first running readlink and then feeding the result to
> it".
> 
> And if the answer is "the caller wants us to compute the hash for a
> symbolic link and say $use=1", then we would instead need to do
> an equivalent of
> 
> 	wt_sha1=$(readlink "$workdir/$file" | hash-object --stdin)
> 
> I cannot quite tell which from the patch and explanation.
> 
> Perhaps an additional test or two would help illustrate what issues
> are being addressed better?
> 
> Thanks.

Right.  At first I thought I could revise the commit message to
make it clearer that we simply want to skip all symlinks, since
it never makes sense to reuse a worktree symlinks, but looking
at the tests and implementation makes me realize that it's not
that simple.

This is going to take a bit more time to get right.  John, I was
hoping you'd be able to take a look -- I'm playing catch-up too.
When it was first reported I let it sit for a while in hopes
that the original author would pickup the issue, but months
passed and I figured I'd take a stab at helping the user out.

Anyways, it'll take me a bit more time to understand the code
and work out a sensible solution.  My gut feeling is that we
should adjust the dir-diff feature so that it ignores all
symlinks.  That seems like a simple answer since we're deciding
to skip that chunk of complexity.

John, do you have any thoughts on how we can best handle this?
-- 
David

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] difftool: avoid symlinks when reusing worktree files
  2015-10-29  1:55   ` David Aguilar
@ 2015-10-29 17:59     ` Junio C Hamano
  2015-10-29 18:19       ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2015-10-29 17:59 UTC (permalink / raw
  To: David Aguilar; +Cc: Ismail Badawi, John Keeping, Tim Henigan, Git Mailing List

David Aguilar <davvid@gmail.com> writes:

> Right.  At first I thought I could revise the commit message to
> make it clearer that we simply want to skip all symlinks, since
> it never makes sense to reuse a worktree symlinks, but looking
> at the tests and implementation makes me realize that it's not
> that simple.
>
> This is going to take a bit more time to get right.  John, I was
> hoping you'd be able to take a look -- I'm playing catch-up too.
> When it was first reported I let it sit for a while in hopes
> that the original author would pickup the issue, but months
> passed and I figured I'd take a stab at helping the user out.
>
> Anyways, it'll take me a bit more time to understand the code
> and work out a sensible solution.  My gut feeling is that we
> should adjust the dir-diff feature so that it ignores all
> symlinks.  That seems like a simple answer since we're deciding
> to skip that chunk of complexity.

What dir-diff wants to do is to prepare two directory hierarchies on
the filesystem and run "diff -r" (or an equivalent command of user's
choice) on them, e.g. "diff -r left/ right/".  "left/" tree is
typically what you want to compare your working tree files agaist
(e.g. a clean checkout of "the other version"), and "right/" tree is
populated with either copies of the working tree or symbolic links.
The copying to "right/" feels wasteful, but your working tree may be
littered with build artifacts, and making a clean copy with only
tracked files is one way to make sure that "diff -r" with a clean
checout of "the other version" will not show them.

In the loop that walks the @rawdiff array, there are a lot of "if we
see a symbolic link on the left, do this" before the last one that
says "if the working tree side is not $null (i.e. not missing), ask
ut_wt_file()".  That code remembers which path on either side had
symbolic links.

Later in the same function, there is this comment "Symbolic links
require special treatment."  The intent of the code here is that any
path that involves a symbolic link should be tweaked there.  The
loop over %symlink expects left/ and right/ to be populated normally
by the loop over @working_tree, and then for any path that is a
symbolic link is replaced with a phony regular file (not a symbolic
link) that says "Here is a symbolic link".

So I think it is fine to return $use=0 for any symbolic link from
use_wt_file.  Anything you do there will be replaced by the loop
over %symlink that appears later in the caller.  The caller discards
$wt_sha1 when $use=0 is returned, so the second return value does
not matter.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] difftool: avoid symlinks when reusing worktree files
  2015-10-29 17:59     ` Junio C Hamano
@ 2015-10-29 18:19       ` Junio C Hamano
  2015-10-30  7:28         ` David Aguilar
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2015-10-29 18:19 UTC (permalink / raw
  To: David Aguilar; +Cc: Ismail Badawi, John Keeping, Tim Henigan, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> So I think it is fine to return $use=0 for any symbolic link from
> use_wt_file.  Anything you do there will be replaced by the loop
> over %symlink that appears later in the caller.  The caller discards
> $wt_sha1 when $use=0 is returned, so the second return value does
> not matter.

So let me try to update your patch with the result of the study of
the codeflow.

-- >8 --
From: David Aguilar <davvid@gmail.com>
Subject: difftool: ignore symbolic links in use_wt_file

The caller is preparing a narrowed-down copy of the working tree and
this function is asked if the path should be included in that copy.
If we say yes, the path from the working tree will be either symlinked
or copied into the narrowed-down copy.

For any path that is a symbolic link, the caller later fixes up the
narrowed-down copy by unlinking the path and replacing it with a
regular file it writes out that mimics the way how "git diff"
compares symbolic links.

Let's answer "no, you do not want to copy/symlink the working tree
file" for all symbolic links from this function, as we know the
result will not be used because it will be overwritten anyway.

Incidentally, this also stops the function from feeding a symbolic
link in the working tree to hash-object, which is a wrong thing to
do to begin with. The link may be pointing at a directory, or worse
may be dangling (both would be noticed as an error).  Even if the
link points at a regular file, hashing the contents of a file that
is pointed at by the link is not correct (Git hashes the contents of
the link itself, not the pointee).

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 git-difftool.perl   |  4 +---
 t/t7800-difftool.sh | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/git-difftool.perl b/git-difftool.perl
index 7df7c8a..488d14b 100755
--- a/git-difftool.perl
+++ b/git-difftool.perl
@@ -70,9 +70,7 @@ sub use_wt_file
 	my ($repo, $workdir, $file, $sha1) = @_;
 	my $null_sha1 = '0' x 40;
 
-	if (! -e "$workdir/$file") {
-		# If the file doesn't exist in the working tree, we cannot
-		# use it.
+	if (-l "$workdir/$file" || ! -e _) {
 		return (0, $null_sha1);
 	}
 
diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
index ea35a02..a771cf7 100755
--- a/t/t7800-difftool.sh
+++ b/t/t7800-difftool.sh
@@ -504,4 +504,23 @@ test_expect_success PERL 'difftool properly honors gitlink and core.worktree' '
 	)
 '
 
+test_expect_success PERL,SYMLINKS 'difftool --dir-diff symlinked directories' '
+	git init dirlinks &&
+	(
+		cd dirlinks &&
+		git config diff.tool checktrees &&
+		git config difftool.checktrees.cmd "echo good" &&
+		mkdir foo &&
+		: >foo/bar &&
+		git add foo/bar &&
+		test_commit symlink-one &&
+		ln -s foo link &&
+		git add link &&
+		test_commit symlink-two &&
+		echo good >expect &&
+		git difftool --tool=checktrees --dir-diff HEAD~ >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] difftool: avoid symlinks when reusing worktree files
  2015-10-29 18:19       ` Junio C Hamano
@ 2015-10-30  7:28         ` David Aguilar
  2015-10-30 16:19           ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: David Aguilar @ 2015-10-30  7:28 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Ismail Badawi, John Keeping, Tim Henigan, Git Mailing List

On Thu, Oct 29, 2015 at 11:19:01AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > So I think it is fine to return $use=0 for any symbolic link from
> > use_wt_file.  Anything you do there will be replaced by the loop
> > over %symlink that appears later in the caller.  The caller discards
> > $wt_sha1 when $use=0 is returned, so the second return value does
> > not matter.
> 
> So let me try to update your patch with the result of the study of
> the codeflow.
> 
> -- >8 --
> From: David Aguilar <davvid@gmail.com>
> Subject: difftool: ignore symbolic links in use_wt_file
> 
> The caller is preparing a narrowed-down copy of the working tree and
> this function is asked if the path should be included in that copy.
> If we say yes, the path from the working tree will be either symlinked
> or copied into the narrowed-down copy.
> 
> For any path that is a symbolic link, the caller later fixes up the
> narrowed-down copy by unlinking the path and replacing it with a
> regular file it writes out that mimics the way how "git diff"
> compares symbolic links.
> 
> Let's answer "no, you do not want to copy/symlink the working tree
> file" for all symbolic links from this function, as we know the
> result will not be used because it will be overwritten anyway.
> 
> Incidentally, this also stops the function from feeding a symbolic
> link in the working tree to hash-object, which is a wrong thing to
> do to begin with. The link may be pointing at a directory, or worse
> may be dangling (both would be noticed as an error).  Even if the
> link points at a regular file, hashing the contents of a file that
> is pointed at by the link is not correct (Git hashes the contents of
> the link itself, not the pointee).
> 
> Signed-off-by: David Aguilar <davvid@gmail.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---

This is a very nicely worded commit message.  Thanks for the
thorough explanation.


>  git-difftool.perl   |  4 +---
>  t/t7800-difftool.sh | 19 +++++++++++++++++++
>  2 files changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/git-difftool.perl b/git-difftool.perl
> index 7df7c8a..488d14b 100755
> --- a/git-difftool.perl
> +++ b/git-difftool.perl
> @@ -70,9 +70,7 @@ sub use_wt_file
>  	my ($repo, $workdir, $file, $sha1) = @_;
>  	my $null_sha1 = '0' x 40;
>  
> -	if (! -e "$workdir/$file") {
> -		# If the file doesn't exist in the working tree, we cannot
> -		# use it.
> +	if (-l "$workdir/$file" || ! -e _) {
>  		return (0, $null_sha1);
>  	}

The "-e _" shorthand caught my eye ~ I didn't know perl could do that!
Nice.

Underline is barely mentioned in perlvar, but it's obvious what
(I think) it means, and since Perl is DWIM, it must be right. ;-)
-- 
David

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] difftool: avoid symlinks when reusing worktree files
  2015-10-30  7:28         ` David Aguilar
@ 2015-10-30 16:19           ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2015-10-30 16:19 UTC (permalink / raw
  To: David Aguilar; +Cc: Ismail Badawi, John Keeping, Tim Henigan, Git Mailing List

David Aguilar <davvid@gmail.com> writes:

>> +	if (-l "$workdir/$file" || ! -e _) {
>>  		return (0, $null_sha1);
>>  	}
>
> The "-e _" shorthand caught my eye ~ I didn't know perl could do that!
> Nice.
>
> Underline is barely mentioned in perlvar, but it's obvious what
> (I think) it means, and since Perl is DWIM, it must be right. ;-)

It is mentioned under -X (i.e. the headline for -f, -e, -d,
... tests) in perlfunc with an interesting example of doing
an explicit stat() first and then running many variants of -X
with _ as the target.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-10-30 16:19 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-27 21:24 [PATCH] difftool: avoid symlinks when reusing worktree files David Aguilar
2015-10-27 22:24 ` Junio C Hamano
2015-10-29  1:55   ` David Aguilar
2015-10-29 17:59     ` Junio C Hamano
2015-10-29 18:19       ` Junio C Hamano
2015-10-30  7:28         ` David Aguilar
2015-10-30 16:19           ` Junio C Hamano
2015-10-27 22:41 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).