Allow passing pipes to diff --no-index + bugfix

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* Allow passing pipes to diff --no-index + bugfix
@ 2020-09-18 11:32 Thomas Guyot-Sionnest
  2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest
                   ` (2 more replies)
  0 siblings, 3 replies; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-18 11:32 UTC (permalink / raw)
  To: git; +Cc: dermoth

Hello,

This set of patches adds the ability to generate diffs directly from
shell process substitution using the <(...) syntax. This is extremely
useful to generate diffs with files streamed directly form remote
systems or when it may be useful to filter them on the fly to generate
the diffs.

For example:

  $ git diff --stat \
    <(sed -r 's/^\S+\s//' /boot/System.map-4.19.0-8-amd64|sort) \
	<(sed -r 's/^\S+\s//' /boot/System.map-4.19.0-9-amd64|sort)

   /dev/fd/{63 => 62} | 9500 ++++++++++++++++++++++----------------------
   1 file changed, 4789 insertions(+), 4711 deletions(-)


Along with it a small fix in --stat and --numstat that affected one one
git range-diff test, where added/removed lines stts were missing (needed
for difffing the pipes too)

Regards,

Thomas Guyot-Sionnest



^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 11:32 Allow passing pipes to diff --no-index + bugfix Thomas Guyot-Sionnest
@ 2020-09-18 11:32 ` Thomas Guyot-Sionnest
  2020-09-18 14:46   ` Taylor Blau
                     ` (2 more replies)
  2020-09-18 11:32 ` [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index Thomas Guyot-Sionnest
  2020-09-18 17:51 ` Allow passing pipes to diff --no-index + bugfix Junio C Hamano
  2 siblings, 3 replies; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-18 11:32 UTC (permalink / raw)
  To: git; +Cc: dermoth, Thomas Guyot-Sionnest

In builtin_diffstat(), when both files are coming from "stdin" (which
could be better described as the file's content being written directly
into the file object), oideq() compares two null hashes and ignores the
actual differences for the statistics.

This patch checks if is_stdin flag is set on both sides and compare
contents directly.

Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
---
 diff.c                | 5 ++++-
 t/t3206-range-diff.sh | 8 ++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index a5114fa864..2995527896 100644
--- a/diff.c
+++ b/diff.c
@@ -3681,7 +3681,10 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		return;
 	}
 
-	same_contents = oideq(&one->oid, &two->oid);
+	if (one->is_stdin && two->is_stdin)
+		same_contents = !strcmp(one->data, two->data);
+	else
+		same_contents = oideq(&one->oid, &two->oid);
 
 	if (diff_filespec_is_binary(o->repo, one) ||
 	    diff_filespec_is_binary(o->repo, two)) {
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index e024cff65c..4715e75b68 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -258,11 +258,11 @@ test_expect_success 'changed commit with --stat diff option' '
 	     a => b | 0
 	     1 file changed, 0 insertions(+), 0 deletions(-)
 	3:  $(test_oid t3) ! 3:  $(test_oid c3) s/11/B/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
+	     a => b | 2 +-
+	     1 file changed, 1 insertion(+), 1 deletion(-)
 	4:  $(test_oid t4) ! 4:  $(test_oid c4) s/12/B/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
+	     a => b | 2 +-
+	     1 file changed, 1 insertion(+), 1 deletion(-)
 	EOF
 	test_cmp expect actual
 '
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 11:32 Allow passing pipes to diff --no-index + bugfix Thomas Guyot-Sionnest
  2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest
@ 2020-09-18 11:32 ` Thomas Guyot-Sionnest
  2020-09-18 14:36   ` Taylor Blau
  2020-09-18 21:56   ` brian m. carlson
  2020-09-18 17:51 ` Allow passing pipes to diff --no-index + bugfix Junio C Hamano
  2 siblings, 2 replies; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-18 11:32 UTC (permalink / raw)
  To: git; +Cc: dermoth, Thomas Guyot-Sionnest

A very handy way to pass data to applications is to use the <() process
substitution syntax in bash variants. It allow comparing files streamed
from a remote server or doing on-the-fly stream processing to alter the
diff. These are usually implemented as a symlink that points to a bogus
name (ex "pipe:[209326419]") but opens as a pipe.

Git normally tracks symlinks targets. This patch makes it detect such
pipes in --no-index mode and read the file normally like it would do for
stdin ("-"), so they can be compared directly.

Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
---
 diff-no-index.c          |  56 ++++++++++--
 t/t4053-diff-no-index.sh | 189 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 238 insertions(+), 7 deletions(-)

diff --git a/diff-no-index.c b/diff-no-index.c
index 7814eabfe0..779c686d23 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -41,6 +41,33 @@ static int read_directory_contents(const char *path, struct string_list *list)
  */
 static const char file_from_standard_input[] = "-";
 
+/* Check that file is - (STDIN) or unnamed pipe - explicitly
+ * avoid on-disk named pipes which could block
+ */
+static int ispipe(const char *name)
+{
+	struct stat st;
+
+	if (name == file_from_standard_input)
+		return 1;  /* STDIN */
+
+	if (!lstat(name, &st)) {
+		if (S_ISLNK(st.st_mode)) {
+			/* symlink - read it and check it doesn't exists
+			 * as a file yet link to a pipe */
+			struct strbuf sb = STRBUF_INIT;
+			strbuf_realpath(&sb, name, 0);
+			/* We're abusing strbuf_realpath here, it may append
+			 * pipe:[NNNNNNNNN] to an abs path */
+			if (!stat(sb.buf, &st))
+				return 0; /* link target exists , not pipe */
+			if (!stat(name, &st))
+				return S_ISFIFO(st.st_mode);
+		}
+	}
+	return 0;
+}
+
 static int get_mode(const char *path, int *mode)
 {
 	struct stat st;
@@ -51,7 +78,7 @@ static int get_mode(const char *path, int *mode)
 	else if (!strcasecmp(path, "nul"))
 		*mode = 0;
 #endif
-	else if (path == file_from_standard_input)
+	else if (ispipe(path))
 		*mode = create_ce_mode(0666);
 	else if (lstat(path, &st))
 		return error("Could not access '%s'", path);
@@ -60,13 +87,13 @@ static int get_mode(const char *path, int *mode)
 	return 0;
 }
 
-static int populate_from_stdin(struct diff_filespec *s)
+static int populate_from_fd(struct diff_filespec *s, int fd)
 {
 	struct strbuf buf = STRBUF_INIT;
 	size_t size = 0;
 
-	if (strbuf_read(&buf, 0, 0) < 0)
-		return error_errno("error while reading from stdin");
+	if (strbuf_read(&buf, fd, 0) < 0)
+		return error_errno(_("error while reading from fd %i"), fd);
 
 	s->should_munmap = 0;
 	s->data = strbuf_detach(&buf, &size);
@@ -76,6 +103,20 @@ static int populate_from_stdin(struct diff_filespec *s)
 	return 0;
 }
 
+static int populate_from_pipe(struct diff_filespec *s, const char *name)
+{
+	int ret;
+	FILE *f;
+
+	f = fopen(name, "r");
+	if (!f)
+		return error_errno(_("cannot open %s"), name);
+
+	ret = populate_from_fd(s, fileno(f));
+	fclose(f);
+	return ret;
+}
+
 static struct diff_filespec *noindex_filespec(const char *name, int mode)
 {
 	struct diff_filespec *s;
@@ -85,7 +126,9 @@ static struct diff_filespec *noindex_filespec(const char *name, int mode)
 	s = alloc_filespec(name);
 	fill_filespec(s, &null_oid, 0, mode);
 	if (name == file_from_standard_input)
-		populate_from_stdin(s);
+		populate_from_fd(s, 0);
+	else if (ispipe(name))
+		populate_from_pipe(s, name);
 	return s;
 }
 
@@ -218,8 +261,7 @@ static void fixup_paths(const char **path, struct strbuf *replacement)
 {
 	unsigned int isdir0, isdir1;
 
-	if (path[0] == file_from_standard_input ||
-	    path[1] == file_from_standard_input)
+	if (ispipe(path[0]) || ispipe(path[1]))
 		return;
 	isdir0 = is_directory(path[0]);
 	isdir1 = is_directory(path[1]);
diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
index 0168946b63..e49f773515 100755
--- a/t/t4053-diff-no-index.sh
+++ b/t/t4053-diff-no-index.sh
@@ -144,4 +144,193 @@ test_expect_success 'diff --no-index allows external diff' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff --no-index can diff piped subshells' '
+	echo 1 >non/git/c &&
+	test_expect_code 0 git diff --no-index non/git/b <(cat non/git/c) &&
+	test_expect_code 0 git diff --no-index <(cat non/git/b) non/git/c &&
+	test_expect_code 0 git diff --no-index <(cat non/git/b) <(cat non/git/c) &&
+	test_expect_code 0 cat non/git/b | git diff --no-index - non/git/c &&
+	test_expect_code 0 cat non/git/c | git diff --no-index non/git/b - &&
+	test_expect_code 0 cat non/git/b | git diff --no-index - <(cat non/git/c) &&
+	test_expect_code 0 cat non/git/c | git diff --no-index <(cat non/git/b) -
+'
+
+test_expect_success 'diff --no-index finds diff in piped subshells' '
+	(
+		set -- <(cat /dev/null) <(cat /dev/null)
+		cat <<-EOF >expect
+			diff --git a$1 b$2
+			--- a$1
+			+++ b$2
+			@@ -1 +1 @@
+			-1
+			+2
+		EOF
+	) &&
+	test_expect_code 1 \
+		git diff --no-index <(cat non/git/b) <(sed s/1/2/ non/git/c) >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff --no-index with stat and numstat' '
+	(
+		set -- <(cat /dev/null) <(cat /dev/null)
+		min=$((${#1} < ${#2} ? ${#1} : ${#2}))
+		for ((i=0; i<min; i++)); do [ "${1:i:1}" = "${2:i:1}" ] || break; done
+		base=${1:0:i-1}
+		cat <<-EOF >expect1
+			 $base{${1#$base} => ${2#$base}} | 2 +-
+			 1 file changed, 1 insertion(+), 1 deletion(-)
+		EOF
+		cat <<-EOF >expect2
+			1	1	$base{${1#$base} => ${2#$base}}
+		EOF
+	) &&
+	test_expect_code 1 \
+		git diff --no-index --stat <(cat non/git/a) <(sed s/1/2/ non/git/b) >actual &&
+	test_cmp expect1 actual &&
+	test_expect_code 1 \
+		git diff --no-index --numstat <(cat non/git/a) <(sed s/1/2/ non/git/b) >actual &&
+	test_cmp expect2 actual
+'
+
+test_expect_success PIPE 'diff --no-index on filesystem pipes' '
+	(
+		cd non/git &&
+		mkdir f g &&
+		mkfifo f/1 g/1 &&
+		test_expect_code 128 git diff --no-index f g &&
+		test_expect_code 128 git diff --no-index ../../a f &&
+		test_expect_code 128 git diff --no-index g ../../a &&
+		test_expect_code 128 git diff --no-index f/1 g/1 &&
+		test_expect_code 128 git diff --no-index f/1 ../../a/1 &&
+		test_expect_code 128 git diff --no-index ../../a/1 g/1
+	)
+'
+
+test_expect_success PIPE 'diff --no-index reads symlinks to named pipes as symlinks' '
+	(
+		cd non/git &&
+		mkdir h i &&
+		ln -s ../f/1 h/1 &&
+		ln -s ../g/1 i/1 &&
+		test_expect_code 1 git diff --no-index h i >actual &&
+		cat <<-EOF >expect &&
+			diff --git a/h/1 b/i/1
+			index d0b5850..d8b9c34 120000
+			--- a/h/1
+			+++ b/i/1
+			@@ -1 +1 @@
+			-../f/1
+			\ No newline at end of file
+			+../g/1
+			\ No newline at end of file
+		EOF
+		test_cmp expect actual &&
+		test_expect_code 1 git diff --no-index ../../a h >actual &&
+		cat <<-EOF >expect &&
+			diff --git a/../../a/1 b/../../a/1
+			deleted file mode 100644
+			index d00491f..0000000
+			--- a/../../a/1
+			+++ /dev/null
+			@@ -1 +0,0 @@
+			-1
+			diff --git a/h/1 b/h/1
+			new file mode 120000
+			index 0000000..d0b5850
+			--- /dev/null
+			+++ b/h/1
+			@@ -0,0 +1 @@
+			+../f/1
+			\ No newline at end of file
+			diff --git a/../../a/2 b/../../a/2
+			deleted file mode 100644
+			index 0cfbf08..0000000
+			--- a/../../a/2
+			+++ /dev/null
+			@@ -1 +0,0 @@
+			-2
+		EOF
+		test_cmp expect actual &&
+		test_expect_code 1 git diff --no-index i ../../a >actual &&
+		cat <<-EOF >expect &&
+			diff --git a/i/1 b/i/1
+			deleted file mode 120000
+			index d8b9c34..0000000
+			--- a/i/1
+			+++ /dev/null
+			@@ -1 +0,0 @@
+			-../g/1
+			\ No newline at end of file
+			diff --git a/../../a/1 b/../../a/1
+			new file mode 100644
+			index 0000000..d00491f
+			--- /dev/null
+			+++ b/../../a/1
+			@@ -0,0 +1 @@
+			+1
+			diff --git a/../../a/2 b/../../a/2
+			new file mode 100644
+			index 0000000..0cfbf08
+			--- /dev/null
+			+++ b/../../a/2
+			@@ -0,0 +1 @@
+			+2
+		EOF
+		test_cmp expect actual &&
+		test_expect_code 1 git diff --no-index h/1 i/1 >actual &&
+		cat <<-EOF >expect &&
+			diff --git a/h/1 b/i/1
+			index d0b5850..d8b9c34 120000
+			--- a/h/1
+			+++ b/i/1
+			@@ -1 +1 @@
+			-../f/1
+			\ No newline at end of file
+			+../g/1
+			\ No newline at end of file
+		EOF
+		test_cmp expect actual &&
+		test_expect_code 1 git diff --no-index h/1 ../../a/1 >actual &&
+		cat <<-EOF >expect &&
+			diff --git a/h/1 b/h/1
+			deleted file mode 120000
+			index d0b5850..0000000
+			--- a/h/1
+			+++ /dev/null
+			@@ -1 +0,0 @@
+			-../f/1
+			\ No newline at end of file
+			diff --git a/../../a/1 b/../../a/1
+			new file mode 100644
+			index 0000000..d00491f
+			--- /dev/null
+			+++ b/../../a/1
+			@@ -0,0 +1 @@
+			+1
+		EOF
+		test_cmp expect actual &&
+		test_expect_code 1 git diff --no-index ../../a/1 i/1 >actual &&
+		cat <<-EOF >expect &&
+			diff --git a/../../a/1 b/../../a/1
+			deleted file mode 100644
+			index d00491f..0000000
+			--- a/../../a/1
+			+++ /dev/null
+			@@ -1 +0,0 @@
+			-1
+			diff --git a/i/1 b/i/1
+			new file mode 120000
+			index 0000000..d8b9c34
+			--- /dev/null
+			+++ b/i/1
+			@@ -0,0 +1 @@
+			+../g/1
+			\ No newline at end of file
+		EOF
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 11:32 ` [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index Thomas Guyot-Sionnest
@ 2020-09-18 14:36   ` Taylor Blau
  2020-09-18 16:34     ` Thomas Guyot-Sionnest
  2020-09-18 17:20     ` Jeff King
  2020-09-18 21:56   ` brian m. carlson
  1 sibling, 2 replies; 50+ messages in thread
From: Taylor Blau @ 2020-09-18 14:36 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest; +Cc: git, dermoth

Hi Thomas,

On Fri, Sep 18, 2020 at 07:32:56AM -0400, Thomas Guyot-Sionnest wrote:
> A very handy way to pass data to applications is to use the <() process
> substitution syntax in bash variants. It allow comparing files streamed
> from a remote server or doing on-the-fly stream processing to alter the
> diff. These are usually implemented as a symlink that points to a bogus
> name (ex "pipe:[209326419]") but opens as a pipe.

This is true in bash, but sh does not support process substitution with
<().

> Git normally tracks symlinks targets. This patch makes it detect such
> pipes in --no-index mode and read the file normally like it would do for
> stdin ("-"), so they can be compared directly.
>
> Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
> ---
>  diff-no-index.c          |  56 ++++++++++--
>  t/t4053-diff-no-index.sh | 189 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 238 insertions(+), 7 deletions(-)
>
> diff --git a/diff-no-index.c b/diff-no-index.c
> index 7814eabfe0..779c686d23 100644
> --- a/diff-no-index.c
> +++ b/diff-no-index.c
> @@ -41,6 +41,33 @@ static int read_directory_contents(const char *path, struct string_list *list)
>   */
>  static const char file_from_standard_input[] = "-";
>
> +/* Check that file is - (STDIN) or unnamed pipe - explicitly
> + * avoid on-disk named pipes which could block
> + */
> +static int ispipe(const char *name)
> +{
> +	struct stat st;
> +
> +	if (name == file_from_standard_input)
> +		return 1;  /* STDIN */
> +
> +	if (!lstat(name, &st)) {
> +		if (S_ISLNK(st.st_mode)) {

I had to read this a few times to make sure that I got it; you want to
stat the link itself, and then check that it links to a pipe.

I'm not sure why, though. Do you want to avoid handling named FIFOs in
the code below? Your comment that they "could block" makes me think you
do, but I don't know why that would be a problem.

> +			/* symlink - read it and check it doesn't exists
> +			 * as a file yet link to a pipe */
> +			struct strbuf sb = STRBUF_INIT;
> +			strbuf_realpath(&sb, name, 0);
> +			/* We're abusing strbuf_realpath here, it may append
> +			 * pipe:[NNNNNNNNN] to an abs path */
> +			if (!stat(sb.buf, &st))

Statting sb.buf is confusing to me (especially when followed up by
another stat right below. Could you explain?

> +test_expect_success 'diff --no-index can diff piped subshells' '
> +	echo 1 >non/git/c &&
> +	test_expect_code 0 git diff --no-index non/git/b <(cat non/git/c) &&
> +	test_expect_code 0 git diff --no-index <(cat non/git/b) non/git/c &&
> +	test_expect_code 0 git diff --no-index <(cat non/git/b) <(cat non/git/c) &&
> +	test_expect_code 0 cat non/git/b | git diff --no-index - non/git/c &&
> +	test_expect_code 0 cat non/git/c | git diff --no-index non/git/b - &&
> +	test_expect_code 0 cat non/git/b | git diff --no-index - <(cat non/git/c) &&
> +	test_expect_code 0 cat non/git/c | git diff --no-index <(cat non/git/b) -
> +'

Indeed this test fails (Git thinks that the HERE-DOC is broken, but I
suspect it's just getting confused by the '<()'). This test (like almost
all other tests in Git) use /bin/sh as its shebang. Does your /bin/sh
actually point to bash?

If you did want to test something like this, you'd need to source
t/lib-bash.sh instead of t/test-lib.sh.

Unrelated to the above comment, but there are a few small style nits
that I notice:

  - There is no need to run with 'test_expect_code 0' since the test is
    marked as 'test_expect_success' and the commands are all in an '&&'
    chain. (This does appear to be common style for others in t4053, so
    you may just be matching it--which is fine--but an additional
    clean-up on top to modernize would be appreciated, too).

  - The cat pipe is unnecessary, and is also violating a rule that we
    don't place 'git' on the right-hand side of a pipe (can you redirect
    the file at the end instead?).

Documentation/CodingGuidelines is a great place to look if you are ever
curious about whether something is in good style.

> +test_expect_success 'diff --no-index finds diff in piped subshells' '
> +	(
> +		set -- <(cat /dev/null) <(cat /dev/null)

Why is this necessary?

> +		cat <<-EOF >expect
> +			diff --git a$1 b$2
> +			--- a$1
> +			+++ b$2
> +			@@ -1 +1 @@
> +			-1
> +			+2
> +		EOF
> +	) &&
> +	test_expect_code 1 \
> +		git diff --no-index <(cat non/git/b) <(sed s/1/2/ non/git/c) >actual &&
> +	test_cmp expect actual
> +'

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest
@ 2020-09-18 14:46   ` Taylor Blau
  2020-09-18 15:10     ` Thomas Guyot-Sionnest
  2020-09-18 17:27   ` Jeff King
  2020-09-20 13:09   ` [PATCH v2] " Thomas Guyot-Sionnest
  2 siblings, 1 reply; 50+ messages in thread
From: Taylor Blau @ 2020-09-18 14:46 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest; +Cc: git, dermoth

On Fri, Sep 18, 2020 at 07:32:55AM -0400, Thomas Guyot-Sionnest wrote:
> In builtin_diffstat(), when both files are coming from "stdin" (which
> could be better described as the file's content being written directly
> into the file object), oideq() compares two null hashes and ignores the
> actual differences for the statistics.
>
> This patch checks if is_stdin flag is set on both sides and compare
> contents directly.
>
> Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
> ---
>  diff.c                | 5 ++++-
>  t/t3206-range-diff.sh | 8 ++++----
>  2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index a5114fa864..2995527896 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3681,7 +3681,10 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
>  		return;
>  	}
>
> -	same_contents = oideq(&one->oid, &two->oid);
> +	if (one->is_stdin && two->is_stdin)
> +		same_contents = !strcmp(one->data, two->data);

Hmm. A couple of thoughts here:

  - strcmp seems like a slow-down here, since we'll have to go through
    at worst the smaller of one->data and two->data to compare each of
    them.

  - strcmp is likely not the right way to do that, since we could be
    diffing binary content, in which case we'd want to continue over
    NULs and instead stop at a fixed length (the minimum of the length
    of one->data and two->data, specifically). I'd have expected memcmp
    here instead.

  - Why do we have to do this at all all the way up in
    'builtin_diffstat'? I would expect these to contain the right
    OIDs by the time they are given back to us from the call to
    'diff_fill_oid_info' in 'run_diffstat'.

So, my last point is the most important of the three. I'd expect
something more along the lines of:

  1. diff_fill_oid_info resolve the link to the pipe, and
  2. index_path handles the resolved fd.

...but it looks like that is already what it's doing? I'm confused why
this doesn't work as-is.

> +	else
> +		same_contents = oideq(&one->oid, &two->oid);
>
>  	if (diff_filespec_is_binary(o->repo, one) ||
>  	    diff_filespec_is_binary(o->repo, two)) {
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index e024cff65c..4715e75b68 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -258,11 +258,11 @@ test_expect_success 'changed commit with --stat diff option' '
>  	     a => b | 0
>  	     1 file changed, 0 insertions(+), 0 deletions(-)
>  	3:  $(test_oid t3) ! 3:  $(test_oid c3) s/11/B/
> -	     a => b | 0
> -	     1 file changed, 0 insertions(+), 0 deletions(-)
> +	     a => b | 2 +-
> +	     1 file changed, 1 insertion(+), 1 deletion(-)
>  	4:  $(test_oid t4) ! 4:  $(test_oid c4) s/12/B/
> -	     a => b | 0
> -	     1 file changed, 0 insertions(+), 0 deletions(-)
> +	     a => b | 2 +-
> +	     1 file changed, 1 insertion(+), 1 deletion(-)

The tests definitely demonstrate that the old behavior was wrong,
though...

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 14:46   ` Taylor Blau
@ 2020-09-18 15:10     ` Thomas Guyot-Sionnest
  2020-09-18 17:37       ` Jeff King
  2020-09-20  4:53       ` Thomas Guyot
  0 siblings, 2 replies; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-18 15:10 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Thomas Guyot-Sionnest

On Fri, 18 Sep 2020 at 10:46, Taylor Blau <me@ttaylorr.com> wrote:
>
> On Fri, Sep 18, 2020 at 07:32:55AM -0400, Thomas Guyot-Sionnest wrote:
> > -     same_contents = oideq(&one->oid, &two->oid);
> > +     if (one->is_stdin && two->is_stdin)
> > +             same_contents = !strcmp(one->data, two->data);
>
> Hmm. A couple of thoughts here:
>
>   - strcmp seems like a slow-down here, since we'll have to go through
>     at worst the smaller of one->data and two->data to compare each of
>     them.
>
>   - strcmp is likely not the right way to do that, since we could be
>     diffing binary content, in which case we'd want to continue over
>     NULs and instead stop at a fixed length (the minimum of the length
>     of one->data and two->data, specifically). I'd have expected memcmp
>     here instead.
>

You're absolutely right - this is a bug I managed to figure out last
night - first real incursion into git C code - and I definitely didn't
think this through. TBH so far I coded mostly with tools dealing in
plaintext and C strings.

>   - Why do we have to do this at all all the way up in
>     'builtin_diffstat'? I would expect these to contain the right
>     OIDs by the time they are given back to us from the call to
>     'diff_fill_oid_info' in 'run_diffstat'.
>
> So, my last point is the most important of the three. I'd expect
> something more along the lines of:
>
>   1. diff_fill_oid_info resolve the link to the pipe, and
>   2. index_path handles the resolved fd.
>
> ...but it looks like that is already what it's doing? I'm confused why
> this doesn't work as-is.

So the idea is to checksum the data and write a valid oid. I'll see if
I can figure that out. Thanks for the hint though else I would likely
have gone with a buffer and memcmp. Your solution seems cleaner, and
there is a few other uses of oideq's that look dubious at best with
the case of null oids / buffered data so it's definitely a better
approach.

> > +     else
> > +             same_contents = oideq(&one->oid, &two->oid);
> >
> >       if (diff_filespec_is_binary(o->repo, one) ||
> >           diff_filespec_is_binary(o->repo, two)) {
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index e024cff65c..4715e75b68 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -258,11 +258,11 @@ test_expect_success 'changed commit with --stat diff option' '
> >            a => b | 0
> >            1 file changed, 0 insertions(+), 0 deletions(-)
> >       3:  $(test_oid t3) ! 3:  $(test_oid c3) s/11/B/
> > -          a => b | 0
> > -          1 file changed, 0 insertions(+), 0 deletions(-)
> > +          a => b | 2 +-
> > +          1 file changed, 1 insertion(+), 1 deletion(-)
> >       4:  $(test_oid t4) ! 4:  $(test_oid c4) s/12/B/
> > -          a => b | 0
> > -          1 file changed, 0 insertions(+), 0 deletions(-)
> > +          a => b | 2 +-
> > +          1 file changed, 1 insertion(+), 1 deletion(-)
>
> The tests definitely demonstrate that the old behavior was wrong,
> though...
>

For the records I verified the actual diffs (I think they're even
hardcoded in the earlier tests) and the remaining 0-add/del's are also
valid.

Regards,

Thomas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 14:36   ` Taylor Blau
@ 2020-09-18 16:34     ` Thomas Guyot-Sionnest
  2020-09-18 17:19       ` Jeff King
  2020-09-18 17:58       ` Taylor Blau
  2020-09-18 17:20     ` Jeff King
  1 sibling, 2 replies; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-18 16:34 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Thomas Guyot-Sionnest

Hi Taylor,

On Fri, 18 Sep 2020 at 10:36, Taylor Blau <me@ttaylorr.com> wrote:
> On Fri, Sep 18, 2020 at 07:32:56AM -0400, Thomas Guyot-Sionnest wrote:
> > A very handy way to pass data to applications is to use the <() process
> > substitution syntax in bash variants. It allow comparing files streamed
> > from a remote server or doing on-the-fly stream processing to alter the
> > diff. These are usually implemented as a symlink that points to a bogus
> > name (ex "pipe:[209326419]") but opens as a pipe.
>
> This is true in bash, but sh does not support process substitution with
> <().

Bash, ksh, zsh and likely any more moden shell. Other programming
languages also setup such pipes. It's much cleaner than creating temp
files and cleaning them up and in some cases faster too (I've ran
diff's like this over GB's of test data, it's very handy to remove
known patterns that would cause needless diffs).

> > +/* Check that file is - (STDIN) or unnamed pipe - explicitly
> > + * avoid on-disk named pipes which could block
> > + */
> > +static int ispipe(const char *name)
> > +{
> > +     struct stat st;
> > +
> > +     if (name == file_from_standard_input)
> > +             return 1;  /* STDIN */
> > +
> > +     if (!lstat(name, &st)) {
> > +             if (S_ISLNK(st.st_mode)) {
>
> I had to read this a few times to make sure that I got it; you want to
> stat the link itself, and then check that it links to a pipe.
>
> I'm not sure why, though. Do you want to avoid handling named FIFOs in
> the code below? Your comment that they "could block" makes me think you
> do, but I don't know why that would be a problem.

I'll admit the comment was written first and is a bit naive  - i'll
rephrase that. Yes you don't want to block on pipes like if you run a
"grep -R" on a subtree that has fifos - but as I coded this I realized
the obvious: git tracks symlinks name so the real bugger would be to
detect one as a pipe and try reading it instead or calling readlink().

> > +                     /* symlink - read it and check it doesn't exists
> > +                      * as a file yet link to a pipe */
> > +                     struct strbuf sb = STRBUF_INIT;
> > +                     strbuf_realpath(&sb, name, 0);
> > +                     /* We're abusing strbuf_realpath here, it may append
> > +                      * pipe:[NNNNNNNNN] to an abs path */
> > +                     if (!stat(sb.buf, &st))
>
> Statting sb.buf is confusing to me (especially when followed up by
> another stat right below. Could you explain?

The whole block is under lstat/S_ISLNK (see previous chunk), so the
path provided to us was a symlink.

Initially I looked at what differentiate these - mainly, stat() st_dev
- but that struct is os-specific, you'd want to check major(st_dev) ==
0 (at least on linux) and even if we knew how each os behaves, the
code isn't portable and would be a pain to support. Gnu's difftools
have very incomplete historical source code in git but there's
indications they have gotten rid of it too.

So what I'm doing instead is trying to resolve the link and see if the
destination exists (a clear no). Luckily strbuf_realpath does the
heavy lifting and leaves me with a real path to the file the symlink
points to (especially useful for relative links), which is bogus for
the special pipes we're interested in.

Then the block right after (not shown) do a stat() on the initial name
and return whenever it's a fifo or not (if it is, but the link is
broken, we know it's a special device).

Now you mention it, maybe I could do that stat first, rule this out
from the beginning... less work for the general case.

*untested*:

    if (!lstat(name, &st)) {
        if (!S_ISLNK(st.st_mode))
            return(0);
        if (!stat(name, &st)) {
            if (!S_ISFIFO(st.st_mode))
                return(0);

            /* We have a symlink that points to a pipe. If it's resolved
             * target doesn't really exist we can safely assume it's a
             * special file and use it */
            struct strbuf sb = STRBUF_INIT;
            strbuf_realpath(&sb, name, 0);
            /* We're abusing strbuf_realpath here, it may append
             * pipe:[NNNNNNNNN] to an abs path */
            if (stat(sb.buf, &st))
                return(1); /* stat failed, special one */
        }
    }
    return(0);

TL;DR - the conditions we need:

- lstat(name) == 0  // name exists
- islink(lstat(name))  // name is a symlink
- stat(name) == 0  // target of name is reachable
- isfifo(stat(name))  // Target of name is a fifo
- stat(realpath(readlink(name))) != 0  // Although we can reach it,
name's destination doesn't actually exist.

BTW is st/sb too confusing ? I took examples elsewhere in the code, I
can rename them if it's easier to read.

> > +test_expect_success 'diff --no-index can diff piped subshells' '
> > +     echo 1 >non/git/c &&
> > +     test_expect_code 0 git diff --no-index non/git/b <(cat non/git/c) &&
> > +     test_expect_code 0 git diff --no-index <(cat non/git/b) non/git/c &&
> > +     test_expect_code 0 git diff --no-index <(cat non/git/b) <(cat non/git/c) &&
> > +     test_expect_code 0 cat non/git/b | git diff --no-index - non/git/c &&
> > +     test_expect_code 0 cat non/git/c | git diff --no-index non/git/b - &&
> > +     test_expect_code 0 cat non/git/b | git diff --no-index - <(cat non/git/c) &&
> > +     test_expect_code 0 cat non/git/c | git diff --no-index <(cat non/git/b) -
> > +'
>
> Indeed this test fails (Git thinks that the HERE-DOC is broken, but I
> suspect it's just getting confused by the '<()'). This test (like almost
> all other tests in Git) use /bin/sh as its shebang. Does your /bin/sh
> actually point to bash?
>
> If you did want to test something like this, you'd need to source
> t/lib-bash.sh instead of t/test-lib.sh.

Thanks for the tip - indeed I think I ran the testsuite directly with
back, but the make test failed.

> Unrelated to the above comment, but there are a few small style nits
> that I notice:
>
>   - There is no need to run with 'test_expect_code 0' since the test is
>     marked as 'test_expect_success' and the commands are all in an '&&'
>     chain. (This does appear to be common style for others in t4053, so
>     you may just be matching it--which is fine--but an additional
>     clean-up on top to modernize would be appreciated, too).
>
>   - The cat pipe is unnecessary, and is also violating a rule that we
>     don't place 'git' on the right-hand side of a pipe (can you redirect
>     the file at the end instead?).

Cleanup, no pipelines (I read too fast / assumed last command was ok) - will do!

> Documentation/CodingGuidelines is a great place to look if you are ever
> curious about whether something is in good style.
>
> > +test_expect_success 'diff --no-index finds diff in piped subshells' '
> > +     (
> > +             set -- <(cat /dev/null) <(cat /dev/null)

Precautions/portability. The file names are somewhat dynamic (at least
the fd part...) this is to be sure I capture the names of the pipes
that will be used (assuming the fd's will be reallocated in the same
order which I think is fairly safe). An alternative is to sed "actual"
to remove known variables, but then I hope it would be reliable (and
can I use sed -r?). IIRC earlier versions of bash or on some systems a
temp file could be used for these - although it defeats the purpose
it's not a reason to fail....

I cannot develop this on other systems but I tested the pipe names on
Windows and Sunos, and also using ksh and zsh on Linux (zsh uses /proc
directly, kss uses lower fd's which means it can easily clash with
scripts if you don't use named fd's, but not our problem....)

Thanks,

Thomas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 16:34     ` Thomas Guyot-Sionnest
@ 2020-09-18 17:19       ` Jeff King
  2020-09-18 17:21         ` Jeff King
                           ` (2 more replies)
  2020-09-18 17:58       ` Taylor Blau
  1 sibling, 3 replies; 50+ messages in thread
From: Jeff King @ 2020-09-18 17:19 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest; +Cc: Taylor Blau, git, Thomas Guyot-Sionnest

On Fri, Sep 18, 2020 at 12:34:48PM -0400, Thomas Guyot-Sionnest wrote:

> Hi Taylor,
> 
> On Fri, 18 Sep 2020 at 10:36, Taylor Blau <me@ttaylorr.com> wrote:
> > On Fri, Sep 18, 2020 at 07:32:56AM -0400, Thomas Guyot-Sionnest wrote:
> > > A very handy way to pass data to applications is to use the <() process
> > > substitution syntax in bash variants. It allow comparing files streamed
> > > from a remote server or doing on-the-fly stream processing to alter the
> > > diff. These are usually implemented as a symlink that points to a bogus
> > > name (ex "pipe:[209326419]") but opens as a pipe.
> >
> > This is true in bash, but sh does not support process substitution with
> > <().
> 
> Bash, ksh, zsh and likely any more moden shell. Other programming
> languages also setup such pipes. It's much cleaner than creating temp
> files and cleaning them up and in some cases faster too (I've ran
> diff's like this over GB's of test data, it's very handy to remove
> known patterns that would cause needless diffs).

Yeah, it's definitely a reasonable thing to want (see below). And from a
portability perspective, it is outside of Git's scope; users with those
shells can use the feature, and people on other shells don't have to
care.

But we do have to account for this in the test suite, which must be able
to run under a vanilla POSIX shell. So you'd probably want to set up a
prerequisite that lets us skip these tests on other shells, like:

  test_lazy_prereq PROCESS_SUBSTITUTION '
	echo foo >expect &&
	cat >actual <(echo foo) &&
	test_cmp expect actual
  '

  test_expect_success PROCESS_SUBSTITUTION 'some test...' '
	# safe because we skip this test on shells that do not support it
	git diff --no-index <(cat whatever)
  '

Though it is a little sad that people running the suite with a vanilla
/bin/sh like dash wouldn't ever run the tests. I wonder if there's a
more portable way to formulate it.

Getting back to the overall feature, this is definitely something that
has come up before. The last I know of is:

  https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/

which everybody seemed to like the direction of; I suspect the original
author (cc'd) just never got around to it again. Compared to this
approach, it uses a command-line option to avoid dereferencing symlinks.
That puts an extra burden on the caller to pass the option, but it's way
less magical; you could drop all of the "does this look like a symlink
to a pipe" heuristics. It would also be much easier to test. ;)

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 14:36   ` Taylor Blau
  2020-09-18 16:34     ` Thomas Guyot-Sionnest
@ 2020-09-18 17:20     ` Jeff King
  2020-09-18 18:00       ` Taylor Blau
  1 sibling, 1 reply; 50+ messages in thread
From: Jeff King @ 2020-09-18 17:20 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Thomas Guyot-Sionnest, git, dermoth

On Fri, Sep 18, 2020 at 10:36:47AM -0400, Taylor Blau wrote:

>   - The cat pipe is unnecessary, and is also violating a rule that we
>     don't place 'git' on the right-hand side of a pipe (can you redirect
>     the file at the end instead?).

What's wrong with git on the right-hand side of a pipe?

On the left-hand side we lose its exit code, which is bad. But on the
right hand side, we are only losing the exit code of "cat", which we
don't care about.

(Though I agree that "cat" is pointless here; we could just be
redirecting from a file).

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 17:19       ` Jeff King
@ 2020-09-18 17:21         ` Jeff King
  2020-09-18 17:39         ` Thomas Guyot-Sionnest
  2020-09-18 17:48         ` Junio C Hamano
  2 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2020-09-18 17:21 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest
  Cc: brian m. carlson, Taylor Blau, git, Thomas Guyot-Sionnest

On Fri, Sep 18, 2020 at 01:19:50PM -0400, Jeff King wrote:

> Getting back to the overall feature, this is definitely something that
> has come up before. The last I know of is:
> 
>   https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/
> 
> which everybody seemed to like the direction of; I suspect the original
> author (cc'd) just never got around to it again. Compared to this
> approach, it uses a command-line option to avoid dereferencing symlinks.
> That puts an extra burden on the caller to pass the option, but it's way
> less magical; you could drop all of the "does this look like a symlink
> to a pipe" heuristics. It would also be much easier to test. ;)

Of course I forgot to add the cc. +cc brian.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest
  2020-09-18 14:46   ` Taylor Blau
@ 2020-09-18 17:27   ` Jeff King
  2020-09-18 17:52     ` Thomas Guyot-Sionnest
  2020-09-23 15:05     ` Johannes Schindelin
  2020-09-20 13:09   ` [PATCH v2] " Thomas Guyot-Sionnest
  2 siblings, 2 replies; 50+ messages in thread
From: Jeff King @ 2020-09-18 17:27 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest; +Cc: Johannes Schindelin, git, dermoth

On Fri, Sep 18, 2020 at 07:32:55AM -0400, Thomas Guyot-Sionnest wrote:

> In builtin_diffstat(), when both files are coming from "stdin" (which
> could be better described as the file's content being written directly
> into the file object), oideq() compares two null hashes and ignores the
> actual differences for the statistics.
> 
> This patch checks if is_stdin flag is set on both sides and compare
> contents directly.

I'm somewhat puzzled how we could have two filespecs that came from
stdin, since we'd generally read to EOF. But looking at the test, it
seems this is a weird range-diff hack to set is_stdin.

Looking at your patch:

> diff --git a/diff.c b/diff.c
> index a5114fa864..2995527896 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -3681,7 +3681,10 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
>  		return;
>  	}
>  
> -	same_contents = oideq(&one->oid, &two->oid);
> +	if (one->is_stdin && two->is_stdin)
> +		same_contents = !strcmp(one->data, two->data);
> +	else
> +		same_contents = oideq(&one->oid, &two->oid);

...should this actually be checking the oid_valid flag in each filespec?
That would presumably cover the is_stdin case, too. I also wonder
whether range-diff ought to be using that flag instead of is_stdin.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 15:10     ` Thomas Guyot-Sionnest
@ 2020-09-18 17:37       ` Jeff King
  2020-09-18 18:00         ` Thomas Guyot-Sionnest
  2020-09-20  4:53       ` Thomas Guyot
  1 sibling, 1 reply; 50+ messages in thread
From: Jeff King @ 2020-09-18 17:37 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest; +Cc: Taylor Blau, git, Thomas Guyot-Sionnest

On Fri, Sep 18, 2020 at 11:10:45AM -0400, Thomas Guyot-Sionnest wrote:

> > So, my last point is the most important of the three. I'd expect
> > something more along the lines of:
> >
> >   1. diff_fill_oid_info resolve the link to the pipe, and
> >   2. index_path handles the resolved fd.
> >
> > ...but it looks like that is already what it's doing? I'm confused why
> > this doesn't work as-is.
> 
> So the idea is to checksum the data and write a valid oid. I'll see if
> I can figure that out. Thanks for the hint though else I would likely
> have gone with a buffer and memcmp. Your solution seems cleaner, and
> there is a few other uses of oideq's that look dubious at best with
> the case of null oids / buffered data so it's definitely a better
> approach.

You're generally better off not to compute the oid just to compare two
buffers:

  - a byte-by-byte comparison can quit early as soon as it sees a
    difference, whereas computing two hashes has to cover each byte

  - even in the worst case that the byte comparison has to go all the
    way to the end, it's way faster than computing a sha1

So generally in the diff code we compare oids if we got them for free
(from the index, or from a tree), but otherwise it's OK to have
filespecs without the oid_valid flag set, and compare them byte-wise
when necessary. And there something like:

  if (one->size == two->size &&
      !memcmp(one->data, two->data, one->size))

is what you'd want.

Note that filespecs may not have their data or size loaded yet, though.
Looking at that part of builtin_diffstat(), I'm pretty sure that is
possible here (see how later code calls diff_populate_filespec() to make
sure it has data). OTOH, I guess if they're from stdin we'd always have
the data (since we'd have no oid to load from), so it might be OK under
that conditional.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 17:19       ` Jeff King
  2020-09-18 17:21         ` Jeff King
@ 2020-09-18 17:39         ` Thomas Guyot-Sionnest
  2020-09-18 17:48         ` Junio C Hamano
  2 siblings, 0 replies; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-18 17:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, Thomas Guyot-Sionnest

Hi Jeff,

On Fri, 18 Sep 2020 at 13:19, Jeff King <peff@peff.net> wrote:
> On Fri, Sep 18, 2020 at 12:34:48PM -0400, Thomas Guyot-Sionnest wrote:
> But we do have to account for this in the test suite, which must be able
> to run under a vanilla POSIX shell. So you'd probably want to set up a
> prerequisite that lets us skip these tests on other shells, like:

Indeed, the bash test library that was suggested earlier may be better
as it exec() bash rather than skipping tests. Testing as a prereq
works, another approach which may be harder to swallow but allow
testing when default shell is /bin/sh is to run each git command
through bash - could be coupled with a dep if bash isn't installed at
all.

> Though it is a little sad that people running the suite with a vanilla
> /bin/sh like dash wouldn't ever run the tests. I wonder if there's a
> more portable way to formulate it.

Debian defaults to dash, a minimalistic and afaik POSIX-compliant shell.

> Getting back to the overall feature, this is definitely something that
> has come up before. The last I know of is:
>
>   https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/
>
> which everybody seemed to like the direction of; I suspect the original
> author (cc'd) just never got around to it again. Compared to this
> approach, it uses a command-line option to avoid dereferencing symlinks.
> That puts an extra burden on the caller to pass the option, but it's way
> less magical; you could drop all of the "does this look like a symlink
> to a pipe" heuristics. It would also be much easier to test. ;)

Thanks for the info. Another consideration is how other commands -
diff, vim, sed, curl, etc can take input the same way, so being able
to swap-in git is a plus imho,and one less switch to learn (or even
learn about, I never looked for one for this issue).

Regards,

Thomas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 17:19       ` Jeff King
  2020-09-18 17:21         ` Jeff King
  2020-09-18 17:39         ` Thomas Guyot-Sionnest
@ 2020-09-18 17:48         ` Junio C Hamano
  2020-09-18 18:02           ` Jeff King
  2 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2020-09-18 17:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Guyot-Sionnest, Taylor Blau, git, Thomas Guyot-Sionnest

Jeff King <peff@peff.net> writes:

> Getting back to the overall feature, this is definitely something that
> has come up before. The last I know of is:
>
>   https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/
>
> which everybody seemed to like the direction of; I suspect the original
> author (cc'd) just never got around to it again. Compared to this
> approach, it uses a command-line option to avoid dereferencing symlinks.
> That puts an extra burden on the caller to pass the option, but it's way
> less magical; you could drop all of the "does this look like a symlink
> to a pipe" heuristics. It would also be much easier to test. ;)

Yes, I do remember liking the approach very much and wanted to take
it once the "do not dereference symlinks everywhere---just limit it
to what was given from the command line" was done.

To be quite honest, I think "git diff --no-index A B" should
unconditionally dereference A and/or B if they are symlinks, whether
they are symlinks to pipes, regular files or directories, and
otherwise treat symlinks in A and B the same way as "git diff" if A
and B are directories.  But that is a design guideline that becomes
needed only after we start resurrecting Brian's effort, not with
these patches that started this thread.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Allow passing pipes to diff --no-index + bugfix
  2020-09-18 11:32 Allow passing pipes to diff --no-index + bugfix Thomas Guyot-Sionnest
  2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest
  2020-09-18 11:32 ` [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index Thomas Guyot-Sionnest
@ 2020-09-18 17:51 ` Junio C Hamano
  2020-09-18 18:24   ` Thomas Guyot-Sionnest
  2 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2020-09-18 17:51 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest; +Cc: git, dermoth

Thomas Guyot-Sionnest <tguyot@gmail.com> writes:

> Along with it a small fix in --stat and --numstat that affected one one
> git range-diff test, where added/removed lines stts were missing (needed
> for difffing the pipes too)

Next time, please send each of such unrelated patches independently,
not as a two-patch series that gives a (false) impression that the
second one needs the first one to work correctly.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 17:27   ` Jeff King
@ 2020-09-18 17:52     ` Thomas Guyot-Sionnest
  2020-09-18 18:06       ` Junio C Hamano
  2020-09-23 15:05     ` Johannes Schindelin
  1 sibling, 1 reply; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-18 17:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, Thomas Guyot-Sionnest

On Fri, 18 Sep 2020 at 13:27, Jeff King <peff@peff.net> wrote:
> On Fri, Sep 18, 2020 at 07:32:55AM -0400, Thomas Guyot-Sionnest wrote:
> > This patch checks if is_stdin flag is set on both sides and compare
> > contents directly.
>
> I'm somewhat puzzled how we could have two filespecs that came from
> stdin, since we'd generally read to EOF. But looking at the test, it
> seems this is a weird range-diff hack to set is_stdin.

"is_stdin" is actually set manually by a function that copies stdin to
diff_filespec->data. We can get an arbitrary number of pipes from
command line arguments - only difference with stdin is that we have to
open them before read.

The flag seems to have been leveraged by diff-range - the first patch
fixes that tool alone, and 2nd adds support for multiple pipes in
--no-index. Both are independent but you would not be able to --stat
two pipes without the first patch.

> > diff --git a/diff.c b/diff.c
> > index a5114fa864..2995527896 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -3681,7 +3681,10 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
> >               return;
> >       }
> >
> > -     same_contents = oideq(&one->oid, &two->oid);
> > +     if (one->is_stdin && two->is_stdin)
> > +             same_contents = !strcmp(one->data, two->data);
> > +     else
> > +             same_contents = oideq(&one->oid, &two->oid);
>
> ...should this actually be checking the oid_valid flag in each filespec?
> That would presumably cover the is_stdin case, too. I also wonder
> whether range-diff ought to be using that flag instead of is_stdin.

I considered that, but IIRC when run under a debugger oid_valid was
set to 0 - it seemed to be used for something different that i'm not
familiar with, maybe it's an indication the object is in git datastore
(whereas with --no-index outside files will only be hashed for
comparison).

I think is_stdin is a misnomer, but if we want to refactor that i'd
rather do it after.

Regards,

Thomas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 16:34     ` Thomas Guyot-Sionnest
  2020-09-18 17:19       ` Jeff King
@ 2020-09-18 17:58       ` Taylor Blau
  2020-09-18 18:05         ` Jeff King
  1 sibling, 1 reply; 50+ messages in thread
From: Taylor Blau @ 2020-09-18 17:58 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest; +Cc: Taylor Blau, git, Thomas Guyot-Sionnest

Hi Thomas,

On Fri, Sep 18, 2020 at 12:34:48PM -0400, Thomas Guyot-Sionnest wrote:
> Hi Taylor,
>
> On Fri, 18 Sep 2020 at 10:36, Taylor Blau <me@ttaylorr.com> wrote:
> > On Fri, Sep 18, 2020 at 07:32:56AM -0400, Thomas Guyot-Sionnest wrote:
> > > A very handy way to pass data to applications is to use the <() process
> > > substitution syntax in bash variants. It allow comparing files streamed
> > > from a remote server or doing on-the-fly stream processing to alter the
> > > diff. These are usually implemented as a symlink that points to a bogus
> > > name (ex "pipe:[209326419]") but opens as a pipe.
> >
> > This is true in bash, but sh does not support process substitution with
> > <().
>
> Bash, ksh, zsh and likely any more moden shell. Other programming
> languages also setup such pipes. It's much cleaner than creating temp
> files and cleaning them up and in some cases faster too (I've ran
> diff's like this over GB's of test data, it's very handy to remove
> known patterns that would cause needless diffs).

Oh, yes, I definitely agree that it will be useful for callers who use
more modern shells. I was making sure that we would still be able to run
this in the test suite (for us, that means making a new file that
sources lib-bash and tests only in environments where bash is
supported).

> Now you mention it, maybe I could do that stat first, rule this out
> from the beginning... less work for the general case.
>
> *untested*:
>
>     if (!lstat(name, &st)) {
>         if (!S_ISLNK(st.st_mode))
>             return(0);
>         if (!stat(name, &st)) {
>             if (!S_ISFIFO(st.st_mode))
>                 return(0);

I still don't think I quite understand why FIFOs aren't allowed...
>
>             /* We have a symlink that points to a pipe. If it's resolved
>              * target doesn't really exist we can safely assume it's a
>              * special file and use it */
>             struct strbuf sb = STRBUF_INIT;
>             strbuf_realpath(&sb, name, 0);
>             /* We're abusing strbuf_realpath here, it may append
>              * pipe:[NNNNNNNNN] to an abs path */
>             if (stat(sb.buf, &st))
>                 return(1); /* stat failed, special one */

By the time that I got to this point I think that what you wrote is much
easier to follow. Thanks.

>         }
>     }
>     return(0);
>
> TL;DR - the conditions we need:
>
> - lstat(name) == 0  // name exists
> - islink(lstat(name))  // name is a symlink
> - stat(name) == 0  // target of name is reachable
> - isfifo(stat(name))  // Target of name is a fifo
> - stat(realpath(readlink(name))) != 0  // Although we can reach it,
> name's destination doesn't actually exist.
>
> BTW is st/sb too confusing ? I took examples elsewhere in the code, I
> can rename them if it's easier to read.

No, it's fine. I think anecdotally I read 'struct strbuf buf' more than
I read 'struct strbuf sb', but I guess that's just the areas of code
that I happen to frequent, since some quick grepping around shows 462
uses of 'sb' followed by 425 uses of 'buf' (the next most common names
are 'err' and 'out', with 191 and 121 uses, respectively).

> > > +test_expect_success 'diff --no-index finds diff in piped subshells' '
> > > +     (
> > > +             set -- <(cat /dev/null) <(cat /dev/null)
>
> Precautions/portability. The file names are somewhat dynamic (at least
> the fd part...) this is to be sure I capture the names of the pipes
> that will be used (assuming the fd's will be reallocated in the same
> order which I think is fairly safe). An alternative is to sed "actual"
> to remove known variables, but then I hope it would be reliable (and
> can I use sed -r?). IIRC earlier versions of bash or on some systems a
> temp file could be used for these - although it defeats the purpose
> it's not a reason to fail....

OK.

> I cannot develop this on other systems but I tested the pipe names on
> Windows and Sunos, and also using ksh and zsh on Linux (zsh uses /proc
> directly, kss uses lower fd's which means it can easily clash with
> scripts if you don't use named fd's, but not our problem....)
>
> Thanks,
>
> Thomas

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 17:37       ` Jeff King
@ 2020-09-18 18:00         ` Thomas Guyot-Sionnest
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-18 18:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, Thomas Guyot-Sionnest

On Fri, 18 Sep 2020 at 13:37, Jeff King <peff@peff.net> wrote:
> On Fri, Sep 18, 2020 at 11:10:45AM -0400, Thomas Guyot-Sionnest wrote:
>
>   if (one->size == two->size &&
>       !memcmp(one->data, two->data, one->size))
>
> is what you'd want.
>

I think the other approach has its merits too - AFAIK if you run this
from a git repo and one of the files is tracked by it (or even both if
you compare two files within the repo) the oid will be readily
available and usable if the file hasn't been modified. If there is no
big objection I could stick with the hybrid approach, using memcmp of
course. - it's also the easiest fix.

--
Thomas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 17:20     ` Jeff King
@ 2020-09-18 18:00       ` Taylor Blau
  0 siblings, 0 replies; 50+ messages in thread
From: Taylor Blau @ 2020-09-18 18:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Thomas Guyot-Sionnest, git, dermoth

On Fri, Sep 18, 2020 at 01:20:44PM -0400, Jeff King wrote:
> On Fri, Sep 18, 2020 at 10:36:47AM -0400, Taylor Blau wrote:
>
> >   - The cat pipe is unnecessary, and is also violating a rule that we
> >     don't place 'git' on the right-hand side of a pipe (can you redirect
> >     the file at the end instead?).
>
> What's wrong with git on the right-hand side of a pipe?

Ack, ignore me. The problem would be on the left-hand side only, without
set -o pipefail, which we don't do.

> On the left-hand side we lose its exit code, which is bad. But on the
> right hand side, we are only losing the exit code of "cat", which we
> don't care about.
>
> (Though I agree that "cat" is pointless here; we could just be
> redirecting from a file).

Yep, but an easy mistake to make nonetheless.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 17:48         ` Junio C Hamano
@ 2020-09-18 18:02           ` Jeff King
  2020-09-20 12:54             ` Thomas Guyot
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2020-09-18 18:02 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Thomas Guyot-Sionnest, Taylor Blau, git,
	Thomas Guyot-Sionnest

On Fri, Sep 18, 2020 at 10:48:41AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Getting back to the overall feature, this is definitely something that
> > has come up before. The last I know of is:
> >
> >   https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/
> >
> > which everybody seemed to like the direction of; I suspect the original
> > author (cc'd) just never got around to it again. Compared to this
> > approach, it uses a command-line option to avoid dereferencing symlinks.
> > That puts an extra burden on the caller to pass the option, but it's way
> > less magical; you could drop all of the "does this look like a symlink
> > to a pipe" heuristics. It would also be much easier to test. ;)
> 
> Yes, I do remember liking the approach very much and wanted to take
> it once the "do not dereference symlinks everywhere---just limit it
> to what was given from the command line" was done.
> 
> To be quite honest, I think "git diff --no-index A B" should
> unconditionally dereference A and/or B if they are symlinks, whether
> they are symlinks to pipes, regular files or directories, and
> otherwise treat symlinks in A and B the same way as "git diff" if A
> and B are directories.  But that is a design guideline that becomes
> needed only after we start resurrecting Brian's effort, not with
> these patches that started this thread.

Yeah, I think I'd be fine with that approach, too. It makes "git diff
--no-index" more like other tools out of the box. And if we took brian's
patch first, then we'd just be flipping its default, and the option it
adds would give an easy escape hatch for somebody who really wants to
diff two maybe-symlinks.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 17:58       ` Taylor Blau
@ 2020-09-18 18:05         ` Jeff King
  0 siblings, 0 replies; 50+ messages in thread
From: Jeff King @ 2020-09-18 18:05 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Thomas Guyot-Sionnest, git, Thomas Guyot-Sionnest

On Fri, Sep 18, 2020 at 01:58:36PM -0400, Taylor Blau wrote:

> > *untested*:
> >
> >     if (!lstat(name, &st)) {
> >         if (!S_ISLNK(st.st_mode))
> >             return(0);
> >         if (!stat(name, &st)) {
> >             if (!S_ISFIFO(st.st_mode))
> >                 return(0);
> 
> I still don't think I quite understand why FIFOs aren't allowed...

It's the other way around. Non-fifos return "0" from this "is it a pipe"
function.

I think it is to prevent a false positive with:

  rm bar
  ln -s foo bar
  git diff --no-index foo something-else

We'd still want to treat "foo" as a symlink there. I.e., the heuristic
for guessing it's a process substitution pipe is:

  - it's a symlink that doesn't point anywhere
  - it's also a fifo

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 17:52     ` Thomas Guyot-Sionnest
@ 2020-09-18 18:06       ` Junio C Hamano
  2020-09-23 19:16         ` Johannes Schindelin
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2020-09-18 18:06 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest
  Cc: Jeff King, Johannes Schindelin, git, Thomas Guyot-Sionnest

Thomas Guyot-Sionnest <tguyot@gmail.com> writes:

>> > -     same_contents = oideq(&one->oid, &two->oid);
>> > +     if (one->is_stdin && two->is_stdin)
>> > +             same_contents = !strcmp(one->data, two->data);
>> > +     else
>> > +             same_contents = oideq(&one->oid, &two->oid);
>>
>> ...should this actually be checking the oid_valid flag in each filespec?
>> That would presumably cover the is_stdin case, too. I also wonder
>> whether range-diff ought to be using that flag instead of is_stdin.
>
> I considered that, but IIRC when run under a debugger oid_valid was
> set to 0 - it seemed to be used for something different that i'm not
> familiar with, maybe it's an indication the object is in git datastore
> (whereas with --no-index outside files will only be hashed for
> comparison).

If it says !oid_valid, I think you are getting what you do want.
The contents from the outside world, be it what was read from the
standard input or a pipe, a regular file that is not up-to-date with
the index, may not have a usable oid computed for it, and oid_valid
being false signals you that you need byte-for-byte comparison.  As
suggested by Peff in another message, you can take that signal and
compare the size and then the contents with memcmp() to see if they
are the same.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: Allow passing pipes to diff --no-index + bugfix
  2020-09-18 17:51 ` Allow passing pipes to diff --no-index + bugfix Junio C Hamano
@ 2020-09-18 18:24   ` Thomas Guyot-Sionnest
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-18 18:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Guyot-Sionnest

On Fri, 18 Sep 2020 at 13:51, Junio C Hamano <gitster@pobox.com> wrote:
>
> Next time, please send each of such unrelated patches independently,
> not as a two-patch series that gives a (false) impression that the
> second one needs the first one to work correctly.

Hi Junio,

My apologies for not making it clear enough - the fix in the first
patch is for an edgy case of git diff-range, but the same fix applies
to the 2nd patch. Any diff --stat comparison of two pipes would return
0-line diffs although the files would be marked as changed.

It was originally just one commit; I splitted it up when I realized a
test was actually triggered by this fix (false negative without the
fix, false positive with it) as it's still independent enough to be
reviewed/merged alone.

Regards,

Thomas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 11:32 ` [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index Thomas Guyot-Sionnest
  2020-09-18 14:36   ` Taylor Blau
@ 2020-09-18 21:56   ` brian m. carlson
  1 sibling, 0 replies; 50+ messages in thread
From: brian m. carlson @ 2020-09-18 21:56 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest; +Cc: git, dermoth

[-- Attachment #1: Type: text/plain, Size: 3168 bytes --]

On 2020-09-18 at 11:32:56, Thomas Guyot-Sionnest wrote:
> diff --git a/diff-no-index.c b/diff-no-index.c
> index 7814eabfe0..779c686d23 100644
> --- a/diff-no-index.c
> +++ b/diff-no-index.c
> @@ -41,6 +41,33 @@ static int read_directory_contents(const char *path, struct string_list *list)
>   */
>  static const char file_from_standard_input[] = "-";
>  
> +/* Check that file is - (STDIN) or unnamed pipe - explicitly
> + * avoid on-disk named pipes which could block
> + */
> +static int ispipe(const char *name)
> +{
> +	struct stat st;
> +
> +	if (name == file_from_standard_input)
> +		return 1;  /* STDIN */
> +
> +	if (!lstat(name, &st)) {
> +		if (S_ISLNK(st.st_mode)) {
> +			/* symlink - read it and check it doesn't exists
> +			 * as a file yet link to a pipe */
> +			struct strbuf sb = STRBUF_INIT;
> +			strbuf_realpath(&sb, name, 0);
> +			/* We're abusing strbuf_realpath here, it may append
> +			 * pipe:[NNNNNNNNN] to an abs path */
> +			if (!stat(sb.buf, &st))
> +				return 0; /* link target exists , not pipe */
> +			if (!stat(name, &st))
> +				return S_ISFIFO(st.st_mode);

This makes a lot of assumptions about the implementation which are
specific to Linux, namely that an anonymous pipe will be a symlink to a
FIFO.  That's not the case on macOS, where the /dev/fd entries are
actually named pipes themselves.

Granted, in that case, Git just chokes and fails instead of diffing the
symlink values, but I suspect you'll want this to work on macOS as well.
I don't use macOS that often, but I would appreciate it if it worked
when I did, and I'm sure others will as well.

I think we can probably get away with just doing a regular stat and
seeing if S_ISFIFO is true, which is both simpler and cheaper.

> diff --git a/t/t4053-diff-no-index.sh b/t/t4053-diff-no-index.sh
> index 0168946b63..e49f773515 100755
> --- a/t/t4053-diff-no-index.sh
> +++ b/t/t4053-diff-no-index.sh
> @@ -144,4 +144,193 @@ test_expect_success 'diff --no-index allows external diff' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'diff --no-index can diff piped subshells' '
> +	echo 1 >non/git/c &&
> +	test_expect_code 0 git diff --no-index non/git/b <(cat non/git/c) &&
> +	test_expect_code 0 git diff --no-index <(cat non/git/b) non/git/c &&
> +	test_expect_code 0 git diff --no-index <(cat non/git/b) <(cat non/git/c) &&
> +	test_expect_code 0 cat non/git/b | git diff --no-index - non/git/c &&
> +	test_expect_code 0 cat non/git/c | git diff --no-index non/git/b - &&
> +	test_expect_code 0 cat non/git/b | git diff --no-index - <(cat non/git/c) &&
> +	test_expect_code 0 cat non/git/c | git diff --no-index <(cat non/git/b) -
> +'

As mentioned by others, this requires non-POSIX syntax.  /bin/sh on my
Debian system is dash, which doesn't support this.  You can either use a
prerequisite, or just test by piping from standard input and assume that
Git can handle the rest.  I would recommend at least adding some POSIX
testcases that use only a pipe from standard input to avoid regressing
this behavior on Debian and Ubuntu.
-- 
brian m. carlson: Houston, Texas, US

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 15:10     ` Thomas Guyot-Sionnest
  2020-09-18 17:37       ` Jeff King
@ 2020-09-20  4:53       ` Thomas Guyot
  1 sibling, 0 replies; 50+ messages in thread
From: Thomas Guyot @ 2020-09-20  4:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Thomas Guyot-Sionnest, Jeff King

Hi... Added Jeff as he got involved later and comments below are
relevant to his questions.

On 2020-09-18 11:10, Thomas Guyot-Sionnest wrote:
> On Fri, 18 Sep 2020 at 10:46, Taylor Blau <me@ttaylorr.com> wrote:
>>
>>   - Why do we have to do this at all all the way up in
>>     'builtin_diffstat'? I would expect these to contain the right
>>     OIDs by the time they are given back to us from the call to
>>     'diff_fill_oid_info' in 'run_diffstat'.
>>
>> So, my last point is the most important of the three. I'd expect
>> something more along the lines of:
>>
>>   1. diff_fill_oid_info resolve the link to the pipe, and
>>   2. index_path handles the resolved fd.
>>
>> ...but it looks like that is already what it's doing? I'm confused why
>> this doesn't work as-is.
> 
> So the idea is to checksum the data and write a valid oid. I'll see if
> I can figure that out. Thanks for the hint though else I would likely
> have gone with a buffer and memcmp. Your solution seems cleaner, and
> there is a few other uses of oideq's that look dubious at best with
> the case of null oids / buffered data so it's definitely a better
> approach.
> 

After looking further at the code I understand your point, although
pipes can only ever be read once, so even if we do that we would have to
buffer on first read. It appears the files are first read by
diff_populate_filespec() - builtin_diffstat isn't even called if the
files match (even for two pipes).

Jeff, on your suggestion to compare size, the size is set even if data
is null. Files in-tree appears to be mmapped on demand for reads.

diff_fill_oid_info explicitly resets oids for is_stdin and return, and
if the file's been read already and it's a pipe, we would *have* to have
buffered the data already so I don't really see what else we can do
besides memcmp() (technically we should be able to tell if the files
have been modified at this point but apparently that information isn't
transmitted to builtin_diffstat - it's assumed and I won't make complex
change for that odd case of diffing two pipes. That's what I have now:

    /* What is_stdin really means is that the file's content is only
     * in the filespec's buffer and its oid is zero. We can't compare
     * oid's if both are null and we can just diff the buffers */
    if (one->is_stdin && two->is_stdin)
        same_contents = (one->size == two->size ?
            !memcmp(one->data, two->data, one->size) : 0);
    else
        same_contents = oideq(&one->oid, &two->oid);


Even when we implement the --literally switch, considering we can't
guarantee a single read per file, for now I'd keep using the is_stdin
flag as an indication of in-memory data, and we'll have to read in all
pipes we diff (like earlier patch). It could be a concern if we
--literally diff a whole subtree of large pipes. In that case the only
fix I can see is to reorder the operations to generate the stats on each
file diff (or at least keep the diffs around for the stats pass).


Regards,

Thomas



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-18 18:02           ` Jeff King
@ 2020-09-20 12:54             ` Thomas Guyot
  2020-09-21 19:31               ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Guyot @ 2020-09-20 12:54 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano
  Cc: brian m. carlson, Taylor Blau, git, Thomas Guyot-Sionnest

On 2020-09-18 14:02, Jeff King wrote:
> On Fri, Sep 18, 2020 at 10:48:41AM -0700, Junio C Hamano wrote:
> 
>> Jeff King <peff@peff.net> writes:
>>
>>> Getting back to the overall feature, this is definitely something that
>>> has come up before. The last I know of is:
>>>
>>>   https://lore.kernel.org/git/20181220002610.43832-1-sandals@crustytoothpaste.net/
>>>
>>> which everybody seemed to like the direction of; I suspect the original
>>> author (cc'd) just never got around to it again. Compared to this
>>> approach, it uses a command-line option to avoid dereferencing symlinks.
>>> That puts an extra burden on the caller to pass the option, but it's way
>>> less magical; you could drop all of the "does this look like a symlink
>>> to a pipe" heuristics. It would also be much easier to test. ;)
>>
>> Yes, I do remember liking the approach very much and wanted to take
>> it once the "do not dereference symlinks everywhere---just limit it
>> to what was given from the command line" was done.
>>
>> To be quite honest, I think "git diff --no-index A B" should
>> unconditionally dereference A and/or B if they are symlinks, whether
>> they are symlinks to pipes, regular files or directories, and
>> otherwise treat symlinks in A and B the same way as "git diff" if A
>> and B are directories.  But that is a design guideline that becomes
>> needed only after we start resurrecting Brian's effort, not with
>> these patches that started this thread.
> 
> Yeah, I think I'd be fine with that approach, too. It makes "git diff
> --no-index" more like other tools out of the box. And if we took brian's
> patch first, then we'd just be flipping its default, and the option it
> adds would give an easy escape hatch for somebody who really wants to
> diff two maybe-symlinks.

Considering the issue with MacOS I'm starting to think the best solution
is to not use any heuristic and read passed-in files directly. That
said, I don't think it makes much change either way (if I resurrect
Brian's patch is will probably end up being a hybrid between the two as
both read the pipe at the same place and my approach was simpler further
down).

I'm not sure which way I prefer to start first - will you accept a patch
that reads passed in files as-is if I I start with this one?

In the mean time I will submit the first patch fixed) as a standalone one.

Regards,

Thomas


^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest
  2020-09-18 14:46   ` Taylor Blau
  2020-09-18 17:27   ` Jeff King
@ 2020-09-20 13:09   ` Thomas Guyot-Sionnest
  2020-09-20 15:39     ` Taylor Blau
  2 siblings, 1 reply; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-20 13:09 UTC (permalink / raw)
  To: git; +Cc: dermoth, me, peff, Thomas Guyot-Sionnest

In builtin_diffstat(), when both files are coming from "stdin" (which
could be better described as the file's content being written directly
into the file object), oideq() compares two null hashes and ignores the
actual differences for the statistics.

This patch checks if is_stdin flag is set on both sides and compare
contents directly.

Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
---
Range-diff:
1:  479c2835fc ! 1:  1f25713d44 diff: Fix modified lines stats with --stat and --numstat
    @@ -20,8 +20,12 @@
      	}
      
     -	same_contents = oideq(&one->oid, &two->oid);
    ++	/* What is_stdin really means is that the file's content is only
    ++	 * in the filespec's buffer and its oid is zero. We can't compare
    ++	 * oid's if both are null and we can just diff the buffers */
     +	if (one->is_stdin && two->is_stdin)
    -+		same_contents = !strcmp(one->data, two->data);
    ++		same_contents = (one->size == two->size ?
    ++			!memcmp(one->data, two->data, one->size) : 0);
     +	else
     +		same_contents = oideq(&one->oid, &two->oid);
      

 diff.c                | 9 ++++++++-
 t/t3206-range-diff.sh | 8 ++++----
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
index ee8e8189e9..2e47bf824e 100644
--- a/diff.c
+++ b/diff.c
@@ -3681,7 +3681,14 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		return;
 	}
 
-	same_contents = oideq(&one->oid, &two->oid);
+	/* What is_stdin really means is that the file's content is only
+	 * in the filespec's buffer and its oid is zero. We can't compare
+	 * oid's if both are null and we can just diff the buffers */
+	if (one->is_stdin && two->is_stdin)
+		same_contents = (one->size == two->size ?
+			!memcmp(one->data, two->data, one->size) : 0);
+	else
+		same_contents = oideq(&one->oid, &two->oid);
 
 	if (diff_filespec_is_binary(o->repo, one) ||
 	    diff_filespec_is_binary(o->repo, two)) {
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index e024cff65c..4715e75b68 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -258,11 +258,11 @@ test_expect_success 'changed commit with --stat diff option' '
 	     a => b | 0
 	     1 file changed, 0 insertions(+), 0 deletions(-)
 	3:  $(test_oid t3) ! 3:  $(test_oid c3) s/11/B/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
+	     a => b | 2 +-
+	     1 file changed, 1 insertion(+), 1 deletion(-)
 	4:  $(test_oid t4) ! 4:  $(test_oid c4) s/12/B/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
+	     a => b | 2 +-
+	     1 file changed, 1 insertion(+), 1 deletion(-)
 	EOF
 	test_cmp expect actual
 '
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-20 13:09   ` [PATCH v2] " Thomas Guyot-Sionnest
@ 2020-09-20 15:39     ` Taylor Blau
  2020-09-20 16:38       ` Thomas Guyot
  2020-09-20 19:11       ` Junio C Hamano
  0 siblings, 2 replies; 50+ messages in thread
From: Taylor Blau @ 2020-09-20 15:39 UTC (permalink / raw)
  To: Thomas Guyot-Sionnest; +Cc: git, dermoth, me, peff

On Sun, Sep 20, 2020 at 09:09:46AM -0400, Thomas Guyot-Sionnest wrote:
> In builtin_diffstat(), when both files are coming from "stdin" (which
> could be better described as the file's content being written directly
> into the file object), oideq() compares two null hashes and ignores the
> actual differences for the statistics.
>
> This patch checks if is_stdin flag is set on both sides and compare
> contents directly.
>
> Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
> ---
> Range-diff:
> 1:  479c2835fc ! 1:  1f25713d44 diff: Fix modified lines stats with --stat and --numstat
>     @@ -20,8 +20,12 @@
>       	}
>
>      -	same_contents = oideq(&one->oid, &two->oid);
>     ++	/* What is_stdin really means is that the file's content is only
>     ++	 * in the filespec's buffer and its oid is zero. We can't compare
>     ++	 * oid's if both are null and we can just diff the buffers */
>      +	if (one->is_stdin && two->is_stdin)
>     -+		same_contents = !strcmp(one->data, two->data);
>     ++		same_contents = (one->size == two->size ?
>     ++			!memcmp(one->data, two->data, one->size) : 0);
>      +	else
>      +		same_contents = oideq(&one->oid, &two->oid);

After reading your explanation in [1], this version makes more sense to
me.

Thanks.

[1]: https://lore.kernel.org/git/f4c4cb48-f4b5-3d4d-066d-b94e961dcbb5@gmail.com/

Taylor

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-20 15:39     ` Taylor Blau
@ 2020-09-20 16:38       ` Thomas Guyot
  2020-09-20 19:11       ` Junio C Hamano
  1 sibling, 0 replies; 50+ messages in thread
From: Thomas Guyot @ 2020-09-20 16:38 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dermoth, peff

On 2020-09-20 11:39, Taylor Blau wrote:
> On Sun, Sep 20, 2020 at 09:09:46AM -0400, Thomas Guyot-Sionnest wrote:
>> In builtin_diffstat(), when both files are coming from "stdin" (which
>> could be better described as the file's content being written directly
>> into the file object), oideq() compares two null hashes and ignores the
>> actual differences for the statistics.
>>
>> This patch checks if is_stdin flag is set on both sides and compare
>> contents directly.
>>
>> Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
>> ---
>> Range-diff:
>> 1:  479c2835fc ! 1:  1f25713d44 diff: Fix modified lines stats with --stat and --numstat
>>     @@ -20,8 +20,12 @@
>>       	}
>>
>>      -	same_contents = oideq(&one->oid, &two->oid);
>>     ++	/* What is_stdin really means is that the file's content is only
>>     ++	 * in the filespec's buffer and its oid is zero. We can't compare
>>     ++	 * oid's if both are null and we can just diff the buffers */
>>      +	if (one->is_stdin && two->is_stdin)
>>     -+		same_contents = !strcmp(one->data, two->data);
>>     ++		same_contents = (one->size == two->size ?
>>     ++			!memcmp(one->data, two->data, one->size) : 0);
>>      +	else
>>      +		same_contents = oideq(&one->oid, &two->oid);
> 
> After reading your explanation in [1], this version makes more sense to
> me.
> 
> Thanks.
> 
> [1]: https://lore.kernel.org/git/f4c4cb48-f4b5-3d4d-066d-b94e961dcbb5@gmail.com/

There's a little bit missing... Just before the new code example:,
previous to last paragraph:

> it's assumed [we can just call oidcmp()] and I won't make complex

--
Thomas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-20 15:39     ` Taylor Blau
  2020-09-20 16:38       ` Thomas Guyot
@ 2020-09-20 19:11       ` Junio C Hamano
  2020-09-20 20:08         ` Junio C Hamano
                           ` (2 more replies)
  1 sibling, 3 replies; 50+ messages in thread
From: Junio C Hamano @ 2020-09-20 19:11 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Thomas Guyot-Sionnest, git, dermoth, peff

Taylor Blau <me@ttaylorr.com> writes:

> On Sun, Sep 20, 2020 at 09:09:46AM -0400, Thomas Guyot-Sionnest wrote:
>> In builtin_diffstat(), when both files are coming from "stdin" (which
>> could be better described as the file's content being written directly
>> into the file object), oideq() compares two null hashes and ignores the
>> actual differences for the statistics.
>>
>> This patch checks if is_stdin flag is set on both sides and compare
>> contents directly.
>>
>> Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
>> ---
>> Range-diff:
>> 1:  479c2835fc ! 1:  1f25713d44 diff: Fix modified lines stats with --stat and --numstat
>>     @@ -20,8 +20,12 @@
>>       	}
>>
>>      -	same_contents = oideq(&one->oid, &two->oid);
>>     ++	/* What is_stdin really means is that the file's content is only
>>     ++	 * in the filespec's buffer and its oid is zero. We can't compare
>>     ++	 * oid's if both are null and we can just diff the buffers */
>>      +	if (one->is_stdin && two->is_stdin)
>>     -+		same_contents = !strcmp(one->data, two->data);
>>     ++		same_contents = (one->size == two->size ?
>>     ++			!memcmp(one->data, two->data, one->size) : 0);
>>      +	else
>>      +		same_contents = oideq(&one->oid, &two->oid);
>
> After reading your explanation in [1], this version makes more sense to
> me.

These oid fields are prepared by calling diff_fill_oid_info(), and
even for paths that are dirty (hence no "free" oid available from
index or tree entry), an appropriate oid is computed.

But there are paths for which oid cannot be computed without
destroying their contents.  Such paths are marked by the function
with null_oid.

It happens to be that stdin is the only class of paths that are
treated that way _right_ _now_, but future code may support
different kind of paths that share the same trait.

When we want to know "is comparing the oid sufficient?", we
shouldn't inspect the is_stdin flag ourselves in a caller of
diff_fill_oid_info(), because the helper _is_ responsible for
knowing what kind of paths are special, and signals that "assume
this would not be equal to anything else" by giving null_oid back.

The caller should use the info left by diff_fill_oid_info(), namely,
"even if the oid on both sides are the same, if it is null_oid, then
we know diff_fill_oid_info() didn't actually compute the oid, and we
need to compare the blob ourselves".

And there is no point in doing memcmp() here, I think.  

The same_contents() check is done as an optimization to avoid xdl.
Even if the two sides were thought to be different at the oid level,
xdl comparison may find that there is no difference after all
(e.g. think of whitespace ignoring comparison), so we should assume
and rely on that the downstream code MUST BE prepared to handle
false negatives (i.e. same_contents says they are different, but
they actually produce no diffstat).  Running memcmp() over contents
in potentially a large buffer to find that they are different, and
then have xdl process that large buffer again, would be a waste.

Summarizing the above, I think the second best fix is this (which
means that the posted patch is the third best):

	/*
	 * diff_fill_oid_info() marked one/two->oid with null_oid
	 * for a path whose oid is not available.  Disable early
	 * return optimization for them.
	 */
	if (oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid))
		same_contents = 0; /* could be different */
	else if (oideq(&one->oid, &two->oid))
		same_contents = 1; /* definitely the same */
	else
		same_contents = 0; /* definitely different */

But I suspect that the best fix is to teach diff_fill_oid_info() to
hash the in-memory data to compute the oid, instead of punting and
filling the oid field with null_oid.  If function builtin_diffstat()
is allowed to look at the contents and run memcmp() here, the 'data'
field should have been filled and valid when diff_fill_oid_info()
looked at it already.

The "best" fix will have wider consequences, so we may not want to
jump to it right away without careful consideration.

For example, the "best" fix will fix another bug.  The 'index'
header shows a NULL object name in normal "diff --patch" output for
these paths in the current code, which means they cannot be used
with "apply --3way".  That way, this codepath does not have to know
anything about the "null means we don't know" convention.  

Try:

    $ (cat COPYING; echo) >RENAMING
    $ git diff --no-index COPYING - <RENAMING | grep '^index '
    index 536e55524d..0000000000 100644

and notice that the stdin side has a null object name in the current
code.  I think we will show the right object name if we fix the
diff_fill_oid_info().

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-20 19:11       ` Junio C Hamano
@ 2020-09-20 20:08         ` Junio C Hamano
  2020-09-20 20:36         ` Junio C Hamano
  2020-09-21 19:26         ` Jeff King
  2 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2020-09-20 20:08 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Thomas Guyot-Sionnest, git, dermoth, peff

Junio C Hamano <gitster@pobox.com> writes:

> But I suspect that the best fix is to teach diff_fill_oid_info() to
> hash the in-memory data to compute the oid, instead of punting and
> filling the oid field with null_oid.  If function builtin_diffstat()
> is allowed to look at the contents and run memcmp() here, the 'data'
> field should have been filled and valid when diff_fill_oid_info()
> looked at it already.
>
> The "best" fix will have wider consequences, so we may not want to
> jump to it right away without careful consideration.

And then after giving a bit more thought, I don't recommend to go
with this approach, because it breaks an established convention that
objects with unknown name is perfectly OK and shown with the null
oid.

In other words, I'd suggest to use the "second best" one I gave in
the message I am responding to.

Thanks.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-20 19:11       ` Junio C Hamano
  2020-09-20 20:08         ` Junio C Hamano
@ 2020-09-20 20:36         ` Junio C Hamano
  2020-09-20 22:15           ` Junio C Hamano
  2020-09-21 19:26         ` Jeff King
  2 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2020-09-20 20:36 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Thomas Guyot-Sionnest, git, dermoth, peff

Junio C Hamano <gitster@pobox.com> writes:

> Summarizing the above, I think the second best fix is this (which
> means that the posted patch is the third best):
>
> 	/*
> 	 * diff_fill_oid_info() marked one/two->oid with null_oid
> 	 * for a path whose oid is not available.  Disable early
> 	 * return optimization for them.
> 	 */
> 	if (oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid))
> 		same_contents = 0; /* could be different */
> 	else if (oideq(&one->oid, &two->oid))
> 		same_contents = 1; /* definitely the same */
> 	else
> 		same_contents = 0; /* definitely different */

A tangent.

There is this code in diff.c::fill_metainfo() that is used to
populate the "index" header element of "diff --patch" output:

	if (one && two && !oideq(&one->oid, &two->oid)) {
		const unsigned hexsz = the_hash_algo->hexsz;
		int abbrev = o->abbrev ? o->abbrev : DEFAULT_ABBREV;

		if (o->flags.full_index)
			abbrev = hexsz;

		if (o->flags.binary) {
			mmfile_t mf;
			if ((!fill_mmfile(o->repo, &mf, one) &&
			     diff_filespec_is_binary(o->repo, one)) ||
			    (!fill_mmfile(o->repo, &mf, two) &&
			     diff_filespec_is_binary(o->repo, two)))
				abbrev = hexsz;
		}
		strbuf_addf(msg, "%s%sindex %s..%s", line_prefix, set,
			    diff_abbrev_oid(&one->oid, abbrev),
			    diff_abbrev_oid(&two->oid, abbrev));
		if (one->mode == two->mode)
			strbuf_addf(msg, " %06o", one->mode);
		strbuf_addf(msg, "%s\n", reset);
	}

Currently it is OK because there can only be one side that
diff_fill_oid_info() would mark as "oid unavailable" (e.g. reading
standard input stream).  If a new feature is introducing a situation
where both ends have null_oid, which was so far been impossible,
we'd probably need to factor out the condition used in the above
into a helper function, e.g.

    static int cannot_be_the_same(struct diff_filespec *one, struct diff_filespec *two)
    {
	if ((oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid))
		return 1;
	else if (oideq(&one->oid, &two->oid))
		return 0;
	else
		return 1;
    }

and rewrite the conditional in fill_metainfo() to

	if (one && two && cannot_be_the_same(one, two)) {
		...

The "second best fix" could then become a single liner:

	same_contents = !cannot_be_the_same(one, two);

using the helper.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-20 20:36         ` Junio C Hamano
@ 2020-09-20 22:15           ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2020-09-20 22:15 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Thomas Guyot-Sionnest, git, dermoth, peff

Junio C Hamano <gitster@pobox.com> writes:

Sorry, this will be the last message from me on this topic for now.

> we'd probably need to factor out the condition used in the above
> into a helper function, e.g.
>
>     static int cannot_be_the_same(struct diff_filespec *one, struct diff_filespec *two)

The naming of this helper is tricky.  In both potential callers,
what we want to see is "one and two may be different, we cannot say
they are the same with certainty", so "cannot be the same" is a
misnomer.  Worse, the negated form is hard to grok.

Perhaps "may_differ()" is a more correct name.  If either side is
NULL oid, we cannot say they are the same, so it is true.  If two
oid that are not NULL oid are the same, there is no possibility that
they are different, so we return false.  And two oid that are not
NULL oid are different, we know they are different, so we return
true.

>     {
> 	if ((oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid))
> 		return 1;
> 	else if (oideq(&one->oid, &two->oid))
> 		return 0;
> 	else
> 		return 1;
>     }
>
> and rewrite the conditional in fill_metainfo() to
>
> 	if (one && two && cannot_be_the_same(one, two)) {
> 		...

And this becomes much easier to read and understand, i.e.

	if (one && two && may_differ(one, two)) {
		... create the index one->oid..two->oid header ...

> The "second best fix" could then become a single liner:
>
> 	same_contents = !cannot_be_the_same(one, two);
>
> using the helper.

And this becomes

	same_contents = !may_differ(one, two);

meaning that "there is no possibility that one and two are
different".  That allows us to optimize out the invocation of the
xdl machinery.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-20 19:11       ` Junio C Hamano
  2020-09-20 20:08         ` Junio C Hamano
  2020-09-20 20:36         ` Junio C Hamano
@ 2020-09-21 19:26         ` Jeff King
  2020-09-21 21:51           ` Junio C Hamano
  2 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2020-09-21 19:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Thomas Guyot-Sionnest, git, dermoth

On Sun, Sep 20, 2020 at 12:11:20PM -0700, Junio C Hamano wrote:

> > After reading your explanation in [1], this version makes more sense to
> > me.
> 
> These oid fields are prepared by calling diff_fill_oid_info(), and
> even for paths that are dirty (hence no "free" oid available from
> index or tree entry), an appropriate oid is computed.

This is the part that confused me earlier. I expected these "stdin"
entries, just like other some other entries (e.g., stat dirty ones for
diff-files, or anything for "diff --no-index") to have bogus oids.

But that diff_fill_oid_info() is what actually computes the sha1 from
scratch for them. I get why that is needed for generating a git diff, as
we have an "index from...to" line there that we'd want to fill.

For diffstat, though, it seems like a waste of time; we don't care what
the object hash is. I.e., if we were to do this:

diff --git a/diff.c b/diff.c
index 16eeaf6645..1934af29a5 100644
--- a/diff.c
+++ b/diff.c
@@ -4564,9 +4564,6 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o,
 	if (o->prefix_length)
 		strip_prefix(o->prefix_length, &name, &other);

-	diff_fill_oid_info(p->one, o->repo->index);
-	diff_fill_oid_info(p->two, o->repo->index);
-
 	builtin_diffstat(name, other, p->one, p->two,
 			 diffstat, o, p);
 }

then everything seems to work fine _except_ a "git diff --stat
--no-index", exactly because it hits this "same_contents" check we've
been discussing. And once that is fixed properly (to handle any case
where we have no oid, not just when the stdin flag is set), then perhaps
it is worth doing.

Or perhaps not. Even if we have to memcmp sometimes in
builtin_diffstat(), it would be faster than computing the individual
hashes. But it may not be measurably so, and it would be no difference
for the common case of filespecs for which we do know the oid for free.
I also suspect we'd need to be a little smarter about combined formats
(e.g., "--stat --patch" might as well compute the oid as early as
possible, since we'll need it eventually for the patch; but we'd hit the
call in builtin_diffstat() before the one in run_diff()).

> But there are paths for which oid cannot be computed without
> destroying their contents.  Such paths are marked by the function
> with null_oid.

I'm not clear how computing the oid destroys the contents. We have them
in an in-memory buffer at this point, don't we? So we _could_ generate
an oid even for stdin, like this:

diff --git a/cache.h b/cache.h
index 55d7f61087..1ace143eac 100644
--- a/cache.h
+++ b/cache.h
@@ -858,6 +858,7 @@ int ie_modified(struct index_state *, const struct cache_entry *, struct stat *,
 #define HASH_RENORMALIZE  4
 int index_fd(struct index_state *istate, struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
 int index_path(struct index_state *istate, struct object_id *oid, const char *path, struct stat *st, unsigned flags);
+int index_mem(struct index_state *istate, struct object_id *oid, void *buf, size_t size, enum object_type type, const char *path, unsigned flags);

 /*
  * Record to sd the data from st that we use to check whether a file
diff --git a/diff.c b/diff.c
index 16eeaf6645..181b632114 100644
--- a/diff.c
+++ b/diff.c
@@ -4463,7 +4463,10 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is
 		if (!one->oid_valid) {
 			struct stat st;
 			if (one->is_stdin) {
-				oidclr(&one->oid);
+				if (index_mem(istate, &one->oid,
+					      one->data, one->size,
+					      OBJ_BLOB, one->path, 0))
+					die("cannot hash diff file from stdin");
 				return;
 			}
 			if (lstat(one->path, &st) < 0)
diff --git a/sha1-file.c b/sha1-file.c
index 770501d6d1..c7d017b3e0 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2046,10 +2046,10 @@ static void check_tag(const void *buf, size_t size)
 		die(_("corrupt tag"));
 }

-static int index_mem(struct index_state *istate,
-		     struct object_id *oid, void *buf, size_t size,
-		     enum object_type type,
-		     const char *path, unsigned flags)
+int index_mem(struct index_state *istate,
+	      struct object_id *oid, void *buf, size_t size,
+	      enum object_type type,
+	      const char *path, unsigned flags)
 {
 	int ret, re_allocated = 0;
 	int write_object = flags & HASH_WRITE_OBJECT;

which is basically your "best fix" from below. It fixes the bug here,
and it gives you a non-null index line. I'd consider coupling it with
calling fill_oid less often, though (something like range-diff computes
a bunch of fake-stdin diffs, and doesn't need to waste time computing
the oids at all).

> Summarizing the above, I think the second best fix is this (which
> means that the posted patch is the third best):
> 
> 	/*
> 	 * diff_fill_oid_info() marked one/two->oid with null_oid
> 	 * for a path whose oid is not available.  Disable early
> 	 * return optimization for them.
> 	 */
> 	if (oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid))
> 		same_contents = 0; /* could be different */
> 	else if (oideq(&one->oid, &two->oid))
> 		same_contents = 1; /* definitely the same */
> 	else
> 		same_contents = 0; /* definitely different */

This is the direction I was getting at in my earlier emails, except that
I imagined that first conditional could be checking:

  if (!one->oid_valid || !two->oid_valid)

but I was surprised to see that diff_fill_oid_info() does not set
oid_valid. Is that a bug?

I also imagined that we'd have to determine right then whether the
contents are actually different or not with a memcmp(), to avoid
emitting a "0 changes" line, but we do handle that case within the
"!same_contents" conditional. See the comment starting with "Omit
diffstats..." added recently by 1cf3d5db9b (diff: teach --stat to ignore
uninteresting modifications, 2020-08-20).

-Peff

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-20 12:54             ` Thomas Guyot
@ 2020-09-21 19:31               ` Jeff King
  2020-09-21 20:14                 ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2020-09-21 19:31 UTC (permalink / raw)
  To: Thomas Guyot
  Cc: Junio C Hamano, brian m. carlson, Taylor Blau, git,
	Thomas Guyot-Sionnest

On Sun, Sep 20, 2020 at 08:54:53AM -0400, Thomas Guyot wrote:

> Considering the issue with MacOS I'm starting to think the best solution
> is to not use any heuristic and read passed-in files directly. That
> said, I don't think it makes much change either way (if I resurrect
> Brian's patch is will probably end up being a hybrid between the two as
> both read the pipe at the same place and my approach was simpler further
> down).
> 
> I'm not sure which way I prefer to start first - will you accept a patch
> that reads passed in files as-is if I I start with this one?

I think the ideal is:

  - implement a command-line option to read the content of paths on the
    command-line literally (i.e., reading from pipes, dereferencing
    symlinks, etc)

  - make sure we have the inverse option (which you should get for free
    in step 1 if you use parse_options)

  - flip the default to do literal reads

We'd sometimes wait several versions before that last step to give
people time to adjust scripts, etc. But in this case, I suspect it would
be OK to just flip it immediately. We don't consider "git diff" itself
part of the stable plumbing, and the --no-index part of it I would
consider even less stable. And AFAICT most people consider the current
behavior a bug because it doesn't behave like other diff tools.

-Peff

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index
  2020-09-21 19:31               ` Jeff King
@ 2020-09-21 20:14                 ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2020-09-21 20:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Thomas Guyot, brian m. carlson, Taylor Blau, git,
	Thomas Guyot-Sionnest

Jeff King <peff@peff.net> writes:

> We'd sometimes wait several versions before that last step to give
> people time to adjust scripts, etc. But in this case, I suspect it would
> be OK to just flip it immediately. We don't consider "git diff" itself
> part of the stable plumbing, and the --no-index part of it I would
> consider even less stable. And AFAICT most people consider the current
> behavior a bug because it doesn't behave like other diff tools.

The "git diff" proper gets no filenames from the command line and
the above strictly applies only to the no-index mode, with or
without the explicit "--no-index" option.

It was a way to give Git niceties like colored diffs, renames,
etc. to non-Git managed two sets of paths, and the primary reason
why we have it as a mode of "git diff" is because we chose not to
bother with interacting with upstream maintainers of "diff".  In an
ideal world, "GNU diff" and others would have learned things like
renames, word diffs, etc., instead of "git diff" adding "--no-index"
mode.

And that makes me agree that users expect "git diff --no-index" to
behave like other people's diff and that is more important than
behaving like "git diff" for that mode.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-21 19:26         ` Jeff King
@ 2020-09-21 21:51           ` Junio C Hamano
  2020-09-21 22:20             ` Jeff King
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2020-09-21 21:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Thomas Guyot-Sionnest, git, dermoth

Jeff King <peff@peff.net> writes:

> For diffstat, though, it seems like a waste of time; we don't care what
> the object hash is. I.e., if we were to do this:
>
> diff --git a/diff.c b/diff.c
> index 16eeaf6645..1934af29a5 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4564,9 +4564,6 @@ static void run_diffstat(struct diff_filepair *p, struct diff_options *o,
>  	if (o->prefix_length)
>  		strip_prefix(o->prefix_length, &name, &other);
>  
> -	diff_fill_oid_info(p->one, o->repo->index);
> -	diff_fill_oid_info(p->two, o->repo->index);
> -
>  	builtin_diffstat(name, other, p->one, p->two,
>  			 diffstat, o, p);
>  }
>
> then everything seems to work fine _except_ a "git diff --stat
> --no-index", exactly because it hits this "same_contents" check we've
> been discussing. And once that is fixed properly (to handle any case
> where we have no oid, not just when the stdin flag is set), then perhaps
> it is worth doing.

> Or perhaps not. Even if we have to memcmp sometimes in
> builtin_diffstat(), it would be faster than computing the individual
> hashes. But it may not be measurably so, and it would be no difference
> for the common case of filespecs for which we do know the oid for free.
> I also suspect we'd need to be a little smarter about combined formats
> (e.g., "--stat --patch" might as well compute the oid as early as
> possible, since we'll need it eventually for the patch; but we'd hit the
> call in builtin_diffstat() before the one in run_diff()).
>
>> But there are paths for which oid cannot be computed without
>> destroying their contents.  Such paths are marked by the function
>> with null_oid.
>
> I'm not clear how computing the oid destroys the contents. We have them
> in an in-memory buffer at this point, don't we? So we _could_ generate
> an oid even for stdin, like this:

Yes, yes yes.  That is the "best" (which later retracted) approach I
suggested in the same message, but it would end up filling a
real-looking object name for working tree side of diff-files, which
has a far larger consequence we need to think about and consumes
more brain cycles than warranted here, I would think.

>> Summarizing the above, I think the second best fix is this (which
>> means that the posted patch is the third best):
>> 
>> 	/*
>> 	 * diff_fill_oid_info() marked one/two->oid with null_oid
>> 	 * for a path whose oid is not available.  Disable early
>> 	 * return optimization for them.
>> 	 */
>> 	if (oideq(&one->oid, &null_oid) || oideq(&two->oid, &null_oid))
>> 		same_contents = 0; /* could be different */
>> 	else if (oideq(&one->oid, &two->oid))
>> 		same_contents = 1; /* definitely the same */
>> 	else
>> 		same_contents = 0; /* definitely different */
>
> This is the direction I was getting at in my earlier emails, except that
> I imagined that first conditional could be checking:
>
>   if (!one->oid_valid || !two->oid_valid)
>
> but I was surprised to see that diff_fill_oid_info() does not set
> oid_valid. Is that a bug?

I do not think so.  oid_valid refers to the state during the
collection phase (those who called diff_addremove() etc.) and
updating it in diff_fill_oid_info() would lose information.  Maybe
nobody looks at the bit at this late in the processing chain these
days, in which case we can start flipping the bit there, but I
offhand do not know what consequences such a change would trigger.

> I also imagined that we'd have to determine right then whether the
> contents are actually different or not with a memcmp(), to avoid
> emitting a "0 changes" line, but we do handle that case within the
> "!same_contents" conditional. See the comment starting with "Omit
> diffstats..." added recently by 1cf3d5db9b (diff: teach --stat to ignore
> uninteresting modifications, 2020-08-20).

Yes, we are essentially on the same page---same_contents bit is
merely an optimization to decide cheaply when we do not have to do
xdl, but the codepath that does the xdl must be prepared to deal
with the "we thought they are different, but after all they turn out
to be equivalent" case.  Therefore false positive to declare two
different things as same cannot be tolerated, but false negative to
declare two things that are the same as !same_contents is fine.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-21 21:51           ` Junio C Hamano
@ 2020-09-21 22:20             ` Jeff King
  2020-09-21 22:37               ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Jeff King @ 2020-09-21 22:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Thomas Guyot-Sionnest, git, dermoth

On Mon, Sep 21, 2020 at 02:51:21PM -0700, Junio C Hamano wrote:

> > This is the direction I was getting at in my earlier emails, except that
> > I imagined that first conditional could be checking:
> >
> >   if (!one->oid_valid || !two->oid_valid)
> >
> > but I was surprised to see that diff_fill_oid_info() does not set
> > oid_valid. Is that a bug?
> 
> I do not think so.  oid_valid refers to the state during the
> collection phase (those who called diff_addremove() etc.) and
> updating it in diff_fill_oid_info() would lose information.  Maybe
> nobody looks at the bit at this late in the processing chain these
> days, in which case we can start flipping the bit there, but I
> offhand do not know what consequences such a change would trigger.

We use the flag to determine whether we need to compute the oid from
scratch. So I would think the current code causes us to compute the oid
multiple times in many cases. For example, with this patch:

diff --git a/diff.c b/diff.c
index ee8e8189e9..8363abab5b 100644
--- a/diff.c
+++ b/diff.c
@@ -4424,6 +4424,8 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is
 				die_errno("stat '%s'", one->path);
 			if (index_path(istate, &one->oid, one->path, &st, 0))
 				die("cannot hash %s", one->path);
+			warning("computed oid of %s as %s",
+				one->path, oid_to_hex(&one->oid));
 		}
 	}
 	else

I get (because diff.c is dirty in my working tree due to the patch):

  $ ./git diff --stat -p
  warning: computed oid of diff.c as 8363abab5b51479ac8cc9fb1c96b39fb90041f88
   diff.c | 2 ++
   1 file changed, 2 insertions(+)
  
  warning: computed oid of diff.c as 8363abab5b51479ac8cc9fb1c96b39fb90041f88
  diff --git a/diff.c b/diff.c
  index ee8e8189e9..8363abab5b 100644
  --- a/diff.c
  +++ b/diff.c
  @@ -4424,6 +4424,8 @@ static void diff_fill_oid_info(struct diff_filespec *one, struct index_state *is
   				die_errno("stat '%s'", one->path);
   			if (index_path(istate, &one->oid, one->path, &st, 0))
   				die("cannot hash %s", one->path);
  +			warning("computed oid of %s as %s",
  +				one->path, oid_to_hex(&one->oid));
   		}
   	}
   	else

even though we already know the oid in the second call, so it's wasted
work. I agree that other code could be depending on oid_valid in a weird
way, but IMHO that code is probably wrong to do so. But it may not be
worth digging into, if nobody has complained about the waste.

> > I also imagined that we'd have to determine right then whether the
> > contents are actually different or not with a memcmp(), to avoid
> > emitting a "0 changes" line, but we do handle that case within the
> > "!same_contents" conditional. See the comment starting with "Omit
> > diffstats..." added recently by 1cf3d5db9b (diff: teach --stat to ignore
> > uninteresting modifications, 2020-08-20).
> 
> Yes, we are essentially on the same page---same_contents bit is
> merely an optimization to decide cheaply when we do not have to do
> xdl, but the codepath that does the xdl must be prepared to deal
> with the "we thought they are different, but after all they turn out
> to be equivalent" case.  Therefore false positive to declare two
> different things as same cannot be tolerated, but false negative to
> declare two things that are the same as !same_contents is fine.

I thought it may matter on "maint", where we do not have 1cf3d5db9b.
I.e., I expected:

  echo foo >a
  echo foo >b
  git diff --no-index --stat a b

might switch from no output to having a line like:

  a => b | 0

But we don't even get to builtin_diffstat() there. We throw out the pair
in diffcore_skip_stat_unmatch(). Likewise, if you get past that with
something like a mode change:

  chmod +x b
  git diff --no-index --stat a b

then that does generate the "0" stat line. But it does so both before
and after the proposed change. The same thing happens in no-index mode:

  git init
  echo foo >file
  git add .
  git commit -am no-bit
  chmod +x file
  git commit -am exec-bit
  git show --stat

will give you:

   file | 0

I'm not sure if that's the desired behavior or not, but at any rate
fixing this builtin_diffstat() conditional won't change it either way. :)

-Peff

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-21 22:20             ` Jeff King
@ 2020-09-21 22:37               ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2020-09-21 22:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, Thomas Guyot-Sionnest, git, dermoth

Jeff King <peff@peff.net> writes:

> .... I agree that other code could be depending on oid_valid in a weird
> way, but IMHO that code is probably wrong to do so. But it may not be
> worth digging into, if nobody has complained about the waste.

Yup, that was my feeling when I wrote the message you are responding
to.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 17:27   ` Jeff King
  2020-09-18 17:52     ` Thomas Guyot-Sionnest
@ 2020-09-23 15:05     ` Johannes Schindelin
  1 sibling, 0 replies; 50+ messages in thread
From: Johannes Schindelin @ 2020-09-23 15:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Thomas Guyot-Sionnest, git, dermoth

Hi Peff,

On Fri, 18 Sep 2020, Jeff King wrote:

> I also wonder whether range-diff ought to be using that flag
> [`oid_valid`] instead of is_stdin.

From `diffcore.h`:

        unsigned oid_valid : 1;  /* if true, use oid and trust mode;
                                  * if false, use the name and read from
                                  * the filesystem.
                                  */

That description leads me to believe that `oid_valid` cannot be used here:
we do _not_ want to read any data from the file system in `range-diff.c`'s
`get_filespec()` function; Instead, we want to use the data provided via
the function parameter `p`.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-18 18:06       ` Junio C Hamano
@ 2020-09-23 19:16         ` Johannes Schindelin
  2020-09-23 19:23           ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Johannes Schindelin @ 2020-09-23 19:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Thomas Guyot-Sionnest, Jeff King, git, Thomas Guyot-Sionnest

Hi,

On Fri, 18 Sep 2020, Junio C Hamano wrote:

> Thomas Guyot-Sionnest <tguyot@gmail.com> writes:
>
> >> > -     same_contents = oideq(&one->oid, &two->oid);
> >> > +     if (one->is_stdin && two->is_stdin)
> >> > +             same_contents = !strcmp(one->data, two->data);
> >> > +     else
> >> > +             same_contents = oideq(&one->oid, &two->oid);
> >>
> >> ...should this actually be checking the oid_valid flag in each filespec?
> >> That would presumably cover the is_stdin case, too. I also wonder
> >> whether range-diff ought to be using that flag instead of is_stdin.
> >
> > I considered that, but IIRC when run under a debugger oid_valid was
> > set to 0 - it seemed to be used for something different that i'm not
> > familiar with, maybe it's an indication the object is in git datastore
> > (whereas with --no-index outside files will only be hashed for
> > comparison).
>
> If it says !oid_valid, I think you are getting what you do want.

I suspect the same.

> The contents from the outside world, be it what was read from the
> standard input or a pipe, a regular file that is not up-to-date with
> the index, may not have a usable oid computed for it, and oid_valid
> being false signals you that you need byte-for-byte comparison.  As
> suggested by Peff in another message, you can take that signal and
> compare the size and then the contents with memcmp() to see if they
> are the same.

To complete the information: `struct diff_filespec`'s first attribute is
`oid`, the object ID of the data. If it is left uninitialized (as is the
case in `range-diff`'s case), `oid_valid` has to be 0 to prevent it from
being used.

I believe that that is exactly the reason why we want this:

-	same_contents = oideq(&one->oid, &two->oid);
+	same_contents = one->oid_valid && two->oid_valid ?
		oideq(&one->oid, &two->oid) : !strcmp(one->data, two->data);

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-23 19:16         ` Johannes Schindelin
@ 2020-09-23 19:23           ` Junio C Hamano
  2020-09-23 20:44             ` Johannes Schindelin
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2020-09-23 19:23 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Thomas Guyot-Sionnest, Jeff King, git, Thomas Guyot-Sionnest

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> I believe that that is exactly the reason why we want this:
>
> -	same_contents = oideq(&one->oid, &two->oid);
> +	same_contents = one->oid_valid && two->oid_valid ?
> 		oideq(&one->oid, &two->oid) : !strcmp(one->data, two->data);

Not quite.  The other side should either be

	one->size == two->size && !memcmp(...)

or just left to false, as the downstream code must be prepared for
same_contents being false even when one and two turns out to be
not-byte-for-byte-same but equivalent anyway.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-23 19:23           ` Junio C Hamano
@ 2020-09-23 20:44             ` Johannes Schindelin
  2020-09-24  4:49               ` Thomas Guyot
  0 siblings, 1 reply; 50+ messages in thread
From: Johannes Schindelin @ 2020-09-23 20:44 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Thomas Guyot-Sionnest, Jeff King, git, Thomas Guyot-Sionnest

Hi Junio,

On Wed, 23 Sep 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > I believe that that is exactly the reason why we want this:
> >
> > -	same_contents = oideq(&one->oid, &two->oid);
> > +	same_contents = one->oid_valid && two->oid_valid ?
> > 		oideq(&one->oid, &two->oid) : !strcmp(one->data, two->data);
>
> Not quite.  The other side should either be
>
> 	one->size == two->size && !memcmp(...)

Right!

Thank you for correcting my mistake,
Dscho

>
> or just left to false, as the downstream code must be prepared for
> same_contents being false even when one and two turns out to be
> not-byte-for-byte-same but equivalent anyway.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-23 20:44             ` Johannes Schindelin
@ 2020-09-24  4:49               ` Thomas Guyot
  2020-09-24  5:24                 ` [PATCH v3] " Thomas Guyot-Sionnest
  2020-09-24  6:40                 ` [PATCH 1/2] " Junio C Hamano
  0 siblings, 2 replies; 50+ messages in thread
From: Thomas Guyot @ 2020-09-24  4:49 UTC (permalink / raw)
  To: Johannes Schindelin, Junio C Hamano; +Cc: Jeff King, git, Thomas Guyot-Sionnest

Hi,

On 2020-09-23 16:44, Johannes Schindelin wrote:
> Hi Junio,
> 
> On Wed, 23 Sep 2020, Junio C Hamano wrote:
> 
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>>> I believe that that is exactly the reason why we want this:
>>>
>>> -	same_contents = oideq(&one->oid, &two->oid);
>>> +	same_contents = one->oid_valid && two->oid_valid ?
>>> 		oideq(&one->oid, &two->oid) : !strcmp(one->data, two->data);
>>
>> Not quite.  The other side should either be
>>
>> 	one->size == two->size && !memcmp(...)

Thanks for all the feedback, that was enlightening. Although I have been
silent the past few days I watched this thread with interest.

So as Junio pointed out this is merely an optimization - the range-diff
test that I corrected also showed two 0-line diffs and I realized
there's a block further down that should explicitly removes them, under

    else if (!same_contents) {

We can even remove same_contents entirely and everything work just fine
after adjusting the range-diff test - the logic is correct and
underlying functions already DTRT.

My next patch simplifies the test down to:

    same_contents = one->oid_valid && two->oid_valid &&
        oideq(&one->oid, &two->oid);

My understanding is that oid_valid will never be true for a modified
(even mode change) or out of tree file so it's a valid assumption.

I'll also rename that variable to "same_oid" - the current name is
misleading both ways (true doesn't means there will be diffs, false
doesn't mean contents differs).

Regards,

Thomas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v3] diff: Fix modified lines stats with --stat and --numstat
  2020-09-24  4:49               ` Thomas Guyot
@ 2020-09-24  5:24                 ` Thomas Guyot-Sionnest
  2020-09-24  7:41                   ` [PATCH v4] " Thomas Guyot-Sionnest
  2020-09-24  6:40                 ` [PATCH 1/2] " Junio C Hamano
  1 sibling, 1 reply; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-24  5:24 UTC (permalink / raw)
  To: git; +Cc: dermoth, me, gitster, Johannes.Schindelin, peff,
	Thomas Guyot-Sionnest

Only skip diffstats when both oids are valid and identical. This check
was causing both false-positives (files included in diffstats with no
actual changes (0 lines modified) and false-negatives (showing 0 lines
modified in stats when files had actually changed).

Also renamed same_contents to same_file to avoid confusion.

Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
---
Interdiff:
  diff --git a/diff.c b/diff.c
  index 2e47bf824e..77e0bd772e 100644
  --- a/diff.c
  +++ b/diff.c
  @@ -3663,7 +3663,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
   {
   	mmfile_t mf1, mf2;
   	struct diffstat_file *data;
  -	int same_contents;
  +	int same_file;
   	int complete_rewrite = 0;
   
   	if (!DIFF_PAIR_UNMERGED(p)) {
  @@ -3681,19 +3681,14 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
   		return;
   	}
   
  -	/* What is_stdin really means is that the file's content is only
  -	 * in the filespec's buffer and its oid is zero. We can't compare
  -	 * oid's if both are null and we can just diff the buffers */
  -	if (one->is_stdin && two->is_stdin)
  -		same_contents = (one->size == two->size ?
  -			!memcmp(one->data, two->data, one->size) : 0);
  -	else
  -		same_contents = oideq(&one->oid, &two->oid);
  +	/* saves some reads if true, not a guarantee of diff outcome */
  +	same_file = one->oid_valid && two->oid_valid &&
  +		oideq(&one->oid, &two->oid);
   
   	if (diff_filespec_is_binary(o->repo, one) ||
   	    diff_filespec_is_binary(o->repo, two)) {
   		data->is_binary = 1;
  -		if (same_contents) {
  +		if (same_file) {
   			data->added = 0;
   			data->deleted = 0;
   		} else {
  @@ -3709,7 +3704,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
   		data->added = count_lines(two->data, two->size);
   	}
   
  -	else if (!same_contents) {
  +	else if (!same_file) {
   		/* Crazy xdl interfaces.. */
   		xpparam_t xpp;
   		xdemitconf_t xecfg;
  @@ -3734,7 +3729,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
   				diffstat->files[diffstat->nr - 1];
   			/*
   			 * Omit diffstats of modified files where nothing changed.
  -			 * Even if !same_contents, this might be the case due to
  +			 * Even if !same_file, this might be the case due to
   			 * ignoring whitespace changes, etc.
   			 *
   			 * But note that we special-case additions, deletions,
  diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
  index 4715e75b68..6eb344be03 100755
  --- a/t/t3206-range-diff.sh
  +++ b/t/t3206-range-diff.sh
  @@ -252,11 +252,7 @@ test_expect_success 'changed commit with --stat diff option' '
   	git range-diff --no-color --stat topic...changed >actual &&
   	cat >expect <<-EOF &&
   	1:  $(test_oid t1) = 1:  $(test_oid c1) s/5/A/
  -	     a => b | 0
  -	     1 file changed, 0 insertions(+), 0 deletions(-)
   	2:  $(test_oid t2) = 2:  $(test_oid c2) s/4/A/
  -	     a => b | 0
  -	     1 file changed, 0 insertions(+), 0 deletions(-)
   	3:  $(test_oid t3) ! 3:  $(test_oid c3) s/11/B/
   	     a => b | 2 +-
   	     1 file changed, 1 insertion(+), 1 deletion(-)

 diff.c                | 12 +++++++-----
 t/t3206-range-diff.sh | 12 ++++--------
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index ee8e8189e9..77e0bd772e 100644
--- a/diff.c
+++ b/diff.c
@@ -3663,7 +3663,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 {
 	mmfile_t mf1, mf2;
 	struct diffstat_file *data;
-	int same_contents;
+	int same_file;
 	int complete_rewrite = 0;
 
 	if (!DIFF_PAIR_UNMERGED(p)) {
@@ -3681,12 +3681,14 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		return;
 	}
 
-	same_contents = oideq(&one->oid, &two->oid);
+	/* saves some reads if true, not a guarantee of diff outcome */
+	same_file = one->oid_valid && two->oid_valid &&
+		oideq(&one->oid, &two->oid);
 
 	if (diff_filespec_is_binary(o->repo, one) ||
 	    diff_filespec_is_binary(o->repo, two)) {
 		data->is_binary = 1;
-		if (same_contents) {
+		if (same_file) {
 			data->added = 0;
 			data->deleted = 0;
 		} else {
@@ -3702,7 +3704,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		data->added = count_lines(two->data, two->size);
 	}
 
-	else if (!same_contents) {
+	else if (!same_file) {
 		/* Crazy xdl interfaces.. */
 		xpparam_t xpp;
 		xdemitconf_t xecfg;
@@ -3727,7 +3729,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 				diffstat->files[diffstat->nr - 1];
 			/*
 			 * Omit diffstats of modified files where nothing changed.
-			 * Even if !same_contents, this might be the case due to
+			 * Even if !same_file, this might be the case due to
 			 * ignoring whitespace changes, etc.
 			 *
 			 * But note that we special-case additions, deletions,
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index e024cff65c..6eb344be03 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -252,17 +252,13 @@ test_expect_success 'changed commit with --stat diff option' '
 	git range-diff --no-color --stat topic...changed >actual &&
 	cat >expect <<-EOF &&
 	1:  $(test_oid t1) = 1:  $(test_oid c1) s/5/A/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
 	2:  $(test_oid t2) = 2:  $(test_oid c2) s/4/A/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
 	3:  $(test_oid t3) ! 3:  $(test_oid c3) s/11/B/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
+	     a => b | 2 +-
+	     1 file changed, 1 insertion(+), 1 deletion(-)
 	4:  $(test_oid t4) ! 4:  $(test_oid c4) s/12/B/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
+	     a => b | 2 +-
+	     1 file changed, 1 insertion(+), 1 deletion(-)
 	EOF
 	test_cmp expect actual
 '
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-24  4:49               ` Thomas Guyot
  2020-09-24  5:24                 ` [PATCH v3] " Thomas Guyot-Sionnest
@ 2020-09-24  6:40                 ` Junio C Hamano
  2020-09-24  7:13                   ` Thomas Guyot
  1 sibling, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2020-09-24  6:40 UTC (permalink / raw)
  To: Thomas Guyot; +Cc: Johannes Schindelin, Jeff King, git, Thomas Guyot-Sionnest

Thomas Guyot <tguyot@gmail.com> writes:

> My next patch simplifies the test down to:
>
>     same_contents = one->oid_valid && two->oid_valid &&
>         oideq(&one->oid, &two->oid);
>
> My understanding is that oid_valid will never be true for a modified
> (even mode change) or out of tree file so it's a valid assumption.
>
> I'll also rename that variable to "same_oid" - the current name is
> misleading both ways (true doesn't means there will be diffs, false
> doesn't mean contents differs).

It is not "both ways", I think.  The idea is that when this variable
is true, we know with certainty that these two are the same, but
even when the variable is false, they still can be the same.  So
true does mean there will not be diff.  False indeed is fuzzy.

And as long as one side gives a 100% correct answer cheaply, we can
use it as an optimization (and 'true' being that side in this case).

I have a mild suspicion that the name same_anything conveys a wrong
impression, no matter what word you use for <anything>.  It does not
capture that we are saying the "true" side has no false positive.

And that is why I alluded to "may_differ" earlier (with opposite
polarity).  The flow would become:

    may_differ = !one->oid_valid || !two->oid_valid || !oideq(...);

    if (binary) {
        if (!may_differ) {
            added = deleted = 0;
            ...
        } else {
            ... count added and deleted ...
        }
    } else if (rewrite) {
	...
    } else if (may_differ) {
	... use xdl ...
    }

and it would become quite straight-forward to follow.  When there is
no chance that they may be different, we short-cut and otherwise we
compute without cheating.  Only when they can be different, we do
the expensive xdl thing.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-24  6:40                 ` [PATCH 1/2] " Junio C Hamano
@ 2020-09-24  7:13                   ` Thomas Guyot
  2020-09-24 17:19                     ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Thomas Guyot @ 2020-09-24  7:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Jeff King, git, Thomas Guyot-Sionnest

Hi Junio,

On 2020-09-24 02:40, Junio C Hamano wrote:
> Thomas Guyot <tguyot@gmail.com> writes:
> 
> It is not "both ways", I think.  The idea is that when this variable
> is true, we know with certainty that these two are the same, but
> even when the variable is false, they still can be the same.  So
> true does mean there will not be diff.  False indeed is fuzzy.

I meant to say the old behavior "lied" in both directions.

> And as long as one side gives a 100% correct answer cheaply, we can
> use it as an optimization (and 'true' being that side in this case).
> 
> I have a mild suspicion that the name same_anything conveys a wrong
> impression, no matter what word you use for <anything>.  It does not
> capture that we are saying the "true" side has no false positive.
> 
> And that is why I alluded to "may_differ" earlier (with opposite
> polarity).  The flow would become:
> 
>     may_differ = !one->oid_valid || !two->oid_valid || !oideq(...);
> 
>     if (binary) {
>         if (!may_differ) {
>             added = deleted = 0;
>             ...
>         } else {
>             ... count added and deleted ...
>         }
>     } else if (rewrite) {
> 	...
>     } else if (may_differ) {
> 	... use xdl ...
>     }
> 
> and it would become quite straight-forward to follow.  When there is
> no chance that they may be different, we short-cut and otherwise we
> compute without cheating.  Only when they can be different, we do
> the expensive xdl thing.

I toyed a bit on the binary side... I never sent my 2nd reply as I still
needed to dig up; testing with diff_filespec_is_binary() { return 1; } I
would get (as expected) the same false-positive modified binary files I
used to get in range-diff test.

What I didn't get is applying the same logic
(free_diffstat_file(file);diffstat->nr--;) didn't have any effect. I'll
have to find what differs here to make binary files how up regardless.

Regards,

Thomas

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v4] diff: Fix modified lines stats with --stat and --numstat
  2020-09-24  5:24                 ` [PATCH v3] " Thomas Guyot-Sionnest
@ 2020-09-24  7:41                   ` Thomas Guyot-Sionnest
  0 siblings, 0 replies; 50+ messages in thread
From: Thomas Guyot-Sionnest @ 2020-09-24  7:41 UTC (permalink / raw)
  To: git; +Cc: dermoth, me, gitster, Johannes.Schindelin, peff,
	Thomas Guyot-Sionnest

Only skip diffstats when both oids are valid and identical. This check
was causing both false-positives (files included in diffstats with no
actual changes (0 lines modified) and false-negatives (showing 0 lines
modified in stats when files had actually changed).

Also replaced same_contents with may_differ to avoid confusion.

Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
---
Interdiff:
  diff --git a/diff.c b/diff.c
  index 77e0bd772e..2bb2f8f57e 100644
  --- a/diff.c
  +++ b/diff.c
  @@ -3663,7 +3663,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
   {
   	mmfile_t mf1, mf2;
   	struct diffstat_file *data;
  -	int same_file;
  +	int may_differ;
   	int complete_rewrite = 0;
   
   	if (!DIFF_PAIR_UNMERGED(p)) {
  @@ -3682,13 +3682,13 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
   	}
   
   	/* saves some reads if true, not a guarantee of diff outcome */
  -	same_file = one->oid_valid && two->oid_valid &&
  -		oideq(&one->oid, &two->oid);
  +	may_differ = !(one->oid_valid && two->oid_valid &&
  +			oideq(&one->oid, &two->oid));
   
   	if (diff_filespec_is_binary(o->repo, one) ||
   	    diff_filespec_is_binary(o->repo, two)) {
   		data->is_binary = 1;
  -		if (same_file) {
  +		if (!may_differ) {
   			data->added = 0;
   			data->deleted = 0;
   		} else {
  @@ -3704,7 +3704,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
   		data->added = count_lines(two->data, two->size);
   	}
   
  -	else if (!same_file) {
  +	else if (may_differ) {
   		/* Crazy xdl interfaces.. */
   		xpparam_t xpp;
   		xdemitconf_t xecfg;
  @@ -3729,7 +3729,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
   				diffstat->files[diffstat->nr - 1];
   			/*
   			 * Omit diffstats of modified files where nothing changed.
  -			 * Even if !same_file, this might be the case due to
  +			 * Even if may_differ, this might be the case due to
   			 * ignoring whitespace changes, etc.
   			 *
   			 * But note that we special-case additions, deletions,

 diff.c                | 12 +++++++-----
 t/t3206-range-diff.sh | 12 ++++--------
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/diff.c b/diff.c
index ee8e8189e9..2bb2f8f57e 100644
--- a/diff.c
+++ b/diff.c
@@ -3663,7 +3663,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 {
 	mmfile_t mf1, mf2;
 	struct diffstat_file *data;
-	int same_contents;
+	int may_differ;
 	int complete_rewrite = 0;
 
 	if (!DIFF_PAIR_UNMERGED(p)) {
@@ -3681,12 +3681,14 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		return;
 	}
 
-	same_contents = oideq(&one->oid, &two->oid);
+	/* saves some reads if true, not a guarantee of diff outcome */
+	may_differ = !(one->oid_valid && two->oid_valid &&
+			oideq(&one->oid, &two->oid));
 
 	if (diff_filespec_is_binary(o->repo, one) ||
 	    diff_filespec_is_binary(o->repo, two)) {
 		data->is_binary = 1;
-		if (same_contents) {
+		if (!may_differ) {
 			data->added = 0;
 			data->deleted = 0;
 		} else {
@@ -3702,7 +3704,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		data->added = count_lines(two->data, two->size);
 	}
 
-	else if (!same_contents) {
+	else if (may_differ) {
 		/* Crazy xdl interfaces.. */
 		xpparam_t xpp;
 		xdemitconf_t xecfg;
@@ -3727,7 +3729,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 				diffstat->files[diffstat->nr - 1];
 			/*
 			 * Omit diffstats of modified files where nothing changed.
-			 * Even if !same_contents, this might be the case due to
+			 * Even if may_differ, this might be the case due to
 			 * ignoring whitespace changes, etc.
 			 *
 			 * But note that we special-case additions, deletions,
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index e024cff65c..6eb344be03 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -252,17 +252,13 @@ test_expect_success 'changed commit with --stat diff option' '
 	git range-diff --no-color --stat topic...changed >actual &&
 	cat >expect <<-EOF &&
 	1:  $(test_oid t1) = 1:  $(test_oid c1) s/5/A/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
 	2:  $(test_oid t2) = 2:  $(test_oid c2) s/4/A/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
 	3:  $(test_oid t3) ! 3:  $(test_oid c3) s/11/B/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
+	     a => b | 2 +-
+	     1 file changed, 1 insertion(+), 1 deletion(-)
 	4:  $(test_oid t4) ! 4:  $(test_oid c4) s/12/B/
-	     a => b | 0
-	     1 file changed, 0 insertions(+), 0 deletions(-)
+	     a => b | 2 +-
+	     1 file changed, 1 insertion(+), 1 deletion(-)
 	EOF
 	test_cmp expect actual
 '
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-24  7:13                   ` Thomas Guyot
@ 2020-09-24 17:19                     ` Junio C Hamano
  2020-09-24 17:38                       ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2020-09-24 17:19 UTC (permalink / raw)
  To: Thomas Guyot; +Cc: Johannes Schindelin, Jeff King, git, Thomas Guyot-Sionnest

Thomas Guyot <tguyot@gmail.com> writes:

> Hi Junio,
>
> On 2020-09-24 02:40, Junio C Hamano wrote:
>> Thomas Guyot <tguyot@gmail.com> writes:
>> 
>> It is not "both ways", I think.  The idea is that when this variable
>> is true, we know with certainty that these two are the same, but
>> even when the variable is false, they still can be the same.  So
>> true does mean there will not be diff.  False indeed is fuzzy.
>
> I meant to say the old behavior "lied" in both directions.

It depends on the perspective ;-)

The old one didn't expect/realize fill_oid_info() can leave the oid
field to "unknown".  It was OK because is_stdin happens to be the
only such case [*1*] and we never saw both one->oid and two->oid
being the null_oid at the same time, so it wasn't an issue that
their validity weren't checked there.  As long as one side was
valid, when the comparison said they were equal, they indeed were
equal.  So in that sense, the true side did not lie.

If you add new case where the oid of both sides can legitimately be
null_oid, that will of course break the code.  I think that is the
reason why we are having this discussion to prepare for such a
future (that happens in 2/2???).

[Footnote]

*1* The missing side of addition and deletion will also get null_oid,
but we don't compare that with stdin in such a case.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat
  2020-09-24 17:19                     ` Junio C Hamano
@ 2020-09-24 17:38                       ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2020-09-24 17:38 UTC (permalink / raw)
  To: Thomas Guyot; +Cc: Johannes Schindelin, Jeff King, git, Thomas Guyot-Sionnest

Junio C Hamano <gitster@pobox.com> writes:

> If you add new case where the oid of both sides can legitimately be
> null_oid, that will of course break the code.

Ahh, I forgot that we already had such an iffy caller that broke the
code.  I should have re-read the test part of the patch.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2020-09-24 17:49 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-18 11:32 Allow passing pipes to diff --no-index + bugfix Thomas Guyot-Sionnest
2020-09-18 11:32 ` [PATCH 1/2] diff: Fix modified lines stats with --stat and --numstat Thomas Guyot-Sionnest
2020-09-18 14:46   ` Taylor Blau
2020-09-18 15:10     ` Thomas Guyot-Sionnest
2020-09-18 17:37       ` Jeff King
2020-09-18 18:00         ` Thomas Guyot-Sionnest
2020-09-20  4:53       ` Thomas Guyot
2020-09-18 17:27   ` Jeff King
2020-09-18 17:52     ` Thomas Guyot-Sionnest
2020-09-18 18:06       ` Junio C Hamano
2020-09-23 19:16         ` Johannes Schindelin
2020-09-23 19:23           ` Junio C Hamano
2020-09-23 20:44             ` Johannes Schindelin
2020-09-24  4:49               ` Thomas Guyot
2020-09-24  5:24                 ` [PATCH v3] " Thomas Guyot-Sionnest
2020-09-24  7:41                   ` [PATCH v4] " Thomas Guyot-Sionnest
2020-09-24  6:40                 ` [PATCH 1/2] " Junio C Hamano
2020-09-24  7:13                   ` Thomas Guyot
2020-09-24 17:19                     ` Junio C Hamano
2020-09-24 17:38                       ` Junio C Hamano
2020-09-23 15:05     ` Johannes Schindelin
2020-09-20 13:09   ` [PATCH v2] " Thomas Guyot-Sionnest
2020-09-20 15:39     ` Taylor Blau
2020-09-20 16:38       ` Thomas Guyot
2020-09-20 19:11       ` Junio C Hamano
2020-09-20 20:08         ` Junio C Hamano
2020-09-20 20:36         ` Junio C Hamano
2020-09-20 22:15           ` Junio C Hamano
2020-09-21 19:26         ` Jeff King
2020-09-21 21:51           ` Junio C Hamano
2020-09-21 22:20             ` Jeff King
2020-09-21 22:37               ` Junio C Hamano
2020-09-18 11:32 ` [PATCH 2/2] Allow passing pipes for input pipes to diff --no-index Thomas Guyot-Sionnest
2020-09-18 14:36   ` Taylor Blau
2020-09-18 16:34     ` Thomas Guyot-Sionnest
2020-09-18 17:19       ` Jeff King
2020-09-18 17:21         ` Jeff King
2020-09-18 17:39         ` Thomas Guyot-Sionnest
2020-09-18 17:48         ` Junio C Hamano
2020-09-18 18:02           ` Jeff King
2020-09-20 12:54             ` Thomas Guyot
2020-09-21 19:31               ` Jeff King
2020-09-21 20:14                 ` Junio C Hamano
2020-09-18 17:58       ` Taylor Blau
2020-09-18 18:05         ` Jeff King
2020-09-18 17:20     ` Jeff King
2020-09-18 18:00       ` Taylor Blau
2020-09-18 21:56   ` brian m. carlson
2020-09-18 17:51 ` Allow passing pipes to diff --no-index + bugfix Junio C Hamano
2020-09-18 18:24   ` Thomas Guyot-Sionnest

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).