git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/6] some diff --cc --stat fixes
@ 2019-01-24 12:26 Jeff King
  2019-01-24 12:27 ` [PATCH 1/6] t4006: resurrect commented-out tests Jeff King
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Jeff King @ 2019-01-24 12:26 UTC (permalink / raw)
  To: git; +Cc: David Turner, Stefan Beller

This started as a follow-up to my "something like this" bug-fix from
last July:

 https://public-inbox.org/git/20180710203438.GB6886@sigill.intra.peff.net/

(better late than never). But in the course of that, I discovered
another really weird and interesting bug with --color-moved. This fixes
both, along with some related cases.

-Peff

  [1/6]: t4006: resurrect commented-out tests
  [2/6]: diff: clear emitted_symbols flag after use
  [3/6]: combine-diff: factor out stat-format mask
  [4/6]: combine-diff: treat --shortstat like --stat
  [5/6]: combine-diff: treat --summary like --stat
  [6/6]: combine-diff: treat --dirstat like --stat

 combine-diff.c                                | 17 ++--
 diff.c                                        |  4 +-
 t/t4006-diff-mode.sh                          | 55 +++++++------
 t/t4013-diff-various.sh                       |  9 +++
 .../diff.diff-tree_--cc_--shortstat_master    |  4 +
 t/t4013/diff.diff-tree_--cc_--summary_REVERSE |  6 ++
 .../diff.diff_--dirstat_--cc_master~1_master  |  3 +
 t/t4066-diff-emit-delay.sh                    | 79 +++++++++++++++++++
 8 files changed, 147 insertions(+), 30 deletions(-)
 create mode 100644 t/t4013/diff.diff-tree_--cc_--shortstat_master
 create mode 100644 t/t4013/diff.diff-tree_--cc_--summary_REVERSE
 create mode 100644 t/t4013/diff.diff_--dirstat_--cc_master~1_master
 create mode 100755 t/t4066-diff-emit-delay.sh

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] t4006: resurrect commented-out tests
  2019-01-24 12:26 [PATCH 0/6] some diff --cc --stat fixes Jeff King
@ 2019-01-24 12:27 ` Jeff King
  2019-01-24 18:18   ` Stefan Beller
  2019-01-24 12:32 ` [PATCH 2/6] diff: clear emitted_symbols flag after use Jeff King
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2019-01-24 12:27 UTC (permalink / raw)
  To: git; +Cc: David Turner, Stefan Beller

This set of tests was added by 4434e6ba6c (tests: check --[short]stat
output after chmod, 2012-05-01), and is primarily about the handling of
binary versus text files.

Later, 74faaa16f0 (Fix "git diff --stat" for interesting - but empty -
file changes, 2012-10-17) changed the stat output so that the empty text
file is mentioned rather than omitted. That commit just comments out
these tests. There's no discussion in the commit message, but the
original email[1] says:

  NOTE! This does break two of our tests, so we clearly did this on
  purpose, or at least tested for it. I just uncommented the subtests
  that this makes irrelevant, and changed the output of another one.

I don't think they're irrelevant, though. We should be testing this
"mode change only" case and making sure that it has the post-74faaa16f0
behavior. So this commit brings back those tests, with the current
expected output.

[1] https://public-inbox.org/git/CA+55aFz88GPJcfMSqiyY+u0Cdm48bEyrsTGxHVJbGsYsDg=Q5w@mail.gmail.com/

Signed-off-by: Jeff King <peff@peff.net>
---
Not actually relevant to the rest of the series; I just found this while
looking for existing --shortstat tests.

 t/t4006-diff-mode.sh | 55 ++++++++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/t/t4006-diff-mode.sh b/t/t4006-diff-mode.sh
index a8e01eccd1..03489aff14 100755
--- a/t/t4006-diff-mode.sh
+++ b/t/t4006-diff-mode.sh
@@ -32,28 +32,37 @@ test_expect_success 'prepare binary file' '
 	git commit -m binbin
 '
 
-# test_expect_success '--stat output after text chmod' '
-# 	test_chmod -x rezrov &&
-# 	echo " 0 files changed" >expect &&
-# 	git diff HEAD --stat >actual &&
-#	test_i18ncmp expect actual
-# '
-#
-# test_expect_success '--shortstat output after text chmod' '
-# 	git diff HEAD --shortstat >actual &&
-# 	test_i18ncmp expect actual
-# '
-#
-# test_expect_success '--stat output after binary chmod' '
-# 	test_chmod +x binbin &&
-# 	echo " 0 files changed" >expect &&
-# 	git diff HEAD --stat >actual &&
-# 	test_i18ncmp expect actual
-# '
-#
-# test_expect_success '--shortstat output after binary chmod' '
-# 	git diff HEAD --shortstat >actual &&
-# 	test_i18ncmp expect actual
-# '
+test_expect_success '--stat output after text chmod' '
+	test_chmod -x rezrov &&
+	cat >expect <<-\EOF &&
+	 rezrov | 0
+	 1 file changed, 0 insertions(+), 0 deletions(-)
+	EOF
+	git diff HEAD --stat >actual &&
+	test_i18ncmp expect actual
+'
+
+test_expect_success '--shortstat output after text chmod' '
+	tail -n 1 <expect >expect.short &&
+	git diff HEAD --shortstat >actual &&
+	test_i18ncmp expect.short actual
+'
+
+test_expect_success '--stat output after binary chmod' '
+	test_chmod +x binbin &&
+	cat >expect <<-EOF &&
+	 binbin | Bin
+	 rezrov |   0
+	 2 files changed, 0 insertions(+), 0 deletions(-)
+	EOF
+	git diff HEAD --stat >actual &&
+	test_i18ncmp expect actual
+'
+
+test_expect_success '--shortstat output after binary chmod' '
+	tail -n 1 <expect >expect.short &&
+	git diff HEAD --shortstat >actual &&
+	test_i18ncmp expect.short actual
+'
 
 test_done
-- 
2.20.1.842.g8986705066


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] diff: clear emitted_symbols flag after use
  2019-01-24 12:26 [PATCH 0/6] some diff --cc --stat fixes Jeff King
  2019-01-24 12:27 ` [PATCH 1/6] t4006: resurrect commented-out tests Jeff King
@ 2019-01-24 12:32 ` Jeff King
  2019-01-24 18:55   ` Stefan Beller
  2019-01-24 20:18   ` Junio C Hamano
  2019-01-24 12:33 ` [PATCH 3/6] combine-diff: factor out stat-format mask Jeff King
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2019-01-24 12:32 UTC (permalink / raw)
  To: git; +Cc: David Turner, Stefan Beller

There's an odd bug when "log --color-moved" is used with the combination
of "--cc --stat -p": the stat for merge commits is erroneously shown
with the diff of the _next_ commit.

The included test demonstrates the issue. Our history looks something
like this:

  A-B-M--D
   \ /
    C

When we run "git log --cc --stat -p --color-moved" starting at D, we get
this sequence of events:

  1. The diff for D is using -p, so diff_flush() calls into
     diff_flush_patch_all_file_pairs(). There we see that o->color_moved
     is in effect, so we point o->emitted_symbols to a static local
     struct, causing diff_flush_patch() to queue the symbols instead of
     actually writing them out.

     We then do our move detection, emit the symbols, and clear the
     struct. But we leave o->emitted_symbols pointing to our struct.

  2. Next we compute the diff for M. This is a merge, so we use the
     combined diff code. In find_paths_generic(), we compute the
     pairwise diff between each commit and its parent. Normally this is
     done with DIFF_FORMAT_NO_OUTPUT, since we're just looking for
     intersecting paths. But since "--stat --cc" shows the first-parent
     stat, and since we're computing that diff anyway, we enable
     DIFF_FORMAT_DIFFSTAT for the first parent. This outputs the stat
     information immediately, saving us from running a separate
     first-parent diff later.

     But where does that output go? Normally it goes directly to stdout,
     but because o->emitted_symbols is set, we queue it. As a result, we
     don't actually print the diffstat for the merge commit (yet), which
     is wrong.

  3. Next we compute the diff for C. We're actually showing a patch
     again, so we end up in diff_flush_patch_all_file_pairs(), but this
     time we have the queued stat from step 2 waiting in our struct.

     We add new elements to it for C's diff, and then flush the whole
     thing. And we see the diffstat from M as part of C's diff, which is
     wrong.

So triggering the bug really does require the combination of all of
those options.

To fix it, we can simply restore o->emitted_symbols to NULL after
flushing it, so that it does not affect anything outside of
diff_flush_patch_all_file_pairs(). This intuitively makes sense, since
nobody outside of that function is going to bother flushing it, so we
would not want them to write to it either.

In fact, we could take this a step further and turn the local "esm"
struct into a non-static variable that goes away after the function
ends. However, since it contains a dynamically sized array, we benefit
from amortizing the cost of allocations over many calls. So we'll leave
it as static to retain that benefit.

But let's push the zero-ing of esm.nr into the conditional for "if
(o->emitted_symbols)" to make it clear that we do not expect esm to hold
any values if we did not just try to use it. With the code as it is
written now, if we did encounter such a case (which I think would be a
bug), we'd silently leak those values without even bothering to display
them. With this change, we'd at least eventually show them, and somebody
would notice.

Signed-off-by: Jeff King <peff@peff.net>
---
 diff.c                     |  4 +-
 t/t4066-diff-emit-delay.sh | 79 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+), 1 deletion(-)
 create mode 100755 t/t4066-diff-emit-delay.sh

diff --git a/diff.c b/diff.c
index 1b5f276360..7b97739799 100644
--- a/diff.c
+++ b/diff.c
@@ -5894,8 +5894,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 		for (i = 0; i < esm.nr; i++)
 			free((void *)esm.buf[i].line);
+		esm.nr = 0;
+
+		o->emitted_symbols = NULL;
 	}
-	esm.nr = 0;
 }
 
 void diff_flush(struct diff_options *options)
diff --git a/t/t4066-diff-emit-delay.sh b/t/t4066-diff-emit-delay.sh
new file mode 100755
index 0000000000..5df6b5e64e
--- /dev/null
+++ b/t/t4066-diff-emit-delay.sh
@@ -0,0 +1,79 @@
+#!/bin/sh
+
+test_description='test combined/stat/moved interaction'
+. ./test-lib.sh
+
+# This test covers a weird 3-way interaction between "--cc -p", which will run
+# the combined diff code, along with "--stat", which will be computed as a
+# first-parent stat during the combined diff, and "--color-moved", which
+# enables the emitted_symbols list to store the diff in memory.
+
+test_expect_success 'set up history with a merge' '
+	test_commit A &&
+	test_commit B &&
+	git checkout -b side HEAD^ &&
+	test_commit C &&
+	git merge -m M master &&
+	test_commit D
+'
+
+test_expect_success 'log --cc -p --stat --color-moved' '
+	cat >expect <<-\EOF &&
+	commit D
+	---
+	 D.t | 1 +
+	 1 file changed, 1 insertion(+)
+
+	diff --git a/D.t b/D.t
+	new file mode 100644
+	index 0000000..1784810
+	--- /dev/null
+	+++ b/D.t
+	@@ -0,0 +1 @@
+	+D
+	commit M
+
+	 B.t | 1 +
+	 1 file changed, 1 insertion(+)
+	commit C
+	---
+	 C.t | 1 +
+	 1 file changed, 1 insertion(+)
+
+	diff --git a/C.t b/C.t
+	new file mode 100644
+	index 0000000..3cc58df
+	--- /dev/null
+	+++ b/C.t
+	@@ -0,0 +1 @@
+	+C
+	commit B
+	---
+	 B.t | 1 +
+	 1 file changed, 1 insertion(+)
+
+	diff --git a/B.t b/B.t
+	new file mode 100644
+	index 0000000..223b783
+	--- /dev/null
+	+++ b/B.t
+	@@ -0,0 +1 @@
+	+B
+	commit A
+	---
+	 A.t | 1 +
+	 1 file changed, 1 insertion(+)
+
+	diff --git a/A.t b/A.t
+	new file mode 100644
+	index 0000000..f70f10e
+	--- /dev/null
+	+++ b/A.t
+	@@ -0,0 +1 @@
+	+A
+	EOF
+	git log --format="commit %s" --cc -p --stat --color-moved >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.20.1.842.g8986705066


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] combine-diff: factor out stat-format mask
  2019-01-24 12:26 [PATCH 0/6] some diff --cc --stat fixes Jeff King
  2019-01-24 12:27 ` [PATCH 1/6] t4006: resurrect commented-out tests Jeff King
  2019-01-24 12:32 ` [PATCH 2/6] diff: clear emitted_symbols flag after use Jeff King
@ 2019-01-24 12:33 ` Jeff King
  2019-01-24 12:34 ` [PATCH 4/6] combine-diff: treat --shortstat like --stat Jeff King
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2019-01-24 12:33 UTC (permalink / raw)
  To: git; +Cc: David Turner, Stefan Beller

There are several conditionals in the combine diff code that check if
we're doing --stat or --numstat output. Since these must all remain in
sync, let's pull them out into a separate bit-mask.

Arguably this could go into diff.h along with the other DIFF_FORMAT
macros, but it's not clear that the definition of "which formats are
stat" is universal (e.g., does --dirstat count? --summary?). So let's
keep this local to combine-diff.c, where the meaning is more clearly
"stat formats that combine-diff supports".

Signed-off-by: Jeff King <peff@peff.net>
---
 combine-diff.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index a143c00634..b1d259d5a0 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1321,6 +1321,11 @@ static const char *path_path(void *obj)
 	return path->path;
 }
 
+/*
+ * Diff stat formats which we always compute solely against the first parent.
+ */
+#define STAT_FORMAT_MASK (DIFF_FORMAT_NUMSTAT \
+			  | DIFF_FORMAT_DIFFSTAT)
 
 /* find set of paths that every parent touches */
 static struct combine_diff_path *find_paths_generic(const struct object_id *oid,
@@ -1342,8 +1347,7 @@ static struct combine_diff_path *find_paths_generic(const struct object_id *oid,
 		 * show stat against the first parent even when doing
 		 * combined diff.
 		 */
-		int stat_opt = (output_format &
-				(DIFF_FORMAT_NUMSTAT|DIFF_FORMAT_DIFFSTAT));
+		int stat_opt = output_format & STAT_FORMAT_MASK;
 		if (i == 0 && stat_opt)
 			opt->output_format = stat_opt;
 		else
@@ -1470,8 +1474,7 @@ void diff_tree_combined(const struct object_id *oid,
 		 * show stat against the first parent even
 		 * when doing combined diff.
 		 */
-		stat_opt = (opt->output_format &
-				(DIFF_FORMAT_NUMSTAT|DIFF_FORMAT_DIFFSTAT));
+		stat_opt = opt->output_format & STAT_FORMAT_MASK;
 		if (stat_opt) {
 			diffopts.output_format = stat_opt;
 
@@ -1515,8 +1518,7 @@ void diff_tree_combined(const struct object_id *oid,
 				show_raw_diff(p, num_parent, rev);
 			needsep = 1;
 		}
-		else if (opt->output_format &
-			 (DIFF_FORMAT_NUMSTAT|DIFF_FORMAT_DIFFSTAT))
+		else if (opt->output_format & STAT_FORMAT_MASK)
 			needsep = 1;
 		else if (opt->output_format & DIFF_FORMAT_CALLBACK)
 			handle_combined_callback(opt, paths, num_parent, num_paths);
-- 
2.20.1.842.g8986705066


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] combine-diff: treat --shortstat like --stat
  2019-01-24 12:26 [PATCH 0/6] some diff --cc --stat fixes Jeff King
                   ` (2 preceding siblings ...)
  2019-01-24 12:33 ` [PATCH 3/6] combine-diff: factor out stat-format mask Jeff King
@ 2019-01-24 12:34 ` Jeff King
  2019-01-24 18:58   ` David Turner
  2019-01-24 19:02   ` Stefan Beller
  2019-01-24 12:35 ` [PATCH 5/6] combine-diff: treat --summary " Jeff King
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2019-01-24 12:34 UTC (permalink / raw)
  To: git; +Cc: David Turner, Stefan Beller

The --stat of a combined diff is defined as the first-parent stat,
going all the way back to 965f803c32 (combine-diff: show diffstat with
the first parent., 2006-04-17).

Naturally, we gave --numstat the same treatment in 74e2abe5b7 (diff
--numstat, 2006-10-12).

But --shortstat, which is really just the final line of --stat, does
nothing, which produces confusing results:

  $ git show --oneline --stat eab7584e37
  eab7584e37 Merge branch 'en/show-ref-doc-fix'

   Documentation/git-show-ref.txt | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

  $ git show --oneline --shortstat eab7584e37
  eab7584e37 Merge branch 'en/show-ref-doc-fix'

  [nothing! We'd expect to see the "1 file changed..." line]

This patch teaches combine-diff to treats the two formats identically.

Reported-by: David Turner <novalis@novalis.org>
Signed-off-by: Jeff King <peff@peff.net>
---
 combine-diff.c                                 | 1 +
 t/t4013-diff-various.sh                        | 1 +
 t/t4013/diff.diff-tree_--cc_--shortstat_master | 4 ++++
 3 files changed, 6 insertions(+)
 create mode 100644 t/t4013/diff.diff-tree_--cc_--shortstat_master

diff --git a/combine-diff.c b/combine-diff.c
index b1d259d5a0..83ec3dfffa 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1325,6 +1325,7 @@ static const char *path_path(void *obj)
  * Diff stat formats which we always compute solely against the first parent.
  */
 #define STAT_FORMAT_MASK (DIFF_FORMAT_NUMSTAT \
+			  | DIFF_FORMAT_SHORTSTAT \
 			  | DIFF_FORMAT_DIFFSTAT)
 
 /* find set of paths that every parent touches */
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 7d985ff6b1..9ccdf08730 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -239,6 +239,7 @@ diff-tree --cc --stat --summary master
 # stat summary should show the diffstat and summary with the first parent
 diff-tree -c --stat --summary side
 diff-tree --cc --stat --summary side
+diff-tree --cc --shortstat master
 # improved by Timo's patch
 diff-tree --cc --patch-with-stat master
 # improved by Timo's patch
diff --git a/t/t4013/diff.diff-tree_--cc_--shortstat_master b/t/t4013/diff.diff-tree_--cc_--shortstat_master
new file mode 100644
index 0000000000..a4ca42df2a
--- /dev/null
+++ b/t/t4013/diff.diff-tree_--cc_--shortstat_master
@@ -0,0 +1,4 @@
+$ git diff-tree --cc --shortstat master
+59d314ad6f356dd08601a4cd5e530381da3e3c64
+ 2 files changed, 5 insertions(+)
+$
-- 
2.20.1.842.g8986705066


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] combine-diff: treat --summary like --stat
  2019-01-24 12:26 [PATCH 0/6] some diff --cc --stat fixes Jeff King
                   ` (3 preceding siblings ...)
  2019-01-24 12:34 ` [PATCH 4/6] combine-diff: treat --shortstat like --stat Jeff King
@ 2019-01-24 12:35 ` Jeff King
  2019-01-24 19:14   ` Stefan Beller
  2019-01-24 12:36 ` [PATCH 6/6] combine-diff: treat --dirstat " Jeff King
  2019-01-24 19:21 ` [PATCH 0/6] some diff --cc --stat fixes Stefan Beller
  6 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2019-01-24 12:35 UTC (permalink / raw)
  To: git; +Cc: David Turner, Stefan Beller

Currently "--cc --summary" on a merge shows nothing. Since we show "--cc
--stat" as a stat against the first parent, and because --summary is
typically used in combination with --stat, it makes sense to treat them
both the same way.

Note that we have to tweak t4013's setup a bit to test this case, as the
existing merges do not have any --summary results against their first
parent. But since the merge at the tip of 'master' does add and remove
files with respect to the second parent, we can just make a reversed
doppelganger merge where the parents are swapped.

Signed-off-by: Jeff King <peff@peff.net>
---
 combine-diff.c                                | 1 +
 t/t4013-diff-various.sh                       | 7 +++++++
 t/t4013/diff.diff-tree_--cc_--summary_REVERSE | 6 ++++++
 3 files changed, 14 insertions(+)
 create mode 100644 t/t4013/diff.diff-tree_--cc_--summary_REVERSE

diff --git a/combine-diff.c b/combine-diff.c
index 83ec3dfffa..f42b41d884 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1326,6 +1326,7 @@ static const char *path_path(void *obj)
  */
 #define STAT_FORMAT_MASK (DIFF_FORMAT_NUMSTAT \
 			  | DIFF_FORMAT_SHORTSTAT \
+			  | DIFF_FORMAT_SUMMARY \
 			  | DIFF_FORMAT_DIFFSTAT)
 
 /* find set of paths that every parent touches */
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 9ccdf08730..e28953975b 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -98,6 +98,12 @@ test_expect_success setup '
 	git commit -m "update mode" &&
 	git checkout -f master &&
 
+	# Same merge as master, but with parents reversed. Hide it in a
+	# pseudo-ref to avoid impacting tests with --all.
+	commit=$(echo reverse |
+		 git commit-tree -p master^2 -p master^1 master^{tree}) &&
+	git update-ref REVERSE $commit &&
+
 	git config diff.renames false &&
 
 	git show-branch
@@ -240,6 +246,7 @@ diff-tree --cc --stat --summary master
 diff-tree -c --stat --summary side
 diff-tree --cc --stat --summary side
 diff-tree --cc --shortstat master
+diff-tree --cc --summary REVERSE
 # improved by Timo's patch
 diff-tree --cc --patch-with-stat master
 # improved by Timo's patch
diff --git a/t/t4013/diff.diff-tree_--cc_--summary_REVERSE b/t/t4013/diff.diff-tree_--cc_--summary_REVERSE
new file mode 100644
index 0000000000..e208dd5682
--- /dev/null
+++ b/t/t4013/diff.diff-tree_--cc_--summary_REVERSE
@@ -0,0 +1,6 @@
+$ git diff-tree --cc --summary REVERSE
+2562325a7ee916efb2481da93073b82cec801cbc
+ create mode 100644 file1
+ delete mode 100644 file2
+ delete mode 100644 file3
+$
-- 
2.20.1.842.g8986705066


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] combine-diff: treat --dirstat like --stat
  2019-01-24 12:26 [PATCH 0/6] some diff --cc --stat fixes Jeff King
                   ` (4 preceding siblings ...)
  2019-01-24 12:35 ` [PATCH 5/6] combine-diff: treat --summary " Jeff King
@ 2019-01-24 12:36 ` Jeff King
  2019-01-24 19:21 ` [PATCH 0/6] some diff --cc --stat fixes Stefan Beller
  6 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2019-01-24 12:36 UTC (permalink / raw)
  To: git; +Cc: David Turner, Stefan Beller

Currently "--cc --dirstat" will show nothing for a merge.  Like
--shortstat and --summary in the previous two patches, it probably makes
sense to treat it like we do --stat, and show a stat against the
first-parent.

This case is less obviously correct than for --shortstat and --summary,
as those are basically variants of --stat themselves. It's possible we
could develop a multi-parent combined dirstat format, in which case we
might regret defining this first-parent behavior. But the same could be
said for --stat, and in the 12+ years of it showing first-parent stats,
nobody has complained.

So showing the first-parent dirstat is at least _useful_, and if we
later develop a clever multi-parent stat format, we'd probably have to
deal with --stat anyway.

Signed-off-by: Jeff King <peff@peff.net>
---
 combine-diff.c                                   | 1 +
 t/t4013-diff-various.sh                          | 1 +
 t/t4013/diff.diff_--dirstat_--cc_master~1_master | 3 +++
 3 files changed, 5 insertions(+)
 create mode 100644 t/t4013/diff.diff_--dirstat_--cc_master~1_master

diff --git a/combine-diff.c b/combine-diff.c
index f42b41d884..23d8fabe75 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1327,6 +1327,7 @@ static const char *path_path(void *obj)
 #define STAT_FORMAT_MASK (DIFF_FORMAT_NUMSTAT \
 			  | DIFF_FORMAT_SHORTSTAT \
 			  | DIFF_FORMAT_SUMMARY \
+			  | DIFF_FORMAT_DIRSTAT \
 			  | DIFF_FORMAT_DIFFSTAT)
 
 /* find set of paths that every parent touches */
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index e28953975b..9f8f0e84ad 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -358,6 +358,7 @@ diff --line-prefix=abc master master^ side
 diff --dirstat master~1 master~2
 diff --dirstat initial rearrange
 diff --dirstat-by-file initial rearrange
+diff --dirstat --cc master~1 master
 # No-index --abbrev and --no-abbrev
 diff --raw initial
 :noellipses diff --raw initial
diff --git a/t/t4013/diff.diff_--dirstat_--cc_master~1_master b/t/t4013/diff.diff_--dirstat_--cc_master~1_master
new file mode 100644
index 0000000000..fba4e34175
--- /dev/null
+++ b/t/t4013/diff.diff_--dirstat_--cc_master~1_master
@@ -0,0 +1,3 @@
+$ git diff --dirstat --cc master~1 master
+  40.0% dir/
+$
-- 
2.20.1.842.g8986705066

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] t4006: resurrect commented-out tests
  2019-01-24 12:27 ` [PATCH 1/6] t4006: resurrect commented-out tests Jeff King
@ 2019-01-24 18:18   ` Stefan Beller
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2019-01-24 18:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Turner

On Thu, Jan 24, 2019 at 4:27 AM Jeff King <peff@peff.net> wrote:

> ---
> Not actually relevant to the rest of the series; I just found this while
> looking for existing --shortstat tests.

Not actually relevant to the review, but my long term vision for
--color-moved would work with this patch, too. ;)

But now that I look at the patch, there are actual changes, as you
indicated in the commit message. Just more than I expected.

This patch is
    Reviewed-by: Stefan Beller <sbeller@google.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] diff: clear emitted_symbols flag after use
  2019-01-24 12:32 ` [PATCH 2/6] diff: clear emitted_symbols flag after use Jeff King
@ 2019-01-24 18:55   ` Stefan Beller
  2019-01-24 19:11     ` Jeff King
  2019-01-24 20:18   ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2019-01-24 18:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Turner

On Thu, Jan 24, 2019 at 4:32 AM Jeff King <peff@peff.net> wrote:
>
> There's an odd bug when "log --color-moved" is used with the combination
> of "--cc --stat -p": the stat for merge commits is erroneously shown
> with the diff of the _next_ commit.
>
> The included test demonstrates the issue. Our history looks something
> like this:
>
>   A-B-M--D
>    \ /
>     C
>
> When we run "git log --cc --stat -p --color-moved" starting at D, we get
> this sequence of events:
>
>   1. The diff for D is using -p, so diff_flush() calls into
>      diff_flush_patch_all_file_pairs(). There we see that o->color_moved
>      is in effect, so we point o->emitted_symbols to a static local
>      struct, causing diff_flush_patch() to queue the symbols instead of
>      actually writing them out.
>
>      We then do our move detection, emit the symbols, and clear the
>      struct. But we leave o->emitted_symbols pointing to our struct.
>
>   2. Next we compute the diff for M. This is a merge, so we use the
>      combined diff code. In find_paths_generic(), we compute the
>      pairwise diff between each commit and its parent. Normally this is
>      done with DIFF_FORMAT_NO_OUTPUT, since we're just looking for
>      intersecting paths. But since "--stat --cc" shows the first-parent
>      stat, and since we're computing that diff anyway, we enable
>      DIFF_FORMAT_DIFFSTAT for the first parent. This outputs the stat
>      information immediately, saving us from running a separate
>      first-parent diff later.
>
>      But where does that output go? Normally it goes directly to stdout,
>      but because o->emitted_symbols is set, we queue it. As a result, we
>      don't actually print the diffstat for the merge commit (yet),

Thanks for your analysis. As always a pleasant read.
I understand and agree with what is written up to here remembering
the code vaguely.

> which
>      is wrong.

I disagree with this sentiment. If we remember to flush the queued output
this is merely an inefficiency due to implementation details, but not wrong.

We could argue that it is wrong to have o->emitted_symbols set, as
we know we don't need it for producing a diffstat only.

>
>   3. Next we compute the diff for C. We're actually showing a patch
>      again, so we end up in diff_flush_patch_all_file_pairs(), but this
>      time we have the queued stat from step 2 waiting in our struct.

Right, that is how the queueing can produce errors. I wonder if the
test that is included in this patch would work on top of
e6e045f803 ("diff.c: buffer all output if asked to", 2017-06-29)
as that commit specifically wanted to make sure these errors
would be caught.

>
>      We add new elements to it for C's diff, and then flush the whole
>      thing. And we see the diffstat from M as part of C's diff, which is
>      wrong.
>
> So triggering the bug really does require the combination of all of
> those options.

Similarly we can trigger bugs by using options that enable buffering
(so far only --color-moved) and options that do not fully buffer and flush.

>
> To fix it, we can simply restore o->emitted_symbols to NULL after
> flushing it, so that it does not affect anything outside of
> diff_flush_patch_all_file_pairs(). This intuitively makes sense, since
> nobody outside of that function is going to bother flushing it, so we
> would not want them to write to it either.

This would also cause the inefficiency I mentioned after (2) to disappear,
as the merge commits diffstat would be just printed to stdout?

>
> In fact, we could take this a step further and turn the local "esm"
> struct into a non-static variable that goes away after the function
> ends. However, since it contains a dynamically sized array, we benefit
> from amortizing the cost of allocations over many calls. So we'll leave
> it as static to retain that benefit.

okay.

>
> But let's push the zero-ing of esm.nr into the conditional for "if
> (o->emitted_symbols)" to make it clear that we do not expect esm to hold
> any values if we did not just try to use it. With the code as it is
> written now, if we did encounter such a case (which I think would be a
> bug), we'd silently leak those values without even bothering to display
> them. With this change, we'd at least eventually show them, and somebody
> would notice.

Wow. Good call.

>
> Signed-off-by: Jeff King <peff@peff.net>

Reviewed-by: Stefan Beller <sbeller@google.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] combine-diff: treat --shortstat like --stat
  2019-01-24 12:34 ` [PATCH 4/6] combine-diff: treat --shortstat like --stat Jeff King
@ 2019-01-24 18:58   ` David Turner
  2019-01-24 19:02   ` Stefan Beller
  1 sibling, 0 replies; 19+ messages in thread
From: David Turner @ 2019-01-24 18:58 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Stefan Beller

Thanks for the fix here.

On Thu, 2019-01-24 at 07:34 -0500, Jeff King wrote:
> The --stat of a combined diff is defined as the first-parent stat,
> going all the way back to 965f803c32 (combine-diff: show diffstat
> with
> the first parent., 2006-04-17).
> 
> Naturally, we gave --numstat the same treatment in 74e2abe5b7 (diff
> --numstat, 2006-10-12).
> 
> But --shortstat, which is really just the final line of --stat, does
> nothing, which produces confusing results:
> 
>   $ git show --oneline --stat eab7584e37
>   eab7584e37 Merge branch 'en/show-ref-doc-fix'
> 
>    Documentation/git-show-ref.txt | 2 +-
>    1 file changed, 1 insertion(+), 1 deletion(-)
> 
>   $ git show --oneline --shortstat eab7584e37
>   eab7584e37 Merge branch 'en/show-ref-doc-fix'
> 
>   [nothing! We'd expect to see the "1 file changed..." line]
> 
> This patch teaches combine-diff to treats the two formats
> identically.
> 
> Reported-by: David Turner <novalis@novalis.org>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  combine-diff.c                                 | 1 +
>  t/t4013-diff-various.sh                        | 1 +
>  t/t4013/diff.diff-tree_--cc_--shortstat_master | 4 ++++
>  3 files changed, 6 insertions(+)
>  create mode 100644 t/t4013/diff.diff-tree_--cc_--shortstat_master
> 
> diff --git a/combine-diff.c b/combine-diff.c
> index b1d259d5a0..83ec3dfffa 100644
> --- a/combine-diff.c
> +++ b/combine-diff.c
> @@ -1325,6 +1325,7 @@ static const char *path_path(void *obj)
>   * Diff stat formats which we always compute solely against the
> first parent.
>   */
>  #define STAT_FORMAT_MASK (DIFF_FORMAT_NUMSTAT \
> +			  | DIFF_FORMAT_SHORTSTAT \
>  			  | DIFF_FORMAT_DIFFSTAT)
>  
>  /* find set of paths that every parent touches */
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index 7d985ff6b1..9ccdf08730 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -239,6 +239,7 @@ diff-tree --cc --stat --summary master
>  # stat summary should show the diffstat and summary with the first
> parent
>  diff-tree -c --stat --summary side
>  diff-tree --cc --stat --summary side
> +diff-tree --cc --shortstat master
>  # improved by Timo's patch
>  diff-tree --cc --patch-with-stat master
>  # improved by Timo's patch
> diff --git a/t/t4013/diff.diff-tree_--cc_--shortstat_master
> b/t/t4013/diff.diff-tree_--cc_--shortstat_master
> new file mode 100644
> index 0000000000..a4ca42df2a
> --- /dev/null
> +++ b/t/t4013/diff.diff-tree_--cc_--shortstat_master
> @@ -0,0 +1,4 @@
> +$ git diff-tree --cc --shortstat master
> +59d314ad6f356dd08601a4cd5e530381da3e3c64
> + 2 files changed, 5 insertions(+)
> +$


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] combine-diff: treat --shortstat like --stat
  2019-01-24 12:34 ` [PATCH 4/6] combine-diff: treat --shortstat like --stat Jeff King
  2019-01-24 18:58   ` David Turner
@ 2019-01-24 19:02   ` Stefan Beller
  1 sibling, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2019-01-24 19:02 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Turner

On Thu, Jan 24, 2019 at 4:34 AM Jeff King <peff@peff.net> wrote:
>
> The --stat of a combined diff is defined as the first-parent stat,
> going all the way back to 965f803c32 (combine-diff: show diffstat with
> the first parent., 2006-04-17).
>
> Naturally, we gave --numstat the same treatment in 74e2abe5b7 (diff
> --numstat, 2006-10-12).
>
> But --shortstat, which is really just the final line of --stat, does
> nothing, which produces confusing results:
>
>   $ git show --oneline --stat eab7584e37
>   eab7584e37 Merge branch 'en/show-ref-doc-fix'
>
>    Documentation/git-show-ref.txt | 2 +-
>    1 file changed, 1 insertion(+), 1 deletion(-)
>
>   $ git show --oneline --shortstat eab7584e37
>   eab7584e37 Merge branch 'en/show-ref-doc-fix'
>
>   [nothing! We'd expect to see the "1 file changed..." line]
>
> This patch teaches combine-diff to treats the two formats identically.
>
> Reported-by: David Turner <novalis@novalis.org>
> Signed-off-by: Jeff King <peff@peff.net>

Both the previous patch and this one is reviewed by me.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] diff: clear emitted_symbols flag after use
  2019-01-24 18:55   ` Stefan Beller
@ 2019-01-24 19:11     ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2019-01-24 19:11 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, David Turner

On Thu, Jan 24, 2019 at 10:55:10AM -0800, Stefan Beller wrote:

> >      But where does that output go? Normally it goes directly to stdout,
> >      but because o->emitted_symbols is set, we queue it. As a result, we
> >      don't actually print the diffstat for the merge commit (yet),
> 
> Thanks for your analysis. As always a pleasant read.
> I understand and agree with what is written up to here remembering
> the code vaguely.
> 
> > which
> >      is wrong.
> 
> I disagree with this sentiment. If we remember to flush the queued output
> this is merely an inefficiency due to implementation details, but not wrong.
> 
> We could argue that it is wrong to have o->emitted_symbols set, as
> we know we don't need it for producing a diffstat only.

It's wrong in the sense that we finish printing that merge commit
without having shown its diff. If it were the final commit, we would not
ever print it at all!

So if you are arguing that it would be OK to queue it as long as we
flushed it before deciding we were done with the diff, then I agree. But
doing that correctly would actually be non-trivial, because the
combined-diff code does not use the emitted_symbols queue for its diff
(so the stat and the patch would appear out of order).

I also wondered why diffstats go to o->emitted_symbols at all. We do not
do any analysis of them with --color-moved, I don't think. But I can
also see that having emitted_symbols hold everything makes sense from a
maintainability standpoint; future features may want to see more of what
we're emitting.

> >   3. Next we compute the diff for C. We're actually showing a patch
> >      again, so we end up in diff_flush_patch_all_file_pairs(), but this
> >      time we have the queued stat from step 2 waiting in our struct.
> 
> Right, that is how the queueing can produce errors. I wonder if the
> test that is included in this patch would work on top of
> e6e045f803 ("diff.c: buffer all output if asked to", 2017-06-29)
> as that commit specifically wanted to make sure these errors
> would be caught.

I suspect that would not work with "--cc", because combine-diff outputs
directly stdout. That's something that we might want to improve in the
long run (since obviously it cannot use --color-moved at this point).

> > To fix it, we can simply restore o->emitted_symbols to NULL after
> > flushing it, so that it does not affect anything outside of
> > diff_flush_patch_all_file_pairs(). This intuitively makes sense, since
> > nobody outside of that function is going to bother flushing it, so we
> > would not want them to write to it either.
> 
> This would also cause the inefficiency I mentioned after (2) to disappear,
> as the merge commits diffstat would be just printed to stdout?

Yes, it avoids the overhead of even storing them in the emitted struct
at all.

> Reviewed-by: Stefan Beller <sbeller@google.com>

Thanks!

I did quite a bit of head-scratching figuring out this bug, but at the
end of it I now understand the flow of the color-moved code quite a bit
better. :)

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] combine-diff: treat --summary like --stat
  2019-01-24 12:35 ` [PATCH 5/6] combine-diff: treat --summary " Jeff King
@ 2019-01-24 19:14   ` Stefan Beller
  2019-01-24 19:23     ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2019-01-24 19:14 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Turner

On Thu, Jan 24, 2019 at 4:35 AM Jeff King <peff@peff.net> wrote:

> Note that we have to tweak t4013's setup a bit to test this case, as the
> existing merges do not have any --summary results against their first
> parent. But since the merge at the tip of 'master' does add and remove
> files with respect to the second parent, we can just make a reversed
> doppelganger merge where the parents are swapped.

...

> +       # Same merge as master, but with parents reversed. Hide it in a
> +       # pseudo-ref to avoid impacting tests with --all.

There are 2 calls with --all, which may be worth testing for as well
assuming we still have similar bugs as shown in the second patch,
but I guess this would also allow for other tests (how do we list all
pseudo refs for example?) to cover more corner cases.

I am not sure I like this.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/6] some diff --cc --stat fixes
  2019-01-24 12:26 [PATCH 0/6] some diff --cc --stat fixes Jeff King
                   ` (5 preceding siblings ...)
  2019-01-24 12:36 ` [PATCH 6/6] combine-diff: treat --dirstat " Jeff King
@ 2019-01-24 19:21 ` Stefan Beller
  6 siblings, 0 replies; 19+ messages in thread
From: Stefan Beller @ 2019-01-24 19:21 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Turner

On Thu, Jan 24, 2019 at 4:26 AM Jeff King <peff@peff.net> wrote:
>
> This started as a follow-up to my "something like this" bug-fix from
> last July:
>
>  https://public-inbox.org/git/20180710203438.GB6886@sigill.intra.peff.net/
>
> (better late than never). But in the course of that, I discovered
> another really weird and interesting bug with --color-moved. This fixes
> both, along with some related cases.

I reviewed the whole series, and I like it except for the small nit
in the test setup. Arguably we can fix the --all tests once the time
comes, so I am in favor of taking this series as-is.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] combine-diff: treat --summary like --stat
  2019-01-24 19:14   ` Stefan Beller
@ 2019-01-24 19:23     ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2019-01-24 19:23 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, David Turner

On Thu, Jan 24, 2019 at 11:14:15AM -0800, Stefan Beller wrote:

> On Thu, Jan 24, 2019 at 4:35 AM Jeff King <peff@peff.net> wrote:
> 
> > Note that we have to tweak t4013's setup a bit to test this case, as the
> > existing merges do not have any --summary results against their first
> > parent. But since the merge at the tip of 'master' does add and remove
> > files with respect to the second parent, we can just make a reversed
> > doppelganger merge where the parents are swapped.
> 
> ...
> 
> > +       # Same merge as master, but with parents reversed. Hide it in a
> > +       # pseudo-ref to avoid impacting tests with --all.
> 
> There are 2 calls with --all, which may be worth testing for as well
> assuming we still have similar bugs as shown in the second patch,
> but I guess this would also allow for other tests (how do we list all
> pseudo refs for example?) to cover more corner cases.
> 
> I am not sure I like this.

The --all tests aren't actually very thorough. In fact, they don't
generate diffs at all, making it especially silly that they are in
t4013-diff-various. They are only looking at --decorate.

It also would not be the end of the world to modify the expected output
for those tests. You can see the extend of the damage by applying the
patch below on top and running t4013 with "-v".

---
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 9f8f0e84ad..742c3cdbcb 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -98,11 +98,10 @@ test_expect_success setup '
 	git commit -m "update mode" &&
 	git checkout -f master &&
 
-	# Same merge as master, but with parents reversed. Hide it in a
-	# pseudo-ref to avoid impacting tests with --all.
+	# Same merge as master, but with parents reversed.
 	commit=$(echo reverse |
 		 git commit-tree -p master^2 -p master^1 master^{tree}) &&
-	git update-ref REVERSE $commit &&
+	git update-ref refs/heads/reverse $commit &&
 
 	git config diff.renames false &&
 
@@ -246,7 +245,7 @@ diff-tree --cc --stat --summary master
 diff-tree -c --stat --summary side
 diff-tree --cc --stat --summary side
 diff-tree --cc --shortstat master
-diff-tree --cc --summary REVERSE
+diff-tree --cc --summary reverse
 # improved by Timo's patch
 diff-tree --cc --patch-with-stat master
 # improved by Timo's patch
diff --git a/t/t4013/diff.diff-tree_--cc_--summary_REVERSE b/t/t4013/diff.diff-tree_--cc_--summary_reverse
similarity index 75%
rename from t/t4013/diff.diff-tree_--cc_--summary_REVERSE
rename to t/t4013/diff.diff-tree_--cc_--summary_reverse
index e208dd5682..35da01cf46 100644
--- a/t/t4013/diff.diff-tree_--cc_--summary_REVERSE
+++ b/t/t4013/diff.diff-tree_--cc_--summary_reverse
@@ -1,4 +1,4 @@
-$ git diff-tree --cc --summary REVERSE
+$ git diff-tree --cc --summary reverse
 2562325a7ee916efb2481da93073b82cec801cbc
  create mode 100644 file1
  delete mode 100644 file2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] diff: clear emitted_symbols flag after use
  2019-01-24 12:32 ` [PATCH 2/6] diff: clear emitted_symbols flag after use Jeff King
  2019-01-24 18:55   ` Stefan Beller
@ 2019-01-24 20:18   ` Junio C Hamano
  2019-01-24 20:36     ` Stefan Beller
  2019-01-24 21:15     ` Jeff King
  1 sibling, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2019-01-24 20:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git, David Turner, Stefan Beller

Jeff King <peff@peff.net> writes:

> When we run "git log --cc --stat -p --color-moved" starting at D, we get
> this sequence of events:
>
>   1. The diff for D is using -p, so diff_flush() calls into
>      diff_flush_patch_all_file_pairs(). There we see that o->color_moved
>      is in effect, so we point o->emitted_symbols to a static local
>      struct, causing diff_flush_patch() to queue the symbols instead of
>      actually writing them out.
>
>      We then do our move detection, emit the symbols, and clear the
>      struct. But we leave o->emitted_symbols pointing to our struct.

Wow, that was nasty.  

I did not like the complexity of that "emitted symbols" conversion
we had to do recently and never trusted the code.  There still is
something funny in diff_flush_patch_all_file_pairs() even after this
patch, though.

 - We first check o->color_moved and unconditionally point
   o->emitted_symbols to &esm.

 - In an if() block we enter when o->emitted_symbols is set, there
   is a check to see if o->color_moved is set.  This makes sense
   only if we are trying to be prepared to handle a case where we
   are not the one that assigned a non-NULL to o->emitted_symbols
   due to o->color_moved.  So it certainly is possible that
   o->emitted_symbols is set before we enter this function.

 - But then, it means that o->emitted_symbols we may have had
   non-NULL when the function is called may be overwritten if
   o->color_moved is set.

The above observation does not necessarily indicate any bug; it just
shows that the code structure is messier than necessary.

> To fix it, we can simply restore o->emitted_symbols to NULL after
> flushing it, so that it does not affect anything outside of
> diff_flush_patch_all_file_pairs(). This intuitively makes sense, since
> nobody outside of that function is going to bother flushing it, so we
> would not want them to write to it either.

Perhaps.  I see word-diff codepath gives an allocated buffer to
o->emitted_symbols, so assigning NULL without freeing would mean a
leak, but I guess this helper function is not designed to be called

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] diff: clear emitted_symbols flag after use
  2019-01-24 20:18   ` Junio C Hamano
@ 2019-01-24 20:36     ` Stefan Beller
  2019-01-24 21:17       ` Jeff King
  2019-01-24 21:15     ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Beller @ 2019-01-24 20:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, David Turner

On Thu, Jan 24, 2019 at 12:18 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > When we run "git log --cc --stat -p --color-moved" starting at D, we get
> > this sequence of events:
> >
> >   1. The diff for D is using -p, so diff_flush() calls into
> >      diff_flush_patch_all_file_pairs(). There we see that o->color_moved
> >      is in effect, so we point o->emitted_symbols to a static local
> >      struct, causing diff_flush_patch() to queue the symbols instead of
> >      actually writing them out.
> >
> >      We then do our move detection, emit the symbols, and clear the
> >      struct. But we leave o->emitted_symbols pointing to our struct.
>
> Wow, that was nasty.
>
> I did not like the complexity of that "emitted symbols" conversion
> we had to do recently and never trusted the code.  There still is
> something funny in diff_flush_patch_all_file_pairs() even after this
> patch, though.
>
>  - We first check o->color_moved and unconditionally point
>    o->emitted_symbols to &esm.

I do not recall if we discussed this or if I just had talking voices in my
head, but this was done deliberately to separate the implementation
of the buffering feature and its user. I thought once we had the buffering
in place, we might add options that also require buffering;
hence the explicit "move coloring implies using a buffer".

Having the buffer as a local static might be an iffy implementation
that should be fixed for clarify.

>
>  - In an if() block we enter when o->emitted_symbols is set, there
>    is a check to see if o->color_moved is set.  This makes sense
>    only if we are trying to be prepared to handle a case where we
>    are not the one that assigned a non-NULL to o->emitted_symbols
>    due to o->color_moved.  So it certainly is possible that
>    o->emitted_symbols is set before we enter this function.

Ah I see where you are coming from, as the code was written
I imagined:

    if (o->color_moved)
        o->emitted_symbols = &esm;
    if (o->distim_gostaks)
        o->emitted_symbols = &esm;

    if (o->emitted_symbols) {
         if (o->color_moved)
            handle_color_moved(o);
        if (o->distim_gostaks)
            handle_distimming(o);

        ... flush symbols...
        ... free &cleanup ...
    }


>  - But then, it means that o->emitted_symbols we may have had
>    non-NULL when the function is called may be overwritten if
>    o->color_moved is set.

I see. So either we'd want to have

    if (o->emitted_symbols)
        BUG("should not be already set");

or as Peff points out, make it non-static.

> The above observation does not necessarily indicate any bug; it just
> shows that the code structure is messier than necessary.

Makes sense. I should not have anticipated any new feature to be
added, yet.

>
> > To fix it, we can simply restore o->emitted_symbols to NULL after
> > flushing it, so that it does not affect anything outside of
> > diff_flush_patch_all_file_pairs(). This intuitively makes sense, since
> > nobody outside of that function is going to bother flushing it, so we
> > would not want them to write to it either.
>
> Perhaps.  I see word-diff codepath gives an allocated buffer to
> o->emitted_symbols, so assigning NULL without freeing would mean a
> leak, but I guess this helper function is not designed to be called

Yes, this was done as to make the buffering complete.
We would not have to buffer any word diff as the color move doesn't yet
work on that.

See the NEEDSWORK in diff_words_flush, where we could implement
a word based move coloring.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] diff: clear emitted_symbols flag after use
  2019-01-24 20:18   ` Junio C Hamano
  2019-01-24 20:36     ` Stefan Beller
@ 2019-01-24 21:15     ` Jeff King
  1 sibling, 0 replies; 19+ messages in thread
From: Jeff King @ 2019-01-24 21:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, David Turner, Stefan Beller

On Thu, Jan 24, 2019 at 12:18:41PM -0800, Junio C Hamano wrote:

> I did not like the complexity of that "emitted symbols" conversion
> we had to do recently and never trusted the code.  There still is
> something funny in diff_flush_patch_all_file_pairs() even after this
> patch, though.
> 
>  - We first check o->color_moved and unconditionally point
>    o->emitted_symbols to &esm.
> 
>  - In an if() block we enter when o->emitted_symbols is set, there
>    is a check to see if o->color_moved is set.  This makes sense
>    only if we are trying to be prepared to handle a case where we
>    are not the one that assigned a non-NULL to o->emitted_symbols
>    due to o->color_moved.  So it certainly is possible that
>    o->emitted_symbols is set before we enter this function.

Yeah, I noticed that, too. I assumed it was preparing for a day when the
logic for "are we collecting symbols" becomes more complex than just
being equivalent to "o->color_moved".

Under that rationale, I was OK leaving it.

>  - But then, it means that o->emitted_symbols we may have had
>    non-NULL when the function is called may be overwritten if
>    o->color_moved is set.

Yeah, that is true. I think in the new world order proposed by this
patch, we'd always assume that it's NULL coming in, possibly assign it,
and re-NULL it going
out.

> The above observation does not necessarily indicate any bug; it just
> shows that the code structure is messier than necessary.

Yeah, I don't think it's a bug currently, although...

> > To fix it, we can simply restore o->emitted_symbols to NULL after
> > flushing it, so that it does not affect anything outside of
> > diff_flush_patch_all_file_pairs(). This intuitively makes sense, since
> > nobody outside of that function is going to bother flushing it, so we
> > would not want them to write to it either.
> 
> Perhaps.  I see word-diff codepath gives an allocated buffer to
> o->emitted_symbols, so assigning NULL without freeing would mean a
> leak, but I guess this helper function is not designed to be called

Hrm, I'm embarrassed to say I did not notice that it also uses the
emitted_symbols system.

I think we only do it there, though, in the sub-diff_options that
word-diff uses, in which case we make a separate emitted_diff_symbols
struct instead of re-using the one from the parent diff_options.

So I think the general idea still holds, which is that whoever assigns
the emitted_symbols flag is responsible for flushing it. For
--color-moved, that happens in a single function, but for word-diff,
it's split across the init/flush functions.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] diff: clear emitted_symbols flag after use
  2019-01-24 20:36     ` Stefan Beller
@ 2019-01-24 21:17       ` Jeff King
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff King @ 2019-01-24 21:17 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git, David Turner

On Thu, Jan 24, 2019 at 12:36:38PM -0800, Stefan Beller wrote:

> >  - In an if() block we enter when o->emitted_symbols is set, there
> >    is a check to see if o->color_moved is set.  This makes sense
> >    only if we are trying to be prepared to handle a case where we
> >    are not the one that assigned a non-NULL to o->emitted_symbols
> >    due to o->color_moved.  So it certainly is possible that
> >    o->emitted_symbols is set before we enter this function.
> 
> Ah I see where you are coming from, as the code was written
> I imagined:
> 
>     if (o->color_moved)
>         o->emitted_symbols = &esm;
>     if (o->distim_gostaks)
>         o->emitted_symbols = &esm;
> 
>     if (o->emitted_symbols) {
>          if (o->color_moved)
>             handle_color_moved(o);
>         if (o->distim_gostaks)
>             handle_distimming(o);
> 
>         ... flush symbols...
>         ... free &cleanup ...
>     }

Yeah, FWIW this is what I took to be the reason for the code being laid
out as it is.

> >  - But then, it means that o->emitted_symbols we may have had
> >    non-NULL when the function is called may be overwritten if
> >    o->color_moved is set.
> 
> I see. So either we'd want to have
> 
>     if (o->emitted_symbols)
>         BUG("should not be already set");
> 
> or as Peff points out, make it non-static.

Even if it's non-static, you'd still want to ensure that it's not set
coming into the function (because somebody like diff-words might have
left it set, confusing us). So I think that BUG() may be worth having
either way.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-01-24 21:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-24 12:26 [PATCH 0/6] some diff --cc --stat fixes Jeff King
2019-01-24 12:27 ` [PATCH 1/6] t4006: resurrect commented-out tests Jeff King
2019-01-24 18:18   ` Stefan Beller
2019-01-24 12:32 ` [PATCH 2/6] diff: clear emitted_symbols flag after use Jeff King
2019-01-24 18:55   ` Stefan Beller
2019-01-24 19:11     ` Jeff King
2019-01-24 20:18   ` Junio C Hamano
2019-01-24 20:36     ` Stefan Beller
2019-01-24 21:17       ` Jeff King
2019-01-24 21:15     ` Jeff King
2019-01-24 12:33 ` [PATCH 3/6] combine-diff: factor out stat-format mask Jeff King
2019-01-24 12:34 ` [PATCH 4/6] combine-diff: treat --shortstat like --stat Jeff King
2019-01-24 18:58   ` David Turner
2019-01-24 19:02   ` Stefan Beller
2019-01-24 12:35 ` [PATCH 5/6] combine-diff: treat --summary " Jeff King
2019-01-24 19:14   ` Stefan Beller
2019-01-24 19:23     ` Jeff King
2019-01-24 12:36 ` [PATCH 6/6] combine-diff: treat --dirstat " Jeff King
2019-01-24 19:21 ` [PATCH 0/6] some diff --cc --stat fixes Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).