git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [BUG] log --graph corrupts patch
@ 2018-03-19 10:21 Phillip Wood
  2018-03-20  6:09 ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Phillip Wood @ 2018-03-19 10:21 UTC (permalink / raw)
  To: Git Mailing List

I've just been reviewing some patches with 'git log --graph --patch' and
came across what looked like a bug:

| @@ -272,6 +272,9 @@ do
|       --keep-empty)
|               keep_empty=yes
|               ;;
|       --allow-empty-message)
| +     --no-keep-empty)
| +             keep_empty=
| +             ;;
|               allow_empty_message=--allow-empty-message
|               ;;

However when I looked at the file it was fine, "--allow-empty-message)"
was actually below the insertions. 'git log --patch' gives the correct
patch:

@@ -272,6 +272,9 @@ do
        --keep-empty)
                keep_empty=yes
                ;;
+       --no-keep-empty)
+               keep_empty=
+               ;;
        --allow-empty-message)
                allow_empty_message=--allow-empty-message
                ;;

for some reason adding --graph causes the patch to get corrupted. I've
tried all combinations of --[no-]-indent-heuristic and
--diff-algorithm={patience|minimal|histogram|myers} and they all give
the same result. I've no idea what is going on, it happens with 2.16.2
and recent next and master. I've pushed the commit to github so anyone
who is interested can get it with

git fetch https://github.com/phillipwood/git.git log-graph-breaks-patch

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] log --graph corrupts patch
  2018-03-19 10:21 [BUG] log --graph corrupts patch Phillip Wood
@ 2018-03-20  6:09 ` Jeff King
  2018-03-20  9:58   ` Phillip Wood
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2018-03-20  6:09 UTC (permalink / raw)
  To: phillip.wood; +Cc: Git Mailing List

On Mon, Mar 19, 2018 at 10:21:56AM +0000, Phillip Wood wrote:

> I've just been reviewing some patches with 'git log --graph --patch' and
> came across what looked like a bug:
> 
> | @@ -272,6 +272,9 @@ do
> |       --keep-empty)
> |               keep_empty=yes
> |               ;;
> |       --allow-empty-message)
> | +     --no-keep-empty)
> | +             keep_empty=
> | +             ;;
> |               allow_empty_message=--allow-empty-message
> |               ;;
> 
> However when I looked at the file it was fine, "--allow-empty-message)"
> was actually below the insertions. 'git log --patch' gives the correct
> patch:
> [...]
> git fetch https://github.com/phillipwood/git.git log-graph-breaks-patch

That's really strange. I can't seem to replicate it here, though; it
looks correct with our without --graph. Knowing how the graph code is
implemented, it seems like an unlikely bug for us to output lines out of
order (but of course anything's possible).

Are you using any exotic filters for your pager? If you use "git
--no-pager" does the problem persist?

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] log --graph corrupts patch
  2018-03-20  6:09 ` Jeff King
@ 2018-03-20  9:58   ` Phillip Wood
  2018-03-20 15:58     ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Phillip Wood @ 2018-03-20  9:58 UTC (permalink / raw)
  To: Jeff King, phillip.wood; +Cc: Git Mailing List

On 20/03/18 06:09, Jeff King wrote:
> On Mon, Mar 19, 2018 at 10:21:56AM +0000, Phillip Wood wrote:
> 
>> I've just been reviewing some patches with 'git log --graph --patch' and
>> came across what looked like a bug:
>>
>> | @@ -272,6 +272,9 @@ do
>> |       --keep-empty)
>> |               keep_empty=yes
>> |               ;;
>> |       --allow-empty-message)
>> | +     --no-keep-empty)
>> | +             keep_empty=
>> | +             ;;
>> |               allow_empty_message=--allow-empty-message
>> |               ;;
>>
>> However when I looked at the file it was fine, "--allow-empty-message)"
>> was actually below the insertions. 'git log --patch' gives the correct
>> patch:
>> [...]
>> git fetch https://github.com/phillipwood/git.git log-graph-breaks-patch
> 
> That's really strange. I can't seem to replicate it here, though; it
> looks correct with our without --graph. Knowing how the graph code is
> implemented, it seems like an unlikely bug for us to output lines out of
> order (but of course anything's possible).
> 
> Are you using any exotic filters for your pager? If you use "git
> --no-pager" does the problem persist?

Hi Peff, thanks for taking the time to check this, I had forgotten about
the pager. I'm using diff-highlight and it seems that is causing the
problems.

Thanks again

Phillip

> -Peff
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] log --graph corrupts patch
  2018-03-20  9:58   ` Phillip Wood
@ 2018-03-20 15:58     ` Jeff King
  2018-03-21  5:47       ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2018-03-20 15:58 UTC (permalink / raw)
  To: phillip.wood; +Cc: Brian Henderson, Git Mailing List

On Tue, Mar 20, 2018 at 09:58:14AM +0000, Phillip Wood wrote:

> > Are you using any exotic filters for your pager? If you use "git
> > --no-pager" does the problem persist?
> 
> Hi Peff, thanks for taking the time to check this, I had forgotten about
> the pager. I'm using diff-highlight and it seems that is causing the
> problems.

Heh. Lucky guess. Unsurprisingly, I use diff-highlight, too. But I did
not see it because I never bothered to upgrade my personal copy of the
script, which has been working for me for ages, to the one in contrib/.

But indeed, I can easily reproduce the problem with that version of the
script. Here's a pretty minimal reproduction:

-- >8 --

cat >bad <<\EOF
* commit whatever
| other stuff irrelevant
|
| diff --git a/foo b/foo
| --- a/foo
| --- b/foo
| @@ -100,6 +100,9
|  some
|  context
|  lines
| +some
| +new
| +lines
|  -context line with a leading minus
|  and other
|  context
EOF

contrib/diff-highlight/diff-highlight <bad

-- 8< --

which produce:

...
|  -context line with a leading minus
| +some
| +new
| +lines
...

The issue bisects to 7e4ffb4c17 (diff-highlight: add support for --graph
output, 2016-08-29). I think the problem is the "\s+" at the end of the
$GRAPH regex, which soaks up the space for the context, and accidentally
treats the "-" line as a preimage removal.

But just switching that to "\s" doesn't quite work. We may have an
arbitrary number of spaces between the graph ascii-art and the diff.
E.g., if you have a commit at the base of a branch (the test in
contrib/diff-highlight shows this case).

So I think you'd have to record the indent of the previous hunk header,
and then make sure that the indent matched that. But even there, I think
we're subject to false positives if a commit message contains a hunk
header (it's indented an extra 4 characters, but we'd accidentally soak
that up thinking it was graph indentation).

To make it bullet-proof, I think we'd have to actually parse the graph
structure, finding a "*" line and then accepting only an indent that
matched it.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [BUG] log --graph corrupts patch
  2018-03-20 15:58     ` Jeff King
@ 2018-03-21  5:47       ` Jeff King
  2018-03-21  5:47         ` [PATCH 1/7] diff-highlight: correct test graph diagram Jeff King
                           ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Jeff King @ 2018-03-21  5:47 UTC (permalink / raw)
  To: phillip.wood; +Cc: Brian Henderson, Git Mailing List

On Tue, Mar 20, 2018 at 11:58:14AM -0400, Jeff King wrote:

> The issue bisects to 7e4ffb4c17 (diff-highlight: add support for --graph
> output, 2016-08-29). I think the problem is the "\s+" at the end of the
> $GRAPH regex, which soaks up the space for the context, and accidentally
> treats the "-" line as a preimage removal.
> 
> But just switching that to "\s" doesn't quite work. We may have an
> arbitrary number of spaces between the graph ascii-art and the diff.
> E.g., if you have a commit at the base of a branch (the test in
> contrib/diff-highlight shows this case).
> 
> So I think you'd have to record the indent of the previous hunk header,
> and then make sure that the indent matched that. But even there, I think
> we're subject to false positives if a commit message contains a hunk
> header (it's indented an extra 4 characters, but we'd accidentally soak
> that up thinking it was graph indentation).
> 
> To make it bullet-proof, I think we'd have to actually parse the graph
> structure, finding a "*" line and then accepting only an indent that
> matched it.

Wow. Nerd snipe successful. This turned out to be quite tricky, but also
kind of interesting.

Here's a series which fixes it. The meaty bits are in the final commit;
the rest is just preparatory cleanup, and adding some tests (all are
cases which I managed to break while fixing this).

  [1/7]: diff-highlight: correct test graph diagram
  [2/7]: diff-highlight: use test_tick in graph test
  [3/7]: diff-highlight: prefer "echo" to "cat" in tests
  [4/7]: diff-highlight: test interleaved parallel lines of history
  [5/7]: diff-highlight: test graphs with --color
  [6/7]: diff-highlight: factor out flush_hunk() helper
  [7/7]: diff-highlight: detect --graph by indent

 contrib/diff-highlight/DiffHighlight.pm       | 89 +++++++++++++++----
 .../diff-highlight/t/t9400-diff-highlight.sh  | 81 +++++++++++++----
 2 files changed, 133 insertions(+), 37 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/7] diff-highlight: correct test graph diagram
  2018-03-21  5:47       ` Jeff King
@ 2018-03-21  5:47         ` Jeff King
  2018-03-21  5:48         ` [PATCH 2/7] diff-highlight: use test_tick in graph test Jeff King
                           ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2018-03-21  5:47 UTC (permalink / raw)
  To: phillip.wood; +Cc: Brian Henderson, Git Mailing List

We actually branch "A" off of "D". The sample "--graph"
output is right, but the left-to-right diagram is
misleading. Let's fix it.

Signed-off-by: Jeff King <peff@peff.net>
---
 contrib/diff-highlight/t/t9400-diff-highlight.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh
index 3b43dbed74..bf62f036c7 100755
--- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -52,8 +52,8 @@ test_strip_patch_header () {
 # dh_test_setup_history generates a contrived graph such that we have at least
 # 1 nesting (E) and 2 nestings (F).
 #
-#	      A branch
-#	     /
+#	  A branch
+#	 /
 #	D---E---F master
 #
 #	git log --all --graph
-- 
2.17.0.rc0.402.ged0b3fd1ee


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/7] diff-highlight: use test_tick in graph test
  2018-03-21  5:47       ` Jeff King
  2018-03-21  5:47         ` [PATCH 1/7] diff-highlight: correct test graph diagram Jeff King
@ 2018-03-21  5:48         ` Jeff King
  2018-03-21  5:48         ` [PATCH 3/7] diff-highlight: prefer "echo" to "cat" in tests Jeff King
                           ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2018-03-21  5:48 UTC (permalink / raw)
  To: phillip.wood; +Cc: Brian Henderson, Git Mailing List

The exact ordering output by Git may depend on the commit
timestamps, so let's make sure they're actually
monotonically increasing, and not all the same (or worse,
subject to how long the test script takes to run).

Let's use test_tick to make sure this is stable. Note that
we actually have to rearrange the order of the branches to
match the expected graph structure (which means that
previously we might racily have been testing a slightly
different output, though the test is written in such a way
that we'd still pass).

Signed-off-by: Jeff King <peff@peff.net>
---
 .../diff-highlight/t/t9400-diff-highlight.sh   | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh
index bf62f036c7..deab90ed91 100755
--- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -52,9 +52,9 @@ test_strip_patch_header () {
 # dh_test_setup_history generates a contrived graph such that we have at least
 # 1 nesting (E) and 2 nestings (F).
 #
-#	  A branch
+#	  A master
 #	 /
-#	D---E---F master
+#	D---E---F branch
 #
 #	git log --all --graph
 #	* commit
@@ -74,18 +74,22 @@ dh_test_setup_history () {
 
 	cat file1 >file &&
 	git add file &&
+	test_tick &&
 	git commit -m "D" &&
 
 	git checkout -b branch &&
 	cat file2 >file &&
-	git commit -a -m "A" &&
-
-	git checkout master &&
-	cat file2 >file &&
+	test_tick &&
 	git commit -a -m "E" &&
 
 	cat file3 >file &&
-	git commit -a -m "F"
+	test_tick &&
+	git commit -a -m "F" &&
+
+	git checkout master &&
+	cat file2 >file &&
+	test_tick &&
+	git commit -a -m "A"
 }
 
 left_trim () {
-- 
2.17.0.rc0.402.ged0b3fd1ee


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/7] diff-highlight: prefer "echo" to "cat" in tests
  2018-03-21  5:47       ` Jeff King
  2018-03-21  5:47         ` [PATCH 1/7] diff-highlight: correct test graph diagram Jeff King
  2018-03-21  5:48         ` [PATCH 2/7] diff-highlight: use test_tick in graph test Jeff King
@ 2018-03-21  5:48         ` Jeff King
  2018-03-21  5:49         ` [PATCH 4/7] diff-highlight: test interleaved parallel lines of history Jeff King
                           ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2018-03-21  5:48 UTC (permalink / raw)
  To: phillip.wood; +Cc: Brian Henderson, Git Mailing List

We generate a bunch of one-line files whose contents match
their names, and then generate our commits by cat-ing those
files. Let's just echo the contents directly, which saves
some processes.

Signed-off-by: Jeff King <peff@peff.net>
---
 contrib/diff-highlight/t/t9400-diff-highlight.sh | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh
index deab90ed91..3f02d31467 100755
--- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -68,26 +68,22 @@ test_strip_patch_header () {
 #	     D
 #
 dh_test_setup_history () {
-	echo "file1" >file1 &&
-	echo "file2" >file2 &&
-	echo "file3" >file3 &&
-
-	cat file1 >file &&
+	echo file1 >file &&
 	git add file &&
 	test_tick &&
 	git commit -m "D" &&
 
 	git checkout -b branch &&
-	cat file2 >file &&
+	echo file2 >file &&
 	test_tick &&
 	git commit -a -m "E" &&
 
-	cat file3 >file &&
+	echo file3 >file &&
 	test_tick &&
 	git commit -a -m "F" &&
 
 	git checkout master &&
-	cat file2 >file &&
+	echo file2 >file &&
 	test_tick &&
 	git commit -a -m "A"
 }
-- 
2.17.0.rc0.402.ged0b3fd1ee


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/7] diff-highlight: test interleaved parallel lines of history
  2018-03-21  5:47       ` Jeff King
                           ` (2 preceding siblings ...)
  2018-03-21  5:48         ` [PATCH 3/7] diff-highlight: prefer "echo" to "cat" in tests Jeff King
@ 2018-03-21  5:49         ` Jeff King
  2018-03-21  5:49         ` [PATCH 5/7] diff-highlight: test graphs with --color Jeff King
                           ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2018-03-21  5:49 UTC (permalink / raw)
  To: phillip.wood; +Cc: Brian Henderson, Git Mailing List

The graph test in t9400 covers the case of two simultaneous
branches, but all of the commits during this time are on the
right-hand branch. So we test a graph structure like:

  | |
  | * commit ...
  | |

but we never see the reverse, a commit on the left-hand
branch:

  | |
  * | commit ...
  | |

Since this is an easy thing to get wrong when touching the
graph-matching code, let's cover it by adding one more
commit with its timestamp interleaved with the other branch.

Note that we need to pass --date-order to convince Git to
show it this way (since --topo-order tries to keep lines of
history separate).

Signed-off-by: Jeff King <peff@peff.net>
---
 .../diff-highlight/t/t9400-diff-highlight.sh  | 22 +++++++++++++------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh
index 3f02d31467..33bcdbc90f 100755
--- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -52,15 +52,17 @@ test_strip_patch_header () {
 # dh_test_setup_history generates a contrived graph such that we have at least
 # 1 nesting (E) and 2 nestings (F).
 #
-#	  A master
+#	  A---B master
 #	 /
 #	D---E---F branch
 #
 #	git log --all --graph
 #	* commit
-#	|    A
+#	|    B
 #	| * commit
 #	| |    F
+#	* | commit
+#	| |    A
 #	| * commit
 #	|/
 #	|    E
@@ -78,14 +80,20 @@ dh_test_setup_history () {
 	test_tick &&
 	git commit -a -m "E" &&
 
+	git checkout master &&
+	echo file2 >file &&
+	test_tick &&
+	git commit -a -m "A" &&
+
+	git checkout branch &&
 	echo file3 >file &&
 	test_tick &&
 	git commit -a -m "F" &&
 
 	git checkout master &&
-	echo file2 >file &&
+	echo file3 >file &&
 	test_tick &&
-	git commit -a -m "A"
+	git commit -a -m "B"
 }
 
 left_trim () {
@@ -246,12 +254,12 @@ test_expect_failure 'diff-highlight treats combining code points as a unit' '
 test_expect_success 'diff-highlight works with the --graph option' '
 	dh_test_setup_history &&
 
-	# topo-order so that the order of the commits is the same as with --graph
+	# date-order so that the commits are interleaved for both
 	# trim graph elements so we can do a diff
 	# trim leading space because our trim_graph is not perfect
-	git log --branches -p --topo-order |
+	git log --branches -p --date-order |
 		"$DIFF_HIGHLIGHT" | left_trim >graph.exp &&
-	git log --branches -p --graph |
+	git log --branches -p --date-order --graph |
 		"$DIFF_HIGHLIGHT" | trim_graph | left_trim >graph.act &&
 	test_cmp graph.exp graph.act
 '
-- 
2.17.0.rc0.402.ged0b3fd1ee


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/7] diff-highlight: test graphs with --color
  2018-03-21  5:47       ` Jeff King
                           ` (3 preceding siblings ...)
  2018-03-21  5:49         ` [PATCH 4/7] diff-highlight: test interleaved parallel lines of history Jeff King
@ 2018-03-21  5:49         ` Jeff King
  2018-03-21  5:56         ` [PATCH 6/7] diff-highlight: use flush() helper consistently Jeff King
                           ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2018-03-21  5:49 UTC (permalink / raw)
  To: phillip.wood; +Cc: Brian Henderson, Git Mailing List

Our tests send git's output directly to files or pipes, so
there will never be any color. Let's do at least one --color
test to make sure that we can handle this case (which we
currently can, but will be an easy thing to mess up when we
touch the graph code in a future patch).

We'll just cover the --graph case, since this is much more
complex than the earlier cases (i.e., if it manages to
highlight, then the non-graph case definitely would).

Signed-off-by: Jeff King <peff@peff.net>
---
 contrib/diff-highlight/t/t9400-diff-highlight.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh
index 33bcdbc90f..bf0c270d60 100755
--- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -264,6 +264,15 @@ test_expect_success 'diff-highlight works with the --graph option' '
 	test_cmp graph.exp graph.act
 '
 
+# Just reuse the previous graph test, but with --color.  Our trimming
+# doesn't know about color, so just sanity check that something got
+# highlighted.
+test_expect_success 'diff-highlight works with color graph' '
+	git log --branches -p --date-order --graph --color |
+		"$DIFF_HIGHLIGHT" | trim_graph | left_trim >graph &&
+	grep "\[7m" graph
+'
+
 # Most combined diffs won't meet diff-highlight's line-number filter. So we
 # create one here where one side drops a line and the other modifies it. That
 # should result in a diff like:
-- 
2.17.0.rc0.402.ged0b3fd1ee


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/7] diff-highlight: use flush() helper consistently
  2018-03-21  5:47       ` Jeff King
                           ` (4 preceding siblings ...)
  2018-03-21  5:49         ` [PATCH 5/7] diff-highlight: test graphs with --color Jeff King
@ 2018-03-21  5:56         ` Jeff King
  2018-03-21  5:59         ` [PATCH 7/7] diff-highlight: detect --graph by indent Jeff King
  2018-03-21 17:23         ` [BUG] log --graph corrupts patch Junio C Hamano
  7 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2018-03-21  5:56 UTC (permalink / raw)
  To: phillip.wood; +Cc: Brian Henderson, Git Mailing List

The current flush() helper only shows the queued diff but
does not clear the queue. This is conceptually a bug, but it
works because we only call it once at the end of the
program.

Let's teach it to clear the queue, which will let us use it
in more places (one for now, but more in future patches).

Signed-off-by: Jeff King <peff@peff.net>
---
 contrib/diff-highlight/DiffHighlight.pm | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/contrib/diff-highlight/DiffHighlight.pm b/contrib/diff-highlight/DiffHighlight.pm
index 663992e530..e07cd5931d 100644
--- a/contrib/diff-highlight/DiffHighlight.pm
+++ b/contrib/diff-highlight/DiffHighlight.pm
@@ -46,10 +46,7 @@ sub handle_line {
 		push @added, $_;
 	}
 	else {
-		show_hunk(\@removed, \@added);
-		@removed = ();
-		@added = ();
-
+		flush();
 		$line_cb->($_);
 		$in_hunk = /^$GRAPH*$COLOR*[\@ ]/;
 	}
@@ -71,6 +68,8 @@ sub flush {
 	# Flush any queued hunk (this can happen when there is no trailing
 	# context in the final diff of the input).
 	show_hunk(\@removed, \@added);
+	@removed = ();
+	@added = ();
 }
 
 sub highlight_stdin {
-- 
2.17.0.rc0.402.ged0b3fd1ee


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 7/7] diff-highlight: detect --graph by indent
  2018-03-21  5:47       ` Jeff King
                           ` (5 preceding siblings ...)
  2018-03-21  5:56         ` [PATCH 6/7] diff-highlight: use flush() helper consistently Jeff King
@ 2018-03-21  5:59         ` Jeff King
  2018-03-22 10:59           ` Phillip Wood
  2018-03-21 17:23         ` [BUG] log --graph corrupts patch Junio C Hamano
  7 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2018-03-21  5:59 UTC (permalink / raw)
  To: phillip.wood; +Cc: Brian Henderson, Git Mailing List

This patch fixes a corner case where diff-highlight may
scramble some diffs when combined with --graph.

Commit 7e4ffb4c17 (diff-highlight: add support for --graph
output, 2016-08-29) taught diff-highlight to skip past the
graph characters at the start of each line with this regex:

  ($COLOR?\|$COLOR?\s+)*

I.e., any series of pipes separated by and followed by
arbitrary whitespace.  We need to match more than just a
single space because the commit in question may be indented
to accommodate other parts of the graph drawing. E.g.:

 * commit 1234abcd
 | ...
 | diff --git ...

has only a single space, but for the last commit before a
fork:

 | | |
 | * | commit 1234abcd
 | |/  ...
 | |   diff --git

the diff lines have more spaces between the pipes and the
start of the diff.

However, when we soak up all of those spaces with the
$GRAPH regex, we may accidentally include the leading space
for a context line. That means we may consider the actual
contents of a context line as part of the diff syntax. In
other words, something like this:

   normal context line
  -old line
  +new line
   -this is a context line with a leading dash

would cause us to see that final context line as a removal
line, and we'd end up showing the hunk in the wrong order:

  normal context line
  -old line
   -this is a context line with a leading dash
  +new line

Instead, let's a be a little more clever about parsing the
graph. We'll look for the actual "*" line that marks the
start of a commit, and record the indentation we see there.
Then we can skip past that indentation when checking whether
the line is a hunk header, removal, addition, etc.

There is one tricky thing: the indentation in bytes may be
different for various lines of the graph due to coloring.
E.g., the "*" on a commit line is generally shown without
color, but on the actual diff lines, it will be replaced
with a colorized "|" character, adding several bytes. We
work around this here by counting "visible" bytes. This is
unfortunately a bit more expensive, making us about twice as
slow to handle --graph output. But since this is meant to be
used interactively anyway, it's tolerably fast (and the
non-graph case is unaffected).

One alternative would be to search for hunk header lines and
use their indentation (since they'd have the same colors as
the diff lines which follow). But that just opens up
different corner cases. If we see:

  | |    @@ 1,2 1,3 @@

we cannot know if this is a real diff that has been
indented due to the graph, or if it's a context line that
happens to look like a diff header. We can only be sure of
the indent on the "*" lines, since we know those don't
contain arbitrary data (technically the user could include a
bunch of extra indentation via --format, but that's rare
enough to disregard).

Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Jeff King <peff@peff.net>
---
 contrib/diff-highlight/DiffHighlight.pm       | 78 +++++++++++++++----
 .../diff-highlight/t/t9400-diff-highlight.sh  | 28 +++++++
 2 files changed, 91 insertions(+), 15 deletions(-)

diff --git a/contrib/diff-highlight/DiffHighlight.pm b/contrib/diff-highlight/DiffHighlight.pm
index e07cd5931d..536754583b 100644
--- a/contrib/diff-highlight/DiffHighlight.pm
+++ b/contrib/diff-highlight/DiffHighlight.pm
@@ -21,34 +21,82 @@ my $RESET = "\x1b[m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
 my $BORING = qr/$COLOR|\s/;
 
-# The patch portion of git log -p --graph should only ever have preceding | and
-# not / or \ as merge history only shows up on the commit line.
-my $GRAPH = qr/$COLOR?\|$COLOR?\s+/;
-
 my @removed;
 my @added;
 my $in_hunk;
+my $graph_indent = 0;
 
 our $line_cb = sub { print @_ };
 our $flush_cb = sub { local $| = 1 };
 
-sub handle_line {
+# Count the visible width of a string, excluding any terminal color sequences.
+sub visible_width {
 	local $_ = shift;
+	my $ret = 0;
+	while (length) {
+		if (s/^$COLOR//) {
+			# skip colors
+		} elsif (s/^.//) {
+			$ret++;
+		}
+	}
+	return $ret;
+}
+
+# Return a substring of $str, omitting $len visible characters from the
+# beginning, where terminal color sequences do not count as visible.
+sub visible_substr {
+	my ($str, $len) = @_;
+	while ($len > 0) {
+		if ($str =~ s/^$COLOR//) {
+			next
+		}
+		$str =~ s/^.//;
+		$len--;
+	}
+	return $str;
+}
+
+sub handle_line {
+	my $orig = shift;
+	local $_ = $orig;
+
+	# match a graph line that begins a commit
+	if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space
+	         $COLOR?\*$COLOR?[ ]   # a "*" with its trailing space
+	      (?:$COLOR?\|$COLOR?[ ])* # zero or more trailing "|"
+	                         [ ]*  # trailing whitespace for merges
+	    /x) {
+	        my $graph_prefix = $&;
+
+		# We must flush before setting graph indent, since the
+		# new commit may be indented differently from what we
+		# queued.
+		flush();
+		$graph_indent = visible_width($graph_prefix);
+
+	} elsif ($graph_indent) {
+		if (length($_) < $graph_indent) {
+			$graph_indent = 0;
+		} else {
+			$_ = visible_substr($_, $graph_indent);
+		}
+	}
 
 	if (!$in_hunk) {
-		$line_cb->($_);
-		$in_hunk = /^$GRAPH*$COLOR*\@\@ /;
+		$line_cb->($orig);
+		$in_hunk = /^$COLOR*\@\@ /;
 	}
-	elsif (/^$GRAPH*$COLOR*-/) {
-		push @removed, $_;
+	elsif (/^$COLOR*-/) {
+		push @removed, $orig;
 	}
-	elsif (/^$GRAPH*$COLOR*\+/) {
-		push @added, $_;
+	elsif (/^$COLOR*\+/) {
+		push @added, $orig;
 	}
 	else {
 		flush();
-		$line_cb->($_);
-		$in_hunk = /^$GRAPH*$COLOR*[\@ ]/;
+		$line_cb->($orig);
+		$in_hunk = /^$COLOR*[\@ ]/;
 	}
 
 	# Most of the time there is enough output to keep things streaming,
@@ -225,8 +273,8 @@ sub is_pair_interesting {
 	my $suffix_a = join('', @$a[($sa+1)..$#$a]);
 	my $suffix_b = join('', @$b[($sb+1)..$#$b]);
 
-	return $prefix_a !~ /^$GRAPH*$COLOR*-$BORING*$/ ||
-	       $prefix_b !~ /^$GRAPH*$COLOR*\+$BORING*$/ ||
+	return visible_substr($prefix_a, $graph_indent) !~ /^$COLOR*-$BORING*$/ ||
+	       visible_substr($prefix_b, $graph_indent) !~ /^$COLOR*\+$BORING*$/ ||
 	       $suffix_a !~ /^$BORING*$/ ||
 	       $suffix_b !~ /^$BORING*$/;
 }
diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh
index bf0c270d60..f6f5195d00 100755
--- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -310,4 +310,32 @@ test_expect_success 'diff-highlight ignores combined diffs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff-highlight handles --graph with leading dash' '
+	cat >file <<-\EOF &&
+	before
+	the old line
+	-leading dash
+	EOF
+	git add file &&
+	git commit -m before &&
+
+	sed s/old/new/ <file >file.tmp &&
+	mv file.tmp file &&
+	git add file &&
+	git commit -m after &&
+
+	cat >expect <<-EOF &&
+	--- a/file
+	+++ b/file
+	@@ -1,3 +1,3 @@
+	 before
+	-the ${CW}old${CR} line
+	+the ${CW}new${CR} line
+	 -leading dash
+	EOF
+	git log --graph -p -1 | "$DIFF_HIGHLIGHT" >actual.raw &&
+	trim_graph <actual.raw | sed -n "/^---/,\$p" >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.17.0.rc0.402.ged0b3fd1ee

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [BUG] log --graph corrupts patch
  2018-03-21  5:47       ` Jeff King
                           ` (6 preceding siblings ...)
  2018-03-21  5:59         ` [PATCH 7/7] diff-highlight: detect --graph by indent Jeff King
@ 2018-03-21 17:23         ` Junio C Hamano
  7 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2018-03-21 17:23 UTC (permalink / raw)
  To: Jeff King; +Cc: phillip.wood, Brian Henderson, Git Mailing List

Jeff King <peff@peff.net> writes:

>> To make it bullet-proof, I think we'd have to actually parse the graph
>> structure, finding a "*" line and then accepting only an indent that
>> matched it.
>
> Wow. Nerd snipe successful. This turned out to be quite tricky, but also
> kind of interesting.

The last patch looks quite "interesting" ;-).  The idea is sound and
probably the execution, too, but it indeed is tricky.

>
> Here's a series which fixes it. The meaty bits are in the final commit;
> the rest is just preparatory cleanup, and adding some tests (all are
> cases which I managed to break while fixing this).
>
>   [1/7]: diff-highlight: correct test graph diagram
>   [2/7]: diff-highlight: use test_tick in graph test
>   [3/7]: diff-highlight: prefer "echo" to "cat" in tests
>   [4/7]: diff-highlight: test interleaved parallel lines of history
>   [5/7]: diff-highlight: test graphs with --color
>   [6/7]: diff-highlight: factor out flush_hunk() helper
>   [7/7]: diff-highlight: detect --graph by indent
>
>  contrib/diff-highlight/DiffHighlight.pm       | 89 +++++++++++++++----
>  .../diff-highlight/t/t9400-diff-highlight.sh  | 81 +++++++++++++----
>  2 files changed, 133 insertions(+), 37 deletions(-)
>
> -Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 7/7] diff-highlight: detect --graph by indent
  2018-03-21  5:59         ` [PATCH 7/7] diff-highlight: detect --graph by indent Jeff King
@ 2018-03-22 10:59           ` Phillip Wood
  2018-03-22 11:18             ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Phillip Wood @ 2018-03-22 10:59 UTC (permalink / raw)
  To: Jeff King, phillip.wood; +Cc: Brian Henderson, Git Mailing List, Junio C Hamano

On 21/03/18 05:59, Jeff King wrote:
> This patch fixes a corner case where diff-highlight may
> scramble some diffs when combined with --graph.
> 
> Commit 7e4ffb4c17 (diff-highlight: add support for --graph
> output, 2016-08-29) taught diff-highlight to skip past the
> graph characters at the start of each line with this regex:
> 
>   ($COLOR?\|$COLOR?\s+)*
> 
> I.e., any series of pipes separated by and followed by
> arbitrary whitespace.  We need to match more than just a
> single space because the commit in question may be indented
> to accommodate other parts of the graph drawing. E.g.:
> 
>  * commit 1234abcd
>  | ...
>  | diff --git ...
> 
> has only a single space, but for the last commit before a
> fork:
> 
>  | | |
>  | * | commit 1234abcd
>  | |/  ...
>  | |   diff --git
> 
> the diff lines have more spaces between the pipes and the
> start of the diff.
> 
> However, when we soak up all of those spaces with the
> $GRAPH regex, we may accidentally include the leading space
> for a context line. That means we may consider the actual
> contents of a context line as part of the diff syntax. In
> other words, something like this:
> 
>    normal context line
>   -old line
>   +new line
>    -this is a context line with a leading dash
> 
> would cause us to see that final context line as a removal
> line, and we'd end up showing the hunk in the wrong order:
> 
>   normal context line
>   -old line
>    -this is a context line with a leading dash
>   +new line
> 
> Instead, let's a be a little more clever about parsing the
> graph. We'll look for the actual "*" line that marks the
> start of a commit, and record the indentation we see there.
> Then we can skip past that indentation when checking whether
> the line is a hunk header, removal, addition, etc.
> 
> There is one tricky thing: the indentation in bytes may be
> different for various lines of the graph due to coloring.
> E.g., the "*" on a commit line is generally shown without
> color, but on the actual diff lines, it will be replaced
> with a colorized "|" character, adding several bytes. We
> work around this here by counting "visible" bytes. This is
> unfortunately a bit more expensive, making us about twice as
> slow to handle --graph output. But since this is meant to be
> used interactively anyway, it's tolerably fast (and the
> non-graph case is unaffected).
> 
> One alternative would be to search for hunk header lines and
> use their indentation (since they'd have the same colors as
> the diff lines which follow). But that just opens up
> different corner cases. If we see:
> 
>   | |    @@ 1,2 1,3 @@
> 
> we cannot know if this is a real diff that has been
> indented due to the graph, or if it's a context line that
> happens to look like a diff header. We can only be sure of
> the indent on the "*" lines, since we know those don't
> contain arbitrary data (technically the user could include a
> bunch of extra indentation via --format, but that's rare
> enough to disregard).
> 
> Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  contrib/diff-highlight/DiffHighlight.pm       | 78 +++++++++++++++----
>  .../diff-highlight/t/t9400-diff-highlight.sh  | 28 +++++++
>  2 files changed, 91 insertions(+), 15 deletions(-)
> 
> diff --git a/contrib/diff-highlight/DiffHighlight.pm b/contrib/diff-highlight/DiffHighlight.pm
> index e07cd5931d..536754583b 100644
> --- a/contrib/diff-highlight/DiffHighlight.pm
> +++ b/contrib/diff-highlight/DiffHighlight.pm
> @@ -21,34 +21,82 @@ my $RESET = "\x1b[m";
>  my $COLOR = qr/\x1b\[[0-9;]*m/;
>  my $BORING = qr/$COLOR|\s/;
>  
> -# The patch portion of git log -p --graph should only ever have preceding | and
> -# not / or \ as merge history only shows up on the commit line.
> -my $GRAPH = qr/$COLOR?\|$COLOR?\s+/;
> -
>  my @removed;
>  my @added;
>  my $in_hunk;
> +my $graph_indent = 0;
>  
>  our $line_cb = sub { print @_ };
>  our $flush_cb = sub { local $| = 1 };
>  
> -sub handle_line {
> +# Count the visible width of a string, excluding any terminal color sequences.
> +sub visible_width {
>  	local $_ = shift;
> +	my $ret = 0;
> +	while (length) {
> +		if (s/^$COLOR//) {
> +			# skip colors
> +		} elsif (s/^.//) {
> +			$ret++;
> +		}
> +	}
> +	return $ret;
> +}
> +
> +# Return a substring of $str, omitting $len visible characters from the
> +# beginning, where terminal color sequences do not count as visible.
> +sub visible_substr {
> +	my ($str, $len) = @_;
> +	while ($len > 0) {
> +		if ($str =~ s/^$COLOR//) {
> +			next
> +		}
> +		$str =~ s/^.//;
> +		$len--;
> +	}
> +	return $str;
> +}
> +
> +sub handle_line {
> +	my $orig = shift;
> +	local $_ = $orig;
> +
> +	# match a graph line that begins a commit
> +	if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space
> +	         $COLOR?\*$COLOR?[ ]   # a "*" with its trailing space
> +	      (?:$COLOR?\|$COLOR?[ ])* # zero or more trailing "|"
> +	                         [ ]*  # trailing whitespace for merges

Hi Peff, thanks for looking at this. I've only had a quick look through
but I wonder if this will be confused by commit messages that contain
  * bullet points
  * like this

Best Wishes

Phillip

> +	    /x) {
> +	        my $graph_prefix = $&;
> +
> +		# We must flush before setting graph indent, since the
> +		# new commit may be indented differently from what we
> +		# queued.
> +		flush();
> +		$graph_indent = visible_width($graph_prefix);
> +
> +	} elsif ($graph_indent) {
> +		if (length($_) < $graph_indent) {
> +			$graph_indent = 0;
> +		} else {
> +			$_ = visible_substr($_, $graph_indent);
> +		}
> +	}
>  
>  	if (!$in_hunk) {
> -		$line_cb->($_);
> -		$in_hunk = /^$GRAPH*$COLOR*\@\@ /;
> +		$line_cb->($orig);
> +		$in_hunk = /^$COLOR*\@\@ /;
>  	}
> -	elsif (/^$GRAPH*$COLOR*-/) {
> -		push @removed, $_;
> +	elsif (/^$COLOR*-/) {
> +		push @removed, $orig;
>  	}
> -	elsif (/^$GRAPH*$COLOR*\+/) {
> -		push @added, $_;
> +	elsif (/^$COLOR*\+/) {
> +		push @added, $orig;
>  	}
>  	else {
>  		flush();
> -		$line_cb->($_);
> -		$in_hunk = /^$GRAPH*$COLOR*[\@ ]/;
> +		$line_cb->($orig);
> +		$in_hunk = /^$COLOR*[\@ ]/;
>  	}
>  
>  	# Most of the time there is enough output to keep things streaming,
> @@ -225,8 +273,8 @@ sub is_pair_interesting {
>  	my $suffix_a = join('', @$a[($sa+1)..$#$a]);
>  	my $suffix_b = join('', @$b[($sb+1)..$#$b]);
>  
> -	return $prefix_a !~ /^$GRAPH*$COLOR*-$BORING*$/ ||
> -	       $prefix_b !~ /^$GRAPH*$COLOR*\+$BORING*$/ ||
> +	return visible_substr($prefix_a, $graph_indent) !~ /^$COLOR*-$BORING*$/ ||
> +	       visible_substr($prefix_b, $graph_indent) !~ /^$COLOR*\+$BORING*$/ ||
>  	       $suffix_a !~ /^$BORING*$/ ||
>  	       $suffix_b !~ /^$BORING*$/;
>  }
> diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh
> index bf0c270d60..f6f5195d00 100755
> --- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
> +++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
> @@ -310,4 +310,32 @@ test_expect_success 'diff-highlight ignores combined diffs' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'diff-highlight handles --graph with leading dash' '
> +	cat >file <<-\EOF &&
> +	before
> +	the old line
> +	-leading dash
> +	EOF
> +	git add file &&
> +	git commit -m before &&
> +
> +	sed s/old/new/ <file >file.tmp &&
> +	mv file.tmp file &&
> +	git add file &&
> +	git commit -m after &&
> +
> +	cat >expect <<-EOF &&
> +	--- a/file
> +	+++ b/file
> +	@@ -1,3 +1,3 @@
> +	 before
> +	-the ${CW}old${CR} line
> +	+the ${CW}new${CR} line
> +	 -leading dash
> +	EOF
> +	git log --graph -p -1 | "$DIFF_HIGHLIGHT" >actual.raw &&
> +	trim_graph <actual.raw | sed -n "/^---/,\$p" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 7/7] diff-highlight: detect --graph by indent
  2018-03-22 10:59           ` Phillip Wood
@ 2018-03-22 11:18             ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2018-03-22 11:18 UTC (permalink / raw)
  To: phillip.wood; +Cc: Brian Henderson, Git Mailing List, Junio C Hamano

On Thu, Mar 22, 2018 at 10:59:31AM +0000, Phillip Wood wrote:

> > +sub handle_line {
> > +	my $orig = shift;
> > +	local $_ = $orig;
> > +
> > +	# match a graph line that begins a commit
> > +	if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space
> > +	         $COLOR?\*$COLOR?[ ]   # a "*" with its trailing space
> > +	      (?:$COLOR?\|$COLOR?[ ])* # zero or more trailing "|"
> > +	                         [ ]*  # trailing whitespace for merges
> 
> Hi Peff, thanks for looking at this. I've only had a quick look through
> but I wonder if this will be confused by commit messages that contain
>   * bullet points
>   * like this

I think we should be OK because the commit messages are indented by 4
spaces, and we allow only single spaces amidst the graph-drawing bits
(and we respect the "*" only in those graph-drawing bits).

So I think you could fool it with something like:

  git log --graph --format='* oops!'

or maybe even:

  git log --graph --format='%B'

but not with a regular git-log format. Those final corner cases I don't
think we can overcome; it's just syntactically ambiguous. Unless perhaps
we traced the graph lines from the start of the output and knew how many
to expect, but that seems rather complicated.

Ultimately the best solution would be to avoid this parsing nonsense
entirely by teaching Git's internal diff to produce the highlighting
internally.

-Peff

PS I also eyeballed the results of "git log --graph -p --no-merges
  -1000" piped through the old and new versions. There are several
  changes, but they all look like improvements.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-03-22 11:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-19 10:21 [BUG] log --graph corrupts patch Phillip Wood
2018-03-20  6:09 ` Jeff King
2018-03-20  9:58   ` Phillip Wood
2018-03-20 15:58     ` Jeff King
2018-03-21  5:47       ` Jeff King
2018-03-21  5:47         ` [PATCH 1/7] diff-highlight: correct test graph diagram Jeff King
2018-03-21  5:48         ` [PATCH 2/7] diff-highlight: use test_tick in graph test Jeff King
2018-03-21  5:48         ` [PATCH 3/7] diff-highlight: prefer "echo" to "cat" in tests Jeff King
2018-03-21  5:49         ` [PATCH 4/7] diff-highlight: test interleaved parallel lines of history Jeff King
2018-03-21  5:49         ` [PATCH 5/7] diff-highlight: test graphs with --color Jeff King
2018-03-21  5:56         ` [PATCH 6/7] diff-highlight: use flush() helper consistently Jeff King
2018-03-21  5:59         ` [PATCH 7/7] diff-highlight: detect --graph by indent Jeff King
2018-03-22 10:59           ` Phillip Wood
2018-03-22 11:18             ` Jeff King
2018-03-21 17:23         ` [BUG] log --graph corrupts patch Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).