git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: phillip.wood@dunelm.org.uk
Cc: Brian Henderson <henderson.bj@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: [PATCH 7/7] diff-highlight: detect --graph by indent
Date: Wed, 21 Mar 2018 01:59:01 -0400	[thread overview]
Message-ID: <20180321055900.GB15674@sigill.intra.peff.net> (raw)
In-Reply-To: <20180321054718.GA13936@sigill.intra.peff.net>

This patch fixes a corner case where diff-highlight may
scramble some diffs when combined with --graph.

Commit 7e4ffb4c17 (diff-highlight: add support for --graph
output, 2016-08-29) taught diff-highlight to skip past the
graph characters at the start of each line with this regex:

  ($COLOR?\|$COLOR?\s+)*

I.e., any series of pipes separated by and followed by
arbitrary whitespace.  We need to match more than just a
single space because the commit in question may be indented
to accommodate other parts of the graph drawing. E.g.:

 * commit 1234abcd
 | ...
 | diff --git ...

has only a single space, but for the last commit before a
fork:

 | | |
 | * | commit 1234abcd
 | |/  ...
 | |   diff --git

the diff lines have more spaces between the pipes and the
start of the diff.

However, when we soak up all of those spaces with the
$GRAPH regex, we may accidentally include the leading space
for a context line. That means we may consider the actual
contents of a context line as part of the diff syntax. In
other words, something like this:

   normal context line
  -old line
  +new line
   -this is a context line with a leading dash

would cause us to see that final context line as a removal
line, and we'd end up showing the hunk in the wrong order:

  normal context line
  -old line
   -this is a context line with a leading dash
  +new line

Instead, let's a be a little more clever about parsing the
graph. We'll look for the actual "*" line that marks the
start of a commit, and record the indentation we see there.
Then we can skip past that indentation when checking whether
the line is a hunk header, removal, addition, etc.

There is one tricky thing: the indentation in bytes may be
different for various lines of the graph due to coloring.
E.g., the "*" on a commit line is generally shown without
color, but on the actual diff lines, it will be replaced
with a colorized "|" character, adding several bytes. We
work around this here by counting "visible" bytes. This is
unfortunately a bit more expensive, making us about twice as
slow to handle --graph output. But since this is meant to be
used interactively anyway, it's tolerably fast (and the
non-graph case is unaffected).

One alternative would be to search for hunk header lines and
use their indentation (since they'd have the same colors as
the diff lines which follow). But that just opens up
different corner cases. If we see:

  | |    @@ 1,2 1,3 @@

we cannot know if this is a real diff that has been
indented due to the graph, or if it's a context line that
happens to look like a diff header. We can only be sure of
the indent on the "*" lines, since we know those don't
contain arbitrary data (technically the user could include a
bunch of extra indentation via --format, but that's rare
enough to disregard).

Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Jeff King <peff@peff.net>
---
 contrib/diff-highlight/DiffHighlight.pm       | 78 +++++++++++++++----
 .../diff-highlight/t/t9400-diff-highlight.sh  | 28 +++++++
 2 files changed, 91 insertions(+), 15 deletions(-)

diff --git a/contrib/diff-highlight/DiffHighlight.pm b/contrib/diff-highlight/DiffHighlight.pm
index e07cd5931d..536754583b 100644
--- a/contrib/diff-highlight/DiffHighlight.pm
+++ b/contrib/diff-highlight/DiffHighlight.pm
@@ -21,34 +21,82 @@ my $RESET = "\x1b[m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
 my $BORING = qr/$COLOR|\s/;
 
-# The patch portion of git log -p --graph should only ever have preceding | and
-# not / or \ as merge history only shows up on the commit line.
-my $GRAPH = qr/$COLOR?\|$COLOR?\s+/;
-
 my @removed;
 my @added;
 my $in_hunk;
+my $graph_indent = 0;
 
 our $line_cb = sub { print @_ };
 our $flush_cb = sub { local $| = 1 };
 
-sub handle_line {
+# Count the visible width of a string, excluding any terminal color sequences.
+sub visible_width {
 	local $_ = shift;
+	my $ret = 0;
+	while (length) {
+		if (s/^$COLOR//) {
+			# skip colors
+		} elsif (s/^.//) {
+			$ret++;
+		}
+	}
+	return $ret;
+}
+
+# Return a substring of $str, omitting $len visible characters from the
+# beginning, where terminal color sequences do not count as visible.
+sub visible_substr {
+	my ($str, $len) = @_;
+	while ($len > 0) {
+		if ($str =~ s/^$COLOR//) {
+			next
+		}
+		$str =~ s/^.//;
+		$len--;
+	}
+	return $str;
+}
+
+sub handle_line {
+	my $orig = shift;
+	local $_ = $orig;
+
+	# match a graph line that begins a commit
+	if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space
+	         $COLOR?\*$COLOR?[ ]   # a "*" with its trailing space
+	      (?:$COLOR?\|$COLOR?[ ])* # zero or more trailing "|"
+	                         [ ]*  # trailing whitespace for merges
+	    /x) {
+	        my $graph_prefix = $&;
+
+		# We must flush before setting graph indent, since the
+		# new commit may be indented differently from what we
+		# queued.
+		flush();
+		$graph_indent = visible_width($graph_prefix);
+
+	} elsif ($graph_indent) {
+		if (length($_) < $graph_indent) {
+			$graph_indent = 0;
+		} else {
+			$_ = visible_substr($_, $graph_indent);
+		}
+	}
 
 	if (!$in_hunk) {
-		$line_cb->($_);
-		$in_hunk = /^$GRAPH*$COLOR*\@\@ /;
+		$line_cb->($orig);
+		$in_hunk = /^$COLOR*\@\@ /;
 	}
-	elsif (/^$GRAPH*$COLOR*-/) {
-		push @removed, $_;
+	elsif (/^$COLOR*-/) {
+		push @removed, $orig;
 	}
-	elsif (/^$GRAPH*$COLOR*\+/) {
-		push @added, $_;
+	elsif (/^$COLOR*\+/) {
+		push @added, $orig;
 	}
 	else {
 		flush();
-		$line_cb->($_);
-		$in_hunk = /^$GRAPH*$COLOR*[\@ ]/;
+		$line_cb->($orig);
+		$in_hunk = /^$COLOR*[\@ ]/;
 	}
 
 	# Most of the time there is enough output to keep things streaming,
@@ -225,8 +273,8 @@ sub is_pair_interesting {
 	my $suffix_a = join('', @$a[($sa+1)..$#$a]);
 	my $suffix_b = join('', @$b[($sb+1)..$#$b]);
 
-	return $prefix_a !~ /^$GRAPH*$COLOR*-$BORING*$/ ||
-	       $prefix_b !~ /^$GRAPH*$COLOR*\+$BORING*$/ ||
+	return visible_substr($prefix_a, $graph_indent) !~ /^$COLOR*-$BORING*$/ ||
+	       visible_substr($prefix_b, $graph_indent) !~ /^$COLOR*\+$BORING*$/ ||
 	       $suffix_a !~ /^$BORING*$/ ||
 	       $suffix_b !~ /^$BORING*$/;
 }
diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh
index bf0c270d60..f6f5195d00 100755
--- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -310,4 +310,32 @@ test_expect_success 'diff-highlight ignores combined diffs' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff-highlight handles --graph with leading dash' '
+	cat >file <<-\EOF &&
+	before
+	the old line
+	-leading dash
+	EOF
+	git add file &&
+	git commit -m before &&
+
+	sed s/old/new/ <file >file.tmp &&
+	mv file.tmp file &&
+	git add file &&
+	git commit -m after &&
+
+	cat >expect <<-EOF &&
+	--- a/file
+	+++ b/file
+	@@ -1,3 +1,3 @@
+	 before
+	-the ${CW}old${CR} line
+	+the ${CW}new${CR} line
+	 -leading dash
+	EOF
+	git log --graph -p -1 | "$DIFF_HIGHLIGHT" >actual.raw &&
+	trim_graph <actual.raw | sed -n "/^---/,\$p" >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.17.0.rc0.402.ged0b3fd1ee

  parent reply	other threads:[~2018-03-21  5:59 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-19 10:21 [BUG] log --graph corrupts patch Phillip Wood
2018-03-20  6:09 ` Jeff King
2018-03-20  9:58   ` Phillip Wood
2018-03-20 15:58     ` Jeff King
2018-03-21  5:47       ` Jeff King
2018-03-21  5:47         ` [PATCH 1/7] diff-highlight: correct test graph diagram Jeff King
2018-03-21  5:48         ` [PATCH 2/7] diff-highlight: use test_tick in graph test Jeff King
2018-03-21  5:48         ` [PATCH 3/7] diff-highlight: prefer "echo" to "cat" in tests Jeff King
2018-03-21  5:49         ` [PATCH 4/7] diff-highlight: test interleaved parallel lines of history Jeff King
2018-03-21  5:49         ` [PATCH 5/7] diff-highlight: test graphs with --color Jeff King
2018-03-21  5:56         ` [PATCH 6/7] diff-highlight: use flush() helper consistently Jeff King
2018-03-21  5:59         ` Jeff King [this message]
2018-03-22 10:59           ` [PATCH 7/7] diff-highlight: detect --graph by indent Phillip Wood
2018-03-22 11:18             ` Jeff King
2018-03-21 17:23         ` [BUG] log --graph corrupts patch Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180321055900.GB15674@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=henderson.bj@gmail.com \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).