git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Phillip Wood <phillip.wood@talktalk.net>
To: Jeff King <peff@peff.net>, phillip.wood@dunelm.org.uk
Cc: Brian Henderson <henderson.bj@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 7/7] diff-highlight: detect --graph by indent
Date: Thu, 22 Mar 2018 10:59:31 +0000	[thread overview]
Message-ID: <0e2317fc-cfec-ba7f-8ded-59cc6b7f9a63@talktalk.net> (raw)
In-Reply-To: <20180321055900.GB15674@sigill.intra.peff.net>

On 21/03/18 05:59, Jeff King wrote:
> This patch fixes a corner case where diff-highlight may
> scramble some diffs when combined with --graph.
> 
> Commit 7e4ffb4c17 (diff-highlight: add support for --graph
> output, 2016-08-29) taught diff-highlight to skip past the
> graph characters at the start of each line with this regex:
> 
>   ($COLOR?\|$COLOR?\s+)*
> 
> I.e., any series of pipes separated by and followed by
> arbitrary whitespace.  We need to match more than just a
> single space because the commit in question may be indented
> to accommodate other parts of the graph drawing. E.g.:
> 
>  * commit 1234abcd
>  | ...
>  | diff --git ...
> 
> has only a single space, but for the last commit before a
> fork:
> 
>  | | |
>  | * | commit 1234abcd
>  | |/  ...
>  | |   diff --git
> 
> the diff lines have more spaces between the pipes and the
> start of the diff.
> 
> However, when we soak up all of those spaces with the
> $GRAPH regex, we may accidentally include the leading space
> for a context line. That means we may consider the actual
> contents of a context line as part of the diff syntax. In
> other words, something like this:
> 
>    normal context line
>   -old line
>   +new line
>    -this is a context line with a leading dash
> 
> would cause us to see that final context line as a removal
> line, and we'd end up showing the hunk in the wrong order:
> 
>   normal context line
>   -old line
>    -this is a context line with a leading dash
>   +new line
> 
> Instead, let's a be a little more clever about parsing the
> graph. We'll look for the actual "*" line that marks the
> start of a commit, and record the indentation we see there.
> Then we can skip past that indentation when checking whether
> the line is a hunk header, removal, addition, etc.
> 
> There is one tricky thing: the indentation in bytes may be
> different for various lines of the graph due to coloring.
> E.g., the "*" on a commit line is generally shown without
> color, but on the actual diff lines, it will be replaced
> with a colorized "|" character, adding several bytes. We
> work around this here by counting "visible" bytes. This is
> unfortunately a bit more expensive, making us about twice as
> slow to handle --graph output. But since this is meant to be
> used interactively anyway, it's tolerably fast (and the
> non-graph case is unaffected).
> 
> One alternative would be to search for hunk header lines and
> use their indentation (since they'd have the same colors as
> the diff lines which follow). But that just opens up
> different corner cases. If we see:
> 
>   | |    @@ 1,2 1,3 @@
> 
> we cannot know if this is a real diff that has been
> indented due to the graph, or if it's a context line that
> happens to look like a diff header. We can only be sure of
> the indent on the "*" lines, since we know those don't
> contain arbitrary data (technically the user could include a
> bunch of extra indentation via --format, but that's rare
> enough to disregard).
> 
> Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  contrib/diff-highlight/DiffHighlight.pm       | 78 +++++++++++++++----
>  .../diff-highlight/t/t9400-diff-highlight.sh  | 28 +++++++
>  2 files changed, 91 insertions(+), 15 deletions(-)
> 
> diff --git a/contrib/diff-highlight/DiffHighlight.pm b/contrib/diff-highlight/DiffHighlight.pm
> index e07cd5931d..536754583b 100644
> --- a/contrib/diff-highlight/DiffHighlight.pm
> +++ b/contrib/diff-highlight/DiffHighlight.pm
> @@ -21,34 +21,82 @@ my $RESET = "\x1b[m";
>  my $COLOR = qr/\x1b\[[0-9;]*m/;
>  my $BORING = qr/$COLOR|\s/;
>  
> -# The patch portion of git log -p --graph should only ever have preceding | and
> -# not / or \ as merge history only shows up on the commit line.
> -my $GRAPH = qr/$COLOR?\|$COLOR?\s+/;
> -
>  my @removed;
>  my @added;
>  my $in_hunk;
> +my $graph_indent = 0;
>  
>  our $line_cb = sub { print @_ };
>  our $flush_cb = sub { local $| = 1 };
>  
> -sub handle_line {
> +# Count the visible width of a string, excluding any terminal color sequences.
> +sub visible_width {
>  	local $_ = shift;
> +	my $ret = 0;
> +	while (length) {
> +		if (s/^$COLOR//) {
> +			# skip colors
> +		} elsif (s/^.//) {
> +			$ret++;
> +		}
> +	}
> +	return $ret;
> +}
> +
> +# Return a substring of $str, omitting $len visible characters from the
> +# beginning, where terminal color sequences do not count as visible.
> +sub visible_substr {
> +	my ($str, $len) = @_;
> +	while ($len > 0) {
> +		if ($str =~ s/^$COLOR//) {
> +			next
> +		}
> +		$str =~ s/^.//;
> +		$len--;
> +	}
> +	return $str;
> +}
> +
> +sub handle_line {
> +	my $orig = shift;
> +	local $_ = $orig;
> +
> +	# match a graph line that begins a commit
> +	if (/^(?:$COLOR?\|$COLOR?[ ])* # zero or more leading "|" with space
> +	         $COLOR?\*$COLOR?[ ]   # a "*" with its trailing space
> +	      (?:$COLOR?\|$COLOR?[ ])* # zero or more trailing "|"
> +	                         [ ]*  # trailing whitespace for merges

Hi Peff, thanks for looking at this. I've only had a quick look through
but I wonder if this will be confused by commit messages that contain
  * bullet points
  * like this

Best Wishes

Phillip

> +	    /x) {
> +	        my $graph_prefix = $&;
> +
> +		# We must flush before setting graph indent, since the
> +		# new commit may be indented differently from what we
> +		# queued.
> +		flush();
> +		$graph_indent = visible_width($graph_prefix);
> +
> +	} elsif ($graph_indent) {
> +		if (length($_) < $graph_indent) {
> +			$graph_indent = 0;
> +		} else {
> +			$_ = visible_substr($_, $graph_indent);
> +		}
> +	}
>  
>  	if (!$in_hunk) {
> -		$line_cb->($_);
> -		$in_hunk = /^$GRAPH*$COLOR*\@\@ /;
> +		$line_cb->($orig);
> +		$in_hunk = /^$COLOR*\@\@ /;
>  	}
> -	elsif (/^$GRAPH*$COLOR*-/) {
> -		push @removed, $_;
> +	elsif (/^$COLOR*-/) {
> +		push @removed, $orig;
>  	}
> -	elsif (/^$GRAPH*$COLOR*\+/) {
> -		push @added, $_;
> +	elsif (/^$COLOR*\+/) {
> +		push @added, $orig;
>  	}
>  	else {
>  		flush();
> -		$line_cb->($_);
> -		$in_hunk = /^$GRAPH*$COLOR*[\@ ]/;
> +		$line_cb->($orig);
> +		$in_hunk = /^$COLOR*[\@ ]/;
>  	}
>  
>  	# Most of the time there is enough output to keep things streaming,
> @@ -225,8 +273,8 @@ sub is_pair_interesting {
>  	my $suffix_a = join('', @$a[($sa+1)..$#$a]);
>  	my $suffix_b = join('', @$b[($sb+1)..$#$b]);
>  
> -	return $prefix_a !~ /^$GRAPH*$COLOR*-$BORING*$/ ||
> -	       $prefix_b !~ /^$GRAPH*$COLOR*\+$BORING*$/ ||
> +	return visible_substr($prefix_a, $graph_indent) !~ /^$COLOR*-$BORING*$/ ||
> +	       visible_substr($prefix_b, $graph_indent) !~ /^$COLOR*\+$BORING*$/ ||
>  	       $suffix_a !~ /^$BORING*$/ ||
>  	       $suffix_b !~ /^$BORING*$/;
>  }
> diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh b/contrib/diff-highlight/t/t9400-diff-highlight.sh
> index bf0c270d60..f6f5195d00 100755
> --- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
> +++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
> @@ -310,4 +310,32 @@ test_expect_success 'diff-highlight ignores combined diffs' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'diff-highlight handles --graph with leading dash' '
> +	cat >file <<-\EOF &&
> +	before
> +	the old line
> +	-leading dash
> +	EOF
> +	git add file &&
> +	git commit -m before &&
> +
> +	sed s/old/new/ <file >file.tmp &&
> +	mv file.tmp file &&
> +	git add file &&
> +	git commit -m after &&
> +
> +	cat >expect <<-EOF &&
> +	--- a/file
> +	+++ b/file
> +	@@ -1,3 +1,3 @@
> +	 before
> +	-the ${CW}old${CR} line
> +	+the ${CW}new${CR} line
> +	 -leading dash
> +	EOF
> +	git log --graph -p -1 | "$DIFF_HIGHLIGHT" >actual.raw &&
> +	trim_graph <actual.raw | sed -n "/^---/,\$p" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done
> 


  reply	other threads:[~2018-03-22 10:59 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-19 10:21 [BUG] log --graph corrupts patch Phillip Wood
2018-03-20  6:09 ` Jeff King
2018-03-20  9:58   ` Phillip Wood
2018-03-20 15:58     ` Jeff King
2018-03-21  5:47       ` Jeff King
2018-03-21  5:47         ` [PATCH 1/7] diff-highlight: correct test graph diagram Jeff King
2018-03-21  5:48         ` [PATCH 2/7] diff-highlight: use test_tick in graph test Jeff King
2018-03-21  5:48         ` [PATCH 3/7] diff-highlight: prefer "echo" to "cat" in tests Jeff King
2018-03-21  5:49         ` [PATCH 4/7] diff-highlight: test interleaved parallel lines of history Jeff King
2018-03-21  5:49         ` [PATCH 5/7] diff-highlight: test graphs with --color Jeff King
2018-03-21  5:56         ` [PATCH 6/7] diff-highlight: use flush() helper consistently Jeff King
2018-03-21  5:59         ` [PATCH 7/7] diff-highlight: detect --graph by indent Jeff King
2018-03-22 10:59           ` Phillip Wood [this message]
2018-03-22 11:18             ` Jeff King
2018-03-21 17:23         ` [BUG] log --graph corrupts patch Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0e2317fc-cfec-ba7f-8ded-59cc6b7f9a63@talktalk.net \
    --to=phillip.wood@talktalk.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=henderson.bj@gmail.com \
    --cc=peff@peff.net \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).