git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: George King <george.w.king@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Difficulty with parsing colorized diff output
Date: Sat, 8 Dec 2018 02:16:34 -0500	[thread overview]
Message-ID: <20181208071634.GA18272@sigill.intra.peff.net> (raw)
In-Reply-To: <799879BD-A2F0-487C-AA05-8054AC62C5BD@gmail.com>

On Fri, Dec 07, 2018 at 07:09:58PM -0500, George King wrote:

> My goal is to detect SGR color sequences, e.g. '\x1b[32m', that exist
> in the source text, and have my highlighter print escaped
> representations of those. For example, I have checked in files that
> are expected test outputs for tools that emit color codes, and diffs
> of those get very confusing.
> 
> Figuring out which color codes are from the source text and which were
> added by git is proving very difficult. The obvious solution is to
> turn git diff coloring off, but as far as I can see this also turns
> off all coloring for logs, which is undesirable.

Another gotcha that I don't think you've hit yet: whitespace
highlighting can color arbitrary parts of the line.

Try this, for example:

  printf 'foo\n' >old
  printf '\t \tfoo\n' >new
  git diff --no-index old new

So I'm not sure what you want to do can ever be completely robust. That
said, I'm not opposed to making Git's color output more regular. It
seems like a reasonable cleanup on its own.

(For the Git project itself, we long ago realized that putting raw color
codes into the source is a big pain when working with diffs, and we
instead use tools like t/test-lib-functions.sh's test_decode_color).

> * Context lines do not begin with reset code, but do end with a reset
> code. It would be preferable in my opinion if they had both (like
> every other line), or none at all.

Context lines do have both. It's just that the default color for context
lines is empty. ;)

But yes, I think it might be reasonable to recognize when an opening
color is empty, and turn the closing reset into a noop in that case (or
I guess you are also advocating the opposite: turn an empty color into a
noop \x1b[m or similar).

I think most of the coloring, including context lines, is coming from
diff.c:emit_diff_symbol_from_struct(). Instead of unconditionally
calling:

  context = diff_get_color_opt(o, DIFF_CONTEXT);
  reset = diff_get_color_opt(o, DIFF_RESET);

I suspect we could have a helper like this:

  static const char *diff_get_reset(const char *color)
  {
	if (!color || !*color)
		return "";
	return diff_colors[DIFF_RESET];
  }
  ...
  context = diff_get_color_opt(o, DIFF_CONTEXT);
  reset = diff_get_reset(context);

> * Added lines have excess codes after the plus sign. The entire prefix
> is, `\x1b[32m+\x1b[m\x1b[32m` translating to GREEN PLUS RESET GREEN.
> Emitting codes after the plus sign makes the parsing more complex and
> idiosyncratic.

Yeah, I've run into this one, too (when working on diff-highlight, which
similarly tries to pass-through Git's existing colors).

It's unfortunately not quite trivial, due to some interactions with
whitespace coloring. See this old patch:

  https://public-inbox.org/git/20101109220136.GA5617@sigill.intra.peff.net/

and the followup:

  https://public-inbox.org/git/20101118064050.GA12825@sigill.intra.peff.net/

> * Add a config feature to turn on log coloring while leaving diff coloring off.

Yes, the fact that --pretty log coloring is tied to diffs is mostly a
historical artifact. I think it would be OK to introduce a color.log
config option that defaults to the value of color.diff if unset. That
would be backwards compatible, but allow you to set them independently
if you wanted to.

-Peff

  reply	other threads:[~2018-12-08  7:16 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-08  0:09 Difficulty with parsing colorized diff output George King
2018-12-08  7:16 ` Jeff King [this message]
2018-12-11  3:26   ` Stefan Beller
2018-12-11 10:17     ` Jeff King
2018-12-11 14:47       ` George King
2018-12-11 16:28       ` Ævar Arnfjörð Bjarmason
2018-12-11 16:41         ` George King
2018-12-11 18:55           ` George King
2018-12-12 13:52             ` Jeff King
2018-12-12 12:49           ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181208071634.GA18272@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=george.w.king@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).