git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Jeff King <peff@peff.net>
Cc: "Σπύρος Βαζαίος" <sbazaios@gmail.com>, git <git@vger.kernel.org>,
	"Michael Haggerty" <mhagger@alum.mit.edu>
Subject: Re: make git diff output easier to read - use better diff heuristics
Date: Tue, 13 Feb 2018 10:11:49 -0800	[thread overview]
Message-ID: <CAGZ79kYXStMQCxnVjpV7n7miZEDAw4moR+0JksgTaRqHJwgSqw@mail.gmail.com> (raw)
In-Reply-To: <20180213160824.GA5203@sigill.intra.peff.net>

On Tue, Feb 13, 2018 at 8:08 AM, Jeff King <peff@peff.net> wrote:
> On Tue, Feb 13, 2018 at 05:06:15PM +0200, Σπύρος Βαζαίος wrote:
>
>> Hi, I've came across an issue when using the git diff command. In
>> particular the diff is different to what the svn diff produces. While
>> both being correct the output of the svn diff is easier to understand
>> than the git diff one. See the following issue on github where I
>> initially reported the issue:
>>
>> https://github.com/git-for-windows/git/issues/1494
>>
>> I have Included a picture to better illustrate the problem. What do
>> you think? Is it possible to make git diff output similar to svn diff
>> regarding this issue?
>
> Try "git diff --no-indent-heuristic", which makes your example look
> better. Here's a quick reproduction:
>
> -- >8 --
> cat >foo.c <<\EOF
> static struct foo bar[] = {
> #ifdef SOMETHING
>         { "stats.info", MNU_GBX_FSTAINF, etc },
>         { "expired.info", MNU_GBX_FSTAINF, etc },
>         { "info.log", MNU_GBX_INFOLOG, etc },
> #endif
>         { NULL, 0, 0 },
> };
> EOF
>
> sed '6a\
> #ifdef WITH_EMU\
>         { "SoftCam.Key", MNU_CFG_FSOFTCAMKEY, etc },\
> #endif
> ' <foo.c >bar.c
> -- 8< --
>
> Now this looks ugly:
>
>   git diff --no-index foo.c bar.c
>
> but this does not:
>
>   git diff --no-index --no-indent-heuristic foo.c bar.c
>
> That heuristic is described in 433860f3d0 (diff: improve positioning of
> add/delete blocks in diffs, 2016-09-05). I'm not sure exactly why it
> does the wrong thing here, or if it would be possible to tweak the
> weighting factors to make it work.

https://github.com/git/git/commit/433860f3d0

I would think that the long lines at the shift boundaries[1] have
bad impact on the algorithm as they are of different length
and influence the backstepping value. (vague theory)

I wonder if we want to add language specifics to the heuristic,
such that '}' or 'end' in a given line have a strong signal that the
cut should be right after that line? i.e. that would decrease the
badness score.

While this is not a general solution (but quite language specific),
I would think this is still a good idea to look at.
(When coming up with the heuristic, most people thought the bad
examples would come from exotic languages, but not C, which we
are all so familiar with such that we thought we had a good grasp
how to make a heuristic that works with C. Note that the example
looks like C, though).

[1]
{ "info.log", MNU_GBX_INFOLOG, etc },
vs
{ "SoftCam.Key", MNU_CFG_FSOFTCAMKEY, etc },\


(slightly unrelated tangent on better diffs in general:)
A couple days ago I had another discussion on how
diffs are best presented to users. That discussion was
focused on a slightly higher layer, the diff algorithm itself,
instead of a post-algorithm boost-fixup.

It is usually easier to both write as well as review code that
adds completely new features or projects instead of squashing
bugs or integrating subtle new features in an existing code base.

This observation lead to the conclusion that reviewing large
locks of adjacent new lines is easier than reviewing blocks
where deletions as well as additions are found. So we theorized
that deletions should have more impact than additions when
computing the diff itself. The case presented here has no deletions
so this is not applicable, but most examples that came up once
2477ab2ea8 (diff: support anchoring line(s), 2017-11-27)
was discussed would be likely to have better diffs.

Stefan

  reply	other threads:[~2018-02-13 18:11 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-13 15:06 make git diff output easier to read - use better diff heuristics Σπύρος Βαζαίος
2018-02-13 16:08 ` Jeff King
2018-02-13 18:11   ` Stefan Beller [this message]
2018-02-13 18:25     ` Σπύρος Βαζαίος
2018-02-13 22:41       ` Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kYXStMQCxnVjpV7n7miZEDAw4moR+0JksgTaRqHJwgSqw@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=sbazaios@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).