git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Σπύρος Βαζαίος" <sbazaios@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: Jeff King <peff@peff.net>, git <git@vger.kernel.org>,
	Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: make git diff output easier to read - use better diff heuristics
Date: Tue, 13 Feb 2018 20:25:55 +0200	[thread overview]
Message-ID: <CAOmC-AnXD=eSphS=TK4v72tSYDYEV2cgu_EsP4kgNGR=7yre8w@mail.gmail.com> (raw)
In-Reply-To: <CAGZ79kYXStMQCxnVjpV7n7miZEDAw4moR+0JksgTaRqHJwgSqw@mail.gmail.com>

Hi guys,

While I din't have the experience to express an opinion on this
matter, I have to say that the --no-indent-heuristic that Jeff
suggested worked great.
There were more than a handful of cases that this issue happened in my
diff file (all were the same: #endif followed by #ifdef).
Oh, and the language is C indeed.

Thanks for the great support!

On Tue, Feb 13, 2018 at 8:11 PM, Stefan Beller <sbeller@google.com> wrote:
> On Tue, Feb 13, 2018 at 8:08 AM, Jeff King <peff@peff.net> wrote:
>> On Tue, Feb 13, 2018 at 05:06:15PM +0200, Σπύρος Βαζαίος wrote:
>>
>>> Hi, I've came across an issue when using the git diff command. In
>>> particular the diff is different to what the svn diff produces. While
>>> both being correct the output of the svn diff is easier to understand
>>> than the git diff one. See the following issue on github where I
>>> initially reported the issue:
>>>
>>> https://github.com/git-for-windows/git/issues/1494
>>>
>>> I have Included a picture to better illustrate the problem. What do
>>> you think? Is it possible to make git diff output similar to svn diff
>>> regarding this issue?
>>
>> Try "git diff --no-indent-heuristic", which makes your example look
>> better. Here's a quick reproduction:
>>
>> -- >8 --
>> cat >foo.c <<\EOF
>> static struct foo bar[] = {
>> #ifdef SOMETHING
>>         { "stats.info", MNU_GBX_FSTAINF, etc },
>>         { "expired.info", MNU_GBX_FSTAINF, etc },
>>         { "info.log", MNU_GBX_INFOLOG, etc },
>> #endif
>>         { NULL, 0, 0 },
>> };
>> EOF
>>
>> sed '6a\
>> #ifdef WITH_EMU\
>>         { "SoftCam.Key", MNU_CFG_FSOFTCAMKEY, etc },\
>> #endif
>> ' <foo.c >bar.c
>> -- 8< --
>>
>> Now this looks ugly:
>>
>>   git diff --no-index foo.c bar.c
>>
>> but this does not:
>>
>>   git diff --no-index --no-indent-heuristic foo.c bar.c
>>
>> That heuristic is described in 433860f3d0 (diff: improve positioning of
>> add/delete blocks in diffs, 2016-09-05). I'm not sure exactly why it
>> does the wrong thing here, or if it would be possible to tweak the
>> weighting factors to make it work.
>
> https://github.com/git/git/commit/433860f3d0
>
> I would think that the long lines at the shift boundaries[1] have
> bad impact on the algorithm as they are of different length
> and influence the backstepping value. (vague theory)
>
> I wonder if we want to add language specifics to the heuristic,
> such that '}' or 'end' in a given line have a strong signal that the
> cut should be right after that line? i.e. that would decrease the
> badness score.
>
> While this is not a general solution (but quite language specific),
> I would think this is still a good idea to look at.
> (When coming up with the heuristic, most people thought the bad
> examples would come from exotic languages, but not C, which we
> are all so familiar with such that we thought we had a good grasp
> how to make a heuristic that works with C. Note that the example
> looks like C, though).
>
> [1]
> { "info.log", MNU_GBX_INFOLOG, etc },
> vs
> { "SoftCam.Key", MNU_CFG_FSOFTCAMKEY, etc },\
>
>
> (slightly unrelated tangent on better diffs in general:)
> A couple days ago I had another discussion on how
> diffs are best presented to users. That discussion was
> focused on a slightly higher layer, the diff algorithm itself,
> instead of a post-algorithm boost-fixup.
>
> It is usually easier to both write as well as review code that
> adds completely new features or projects instead of squashing
> bugs or integrating subtle new features in an existing code base.
>
> This observation lead to the conclusion that reviewing large
> locks of adjacent new lines is easier than reviewing blocks
> where deletions as well as additions are found. So we theorized
> that deletions should have more impact than additions when
> computing the diff itself. The case presented here has no deletions
> so this is not applicable, but most examples that came up once
> 2477ab2ea8 (diff: support anchoring line(s), 2017-11-27)
> was discussed would be likely to have better diffs.
>
> Stefan

  reply	other threads:[~2018-02-13 18:26 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-13 15:06 make git diff output easier to read - use better diff heuristics Σπύρος Βαζαίος
2018-02-13 16:08 ` Jeff King
2018-02-13 18:11   ` Stefan Beller
2018-02-13 18:25     ` Σπύρος Βαζαίος [this message]
2018-02-13 22:41       ` Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAOmC-AnXD=eSphS=TK4v72tSYDYEV2cgu_EsP4kgNGR=7yre8w@mail.gmail.com' \
    --to=sbazaios@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).