git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jacob Keller <jacob.keller@gmail.com>
Cc: Stefan Beller <sbeller@google.com>,
	Junio C Hamano <gitster@pobox.com>,
	Git mailing list <git@vger.kernel.org>,
	Jacob Keller <jacob.e.keller@intel.com>
Subject: Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
Date: Wed, 20 Apr 2016 00:18:27 -0400	[thread overview]
Message-ID: <20160420041827.GA7627@sigill.intra.peff.net> (raw)
In-Reply-To: <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com>

[your original probably didn't make it to the list because of its 5MB
 attachment; the list has a 100K limit; I'll try to quote liberally]

On Tue, Apr 19, 2016 at 04:17:50PM -0700, Jacob Keller wrote:

> I ran this version of the patch against the entire Linux kernel
> history, as I figured this has a large batch of C code to try and spot
> any issues.
> 
> I ran something like the following command in bash
> 
> $git rev-list HEAD | while read -r rev; do diff -F ^commit -u <(git
> show --format="commit %H" --no-compaction-heuristic $rev) <(git show
> --format="commit %H" --compaction-heuristic $rev); done >
> heuristic.patch

My earlier tests with the perl script were all done with "git log -p",
which will not show anything at all for merges (and my script wouldn't
know how to deal with combined diffs anyway). But I think this new patch
_will_ kick in for combined diffs (because it is built on individual
diffs). It will be interesting to see if this has any effect there, and
what it looks like.

We should be able to see it (on a small enough repository) with:

  git log --format='commit %H' --cc --merges

and comparing the before/after.

> I've attached the file that I generated for the Linux history, it's
> rather large so hopefully I can get some help to spot any differences.
> The above approach will work for pretty much any repository, and works
> better than trying to generate the entire thing first and then diff
> (since that runs out of memory pretty fast).

I don't think there is much point in generating a complete diff between
the patches for every commit, when nobody can look at the whole thing.
Unless we have automated tooling to find "interesting" bits (and
certainly a tool to remove the boring "a comment got shifted by one"
lines would help; those are all known improvements, but it's the _other_
stuff we want to look).

But if we are not using automated tooling to find the needle in the
haystack, we might as well using sampling to make the dataset more
manageable. Adding "--since=1.year.ago" is one way, though we may want
to sample more randomly across time.

> So far, I haven't spotted anything that would want me to disable it,
> while I've spotted several cases where I felt that readability was
> improved. It's somewhat difficult to spot though.

I did find one case that I think is worse. Look at 857942fd1a in the
kernel. It has a pattern like this:

  ... surrounding code ...

  function_one();
  ... more surrounding code ...

which becomes:

  ... surrounding code ...

  function_two();

  ... more surrounding code

Without the new heuristic, that looks like:

  -function_one();
  +function_two();
  +

but with it, it becomes:

  +
  +function_two();

  -function_one();

which is kind of weird. Having the two directly next to each other reads
better to me. This is a pretty unusual diff, though, in that it did
change the surrounding whitespace (and if you look further in the diff,
the identical change is made elsewhere _without_ touching the
whitespace). So this is kind of an anomaly. And IMHO the weirdness here
is outweighed by the vast number of improvements elsewhere.

-Peff

  parent reply	other threads:[~2016-04-20  4:18 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-19 15:21 [PATCHv5 0/2] xdiff: implement empty line chunk heuristic Stefan Beller
2016-04-19 15:21 ` [PATCH 1/2] xdiff: add recs_match helper function Stefan Beller
2016-04-19 15:21 ` [PATCH 2/2] xdiff: implement empty line chunk heuristic Stefan Beller
     [not found]   ` <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com>
2016-04-20  4:18     ` Jeff King [this message]
2016-04-20  4:37       ` Jeff King
2016-04-20  4:37       ` Stefan Beller
2016-04-29 20:29       ` Junio C Hamano
2016-04-29 20:59         ` Jacob Keller
2016-04-29 22:18           ` Junio C Hamano
2016-04-29 22:35             ` Stefan Beller
2016-04-29 22:39               ` Keller, Jacob E
2016-04-29 22:44                 ` Stefan Beller
2016-04-29 22:48                   ` Keller, Jacob E
2016-05-02 17:40                     ` Junio C Hamano
2016-05-02 17:45                       ` Stefan Beller
2016-05-02 18:02                       ` Jeff King
2016-05-03 17:55                         ` Jacob Keller
2016-04-30  3:06               ` Jeff King
2016-04-19 16:23 ` [PATCHv5 0/2] " Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2016-04-18 21:12 [PATCH 0/2 v4] " Stefan Beller
2016-04-18 21:12 ` [PATCH 2/2] " Stefan Beller
2016-04-18 22:04   ` Jacob Keller
2016-04-18 22:24     ` Junio C Hamano
2016-04-19  5:03   ` Jeff King
2016-04-19  6:47     ` Stefan Beller
2016-04-19  7:00       ` Jeff King
2016-04-19  7:05         ` Stefan Beller
2016-04-19 15:17     ` Stefan Beller
2016-04-19 17:06       ` Jeff King
2016-04-19 23:02         ` Jacob Keller
2016-04-19 23:07           ` Junio C Hamano
2016-04-20 13:12             ` Michael S. Tsirkin
2016-04-20 16:09               ` Junio C Hamano
2016-04-20 16:17                 ` Jeff King
2016-04-20  6:00         ` Junio C Hamano
2016-04-19 16:51     ` Junio C Hamano
2016-04-15 23:01 [RFC PATCH, WAS: "weird diff output?" v3a 0/2] implement shortest line diff " Stefan Beller
2016-04-15 23:01 ` [PATCH 2/2] xdiff: implement empty line " Stefan Beller
2016-04-15 23:05   ` Jacob Keller
2016-04-15 23:32     ` Jacob Keller
2016-04-15 23:45       ` Stefan Beller
2016-04-16  0:49   ` Junio C Hamano
2016-04-16  0:59     ` Stefan Beller
2016-04-16  1:07     ` Jacob Keller
2016-04-18 19:22       ` Junio C Hamano
2016-04-18 19:33         ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160420041827.GA7627@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jacob.e.keller@intel.com \
    --cc=jacob.keller@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).