From: Jeff King <peff@peff.net>
To: Jacob Keller <jacob.keller@gmail.com>
Cc: Stefan Beller <sbeller@google.com>,
Junio C Hamano <gitster@pobox.com>,
Git mailing list <git@vger.kernel.org>,
Jacob Keller <jacob.e.keller@intel.com>
Subject: Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
Date: Wed, 20 Apr 2016 00:18:27 -0400 [thread overview]
Message-ID: <20160420041827.GA7627@sigill.intra.peff.net> (raw)
In-Reply-To: <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com>
[your original probably didn't make it to the list because of its 5MB
attachment; the list has a 100K limit; I'll try to quote liberally]
On Tue, Apr 19, 2016 at 04:17:50PM -0700, Jacob Keller wrote:
> I ran this version of the patch against the entire Linux kernel
> history, as I figured this has a large batch of C code to try and spot
> any issues.
>
> I ran something like the following command in bash
>
> $git rev-list HEAD | while read -r rev; do diff -F ^commit -u <(git
> show --format="commit %H" --no-compaction-heuristic $rev) <(git show
> --format="commit %H" --compaction-heuristic $rev); done >
> heuristic.patch
My earlier tests with the perl script were all done with "git log -p",
which will not show anything at all for merges (and my script wouldn't
know how to deal with combined diffs anyway). But I think this new patch
_will_ kick in for combined diffs (because it is built on individual
diffs). It will be interesting to see if this has any effect there, and
what it looks like.
We should be able to see it (on a small enough repository) with:
git log --format='commit %H' --cc --merges
and comparing the before/after.
> I've attached the file that I generated for the Linux history, it's
> rather large so hopefully I can get some help to spot any differences.
> The above approach will work for pretty much any repository, and works
> better than trying to generate the entire thing first and then diff
> (since that runs out of memory pretty fast).
I don't think there is much point in generating a complete diff between
the patches for every commit, when nobody can look at the whole thing.
Unless we have automated tooling to find "interesting" bits (and
certainly a tool to remove the boring "a comment got shifted by one"
lines would help; those are all known improvements, but it's the _other_
stuff we want to look).
But if we are not using automated tooling to find the needle in the
haystack, we might as well using sampling to make the dataset more
manageable. Adding "--since=1.year.ago" is one way, though we may want
to sample more randomly across time.
> So far, I haven't spotted anything that would want me to disable it,
> while I've spotted several cases where I felt that readability was
> improved. It's somewhat difficult to spot though.
I did find one case that I think is worse. Look at 857942fd1a in the
kernel. It has a pattern like this:
... surrounding code ...
function_one();
... more surrounding code ...
which becomes:
... surrounding code ...
function_two();
... more surrounding code
Without the new heuristic, that looks like:
-function_one();
+function_two();
+
but with it, it becomes:
+
+function_two();
-function_one();
which is kind of weird. Having the two directly next to each other reads
better to me. This is a pretty unusual diff, though, in that it did
change the surrounding whitespace (and if you look further in the diff,
the identical change is made elsewhere _without_ touching the
whitespace). So this is kind of an anomaly. And IMHO the weirdness here
is outweighed by the vast number of improvements elsewhere.
-Peff
next prev parent reply other threads:[~2016-04-20 4:18 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-19 15:21 [PATCHv5 0/2] xdiff: implement empty line chunk heuristic Stefan Beller
2016-04-19 15:21 ` [PATCH 1/2] xdiff: add recs_match helper function Stefan Beller
2016-04-19 15:21 ` [PATCH 2/2] xdiff: implement empty line chunk heuristic Stefan Beller
[not found] ` <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com>
2016-04-20 4:18 ` Jeff King [this message]
2016-04-20 4:37 ` Jeff King
2016-04-20 4:37 ` Stefan Beller
2016-04-29 20:29 ` Junio C Hamano
2016-04-29 20:59 ` Jacob Keller
2016-04-29 22:18 ` Junio C Hamano
2016-04-29 22:35 ` Stefan Beller
2016-04-29 22:39 ` Keller, Jacob E
2016-04-29 22:44 ` Stefan Beller
2016-04-29 22:48 ` Keller, Jacob E
2016-05-02 17:40 ` Junio C Hamano
2016-05-02 17:45 ` Stefan Beller
2016-05-02 18:02 ` Jeff King
2016-05-03 17:55 ` Jacob Keller
2016-04-30 3:06 ` Jeff King
2016-04-19 16:23 ` [PATCHv5 0/2] " Junio C Hamano
-- strict thread matches above, loose matches on Subject: below --
2016-04-18 21:12 [PATCH 0/2 v4] " Stefan Beller
2016-04-18 21:12 ` [PATCH 2/2] " Stefan Beller
2016-04-18 22:04 ` Jacob Keller
2016-04-18 22:24 ` Junio C Hamano
2016-04-19 5:03 ` Jeff King
2016-04-19 6:47 ` Stefan Beller
2016-04-19 7:00 ` Jeff King
2016-04-19 7:05 ` Stefan Beller
2016-04-19 15:17 ` Stefan Beller
2016-04-19 17:06 ` Jeff King
2016-04-19 23:02 ` Jacob Keller
2016-04-19 23:07 ` Junio C Hamano
2016-04-20 13:12 ` Michael S. Tsirkin
2016-04-20 16:09 ` Junio C Hamano
2016-04-20 16:17 ` Jeff King
2016-04-20 6:00 ` Junio C Hamano
2016-04-19 16:51 ` Junio C Hamano
2016-04-15 23:01 [RFC PATCH, WAS: "weird diff output?" v3a 0/2] implement shortest line diff " Stefan Beller
2016-04-15 23:01 ` [PATCH 2/2] xdiff: implement empty line " Stefan Beller
2016-04-15 23:05 ` Jacob Keller
2016-04-15 23:32 ` Jacob Keller
2016-04-15 23:45 ` Stefan Beller
2016-04-16 0:49 ` Junio C Hamano
2016-04-16 0:59 ` Stefan Beller
2016-04-16 1:07 ` Jacob Keller
2016-04-18 19:22 ` Junio C Hamano
2016-04-18 19:33 ` Stefan Beller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160420041827.GA7627@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jacob.e.keller@intel.com \
--cc=jacob.keller@gmail.com \
--cc=sbeller@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).