From: Michael Haggerty <mhagger@alum.mit.edu>
To: git@vger.kernel.org
Cc: "Stefan Beller" <sbeller@google.com>,
"Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
"Jakub Narębski" <jnareb@gmail.com>,
"Jacob Keller" <jacob.keller@gmail.com>,
"Michael Haggerty" <mhagger@alum.mit.edu>
Subject: [PATCH v2 0/7] Better heuristics make prettier diffs
Date: Mon, 22 Aug 2016 13:22:39 +0200 [thread overview]
Message-ID: <cover.1471864378.git.mhagger@alum.mit.edu> (raw)
This is v2 of a patch series to improve the heuristics that `git diff`
and related commands use to position ambiguous blocks of added/deleted
lines. Thanks to Stefan, Jacob, Peff, Junio, and Jakub for their
comments about v1 [1]. This patch series is also available from my
GitHub account [2] as branch `diff-indent-heuristic`.
This version started out as a light touch-up of v1, but as I was
working I realized that it needed more changes. Among other things,
the final heuristic is now improved relative to the one proposed in
v1. Summary of changes:
* I changed the `--compaction-heuristic` code such that if a group of
added/deleted lines can be aligned with a group of deleted/added
lines in the other file, then that should be done rather than try to
slide to a blank line. In the existing code, the compaction
heuristic was attempted, incorrectly, *after* trying to align
groups, which doesn't make sense.
* I changed `recs_match()` similarly to `is_blank_line()`: now it
takes two `xrecord_t *` as arguments rather than an array of
`xrecord_t` and two indexes.
* I refactored the old `xdl_change_compact()` more thoroughly. All of
the index manipulation was pretty confusing, and I was having
trouble convincing myself that even the old code was working
correctly. So I introduced a `struct group`, which is like a cursor
that keeps track of a (possibly empty) group of changed lines in the
old or new version of a file. I added functions to initialize a
group to the first change in a file, to move to the next or previous
groups, and to slide a group forward or backward if possible (i.e.,
if the group is a "slider").
* In the course of that refactoring, I found (and fixed) another bug
in the `--compaction-heuristic` code: it didn't notice if the top
possible position of a slider had a blank line as its last line. See
the commit message of patch [5/5] for more details.
* I realized that the behavior of the indent heuristic from v1,
because it multiplied weighting factors times indent widths, would
behave differently if a project changed its convention for how many
spaces to use per level of indentation. That doesn't make sense, so
I changed the handling of indentation:
Now, the sum of the indentation width at the top and bottom of the
slider are added, *but those sums are only compared* between slider
positions, and the weight is multipled by -1, 0, or +1 depending on
whether the first indent sum is less than, equal, or greater than
the other. This should make the behavior relatively independent of
the project's indentation convention, and thus make the heuristic
work more consistently across projects.
The resulting heuristic works significantly better than the one
proposed in v1:
| repository | count | Git 2.9.0 | v1 | v2 |
| --------------------- | ----- | -------------- | -------------- | -------------- |
| afnetworking | 109 | 89 (81.7%) | 3 (2.8%) | 2 (1.8%) |
| alamofire | 30 | 18 (60.0%) | 2 (6.7%) | 0 (0.0%) |
| angular | 184 | 127 (69.0%) | 15 (8.2%) | 5 (2.7%) |
| animate | 313 | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) |
| ant | 380 | 356 (93.7%) | 22 (5.8%) | 15 (3.9%) |
| bugzilla | 306 | 263 (85.9%) | 24 (7.8%) | 15 (4.9%) |
| corefx | 126 | 91 (72.2%) | 15 (11.9%) | 6 (4.8%) |
| couchdb | 78 | 44 (56.4%) | 17 (21.8%) | 6 (7.7%) |
| cpython | 937 | 158 (16.9%) | 26 (2.8%) | 5 (0.5%) |
| discourse | 160 | 95 (59.4%) | 24 (15.0%) | 13 (8.1%) |
| docker | 307 | 194 (63.2%) | 16 (5.2%) | 8 (2.6%) |
| electron | 163 | 132 (81.0%) | 15 (9.2%) | 6 (3.7%) |
| git | 536 | 470 (87.7%) | 18 (3.4%) | 16 (3.0%) |
| gitflow | 127 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| ionic | 133 | 89 (66.9%) | 6 (4.5%) | 1 (0.8%) |
| ipython | 482 | 362 (75.1%) | 45 (9.3%) | 11 (2.3%) |
| junit | 161 | 147 (91.3%) | 4 (2.5%) | 1 (0.6%) |
| lighttable | 15 | 5 (33.3%) | 2 (13.3%) | 0 (0.0%) |
| magit | 88 | 75 (85.2%) | 5 (5.7%) | 0 (0.0%) |
| neural-style | 28 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| nodejs | 781 | 649 (83.1%) | 98 (12.5%) | 5 (0.6%) |
| phpmyadmin | 491 | 481 (98.0%) | 7 (1.4%) | 2 (0.4%) |
| react-native | 168 | 130 (77.4%) | 5 (3.0%) | 0 (0.0%) |
| rust | 171 | 128 (74.9%) | 17 (9.9%) | 14 (8.2%) |
| spark | 186 | 149 (80.1%) | 17 (9.1%) | 2 (1.1%) |
| tensorflow | 115 | 66 (57.4%) | 7 (6.1%) | 5 (4.3%) |
| test-more | 19 | 15 (78.9%) | 1 (5.3%) | 1 (5.3%) |
| test-unit | 51 | 34 (66.7%) | 8 (15.7%) | 2 (3.9%) |
| xmonad | 23 | 22 (95.7%) | 1 (4.3%) | 1 (4.3%) |
| --------------------- | ----- | -------------- | -------------- | -------------- |
| totals | 6668 | 4391 (65.9%) | 422 (6.3%) | 144 (2.2%) |
* I noticed that most of the "bonus" weights were actually negative,
so calling them "bonuses" was misleading. Therefore, I negated the
coefficients and now call them "penalties"
* I noticed that `git blame` was parsing diff options like
`--indent-heuristic` from the command line, but wasn't using the
values. That is fixed in a new patch [07/07].
* I redid all of the analysis and training with a bigger corpus. To
check whether I was overtraining the heuristic, I also did the
following experiment: I optimized the weights against a training set
consisting of only some of the repositories, then tested it against
the rest of the corpus. See the full results in the commit message
for patch [06/06].
* I added a couple of smoke tests.
The companion project [3] now provides an easy way to replicate and
extend these results, if anybody is interested.
The new heuristic still has to be enabled via the `--indent-heuristic`
command-line parameter or the `diff.indentHeuristic` configuration
setting. Before we release this, we should decide what the final UI
should look like and make it so. If we decide to replace the existing
compaction heuristic with this one and/or turn this heuristic on for
all users by default, those steps might look like branch
`indent-heuristic-default` in my GitHub repository [2].
Michael
[1] http://public-inbox.org/git/cover.1470259583.git.mhagger@alum.mit.edu/t/#u
[2] https://github.com/mhagger/git
[3] https://github.com/mhagger/diff-slider-tools
Michael Haggerty (7):
xdl_change_compact(): fix compaction heuristic to adjust ixo
xdl_change_compact(): only use heuristic if group can't be matched
is_blank_line(): take a single xrecord_t as argument
recs_match(): take two xrecord_t pointers as arguments
xdl_change_compact(): introduce the concept of a change group
diff: improve positioning of add/delete blocks in diffs
blame: actually use the diff opts parsed from the command line
Documentation/diff-config.txt | 7 +-
Documentation/diff-options.txt | 9 +-
builtin/blame.c | 11 +-
diff.c | 23 +-
git-add--interactive.perl | 5 +-
t/t4059-diff-indent.sh | 160 +++++++++++
xdiff/xdiff.h | 1 +
xdiff/xdiffi.c | 635 ++++++++++++++++++++++++++++++++++-------
8 files changed, 736 insertions(+), 115 deletions(-)
create mode 100755 t/t4059-diff-indent.sh
--
2.9.3
next reply other threads:[~2016-08-22 11:24 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-22 11:22 Michael Haggerty [this message]
2016-08-22 11:22 ` [PATCH v2 1/7] xdl_change_compact(): fix compaction heuristic to adjust ixo Michael Haggerty
2016-08-22 11:22 ` [PATCH v2 2/7] xdl_change_compact(): only use heuristic if group can't be matched Michael Haggerty
2016-08-22 11:22 ` [PATCH v2 3/7] is_blank_line(): take a single xrecord_t as argument Michael Haggerty
2016-08-22 11:22 ` [PATCH v2 4/7] recs_match(): take two xrecord_t pointers as arguments Michael Haggerty
2016-08-22 11:22 ` [PATCH v2 5/7] xdl_change_compact(): introduce the concept of a change group Michael Haggerty
2016-08-23 21:35 ` Junio C Hamano
2016-08-22 11:22 ` [PATCH v2 6/7] diff: improve positioning of add/delete blocks in diffs Michael Haggerty
2016-08-22 11:22 ` [PATCH v2 7/7] blame: actually use the diff opts parsed from the command line Michael Haggerty
2016-08-23 9:56 ` René Scharfe
2016-09-05 4:12 ` Michael Haggerty
2016-09-07 17:58 ` Junio C Hamano
2016-08-23 17:01 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1471864378.git.mhagger@alum.mit.edu \
--to=mhagger@alum.mit.edu \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jacob.keller@gmail.com \
--cc=jnareb@gmail.com \
--cc=peff@peff.net \
--cc=sbeller@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).