git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git@vger.kernel.org, "Stefan Beller" <sbeller@google.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"Jakub Narębski" <jnareb@gmail.com>,
	"Jacob Keller" <jacob.keller@gmail.com>
Subject: Re: [PATCH 8/8] diff: improve positioning of add/delete blocks in diffs
Date: Thu, 4 Aug 2016 03:56:32 -0400
Message-ID: <20160804075631.jakbi5dbsbxsqcpr@sigill.intra.peff.net>
In-Reply-To: <7b0680ed7a10fc13acd8d7816a75ed05a5f9e28c.1470259583.git.mhagger@alum.mit.edu>

On Thu, Aug 04, 2016 at 12:00:36AM +0200, Michael Haggerty wrote:

> This table shows the number of diff slider groups that were positioned
> differently than the human-generated values, for various repositories.
> "default" is the default "git diff" algorithm. "compaction" is Git 2.9.0
> with the `--compaction-heuristic` option "indent" is an earlier,

s/option/&./

>  static int diff_detect_rename_default;
> +static int diff_indent_heuristic; /* experimental */
>  static int diff_compaction_heuristic; /* experimental */

These two flags are mutually exclusive in the xdiff code, so we should
probably handle that here.

TBH, I do not care that much what:

  [diff]
  compactionHeuristic = true
  indentHeuristic = true

does. But right now:

  git config diff.compactionHeuristic true
  git show --indent-heuristic

still prefers the compaction heuristic, which I think is objectively
wrong.

So perhaps we need a single variable:

  enum {
    DIFF_HEURISTIC_COMPACTION,
    DIFF_HEURISTIC_INDENT
  } diff_heuristic;

and set it in last-one-wins fashion (it would be nice if the config and
command line options were shaped the same way so it's clear to the user
that they are exclusive, but we may have to keep --compaction-heuristic
around for compatibility, as an alias for --diff-heuristic=compaction).
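
A minimal sketch of the single-variable, last-one-wins idea, under stated
assumptions: the function names, the extra DIFF_HEURISTIC_NONE value, and the
--diff-heuristic=<name> spelling are hypothetical illustrations of the proposal
above, not existing git code. Config is read first and options afterwards, so
whichever source speaks last decides.

```c
#include <string.h>

/* Hypothetical: one variable instead of two independent flags. */
enum diff_heuristic {
	DIFF_HEURISTIC_NONE = 0,	/* assumed "no heuristic" value */
	DIFF_HEURISTIC_COMPACTION,
	DIFF_HEURISTIC_INDENT
};

/*
 * Config side (hypothetical helper). A true value selects that heuristic;
 * a false value clears it only if it is the one currently selected.
 */
static void heuristic_from_config(enum diff_heuristic *h,
				  const char *key, int value)
{
	enum diff_heuristic which;

	if (!strcmp(key, "diff.compactionheuristic"))
		which = DIFF_HEURISTIC_COMPACTION;
	else if (!strcmp(key, "diff.indentheuristic"))
		which = DIFF_HEURISTIC_INDENT;
	else
		return;

	if (value)
		*h = which;
	else if (*h == which)
		*h = DIFF_HEURISTIC_NONE;
}

/*
 * Command-line side (hypothetical helper). Returns 1 if the argument was
 * recognized, 0 otherwise. The old option names act as aliases.
 */
static int heuristic_from_option(enum diff_heuristic *h, const char *arg)
{
	if (!strcmp(arg, "--indent-heuristic") ||
	    !strcmp(arg, "--diff-heuristic=indent"))
		*h = DIFF_HEURISTIC_INDENT;
	else if (!strcmp(arg, "--compaction-heuristic") ||
		 !strcmp(arg, "--diff-heuristic=compaction"))
		*h = DIFF_HEURISTIC_COMPACTION;
	else
		return 0;
	return 1;
}
```

With this shape, diff.compactionHeuristic=true followed by --indent-heuristic
on the command line leaves the variable at DIFF_HEURISTIC_INDENT, because the
option is processed after the config.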

> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index 642cce1..ee3d812 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -45,6 +45,7 @@ my ($diff_new_color) =
>  my $normal_color = $repo->get_color("", "reset");
>  
>  my $diff_algorithm = $repo->config('diff.algorithm');
> +my $diff_indent_heuristic = $repo->config_bool('diff.indentheuristic');
>  my $diff_compaction_heuristic = $repo->config_bool('diff.compactionheuristic');

Nice touch.

Unfortunately the mutual-exclusivity handling will probably bleed over
to here, too.

> +/*
> + * If a line is indented more than this, get_indent() just returns this value.
> + * This avoids having to do absurd amounts of work for data that are not
> + * human-readable text, and also ensures that the output of get_indent fits within
> + * an int.
> + */
> +#define MAX_INDENT 200

Speaking of absurd amounts of work, I was curious if there was a
noticeable performance penalty for using this heuristic (just because
it's a lot more complicated than the others). I couldn't detect any
differences running "git log -p --no-merges -3000" on git.git with no
heuristic, compaction, and indent. There may be other repositories that
behave more pathologically (it looks like having 20 blank lines at the
end of each hunk?), but I'd guess that in most cases this will be
drowned out in the noise of doing the actual diff.
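
For reference, a rough sketch of how a cap like MAX_INDENT bounds the per-line
work described in the quoted comment. This is an illustration only, not the
patch's get_indent(): the function name, the plain (pointer, length) interface,
and the 8-column tab stops are assumptions.

```c
#include <stddef.h>

#define MAX_INDENT 200

/*
 * Hypothetical sketch: return the indentation width of a line, capped at
 * MAX_INDENT, or -1 if the line holds only whitespace. Scanning stops as
 * soon as the cap is reached, so very long runs of leading whitespace cost
 * a bounded amount of work. As a consequence, a whitespace-only line wider
 * than the cap is reported as MAX_INDENT rather than -1.
 */
static int get_indent_sketch(const char *line, size_t len)
{
	int indent = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		char c = line[i];

		if (c == ' ')
			indent += 1;
		else if (c == '\t')
			indent += 8 - indent % 8;	/* assumed tab stops */
		else if (c == '\n' || c == '\r' || c == '\f' || c == '\v')
			;	/* other whitespace does not change the width */
		else
			return indent;	/* first non-whitespace character */

		if (indent >= MAX_INDENT)
			return MAX_INDENT;
	}

	/* Only whitespace was seen. */
	return -1;
}
```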

> +#define START_OF_FILE_BONUS 9
> +#define END_OF_FILE_BONUS 46
> +#define TOTAL_BLANK_WEIGHT 4
> +#define PRE_BLANK_WEIGHT 16
> +#define RELATIVE_INDENT_BONUS -1
> +#define RELATIVE_INDENT_HAS_BLANK_BONUS 15
> +#define RELATIVE_OUTDENT_BONUS -19
> +#define RELATIVE_OUTDENT_HAS_BLANK_BONUS 2
> +#define RELATIVE_DEDENT_BONUS -63
> +#define RELATIVE_DEDENT_HAS_BLANK_BONUS 50

I see there is a comment below here mentioning that these are empirical
voodoo, but it might be worth one at the top (or just moving these below
the comment) because the comment looks like it's just associated with
the function (and these are sufficiently bizarre that anybody reading is
going to double-take on them).

> +        return 10 * score - bonus;

I don't mind this "10" not being a #define constant, but after
reading the exchange between you and Stefan, I think it would be nice to
describe what it is in a comment. The rest of the function is commented
so nicely that this one left me thinking "huh?" upon seeing the "10".

The rest looks sane to me, though I am not sure I have absorbed all the
implications. IMHO the most interesting thing is the actual results,
though.

-Peff
