git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: Stefan Beller <sbeller@google.com>, Jeff King <peff@peff.net>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Jacob Keller <jacob.keller@gmail.com>
Subject: Re: [PATCH] diff compaction heuristic: favor shortest neighboring blank lines
Date: Thu, 23 Jun 2016 22:13:35 +0200	[thread overview]
Message-ID: <576C42EF.80400@alum.mit.edu> (raw)
In-Reply-To: <xmqq1t3nkdrm.fsf@gitster.mtv.corp.google.com>

On 06/23/2016 07:37 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> Scoring heuristic:
>>
>>>     # A place to accumulate bonus factors (positive makes this
>>>     # index more favored):
>>>     bonus = 0
>>>
>>>     # Bonuses based on the location of blank lines:
>>>     if pre_blank and not blank:
>>>         bonus += 3
>>>     elif blank and not pre_blank:
>>>         bonus += 2
>>>     elif blank and pre_blank:
>>>         bonus += 1
>>>
>>>     # Now fill in missing indent values:
>>>     if pre_indent is None:
>>>         pre_indent = 0
>>>
>>>     if post_indent is None:
>>>         post_indent = 0
> 
> These "missing" are to handle the lines at the boundary of shiftable
> block, or do you have pre_indent if a line is the earliest possible
> one that could be included in the hunk by shifting, as long as it is
> not at the beginning of the file?

The missing values are meant to handle the case where there are only
blank lines between the add/delete block and the beginning/end of the
file. Though currently (due to a limitation of the prototype) I only
pass context up to the next add/delete line, so that fallback also kicks
in if there are only blank lines between this add/delete block and the
next added/deleted line. When that is fixed, I think it might help to
give beginning/end of file a little bit more bonus.

This isn't all implemented yet, but I think the best score for an
add/delete block positioning would be the sum of three values. Let's
take the following "complete file" diff as an example, and consider an
add block with the following positioning:

> diff --git a/file.txt b/file.txt
> index 9405325..6a6b95c 100644
> --- a/file.txt
> +++ b/file.txt
> @@ -1,5 +1,7 @@
>  a
>  b
>  c
> +x
> +c
>  d
>  e

The score for splitting the old version of the file between
"bc" and "de", plus

the score for splitting the new version of the file between
"bc" and "xc", plus

the score for splitting the new version of the file between
"xc" and "de".

> I think "this is indented deeper than the previous, and it wants to
> stay with the previous if possible" is an excellent heuristic, which
> you have below.  I wonder if "however, the previous cannot be
> included in the hunk by shifting, hence we try to keep this together
> with it by favouring its exclusion from the hunk" could be part of
> the same heuristics, if you do have pre_indent value for a line at
> the boundary of shiftable block.

That's automatic, isn't it? If splitting between two lines is assigned a
high cost, then the algorithm will avoid splitting there *either* by
including both lines in the hunk *or* by omitting both lines.

> Also I am wondering if 0 is a sensible thing to use as the fallback
> value.  When pre_indent is not definable, turning it to 0 and
> declaring that our indented line is indented deeper than the
> previous line and penalizing with "bonus -= 4" (which you do) feels
> just as fishy as turning the undefined pre_indent to maxint and
> declaring that our indented line is indented shallower than the
> previous line (which you don't, but is just as justifiable, I would
> think).

Once the algorithm is fixed to consider the whole file, then the
fallback value will only be used at the beginning and end of the file,
where an indent of 0 seems like the natural thing to do. As I mentioned
above, it might even help to give beginning/end of file a little more
bonus than that.

But really, testing against real-world diffs will be the best basis for
adjusting the heuristic.

Michael


  reply	other threads:[~2016-06-23 20:13 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-16 17:46 [PATCH] diff compaction heuristic: favor shortest neighboring blank lines Stefan Beller
2016-06-16 20:27 ` Junio C Hamano
2016-06-16 21:06   ` Stefan Beller
2016-06-16 21:10 ` Michael Haggerty
2016-06-16 21:36   ` Stefan Beller
2016-06-17 15:36 ` Jeff King
2016-06-17 16:09   ` Stefan Beller
2016-06-23 17:10     ` Michael Haggerty
2016-06-23 17:25       ` Stefan Beller
2016-06-23 17:37       ` Junio C Hamano
2016-06-23 20:13         ` Michael Haggerty [this message]
2016-06-30 13:54       ` Michael Haggerty
2016-07-01 17:04         ` diff heuristics dramatically improved by considering line indentation and " Michael Haggerty
2016-07-01 18:01         ` [PATCH] diff compaction heuristic: favor shortest neighboring " Junio C Hamano
2016-07-04 14:33           ` Jakub Narębski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=576C42EF.80400@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jacob.keller@gmail.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).