git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: Stefan Beller <sbeller@google.com>, Jeff King <peff@peff.net>,
	"git\@vger.kernel.org" <git@vger.kernel.org>,
	Jacob Keller <jacob.keller@gmail.com>
Subject: Re: [PATCH] diff compaction heuristic: favor shortest neighboring blank lines
Date: Thu, 23 Jun 2016 10:37:17 -0700	[thread overview]
Message-ID: <xmqq1t3nkdrm.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <576C1803.5050905@alum.mit.edu> (Michael Haggerty's message of "Thu, 23 Jun 2016 19:10:27 +0200")

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Scoring heuristic:
>
>>     # A place to accumulate bonus factors (positive makes this
>>     # index more favored):
>>     bonus = 0
>> 
>>     # Bonuses based on the location of blank lines:
>>     if pre_blank and not blank:
>>         bonus += 3
>>     elif blank and not pre_blank:
>>         bonus += 2
>>     elif blank and pre_blank:
>>         bonus += 1
>> 
>>     # Now fill in missing indent values:
>>     if pre_indent is None:
>>         pre_indent = 0
>> 
>>     if post_indent is None:
>>         post_indent = 0

These "missing" are to handle the lines at the boundary of shiftable
block, or do you have pre_indent if a line is the earliest possible
one that could be included in the hunk by shifting, as long as it is
not at the beginning of the file?

I think "this is indented deeper than the previous, and it wants to
stay with the previous if possible" is an excellent heuristic, which
you have below.  I wonder if "however, the previous cannot be
included in the hunk by shifting, hence we try to keep this together
with it by favouring its exclusion from the hunk" could be part of
the same heuristics, if you do have pre_indent value for a line at
the boundary of shiftable block.

Also I am wondering if 0 is a sensible thing to use as the fallback
value.  When pre_indent is not definable, turning it to 0 and
declaring that our indented line is indented deeper than the
previous line and penalizing with "bonus -= 4" (which you do) feels
just as fishy as turning the undefined pre_indent to maxint and
declaring that our indented line is indented shallower than the
previous line (which you don't, but is just as justifiable, I would
think).

Thanks.

>>     if blank:
>>         indent = post_indent
>> 
>>     if indent > pre_indent:
>>         # The line is indented more than its predecessor. It is
>>         # preferable to keep these lines together, so we score it
>>         # based on the larger indent:
>>         score = indent
>>         bonus -= 4
>> 
>>     elif indent < pre_indent:
>>         # The line is indented less than its predecessor. It could
>>         # be that this line is the start of a new block (e.g., of
>>         # an "else" block, or of a block without a block
>>         # terminator) or it could be the end of the previous
>>         # block.
>>         if indent < post_indent:
>>             # The following line is indented more. So it is likely
>>             # that this line is the start of a block. It's a
>>             # pretty good place to split, so score it based on the
>>             # smaller indent:
>>             score = indent
>>             bonus += 2
>>         else:
>>             # This was probably the end of a block. We score based
>>             # on the line's own indent:
>>             score = indent
>> 
>>     else:
>>         # The line has the same indentation level as its
>>         # predecessor. We score it based on its own indent:
>>         score = indent
>> 
>>     return 10 * score - bonus

  parent reply	other threads:[~2016-06-23 17:37 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-16 17:46 [PATCH] diff compaction heuristic: favor shortest neighboring blank lines Stefan Beller
2016-06-16 20:27 ` Junio C Hamano
2016-06-16 21:06   ` Stefan Beller
2016-06-16 21:10 ` Michael Haggerty
2016-06-16 21:36   ` Stefan Beller
2016-06-17 15:36 ` Jeff King
2016-06-17 16:09   ` Stefan Beller
2016-06-23 17:10     ` Michael Haggerty
2016-06-23 17:25       ` Stefan Beller
2016-06-23 17:37       ` Junio C Hamano [this message]
2016-06-23 20:13         ` Michael Haggerty
2016-06-30 13:54       ` Michael Haggerty
2016-07-01 17:04         ` diff heuristics dramatically improved by considering line indentation and " Michael Haggerty
2016-07-01 18:01         ` [PATCH] diff compaction heuristic: favor shortest neighboring " Junio C Hamano
2016-07-04 14:33           ` Jakub Narębski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq1t3nkdrm.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jacob.keller@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).