git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: Jeff King <peff@peff.net>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Jacob Keller <jacob.keller@gmail.com>
Subject: Re: [PATCH] diff compaction heuristic: favor shortest neighboring blank lines
Date: Thu, 16 Jun 2016 14:36:12 -0700	[thread overview]
Message-ID: <CAGZ79kbr03yVT3SZXZTpE=Pdh11Ge5TP3LF_7r9_wP=CZM3SXg@mail.gmail.com> (raw)
In-Reply-To: <576315E1.2050405@alum.mit.edu>

>
> The heuristic I proposed was
>
> * Prefer to start and end hunks *following* lines with the least
>   indentation.
>
> * Define the "indentation" of a blank line to be the indentation of
>   the previous non-blank line minus epsilon.
>
> * In the case of a tie, prefer to slide the hunk down as far as
>   possible.
>
> If that's what you are trying to implement, then notice that the first
> rule says that the hunk should start *following* the line with the least
> indentation. So the "score" for

I did not try to implement that heuristic.

> * Prefer to start and end hunks *following* lines with the least
>   indentation.

We had this alone as a heuristic and it turned out badly. (More
false positives than the current "empty line only" thing)

It may however be a good starting block when combined with the other
conditions, as blank lines are assigned a longer virtual length than
what they have.

> * In the case of a tie, prefer to slide the hunk down as far as
>   possible.
>

This is what the current implementation does as well. (i.e. if there is
more than one blank line, we have a tie, so choose the last one as the
end of the hunk)


So what I am proposing in this patch:

 * Prefer to start and end hunks *following* lines with an empty line.
   (as implemented currently)

 * In the case of a tie, make use of a second tier heuristic.

 * Define the "length" of a blank line to be the
    minimum of the amount of white space in the line before and after

 * Choose that empty line (first tier) that has the shortest length
(second tier)

 * In the case of a tie, prefer to slide the hunk down as far as
   possible.

I would be really interested how these two heuristics compare in practice.

>
>> +        end
>> +
>> +        def bal
>
> would be the indentation of the line preceding "end", which is larger
> than the indentation of "end". And the score for
>
>> +                do_bal_stuff()
>> +
>> +                common_ending()
>
> would come from the line preceding "do_bal_stuff()", namely "def bal()",
> which is indented the same as "end". So the latter would be preferred.
>
> But actually, I don't understand how your example is meaningful. I think
> this heuristic is only used to slide hunks around; i.e., when the line
> following the hunk is the same as the first lines of the hunk, or when
> the line preceding the hunk is the same as the last line of the hunk.

Right.

> Right? So your snippets would never compete against each other.

right, I was using that to show why some parts are more favorable to
put a break in (i.e. the transisiton from "+ lines" to surrounding lines.
I was not clear enough there. :(

> Let's
> take the full example. The five possible placements for the `+`
> characters are marked with the digits 1 through 5 here:
>
>>              def bar
>>                      do_bar_stuff()
>> 1
>> 12                   common_ending()
>> 123          end
>> 1234
>> 12345        def bal
>> 12345                do_bal_stuff()
>>  2345
>>   345                common_ending()
>>    45        end
>>     5
>>              def baz
>>                      do_baz_stuff()
>>
>>                      common_ending()
>>              end
>
> Of the lines preceding the candidate hunks, the blank line preceding the
> hunk marked "5" has the smallest indent, namely "epsilon" less than the
> indentation of the "end" line preceding it. So placement "5" would be
> chosen.

I agree 5 is the best here, but for other reasons.

Imagine /s/end/long final boilerplate code/, and it may be more clear.
What I am trying to say is, that you "indentation" of the blank line is not
reaching until the end of the "end" line, but rather to the front of "end".

(If you don't care of white spaces in the git history and have an editor, that
makes fancy white space stuff, i.e. your text editor cursor is in the "end" line
and you press "enter" to get to the next line, it is most likely indented to
strlen ("<end line>") - strlen("end"), as that is how much indentation we had
in the preceding line?

>
> I don't know enough about this area of the code to review your patch in
> detail, but I did note below a typo that jumped out at me.

I did not know the code either before writing the patch. :)


>> +                                     if (min_bl_neigh_indent > bl_neigh_indent)
>> +                                             min_bl_neigh_indent = min_bl_neigh_indent;
>
> The line above has no effect (same variable on both sides of the =).

Thanks!

I had the feeling something was off judging from behavior,
but could not tell what exactly. This is it.

It should be

    min_bl_neigh_indent = bl_neigh_indent;

i.e. the minimum of the existing and newly computed surrounding line
starting blanks.

Junio writes:
> Now, on what column does 'd' sit on the example below?
>
>        def foo
>
> I typed SP SP HT 'd' 'e' 'f'; it still is indented by 8 places.

I see, so for the TAB case we would need to do a

    ret += (8 - ret%8);

Thanks!
Stefan

  reply	other threads:[~2016-06-16 21:36 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-16 17:46 [PATCH] diff compaction heuristic: favor shortest neighboring blank lines Stefan Beller
2016-06-16 20:27 ` Junio C Hamano
2016-06-16 21:06   ` Stefan Beller
2016-06-16 21:10 ` Michael Haggerty
2016-06-16 21:36   ` Stefan Beller [this message]
2016-06-17 15:36 ` Jeff King
2016-06-17 16:09   ` Stefan Beller
2016-06-23 17:10     ` Michael Haggerty
2016-06-23 17:25       ` Stefan Beller
2016-06-23 17:37       ` Junio C Hamano
2016-06-23 20:13         ` Michael Haggerty
2016-06-30 13:54       ` Michael Haggerty
2016-07-01 17:04         ` diff heuristics dramatically improved by considering line indentation and " Michael Haggerty
2016-07-01 18:01         ` [PATCH] diff compaction heuristic: favor shortest neighboring " Junio C Hamano
2016-07-04 14:33           ` Jakub Narębski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGZ79kbr03yVT3SZXZTpE=Pdh11Ge5TP3LF_7r9_wP=CZM3SXg@mail.gmail.com' \
    --to=sbeller@google.com \
    --cc=git@vger.kernel.org \
    --cc=jacob.keller@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).