git@vger.kernel.org mailing list mirror (one of many)
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Jacob Keller <jacob.keller@gmail.com>,
	Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: [PATCH] diff compaction heuristic: favor shortest neighboring blank lines
Date: Thu, 16 Jun 2016 14:06:40 -0700
Message-ID: <CAGZ79kYHO8q_CmePBxFUYxmhY6V_dS4M3djxCOrz5iJx_vFC-Q@mail.gmail.com>
In-Reply-To: <xmqqlh24516i.fsf@gitster.mtv.corp.google.com>

On Thu, Jun 16, 2016 at 1:27 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> ...
>> because there is less space between line start and {end, def bal}
>> than for {do_bal_stuff, common_ending}.
>
> I haven't thought this carefully yet, but would this equally work
> well for Python, where it does not have the "end" or does the lack
> of "end" pose a problem?  You'll still find "def bal" is a good
> boundary (but you cannot tell if it is the beginning or the end of a
> block, unless you understand the language), though.

Good point.  I found a flaw in my implementation (it doesn't match
my mental model, which is not necessarily a bad thing).

We take the minimum of the two neighbors, i.e.

+                do_bal_stuff()
+
+        common_ending()

is preferable to

+                do_bal_stuff()
+
+                common_ending()

and in Python the example would look like:

    def foo():
        do_foo()

        common_thing()

+    def baz():
+        do_baz()
+
+        common_thing()
+
    def bar():
        do_bar()

        common_thing()

and breaking between

        common_thing()

    def bar():

is more favorable than between

        do_baz()

        common_thing()

because in the former the amount of leading
white space in front of "def bar():" is smaller
than for either "do_baz()" or "common_thing()".
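
To make the comparison above concrete, here is a minimal sketch in
Python (not the patch itself). It assumes that a blank line is scored
by the smaller indentation of its two direct neighbors and that the
lowest score wins. The names blank_line_score and best_blank are
hypothetical, and counting a space as one column is an assumption,
since the quoted patch fragment only shows the tab case.

```python
def leading_blank(line):
    """Count leading whitespace; a tab counts as 8, as in the quoted patch.

    Counting a space as 1 is an assumption of this sketch.
    """
    width = 0
    for ch in line:
        if ch == "\t":
            width += 8
        elif ch == " ":
            width += 1
        else:
            break
    return width


def blank_line_score(lines, i):
    """Score the blank line at index i: the minimum indentation of the
    line directly above and the line directly below (hypothetical helper)."""
    return min(leading_blank(lines[i - 1]), leading_blank(lines[i + 1]))


def best_blank(lines):
    """Return the index of the blank line with the lowest score.

    Only blank lines that have a neighbor on both sides are considered.
    """
    candidates = [i for i in range(1, len(lines) - 1) if lines[i].strip() == ""]
    return min(candidates, key=lambda i: blank_line_score(lines, i))


example = [
    "    def baz():",
    "        do_baz()",
    "",
    "        common_thing()",
    "",
    "    def bar():",
]

print(blank_line_score(example, 2))  # 8: between do_baz() and common_thing()
print(blank_line_score(example, 4))  # 4: between common_thing() and def bar():
print(best_blank(example))           # 4
```

With these assumptions the blank line in front of "def bar():" scores 4
and the one inside the function body scores 8, so the break before
"def bar():" is chosen.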


>
>> +static unsigned int leading_blank(const char *line)
>> +{
>> +     unsigned int ret = 0;
>> +     while (*line) {
>> +             if (*line == '\t')
>> +                     ret += 8;
>
> This will be broken with a line with space-before-tab whitespace
> breakage, I suspect...

How so? We inspect each character on its own and then move on later
by line++. (I am not seeing how this could cause trouble, so please
help me?)
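
One possible reading of the space-before-tab concern, offered here as
an assumption rather than something spelled out in this thread: adding
a fixed 8 per tab differs from the column an editor with 8-column tab
stops would show once a space precedes the tab. The sketch below
compares the two ways of counting; naive_width and tabstop_width are
hypothetical names.

```python
def naive_width(line, tab=8):
    """Add a fixed amount per tab, as the quoted fragment does."""
    width = 0
    for ch in line:
        if ch == "\t":
            width += tab
        elif ch == " ":
            width += 1
        else:
            break
    return width


def tabstop_width(line, tab=8):
    """Advance to the next multiple of `tab` on each tab character."""
    width = 0
    for ch in line:
        if ch == "\t":
            width += tab - (width % tab)
        elif ch == " ":
            width += 1
        else:
            break
    return width


print(naive_width("\tx"), tabstop_width("\tx"))    # 8 8
print(naive_width(" \tx"), tabstop_width(" \tx"))  # 9 8
```

Both counts agree for a plain tab; they diverge only when spaces come
before the tab, so two lines that look equally indented could compare
as different.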

Going back to Python, this may become a problem when you have code like:

 def baz():

        do_baz()

        common_thing()

 def bar():

+       do_bal()
+
+       common_thing()
+
+def bar():
+
        do_bar()

        common_thing()


but this was fabricated with a typo (the first definition of bar
should have been bal). It also doesn't worsen the diff, as the result
is the same without the heuristic.

Once that typo is fixed we get (both with and without the heuristic):

        do_foo()

        common_thing()

 def baz():
        do_baz()

        common_thing()

+def bal():
+
+       do_bal()
+
+       common_thing()
+
 def bar():

        do_bar()

        common_thing()

Clearly it can also be intentional to have two methods with the same
code for historical reasons. (Even without the blank line after the
function definition, this produces the same result.)

When playing around with various diffs I could not find anything that
this patch makes worse; it only fixes the actual issue.
(I realized Peff actually attached a script to produce a bad diff; that
bad diff is gone with this patch.)
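
For anyone who wants to play with such diffs, the following sketch (in
Python, not taken from the patch) lists every position to which a block
of added lines can slide without changing what the diff means. It
relies on the general property that a block new[start:end] can move up
by one line when the line above it equals its last line, and down by
one when its first line equals the line after it. slide_positions is a
hypothetical name.

```python
def slide_positions(new_lines, start, end):
    """Return all half-open (start, end) ranges of added lines that
    describe the same change, sorted from topmost to bottommost."""
    positions = [(start, end)]

    # Slide up while the line above the block equals the block's last line.
    s, e = start, end
    while s > 0 and new_lines[s - 1] == new_lines[e - 1]:
        s, e = s - 1, e - 1
        positions.insert(0, (s, e))

    # Slide down while the block's first line equals the line after it.
    s, e = start, end
    while e < len(new_lines) and new_lines[s] == new_lines[e]:
        s, e = s + 1, e + 1
        positions.append((s, e))

    return positions


new_file = [
    "def foo():",
    "    do_foo()",
    "",
    "    common_thing()",
    "",
    "def baz():",
    "    do_baz()",
    "",
    "    common_thing()",
    "",
    "def bar():",
    "    do_bar()",
    "",
    "    common_thing()",
]

# The added block is "def baz():" through the blank line after it.
print(slide_positions(new_file, 5, 10))
# [(2, 7), (3, 8), (4, 9), (5, 10)]
```

Each of the four ranges removes to the same old file; a heuristic like
the one discussed here only decides which of them to show.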

Thanks,
Stefan
