From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>,
"git@vger.kernel.org" <git@vger.kernel.org>,
Jacob Keller <jacob.keller@gmail.com>,
Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: [PATCH] diff compaction heuristic: favor shortest neighboring blank lines
Date: Thu, 16 Jun 2016 14:06:40 -0700
Message-ID: <CAGZ79kYHO8q_CmePBxFUYxmhY6V_dS4M3djxCOrz5iJx_vFC-Q@mail.gmail.com>
In-Reply-To: <xmqqlh24516i.fsf@gitster.mtv.corp.google.com>
On Thu, Jun 16, 2016 at 1:27 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> ...
>> because there is less space between line start and {end, def bal}
>> than for {do_bal_stuff, common_ending}.
>
> I haven't thought this carefully yet, but would this equally work
> well for Python, where it does not have the "end" or does the lack
> of "end" pose a problem? You'll still find "def bal" is a good
> boundary (but you cannot tell if it is the beginning or the end of a
> block, unless you understand the language), though.
Good point. I found a flaw in my implementation (it doesn't match my
mental model, which is not necessarily a bad thing).
We take the minimum of the two neighbors, i.e.
+ do_bal_stuff()
+
+ common_ending()
is preferable to
+ do_bal_stuff()
+
+ common_ending()
and in Python the example would look like:
def foo():
do_foo()
common_thing()
+ def baz():
+ do_baz()
+
+ common_thing()
+
def bar():
do_bar()
common_thing()
and breaking between
common_thing()
def bar():
is more favorable than between
do_baz()
common_thing()
because in the former case the amount of
whitespace in front of "def bar():" is smaller
than for either "do_baz()" or "common_thing()".
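
As a rough illustration of the rule described above (a sketch only, not
the code from the patch): each blank line is scored by the smaller
indentation of its two neighbors, and the blank line with the lowest
score is the preferred place to break. The names indent_width(),
blank_line_score() and best_blank_line() are hypothetical, and so are
the tie-breaking (first candidate wins) and the handling of a missing
neighbor.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: width of leading whitespace, a tab counted as 8. */
static unsigned int indent_width(const char *line)
{
	unsigned int ret = 0;

	for (; *line; line++) {
		if (*line == '\t')
			ret += 8;
		else if (*line == ' ')
			ret++;
		else
			break;
	}
	return ret;
}

/* A line is blank if it holds nothing but spaces and tabs. */
static int is_blank_line(const char *line)
{
	while (*line == ' ' || *line == '\t')
		line++;
	return *line == '\0';
}

/*
 * Score of the blank line at index i: the minimum indentation of the
 * lines directly above and below it. A missing neighbor is ignored
 * (assumption made for this sketch).
 */
static unsigned int blank_line_score(const char **lines, size_t n, size_t i)
{
	unsigned int above = i > 0 ? indent_width(lines[i - 1]) : UINT_MAX;
	unsigned int below = i + 1 < n ? indent_width(lines[i + 1]) : UINT_MAX;

	return above < below ? above : below;
}

/* Index of the blank line with the lowest score, or -1 if there is none. */
static long best_blank_line(const char **lines, size_t n)
{
	long best = -1;
	unsigned int best_score = UINT_MAX;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned int score;

		if (!is_blank_line(lines[i]))
			continue;
		score = blank_line_score(lines, n, i);
		if (best < 0 || score < best_score) {
			best = (long)i;
			best_score = score;
		}
	}
	return best;
}
```

Assuming the Python example uses a conventional four-space indent, the
blank line before "def bar():" scores 0 while the blank line between
"do_baz()" and "common_thing()" scores 4, so the former is chosen.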
>
>> +static unsigned int leading_blank(const char *line)
>> +{
>> + unsigned int ret = 0;
>> + while (*line) {
>> + if (*line == '\t')
>> + ret += 8;
>
> This will be broken with a line with space-before-tab whitespace
> breakage, I suspect...
How so? We inspect each character on its own and then move on later
by line++. (I am not seeing how this could cause trouble, so please
help me?)
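
One possible reading of the space-before-tab concern, sketched here as
an assumption rather than as what was actually meant: adding a flat 8
per tab ignores that a tab advances to the next tab stop, so a space
followed by a tab still ends at column 8, not 9. The quoted
leading_blank() is truncated in this message, so the rest of its loop
below is guessed; both function names are hypothetical.

```c
/*
 * Flat counting, in the spirit of the quoted hunk: every tab adds 8.
 * Everything after the tab case is an assumption, as the quote is cut off.
 */
static unsigned int leading_blank_flat(const char *line)
{
	unsigned int ret = 0;

	for (; *line; line++) {
		if (*line == '\t')
			ret += 8;
		else if (*line == ' ')
			ret++;
		else
			break;
	}
	return ret;
}

/* Column-aware counting: a tab advances to the next multiple of 8. */
static unsigned int leading_blank_columns(const char *line)
{
	unsigned int ret = 0;

	for (; *line; line++) {
		if (*line == '\t')
			ret += 8 - ret % 8;
		else if (*line == ' ')
			ret++;
		else
			break;
	}
	return ret;
}
```

For a line starting with a single tab both variants report 8. For a line
starting with a space followed by a tab, the flat variant reports 9 and
the column-aware variant reports 8, which is where the two would rank
otherwise equally indented lines differently.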
Going back to Python, this may become a problem when you have code like:
def baz():
do_baz()
common_thing()
def bar():
+ do_bal()
+
+ common_thing()
+
+def bar():
+
do_bar()
common_thing()
but this was fabricated with a typo (the first definition of bar
should have been bal).
(It also doesn't worsen the diff, as the result is the same without the
heuristic.)
Once that typo is fixed we get (both with and without the heuristic):
do_foo()
common_thing()
def baz():
do_baz()
common_thing()
+def bal():
+
+ do_bal()
+
+ common_thing()
+
def bar():
do_bar()
common_thing()
Clearly it can also be intentional to have two methods with the same
code for historical reasons. (Even without the blank line after the
function definition this produces the same result.)
When playing around with various diffs I could not find a case that
this patch makes worse; it only fixes the actual issue.
(I realized Peff actually attached a script to produce a bad diff; that
bad diff is gone with this patch.)
Thanks,
Stefan