git@vger.kernel.org mailing list mirror (one of many)
From: Jacob Keller <jacob.keller@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [BUG-ish] diff compaction heuristic false positive
Date: Fri, 10 Jun 2016 09:29:43 -0700
Message-ID: <CA+P7+xp=bTPiwRRTH=h7v5pV8+=he4+789_3PNz227mv1387MA@mail.gmail.com>
In-Reply-To: <CAGZ79kZLT8AfmWTrrW+a-v7aXw5sm68P2H=vT7QZr2hj4Z2gDA@mail.gmail.com>

On Fri, Jun 10, 2016 at 9:25 AM, Stefan Beller <sbeller@google.com> wrote:
> On Fri, Jun 10, 2016 at 8:56 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Jeff King <peff@peff.net> writes:
>>
>>> On Fri, Jun 10, 2016 at 03:50:43AM -0400, Jeff King wrote:
>>>
>>>> I found a false positive with the new compaction heuristic in v2.9:
>>>> [...]
>>>
>>> And by the way, this is less "hey neat, I found a case" and more "wow,
>>> this is a lot worse than I thought".
>>>
>>> I diffed the old and new output for the top 10,000 commits in this
>>> particular ruby code base. There were 45 commits with changed diffs.
>>> Spot-checking them manually, a little over 1/3 of them featured this bad
>>> pattern. The others looked like strict improvements.
>>>
>>> That's a lot worse than the outcomes we saw on other code bases earlier.
>>> 1/3 bad is still a net improvement, so I dunno. Is this worth worrying
>>> about? Should we bring back the documentation for the knob to disable
>>> it? Should we consider making it tunable via gitattributes?
>>>
>>> I don't think that last one really helps; the good cases _and_ the bad
>>> ones are both in ruby code (though certainly the C code we looked at
>>> earlier was all good).
>>>
>>> It may also be possible to make it Just Work by using extra information
>>> like indentation. I haven't thought hard enough about that to say.
>>>
>>> -Peff
>>
>> I recall saying "we'd end up being better in some and worse in
>> others" at the very beginning.  How about toggling the default back
>> for the upcoming release, keeping the experimentation knob in the
>> code, and trying different heuristics like the "indentation" during the
>> next cycle?
>
> Sure. I have thought about it for a while, and by now I agree with Junio.
> No matter what kind of heuristic we come up with, it is easy to construct
> a counterexample.
>
> That said, let's try the indentation thing, though I suspect
> one of the early motivating examples (an excerpt from a kernel config file)
> would not do well with it, as it did not have an indentation scheme the way
> programming languages do.
>
> Thanks,
> Stefan

I think we could use the indentation trick and it might help in this
case. I agree, let's disable this for this cycle and experiment in the
next one. Good catch, Peff.

As others have said, you will always be able to produce
counterexamples; that's the nature of heuristics. The idea is to see
if we can come up with something simple that mostly improves the
output, even if it sometimes makes the output worse. But I think we
should avoid changing behavior unless the change is mostly an
improvement.
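
To make the indentation idea a bit more concrete, here is a rough
Python sketch. It is not what xdiff does, and the function names and
the cost rule are assumptions made up for illustration. It lists every
position an added block can slide to without changing the resulting
file, then picks the position whose boundaries sit at blank lines or
at the shallowest indentation.

```python
def slide_positions(lines, start, end):
    """All (start, end) ranges the added block lines[start:end] can slide
    to while still describing the same change to the file."""
    positions = [(start, end)]
    s, e = start, end
    # Slide up: the line just above the block equals the block's last line.
    while s > 0 and lines[s - 1] == lines[e - 1]:
        s, e = s - 1, e - 1
        positions.append((s, e))
    s, e = start, end
    # Slide down: the line just below the block equals the block's first line.
    while e < len(lines) and lines[e] == lines[s]:
        s, e = s + 1, e + 1
        positions.append((s, e))
    return sorted(positions)


def indent_or_blank(line):
    """Indentation width in spaces; blank lines get -1 (hypothetical rule)."""
    if not line.strip():
        return -1
    return len(line) - len(line.lstrip(" "))


def boundary_cost(lines, start, end):
    """Lower is better: the block should start, and be followed, at a blank
    line, at end of file, or at a shallowly indented line."""
    top = indent_or_blank(lines[start])
    bottom = indent_or_blank(lines[end]) if end < len(lines) else -1
    return top + bottom


def pick_position(lines, start, end):
    """Choose the cheapest of the equivalent positions for the block."""
    return min(slide_positions(lines, start, end),
               key=lambda pos: boundary_cost(lines, *pos))


# Made-up example: a second method appended after an existing one.
new_file = [
    "def foo",
    "  do_foo",
    "end",
    "",
    "def bar",
    "  do_bar",
    "end",
]

# Suppose the diff engine reported lines 2..5 ("end" through "  do_bar").
print(slide_positions(new_file, 2, 6))
print(pick_position(new_file, 2, 6))
```

Note that in a file with no indentation at all, every non-blank
boundary costs the same, so a rule like this can only fall back on
blank lines. That is the weakness Stefan suspects for the kernel
config example.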

Regards,
Jake

Thread overview: 16+ messages
2016-06-10  7:50 [BUG-ish] diff compaction heuristic false positive Jeff King
2016-06-10  8:31 ` Jeff King
2016-06-10 15:56   ` Junio C Hamano
2016-06-10 16:25     ` Stefan Beller
2016-06-10 16:29       ` Jacob Keller [this message]
2016-06-10 18:13         ` Re* " Junio C Hamano
2016-06-10 18:21           ` Stefan Beller
2016-06-10 20:30           ` Jeff King
2016-06-10 20:48             ` [PATCH v2] diff: disable compaction heuristic for now Junio C Hamano
2016-06-10 20:53               ` Jeff King
2016-06-10 20:55               ` Junio C Hamano
2016-06-10 21:05                 ` Jeff King
2016-06-10 21:46                   ` Junio C Hamano
2016-06-10  8:31 ` [BUG-ish] diff compaction heuristic false positive Michael Haggerty
2016-06-10  8:41   ` Jeff King
2016-06-10 11:00     ` Michael Haggerty
