From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Stefan Beller <sbeller@google.com>
Cc: Jacob Keller <jacob.keller@gmail.com>,
	Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: What's cooking in git.git (Jun 2017, #03; Mon, 5)
Date: Wed, 7 Jun 2017 23:58:28 +0200
Message-ID: <CACBZZX6FR-KD-TpRaGjLR0MfUt62w0KvYpikK7WcTS2EMQ2L8w@mail.gmail.com>
In-Reply-To: <CAGZ79kZVB9Ld8m+Zjps0ysEvXaptp2_FzimqRhiOHEBfXdX91Q@mail.gmail.com>

On Wed, Jun 7, 2017 at 8:28 PM, Stefan Beller <sbeller@google.com> wrote:
> On Tue, Jun 6, 2017 at 3:05 PM, Jacob Keller <jacob.keller@gmail.com> wrote:
>> On Tue, Jun 6, 2017 at 2:50 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>> On Mon, Jun 5, 2017 at 8:23 PM, Stefan Beller <sbeller@google.com> wrote:
>>>>
>>>> > [...]
>>>> >  "git diff" has been taught to optionally paint new lines that are
>>>> >  the same as deleted lines elsewhere differently from genuinely new
>>>> >  lines.
>>>> >
>>>> >  Are we happy with these changes?
>>>
>>>
>>> I've been studiously ignoring this patch series due to lack of bandwidth.
>>>
>>>> [...]
>>>> Things to come, but not in this series as they are more advanced:
>>>>
>>>>     Discuss if a block/line needs a minimum requirement.
>>>>
>>>> When doing reviews with this series, a couple of lines such
>>>> as "\t\t}" were marked as moved, which is not wrong as they
>>>> really occurred in the text with opposing sign.
>>>> But it was annoying as it drew my attention to just closing
>>>> braces, which IMO is not the point of code review.
>>>>
>>>> To solve this issue I had the idea of a "minimum requirement", e.g.
>>>> * at least 3 consecutive lines or
>>>> * at least one line with at least 3 non-ws characters or
>>>> * compute the entropy of a given moved block and if it is too low, do
>>>>   not mark it up.
>>>
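
The "minimum requirement" ideas quoted above could be sketched roughly as
follows. This is only an illustration in Python, not code from the series:
the function names are hypothetical, the two thresholds of 3 come from the
list above, and the entropy cut-off of 2.0 bits per character is an
arbitrary assumption. How the three checks should be combined is left open
in the proposal, so the entropy check is kept separate here.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def passes_minimum_requirement(lines, min_lines=3, min_non_ws=3):
    """First two bullets: enough consecutive lines, or one substantial line."""
    if len(lines) >= min_lines:
        return True
    return any(
        sum(1 for ch in line if not ch.isspace()) >= min_non_ws
        for line in lines
    )


def passes_entropy_check(lines, min_bits=2.0):
    """Third bullet: reject blocks whose text has too little entropy.

    min_bits is a hypothetical threshold, not a value from the series.
    """
    return shannon_entropy("".join(lines)) >= min_bits
```

With these definitions a lone "\t\t}" fails both checks, while a single line
such as "\t\treturn 0;" passes both. Note that three consecutive closing
braces would still pass the line-count test, which is one reason the checks
may need to be combined rather than OR-ed.
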
>>> Shooting from the hip here...
>>>
>>> It seems obvious that for a line to be marked as moved, a minimum
>>> requirement is that
>>>
>>> 1. The line appears as both "+" and "-".
>>>
>>> That doesn't seem strong enough evidence though, and if that is the
>>> only criterion, I would expect a lot of boilerplate lines like "\t\t}"
>>> to be marked as moved. It seems like a lot of noise could be
>>> eliminated by *also* requiring that
>>>
>>> 2a. The line doesn't appear elsewhere in the file(s) concerned.
>
> 'elsewhere' in the opposing sign (+,-) or all the diff (including ' ' context)?
>
> This rule opens up the discussion on multi-copies, which I imagine
> happens a lot in configuration files. So say you have a prod and staging
> environment, then you might be tempted to make patches titled as:
>   "1. preparation: duplicate common code into prod and staging"
>   "2. Make an actual change to staging"
>
> For 1. you still want to see that there is a faithful copy, but we'd have
> 2 postimages having these lines.
>
> Also what about de-duplication?
> I just stumbled upon edb0c72428 ([PATCH] diff: consolidate test
> helper script pieces., 2005-05-31) for unrelated reasons,
> but the move coloring of the same content multiple times
> helped me there to focus on the relevant part.
>
>>>
>>> Rule (2a) would probably get rid of most boilerplate lines without
>>> having to try to measure entropy.
>
> But it would also get rid of good use cases when not being very careful.
> I intentionally left out the (2a) as I am not yet sure how the move
> detection for multiple occurrences in post and preimage should
> work in the desired case. The suppression of low-entropy closing braces
> might be a side effect of just this. Or it can be treated separately.
>
>>>
>>> Maybe you are already using both criteria? I didn't see it in a quick
>>> perusal of the code.
>>>
>>> OTOH, it would be silly to refuse to mark lines like "\t\t}" as moved
>>> *only* because they appear elsewhere in the file(s). If you did so,
>>> you would have gaps of supposedly non-moved lines in the middle of
>>> moved blocks. This suggests marking as moved lines matching (1) and
>>> (2a) but also lines matching (1) and the following:
>>>
>>> 2b. The line is adjacent to another line that is thought to have
>>> moved from the same old location to the same new location.
>
> This is what we do, a "block detection" by comparing "line runs" against
> the current lines. Based on these line runs we detect one block and
> color up adjacent blocks.
>
>>>
>>> Rule (2b) would be applied recursively, with the net effect being that
>>> any line satisfying (1) and (2a) is allowed to carry along any
>>> neighboring lines within the same "+"/"-" block even if they are not
>>> unique.
>
> So you are saying each block has to have at least one unique line?
> That doesn't go well with (de-)duplication IMHO.
>
> Thanks for your shot from the hip. I'll think about these rules more to see
> if I can make sense of them for duplication still.

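To make rules (1), (2a) and (2b) from the quoted discussion concrete, here
is a rough Python sketch. It is one possible reading, not what the series
implements: `moved_added_lines` is a hypothetical name, "elsewhere" in (2a)
is assumed to mean "exactly once among the removed lines and exactly once
among the added lines" (context lines are ignored), and the removed and
added lines are flattened into two lists without regard to hunk boundaries.

```python
from collections import Counter


def moved_added_lines(removed, added):
    """Return the indices into `added` that would be marked as moved.

    removed, added: lists of line contents without the "-"/"+" prefix.
    """
    removed_counts = Counter(removed)
    added_counts = Counter(added)
    # For lines that are unique on the removed side this is their only index.
    removed_index = {line: i for i, line in enumerate(removed)}

    moved = set()
    for j, line in enumerate(added):
        # Rules (1) and (2a): present on both sides, exactly once on each.
        if removed_counts[line] != 1 or added_counts[line] != 1:
            continue
        i = removed_index[line]
        moved.add(j)
        # Rule (2b): let the unique line carry along matching neighbours,
        # walking backwards and forwards from the anchor pair (i, j).
        for step in (-1, 1):
            a, b = i + step, j + step
            while (0 <= a < len(removed) and 0 <= b < len(added)
                   and removed[a] == added[b]):
                moved.add(b)
                a += step
                b += step
    return moved
```

For example, with removed = ["if (x) {", "\tfoo();", "}", "unrelated_old();", "}"]
and added = ["}", "new_stuff();", "if (x) {", "\tfoo();", "}"], the trailing
three added lines are marked, including the non-unique "}" that travels with
the block, while the stray "}" at the top is not. As Stefan points out, a
pure duplication (two identical added copies) would have no unique anchor
under this reading and so would not be marked at all.
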
I've just been skimming this topic so far, but I have a question: what
variant of

    git diff ... | grep ...

can I use to see whether the diff that's being emitted has hunks
marked as moved? Presumably this needs -c color.ui=always and grepping
for the color codes.

The use case is to add, say, that diff | grep -q to a for-loop to
find all diffs in a repo that have hunks marked as moved.
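
Something along these lines is what I have in mind, sketched in Python
rather than grep. Treat all of it as assumptions: it presumes a Git that
offers `git diff --color-moved`, it pins the moved colors through the
`color.diff.oldMoved` and `color.diff.newMoved` slots instead of relying on
defaults, and it assumes "bold magenta" and "bold cyan" are emitted as the
escape sequences listed in `MOVED_SGR`. The function names are made up.

```python
import subprocess

# Escape sequences assumed to correspond to "bold magenta" and "bold cyan".
MOVED_SGR = ("\x1b[1;35m", "\x1b[1;36m")


def has_moved_lines(colored_diff, moved_sgr=MOVED_SGR):
    """True if the colored diff text contains any of the moved-line colors."""
    return any(seq in colored_diff for seq in moved_sgr)


def diff_has_moved_lines(*diff_args):
    """Run git diff with forced colors and report whether moved lines appear.

    diff_args are passed through to git diff, e.g. "HEAD~1", "HEAD".
    """
    cmd = [
        "git",
        "-c", "color.diff.oldMoved=bold magenta",
        "-c", "color.diff.newMoved=bold cyan",
        "diff", "--color=always", "--color-moved",
        *diff_args,
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return has_moved_lines(out.stdout)
```

A loop over commits could then call diff_has_moved_lines(commit + "~1", commit)
for each one. Matching on raw escape sequences is fragile, so a
machine-readable indicator would be nicer if one exists.
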
