git@vger.kernel.org mailing list mirror (one of many)
From: Junio C Hamano <gitster@pobox.com>
To: Barret Rhoden <brho@google.com>
Cc: git@vger.kernel.org, "Michael Platings" <michael@platin.gs>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"David Kastrup" <dak@gnu.org>, "Jeff King" <peff@peff.net>,
	"Jeff Smith" <whydoubt@gmail.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"René Scharfe" <l.s.r@web.de>,
	"Stefan Beller" <stefanbeller@gmail.com>
Subject: Re: [PATCH v6 6/6] blame: use a fingerprint heuristic to match ignored lines
Date: Sun, 14 Apr 2019 12:54:50 +0900
Message-ID: <xmqqk1fxw8ad.fsf@gitster-ct.c.googlers.com>
In-Reply-To: <20190410162409.117264-7-brho@google.com> (Barret Rhoden's message of "Wed, 10 Apr 2019 12:24:09 -0400")

Barret Rhoden <brho@google.com> writes:

> This replaces the heuristic used to identify lines from ignored commits
> with one that finds likely candidate lines in the parent's version of
> the file.
>
> The old heuristic simply assigned lines in the target to the same line
> number (plus offset) in the parent.  The new function uses a
> fingerprinting algorithm to detect similarity between lines.
>
> The fingerprint code and the idea to use them for blame came from
> Michael Platings <michael@platin.gs>.
>
> For each line changed in the target, i.e. in a blame_entry touched by a
> target's diff, guess_line_blames() finds the best line in the parent,
> above a magic threshold.  Ties are broken by proximity of the parent
> line number to the target's line.
>
> We actually make two passes.  The first pass checks in the diff chunk
> associated with the blame entry - specifically from blame_chunk().
> Often times, those diff chunks are small; any 'context' in a normal diff
> chunk is broken up into multiple calls to blame_chunk().  We make a
> second pass over the entire parent, with a slightly higher threshold.

Two thoughts.

 - Unless the 'old heuristic' is still available as an option after
   this step, a series that first begins with the 'old heuristic'
   and then later replaces it with the 'new heuristic' feels
   somewhat wasteful of reviewer resources, as the 'old heuristic'
   does not contribute an iota to the end result.

   It is OK while the series is still in RFC/WIP stage, though.  But
   I got the impression that this series is close to completion, so...

 - I wonder if the hash used here can replace what is used in
   diffcore-delta.c as an improvement (or obviously vice versa), as
   using two (or more) ad-hoc fingerprinting functions without having
   a clear reason why we need two instead of a unified one feels
   like a bad idea.
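
For readers following along, the matching scheme described in the quoted
commit message can be sketched roughly as below.  This is only an
illustration in Python, not the code from the patch: the fingerprint
(a multiset of adjacent character pairs with whitespace removed), the
similarity score, both threshold values, and every function name here
are assumptions made for the example.

```python
from collections import Counter


def fingerprint(line):
    """Multiset of adjacent character pairs, whitespace ignored (assumed scheme)."""
    s = "".join(line.split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def similarity(a, b):
    """Number of character pairs the two fingerprints have in common."""
    return sum((a & b).values())


def best_match(target_fp, target_no, parent_fps, candidates, threshold):
    """Best-scoring candidate at or above threshold; ties go to the nearest line."""
    best_key, best_no = None, None
    for parent_no in candidates:
        score = similarity(target_fp, parent_fps[parent_no])
        if score < threshold:
            continue
        # Higher score wins; for equal scores the smaller distance wins.
        key = (score, -abs(parent_no - target_no))
        if best_key is None or key > best_key:
            best_key, best_no = key, parent_no
    return best_no


def guess_line(target_line, target_no, parent_lines, chunk,
               chunk_threshold=3, file_threshold=5):
    """Two passes: the diff chunk first, then the whole parent with a higher bar."""
    parent_fps = [fingerprint(line) for line in parent_lines]
    target_fp = fingerprint(target_line)
    match = best_match(target_fp, target_no, parent_fps, chunk, chunk_threshold)
    if match is None:
        match = best_match(target_fp, target_no, parent_fps,
                           range(len(parent_lines)), file_threshold)
    return match  # index into parent_lines, or None if nothing is similar enough


parent = ["int total = 0;", "return total;", "free(buf);"]
# The chunk only covers parent line 2, which is dissimilar, so the
# second pass over the whole parent is what finds line 1.
print(guess_line("return  total ;", 2, parent, chunk=range(2, 3)))
```

The second pass uses the higher threshold because a search over the whole
file has many more chances to hit a coincidental match than a search
confined to a small diff chunk.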


Thread overview: 25+ messages
2019-04-10 16:24 [PATCH v6 0/6] blame: add the ability to ignore commits Barret Rhoden
2019-04-10 16:24 ` [PATCH v6 1/6] Move init_skiplist() outside of fsck Barret Rhoden
2019-04-10 19:04   ` Ævar Arnfjörð Bjarmason
2019-04-15 13:32     ` Barret Rhoden
2019-04-10 16:24 ` [PATCH v6 2/6] blame: use a helper function in blame_chunk() Barret Rhoden
2019-04-10 16:24 ` [PATCH v6 3/6] blame: add the ability to ignore commits and their changes Barret Rhoden
2019-04-10 19:00   ` Ævar Arnfjörð Bjarmason
2019-04-14 10:42     ` Michael Platings
2019-04-15 13:32       ` Barret Rhoden
2019-04-15 13:34     ` Barret Rhoden
2019-04-10 16:24 ` [PATCH v6 4/6] blame: add config options to handle output for ignored lines Barret Rhoden
2019-04-14  3:45   ` Junio C Hamano
2019-04-14 10:09     ` Michael Platings
2019-04-14 10:24       ` Junio C Hamano
2019-04-14 11:27         ` Michael Platings
2019-04-15 13:51           ` Barret Rhoden
2019-04-10 16:24 ` [PATCH v6 5/6] blame: optionally track line fingerprints during fill_blame_origin() Barret Rhoden
2019-04-10 16:24 ` [PATCH v6 6/6] blame: use a fingerprint heuristic to match ignored lines Barret Rhoden
2019-04-14  3:54   ` Junio C Hamano [this message]
2019-04-14  9:41     ` Michael Platings
2019-04-15 14:03     ` Barret Rhoden
2019-04-16  4:10       ` Junio C Hamano
2019-04-14 21:10 ` [PATCH v6 0/6] blame: add the ability to ignore commits Michael Platings
2019-04-15 13:23   ` Barret Rhoden
2019-04-15 21:54     ` Michael Platings
