git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Peter Oberndorfer <kumbayo84@arcor.de>, git@vger.kernel.org
Subject: Re: [PATCH 2/2] pickaxe: use textconv for -S counting
Date: Wed, 21 Nov 2012 15:27:05 -0500	[thread overview]
Message-ID: <20121121202704.GH16280@sigill.intra.peff.net> (raw)
In-Reply-To: <7vr4npt7zd.fsf@alter.siamese.dyndns.org>

On Mon, Nov 19, 2012 at 04:48:22PM -0800, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> >> Exact renames are the obvious one, but they are not handled here.
> >
> > That is half true.  Before this change, we will find the same number
> > of needles and this function would have said "no differences" in a
> > very inefficient way.  After this change, we may apply different
> > textconv filters and this function will say "there is a difference",
> > even though we wouldn't see such a difference at the content level
> > if there wasn't any rename.
> 
> ... but I think that is a good thing anyway.
> 
> If you renamed foo.c to foo.cc with different conversions from C
> code to the text that explain what the code does, if we special case
> only the exact rename case but let pickaxe examine the converted
> result in a case where blobs are modified only by one byte, we would
> get drastically different results between the two cases.

Right, exactly. I think the only sane thing is to always textconv or
always not textconv (whether they are identical renames or not), and any
"these are the same" optimization for identical content needs to take
into account whether we _would have_ done a different textconv (which
most of the time is going to be "no", as textconv is either not in use,
or both paths use the same diff driver; but it is not too expensive to
look up).

The diff_unmodified_pair at the top off diff_flush_patch is correct,
because it treats renames as interesting (because we have to show the
diff header, anyway). I do not know offhand if we avoid feeding
identical content to xdiff at all, but if so, we should be doing so only
after checking that the textconv filters are identical.

> Corollary to this is what should happen when you update the attributes
> between two trees so that textconv for a path that did not change
> between preimage and postimage are different.  Ideally, we should
> notice that the two converted result are different, perhaps, but I
> do not like the performance implications very much.

The content to compare cannot be different unless either the input
content changed or the path changed, and we treat either as
"interesting" in most code paths. So I do not think there are any
performance implications, except that we may need to make sure to look
up textconvs a few lines sooner in some cases.

I'll re-roll the series next week and break out the rename-optimization
bits separately so it is more obvious that it is doing the right thing.

As an aside, I also need to revisit the regex half of that code, which
is still buggy (before and after my patch, due to the expecting-a-NUL
behavior we talked about a week or two ago).  That is a separate topic,
but the same area of code.

-Peff

  reply	other threads:[~2012-11-21 20:27 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-27 18:37 crash on git diff-tree -Ganything <tree> for new files with textconv filter Peter Oberndorfer
2012-10-28 12:01 ` Jeff King
2012-10-28 12:45   ` [PATCH 0/2] textconv support for "log -S" Jeff King
2012-10-28 12:46     ` [PATCH 1/2] pickaxe: hoist empty needle check Jeff King
2012-10-28 12:47     ` [PATCH 2/2] pickaxe: use textconv for -S counting Jeff King
2012-11-13 23:13       ` Junio C Hamano
2012-11-15  1:21         ` Jeff King
2012-11-20  0:31           ` Junio C Hamano
2012-11-20  0:48             ` Junio C Hamano
2012-11-21 20:27               ` Jeff King [this message]
2012-10-28 19:56   ` crash on git diff-tree -Ganything <tree> for new files with textconv filter Peter Oberndorfer
2012-10-29  6:05     ` Jeff King
2012-10-29  6:18       ` Jeff King
2012-10-29 20:19       ` Peter Oberndorfer
2012-10-29 22:35         ` Jeff King
2012-10-29 22:47           ` Jeff King
2012-10-30 12:17             ` Jeff King
2012-10-30 12:46               ` Junio C Hamano
2012-10-30 13:12                 ` Jeff King
2012-11-01 19:19               ` Ramsay Jones
2012-11-07 21:10           ` Peter Oberndorfer
2012-11-07 21:13             ` Jeff King
2013-06-03 17:25               ` Peter Oberndorfer
2013-06-03 22:17                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121121202704.GH16280@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=kumbayo84@arcor.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).