From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Peter Oberndorfer <kumbayo84@arcor.de>, git@vger.kernel.org
Subject: Re: [PATCH 2/2] pickaxe: use textconv for -S counting
Date: Wed, 14 Nov 2012 17:21:31 -0800
Message-ID: <20121115012131.GA17894@sigill.intra.peff.net>
In-Reply-To: <7vk3tpcd0w.fsf@alter.siamese.dyndns.org>

On Tue, Nov 13, 2012 at 03:13:19PM -0800, Junio C Hamano wrote:

> >  static int has_changes(struct diff_filepair *p, struct diff_options *o,
> >  		       regex_t *regexp, kwset_t kws)
> >  {
> > +	struct userdiff_driver *textconv_one = get_textconv(p->one);
> > +	struct userdiff_driver *textconv_two = get_textconv(p->two);
> > +	mmfile_t mf1, mf2;
> > +	int ret;
> > +
> >  	if (!o->pickaxe[0])
> >  		return 0;
> >  
> > -	if (!DIFF_FILE_VALID(p->one)) {
> > -		if (!DIFF_FILE_VALID(p->two))
> > -			return 0; /* ignore unmerged */
> 
> What happened to this part that avoids showing nonsense for unmerged
> paths?

It's moved down. fill_one will return an empty mmfile if
!DIFF_FILE_VALID, so we end up here:

        fill_one(p->one, &mf1, &textconv_one);
        fill_one(p->two, &mf2, &textconv_two);

        if (!mf1.ptr) {
                if (!mf2.ptr)
                        ret = 0; /* ignore unmerged */

Prior to this change, we didn't use fill_one, so we had to check manually.
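
To illustrate why the explicit check is no longer needed, here is a minimal
standalone sketch. The mock_* types and functions are hypothetical stand-ins
written for this example, not git's real definitions; the only thing borrowed
from the patch is the idea that an invalid side yields an empty mmfile, so
testing the pointers covers the unmerged case.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not git's real structs. */
struct mock_filespec {
	int valid;		/* plays the role of DIFF_FILE_VALID() */
	const char *data;
	long size;
};

struct mock_mmfile {
	const char *ptr;
	long size;
};

/* Mimics the described behavior: an invalid side gives an empty mmfile. */
static void mock_fill_one(const struct mock_filespec *one,
			  struct mock_mmfile *mf)
{
	mf->ptr = NULL;
	mf->size = 0;
	if (!one->valid)
		return;
	mf->ptr = one->data;
	mf->size = one->size;
}

/* Returns 1 when both sides are missing, i.e. the "ignore unmerged" case. */
static int mock_is_unmerged(const struct mock_filespec *a,
			    const struct mock_filespec *b)
{
	struct mock_mmfile mf1, mf2;

	mock_fill_one(a, &mf1);
	mock_fill_one(b, &mf2);
	return !mf1.ptr && !mf2.ptr;
}
```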

> > +	/*
> > +	 * If we have an unmodified pair, we know that the count will be the
> > +	 * same and don't even have to load the blobs. Unless textconv is in
> > +	 * play, _and_ we are using two different textconv filters (e.g.,
> > +	 * because a pair is an exact rename with different textconv attributes
> > +	 * for each side, which might generate different content).
> > +	 */
> > +	if (textconv_one == textconv_two && diff_unmodified_pair(p))
> > +		return 0;
> 
> I am not sure about this part that cares about the textconv.
> 
> Wouldn't the normal "git diff A B" skip the filepair that are
> unmodified in the first place at the object name level without even
> looking at the contents (see e.g. diff_flush_patch())?

Hmph. The point was to find the case when the paths are different (e.g.,
in a rename), and therefore the textconvs might be different. But I
think I missed the fact that diff_unmodified_pair will note the
difference in paths. So just calling diff_unmodified_pair would be
sufficient, as the code prior to my patch does.

I thought the point of the check was an optimization: avoid running
contains() on both sides when the data is identical (we know the counts
will match without looking). Exact renames are the obvious case, but
they are not handled here. So I am not sure what it is meant to catch
("git diff $blob1 $blob2" when the two are identical? I do not know at
what layer we cull that from the diff queue).
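
For reference, the reason identical data can be skipped is that -S only
cares whether the number of occurrences differs between the two sides. A
rough sketch of that idea, using plain strstr() on NUL-terminated strings
rather than the kwset/regex machinery in the real code (mock_count and
mock_has_changes are names invented for this example):

```c
#include <string.h>

/* Count non-overlapping occurrences of needle in a NUL-terminated haystack. */
static unsigned int mock_count(const char *haystack, const char *needle)
{
	size_t len = strlen(needle);
	unsigned int cnt = 0;

	if (!len)
		return 0;	/* an empty needle matches nothing here */
	while ((haystack = strstr(haystack, needle)) != NULL) {
		cnt++;
		haystack += len;	/* skip past this match */
	}
	return cnt;
}

/* A pair is interesting to -S only if the occurrence counts differ. */
static int mock_has_changes(const char *one, const char *two,
			    const char *needle)
{
	return mock_count(one, needle) != mock_count(two, needle);
}
```

If both sides hold the same bytes, the two counts are equal by construction,
so the answer is 0 without loading anything.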

So there is room for optimization here on exact renames, but
diff_unmodified_pair is too forgiving of what is interesting (a rename
is interesting to diff_flush_patch, because it wants to mention the
rename, but it is not interesting to pickaxe, because we did not change
the content, and it could be culled here).
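
A tighter check for pickaxe might look something like the sketch below. This
is only an illustration under my own assumptions: struct mock_side and
mock_pickaxe_can_skip are hypothetical, not existing git code. The idea is to
ignore the path entirely and skip the pair when both sides are valid, name
the same blob, and would go through the same textconv filter.

```c
#include <string.h>

/* Hypothetical stand-in for one side of a filepair. */
struct mock_side {
	int valid;		/* side exists at all */
	unsigned char id[20];	/* object name of the blob */
	const void *textconv;	/* opaque filter handle; NULL if none */
};

/*
 * Returns 1 if pickaxe could skip the pair without loading blobs:
 * same blob on both sides and the same textconv, regardless of path.
 */
static int mock_pickaxe_can_skip(const struct mock_side *one,
				 const struct mock_side *two)
{
	if (!one->valid || !two->valid)
		return 0;
	if (one->textconv != two->textconv)
		return 0;	/* different filters may yield different text */
	return !memcmp(one->id, two->id, sizeof(one->id));
}
```

This is where the textconv comparison would actually matter: an exact rename
can have different textconv attributes on each path.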

I don't know that it is that big a deal in general. Pure renames are
going to be the minority of blobs we look at, so it is probably not even
measurable. You could construct a pathological case: in an otherwise
small repo with a 2G file, rename the 2G file without modification;
running "git log -Sfoo" will then unnecessarily load the giant blob
while examining the rename commit.

> Shouldn't this part of the code emulating that behaviour no matter
> what textconv filter(s) are configured for these paths?

Yeah, I just missed that it is checking the path already. It may still
make sense to tighten the optimization, but that is a separate issue. It
should just check diff_unmodified_pair as before; textconv only matters
if you are trying to optimize out exact renames.

-Peff