git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Zach Riggle <zachriggle@gmail.com>, git@vger.kernel.org
Subject: Re: git grep --threads 12 --textconv is effectively single-threaded
Date: Thu, 9 Jul 2020 19:10:30 -0400
Message-ID: <20200709231030.GD664420@coredump.intra.peff.net>
In-Reply-To: <xmqqfta11zu0.fsf@gitster.c.googlers.com>

On Wed, Jul 08, 2020 at 02:06:31PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > It's probably possible to teach the grep code to do the same
> > check-in-the-index trick, but I'm not sure how complicated it would be.
> 
> I am not sure if we should even depend on the "check the object
> database and use it instead of reading the working tree files" done
> in diff code---somehow I thought we did the opposite for performance
> (i.e. when we ought to be comparing two objects, taken from tree and
> the index, if we notice that the index side is stat clean, we can
> read/mmap the working tree file instead of going to the object layer
> and deflating a loose object, or, worse yet, construct the blob by
> repeatedly applying deltas on a base object in a packfile).
> 
> Is this one in the opposite direction done specifically for gaining
> performance when textconv cache is in use?  If so, kudos to whoever
> did it---that sounds like a clever thing to do.

No, it turns out that nobody was that clever (and I was simply
misremembering how it worked).

For a tree-to-tree or index-to-tree comparison, both sides will have an
oid and can use the textconv cache. Even for an index case where we
might choose to use a stat-fresh working tree file as an optimization,
we'll still consult the textconv cache before loading those contents.
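
As a rough illustration of the oid-keyed lookup described above (a sketch
only, not git's actual code; the class and method names are hypothetical):
a cache hit needs an oid, so any content without one always pays for a
converter run.

```python
from typing import Callable, Dict, Optional


class TextconvCache:
    """Toy stand-in for a textconv cache keyed by object id (hypothetical)."""

    def __init__(self, convert: Callable[[bytes], bytes]) -> None:
        self._convert = convert          # stands in for the textconv command
        self._by_oid: Dict[str, bytes] = {}
        self.runs = 0                    # how many times the converter ran

    def get(self, oid: Optional[str], content: bytes) -> bytes:
        # With an oid we can reuse an earlier result.
        if oid is not None and oid in self._by_oid:
            return self._by_oid[oid]
        # No oid (or a cache miss): run the converter.
        self.runs += 1
        converted = self._convert(content)
        if oid is not None:
            self._by_oid[oid] = converted
        return converted
```

Calling `get()` twice with the same oid runs the converter once; calling
it with `oid=None` runs it every time.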

But for diffing a file in the working tree, we'll never have an oid (and
will always run the textconv command). So "git diff" against the index,
for example, would run _one_ textconv (using the cached value for the
index, and running one for the working tree version). And we know that
isn't that interesting for optimizing, since by definition the file is
stat-dirty in that case (or else we'd skip the content-level comparison
entirely). So you'd have to compute the sha1 of the working tree file
from scratch. Plus the lifetime of a working tree file's entry in the
textconv cache is probably smaller, since it hasn't even been committed
yet.
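
For reference, "computing the sha1 from scratch" means reading the whole
file and hashing it in git's blob format. A minimal sketch, assuming a
SHA-1 repository and ignoring any clean/CRLF filters that git would apply
first (the function name is hypothetical):

```python
import hashlib


def blob_oid(content: bytes) -> str:
    """Return the SHA-1 object id git would assign to a blob with this content."""
    # A blob is hashed as "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

The cost is proportional to the file size, which is why it only pays off
if the resulting oid then hits the cache.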

I don't think I ever noticed because the primary thing I was trying to
speed up with the textconv cache is "git log -p", etc, which always has
an oid to work with.

But "grep" is a totally different story. It is frequently looking at all
of the stat-fresh working tree files.
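
The "check-in-the-index" idea for grep could look roughly like the sketch
below: if a working tree file's stat data still matches its index entry,
the entry's oid can be reused as the cache key without hashing the file.
This is an assumption-laden illustration, not git's implementation; the
`IndexEntry` type and function names are hypothetical, and real git
compares more stat fields than size and mtime.

```python
import os
from typing import NamedTuple, Optional


class IndexEntry(NamedTuple):
    """Hypothetical, simplified index entry."""
    size: int
    mtime_ns: int
    oid: str


def fresh_oid(entry: IndexEntry, size: int, mtime_ns: int) -> Optional[str]:
    """Return the entry's oid if the given stat data matches it, else None."""
    if entry.size == size and entry.mtime_ns == mtime_ns:
        return entry.oid
    return None


def fresh_oid_for_path(entry: IndexEntry, path: str) -> Optional[str]:
    """Stat the working tree file and reuse the index oid when it is stat-fresh."""
    st = os.stat(path)
    return fresh_oid(entry, st.st_size, st.st_mtime_ns)
```

A `None` result means the file is stat-dirty, so the textconv command has
to run (or the content has to be hashed first).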

-Peff

Thread overview: 7+ messages
2020-07-07 21:25 git grep --threads 12 --textconv is effectively single-threaded Zach Riggle
2020-07-07 21:59 ` Jeff King
2020-07-08 17:40   ` Zach Riggle
2020-07-08 20:13     ` Jeff King
2020-07-08 21:06       ` Junio C Hamano
2020-07-09 23:10         ` Jeff King [this message]
2020-07-10 16:43           ` Junio C Hamano
