git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Zach Riggle <zachriggle@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: git grep --threads 12 --textconv is effectively single-threaded
Date: Tue, 7 Jul 2020 17:59:51 -0400	[thread overview]
Message-ID: <20200707215951.GB2300296@coredump.intra.peff.net> (raw)
In-Reply-To: <CAMP9c5nUteg_HouuYJZtq7g4MrSE638mns=HeKhNpNTYgQB4=w@mail.gmail.com>

On Tue, Jul 07, 2020 at 04:25:01PM -0500, Zach Riggle wrote:

> It looks like the bit of code that is responsible for performing
> textconv conversions is single-threaded, even if git-grep is provided
> a number of threads to use.

Yes, the locking is much coarser than it could be. The issue is in
grep.c's fill_textconv_grep():

          /*
           * fill_textconv is not remotely thread-safe; it modifies the global
           * diff tempfile structure, writes to the_repo's odb and might
           * internally call thread-unsafe functions such as the
           * prepare_packed_git() lazy-initializator. Because of the last two, we
           * must ensure mutual exclusion between this call and the object reading
           * API, thus we use obj_read_lock() here.
           *
           * TODO: allowing text conversion to run in parallel with object
           * reading operations might increase performance in the multithreaded
           * non-worktreee git-grep with --textconv.
           */
          obj_read_lock();
          size = fill_textconv(r, driver, df, &buf);
          obj_read_unlock();
          free_filespec(df);

Note that this lock is used whether we're doing textconv's or not (i.e.,
it also excludes reading two objects from the object database at the
same time, because none of that code is thread-safe). But the latency
when we're doing a textconv is _much_ higher, because it's shelling out
to a separate process and reading/writing the contents. Note the
much-higher system CPU in your second timing:

> Note the difference in total CPU usage in the following expressions:
> 
> $ git grep --threads 12 -e foobar --and -e fizzbuzz &> /dev/null
> 0.24s user 0.28s system 710% cpu 0.073 total
> 
> $ git grep --threads 12 -e foobar --and -e fizzbuzz --textconv &> /dev/null
> 0.90s user 1.75s system 110% cpu 2.390 total

So I think implementing that TODO would help a lot (because each
textconv could in theory proceed in parallel).

As workaround in the meantime, I suspect that enabling
diff.<driver>.cachetextconv for your particular textconv config might
help. It would be slow on the first run, but then we'd be able to skip
the external process entirely for subsequent runs (the results are
cached in a git-notes tree, which are just raw object reads).

-Peff

  reply	other threads:[~2020-07-07 21:59 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-07 21:25 git grep --threads 12 --textconv is effectively single-threaded Zach Riggle
2020-07-07 21:59 ` Jeff King [this message]
2020-07-08 17:40   ` Zach Riggle
2020-07-08 20:13     ` Jeff King
2020-07-08 21:06       ` Junio C Hamano
2020-07-09 23:10         ` Jeff King
2020-07-10 16:43           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200707215951.GB2300296@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=zachriggle@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).