From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: "Christian Couder" <christian.couder@gmail.com>,
"Thomas Gummerer" <t.gummerer@gmail.com>,
"Matheus Tavares Bernardino" <matheus.bernardino@usp.br>,
git <git@vger.kernel.org>,
"Оля Тележная" <olyatelezhnaya@gmail.com>,
"Elijah Newren" <newren@gmail.com>,
"Tanushree Tumane" <tanushreetumane@gmail.com>
Subject: Re: Questions on GSoC 2019 Ideas
Date: Mon, 4 Mar 2019 23:51:40 -0500
Message-ID: <20190305045140.GH19800@sigill.intra.peff.net>
In-Reply-To: <CACsJy8ATKdcDdbTzCdZFhChKEAWhjuYQJBpGXZ9HAVXK1r2pFw@mail.gmail.com>
On Sun, Mar 03, 2019 at 05:12:59PM +0700, Duy Nguyen wrote:
> On Sun, Mar 3, 2019 at 2:18 PM Christian Couder
> <christian.couder@gmail.com> wrote:
> > One thing I am still worried about is if we are sure that adding
> > parallelism is likely to get us a significant performance improvement
> > or not. If the performance of this code is bounded by disk or memory
> > access, then adding parallelism might not bring any benefit. (It could
> > perhaps decrease performance if memory locality gets worse.) So I'd
> > like some confirmation either by running some tests or by experienced
> > Git developers that it is likely to be a win.
>
> This is a good point. My guess is the pack access consists of two
> parts: zlib inflation plus delta resolution (which is just another
> form of compression), and actual I/O. The former is CPU bound and may take
> advantage of multiple cores. However, the cache we have kinda helps
> reduce CPU work load already, so perhaps the actual gain is not that
> much (or maybe we could just improve this cache to be more efficient).
> I'm adding Jeff, maybe he has done some experiments on parallel pack
> access, who knows.
Sorry, I don't have anything intelligent to add here. I do know that
`index-pack` doesn't scale well with more cores. I don't think I've ever
looked at adding parallel access to the packs themselves. I suspect it
would be tricky due to a few global variables (the pack windows, the
delta cache, etc.).
> The second good thing from parallel pack access is not about utilizing
> processing power from multiple cores, but about _not_ blocking. I
> think one example use case here is parallel checkout. While one thread
> is blocked by pack access code for whatever reason, the others can
> still continue doing other stuff (e.g. write the checked out file to
> disk) or even access the pack again to check more things out.
I'm not sure if it would help much for packs, because they're organized
to have pretty good cold-cache read-ahead behavior. But who knows until
we measure it.
I do suspect that inflating (and delta reconstruction) done in parallel
could be a win for git-grep, especially if you have a really simple
regex that is quick to search.
-Peff
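
For anyone who wants to try the "measure it" part before touching the pack code, here is a rough sketch of the experiment in Python rather than C. It is not Git's code path: the buffers are synthetic stand-ins for packed objects, there is no delta resolution, and the helper names (make_blobs, inflate_serial, inflate_parallel, count_matches) are made up for illustration. It relies on CPython's zlib module releasing the GIL while decompressing, which I believe is the case, so threads can overlap the inflate work.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def make_blobs(count=8, size=1 << 16):
    """Build `count` zlib-compressed buffers standing in for packed objects."""
    blobs = []
    for i in range(count):
        # Each buffer repeats one distinct 28-byte line (hypothetical content).
        raw = (b"line %d of some file content\n" % i) * (size // 32)
        blobs.append(zlib.compress(raw))
    return blobs


def inflate_serial(blobs):
    """Inflate every buffer one after another on the calling thread."""
    return [zlib.decompress(b) for b in blobs]


def inflate_parallel(blobs, workers=4):
    """Inflate the buffers on a thread pool; map() keeps the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.decompress, blobs))


def count_matches(buffers, needle):
    """A deliberately cheap 'grep': count literal occurrences of `needle`."""
    return sum(buf.count(needle) for buf in buffers)
```

Wrapping inflate_serial() and inflate_parallel() in time.perf_counter() calls on realistically sized inputs would give a first hint of whether the inflate step is worth parallelizing when the search itself is cheap. It says nothing about the pack windows or the delta cache, which is where the real difficulty probably lies.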