git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: "Christian Couder" <christian.couder@gmail.com>,
	"Thomas Gummerer" <t.gummerer@gmail.com>,
	"Matheus Tavares Bernardino" <matheus.bernardino@usp.br>,
	git <git@vger.kernel.org>,
	"Оля Тележная" <olyatelezhnaya@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Tanushree Tumane" <tanushreetumane@gmail.com>
Subject: Re: Questions on GSoC 2019 Ideas
Date: Mon, 4 Mar 2019 23:51:40 -0500	[thread overview]
Message-ID: <20190305045140.GH19800@sigill.intra.peff.net> (raw)
In-Reply-To: <CACsJy8ATKdcDdbTzCdZFhChKEAWhjuYQJBpGXZ9HAVXK1r2pFw@mail.gmail.com>

On Sun, Mar 03, 2019 at 05:12:59PM +0700, Duy Nguyen wrote:

> On Sun, Mar 3, 2019 at 2:18 PM Christian Couder
> <christian.couder@gmail.com> wrote:
> > One thing I am still worried about is if we are sure that adding
> > parallelism is likely to get us a significant performance improvement
> > or not. If the performance of this code is bounded by disk or memory
> > access, then adding parallelism might not bring any benefit. (It could
> > perhaps decrease performance if memory locality gets worse.) So I'd
> > like some confirmation either by running some tests or by experienced
> > Git developers that it is likely to be a win.
> 
> This is a good point. My guess is the pack access consists of two
> parts: deflate zlib, resolve delta objects (which is just another form
> of compression) and actual I/O. The former is CPU bound and may take
> advantage of multiple cores. However, the cache we have kinda helps
> reduce CPU work load already, so perhaps the actual gain is not that
> much (or maybe we could just improve this cache to be more efficient).
> I'm adding Jeff, maybe he has done some experiments on parallel pack
> access, who knows.

Sorry, I don't have anything intelligent to add here. I do know that
`index-pack` doesn't scale well with more cores. I don't think I've ever
looked at adding parallel access to the packs themselves. I suspect it
would be tricky due to a few global variables (the pack windows, the
delta cache, etc).

> The second good thing from parallel pack access is not about utilizing
> processing power from multiple cores, but about _not_ blocking. I
> think one example use case here is parallel checkout. While one thread
> is blocked by pack access code for whatever reason, the others can
> still continue doing other stuff (e.g. write the checked out file to
> disk) or even access the pack again to check more things out.

I'm not sure if it would help much for packs, because they're organized
to have pretty good cold-cache read-ahead behavior. But who knows until
we measure it.

I do suspect that inflating (and delta reconstruction) done in parallel
could be a win for git-grep, especially if you have a really simple
regex that is quick to search.

-Peff

  parent reply	other threads:[~2019-03-05  4:51 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-28 21:46 Questions on GSoC 2019 Ideas Matheus Tavares Bernardino
2019-02-28 22:07 ` Christian Couder
2019-03-01  9:30   ` Duy Nguyen
2019-03-02 15:09     ` Thomas Gummerer
2019-03-03  7:18       ` Christian Couder
2019-03-03 10:12         ` Duy Nguyen
2019-03-03 10:17           ` Duy Nguyen
2019-03-05  4:51           ` Jeff King [this message]
2019-03-05 12:57             ` Duy Nguyen
2019-03-05 23:46               ` Matheus Tavares Bernardino
2019-03-06 10:17                 ` Duy Nguyen
2019-03-12  0:18                   ` Matheus Tavares Bernardino
2019-03-12 10:02                     ` Duy Nguyen
2019-03-12 10:11                       ` Duy Nguyen
2019-04-04  1:15                         ` Matheus Tavares Bernardino
2019-04-04  7:56                           ` Christian Couder
2019-04-04  8:20                             ` Mike Hommey
2019-04-05 16:28                             ` Matheus Tavares Bernardino
2019-04-07 23:40                               ` Christian Couder
2019-03-05 23:03         ` Matheus Tavares Bernardino
2019-03-06 23:17           ` Thomas Gummerer
2019-03-03 10:03       ` Duy Nguyen
2019-03-03 16:12         ` Thomas Gummerer
2019-03-01 15:20   ` Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190305045140.GH19800@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    --cc=olyatelezhnaya@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=t.gummerer@gmail.com \
    --cc=tanushreetumane@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).