git@vger.kernel.org mailing list mirror (one of many)
From: Mike Hommey <mh@glandium.org>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org
Subject: Re: Surprising use of memory and time when repacking mozilla's gecko repository
Date: Fri, 5 Jul 2019 20:51:44 +0900
Message-ID: <20190705115144.7jqvc3qzanrpvpxq@glandium.org>
In-Reply-To: <20190705054516.mke7aqk2cdsffkpd@glandium.org>

On Fri, Jul 05, 2019 at 02:45:16PM +0900, Mike Hommey wrote:
> On Fri, Jul 05, 2019 at 01:09:55AM -0400, Jeff King wrote:
> > On Thu, Jul 04, 2019 at 07:05:30PM +0900, Mike Hommey wrote:
> > > Finally, with 1 thread, the picture changes greatly. The overall process
> > > takes 2.5h:
> > > - 50 seconds enumerating and counting objects.
> > > - ~2.5h compressing objects.
> > > - 3 minutes and 25 seconds writing objects!
> > 
> > That's weird. I'd expect us to find similar amounts of deltas, but we
> > don't have the writing slow-down. I wonder if there is some bad
> > interaction between the multi-threaded code and the delta cache.
> > 
> > Did you run the second, single-thread run against the exact same
> > original repository you had? Or did you re-run it on the result of the
> > multi-thread run? Another explanation is that the original repository
> > had some poor patterns that made objects expensive to access (say, a ton
> > of really deep delta chains). And so the difference between the two runs
> > was not the threads, but just the on-disk repository state.
> > 
> > Kind of a long shot, but if that is what happened, try running another
> > multi-threaded "repack -f" and see if its writing phase is faster.
> 
> I've run 36-threads, 16-threads and 1-thread in sequence on the same
> repo, so 16-threads was repacking what was repacked by the 36-threads,
> and 1-thread was repacking what was repacked by the 16-threads. I
> assumed it didn't matter, but come to think of it, I guess it can.

I tried:
- fresh clone -> 36-threads
- fresh clone -> 1-thread -> 36-threads

The 36-threads gc in the latter was only marginally faster than in the
former (between 19 and 20 minutes instead of 22 for both "Compressing"
and "Writing").
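A minimal Python sketch for repeating the comparison above. It assumes that a full `git repack -a -d -f --threads=<n>` is a reasonable stand-in for the gc runs reported here, because the exact gc options are not stated. It measures only total wall-clock time per run, not the separate "Compressing" and "Writing" phases. `timed_repack` and `repack_sequence` are hypothetical helper names, and the `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess
import time


def timed_repack(repo, threads, run=subprocess.run):
    """Fully repack `repo` using `threads` delta-search threads.

    -a packs all objects into one pack, -d drops the now-redundant old packs,
    and -f passes --no-reuse-delta to pack-objects so every delta is
    recomputed. Returns elapsed wall-clock seconds.
    """
    cmd = ["git", "-C", repo, "repack", "-a", "-d", "-f", f"--threads={threads}"]
    start = time.monotonic()
    run(cmd, check=True)
    return time.monotonic() - start


def repack_sequence(repo, thread_counts, run=subprocess.run):
    """Repack the same repository once per entry in `thread_counts`, in order.

    Each run starts from the pack written by the previous one, so the on-disk
    state (e.g. delta chain depth) carries over between runs. Start from a
    fresh clone to tell thread effects apart from repository-state effects.
    """
    return [(n, timed_repack(repo, n, run)) for n in thread_counts]
```

For example, `repack_sequence("/path/to/fresh/clone", [1, 36])` repeats the "fresh clone -> 1-thread -> 36-threads" run, while calling it with `[36]` on a separate fresh clone repeats the first one.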

Mike

Thread overview: 11+ messages
2019-07-04 10:05 Surprising use of memory and time when repacking mozilla's gecko repository Mike Hommey
2019-07-04 12:04 ` Eric Wong
2019-07-04 13:13   ` Mike Hommey
2019-07-05  5:14     ` Jeff King
2019-07-05  5:47       ` Mike Hommey
2019-07-05 11:29         ` Jakub Narebski
2019-07-05  0:22 ` Mike Hommey
2019-07-05  4:45 ` Mike Hommey
2019-07-05  5:09 ` Jeff King
2019-07-05  5:45   ` Mike Hommey
2019-07-05 11:51     ` Mike Hommey [this message]
