git@vger.kernel.org mailing list mirror (one of many)
From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: performance on repack
Date: Sun, 12 Aug 2007 12:33:38 +0200
Message-ID: <20070812103338.GA7763@auto.tuwien.ac.at>
In-Reply-To: <9e4733910708111412t48c1beaahfbaa2c68a02f64f1@mail.gmail.com>

On Sat, Aug 11, 2007 at 05:12:24PM -0400, Jon Smirl wrote:
> If anyone is bored and looking for something to do, making the delta
> code in git repack multithreaded would help. Yesterday I did a big
> repack that took 20 minutes and it only used one of my four cores. It
> was compute bound the entire time.

First, how much time is spent in the write phase and how much in the
deltify phase?

If the writing phase takes too much time and you have enough free
memory, you can try raising the config variable pack.deltacachelimit
(default 1000). By caching the delta, it saves an additional delta
operation for every object whose delta is smaller than
pack.deltacachelimit.

Have you considered the impact on memory usage if there are large
blobs in the repository?

While repacking, git keeps $window_size (default: 10) objects unpacked
in memory. For all but one of them, it additionally stores the delta
index, which is about the same size as the object.

So the worst-case memory usage is "sizeof(biggest object)*(2*$window_size - 1)".
If you have blobs >=100 MB, you need a few GB of memory.
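
To put numbers on the worst-case formula above, here is a small Python
sketch. The function name and the use of MiB as the unit are
illustrative assumptions; only the formula and the default window of 10
come from the text.

```python
MIB = 1024 * 1024


def worst_case_repack_memory(biggest_object_bytes: int, window_size: int = 10) -> int:
    """Worst case from the formula above: sizeof(biggest) * (2*window - 1).

    Each of the window_size objects is held unpacked, and all but one
    also carry a delta index of roughly the same size as the object.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    return biggest_object_bytes * (2 * window_size - 1)


if __name__ == "__main__":
    need = worst_case_repack_memory(100 * MIB)
    print(f"{need / MIB:.0f} MiB")  # prints "1900 MiB"
```

With a 100 MB blob and the default window, the bound comes out at
roughly 1.9 GB, and it grows linearly with both the blob size and the
window size.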

Partitioning the problem is not trivial:

* To avoid worse packing results, we must first sort all objects by
  type, path, and size. Then we can split the list into parts (one
  per task), each of which can be deltified independently.

  The problems are:

  - We need more memory, as each task keeps its own window of
    $window_size objects (+ delta indexes) in memory.

  - The list must be split into parts that require the same amount of
    time. This is difficult, as the time depends on the size of the
    objects as well as on how they are stored (delta chain length).

* On the other hand, we could run all try_delta operations for one object
  in parallel. This way, we would not need much more memory, but would
  require more synchronisation (and more complex code).

Regards, Martin Kögler
