git@vger.kernel.org mailing list mirror (one of many)
From: Nicolas Pitre <nico@cam.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
	Martin Koegler <mkoegler@auto.tuwien.ac.at>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: performance on repack
Date: Wed, 15 Aug 2007 17:05:47 -0400 (EDT)
Message-ID: <alpine.LFD.0.999.0708151650270.5415@xanadu.home>
In-Reply-To: <9e4733910708150808x39241071j1a4012f16cd26ef8@mail.gmail.com>

On Wed, 15 Aug 2007, Jon Smirl wrote:

> You can avoid making all the low-level calls thread-safe by using the
> main thread to get everything into RAM before starting to search for
> the deltas. The worker threads would only deal with things completely
> in memory. You may need to ref-count these in-memory objects if they
> are shared between worker threads. For simplicity, the in-memory input
> objects should be read-only to the threads. The worker threads create
> new structures to hand their results back to the main thread for
> writing to disk.
> 
> A typical solution is to use a queue protected by locks. The main
> thread faults all the needed objects into the cache, builds a queue
> entry, and increments the reference count on every referenced object.
> It takes the lock to add the entry to the queue and, while the queue
> is locked, removes any finished jobs. The main thread writes finished
> results to disk and decrements the ref counts; the cache logic can
> then decide when objects are actually deleted.
> 
> Worker threads wait on the queue. When something is placed in the
> queue, a waiting worker thread removes it, processes it, puts the
> results in RAM, places the object on the finished queue, and then
> waits for another object. It doesn't call into the main body of code.

Way too complex and rather impractical with the current algorithms.

Currently, information on objects is gathered first (the "Counting 
objects" phase), and that can hardly be parallelized.

Once the objects are known, a sorted list is created so that 
deltification of object x can be optimally attempted against objects 
x-1 through x-10.  Creating that list cannot be parallelized either, 
but it is quick anyway.
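
For illustration, the window walk over that sorted list looks roughly
like this (a sketch only, not git code; try_delta_size() is a made-up
stand-in for the real delta computation):

#include <stdio.h>

#define WINDOW 10

struct obj {
    unsigned long size;
    int delta_base;         /* index of the chosen base, or -1 */
};

/* placeholder: return a fake "delta size", 0 meaning no useful delta */
static unsigned long try_delta_size(struct obj *target, struct obj *base)
{
    return target->size > base->size ? target->size - base->size : 0;
}

static void find_deltas(struct obj *list, int nr)
{
    int x, j;

    for (x = 1; x < nr; x++) {
        unsigned long best = 0;
        list[x].delta_base = -1;
        /* only objects x-1 .. x-WINDOW are ever considered */
        for (j = x - 1; j >= 0 && j >= x - WINDOW; j--) {
            unsigned long d = try_delta_size(&list[x], &list[j]);
            if (d && (!best || d < best)) {
                best = d;
                list[x].delta_base = j;
            }
        }
    }
}

int main(void)
{
    struct obj list[5] = {
        { 100, -1 }, { 120, -1 }, { 90, -1 }, { 130, -1 }, { 125, -1 }
    };
    int i;

    find_deltas(list, 5);
    for (i = 0; i < 5; i++)
        printf("object %d: base %d\n", i, list[i].delta_base);
    return 0;
}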

Then comes the actual deltification phase, where the huge cost is.  The 
problem simply has to be partitioned across a few threads, where thread 
1 deals with objects 1 to 100000 from that sorted list, thread 2 with 
objects 100001 to 200000, and so on.  This becomes a partitioning 
problem where thread synchronization is handled at a higher, 
non-performance-critical level.
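
A minimal sketch of that partitioning, assuming the existing sequential
delta search can be invoked on an index range (delta_search_range() and
the object count are made up for illustration):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_THREADS 4

struct slice {
    unsigned int start;     /* first index in the sorted list */
    unsigned int end;       /* one past the last index */
};

/* stand-in for the existing sequential deltification over a range */
static void delta_search_range(unsigned int start, unsigned int end)
{
    printf("deltifying objects %u..%u\n", start, end - 1);
}

static void *worker(void *arg)
{
    struct slice *s = arg;
    delta_search_range(s->start, s->end);
    return NULL;
}

int main(void)
{
    unsigned int nr_objects = 400000;   /* made-up total, for illustration */
    unsigned int chunk = nr_objects / NR_THREADS;
    struct slice slice[NR_THREADS];
    pthread_t tid[NR_THREADS];
    int i;

    for (i = 0; i < NR_THREADS; i++) {
        slice[i].start = i * chunk;
        slice[i].end = (i == NR_THREADS - 1) ? nr_objects : (i + 1) * chunk;
        if (pthread_create(&tid[i], NULL, worker, &slice[i]))
            exit(1);
    }
    for (i = 0; i < NR_THREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}

The only synchronization is the final pthread_join(); nothing in the
inner loop needs a lock as long as each slice writes only its own
results.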


Nicolas

