From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: Jon Smirl <jonsmirl@gmail.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
Nicolas Pitre <nico@cam.org>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: performance on repack
Date: Wed, 15 Aug 2007 19:11:52 +0200 [thread overview]
Message-ID: <20070815171152.GA15155@auto.tuwien.ac.at> (raw)
In-Reply-To: <9e4733910708150808x39241071j1a4012f16cd26ef8@mail.gmail.com>
On Wed, Aug 15, 2007 at 11:08:38AM -0400, Jon Smirl wrote:
> On 8/15/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> > > Also... read_sha1_file() is currently not thread safe, and threaded
> > > delta searching would requires that its usage be serialized with a
> > > global mutex (not done in this patch which is a bug), or ideally be made
> > > thread aware.
>
> You can avoid making all the low level calls thread safe by using the
> main thread to get everything into RAM before starting to search for
> the deltas. The worker threads would only deal with things completely
> in memory. You may need to ref count these in-memory objects if they
> are shared between worker threads. For simplicity the in-memory input
> objects should be read only by the threads. The worker threads create
> new structures to hand their results back to the main thread for
> writing to disk.
git-pack-objects knows the order, in which it will use the objects. A
seperate thread could pre-read the next object and wait until the main
thread starts processing it. After the read is complete, another
thread could start computing the delta index.
git-pack-objects currently reads an object (and computes the delta
index), if it is really necessary. With the pre-read, unnecessary
operations would happen.
> Initially I would just ignore very large objects while getting the
> basic code to work. After the basic code is working if a very large
> object is encountered when the main thread is faulting objects in, the
> main thread should just process this object on the spot using the
> existing low memory code.
I expect that the biggest gain will be for big objects, as they
require more time to read+unpack the source objects and compute the
delta index as well as the delta.
> > @@ -1862,10 +1863,12 @@ int pretend_sha1_file(void *buf, unsigned long len, enum object_type type,
> > void *read_sha1_file(const unsigned char *sha1, enum object_type *type,
> > unsigned long *size)
> > {
> > + static pthread_mutex_t locky = PTHREAD_MUTEX_INITIALIZER;
> > unsigned long mapsize;
> > void *map, *buf;
> > struct cached_object *co;
> >
> > + pthread_mutex_lock(&locky);
> > co = find_cached_object(sha1);
> > if (co) {
> > buf = xmalloc(co->size + 1);
> > @@ -1873,20 +1876,26 @@ void *read_sha1_file(const unsigned char *sha1, enum object_type *type,
> > ((char*)buf)[co->size] = 0;
> > *type = co->type;
> > *size = co->size;
> > + pthread_mutex_unlock(&locky);
> > return buf;
> > }
> >
> > buf = read_packed_sha1(sha1, type, size);
Couldn't we release the mutex at this point?
Why do we need to protect from concurrent access, when we are reading
a loose object?
> > - if (buf)
> > + if (buf) {
> > + pthread_mutex_unlock(&locky);
> > return buf;
> > + }
> > map = map_sha1_file(sha1, &mapsize);
> > if (map) {
> > buf = unpack_sha1_file(map, mapsize, type, size, sha1);
> > munmap(map, mapsize);
> > + pthread_mutex_unlock(&locky);
> > return buf;
> > }
> > reprepare_packed_git();
> > - return read_packed_sha1(sha1, type, size);
> > + buf = read_packed_sha1(sha1, type, size);
> > + pthread_mutex_unlock(&locky);
> > + return buf;
> > }
mfg Martin Kögler
next prev parent reply other threads:[~2007-08-15 17:12 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-08-11 21:12 performance on repack Jon Smirl
2007-08-11 22:09 ` David Kastrup
2007-08-11 22:34 ` Linus Torvalds
2007-08-11 23:21 ` Jon Smirl
2007-08-12 10:33 ` Martin Koegler
2007-08-12 13:49 ` Jon Smirl
2007-08-14 3:12 ` Shawn O. Pearce
2007-08-14 4:10 ` Jon Smirl
2007-08-14 5:13 ` Shawn O. Pearce
2007-08-14 5:57 ` Jon Smirl
2007-08-14 14:52 ` Nicolas Pitre
2007-08-14 21:41 ` Nicolas Pitre
2007-08-15 1:20 ` Jon Smirl
2007-08-15 1:59 ` Nicolas Pitre
2007-08-15 5:32 ` Shawn O. Pearce
2007-08-15 15:08 ` Jon Smirl
2007-08-15 17:11 ` Martin Koegler [this message]
2007-08-15 18:38 ` Jon Smirl
2007-08-15 19:00 ` Nicolas Pitre
2007-08-15 19:42 ` Jon Smirl
2007-08-16 8:10 ` David Kastrup
2007-08-16 15:34 ` Nicolas Pitre
2007-08-16 16:13 ` Jon Smirl
2007-08-16 16:21 ` Nicolas Pitre
2007-08-15 21:05 ` Nicolas Pitre
2007-08-15 20:49 ` Nicolas Pitre
2007-08-30 4:27 ` Nicolas Pitre
2007-08-30 4:36 ` Nicolas Pitre
2007-08-30 16:17 ` Jon Smirl
2007-09-01 21:54 ` Jon Smirl
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070815171152.GA15155@auto.tuwien.ac.at \
--to=mkoegler@auto.tuwien.ac.at \
--cc=git@vger.kernel.org \
--cc=jonsmirl@gmail.com \
--cc=nico@cam.org \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).