git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: Jon Smirl <jonsmirl@gmail.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
	Nicolas Pitre <nico@cam.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: performance on repack
Date: Wed, 15 Aug 2007 19:11:52 +0200	[thread overview]
Message-ID: <20070815171152.GA15155@auto.tuwien.ac.at> (raw)
In-Reply-To: <9e4733910708150808x39241071j1a4012f16cd26ef8@mail.gmail.com>

On Wed, Aug 15, 2007 at 11:08:38AM -0400, Jon Smirl wrote:
> On 8/15/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> > > Also... read_sha1_file() is currently not thread safe, and threaded
> > > delta searching would requires that its usage be serialized with a
> > > global mutex (not done in this patch which is a bug), or ideally be made
> > > thread aware.
> 
> You can avoid making all the low level calls thread safe by using the
> main thread to get everything into RAM before starting to search for
> the deltas. The worker threads would only deal with things completely
> in memory. You may need to ref count these in-memory objects if they
> are shared between worker threads. For simplicity the in-memory input
> objects should be read only by the threads. The worker threads create
> new structures to hand their results back to the main thread for
> writing to disk.

git-pack-objects knows the order, in which it will use the objects.  A
seperate thread could pre-read the next object and wait until the main
thread starts processing it. After the read is complete, another
thread could start computing the delta index.

git-pack-objects currently reads an object (and computes the delta
index), if it is really necessary. With the pre-read, unnecessary
operations would happen.

> Initially I would just ignore very large objects while getting the
> basic code to work. After the basic code is working if a very large
> object is encountered when the main thread is faulting objects in, the
> main thread should just process this object on the spot using the
> existing low memory code.

I expect that the biggest gain will be for big objects, as they
require more time to read+unpack the source objects and compute the
delta index as well as the delta.

> > @@ -1862,10 +1863,12 @@ int pretend_sha1_file(void *buf, unsigned long len, enum object_type type,
> >  void *read_sha1_file(const unsigned char *sha1, enum object_type *type,
> >                      unsigned long *size)
> >  {
> > +       static pthread_mutex_t locky = PTHREAD_MUTEX_INITIALIZER;
> >         unsigned long mapsize;
> >         void *map, *buf;
> >         struct cached_object *co;
> >
> > +       pthread_mutex_lock(&locky);
> >         co = find_cached_object(sha1);
> >         if (co) {
> >                 buf = xmalloc(co->size + 1);
> > @@ -1873,20 +1876,26 @@ void *read_sha1_file(const unsigned char *sha1, enum object_type *type,
> >                 ((char*)buf)[co->size] = 0;
> >                 *type = co->type;
> >                 *size = co->size;
> > +               pthread_mutex_unlock(&locky);
> >                 return buf;
> >         }
> >
> >         buf = read_packed_sha1(sha1, type, size);

Couldn't we release the mutex at this point?

Why do we need to protect from concurrent access, when we are reading
a loose object?

> > -       if (buf)
> > +       if (buf) {
> > +               pthread_mutex_unlock(&locky);
> >                 return buf;
> > +       }
> >         map = map_sha1_file(sha1, &mapsize);
> >         if (map) {
> >                 buf = unpack_sha1_file(map, mapsize, type, size, sha1);
> >                 munmap(map, mapsize);
> > +               pthread_mutex_unlock(&locky);
> >                 return buf;
> >         }
> >         reprepare_packed_git();
> > -       return read_packed_sha1(sha1, type, size);
> > +       buf = read_packed_sha1(sha1, type, size);
> > +       pthread_mutex_unlock(&locky);
> > +       return buf;
> >  }

mfg Martin Kögler

  reply	other threads:[~2007-08-15 17:12 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-08-11 21:12 performance on repack Jon Smirl
2007-08-11 22:09 ` David Kastrup
2007-08-11 22:34   ` Linus Torvalds
2007-08-11 23:21     ` Jon Smirl
2007-08-12 10:33 ` Martin Koegler
2007-08-12 13:49   ` Jon Smirl
2007-08-14  3:12     ` Shawn O. Pearce
2007-08-14  4:10       ` Jon Smirl
2007-08-14  5:13         ` Shawn O. Pearce
2007-08-14  5:57           ` Jon Smirl
2007-08-14 14:52       ` Nicolas Pitre
2007-08-14 21:41       ` Nicolas Pitre
2007-08-15  1:20         ` Jon Smirl
2007-08-15  1:59           ` Nicolas Pitre
2007-08-15  5:32         ` Shawn O. Pearce
2007-08-15 15:08           ` Jon Smirl
2007-08-15 17:11             ` Martin Koegler [this message]
2007-08-15 18:38               ` Jon Smirl
2007-08-15 19:00                 ` Nicolas Pitre
2007-08-15 19:42                   ` Jon Smirl
2007-08-16  8:10                   ` David Kastrup
2007-08-16 15:34                     ` Nicolas Pitre
2007-08-16 16:13                       ` Jon Smirl
2007-08-16 16:21                         ` Nicolas Pitre
2007-08-15 21:05             ` Nicolas Pitre
2007-08-15 20:49           ` Nicolas Pitre
2007-08-30  4:27             ` Nicolas Pitre
2007-08-30  4:36               ` Nicolas Pitre
2007-08-30 16:17                 ` Jon Smirl
2007-09-01 21:54                 ` Jon Smirl

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070815171152.GA15155@auto.tuwien.ac.at \
    --to=mkoegler@auto.tuwien.ac.at \
    --cc=git@vger.kernel.org \
    --cc=jonsmirl@gmail.com \
    --cc=nico@cam.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).