git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] pack-objects: document about thread synchronization
@ 2018-07-29 15:36 Nguyễn Thái Ngọc Duy
  2018-07-29 15:46 ` Duy Nguyen
  2018-08-02 19:35 ` Jeff King
  0 siblings, 2 replies; 6+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-07-29 15:36 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

These extra comments should be make it easier to understand how to use
locks in pack-objects delta search code. For reference, see

8ecce684a3 (basic threaded delta search - 2007-09-06)
384b32c09b (pack-objects: fix threaded load balancing - 2007-12-08)
50f22ada52 (threaded pack-objects: Use condition... - 2007-12-16)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/pack-objects.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ebc8cefb53..6270f74c0b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1852,18 +1852,30 @@ static int delta_cacheable(unsigned long src_size, unsigned long trg_size,
 
 #ifndef NO_PTHREADS
 
+/* Protect access to object database */
 static pthread_mutex_t read_mutex;
 #define read_lock()		pthread_mutex_lock(&read_mutex)
 #define read_unlock()		pthread_mutex_unlock(&read_mutex)
 
+/* Protect delta_cache_size */
 static pthread_mutex_t cache_mutex;
 #define cache_lock()		pthread_mutex_lock(&cache_mutex)
 #define cache_unlock()		pthread_mutex_unlock(&cache_mutex)
 
+/*
+ * Protect object list partitioning (e.g. struct thread_param) and
+ * progress_state
+ */
 static pthread_mutex_t progress_mutex;
 #define progress_lock()		pthread_mutex_lock(&progress_mutex)
 #define progress_unlock()	pthread_mutex_unlock(&progress_mutex)
 
+/*
+ * Access to struct object_entry is unprotected since each thread owns
+ * a portion of the main object list. Just don't access object entries
+ * ahead in the list because they can be stolen and would need
+ * progress_mutex for protection.
+ */
 #else
 
 #define read_lock()		(void)0
@@ -2245,12 +2257,19 @@ static void try_to_free_from_threads(size_t size)
 static try_to_free_t old_try_to_free_routine;
 
 /*
+ * The main object list is split into smaller lists, each is handed to
+ * one worker.
+ *
  * The main thread waits on the condition that (at least) one of the workers
  * has stopped working (which is indicated in the .working member of
  * struct thread_params).
+ *
  * When a work thread has completed its work, it sets .working to 0 and
  * signals the main thread and waits on the condition that .data_ready
  * becomes 1.
+ *
+ * The main thread steals half of the work from the worker that has
+ * most work left to hand it to the idle worker.
  */
 
 struct thread_params {
-- 
2.18.0.656.gda699b98b3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] pack-objects: document about thread synchronization
  2018-07-29 15:36 [PATCH] pack-objects: document about thread synchronization Nguyễn Thái Ngọc Duy
@ 2018-07-29 15:46 ` Duy Nguyen
  2018-08-02 19:39   ` Jeff King
  2018-08-02 19:35 ` Jeff King
  1 sibling, 1 reply; 6+ messages in thread
From: Duy Nguyen @ 2018-07-29 15:46 UTC (permalink / raw)
  To: Git Mailing List

tOn Sun, Jul 29, 2018 at 5:36 PM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
>
> These extra comments should be make it easier to understand how to use
> locks in pack-objects delta search code. For reference, see

Side note. I wonder if we could take progress_lock() less often in
find_deltas() to reduce contention. Right now we take the lock every
time we check out one object_entry but we could pull like 16 items out
of the list per lock. I don't know how much actual contention on this
lock though so maybe not worth doing.
-- 
Duy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pack-objects: document about thread synchronization
  2018-07-29 15:36 [PATCH] pack-objects: document about thread synchronization Nguyễn Thái Ngọc Duy
  2018-07-29 15:46 ` Duy Nguyen
@ 2018-08-02 19:35 ` Jeff King
  2018-08-04  7:04   ` Jonathan Nieder
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff King @ 2018-08-02 19:35 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

On Sun, Jul 29, 2018 at 05:36:05PM +0200, Nguyễn Thái Ngọc Duy wrote:

> These extra comments should be make it easier to understand how to use
> locks in pack-objects delta search code. For reference, see
> 
> 8ecce684a3 (basic threaded delta search - 2007-09-06)
> 384b32c09b (pack-objects: fix threaded load balancing - 2007-12-08)
> 50f22ada52 (threaded pack-objects: Use condition... - 2007-12-16)

Thanks, I think this is an improvement.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pack-objects: document about thread synchronization
  2018-07-29 15:46 ` Duy Nguyen
@ 2018-08-02 19:39   ` Jeff King
  2018-08-04  5:56     ` Duy Nguyen
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2018-08-02 19:39 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Sun, Jul 29, 2018 at 05:46:17PM +0200, Duy Nguyen wrote:

> tOn Sun, Jul 29, 2018 at 5:36 PM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> >
> > These extra comments should be make it easier to understand how to use
> > locks in pack-objects delta search code. For reference, see
> 
> Side note. I wonder if we could take progress_lock() less often in
> find_deltas() to reduce contention. Right now we take the lock every
> time we check out one object_entry but we could pull like 16 items out
> of the list per lock. I don't know how much actual contention on this
> lock though so maybe not worth doing.

I doubt it really matters that much, since we hold it for such a small
amount of time (basically just a few pointer/integer tweaks). Compared
to the heavy lifting of actually loading objects, you're not likely to
see a huge amount of contention, since at any given moment most threads
will be doing that work (or actually computing deltas).

Of course I could be wrong. If you hit a point where many threads are
skipping work (e.g., because they are considering deltas from objects in
the same pack, and skip forward without actually doing any work), then
they'd be ripping through the find_deltas() loop pretty quickly.

OTOH, in cases like that (and the ultimate case would just be running
"git repack -ad" twice in a row), the compression phase seems to go
quite quickly, implying we're not spending a lot of time there.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pack-objects: document about thread synchronization
  2018-08-02 19:39   ` Jeff King
@ 2018-08-04  5:56     ` Duy Nguyen
  0 siblings, 0 replies; 6+ messages in thread
From: Duy Nguyen @ 2018-08-04  5:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On Thu, Aug 2, 2018 at 9:39 PM Jeff King <peff@peff.net> wrote:
>
> On Sun, Jul 29, 2018 at 05:46:17PM +0200, Duy Nguyen wrote:
>
> > tOn Sun, Jul 29, 2018 at 5:36 PM Nguyễn Thái Ngọc Duy <pclouds@gmail.com> wrote:
> > >
> > > These extra comments should be make it easier to understand how to use
> > > locks in pack-objects delta search code. For reference, see
> >
> > Side note. I wonder if we could take progress_lock() less often in
> > find_deltas() to reduce contention. Right now we take the lock every
> > time we check out one object_entry but we could pull like 16 items out
> > of the list per lock. I don't know how much actual contention on this
> > lock though so maybe not worth doing.
>
> I doubt it really matters that much, since we hold it for such a small
> amount of time (basically just a few pointer/integer tweaks). Compared
> to the heavy lifting of actually loading objects, you're not likely to
> see a huge amount of contention, since at any given moment most threads
> will be doing that work (or actually computing deltas).

That was also my assumption when I added the lock to that
oe_get_delta_size() but it seemed to slow lots-of-core system
significantly. My theory is when we do small deltas, turnaround time
could be very quick so if you have lots of threads we'll end up racing
to hold the lock.

What I missed here though, is that we do try_delta() on the whole
window in a thread before we would need progress_lock again, so it's a
lot more work than a single try_delta() and likely less contention.

>
> Of course I could be wrong. If you hit a point where many threads are
> skipping work (e.g., because they are considering deltas from objects in
> the same pack, and skip forward without actually doing any work), then
> they'd be ripping through the find_deltas() loop pretty quickly.
>
> OTOH, in cases like that (and the ultimate case would just be running
> "git repack -ad" twice in a row), the compression phase seems to go
> quite quickly, implying we're not spending a lot of time there.
>
> -Peff
-- 
Duy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] pack-objects: document about thread synchronization
  2018-08-02 19:35 ` Jeff King
@ 2018-08-04  7:04   ` Jonathan Nieder
  0 siblings, 0 replies; 6+ messages in thread
From: Jonathan Nieder @ 2018-08-04  7:04 UTC (permalink / raw)
  To: Jeff King; +Cc: Nguyễn Thái Ngọc Duy, git

Jeff King wrote:
> On Sun, Jul 29, 2018 at 05:36:05PM +0200, Nguyễn Thái Ngọc Duy wrote:

>> These extra comments should be make it easier to understand how to use
>> locks in pack-objects delta search code. For reference, see
>>
>> 8ecce684a3 (basic threaded delta search - 2007-09-06)
>> 384b32c09b (pack-objects: fix threaded load balancing - 2007-12-08)
>> 50f22ada52 (threaded pack-objects: Use condition... - 2007-12-16)
>
> Thanks, I think this is an improvement.

Yes,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
as well.

Usually I would prefer to see comments about what lock protects some
state where the state is defined, but here the state is normally not
protected by any lock, since Git is single-threaded except in limited
places (like pack-objects).  So the documentation you added is in the
right place today, even though it's an unusual place.

Longer term, if we start using more multithreading in Git, we'll have
to reconsider how to structure the locking anyway.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-08-04  7:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-29 15:36 [PATCH] pack-objects: document about thread synchronization Nguyễn Thái Ngọc Duy
2018-07-29 15:46 ` Duy Nguyen
2018-08-02 19:39   ` Jeff King
2018-08-04  5:56     ` Duy Nguyen
2018-08-02 19:35 ` Jeff King
2018-08-04  7:04   ` Jonathan Nieder

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).