git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: steadmon@google.com
Cc: jonathantanmy@google.com, git@vger.kernel.org, stolee@gmail.com,
	peff@peff.net
Subject: Re: [PATCH v2 7/7] index-pack: make quantum of work smaller
Date: Tue, 10 Mar 2020 14:42:06 -0700	[thread overview]
Message-ID: <20200310214206.81606-1-jonathantanmy@google.com> (raw)
In-Reply-To: <20200228000459.GE12115@google.com>

> > +/*
> > + * Stack of struct base_data that have unprocessed children.
> > + * threaded_second_pass() uses this as a source of work (the other being the
> > + * objects array).
> > + */
> > +LIST_HEAD(work_head);
> > +
> > +/*
> > + * Stack of struct base_data that have children, all of whom have been
> > + * processed or are being processed, and at least one child is being processed.
> > + * These struct base_data must be kept around until the last child is
> > + * processed.
> > + */
> > +LIST_HEAD(done_head);
> > +
> > +/*
> > + * All threads share one delta base cache.
> > + */
> > +size_t base_cache_used;
> > +size_t base_cache_limit;
> > +
> >  struct thread_local {
> >  	pthread_t thread;
> > -	size_t base_cache_used;
> >  	int pack_fd;
> >  };
> 
> Seeing several new shared variables made me paranoid about whether or
> not these were thread-safe. There are already five mutexes in this file,
> but there are no clear descriptions (apart from hints in the lock names)
> about which locks protect which variables. It would be nice to make this
> a bit more explicit.

I think that this patch set should confine itself to the work mutex and
what it controls (since that's what this patch set is affecting), but
admittedly, that is already more than a handful.

The work mutex controls work_head and done_head, and all the struct
base_data contained therein, except for data. However, data is
controlled by the semaphore retain_data (and the semaphore is itself
controlled by the work mutex, as described in the previous sentence):
when retain_data is positive, data can be read from but not written to.
The work mutex also controls base_cache_used but not base_cache_limit.
The work mutex does not control any struct base_data outside of
work_head and done_head, so it is possible to generate a struct
base_data outside the mutex, take the mutex, and add it to the list
(reducing time spent in the mutex).

This scheme seems rather complicated to me, but I couldn't make it
simpler - I was hoping for advice on this. If this is the best we can
do, I can try to document it in code comments.

> I wish we had something similar to Abseil's "GUARDED_BY" annotations [1]
> so that we could ensure thread safety with static analysis, but that's
> obviously out of scope for this series.
> 
> [1]: https://abseil.io/docs/cpp/guides/synchronization#thread-annotations

I think what we need is more complicated than what these annotations
provide, but anyway, Abseil is C++-only, I believe.

> > @@ -1065,33 +1011,128 @@ static int compare_ref_delta_entry(const void *a, const void *b)
> [snip]
> >  		work_unlock();
> >  
> > -		resolve_base(&objects[i]);
> > +		if (parent) {
> > +			child = resolve_delta(child_obj, parent);
> > +			if (!child->children_remaining)
> > +				FREE_AND_NULL(child->data);
> > +		} else {
> > +			child = make_base(child_obj, NULL);
> > +			if (child->children_remaining) {
> > +				/*
> > +				 * Since this child has its own delta children,
> > +				 * we will need this data in the future.
> > +				 * Inflate now so that future iterations will
> > +				 * have access to this object's data while
> > +				 * outside the work mutex.
> > +				 */
> > +				child->data = get_data_from_pack(child_obj);
> > +				child->size = child_obj->size;
> > +			}
> > +		}
> > +
> > +		work_lock();
> 
> This part made me uneasy at first, because it looks like we might be
> doing work on shared data outside of the work lock. But just prior to
> unlocking, we parent->retain_data, so we know it won't be freed.

That's correct that it made you uneasy (I think the usual solution is to
make data refcounted, but we don't have a refcounting implementation in
Git), and that's correct that my solution was that parent->retain_data
prevents data from being concurrently written to (in this case, freed).
I didn't know how better to implement this.

> And the
> only part we modify is child, which I don't believe is shared with any
> other threads.

Yes - in the part you quoted, child is a whole new object (created from
resolve_delta() or make_base()). It is only added to the list within the
mutex (the work_lock()).

  reply	other threads:[~2020-03-10 21:42 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-09 23:44 [RFC PATCH 0/6] Better threaded delta resolution in index-pack Jonathan Tan
2019-10-09 23:44 ` [PATCH 1/6] index-pack: unify threaded and unthreaded code Jonathan Tan
2019-10-17  6:20   ` Jeff King
2019-10-09 23:44 ` [PATCH 2/6] index-pack: remove redundant parameter Jonathan Tan
2019-10-17  6:21   ` Jeff King
2019-10-09 23:44 ` [PATCH 3/6] index-pack: remove redundant child field Jonathan Tan
2019-10-10 14:45   ` Derrick Stolee
2019-10-10 19:02     ` Jonathan Tan
2019-10-17  6:24       ` Jeff King
2019-10-17  6:26   ` Jeff King
2019-10-09 23:44 ` [PATCH 4/6] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
2019-10-17  6:30   ` Jeff King
2019-10-09 23:44 ` [PATCH 5/6] index-pack: make resolve_delta() assume base data Jonathan Tan
2019-10-17  6:32   ` Jeff King
2019-10-09 23:44 ` [PATCH 6/6] index-pack: make quantum of work smaller Jonathan Tan
2019-10-17  6:35   ` Jeff King
2019-10-17 20:17 ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 1/7] Documentation: deltaBaseCacheLimit is per-thread Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 2/7] index-pack: unify threaded and unthreaded code Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 3/7] index-pack: remove redundant parameter Jonathan Tan
2020-02-28  0:04     ` Josh Steadmon
2020-03-10 21:29       ` Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 4/7] index-pack: remove redundant child field Jonathan Tan
2020-02-28  0:04     ` Josh Steadmon
2019-10-17 20:17   ` [PATCH v2 5/7] index-pack: calculate {ref,ofs}_{first,last} early Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 6/7] index-pack: make resolve_delta() assume base data Jonathan Tan
2019-10-17 20:17   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan
2020-02-28  0:04     ` Josh Steadmon
2020-03-10 21:42       ` Jonathan Tan [this message]
2020-02-28  0:03   ` [PATCH v2 0/7] Better threaded delta resolution in index-pack Josh Steadmon
2020-03-10 21:45     ` Jonathan Tan
  -- strict thread matches above, loose matches on Subject: below --
2020-08-24 19:16 [PATCH 0/7] Better threaded delta resolution in index-pack (another try) Jonathan Tan
2020-09-08 19:48 ` [PATCH v2 " Jonathan Tan
2020-09-08 19:48   ` [PATCH v2 7/7] index-pack: make quantum of work smaller Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200310214206.81606-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=steadmon@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).