From: Phil Hord <phil.hord@gmail.com>
To: Matheus Tavares <matheus.bernardino@usp.br>
Cc: Derrick Stolee <dstolee@microsoft.com>, Git <git@vger.kernel.org>,
Jonathan Tan <jonathantanmy@google.com>,
Stefan Beller <stefanbeller@gmail.com>
Subject: Re: RFC - concurrency causes segfault in git grep since 2.26.0
Date: Fri, 25 Sep 2020 12:53:03 -0700 [thread overview]
Message-ID: <CABURp0puC324750dZUpeXBricWizy9Ldaz_=JzdvdOkUp8V4pA@mail.gmail.com> (raw)
In-Reply-To: <20200925055258.79347-1-matheus.bernardino@usp.br>
Hi Matheus,
Thanks for the quick response.
On Thu, Sep 24, 2020 at 10:53 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Hi, Phil
>
> Thanks for the bug report and for the crash logs.
>
> On Thu, Sep 24, 2020 at 11:36 PM Phil Hord <phil.hord@gmail.com> wrote:
> >
> > Hi list. It's been a while.
> >
> > I can reliably reproduce a segfault in git grep when searching the
> > object DB. It is caused by a race when threads > 2.
> >
> > I have a patch that fixes the problem, but I don't know exactly why.
> > Can someone else explain it and/or offer a better solution?
> >
> > ====
> >
> > diff --git packfile.c packfile.c
> > index e69012e7f2..52b7b54aeb 100644
> > --- packfile.c
> > +++ packfile.c
> > @@ -1812,7 +1812,9 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> > if (!base)
> > continue;
> >
> > + obj_read_lock();
> > delta_data = unpack_compressed_entry(p, &w_curs, curpos, delta_size);
> > + obj_read_unlock();
> >
> > if (!delta_data) {
> > error("failed to unpack compressed delta "
> >
> > ====
>
> Hmm, obj_read_mutex is a recursive lock, so by adding the
> obj_read_lock() call here, we are incrementing the lock value to [at
> least] 2 (because oid_object_info_extended() had already incremented
> once). Therefore, when unpack_compressed_entry() calls obj_read_unlock()
> before inflating the entry, the lock is not actually released, only
> decremented. So the effect of this change is that phase III of
> unpack_entry() is performed entirely without ever releasing the lock.
Ah, there's the piece I was missing. I didn't realize we temporarily
drop the lock inside unpack_compressed_entry. And obviously I also
didn't notice that we were already holding a lock before entering
unpack_entry. Thanks for the detailed explanation.
> Your crash log shows us that the segfault happened when trying to
> memcpy() the string `base` (in unpack_entry()). I.e., the same string
> that we had previously added to the delta base cache, right before
> calling unpack_compressed_entry() and releasing the lock. The
> problematic sequence is:
> add `base` to the cache -> release lock -> inflate ->
> acquire lock -> try to use `base`
>
> Maybe, when a thread X releases the lock to perform decompression,
> thread Y acquires the lock and tries to add another base to the cache.
> The cache is full, so Y has to remove entries, and it ends up free()'ing
> the base that was just inserted by X. Later, X tries to dereference
> `base` in patch_entry(), which would cause a segfault. It would also
> explain why your change solves the segfault.
>
> I'm not sure, though, why this entry would be removed, since it was just
> added... Maybe Y was adding a huge base to the cache, which required
> removing all entries?
I think your theory is correct here on both counts. The repo I am
searching has some sinfully large objects, and when there is a cache
limit, they are likely to exceed it.
> Anyways, you mentioned you can reproduce the failure quite reliably in your
> repo with 20 threads, right? Could you please check with the following patch
> applied? It adds a `base` copy to the cache instead of the original string,
> allowing us to keep running decompression in parallel, but without the risk
> of another thread free()'ing `base` in the meantime.
Yes, your patch reliably solves the problem, too. The performance is
only slightly improved over holding the lock during inflate in my
case, but it does save a bit of time. Surprisingly, performance seems
to peak between 5 and 10 threads for me in both cases. It reliably
gets slightly worse at 20 threads and 40 threads.
In all cases when threads > 2, this search appears to be 10%-20%
slower than running without this change. Even when threads=1, though,
the result is about 10% slower. Perhaps it is worth avoiding the
base_dup when !obj_read_use_lock.
> diff --git a/packfile.c b/packfile.c
> index 9ef27508f2..79f83b6034 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1772,14 +1772,15 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> while (delta_stack_nr) {
> void *delta_data;
> void *base = data;
> - void *external_base = NULL;
> unsigned long delta_size, base_size = size;
> int i;
>
> data = NULL;
>
> - if (base)
> - add_delta_base_cache(p, obj_offset, base, base_size, type);
> + if (base) {
> + char *base_dup = memcpy(xmalloc(base_size), base, base_size);
> + add_delta_base_cache(p, obj_offset, base_dup, base_size, type);
> + }
>
> if (!base) {
> /*
> @@ -1799,7 +1800,6 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> p->pack_name);
> mark_bad_packed_object(p, base_oid.hash);
> base = read_object(r, &base_oid, &type, &base_size);
> - external_base = base;
> }
> }
>
> @@ -1818,7 +1818,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> "at offset %"PRIuMAX" from %s",
> (uintmax_t)curpos, p->pack_name);
> data = NULL;
> - free(external_base);
> + free(base);
> continue;
> }
>
> @@ -1838,7 +1838,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> error("failed to apply delta");
>
> free(delta_data);
> - free(external_base);
> + free(base);
> }
>
> if (final_type)
> --
Thread overview: 12+ messages
2020-09-25 2:36 RFC - concurrency causes segfault in git grep since 2.26.0 Phil Hord
2020-09-25 5:52 ` Matheus Tavares
2020-09-25 19:53 ` Phil Hord [this message]
2020-09-28 16:50 ` [PATCH 0/2] Fix race condition and memory leak in delta base cache Matheus Tavares
2020-09-28 16:50 ` [PATCH 1/2] packfile: fix race condition on unpack_entry() Matheus Tavares
2020-09-28 18:05 ` Junio C Hamano
2020-09-28 16:50 ` [PATCH 2/2] packfile: fix memory leak in add_delta_base_cache() Matheus Tavares
2020-09-28 18:22 ` Junio C Hamano
2020-09-29 0:01 ` [PATCH v2 0/2] Fix race condition and memory leak in delta base cache Matheus Tavares
2020-09-29 0:01 ` [PATCH v2 1/2] packfile: fix race condition on unpack_entry() Matheus Tavares
2020-10-02 20:06 ` Phil Hord
2020-09-29 0:01 ` [PATCH v2 2/2] packfile: fix memory leak in add_delta_base_cache() Matheus Tavares