From: Phil Hord <phil.hord@gmail.com>
To: Matheus Tavares <matheus.bernardino@usp.br>
Cc: Derrick Stolee <dstolee@microsoft.com>, Git <git@vger.kernel.org>,
Jonathan Tan <jonathantanmy@google.com>,
Stefan Beller <stefanbeller@gmail.com>
Subject: Re: RFC - concurrency causes segfault in git grep since 2.26.0
Date: Fri, 25 Sep 2020 12:53:03 -0700 [thread overview]
Message-ID: <CABURp0puC324750dZUpeXBricWizy9Ldaz_=JzdvdOkUp8V4pA@mail.gmail.com> (raw)
In-Reply-To: <20200925055258.79347-1-matheus.bernardino@usp.br>
Hi Matheus,
Thanks for the quick response.
On Thu, Sep 24, 2020 at 10:53 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Hi, Phil
>
> Thanks for the bug report and for the crash logs.
>
> On Thu, Sep 24, 2020 at 11:36 PM Phil Hord <phil.hord@gmail.com> wrote:
> >
> > Hi list. It's been a while.
> >
> > I can reliably reproduce a segfault in git grep when searching the
> > object DB. It is caused by a race when threads > 2.
> >
> > I have a patch that fixes the problem, but I don't know exactly why.
> > Can someone else explain it and/or offer a better solution?
> >
> > ====
> >
> > diff --git packfile.c packfile.c
> > index e69012e7f2..52b7b54aeb 100644
> > --- packfile.c
> > +++ packfile.c
> > @@ -1812,7 +1812,9 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> > if (!base)
> > continue;
> >
> > + obj_read_lock();
> > delta_data = unpack_compressed_entry(p, &w_curs, curpos, delta_size);
> > + obj_read_unlock();
> >
> > if (!delta_data) {
> > error("failed to unpack compressed delta "
> >
> > ====
>
> Hmm, obj_read_mutex is a recursive lock, so by adding the
> obj_read_lock() call here, we are incrementing the lock value to [at
> least] 2 (because oid_object_info_extended() had already incremented
> once). Therefore, when unpack_compressed_entry() calls obj_read_unlock()
> before inflating the entry, the lock is not actually released, only
> decremented. So the effect of this change is that phase III of
> unpack_entry() is performed entirely without ever releasing the lock.
Ah, there's the piece I was missing. I didn't realize we temporarily
drop the lock inside unpack_compressed_entry. And obviously I also
didn't notice that we were already holding a lock before entering
unpack_entry. Thanks for the detailed explanation.
> Your crash log shows us that the segfault happened when trying to
> memcpy() the string `base` (in unpack_entry()). I.e., the same string
> that we had previously added to the delta base cache, right before
> calling unpack_compressed_entry() and releasing the lock. The
> problematic sequence is:
> add `base` to the cache -> release lock -> inflate ->
> acquire lock -> try to use `base`
>
> Maybe, when a thread X releases the lock to perform decompression,
> thread Y acquires the lock and tries to add another base to the cache.
> The cache is full, so Y has to remove entries, and it ends up free()'ing
> the base that was just inserted by X. Later, X tries to dereference
> `base` in patch_entry(), which would cause a segfault. It would also
> explain why your change solves the segfault.
>
> I'm not sure, though, why this entry would be removed, since it was just
> added... Maybe Y was adding a huge base to the cache, which required
> removing all entries?
I think your theory is correct here on both counts. The repo I am
searching has some sinfully large objects, and when there is a cache
limit, they are likely to exceed it.
> Anyways, you mentioned you can reproduce the failure quite reliably in your
> repo with 20 threads, right? Could you please check with the following patch
> applied? It adds a `base` copy to the cache instead of the original string,
> allowing us to keep running decompression in parallel, but without the risk
> of another thread free()'ing `base` in the meantime.
Yes, your patch reliably solves the problem, too. The performance is
only slightly improved over holding the lock during inflate in my
case, but it does save a bit of time. Surprisingly, performance seems
to peak between 5 and 10 threads for me in both cases. It reliably
gets slightly worse at 20 threads and 40 threads.
In all cases when threads > 2, this search appears to be 10%-20%
slower than running without this change. Even when threads=1, though,
the result is about 10% slower. Perhaps it is worth avoiding the
base_dup when !obj_read_use_lock.
> diff --git a/packfile.c b/packfile.c
> index 9ef27508f2..79f83b6034 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -1772,14 +1772,15 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> while (delta_stack_nr) {
> void *delta_data;
> void *base = data;
> - void *external_base = NULL;
> unsigned long delta_size, base_size = size;
> int i;
>
> data = NULL;
>
> - if (base)
> - add_delta_base_cache(p, obj_offset, base, base_size, type);
> + if (base) {
> + char *base_dup = memcpy(xmalloc(base_size), base, base_size);
> + add_delta_base_cache(p, obj_offset, base_dup, base_size, type);
> + }
>
> if (!base) {
> /*
> @@ -1799,7 +1800,6 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> p->pack_name);
> mark_bad_packed_object(p, base_oid.hash);
> base = read_object(r, &base_oid, &type, &base_size);
> - external_base = base;
> }
> }
>
> @@ -1818,7 +1818,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> "at offset %"PRIuMAX" from %s",
> (uintmax_t)curpos, p->pack_name);
> data = NULL;
> - free(external_base);
> + free(base);
> continue;
> }
>
> @@ -1838,7 +1838,7 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset,
> error("failed to apply delta");
>
> free(delta_data);
> - free(external_base);
> + free(base);
> }
>
> if (final_type)
> --
Thread overview: 12+ messages
2020-09-25 2:36 RFC - concurrency causes segfault in git grep since 2.26.0 Phil Hord
2020-09-25 5:52 ` Matheus Tavares
2020-09-25 19:53 ` Phil Hord [this message]
2020-09-28 16:50 ` [PATCH 0/2] Fix race condition and memory leak in delta base cache Matheus Tavares
2020-09-28 16:50 ` [PATCH 1/2] packfile: fix race condition on unpack_entry() Matheus Tavares
2020-09-28 18:05 ` Junio C Hamano
2020-09-28 16:50 ` [PATCH 2/2] packfile: fix memory leak in add_delta_base_cache() Matheus Tavares
2020-09-28 18:22 ` Junio C Hamano
2020-09-29 0:01 ` [PATCH v2 0/2] Fix race condition and memory leak in delta base cache Matheus Tavares
2020-09-29 0:01 ` [PATCH v2 1/2] packfile: fix race condition on unpack_entry() Matheus Tavares
2020-10-02 20:06 ` Phil Hord
2020-09-29 0:01 ` [PATCH v2 2/2] packfile: fix memory leak in add_delta_base_cache() Matheus Tavares