git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
Cc: git <git@vger.kernel.org>,
	"brian m . carlson" <sandals@crustytoothpaste.net>,
	Johannes Sixt <j6t@kdbg.org>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jeff King <peff@peff.net>,
	Christian Couder <christian.couder@gmail.com>,
	Fredrik Kuivinen <frekui@gmail.com>
Subject: Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe
Date: Wed, 1 Jul 2020 13:32:57 +0200 (CEST)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.2007011244150.56@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <CAHd-oW6U9krWYd++5ipGA6e4Xy6ZoBxE+QVFpshrjUKjN+RH0g@mail.gmail.com>

Hi Matheus,

On Tue, 30 Jun 2020, Matheus Tavares Bernardino wrote:

> On Tue, Jun 30, 2020 at 12:01 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > Hi Matheus,
> >
> > I am fine with the Windows changes (although I have to admit that I did
> > not find time to test things yet).
> >
> > There is one problem in that I do not necessarily know that the memory is
> > released correctly when threads end; You will notice that the
> > `pthread_key_create()` shim in `compat/win32/pthread.h` does not use the
> > `destructor` parameter at all. The documentation at
> >
> >       https://docs.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage
> >
> > is also not terribly clear _how_ the memory is released that was assigned
> > via `TlsSetValue()`.
>
> Yes, I wasn't sure about that either... It would be great if someone
> familiar with TLS memory on Windows could help us with this.

From reading
https://docs.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage,
I get the impression that it is the duty of the thread function to take
care of releasing the memory.

Looking at our code base, I do not see any obvious single point where we
could take care of that.

And it does not look like there is a function you can register to be
called just before threads shut down.

Hmm.

A very involved solution would be to override our `pthread_create()`
emulation to override the function being called, passing as function
parameter a pointer to a struct containing the originally-specified thread
function and parameter, and the overriding function would then call that
function, save its return value, take care of releasing the memory, and
return the saved return value.

Likewise, our `pthread_setspecific()` shim in `compat/win32/pthread.h`
would need to learn to append the passed pointer to a list, and we should
probably also edit the `pthread_key_create()` shim to record the function
to call when releasing the memory at the end of a thread.

This strikes me as pretty ugly and complex.

Yet when I read the LGPLv2'ed code of
https://sourceware.org/pthreads-win32/ that is exactly what they use.
Since their source code is still in CVS, and their anonymous CVS viewer no
longer works, I had to find a copy to point to:
https://github.com/jingyu/pthreads-w32/blob/master/pthread_key_create.c

So we could now start to require a non-minimal pthreads emulation on
Windows. I am somewhat opposed to that idea.

Now I wonder just how involved it _really_ would be to convert all of
those `oid_to_hex()` calls into what is effectively an `_r()` variant,
where you _have_ to pass a buffer to store the result. That would make it
a ton easier to keep Git portable.

I do understand that 800+ hits for `oid_to_hex()` make this somewhat of a
pain to implement...

Maybe there is a way to convince Coccinelle to do all that work for us?

> > A couple more things:
> >
> > On Fri, 26 Jun 2020, Matheus Tavares wrote:
> > >
> [...]
> > > +struct hexbuf_array {
> > > +     int idx;
> >
> > Is there a specific reason why you renamed `bufno` to `idx`? If not, I'd
> > rather keep the old name.
>
> OK, I'll change it back in v3.
>
> > > +     char bufs[4][GIT_MAX_HEXSZ + 1];
> > > +};
> > > +
> > > +#ifdef HAVE_THREADS
> > > +static pthread_key_t hexbuf_array_key;
> > > +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT;
> > > +
> > > +static void init_hexbuf_array_key(void)
> > > +{
> > > +     if (pthread_key_create(&hexbuf_array_key, free))
> > > +             die(_("failed to initialize threads' key for hash to hex conversion"));
> > > +}
> > > +
> > > +#else
> > > +static struct hexbuf_array default_hexbuf_array;
> > > +#endif
> > > +
> > >  char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop)
> > >  {
> > > -     static int bufno;
> > > -     static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
> > > -     bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
> > > -     return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop);
> > > +     struct hexbuf_array *ha;
> > > +
> > > +#ifdef HAVE_THREADS
> > > +     void *value;
> > > +
> > > +     if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key))
> > > +             die(_("failed to initialize threads' key for hash to hex conversion"));
> > > +
> > > +     value = pthread_getspecific(hexbuf_array_key);
> > > +     if (value) {
> > > +             ha = (struct hexbuf_array *) value;
> > > +     } else {
> > > +             ha = xmalloc(sizeof(*ha));
> > > +             if (pthread_setspecific(hexbuf_array_key, (void *)ha))
> > > +                     die(_("failed to set thread buffer for hash to hex conversion"));
> > > +     }
> > > +#else
> > > +     ha = &default_hexbuf_array;
> > > +#endif
> >
> > This introduces two ugly `#ifdef HAVE_THREADS` constructs which are
> > problematic because they are the most likely places to introduce compile
> > errors.
> >
> > I wonder whether you considered introducing a function (and probably a
> > macro) that transparently gives you a thread-specific instance of a given
> > data type? The caller would look something like
> >
> >         struct hexbuf_array *hex_array;
> >
> >         GET_THREADSPECIFIC(hex_array);
> >
> > where the macro would look somewhat like this:
> >
> >         #define GET_THREADSPECIFIC(var) \
> >                 if (get_thread_specific(&var, sizeof(var)) < 0)
> >                         die(_("Failed to get thread-specific %s"), #var);
> >
> > and the function would allocate and assign the variable.
>
> Hmm, we would need two macros, wouldn't we? GET_THREADSPECIFIC(var)
> and a DECLARE_THREADSPECIFIC(var, destructor), to declare the
> pthread_once_t and pthread_key_t variables, as well as define a
> initialization function for the key (i.e. the callback to
> pthread_once()). Or we could provide these declarations ourselves, but
> then we would need the "ifdef HAVE_THREADS" guard to avoid calling
> pthread_key_create() when there is no multithreading.

Right. I wanted to avoid the need to call `DECLARE_THREADSPECIFIC()`, in
particular because that would have to be _outside_ of any function
(because it has to define a function, and nested functions are not really
supported in C, at least not widely).

Ciao,
Dscho

>
> I think that would work, yes. Alternatively, I think we could adjust
> the dummy pthread_key_* functions in thread-utils.h to emulate the
> real ones when HAVE_THREADS == false. Then we could eliminate the
> `#ifdef HAVE_THREADS` guards and use the same code for both cases here
> (and everywhere else pthread_key is used). I haven't thought about it
> carefully enough yet, but I think this *might* be as simple as adding
> the following defines in the "#ifdef NO_PTHREADS" section of
> thread-utils.h:
>
> #define pthread_key_t void *
> /*
>  * The destructor is not used in this case as the main thread will only
>  * exit when the program terminates.
>  */
> #define pthread_key_create(key_ptr, unused) return_0((*key_ptr) = NULL)
> #define pthread_setspecific(key, value) return_0((key) = (value))
> #define pthread_getspecific(key) (key)
> #define pthread_key_delete(key) return_0(NULL)
>
> static inline int return_0(void *unused)
> {
>         return 0;
> }
>
> That's the general idea, but we might as well define a `struct
> dummy_pthread_key_t` instead of defining the key directly as `void *`
> (and have functions instead of macros). This way we could store, e.g.,
> an "initialized" flag that could be used to return an error code on
> double-initializations.
>
> What do you think?
>

  reply	other threads:[~2020-07-01 11:33 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-24 22:29 [RFC] Thread safety in some low-level functions Matheus Tavares Bernardino
2020-06-24 22:52 ` Matheus Tavares Bernardino
2020-06-25  1:38   ` brian m. carlson
2020-06-25 20:32     ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares
2020-06-25 20:32       ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
2020-06-26  5:45         ` Christian Couder
2020-06-26 14:13           ` Matheus Tavares Bernardino
2020-06-25 20:32       ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
2020-06-25 23:07         ` brian m. carlson
2020-06-25 23:54           ` Matheus Tavares Bernardino
2020-06-26  0:00             ` Matheus Tavares Bernardino
2020-06-26  6:02         ` Christian Couder
2020-06-26  8:22       ` [PATCH 0/2] Make oid_to_hex() thread-safe Christian Couder
2020-06-26 16:22         ` Matheus Tavares Bernardino
2020-06-26 21:54       ` [PATCH v2 " Matheus Tavares
2020-06-26 21:54         ` [PATCH v2 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares
2020-06-26 21:54         ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares
2020-06-29 15:11           ` Johannes Schindelin
2020-06-30 20:37             ` Matheus Tavares Bernardino
2020-07-01 11:32               ` Johannes Schindelin [this message]
2020-07-16 11:29           ` Johannes Schindelin
2020-07-18  3:09             ` Matheus Tavares Bernardino
2020-08-10 14:15               ` Johannes Schindelin
2020-07-18  3:52             ` Matheus Tavares
2020-07-26 17:41               ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.2007011244150.56@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=christian.couder@gmail.com \
    --cc=frekui@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=j6t@kdbg.org \
    --cc=jonathantanmy@google.com \
    --cc=matheus.bernardino@usp.br \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).