* [RFC] Thread safety in some low-level functions @ 2020-06-24 22:29 Matheus Tavares Bernardino 2020-06-24 22:52 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 25+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-24 22:29 UTC (permalink / raw) To: git; +Cc: Johannes Sixt, Jeff King, Jonathan Tan Hi, I'm working with threads in unpack-trees and noticed that [warning|error]_errno() uses strerror(), which is not thread-safe. We could try to avoid calling these functions in threaded code, but they are sometimes too deep in the call stack to be noticed... (or even avoided). The same happens with oid_to_hex(), which writes to a static buffer. I don't think I've ever seen a bug report involving these functions being called racily, but this possibility is not currently excluded in our codebase. For example, see grep_source_load_file(), which is called by multiple threads concurrently and might call the thread-unsafe error_errno(). (Although, I admit, the chance of a race here must be very low...) I still haven't been able to come up with a simple / easy change that could make these functions thread safe, but here are my thoughts so far: - For strerror(), there is a thread-safe variant: strerror_r(). However IIUC, this variant is not present on Windows (although there is strerror_s() which *seems* to be somewhat similar). Also, there are two versions of strerror_r() on Linux: one is XSI-compliant and the other is GNU-specific. I don't know what the situation is in other OSes... - Regarding, oid_to_hex(), a patch from 2010 [1] proposed a solution using thread-local storage and pthread_once(). But as Hannes pointed out in this other thread [2] , implementing a Windows equivalence for pthread_once() could be tricky and voluminous. Since this thread dates from 7 years ago, I was wondering if we would be able to implement it nowadays with InitOnceExecuteOnce() [3]. Finally, leaving these functions thread-unsafe is also a possibility... As I mentioned earlier, they don't seem to be causing problems out there for now (at least not reported). But if we can find a feasible solution to introduce thread-safety, I think it would be great. The codebase would be more robust and we would be able to work with threads with much more confidence. Any thoughts? Thanks, Matheus [1] https://lore.kernel.org/git/20100323173130.GC4218@fredrik-laptop/ [2] https://lore.kernel.org/git/516D5CA4.7000500@viscovery.net/ [3]: https://docs.microsoft.com/en-us/windows/win32/sync/using-one-time-initialization ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] Thread safety in some low-level functions 2020-06-24 22:29 [RFC] Thread safety in some low-level functions Matheus Tavares Bernardino @ 2020-06-24 22:52 ` Matheus Tavares Bernardino 2020-06-25 1:38 ` brian m. carlson 0 siblings, 1 reply; 25+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-24 22:52 UTC (permalink / raw) To: git; +Cc: Johannes Sixt, Jeff King, Jonathan Tan On Wed, Jun 24, 2020 at 7:29 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > - Regarding, oid_to_hex(), a patch from 2010 [1] proposed a solution > using thread-local storage and pthread_once(). But as Hannes pointed > out in this other thread [2] , implementing a Windows equivalence for > pthread_once() could be tricky and voluminous. Since this thread dates > from 7 years ago, I was wondering if we would be able to implement it > nowadays with InitOnceExecuteOnce() [3]. I forgot to link it before, but here is a commit in libav.git which emulates pthread_once(): https://git.libav.org/?p=libav.git;a=commitdiff;h=b22693b06d1e5d73454a65c203b4d31c1ca5b69a The patch is not so big, and if we consider only Vista+, it seems very straightforward. I'm not very familiar with Windows, though, and I'm not sure if this solution would work for us. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [RFC] Thread safety in some low-level functions 2020-06-24 22:52 ` Matheus Tavares Bernardino @ 2020-06-25 1:38 ` brian m. carlson 2020-06-25 20:32 ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares 0 siblings, 1 reply; 25+ messages in thread From: brian m. carlson @ 2020-06-25 1:38 UTC (permalink / raw) To: Matheus Tavares Bernardino; +Cc: git, Johannes Sixt, Jeff King, Jonathan Tan [-- Attachment #1: Type: text/plain, Size: 1606 bytes --] On 2020-06-24 at 22:52:58, Matheus Tavares Bernardino wrote: > On Wed, Jun 24, 2020 at 7:29 PM Matheus Tavares Bernardino > <matheus.bernardino@usp.br> wrote: > > > > - Regarding, oid_to_hex(), a patch from 2010 [1] proposed a solution > > using thread-local storage and pthread_once(). But as Hannes pointed > > out in this other thread [2] , implementing a Windows equivalence for > > pthread_once() could be tricky and voluminous. Since this thread dates > > from 7 years ago, I was wondering if we would be able to implement it > > nowadays with InitOnceExecuteOnce() [3]. > > I forgot to link it before, but here is a commit in libav.git which > emulates pthread_once(): > https://git.libav.org/?p=libav.git;a=commitdiff;h=b22693b06d1e5d73454a65c203b4d31c1ca5b69a > > The patch is not so big, and if we consider only Vista+, it seems very > straightforward. I'm not very familiar with Windows, though, and I'm > not sure if this solution would work for us. I believe Git for Windows dropped support for Windows XP some time back. Regardless, since pretty much nobody using Windows XP (with the possible exception of ATMs) is getting security support, I'm fine with ignoring it either way. I am also not a Windows person, so others may have more insight. As for the general idea, I'm mildly in favor of it. While I don't plan to work on the project myself (outside of cleaning up places I'm already working), I'm definitely not opposed to seeing patches come in to improve thread safety. -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 0/2] Make oid_to_hex() thread-safe 2020-06-25 1:38 ` brian m. carlson @ 2020-06-25 20:32 ` Matheus Tavares 2020-06-25 20:32 ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares ` (3 more replies) 0 siblings, 4 replies; 25+ messages in thread From: Matheus Tavares @ 2020-06-25 20:32 UTC (permalink / raw) To: git; +Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin Some thread-unsafe functions of our codebase appear very down in the call stack, which can be hard to notice (or avoid). Thus they are sometimes used in threaded code unsafely. In this series we add pthread_once() to compat/win32/ and use it in conjunction with pthread_key to make a subset of the said functions thread-safe. As a next step, I would love to make [warning|error|die]_errno() thread-safe as well. strerror() is not safe on *nix, and there are some thread functions today that call these (although the actual risk of a race condition must be very small...) My idea was to implement a xstrerror() wrapper which calls the appropriate thread-safe function (dependant on the OS), or even call strerror() itself but holding a lock to copy the result for a local buffer (which should be OK as we don't expect contention in strerror). We could also set a thread local buffer array, as in the second patch of this series, to excuse callers from allocating/freeing memory. One concern with this idea is the risk of an infinite recursion if xstrerror() or any of its childs call [warning|error|die]_errno(). However, if we are only using strerror() and pthread_*() within the wrapper, there should be no problem, right? Has anyone thought of other problems with this approach? Finally, should such change also come with a coccinelle patch to replace usages of strerror() with xstrerror()? Or better not, as the change would be too big? Matheus Tavares (2): compat/win32/pthread: add pthread_once() hex: make hash_to_hex_algop() and friends thread-safe compat/win32/pthread.c | 22 +++++++++++++++++++++ compat/win32/pthread.h | 5 +++++ hex.c | 45 ++++++++++++++++++++++++++++++++++++++---- thread-utils.c | 11 +++++++++++ thread-utils.h | 6 ++++++ 5 files changed, 85 insertions(+), 4 deletions(-) -- 2.26.2 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/2] compat/win32/pthread: add pthread_once() 2020-06-25 20:32 ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares @ 2020-06-25 20:32 ` Matheus Tavares 2020-06-26 5:45 ` Christian Couder 2020-06-25 20:32 ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares ` (2 subsequent siblings) 3 siblings, 1 reply; 25+ messages in thread From: Matheus Tavares @ 2020-06-25 20:32 UTC (permalink / raw) To: git; +Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- Note: the pthread_once() function is adapted from: https://git.libav.org/?p=libav.git;a=commitdiff;h=b22693b06d1e5d73454a65c203b4d31c1ca5b69a Which is LGPLv2.1. Should I add any notice/acknowledgment somewhere, besides the comment I added above the function? compat/win32/pthread.c | 22 ++++++++++++++++++++++ compat/win32/pthread.h | 5 +++++ thread-utils.c | 11 +++++++++++ thread-utils.h | 6 ++++++ 4 files changed, 44 insertions(+) diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c index 2e7eead42c..5a7ecbd999 100644 --- a/compat/win32/pthread.c +++ b/compat/win32/pthread.c @@ -56,3 +56,25 @@ pthread_t pthread_self(void) t.tid = GetCurrentThreadId(); return t; } + +/* Adapted from libav's compat/w32pthreads.h. */ +int pthread_once(pthread_once_t *once_control, void (*init_routine)(void)) +{ + BOOL pending = FALSE; + int ret = 0; + + if(!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) { + ret = err_win_to_posix(GetLastError()); + goto out; + } + + if (pending) + init_routine(); + + if(!InitOnceComplete(once_control, 0, NULL)) + ret = err_win_to_posix(GetLastError()); + +out: + /* POSIX doesn't allow pthread_once() to return EINTR */ + return ret == EINTR ? EIO : ret; +} diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h index 737983d00b..c50f1e89c7 100644 --- a/compat/win32/pthread.h +++ b/compat/win32/pthread.h @@ -40,6 +40,11 @@ typedef int pthread_mutexattr_t; #define pthread_cond_signal WakeConditionVariable #define pthread_cond_broadcast WakeAllConditionVariable +#define pthread_once_t INIT_ONCE + +#define PTHREAD_ONCE_INIT INIT_ONCE_STATIC_INIT +int pthread_once(pthread_once_t *once_control, void (*init_routine)(void)); + /* * Simple thread creation implementation using pthread API */ diff --git a/thread-utils.c b/thread-utils.c index 5329845691..937deb3f2e 100644 --- a/thread-utils.c +++ b/thread-utils.c @@ -122,4 +122,15 @@ int dummy_pthread_join(pthread_t pthread, void **retval) return ENOSYS; } +int nothreads_pthread_once(pthread_once_t *once_control, + void (*init_routine)(void)) +{ + if (*once_control == 1) + return 0; + + init_routine(); + *once_control = 1; + return 0; +} + #endif diff --git a/thread-utils.h b/thread-utils.h index 4961487ed9..bab9dc5e4d 100644 --- a/thread-utils.h +++ b/thread-utils.h @@ -19,6 +19,7 @@ #define pthread_mutex_t int #define pthread_cond_t int #define pthread_key_t int +#define pthread_once_t int #define pthread_mutex_init(mutex, attr) dummy_pthread_init(mutex) #define pthread_mutex_lock(mutex) @@ -48,6 +49,11 @@ int dummy_pthread_join(pthread_t pthread, void **retval); int dummy_pthread_init(void *); +#define PTHREAD_ONCE_INIT 0 +int nothreads_pthread_once(pthread_once_t *once_control, + void (*init_routine)(void)); +#define pthread_once(once, routine) nothreads_pthread_once(once, routine) + #endif int online_cpus(void); -- 2.26.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] compat/win32/pthread: add pthread_once() 2020-06-25 20:32 ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares @ 2020-06-26 5:45 ` Christian Couder 2020-06-26 14:13 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 25+ messages in thread From: Christian Couder @ 2020-06-26 5:45 UTC (permalink / raw) To: Matheus Tavares Cc: git, brian m. carlson, Johannes Sixt, Jonathan Tan, Jeff King, Johannes Schindelin On Fri, Jun 26, 2020 at 3:00 AM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- The commit message might want to explain a bit the purpose of adding these features. > Note: the pthread_once() function is adapted from: > https://git.libav.org/?p=libav.git;a=commitdiff;h=b22693b06d1e5d73454a65c203b4d31c1ca5b69a > > Which is LGPLv2.1. Should I add any notice/acknowledgment somewhere, > besides the comment I added above the function? Yeah, I think you should also tell in the commit message where the code comes from (along with the hash of the commit) and that libav is LGPLv2.1 which is compatible with GPLv2 as explained in section 3 of the LGPLv2.1. > compat/win32/pthread.c | 22 ++++++++++++++++++++++ > compat/win32/pthread.h | 5 +++++ > thread-utils.c | 11 +++++++++++ > thread-utils.h | 6 ++++++ > 4 files changed, 44 insertions(+) > > diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c > index 2e7eead42c..5a7ecbd999 100644 > --- a/compat/win32/pthread.c > +++ b/compat/win32/pthread.c > @@ -56,3 +56,25 @@ pthread_t pthread_self(void) > t.tid = GetCurrentThreadId(); > return t; > } > + > +/* Adapted from libav's compat/w32pthreads.h. */ > +int pthread_once(pthread_once_t *once_control, void (*init_routine)(void)) > +{ > + BOOL pending = FALSE; > + int ret = 0; > + > + if(!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) { We put a space between "if" and the following "(". It might also be interesting to know perhaps in the commit message how much you adapted the code. For example perhaps a good strategy would be in the commit that imports the code to do the minimal amount of change so that it builds and passes the test, and then to have another commit that adapts the style of the code. > + ret = err_win_to_posix(GetLastError()); > + goto out; > + } > + > + if (pending) > + init_routine(); > + > + if(!InitOnceComplete(once_control, 0, NULL)) Space missing between "if" and the following "(". > + ret = err_win_to_posix(GetLastError()); > + > +out: > + /* POSIX doesn't allow pthread_once() to return EINTR */ > + return ret == EINTR ? EIO : ret; > +} ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 1/2] compat/win32/pthread: add pthread_once() 2020-06-26 5:45 ` Christian Couder @ 2020-06-26 14:13 ` Matheus Tavares Bernardino 0 siblings, 0 replies; 25+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-26 14:13 UTC (permalink / raw) To: Christian Couder Cc: git, brian m. carlson, Johannes Sixt, Jonathan Tan, Jeff King, Johannes Schindelin On Fri, Jun 26, 2020 at 2:45 AM Christian Couder <christian.couder@gmail.com> wrote: > > On Fri, Jun 26, 2020 at 3:00 AM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > > > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > > --- > > The commit message might want to explain a bit the purpose of adding > these features. Will do! > > Note: the pthread_once() function is adapted from: > > https://git.libav.org/?p=libav.git;a=commitdiff;h=b22693b06d1e5d73454a65c203b4d31c1ca5b69a > > > > Which is LGPLv2.1. Should I add any notice/acknowledgment somewhere, > > besides the comment I added above the function? > > Yeah, I think you should also tell in the commit message where the > code comes from (along with the hash of the commit) and that libav is > LGPLv2.1 which is compatible with GPLv2 as explained in section 3 of > the LGPLv2.1. Sounds good, I'll add that information. > > compat/win32/pthread.c | 22 ++++++++++++++++++++++ > > compat/win32/pthread.h | 5 +++++ > > thread-utils.c | 11 +++++++++++ > > thread-utils.h | 6 ++++++ > > 4 files changed, 44 insertions(+) > > > > diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c > > index 2e7eead42c..5a7ecbd999 100644 > > --- a/compat/win32/pthread.c > > +++ b/compat/win32/pthread.c > > @@ -56,3 +56,25 @@ pthread_t pthread_self(void) > > t.tid = GetCurrentThreadId(); > > return t; > > } > > + > > +/* Adapted from libav's compat/w32pthreads.h. */ > > +int pthread_once(pthread_once_t *once_control, void (*init_routine)(void)) > > +{ > > + BOOL pending = FALSE; > > + int ret = 0; > > + > > + if(!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) { > > We put a space between "if" and the following "(". It might also be > interesting to know perhaps in the commit message how much you adapted > the code. > > For example perhaps a good strategy would be in the commit that > imports the code to do the minimal amount of change so that it builds > and passes the test, and then to have another commit that adapts the > style of the code. Makes sense. Alternatively, since the code ported from libav is quite small, I think we might as well already adapt the style in this same commit. I'll fix that for v2, thanks! ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-06-25 20:32 ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares 2020-06-25 20:32 ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares @ 2020-06-25 20:32 ` Matheus Tavares 2020-06-25 23:07 ` brian m. carlson 2020-06-26 6:02 ` Christian Couder 2020-06-26 8:22 ` [PATCH 0/2] Make oid_to_hex() thread-safe Christian Couder 2020-06-26 21:54 ` [PATCH v2 " Matheus Tavares 3 siblings, 2 replies; 25+ messages in thread From: Matheus Tavares @ 2020-06-25 20:32 UTC (permalink / raw) To: git Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin, Fredrik Kuivinen hash_to_hex_algop() returns a static buffer, relieving callers from the responsibility of freeing memory after use. But the current implementation uses the same static data for all threads and, thus, is not thread-safe. We could avoid using this function and its wrappers in threaded code, but they are sometimes too deep in the call stack to be noticed or even avoided. For example, we can take a look at the number of oid_to_hex() calls, which calls hash_to_hex_algop(): $ git grep 'oid_to_hex(' | wc -l 818 Although these functions don't seem to be causing problems out there for now (at least not reported), making them thread-safe makes the codebase more robust against race conditions. We can easily do that replicating the static buffer in each thread's local storage. Original-patch-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- hex.c | 45 +++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 41 insertions(+), 4 deletions(-) diff --git a/hex.c b/hex.c index da51e64929..1094ed25bd 100644 --- a/hex.c +++ b/hex.c @@ -136,12 +136,49 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid) return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo); } +struct hexbuf_array { + int idx; + char bufs[4][GIT_MAX_HEXSZ + 1]; +}; + +#ifdef HAVE_THREADS +static pthread_key_t hexbuf_array_key; +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT; + +void init_hexbuf_array_key(void) +{ + if (pthread_key_create(&hexbuf_array_key, free)) + die(_("failed to initialize threads' key for hash to hex conversion")); +} + +#else +static struct hexbuf_array default_hexbuf_array; +#endif + char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop) { - static int bufno; - static char hexbuffer[4][GIT_MAX_HEXSZ + 1]; - bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer); - return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop); + struct hexbuf_array *ha; + +#ifdef HAVE_THREADS + void *value; + + if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key)) + die(_("failed to initialize threads' key for hash to hex conversion")); + + value = pthread_key_getspecific(&hexbuf_array_key); + if (value) { + ha = (struct hexbuf_array *) value; + } else { + ha = xmalloc(sizeof(*ha)); + if (pthread_key_setspecific(&hexbuf_array_key, (void *)ha)) + die(_("failed to set thread buffer for hash to hex conversion")); + } +#else + ha = &default_hexbuf_array; +#endif + + ha->idx = (ha->idx + 1) % ARRAY_SIZE(ha->bufs); + return hash_to_hex_algop_r(ha->bufs[ha->idx], hash, algop); } char *hash_to_hex(const unsigned char *hash) -- 2.26.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-06-25 20:32 ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares @ 2020-06-25 23:07 ` brian m. carlson 2020-06-25 23:54 ` Matheus Tavares Bernardino 2020-06-26 6:02 ` Christian Couder 1 sibling, 1 reply; 25+ messages in thread From: brian m. carlson @ 2020-06-25 23:07 UTC (permalink / raw) To: Matheus Tavares Cc: git, j6t, jonathantanmy, peff, Johannes.Schindelin, Fredrik Kuivinen [-- Attachment #1: Type: text/plain, Size: 1051 bytes --] On 2020-06-25 at 20:32:57, Matheus Tavares wrote: > +#ifdef HAVE_THREADS > + void *value; > + > + if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key)) > + die(_("failed to initialize threads' key for hash to hex conversion")); > + > + value = pthread_key_getspecific(&hexbuf_array_key); I think the portable name for this is "pthread_getspecific". That appears to be what POSIX specifies and what our Windows pthread compat code uses. > + if (value) { > + ha = (struct hexbuf_array *) value; > + } else { > + ha = xmalloc(sizeof(*ha)); > + if (pthread_key_setspecific(&hexbuf_array_key, (void *)ha)) Same thing here. > + die(_("failed to set thread buffer for hash to hex conversion")); > + } > +#else > + ha = &default_hexbuf_array; > +#endif > + > + ha->idx = (ha->idx + 1) % ARRAY_SIZE(ha->bufs); > + return hash_to_hex_algop_r(ha->bufs[ha->idx], hash, algop); > } > > char *hash_to_hex(const unsigned char *hash) -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-06-25 23:07 ` brian m. carlson @ 2020-06-25 23:54 ` Matheus Tavares Bernardino 2020-06-26 0:00 ` Matheus Tavares Bernardino 0 siblings, 1 reply; 25+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-25 23:54 UTC (permalink / raw) To: brian m. carlson Cc: git, Johannes Sixt, Jonathan Tan, Jeff King, Johannes Schindelin, Fredrik Kuivinen On Thu, Jun 25, 2020 at 8:08 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > On 2020-06-25 at 20:32:57, Matheus Tavares wrote: > > +#ifdef HAVE_THREADS > > + void *value; > > + > > + if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key)) > > + die(_("failed to initialize threads' key for hash to hex conversion")); > > + > > + value = pthread_key_getspecific(&hexbuf_array_key); > > I think the portable name for this is "pthread_getspecific". That > appears to be what POSIX specifies and what our Windows pthread compat > code uses. Right, thanks for noticing that! (I wonder how the Windows build passed successfully in our CI [1]... Shouldn't the compiler have failed due to a "undefined reference" error?) [1]: https://github.com/matheustavares/git/runs/808649609 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-06-25 23:54 ` Matheus Tavares Bernardino @ 2020-06-26 0:00 ` Matheus Tavares Bernardino 0 siblings, 0 replies; 25+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-26 0:00 UTC (permalink / raw) To: brian m. carlson Cc: git, Johannes Sixt, Jonathan Tan, Jeff King, Johannes Schindelin, Fredrik Kuivinen On Thu, Jun 25, 2020 at 8:54 PM Matheus Tavares Bernardino <matheus.bernardino@usp.br> wrote: > > On Thu, Jun 25, 2020 at 8:08 PM brian m. carlson > <sandals@crustytoothpaste.net> wrote: > > > > On 2020-06-25 at 20:32:57, Matheus Tavares wrote: > > > +#ifdef HAVE_THREADS > > > + void *value; > > > + > > > + if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key)) > > > + die(_("failed to initialize threads' key for hash to hex conversion")); > > > + > > > + value = pthread_key_getspecific(&hexbuf_array_key); > > > > I think the portable name for this is "pthread_getspecific". That > > appears to be what POSIX specifies and what our Windows pthread compat > > code uses. > > Right, thanks for noticing that! > > (I wonder how the Windows build passed successfully in our CI [1]... > Shouldn't the compiler have failed due to a "undefined reference" > error?) Oh, I know why... I forgot to include "thread-utils.h", so HAVE_THREADS is never defined and this code is always ignored (on both Windows and Linux). That's embarrassing... I'll correct that for v2. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-06-25 20:32 ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares 2020-06-25 23:07 ` brian m. carlson @ 2020-06-26 6:02 ` Christian Couder 1 sibling, 0 replies; 25+ messages in thread From: Christian Couder @ 2020-06-26 6:02 UTC (permalink / raw) To: Matheus Tavares Cc: git, brian m. carlson, Johannes Sixt, Jonathan Tan, Jeff King, Johannes Schindelin, Fredrik Kuivinen On Fri, Jun 26, 2020 at 1:35 AM Matheus Tavares <matheus.bernardino@usp.br> wrote: [...] > Although these functions don't seem to be causing problems out there for > now (at least not reported), making them thread-safe makes the codebase > more robust against race conditions. We can easily do that replicating Maybe s/that replicating/that by replicating/. > the static buffer in each thread's local storage. > > Original-patch-by: Fredrik Kuivinen <frekui@gmail.com> If Fredrik gave his "Signed-off-by:", maybe it's better to keep it too. > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/2] Make oid_to_hex() thread-safe 2020-06-25 20:32 ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares 2020-06-25 20:32 ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares 2020-06-25 20:32 ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares @ 2020-06-26 8:22 ` Christian Couder 2020-06-26 16:22 ` Matheus Tavares Bernardino 2020-06-26 21:54 ` [PATCH v2 " Matheus Tavares 3 siblings, 1 reply; 25+ messages in thread From: Christian Couder @ 2020-06-26 8:22 UTC (permalink / raw) To: Matheus Tavares Cc: git, brian m. carlson, Johannes Sixt, Jonathan Tan, Jeff King, Johannes Schindelin On Fri, Jun 26, 2020 at 1:35 AM Matheus Tavares <matheus.bernardino@usp.br> wrote: > > Some thread-unsafe functions of our codebase appear very down in the > call stack, which can be hard to notice (or avoid). Thus they are > sometimes used in threaded code unsafely. In this series we add > pthread_once() to compat/win32/ and use it in conjunction with > pthread_key to make a subset of the said functions thread-safe. Great! > As a next step, I would love to make [warning|error|die]_errno() > thread-safe as well. strerror() is not safe on *nix, and there are some > thread functions today that call these (although the actual risk of a > race condition must be very small...) > > My idea was to implement a xstrerror() wrapper which calls the > appropriate thread-safe function (dependant on the OS), Yeah, that works if there are appropriate thread-safe functions for all the OS we are interested in, or if we can fallback to strerror() or calling it holding a lock. > or even call > strerror() itself but holding a lock to copy the result for a local > buffer (which should be OK as we don't expect contention in strerror). I agree that it should be ok. > We could also set a thread local buffer array, as in the second patch of > this series, to excuse callers from allocating/freeing memory. I don't think caller allocating/freeing memory for error strings is a performance or code simplification issue. > One concern with this idea is the risk of an infinite recursion if > xstrerror() or any of its childs call [warning|error|die]_errno(). > However, if we are only using strerror() and pthread_*() within the > wrapper, there should be no problem, right? Yeah, I agree. > Has anyone thought of > other problems with this approach? > > Finally, should such change also come with a coccinelle patch to replace > usages of strerror() with xstrerror()? Or better not, as the change > would be too big? I would agree that it's better to avoid too big changes. I am not sure how much we want to automate and check that though. Thanks, Christian. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 0/2] Make oid_to_hex() thread-safe 2020-06-26 8:22 ` [PATCH 0/2] Make oid_to_hex() thread-safe Christian Couder @ 2020-06-26 16:22 ` Matheus Tavares Bernardino 0 siblings, 0 replies; 25+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-26 16:22 UTC (permalink / raw) To: Christian Couder Cc: git, brian m. carlson, Johannes Sixt, Jonathan Tan, Jeff King, Johannes Schindelin Hi, Christian Thanks for the feedback! On Fri, Jun 26, 2020 at 5:22 AM Christian Couder <christian.couder@gmail.com> wrote: > > On Fri, Jun 26, 2020 at 1:35 AM Matheus Tavares > <matheus.bernardino@usp.br> wrote: > > > > My idea was to implement a xstrerror() wrapper which calls the > > appropriate thread-safe function (dependant on the OS), [...] > > We could also set a thread local buffer array, as in the second patch of > > this series, to excuse callers from allocating/freeing memory. > > I don't think caller allocating/freeing memory for error strings is a > performance or code simplification issue. Yeah, I agree that passing a buffer to xstrerror() should be fine. The problem is that we already have many uses of strerror() (98, according to git-grep), which expect a static buffer in return. So the change would be too big :( > > Finally, should such change also come with a coccinelle patch to replace > > usages of strerror() with xstrerror()? Or better not, as the change > > would be too big? > > I would agree that it's better to avoid too big changes. I am not sure > how much we want to automate and check that though. Another alternative would be to replace the definition of strerror() with a: #define strerror(errnum) xstrerror(errnum) This way, all the 98 current strerror() callers would start using the thread-safe version without the need to change them.... However, I don't think I've ever seen such a replacement in our codebase before (besides the banned functions in banned.h). Also, people might expect strerror() to refer to the system version, and perhaps make wrong assumptions about it? I'm not sure which is the best alternative, it's a tricky issue :( ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2 0/2] Make oid_to_hex() thread-safe 2020-06-25 20:32 ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares ` (2 preceding siblings ...) 2020-06-26 8:22 ` [PATCH 0/2] Make oid_to_hex() thread-safe Christian Couder @ 2020-06-26 21:54 ` Matheus Tavares 2020-06-26 21:54 ` [PATCH v2 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares 2020-06-26 21:54 ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares 3 siblings, 2 replies; 25+ messages in thread From: Matheus Tavares @ 2020-06-26 21:54 UTC (permalink / raw) To: git Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin, christian.couder Some thread-unsafe functions of our codebase appear very down in the call stack, which can be hard to notice (or avoid). Thus they are sometimes used in threaded code unsafely. In this series we add pthread_once() to compat/win32/ and use it in conjunction with pthread_key to make a subset of the said functions thread-safe. Changes since v1: Patch 1: - Fixed style problems. - Wrote a more descriptive commit message, with motivation and info about where the code was ported from (including commit, license and modifications made). - Corrected usage of InitOnceComplete() in pthread_once(). It was previously being called unconditionally, which lead to errors in subsequent calls to pthread_once(), after the initialization had already succeded. Patch 2: - Used the correct pthread_[get|set]specific() functions in hex.c, as Brian pointed out. - Imported thread_utils.h to hex.c (providing the HAVE_THREADS macro) - Added Fredrik's S-o-B - Fixed spelling error - Added the static identifier to init_hexbuf_array_key() in hex.c - Provided an example in the commit message where oid_to_hex() was being used unsafelly. Matheus Tavares (2): compat/win32/pthread: add pthread_once() hex: make hash_to_hex_algop() and friends thread-safe compat/win32/pthread.c | 18 +++++++++++++++++ compat/win32/pthread.h | 5 +++++ hex.c | 46 ++++++++++++++++++++++++++++++++++++++---- thread-utils.c | 11 ++++++++++ thread-utils.h | 7 +++++++ 5 files changed, 83 insertions(+), 4 deletions(-) Range-diff against v1: 1: e5a10d3f21 ! 1: 783fcddf8d compat/win32/pthread: add pthread_once() @@ Metadata ## Commit message ## compat/win32/pthread: add pthread_once() + Add pthread_once() emulation for Windows. This function provides an easy + way to do thread-safe one-time initializations in multithreaded code. It + will be used in the next commit to make hash_to_hex_algop() + thread-safe. + + The pthread_once() implementation added comes from libav + (https://git.libav.org/?p=libav.git), commit b22693b ("w32pthreads: Add + pthread_once emulation", 2015-10-07). The code is licensed under + LGPLv2.1 which is compatible with GPLv2. Only the section for support on + Windows Vista+ has been ported, as that's the minimum version required + to build Git for Windows. Also, the code was modified to (1) check and + return error codes; and (2) do not call InitOnceComplete() again after a + successful initialization, as that results in an error. + Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> ## compat/win32/pthread.c ## @@ compat/win32/pthread.c: pthread_t pthread_self(void) + BOOL pending = FALSE; + int ret = 0; + -+ if(!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) { ++ if (!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) { + ret = err_win_to_posix(GetLastError()); -+ goto out; -+ } -+ -+ if (pending) ++ } else if (pending) { + init_routine(); ++ if (!InitOnceComplete(once_control, 0, NULL)) ++ ret = err_win_to_posix(GetLastError()); ++ } + -+ if(!InitOnceComplete(once_control, 0, NULL)) -+ ret = err_win_to_posix(GetLastError()); -+ -+out: + /* POSIX doesn't allow pthread_once() to return EINTR */ + return ret == EINTR ? EIO : ret; +} @@ thread-utils.h: int dummy_pthread_join(pthread_t pthread, void **retval); int dummy_pthread_init(void *); +#define PTHREAD_ONCE_INIT 0 ++#define pthread_once(once, routine) nothreads_pthread_once(once, routine) ++ +int nothreads_pthread_once(pthread_once_t *once_control, + void (*init_routine)(void)); -+#define pthread_once(once, routine) nothreads_pthread_once(once, routine) + #endif 2: 0104cd9c76 ! 2: b47445fa1c hex: make hash_to_hex_algop() and friends thread-safe @@ Commit message in threaded code, but they are sometimes too deep in the call stack to be noticed or even avoided. - For example, we can take a look at the number of oid_to_hex() calls, - which calls hash_to_hex_algop(): + grep.c:grep_source_load_oid(), for example, uses the thread-unsafe + oid_to_hex() (on errors) despite being called in threaded code. And + oid_to_hex() -- which calls hash_to_hex_algop() -- is used in many other + places, as well: $ git grep 'oid_to_hex(' | wc -l 818 - Although these functions don't seem to be causing problems out there for - now (at least not reported), making them thread-safe makes the codebase - more robust against race conditions. We can easily do that replicating - the static buffer in each thread's local storage. + Although hash_to_hex_algop() and its wrappers don't seem to be causing + problems out there for now (at least not reported), making them + thread-safe makes the codebase more robust against race conditions. We + can easily do that by replicating the static buffer in each thread's + local storage. Original-patch-by: Fredrik Kuivinen <frekui@gmail.com> + Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> ## hex.c ## +@@ + #include "cache.h" ++#include "thread-utils.h" + + const signed char hexval_table[256] = { + -1, -1, -1, -1, -1, -1, -1, -1, /* 00-07 */ @@ hex.c: char *oid_to_hex_r(char *buffer, const struct object_id *oid) return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo); } @@ hex.c: char *oid_to_hex_r(char *buffer, const struct object_id *oid) +static pthread_key_t hexbuf_array_key; +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT; + -+void init_hexbuf_array_key(void) ++static void init_hexbuf_array_key(void) +{ + if (pthread_key_create(&hexbuf_array_key, free)) + die(_("failed to initialize threads' key for hash to hex conversion")); @@ hex.c: char *oid_to_hex_r(char *buffer, const struct object_id *oid) + if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key)) + die(_("failed to initialize threads' key for hash to hex conversion")); + -+ value = pthread_key_getspecific(&hexbuf_array_key); ++ value = pthread_getspecific(hexbuf_array_key); + if (value) { + ha = (struct hexbuf_array *) value; + } else { + ha = xmalloc(sizeof(*ha)); -+ if (pthread_key_setspecific(&hexbuf_array_key, (void *)ha)) ++ if (pthread_setspecific(hexbuf_array_key, (void *)ha)) + die(_("failed to set thread buffer for hash to hex conversion")); + } +#else -- 2.26.2 ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2 1/2] compat/win32/pthread: add pthread_once() 2020-06-26 21:54 ` [PATCH v2 " Matheus Tavares @ 2020-06-26 21:54 ` Matheus Tavares 2020-06-26 21:54 ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares 1 sibling, 0 replies; 25+ messages in thread From: Matheus Tavares @ 2020-06-26 21:54 UTC (permalink / raw) To: git Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin, christian.couder Add pthread_once() emulation for Windows. This function provides an easy way to do thread-safe one-time initializations in multithreaded code. It will be used in the next commit to make hash_to_hex_algop() thread-safe. The pthread_once() implementation added comes from libav (https://git.libav.org/?p=libav.git), commit b22693b ("w32pthreads: Add pthread_once emulation", 2015-10-07). The code is licensed under LGPLv2.1 which is compatible with GPLv2. Only the section for support on Windows Vista+ has been ported, as that's the minimum version required to build Git for Windows. Also, the code was modified to (1) check and return error codes; and (2) do not call InitOnceComplete() again after a successful initialization, as that results in an error. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- compat/win32/pthread.c | 18 ++++++++++++++++++ compat/win32/pthread.h | 5 +++++ thread-utils.c | 11 +++++++++++ thread-utils.h | 7 +++++++ 4 files changed, 41 insertions(+) diff --git a/compat/win32/pthread.c b/compat/win32/pthread.c index 2e7eead42c..cb32f8c504 100644 --- a/compat/win32/pthread.c +++ b/compat/win32/pthread.c @@ -56,3 +56,21 @@ pthread_t pthread_self(void) t.tid = GetCurrentThreadId(); return t; } + +/* Adapted from libav's compat/w32pthreads.h. */ +int pthread_once(pthread_once_t *once_control, void (*init_routine)(void)) +{ + BOOL pending = FALSE; + int ret = 0; + + if (!InitOnceBeginInitialize(once_control, 0, &pending, NULL)) { + ret = err_win_to_posix(GetLastError()); + } else if (pending) { + init_routine(); + if (!InitOnceComplete(once_control, 0, NULL)) + ret = err_win_to_posix(GetLastError()); + } + + /* POSIX doesn't allow pthread_once() to return EINTR */ + return ret == EINTR ? EIO : ret; +} diff --git a/compat/win32/pthread.h b/compat/win32/pthread.h index 737983d00b..c50f1e89c7 100644 --- a/compat/win32/pthread.h +++ b/compat/win32/pthread.h @@ -40,6 +40,11 @@ typedef int pthread_mutexattr_t; #define pthread_cond_signal WakeConditionVariable #define pthread_cond_broadcast WakeAllConditionVariable +#define pthread_once_t INIT_ONCE + +#define PTHREAD_ONCE_INIT INIT_ONCE_STATIC_INIT +int pthread_once(pthread_once_t *once_control, void (*init_routine)(void)); + /* * Simple thread creation implementation using pthread API */ diff --git a/thread-utils.c b/thread-utils.c index 5329845691..937deb3f2e 100644 --- a/thread-utils.c +++ b/thread-utils.c @@ -122,4 +122,15 @@ int dummy_pthread_join(pthread_t pthread, void **retval) return ENOSYS; } +int nothreads_pthread_once(pthread_once_t *once_control, + void (*init_routine)(void)) +{ + if (*once_control == 1) + return 0; + + init_routine(); + *once_control = 1; + return 0; +} + #endif diff --git a/thread-utils.h b/thread-utils.h index 4961487ed9..8f9786217a 100644 --- a/thread-utils.h +++ b/thread-utils.h @@ -19,6 +19,7 @@ #define pthread_mutex_t int #define pthread_cond_t int #define pthread_key_t int +#define pthread_once_t int #define pthread_mutex_init(mutex, attr) dummy_pthread_init(mutex) #define pthread_mutex_lock(mutex) @@ -48,6 +49,12 @@ int dummy_pthread_join(pthread_t pthread, void **retval); int dummy_pthread_init(void *); +#define PTHREAD_ONCE_INIT 0 +#define pthread_once(once, routine) nothreads_pthread_once(once, routine) + +int nothreads_pthread_once(pthread_once_t *once_control, + void (*init_routine)(void)); + #endif int online_cpus(void); -- 2.26.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-06-26 21:54 ` [PATCH v2 " Matheus Tavares 2020-06-26 21:54 ` [PATCH v2 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares @ 2020-06-26 21:54 ` Matheus Tavares 2020-06-29 15:11 ` Johannes Schindelin 2020-07-16 11:29 ` Johannes Schindelin 1 sibling, 2 replies; 25+ messages in thread From: Matheus Tavares @ 2020-06-26 21:54 UTC (permalink / raw) To: git Cc: sandals, j6t, jonathantanmy, peff, Johannes.Schindelin, christian.couder, Fredrik Kuivinen hash_to_hex_algop() returns a static buffer, relieving callers from the responsibility of freeing memory after use. But the current implementation uses the same static data for all threads and, thus, is not thread-safe. We could avoid using this function and its wrappers in threaded code, but they are sometimes too deep in the call stack to be noticed or even avoided. grep.c:grep_source_load_oid(), for example, uses the thread-unsafe oid_to_hex() (on errors) despite being called in threaded code. And oid_to_hex() -- which calls hash_to_hex_algop() -- is used in many other places, as well: $ git grep 'oid_to_hex(' | wc -l 818 Although hash_to_hex_algop() and its wrappers don't seem to be causing problems out there for now (at least not reported), making them thread-safe makes the codebase more robust against race conditions. We can easily do that by replicating the static buffer in each thread's local storage. Original-patch-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> --- hex.c | 46 ++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/hex.c b/hex.c index da51e64929..4f2f163d5e 100644 --- a/hex.c +++ b/hex.c @@ -1,4 +1,5 @@ #include "cache.h" +#include "thread-utils.h" const signed char hexval_table[256] = { -1, -1, -1, -1, -1, -1, -1, -1, /* 00-07 */ @@ -136,12 +137,49 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid) return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo); } +struct hexbuf_array { + int idx; + char bufs[4][GIT_MAX_HEXSZ + 1]; +}; + +#ifdef HAVE_THREADS +static pthread_key_t hexbuf_array_key; +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT; + +static void init_hexbuf_array_key(void) +{ + if (pthread_key_create(&hexbuf_array_key, free)) + die(_("failed to initialize threads' key for hash to hex conversion")); +} + +#else +static struct hexbuf_array default_hexbuf_array; +#endif + char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop) { - static int bufno; - static char hexbuffer[4][GIT_MAX_HEXSZ + 1]; - bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer); - return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop); + struct hexbuf_array *ha; + +#ifdef HAVE_THREADS + void *value; + + if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key)) + die(_("failed to initialize threads' key for hash to hex conversion")); + + value = pthread_getspecific(hexbuf_array_key); + if (value) { + ha = (struct hexbuf_array *) value; + } else { + ha = xmalloc(sizeof(*ha)); + if (pthread_setspecific(hexbuf_array_key, (void *)ha)) + die(_("failed to set thread buffer for hash to hex conversion")); + } +#else + ha = &default_hexbuf_array; +#endif + + ha->idx = (ha->idx + 1) % ARRAY_SIZE(ha->bufs); + return hash_to_hex_algop_r(ha->bufs[ha->idx], hash, algop); } char *hash_to_hex(const unsigned char *hash) -- 2.26.2 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-06-26 21:54 ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares @ 2020-06-29 15:11 ` Johannes Schindelin 2020-06-30 20:37 ` Matheus Tavares Bernardino 2020-07-16 11:29 ` Johannes Schindelin 1 sibling, 1 reply; 25+ messages in thread From: Johannes Schindelin @ 2020-06-29 15:11 UTC (permalink / raw) To: Matheus Tavares Cc: git, sandals, j6t, jonathantanmy, peff, christian.couder, Fredrik Kuivinen Hi Matheus, I am fine with the Windows changes (although I have to admit that I did not find time to test things yet). There is one problem in that I do not necessarily know that the memory is released correctly when threads end; You will notice that the `pthread_key_create()` shim in `compat/win32/pthread.h` does not use the `destructor` parameter at all. The documentation at https://docs.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage is also not terribly clear _how_ the memory is released that was assigned via `TlsSetValue()`. I notice that the example uses `LocalAlloc()`, but we override `malloc()` via the `compat/nedmalloc/` functions, so that should cause problems unless I am wrong. Maybe there is an expert reading this who could jump in and help understand the ramifications? A couple more things: On Fri, 26 Jun 2020, Matheus Tavares wrote: > hash_to_hex_algop() returns a static buffer, relieving callers from the > responsibility of freeing memory after use. But the current > implementation uses the same static data for all threads and, thus, is > not thread-safe. We could avoid using this function and its wrappers > in threaded code, but they are sometimes too deep in the call stack to > be noticed or even avoided. > > grep.c:grep_source_load_oid(), for example, uses the thread-unsafe > oid_to_hex() (on errors) despite being called in threaded code. And > oid_to_hex() -- which calls hash_to_hex_algop() -- is used in many other > places, as well: > > $ git grep 'oid_to_hex(' | wc -l > 818 > > Although hash_to_hex_algop() and its wrappers don't seem to be causing > problems out there for now (at least not reported), making them > thread-safe makes the codebase more robust against race conditions. We > can easily do that by replicating the static buffer in each thread's > local storage. > > Original-patch-by: Fredrik Kuivinen <frekui@gmail.com> > Signed-off-by: Fredrik Kuivinen <frekui@gmail.com> > Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> > --- > hex.c | 46 ++++++++++++++++++++++++++++++++++++++++++---- > 1 file changed, 42 insertions(+), 4 deletions(-) > > diff --git a/hex.c b/hex.c > index da51e64929..4f2f163d5e 100644 > --- a/hex.c > +++ b/hex.c > @@ -1,4 +1,5 @@ > #include "cache.h" > +#include "thread-utils.h" > > const signed char hexval_table[256] = { > -1, -1, -1, -1, -1, -1, -1, -1, /* 00-07 */ > @@ -136,12 +137,49 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid) > return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo); > } > > +struct hexbuf_array { > + int idx; Is there a specific reason why you renamed `bufno` to `idx`? If not, I'd rather keep the old name. > + char bufs[4][GIT_MAX_HEXSZ + 1]; > +}; > + > +#ifdef HAVE_THREADS > +static pthread_key_t hexbuf_array_key; > +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT; > + > +static void init_hexbuf_array_key(void) > +{ > + if (pthread_key_create(&hexbuf_array_key, free)) > + die(_("failed to initialize threads' key for hash to hex conversion")); > +} > + > +#else > +static struct hexbuf_array default_hexbuf_array; > +#endif > + > char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop) > { > - static int bufno; > - static char hexbuffer[4][GIT_MAX_HEXSZ + 1]; > - bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer); > - return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop); > + struct hexbuf_array *ha; > + > +#ifdef HAVE_THREADS > + void *value; > + > + if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key)) > + die(_("failed to initialize threads' key for hash to hex conversion")); > + > + value = pthread_getspecific(hexbuf_array_key); > + if (value) { > + ha = (struct hexbuf_array *) value; > + } else { > + ha = xmalloc(sizeof(*ha)); > + if (pthread_setspecific(hexbuf_array_key, (void *)ha)) > + die(_("failed to set thread buffer for hash to hex conversion")); > + } > +#else > + ha = &default_hexbuf_array; > +#endif This introduces two ugly `#ifdef HAVE_THREADS` constructs which are problematic because they are the most likely places to introduce compile errors. I wonder whether you considered introducing a function (and probably a macro) that transparently gives you a thread-specific instance of a given data type? The caller would look something like struct hexbuf_array *hex_array; GET_THREADSPECIFIC(hex_array); where the macro would look somewhat like this: #define GET_THREADSPECIFIC(var) \ if (get_thread_specific(&var, sizeof(var)) < 0) die(_("Failed to get thread-specific %s"), #var); and the function would allocate and assign the variable. I guess this scheme won't work, though, as `pthread_once()` does not take a callback parameter, right? And we would need that parameter to be able to create the `pthread_key_t`. Hmm. Ciao, Dscho > + > + ha->idx = (ha->idx + 1) % ARRAY_SIZE(ha->bufs); > + return hash_to_hex_algop_r(ha->bufs[ha->idx], hash, algop); > } > > char *hash_to_hex(const unsigned char *hash) > -- > 2.26.2 > > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-06-29 15:11 ` Johannes Schindelin @ 2020-06-30 20:37 ` Matheus Tavares Bernardino 2020-07-01 11:32 ` Johannes Schindelin 0 siblings, 1 reply; 25+ messages in thread From: Matheus Tavares Bernardino @ 2020-06-30 20:37 UTC (permalink / raw) To: Johannes Schindelin Cc: git, brian m . carlson, Johannes Sixt, Jonathan Tan, Jeff King, Christian Couder, Fredrik Kuivinen On Tue, Jun 30, 2020 at 12:01 PM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Matheus, > > I am fine with the Windows changes (although I have to admit that I did > not find time to test things yet). > > There is one problem in that I do not necessarily know that the memory is > released correctly when threads end; You will notice that the > `pthread_key_create()` shim in `compat/win32/pthread.h` does not use the > `destructor` parameter at all. The documentation at > > https://docs.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage > > is also not terribly clear _how_ the memory is released that was assigned > via `TlsSetValue()`. Yes, I wasn't sure about that either... It would be great if someone familiar with TLS memory on Windows could help us with this. > A couple more things: > > On Fri, 26 Jun 2020, Matheus Tavares wrote: > > [...] > > +struct hexbuf_array { > > + int idx; > > Is there a specific reason why you renamed `bufno` to `idx`? If not, I'd > rather keep the old name. OK, I'll change it back in v3. > > + char bufs[4][GIT_MAX_HEXSZ + 1]; > > +}; > > + > > +#ifdef HAVE_THREADS > > +static pthread_key_t hexbuf_array_key; > > +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT; > > + > > +static void init_hexbuf_array_key(void) > > +{ > > + if (pthread_key_create(&hexbuf_array_key, free)) > > + die(_("failed to initialize threads' key for hash to hex conversion")); > > +} > > + > > +#else > > +static struct hexbuf_array default_hexbuf_array; > > +#endif > > + > > char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop) > > { > > - static int bufno; > > - static char hexbuffer[4][GIT_MAX_HEXSZ + 1]; > > - bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer); > > - return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop); > > + struct hexbuf_array *ha; > > + > > +#ifdef HAVE_THREADS > > + void *value; > > + > > + if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key)) > > + die(_("failed to initialize threads' key for hash to hex conversion")); > > + > > + value = pthread_getspecific(hexbuf_array_key); > > + if (value) { > > + ha = (struct hexbuf_array *) value; > > + } else { > > + ha = xmalloc(sizeof(*ha)); > > + if (pthread_setspecific(hexbuf_array_key, (void *)ha)) > > + die(_("failed to set thread buffer for hash to hex conversion")); > > + } > > +#else > > + ha = &default_hexbuf_array; > > +#endif > > This introduces two ugly `#ifdef HAVE_THREADS` constructs which are > problematic because they are the most likely places to introduce compile > errors. > > I wonder whether you considered introducing a function (and probably a > macro) that transparently gives you a thread-specific instance of a given > data type? The caller would look something like > > struct hexbuf_array *hex_array; > > GET_THREADSPECIFIC(hex_array); > > where the macro would look somewhat like this: > > #define GET_THREADSPECIFIC(var) \ > if (get_thread_specific(&var, sizeof(var)) < 0) > die(_("Failed to get thread-specific %s"), #var); > > and the function would allocate and assign the variable. Hmm, we would need two macros, wouldn't we? GET_THREADSPECIFIC(var) and a DECLARE_THREADSPECIFIC(var, destructor), to declare the pthread_once_t and pthread_key_t variables, as well as define a initialization function for the key (i.e. the callback to pthread_once()). Or we could provide these declarations ourselves, but then we would need the "ifdef HAVE_THREADS" guard to avoid calling pthread_key_create() when there is no multithreading. I think that would work, yes. Alternatively, I think we could adjust the dummy pthread_key_* functions in thread-utils.h to emulate the real ones when HAVE_THREADS == false. Then we could eliminate the `#ifdef HAVE_THREADS` guards and use the same code for both cases here (and everywhere else pthread_key is used). I haven't thought about it carefully enough yet, but I think this *might* be as simple as adding the following defines in the "#ifdef NO_PTHREADS" section of thread-utils.h: #define pthread_key_t void * /* * The destructor is not used in this case as the main thread will only * exit when the program terminates. */ #define pthread_key_create(key_ptr, unused) return_0((*key_ptr) = NULL) #define pthread_setspecific(key, value) return_0((key) = (value)) #define pthread_getspecific(key) (key) #define pthread_key_delete(key) return_0(NULL) static inline int return_0(void *unused) { return 0; } That's the general idea, but we might as well define a `struct dummy_pthread_key_t` instead of defining the key directly as `void *` (and have functions instead of macros). This way we could store, e.g., an "initialized" flag that could be used to return an error code on double-initializations. What do you think? ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-06-30 20:37 ` Matheus Tavares Bernardino @ 2020-07-01 11:32 ` Johannes Schindelin 0 siblings, 0 replies; 25+ messages in thread From: Johannes Schindelin @ 2020-07-01 11:32 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: git, brian m . carlson, Johannes Sixt, Jonathan Tan, Jeff King, Christian Couder, Fredrik Kuivinen Hi Matheus, On Tue, 30 Jun 2020, Matheus Tavares Bernardino wrote: > On Tue, Jun 30, 2020 at 12:01 PM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > > Hi Matheus, > > > > I am fine with the Windows changes (although I have to admit that I did > > not find time to test things yet). > > > > There is one problem in that I do not necessarily know that the memory is > > released correctly when threads end; You will notice that the > > `pthread_key_create()` shim in `compat/win32/pthread.h` does not use the > > `destructor` parameter at all. The documentation at > > > > https://docs.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage > > > > is also not terribly clear _how_ the memory is released that was assigned > > via `TlsSetValue()`. > > Yes, I wasn't sure about that either... It would be great if someone > familiar with TLS memory on Windows could help us with this. From reading https://docs.microsoft.com/en-us/windows/win32/procthread/using-thread-local-storage, I get the impression that it is the duty of the thread function to take care of releasing the memory. Looking at our code base, I do not see any obvious single point where we could take care of that. And it does not look like there is a function you can register to be called just before threads shut down. Hmm. A very involved solution would be to override our `pthread_create()` emulation to override the function being called, passing as function parameter a pointer to a struct containing the originally-specified thread function and parameter, and the overriding function would then call that function, save its return value, take care of releasing the memory, and return the saved return value. Likewise, our `pthread_setspecific()` shim in `compat/win32/pthread.h` would need to learn to append the passed pointer to a list, and we should probably also edit the `pthread_key_create()` shim to record the function to call when releasing the memory at the end of a thread. This strikes me as pretty ugly and complex. Yet when I read the LGPLv2'ed code of https://sourceware.org/pthreads-win32/ that is exactly what they use. Since their source code is still in CVS, and their anonymous CVS viewer no longer works, I had to find a copy to point to: https://github.com/jingyu/pthreads-w32/blob/master/pthread_key_create.c So we could now start to require a non-minimal pthreads emulation on Windows. I am somewhat opposed to that idea. Now I wonder just how involved it _really_ would be to convert all of those `oid_to_hex()` calls into what is effectively an `_r()` variant, where you _have_ to pass a buffer to store the result. That would make it a ton easier to keep Git portable. I do understand that 800+ hits for `oid_to_hex()` make this somewhat of a pain to implement... Maybe there is a way to convince Coccinelle to do all that work for us? > > A couple more things: > > > > On Fri, 26 Jun 2020, Matheus Tavares wrote: > > > > [...] > > > +struct hexbuf_array { > > > + int idx; > > > > Is there a specific reason why you renamed `bufno` to `idx`? If not, I'd > > rather keep the old name. > > OK, I'll change it back in v3. > > > > + char bufs[4][GIT_MAX_HEXSZ + 1]; > > > +}; > > > + > > > +#ifdef HAVE_THREADS > > > +static pthread_key_t hexbuf_array_key; > > > +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT; > > > + > > > +static void init_hexbuf_array_key(void) > > > +{ > > > + if (pthread_key_create(&hexbuf_array_key, free)) > > > + die(_("failed to initialize threads' key for hash to hex conversion")); > > > +} > > > + > > > +#else > > > +static struct hexbuf_array default_hexbuf_array; > > > +#endif > > > + > > > char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop) > > > { > > > - static int bufno; > > > - static char hexbuffer[4][GIT_MAX_HEXSZ + 1]; > > > - bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer); > > > - return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop); > > > + struct hexbuf_array *ha; > > > + > > > +#ifdef HAVE_THREADS > > > + void *value; > > > + > > > + if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key)) > > > + die(_("failed to initialize threads' key for hash to hex conversion")); > > > + > > > + value = pthread_getspecific(hexbuf_array_key); > > > + if (value) { > > > + ha = (struct hexbuf_array *) value; > > > + } else { > > > + ha = xmalloc(sizeof(*ha)); > > > + if (pthread_setspecific(hexbuf_array_key, (void *)ha)) > > > + die(_("failed to set thread buffer for hash to hex conversion")); > > > + } > > > +#else > > > + ha = &default_hexbuf_array; > > > +#endif > > > > This introduces two ugly `#ifdef HAVE_THREADS` constructs which are > > problematic because they are the most likely places to introduce compile > > errors. > > > > I wonder whether you considered introducing a function (and probably a > > macro) that transparently gives you a thread-specific instance of a given > > data type? The caller would look something like > > > > struct hexbuf_array *hex_array; > > > > GET_THREADSPECIFIC(hex_array); > > > > where the macro would look somewhat like this: > > > > #define GET_THREADSPECIFIC(var) \ > > if (get_thread_specific(&var, sizeof(var)) < 0) > > die(_("Failed to get thread-specific %s"), #var); > > > > and the function would allocate and assign the variable. > > Hmm, we would need two macros, wouldn't we? GET_THREADSPECIFIC(var) > and a DECLARE_THREADSPECIFIC(var, destructor), to declare the > pthread_once_t and pthread_key_t variables, as well as define a > initialization function for the key (i.e. the callback to > pthread_once()). Or we could provide these declarations ourselves, but > then we would need the "ifdef HAVE_THREADS" guard to avoid calling > pthread_key_create() when there is no multithreading. Right. I wanted to avoid the need to call `DECLARE_THREADSPECIFIC()`, in particular because that would have to be _outside_ of any function (because it has to define a function, and nested functions are not really supported in C, at least not widely). Ciao, Dscho > > I think that would work, yes. Alternatively, I think we could adjust > the dummy pthread_key_* functions in thread-utils.h to emulate the > real ones when HAVE_THREADS == false. Then we could eliminate the > `#ifdef HAVE_THREADS` guards and use the same code for both cases here > (and everywhere else pthread_key is used). I haven't thought about it > carefully enough yet, but I think this *might* be as simple as adding > the following defines in the "#ifdef NO_PTHREADS" section of > thread-utils.h: > > #define pthread_key_t void * > /* > * The destructor is not used in this case as the main thread will only > * exit when the program terminates. > */ > #define pthread_key_create(key_ptr, unused) return_0((*key_ptr) = NULL) > #define pthread_setspecific(key, value) return_0((key) = (value)) > #define pthread_getspecific(key) (key) > #define pthread_key_delete(key) return_0(NULL) > > static inline int return_0(void *unused) > { > return 0; > } > > That's the general idea, but we might as well define a `struct > dummy_pthread_key_t` instead of defining the key directly as `void *` > (and have functions instead of macros). This way we could store, e.g., > an "initialized" flag that could be used to return an error code on > double-initializations. > > What do you think? > ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-06-26 21:54 ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares 2020-06-29 15:11 ` Johannes Schindelin @ 2020-07-16 11:29 ` Johannes Schindelin 2020-07-18 3:09 ` Matheus Tavares Bernardino 2020-07-18 3:52 ` Matheus Tavares 1 sibling, 2 replies; 25+ messages in thread From: Johannes Schindelin @ 2020-07-16 11:29 UTC (permalink / raw) To: Matheus Tavares Cc: git, sandals, j6t, jonathantanmy, peff, christian.couder, Fredrik Kuivinen Hi Matheus, On Fri, 26 Jun 2020, Matheus Tavares wrote: > @@ -136,12 +137,49 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid) > return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo); > } > > +struct hexbuf_array { > + int idx; > + char bufs[4][GIT_MAX_HEXSZ + 1]; > +}; > + > +#ifdef HAVE_THREADS > +static pthread_key_t hexbuf_array_key; > +static pthread_once_t hexbuf_array_once = PTHREAD_ONCE_INIT; > + > +static void init_hexbuf_array_key(void) > +{ > + if (pthread_key_create(&hexbuf_array_key, free)) > + die(_("failed to initialize threads' key for hash to hex conversion")); > +} > + > +#else > +static struct hexbuf_array default_hexbuf_array; > +#endif > + > char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *algop) > { > - static int bufno; > - static char hexbuffer[4][GIT_MAX_HEXSZ + 1]; > - bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer); > - return hash_to_hex_algop_r(hexbuffer[bufno], hash, algop); > + struct hexbuf_array *ha; > + > +#ifdef HAVE_THREADS > + void *value; > + > + if (pthread_once(&hexbuf_array_once, init_hexbuf_array_key)) > + die(_("failed to initialize threads' key for hash to hex conversion")); > + > + value = pthread_getspecific(hexbuf_array_key); > + if (value) { > + ha = (struct hexbuf_array *) value; > + } else { > + ha = xmalloc(sizeof(*ha)); I just realized (while trying to debug something independent) that this leaves `ha->idx` uninitialized. So you will need at least this patch to fix a bug that currently haunts `seen`'s CI builds (you can use `--valgrind`, like I did, to identify the problem): -- snip -- diff --git a/hex.c b/hex.c index 4f2f163d5e7..365ba94ab11 100644 --- a/hex.c +++ b/hex.c @@ -171,6 +171,7 @@ char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *a ha = (struct hexbuf_array *) value; } else { ha = xmalloc(sizeof(*ha)); + ha->idx = 0; if (pthread_setspecific(hexbuf_array_key, (void *)ha)) die(_("failed to set thread buffer for hash to hex conversion")); } -- snap -- But as I mentioned before, I would be much more in favor of abandoning this thread-local idea (because it is _still_ fragile, as the same thread could try to make use of more than four hex values in the same `printf()`, for example) and instead using Coccinelle to convert all those `oid_to_hex()` calls to `oid_to_hex_r()` calls. Now, I am _far_ from knowing what I'm doing with Coccinelle, but I think this here semantic patch should get you going: -- snipsnap -- @@ expression E; @@ { ++ char hex[GIT_MAX_HEXSZ + 1]; ... - oid_to_hex(E) + oid_to_hex_r(hex, E) ... } @@ expression E1, E2; @@ { ++ char hex1[GIT_MAX_HEXSZ + 1], hex2[GIT_MAX_HEXSZ + 1]; ... - oid_to_hex(E1) + oid_to_hex_r(hex1, E1) ... - oid_to_hex(E2) + oid_to_hex_r(hex2, E2) ... } ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-07-16 11:29 ` Johannes Schindelin @ 2020-07-18 3:09 ` Matheus Tavares Bernardino 2020-08-10 14:15 ` Johannes Schindelin 2020-07-18 3:52 ` Matheus Tavares 1 sibling, 1 reply; 25+ messages in thread From: Matheus Tavares Bernardino @ 2020-07-18 3:09 UTC (permalink / raw) To: Johannes Schindelin Cc: git, brian m . carlson, Johannes Sixt, Jonathan Tan, Jeff King, Christian Couder, Fredrik Kuivinen Hi, Dscho On Thu, Jul 16, 2020 at 9:56 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > On Fri, 26 Jun 2020, Matheus Tavares wrote: > > > + value = pthread_getspecific(hexbuf_array_key); > > + if (value) { > > + ha = (struct hexbuf_array *) value; > > + } else { > > + ha = xmalloc(sizeof(*ha)); > > I just realized (while trying to debug something independent) that this > leaves `ha->idx` uninitialized. Thanks for catching that! I fixed it in my local branch. > But as I mentioned before, I would be much more in favor of abandoning > this thread-local idea (because it is _still_ fragile, as the same thread > could try to make use of more than four hex values in the same `printf()`, > for example) and instead using Coccinelle to convert all those > `oid_to_hex()` calls to `oid_to_hex_r()` calls. Yeah, I agree that removing oid_to_hex() in favor of oid_to_hex_r() would be great. Unfortunately, I only used Coccinelle for basic things, such as function renaming. And I won't have the time to study it further at the moment :( Therefore, I think I'll ask Junio to drop this series for now, until I or someone else finds some time to work on the semantic patch. Alternatively, if using thread-local storage is still an option, I think I might have solved the problems we had in the previous iteration with memory leaks on Windows. I changed our pthread_key_create() emulation to start using the destructor callback on Windows, through the Fiber Local Storage (FLS) API. As the documentation says [1] "If no fiber switching occurs, FLS acts exactly the same as thread local storage". The advantage over TLS is that FLSAlloc() does take a callback parameter. I also removed the ugly `#ifdef HAVE_THREADS` guards on the last patch, as you suggested, and added some tests for our pthread_key emulation. In case you want to take a look to see if it might be worth pursuing this route, the patches are here: https://github.com/matheustavares/git/commits/safe_oid_to_hex_v3 [1]: https://docs.microsoft.com/en-us/windows/win32/procthread/fibers#fiber-local-storage ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-07-18 3:09 ` Matheus Tavares Bernardino @ 2020-08-10 14:15 ` Johannes Schindelin 0 siblings, 0 replies; 25+ messages in thread From: Johannes Schindelin @ 2020-08-10 14:15 UTC (permalink / raw) To: Matheus Tavares Bernardino Cc: git, brian m . carlson, Johannes Sixt, Jonathan Tan, Jeff King, Christian Couder, Fredrik Kuivinen Hi Matheus, On Sat, 18 Jul 2020, Matheus Tavares Bernardino wrote: > On Thu, Jul 16, 2020 at 9:56 AM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > Alternatively, if using thread-local storage is still an option, I > think I might have solved the problems we had in the previous > iteration with memory leaks on Windows. I changed our > pthread_key_create() emulation to start using the destructor callback > on Windows, through the Fiber Local Storage (FLS) API. As the > documentation says [1] "If no fiber switching occurs, FLS acts exactly > the same as thread local storage". The advantage over TLS is that > FLSAlloc() does take a callback parameter. Okay, but I am still not so enthusiastic. We can fix this in a much simpler way, I believe, than introducing the first thread-local storage user. Let's leave TLS until the time we actually need it? Ciao, Dscho ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-07-16 11:29 ` Johannes Schindelin 2020-07-18 3:09 ` Matheus Tavares Bernardino @ 2020-07-18 3:52 ` Matheus Tavares 2020-07-26 17:41 ` René Scharfe 1 sibling, 1 reply; 25+ messages in thread From: Matheus Tavares @ 2020-07-18 3:52 UTC (permalink / raw) To: johannes.schindelin Cc: christian.couder, frekui, git, j6t, jonathantanmy, matheus.bernardino, peff, sandals On Thu, Jul 16, 2020 at 9:56 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Now, I am _far_ from knowing what I'm doing with Coccinelle, but I think > this here semantic patch should get you going: > > -- snipsnap -- > @@ > expression E; > @@ > { > ++ char hex[GIT_MAX_HEXSZ + 1]; > ... > - oid_to_hex(E) > + oid_to_hex_r(hex, E) > ... > } > > @@ > expression E1, E2; > @@ > { > ++ char hex1[GIT_MAX_HEXSZ + 1], hex2[GIT_MAX_HEXSZ + 1]; > ... > - oid_to_hex(E1) > + oid_to_hex_r(hex1, E1) > ... > - oid_to_hex(E2) > + oid_to_hex_r(hex2, E2) > ... > } Thanks for this nice example! This already worked very well in some of my tests :) However, with my _very_ limited notion of Coccinelle, I didn't understand why some code snippets didn't match the above rules. For example, the structure below: func(...) { if (cond) func2("%s", oid_to_hex(a)); } I thought it could be because the `if` statement is missing the curly brackets (and it does work if I add the brackets), but to my surprise, adding another oid_to_hex() call in an `else` case also made the code match the rule: func(...) { if (cond) func2("%s", oid_to_hex(a)); else func2("%s", oid_to_hex(a)); } The following snippet also correctly matches, but spatch introduces only one `hex` variable: if (cond) func2("%s, %s", oid_to_hex(a), oid_to_hex(b)); else func2("%s", oid_to_hex(a)); We will probably want our semantic rules to handle an arbitrary number of `oid_to_hex()` calls in each function, but in scenarios like the above one, we only really need 2 hex buffers despite having 3 calls... That might be a little tricky, I guess. Another thing that might be tricky in this conversion is checking for name conflicts with the added `hex` variable (but maybe Coccinelle already has a facilitator mechanism for such cases? IDK). ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe 2020-07-18 3:52 ` Matheus Tavares @ 2020-07-26 17:41 ` René Scharfe 0 siblings, 0 replies; 25+ messages in thread From: René Scharfe @ 2020-07-26 17:41 UTC (permalink / raw) To: Matheus Tavares, johannes.schindelin Cc: christian.couder, frekui, git, j6t, jonathantanmy, peff, sandals Am 18.07.20 um 05:52 schrieb Matheus Tavares: > On Thu, Jul 16, 2020 at 9:56 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: >> >> Now, I am _far_ from knowing what I'm doing with Coccinelle, but I think >> this here semantic patch should get you going: >> >> -- snipsnap -- >> @@ >> expression E; >> @@ >> { >> ++ char hex[GIT_MAX_HEXSZ + 1]; >> ... >> - oid_to_hex(E) >> + oid_to_hex_r(hex, E) >> ... >> } >> >> @@ >> expression E1, E2; >> @@ >> { >> ++ char hex1[GIT_MAX_HEXSZ + 1], hex2[GIT_MAX_HEXSZ + 1]; >> ... >> - oid_to_hex(E1) >> + oid_to_hex_r(hex1, E1) >> ... >> - oid_to_hex(E2) >> + oid_to_hex_r(hex2, E2) >> ... >> } > > Thanks for this nice example! This already worked very well in some of > my tests :) > > However, with my _very_ limited notion of Coccinelle, I didn't > understand why some code snippets didn't match the above rules. For > example, the structure below: > > func(...) > { > if (cond) > func2("%s", oid_to_hex(a)); > } > > I thought it could be because the `if` statement is missing the curly > brackets (and it does work if I add the brackets), but to my surprise, > adding another oid_to_hex() call in an `else` case also made the code > match the rule: > > func(...) > { > if (cond) > func2("%s", oid_to_hex(a)); > else > func2("%s", oid_to_hex(a)); > } > > The following snippet also correctly matches, but spatch introduces only > one `hex` variable: > > if (cond) > func2("%s, %s", oid_to_hex(a), oid_to_hex(b)); > else > func2("%s", oid_to_hex(a)); > The produced diff looks like this: + char hex[GIT_MAX_HEXSZ + 1]; if (cond) - func2("%s, %s", oid_to_hex(a), oid_to_hex(b)); + func2("%s, %s", oid_to_hex_r(hex, a), oid_to_hex_r(hex, b)); else - func2("%s", oid_to_hex(a)); + func2("%s", oid_to_hex_r(hex, a)); So the first rule was applied (the one for a single oid_to_hex call), but we need the second one (for two oid_to_hex calls). Using the same hex buffer for two different values won't work. > We will probably want our semantic rules to handle an arbitrary number > of `oid_to_hex()` calls in each function, but in scenarios like the > above one, we only really need 2 hex buffers despite having 3 calls... oid_to_hex() has two interesting features, and we need to make sure they are preserved for callers that use them: It has a ring of four buffers, so you can use it for four different values, and those buffers are static, so its results can be passed around arbitrarily. > That might be a little tricky, I guess. It may be impossible to cover all cases. E.g. callers of oid_to_hex() could return that buffer (like e.g. diff_aligned_abbrev()) or save them in some global variable (like e.g. string_list_append() with a non-DUP string_list). But safe conversion rules can got us surprisingly far. > Another thing that might be tricky in this conversion is checking for > name conflicts with the added `hex` variable (but maybe Coccinelle > already has a facilitator mechanism for such cases? IDK). That's easy. There exists no hex_buffer, yet. We can just claim that name for automatic conversions. It's a bit too long for people to type out (we have a few variables named hexbuffer, though), but not crazy long. So what is safe? Calls of oid_to_hex() in the argument list of many functions is. E.g. puts() and printf() just consume the string that is passed to them, and they don't store it anywhere. That means no static storage is needed for those. string_list_append() is only safe if it's the duplicating variant. Since this is a runtime option of the underlying string_list this is hard to prove in a Coccinelle rule. The time is better spent converting them manually, I think. And function calls with more than four oid_to_hex() calls are broken already, so we only need to have rules for up to four of them. Here is the simplest rule I can come up with for handling up to four oid_to_hex() calls: @@ identifier fn =~ "^(.*printf.*|puts)$"; @@ ( fn(..., - oid_to_hex + oid_to_hex_r ( + hex_buffer, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer2, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer3, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer4, ... ), ... ) | fn(..., - oid_to_hex + oid_to_hex_r ( + hex_buffer, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer2, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer3, ... ), ... ) | fn(..., - oid_to_hex + oid_to_hex_r ( + hex_buffer, ... ), ..., - oid_to_hex + oid_to_hex_r ( + hex_buffer2, ... ), ... ) | fn(..., - oid_to_hex + oid_to_hex_r ( + hex_buffer, ... ), ... ) ) So the sub-rules are ordered from matching all four possible oid_to_hex() calls down to a single one. Only safe consumers are matched. That regex can and should be extended. Having a list of good consumers has the nice side-effect of speeding up the diff generation. It still takes a few minutes on my system, though. We still need to declare of the local buffers. We can add them on demand to each function with rules like this: @avoid_duplicates@ identifier buf =~ "^hex_buffer[2-4]?$"; @@ - char buf[GIT_MAX_HEXSZ + 1]; @hex_buffer4_on_demand exists@ identifier fn; @@ fn(...) { + char hex_buffer4[GIT_MAX_HEXSZ + 1]; <+... hex_buffer4 ...+> } @hex_buffer3_on_demand exists@ identifier fn; @@ fn(...) { + char hex_buffer3[GIT_MAX_HEXSZ + 1]; <+... hex_buffer3 ...+> } @hex_buffer2_on_demand exists@ identifier fn; @@ fn(...) { + char hex_buffer2[GIT_MAX_HEXSZ + 1]; <+... hex_buffer2 ...+> } @hex_buffer_on_demand exists@ identifier fn; @@ fn(...) { + char hex_buffer[GIT_MAX_HEXSZ + 1]; <+... hex_buffer ...+> } Why remove them first? To avoid duplicates when other convertible oid_to_hex() calls are added later and the semantic patch applied again. Why declare the buffers with function scope? To avoid shadowing. Why are the rules reversed? They add declarations at the top, so hex_buffer to hex_buffer4 end up being declared in that order in the resulting file. Why not use the "fresh identifier" feature of Coccinelle to generate an unused name each time? I don't know how to integrate that into the 4/3/2/1 rule above without having to repeat the list of safe consumers or declaring unused buffers. And reusing buffers between safe consumers is fine. René ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2020-08-10 14:15 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-06-24 22:29 [RFC] Thread safety in some low-level functions Matheus Tavares Bernardino 2020-06-24 22:52 ` Matheus Tavares Bernardino 2020-06-25 1:38 ` brian m. carlson 2020-06-25 20:32 ` [PATCH 0/2] Make oid_to_hex() thread-safe Matheus Tavares 2020-06-25 20:32 ` [PATCH 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares 2020-06-26 5:45 ` Christian Couder 2020-06-26 14:13 ` Matheus Tavares Bernardino 2020-06-25 20:32 ` [PATCH 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares 2020-06-25 23:07 ` brian m. carlson 2020-06-25 23:54 ` Matheus Tavares Bernardino 2020-06-26 0:00 ` Matheus Tavares Bernardino 2020-06-26 6:02 ` Christian Couder 2020-06-26 8:22 ` [PATCH 0/2] Make oid_to_hex() thread-safe Christian Couder 2020-06-26 16:22 ` Matheus Tavares Bernardino 2020-06-26 21:54 ` [PATCH v2 " Matheus Tavares 2020-06-26 21:54 ` [PATCH v2 1/2] compat/win32/pthread: add pthread_once() Matheus Tavares 2020-06-26 21:54 ` [PATCH v2 2/2] hex: make hash_to_hex_algop() and friends thread-safe Matheus Tavares 2020-06-29 15:11 ` Johannes Schindelin 2020-06-30 20:37 ` Matheus Tavares Bernardino 2020-07-01 11:32 ` Johannes Schindelin 2020-07-16 11:29 ` Johannes Schindelin 2020-07-18 3:09 ` Matheus Tavares Bernardino 2020-08-10 14:15 ` Johannes Schindelin 2020-07-18 3:52 ` Matheus Tavares 2020-07-26 17:41 ` René Scharfe
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).