On 2022-01-07 at 10:55:47, Patrick Steinhardt wrote:
> [Resend with the correct In-Reply-To header set to fix threading]
>
> When fetching packfiles, we write a bunch of lockfiles for the packfiles
> we're writing into the repository. In order to not leave behind any
> cruft in case we exit or receive a signal, we register both an exit
> handler as well as signal handlers for common signals like SIGINT. These
> handlers will then unlink the locks and free the data structure tracking
> them. We have observed a deadlock in this logic though:
>
>     (gdb) bt
>     #0  __lll_lock_wait_private () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
>     #1  0x00007f4932bea2cd in _int_free (av=0x7f4932f2eb20, p=0x3e3e4200, have_lock=0) at malloc.c:3969
>     #2  0x00007f4932bee58c in __GI___libc_free (mem=<optimized out>) at malloc.c:2975
>     #3  0x0000000000662ab1 in string_list_clear ()
>     #4  0x000000000044f5bc in unlock_pack_on_signal ()
>     #5  <signal handler called>
>     #6  _int_free (av=0x7f4932f2eb20, p=<optimized out>, have_lock=0) at malloc.c:4024
>     #7  0x00007f4932bee58c in __GI___libc_free (mem=<optimized out>) at malloc.c:2975
>     #8  0x000000000065afd5 in strbuf_release ()
>     #9  0x000000000066ddb9 in delete_tempfile ()
>     #10 0x0000000000610d0b in files_transaction_cleanup.isra ()
>     #11 0x0000000000611718 in files_transaction_abort ()
>     #12 0x000000000060d2ef in ref_transaction_abort ()
>     #13 0x000000000060d441 in ref_transaction_prepare ()
>     #14 0x000000000060e0b5 in ref_transaction_commit ()
>     #15 0x00000000004511c2 in fetch_and_consume_refs ()
>     #16 0x000000000045279a in cmd_fetch ()
>     #17 0x0000000000407c48 in handle_builtin ()
>     #18 0x0000000000408df2 in cmd_main ()
>     #19 0x00000000004078b5 in main ()
>
> The process was killed with a signal, which caused the signal handler to
> kick in and try to free the data structures after we have unlinked the
> locks. It then deadlocks while calling free(3P).
>
> The root cause of this is that it is not allowed to call certain
> functions in async-signal handlers, as specified by signal-safety(7).
> Next to most I/O functions, this list of disallowed functions also
> includes memory-handling functions like malloc(3P) and free(3P) because
> they may not be reentrant. As a result, if we execute such functions in
> the signal handler, then they may operate on inconsistent state and fail
> in unexpected ways.
>
> Fix this bug by not calling non-async-signal-safe functions when running
> in the signal handler. We're about to re-raise the signal anyway and
> will thus exit, so it's not much of a problem to keep the string list of
> lockfiles untouched. Note that it's fine though to call unlink(2), so
> we'll still clean up the lockfiles correctly.

I took a look, and this seems reasonable to me. I know that in the
non-signal case we'd want to clean up, because that lets us check for
leaks, but I don't see the utility of running Git under Valgrind and then
sending it a signal, so I think it's safe to just ignore that case.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA
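
For illustration, the approach described in the quoted message could look roughly like the following. This is a minimal sketch, not the actual patch: `struct lock_list`, `create_lock()`, `unlock_all()` and `install_unlock_handlers()` are hypothetical names, and the list is a simplified stand-in for Git's string_list. The signal path calls only unlink(2), signal(2) and raise(3), which signal-safety(7) lists as async-signal-safe, and it skips free(3) entirely. The sketch ignores the case of a signal arriving while the list itself is being modified.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the string_list that tracks lockfiles. */
struct lock_list {
	char **paths;
	size_t nr;
};

static struct lock_list locks;

/* Create a lockfile and remember its path. Normal context only (uses malloc). */
int create_lock(const char *path)
{
	char **grown;
	char *copy;
	int fd;

	assert(path != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;
	close(fd);

	copy = strdup(path);
	grown = realloc(locks.paths, (locks.nr + 1) * sizeof(*grown));
	if (grown)
		locks.paths = grown; /* the old pointer may no longer be valid */
	if (!copy || !grown) {
		free(copy);
		unlink(path);
		return -1;
	}
	locks.paths[locks.nr] = copy;
	locks.nr++; /* count the entry only once it is fully set up */
	return 0;
}

/*
 * Remove all lockfiles. When called from a signal handler, only unlink(2)
 * is used and the list is left allocated, since free(3) is not
 * async-signal-safe.
 */
void unlock_all(int in_signal_handler)
{
	size_t i;

	for (i = 0; i < locks.nr; i++)
		unlink(locks.paths[i]);

	if (in_signal_handler)
		return;

	for (i = 0; i < locks.nr; i++)
		free(locks.paths[i]);
	free(locks.paths);
	locks.paths = NULL;
	locks.nr = 0;
}

static void unlock_on_signal(int signo)
{
	unlock_all(1);
	/* Restore the default action and re-raise so the process still dies. */
	signal(signo, SIG_DFL);
	raise(signo);
}

static void unlock_at_exit(void)
{
	/* Regular exit: the heap is consistent, so freeing is fine here. */
	unlock_all(0);
}

int install_unlock_handlers(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = unlock_on_signal;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGINT, &sa, NULL) < 0 || sigaction(SIGTERM, &sa, NULL) < 0)
		return -1;
	return atexit(unlock_at_exit) ? -1 : 0;
}
```

A caller would run install_unlock_handlers() once at startup and then create_lock() for each lockfile. The memory still allocated on the signal path is never reclaimed by the program, which is harmless because the process terminates right after the signal is re-raised.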