git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] refs: fix interleaving hook calls with reference-transaction hook
@ 2020-08-07  7:05 Patrick Steinhardt
  2020-08-07  7:58 ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick Steinhardt @ 2020-08-07  7:05 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2795 bytes --]

In order to not repeatedly search for the reference-transaction hook in
case it's getting called multiple times, we use a caching mechanism to
only call `find_hook()` once. What was missed though is that the return
value of `find_hook()` actually comes from a static strbuf, which means
it will get overwritten when calling `find_hook()` again. As a result,
we may call the wrong hook with parameters of the reference-transaction
hook.

This scenario was spotted in the wild when executing a git-push(1) with
multiple references, where there are interleaving calls to both the
update and the reference-transaction hook. While initial calls to the
reference-transaction hook work as expected, it will stop working after
the next invocation of the update hook. The result is that we now start
calling the update hook with parameters and stdin of the
reference-transaction hook.

This commit fixes the issue by storing a copy of `find_hook()`'s return
value in the cache.
---
 refs.c                           |  2 +-
 t/t1416-ref-transaction-hooks.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 2dd851fe81..17e515b288 100644
--- a/refs.c
+++ b/refs.c
@@ -2044,7 +2044,7 @@ static int run_transaction_hook(struct ref_transaction *transaction,
 	if (hook == &hook_not_found)
 		return ret;
 	if (!hook)
-		hook = find_hook("reference-transaction");
+		hook = xstrdup_or_null(find_hook("reference-transaction"));
 	if (!hook) {
 		hook = &hook_not_found;
 		return ret;
diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
index da58d867a5..d4d19194bf 100755
--- a/t/t1416-ref-transaction-hooks.sh
+++ b/t/t1416-ref-transaction-hooks.sh
@@ -106,4 +106,30 @@ test_expect_success 'hook gets all queued updates in aborted state' '
 	test_cmp expect actual
 '
 
+test_expect_success 'interleaving hook calls succeed' '
+	test_when_finished "rm -r target-repo.git" &&
+
+	git init --bare target-repo.git &&
+
+	write_script target-repo.git/hooks/reference-transaction <<-\EOF &&
+		echo $0 "$@" >>actual
+	EOF
+
+	write_script target-repo.git/hooks/update <<-\EOF &&
+		echo $0 "$@" >>actual
+	EOF
+
+	cat >expect <<-EOF &&
+		hooks/update refs/tags/PRE 0000000000000000000000000000000000000000 63ac8e7bcdb882293465435909f54a96de17d4f7
+		hooks/reference-transaction prepared
+		hooks/reference-transaction committed
+		hooks/update refs/tags/POST 0000000000000000000000000000000000000000 99d53161c3a0a903b6561b9f6c0c665b3a476401
+		hooks/reference-transaction prepared
+		hooks/reference-transaction committed
+	EOF
+
+	git push ./target-repo.git PRE POST &&
+	test_cmp expect target-repo.git/actual
+'
+
 test_done
-- 
2.28.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] refs: fix interleaving hook calls with reference-transaction hook
  2020-08-07  7:05 [PATCH] refs: fix interleaving hook calls with reference-transaction hook Patrick Steinhardt
@ 2020-08-07  7:58 ` Jeff King
  2020-08-07  9:04   ` Patrick Steinhardt
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2020-08-07  7:58 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Fri, Aug 07, 2020 at 09:05:58AM +0200, Patrick Steinhardt wrote:

> In order to not repeatedly search for the reference-transaction hook in
> case it's getting called multiple times, we use a caching mechanism to
> only call `find_hook()` once. What was missed though is that the return
> value of `find_hook()` actually comes from a static strbuf, which means
> it will get overwritten when calling `find_hook()` again. As a result,
> we may call the wrong hook with parameters of the reference-transaction
> hook.
> 
> This scenario was spotted in the wild when executing a git-push(1) with
> multiple references, where there are interleaving calls to both the
> update and the reference-transaction hook. While initial calls to the
> reference-transaction hook work as expected, it will stop working after
> the next invocation of the update hook. The result is that we now start
> calling the update hook with parameters and stdin of the
> reference-transaction hook.

That makes sense. I think of push as a single transaction, but that's
only if the caller sends the "atomic" capability. Otherwise get one per
ref.

> diff --git a/refs.c b/refs.c
> index 2dd851fe81..17e515b288 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -2044,7 +2044,7 @@ static int run_transaction_hook(struct ref_transaction *transaction,
>  	if (hook == &hook_not_found)
>  		return ret;
>  	if (!hook)
> -		hook = find_hook("reference-transaction");
> +		hook = xstrdup_or_null(find_hook("reference-transaction"));
>  	if (!hook) {
>  		hook = &hook_not_found;
>  		return ret;

The fix here looks obviously correct, though I have to wonder if the
caching is even worth it. It's saving us an access() call, but we're
about to exec a whole sub-process.

It's perhaps more justifiable when there isn't a hook (we're still just
paying that one access(), but it's proportionally bigger). I kind of
doubt it's measurable, though, since a ref write requires a bunch of
syscalls anyway.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] refs: fix interleaving hook calls with reference-transaction hook
  2020-08-07  7:58 ` Jeff King
@ 2020-08-07  9:04   ` Patrick Steinhardt
  2020-08-07  9:32     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick Steinhardt @ 2020-08-07  9:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2931 bytes --]

On Fri, Aug 07, 2020 at 03:58:37AM -0400, Jeff King wrote:
> On Fri, Aug 07, 2020 at 09:05:58AM +0200, Patrick Steinhardt wrote:
> 
> > In order to not repeatedly search for the reference-transaction hook in
> > case it's getting called multiple times, we use a caching mechanism to
> > only call `find_hook()` once. What was missed though is that the return
> > value of `find_hook()` actually comes from a static strbuf, which means
> > it will get overwritten when calling `find_hook()` again. As a result,
> > we may call the wrong hook with parameters of the reference-transaction
> > hook.
> > 
> > This scenario was spotted in the wild when executing a git-push(1) with
> > multiple references, where there are interleaving calls to both the
> > update and the reference-transaction hook. While initial calls to the
> > reference-transaction hook work as expected, it will stop working after
> > the next invocation of the update hook. The result is that we now start
> > calling the update hook with parameters and stdin of the
> > reference-transaction hook.
> 
> That makes sense. I think of push as a single transaction, but that's
> only if the caller sends the "atomic" capability. Otherwise get one per
> ref.
> 
> > diff --git a/refs.c b/refs.c
> > index 2dd851fe81..17e515b288 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -2044,7 +2044,7 @@ static int run_transaction_hook(struct ref_transaction *transaction,
> >  	if (hook == &hook_not_found)
> >  		return ret;
> >  	if (!hook)
> > -		hook = find_hook("reference-transaction");
> > +		hook = xstrdup_or_null(find_hook("reference-transaction"));
> >  	if (!hook) {
> >  		hook = &hook_not_found;
> >  		return ret;
> 
> The fix here looks obviously correct, though I have to wonder if the
> caching is even worth it. It's saving us an access() call, but we're
> about to exec a whole sub-process.
> 
> It's perhaps more justifiable when there isn't a hook (we're still just
> paying that one access(), but it's proportionally bigger). I kind of
> doubt it's measurable, though, since a ref write requires a bunch of
> syscalls anyway.

Yeah, this really was done to not have to pay a performance penalty if
updating thousands of refs if no reference-transaction hook exists. E.g.
if doing a non-atomic push of n reference, we'd have n calls to
access(3P). See [1] for reference.

I've just did another quick benchmark without the cache, and it still
consistently shows a non-zero performance hit without it:

Test                         pks-reftx-hook-interleaving   no-cache
--------------------------------------------------------------------------------
1400.2: update-ref           2.82(2.13+0.81)               2.86(2.19+0.78) +1.4%
1400.3: update-ref --stdin   0.22(0.07+0.15)               0.22(0.07+0.15) +0.0%

Patrick

[1]: https://public-inbox.org/git/20200603165142.GA24049@syl.local/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] refs: fix interleaving hook calls with reference-transaction hook
  2020-08-07  9:04   ` Patrick Steinhardt
@ 2020-08-07  9:32     ` Jeff King
  2020-08-07  9:49       ` Patrick Steinhardt
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2020-08-07  9:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Fri, Aug 07, 2020 at 11:04:12AM +0200, Patrick Steinhardt wrote:

> > It's perhaps more justifiable when there isn't a hook (we're still just
> > paying that one access(), but it's proportionally bigger). I kind of
> > doubt it's measurable, though, since a ref write requires a bunch of
> > syscalls anyway.
> 
> Yeah, this really was done to not have to pay a performance penalty if
> updating thousands of refs if no reference-transaction hook exists. E.g.
> if doing a non-atomic push of n reference, we'd have n calls to
> access(3P). See [1] for reference.
> 
> I've just did another quick benchmark without the cache, and it still
> consistently shows a non-zero performance hit without it:
> 
> Test                         pks-reftx-hook-interleaving   no-cache
> --------------------------------------------------------------------------------
> 1400.2: update-ref           2.82(2.13+0.81)               2.86(2.19+0.78) +1.4%
> 1400.3: update-ref --stdin   0.22(0.07+0.15)               0.22(0.07+0.15) +0.0%

I'm skeptical that those results are useful. In the first test, we're
running update-ref 1000 times, so:

  - the cache shouldn't be helping at all, since we only have one ref to
    update (well, I guess once for "prepare" and once for "commit", so
    really it's saving one syscall total per process).

  - I'd expect a lot of noise because we're spending most of our time in
    starting up the process

In the second test, we run 1000 ref operations per update-ref process.
So we should be cutting down on our hook-lookup overhead by a factor of
1000. Yet it shows no improvement.

That implies you're just seeing noise. And indeed, with the patch below
I get:

Test                         HEAD^             HEAD
--------------------------------------------------------------------
1400.2: update-ref           1.93(1.57+0.42)   1.91(1.55+0.42) -1.0%
1400.3: update-ref --stdin   0.07(0.02+0.05)   0.07(0.02+0.05) +0.0%

Running it a second time gets me +0.5%. :)

---
 refs.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/refs.c b/refs.c
index 2dd851fe81..427bf5843c 100644
--- a/refs.c
+++ b/refs.c
@@ -2031,24 +2031,16 @@ int ref_update_reject_duplicates(struct string_list *refnames,
 	return 0;
 }
 
-static const char hook_not_found;
-static const char *hook;
-
 static int run_transaction_hook(struct ref_transaction *transaction,
 				const char *state)
 {
 	struct child_process proc = CHILD_PROCESS_INIT;
 	struct strbuf buf = STRBUF_INIT;
+	const char *hook = find_hook("reference-transaction");
 	int ret = 0, i;
 
-	if (hook == &hook_not_found)
-		return ret;
 	if (!hook)
-		hook = find_hook("reference-transaction");
-	if (!hook) {
-		hook = &hook_not_found;
 		return ret;
-	}
 
 	argv_array_pushl(&proc.args, hook, state, NULL);
 	proc.in = -1;
-- 
2.28.0.520.g10e2ce7e11


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] refs: fix interleaving hook calls with reference-transaction hook
  2020-08-07  9:32     ` Jeff King
@ 2020-08-07  9:49       ` Patrick Steinhardt
  2020-08-07 17:32         ` Junio C Hamano
  2020-08-07 18:21         ` Jeff King
  0 siblings, 2 replies; 9+ messages in thread
From: Patrick Steinhardt @ 2020-08-07  9:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2470 bytes --]

On Fri, Aug 07, 2020 at 05:32:39AM -0400, Jeff King wrote:
> On Fri, Aug 07, 2020 at 11:04:12AM +0200, Patrick Steinhardt wrote:
> 
> > > It's perhaps more justifiable when there isn't a hook (we're still just
> > > paying that one access(), but it's proportionally bigger). I kind of
> > > doubt it's measurable, though, since a ref write requires a bunch of
> > > syscalls anyway.
> > 
> > Yeah, this really was done to not have to pay a performance penalty if
> > updating thousands of refs if no reference-transaction hook exists. E.g.
> > if doing a non-atomic push of n reference, we'd have n calls to
> > access(3P). See [1] for reference.
> > 
> > I've just did another quick benchmark without the cache, and it still
> > consistently shows a non-zero performance hit without it:
> > 
> > Test                         pks-reftx-hook-interleaving   no-cache
> > --------------------------------------------------------------------------------
> > 1400.2: update-ref           2.82(2.13+0.81)               2.86(2.19+0.78) +1.4%
> > 1400.3: update-ref --stdin   0.22(0.07+0.15)               0.22(0.07+0.15) +0.0%
> 
> I'm skeptical that those results are useful. In the first test, we're
> running update-ref 1000 times, so:
> 
>   - the cache shouldn't be helping at all, since we only have one ref to
>     update (well, I guess once for "prepare" and once for "commit", so
>     really it's saving one syscall total per process).
> 
>   - I'd expect a lot of noise because we're spending most of our time in
>     starting up the process
> 
> In the second test, we run 1000 ref operations per update-ref process.
> So we should be cutting down on our hook-lookup overhead by a factor of
> 1000. Yet it shows no improvement.
> 
> That implies you're just seeing noise. And indeed, with the patch below
> I get:
> 
> Test                         HEAD^             HEAD
> --------------------------------------------------------------------
> 1400.2: update-ref           1.93(1.57+0.42)   1.91(1.55+0.42) -1.0%
> 1400.3: update-ref --stdin   0.07(0.02+0.05)   0.07(0.02+0.05) +0.0%
> 
> Running it a second time gets me +0.5%. :)

Yeah, it's also been my take that OS-level overhead is probably going to
matter more than those access calls, and I argued such back when I
proposed the hook. So I'm perfectly happy to see this caching mechanism
go.

Should I re-post a v2 with your patch and my test?

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] refs: fix interleaving hook calls with reference-transaction hook
  2020-08-07  9:49       ` Patrick Steinhardt
@ 2020-08-07 17:32         ` Junio C Hamano
  2020-08-07 19:00           ` Jeff King
  2020-08-07 18:21         ` Jeff King
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2020-08-07 17:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Jeff King, git

Patrick Steinhardt <ps@pks.im> writes:

> On Fri, Aug 07, 2020 at 05:32:39AM -0400, Jeff King wrote:
>
>> That implies you're just seeing noise. And indeed, with the patch below
>> I get:
>> 
>> Test                         HEAD^             HEAD
>> --------------------------------------------------------------------
>> 1400.2: update-ref           1.93(1.57+0.42)   1.91(1.55+0.42) -1.0%
>> 1400.3: update-ref --stdin   0.07(0.02+0.05)   0.07(0.02+0.05) +0.0%
>> 
>> Running it a second time gets me +0.5%. :)
>
> Yeah, it's also been my take that OS-level overhead is probably going to
> matter more than those access calls, and I argued such back when I
> proposed the hook. So I'm perfectly happy to see this caching mechanism
> go.

Is the above about negative cache?  IOW, does the above demonstrate
that one extra access() to make sure there is no hook does not hurt
us anything?  

If so, yes, I am 100% for removing the cache mechanism.

Thanks for driving design decision with numbers.  That's always
pleasant to see.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] refs: fix interleaving hook calls with reference-transaction hook
  2020-08-07  9:49       ` Patrick Steinhardt
  2020-08-07 17:32         ` Junio C Hamano
@ 2020-08-07 18:21         ` Jeff King
  2020-08-07 19:26           ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2020-08-07 18:21 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Fri, Aug 07, 2020 at 11:49:46AM +0200, Patrick Steinhardt wrote:

> Yeah, it's also been my take that OS-level overhead is probably going to
> matter more than those access calls, and I argued such back when I
> proposed the hook. So I'm perfectly happy to see this caching mechanism
> go.
> 
> Should I re-post a v2 with your patch and my test?

Sure, that would be fine (to be clear, I'd also be OK with your original
patch, too; it was mostly just a curiosity to me).

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] refs: fix interleaving hook calls with reference-transaction hook
  2020-08-07 17:32         ` Junio C Hamano
@ 2020-08-07 19:00           ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2020-08-07 19:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git

On Fri, Aug 07, 2020 at 10:32:26AM -0700, Junio C Hamano wrote:

> >> Test                         HEAD^             HEAD
> >> --------------------------------------------------------------------
> >> 1400.2: update-ref           1.93(1.57+0.42)   1.91(1.55+0.42) -1.0%
> >> 1400.3: update-ref --stdin   0.07(0.02+0.05)   0.07(0.02+0.05) +0.0%
> >> 
> >> Running it a second time gets me +0.5%. :)
> >
> > Yeah, it's also been my take that OS-level overhead is probably going to
> > matter more than those access calls, and I argued such back when I
> > proposed the hook. So I'm perfectly happy to see this caching mechanism
> > go.
> 
> Is the above about negative cache?  IOW, does the above demonstrate
> that one extra access() to make sure there is no hook does not hurt
> us anything?

Yes, those numbers are with no cache at all, and without a hook. So they
are measuring the cost of access() only.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] refs: fix interleaving hook calls with reference-transaction hook
  2020-08-07 18:21         ` Jeff King
@ 2020-08-07 19:26           ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2020-08-07 19:26 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git

Jeff King <peff@peff.net> writes:

> On Fri, Aug 07, 2020 at 11:49:46AM +0200, Patrick Steinhardt wrote:
>
>> Yeah, it's also been my take that OS-level overhead is probably going to
>> matter more than those access calls, and I argued such back when I
>> proposed the hook. So I'm perfectly happy to see this caching mechanism
>> go.
>> 
>> Should I re-post a v2 with your patch and my test?
>
> Sure, that would be fine (to be clear, I'd also be OK with your original
> patch, too; it was mostly just a curiosity to me).

Let's queue the correctness fix patch as-is. 

We can and should make the simplification as a separate step,
justified with the performance numbers.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-08-07 19:26 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-07  7:05 [PATCH] refs: fix interleaving hook calls with reference-transaction hook Patrick Steinhardt
2020-08-07  7:58 ` Jeff King
2020-08-07  9:04   ` Patrick Steinhardt
2020-08-07  9:32     ` Jeff King
2020-08-07  9:49       ` Patrick Steinhardt
2020-08-07 17:32         ` Junio C Hamano
2020-08-07 19:00           ` Jeff King
2020-08-07 18:21         ` Jeff King
2020-08-07 19:26           ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).