git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, "SZEDER Gábor" <szeder.dev@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH v2] refs: implement reference transaction hook
Date: Thu, 4 Jun 2020 09:36:32 +0200	[thread overview]
Message-ID: <20200604073632.GA498923@tanuki.pks.im> (raw)
In-Reply-To: <20200603165142.GA24049@syl.local>

[-- Attachment #1: Type: text/plain, Size: 5142 bytes --]

On Wed, Jun 03, 2020 at 10:51:42AM -0600, Taylor Blau wrote:
> Hi Patrick,
> 
> On Wed, Jun 03, 2020 at 02:27:50PM +0200, Patrick Steinhardt wrote:
> > In order to test the impact on the case where we don't have any
> > "reference-transaction" hook installed in the repository, this commit
> > introduces a new performance test for git-update-refs(1). Run against an
> > empty repository, it produces the following results:
> >
> >   Test                                   HEAD~             HEAD
> >   ------------------------------------------------------------------------------
> >   1400.2: update existing reference      2.05(1.58+0.54)   2.08(1.58+0.57) +1.5%
> >   1400.3: create and destroy reference   1.79(1.38+0.49)   1.82(1.39+0.51) +1.7%
> >
> > So the overhead is around ~1.5%. Given that git-update-refs(1) is a
> > near-direct wrapper around reference transactions, there likely is no
> > other command that is impacted much worse than this.
> 
> This is a serious performance regression that is worth considering. For
> example, a 1.5% slow-down on average in reference transactions would
> cause be enough for me to bisect the regression down to see what
> changed.
> 
> What are ways that this could be avoided? I bet that this would work
> quite well with Emily's work on hooks, where we could check in the
> configuration first whether a hook is even configured.
> 
> Could this be integrated with that? If not, could you cache the result
> of whether or not the hook exists, and/or implement some mechanism to
> say something like, for eg., "only run reference transaction hooks
> core.blah = 1" is true?

I wasn't aware of her work until now, so I'll take a look.

Meanwhile, I toyed around with performance and tried out two different
caching mechanisms:

    - map-cache: `find_hook()` uses a map of hook names mapped to their
      resolved hook path (or `NULL` if none was found). This is a
      generic mechanism working across all hooks, but has some overhead
      in maintaining the map.

    - reftx-cache: `run_transaction_hook()` caches the path returned by
      `find_hook()`. It's got less overhead as it only caches the path,
      but obviously only applies to the reference-transaction hook.

In theory, we could go even further and cache the hook's file
descriptor, executing it via fexecve(3P) whenever it's required, but I
didn't go down that route yet. This would also solve the atomicity
issue, though, if the administrator replaces the reference-transactions
hook halfway through the transaction.

Benchmarking results are mixed, mostly in the sense that I can choose
which run of the benchmarks I take in order to have my own work look
better or worse, as timings vary quite a lot between runs. Which
probably hints at the fact that the benchmarks themselves aren't really
reliable. The issue is that a single git-update-ref(1) run finishes so
quick that it's hard to measure with our benchmarks, but spawning
thousands of them to get something different than 0.0s very much depends
on the operating system and thus fluctuates. On the other hand, if we
only spawn a single git-update-refs and have it perform a few thousand
ref updates, overhead of the hook will not show at all as it is only
executed once per prepare/commit of the transaction.

The following timings are for the case where we execute

    git update-ref refs/heads/update-branch PRE POST &&
    git update-ref refs/heads/update-branch POST PRE

respectively

    git update-ref refs/heads/new POST &&
    git update-ref -d refs/heads/new

a thousand times:

Test                                   master            ref-hook-no-cache       ref-hook-map-cache      ref-hook-reftx-cache
------------------------------------------------------------------------------------------------------------------------------
1400.2: update existing reference      1.96(1.50+0.53)   2.00(1.54+0.53) +2.0%   2.02(1.54+0.55) +3.1%   1.98(1.52+0.52) +1.0%
1400.3: create and destroy reference   1.74(1.33+0.49)   1.77(1.38+0.47) +1.7%   1.77(1.36+0.48) +1.7%   1.76(1.35+0.49) +1.1%

For such a short-lived program like git-update-refs(1), one can see that
the overhead of using a map negatively impacts performance compared to
the no-cache case. But using the reftx-cache roughly cuts the overhead
in half as expected, as we only need to look up the hook once instead of
twice.

And here's the results if we use a single `git update-ref --stdin` with a
thousand reference updates at once:

Test                             master            ref-hook-no-cache       ref-hook-map-cache      ref-hook-reftx-cache
------------------------------------------------------------------------------------------------------------------------
1400.2: git update-ref --stdin   0.21(0.09+0.12)   0.21(0.07+0.14) +0.0%   0.21(0.07+0.13) +0.0%   0.21(0.07+0.13) +0.0%

As expected, there's nothing much to be seen here because there's only a
single transaction for all ref updates and, as a result, at most two
calls to `access(refhook_path, X_OK)`.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2020-06-04  7:35 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-02  8:25 [PATCH] refs: implement reference transaction hooks Patrick Steinhardt
2020-06-02 17:47 ` Junio C Hamano
2020-06-03 11:26   ` Patrick Steinhardt
2020-06-07 20:12     ` SZEDER Gábor
2020-06-08  5:36       ` Patrick Steinhardt
2020-06-02 18:09 ` SZEDER Gábor
2020-06-03  9:46   ` Patrick Steinhardt
2020-06-03 12:27 ` [PATCH v2] refs: implement reference transaction hook Patrick Steinhardt
2020-06-03 16:51   ` Taylor Blau
2020-06-04  7:36     ` Patrick Steinhardt [this message]
2020-06-15 16:46       ` Taylor Blau
2020-06-16  5:45         ` Patrick Steinhardt
2020-06-03 17:44   ` Han-Wen Nienhuys
2020-06-03 18:03     ` Junio C Hamano
2020-06-18 10:27 ` [PATCH v3] " Patrick Steinhardt
2020-06-18 22:23   ` Junio C Hamano
2020-06-19  6:56 ` [PATCH v4] " Patrick Steinhardt
2020-10-23  1:03   ` Jeff King
2020-10-23  3:59     ` Junio C Hamano
2020-10-23 19:57       ` Taylor Blau
2020-10-23 22:07         ` Taylor Blau
2020-10-26  7:43       ` Patrick Steinhardt
2020-10-26 23:52         ` Taylor Blau
2020-10-27  5:37           ` Jeff King
2020-10-28 18:22           ` Patrick Steinhardt
2020-10-23 20:00     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200604073632.GA498923@tanuki.pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).