On Mon, Jun 28, 2021 at 09:49:54AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Jun 28 2021, Patrick Steinhardt wrote:
> 
> > [[PGP Signed Part:Undecided]]
> > We'll the connectivity check logic for git-receive-pack(1) in the
> > following commits to make it perform better. As a preparatory step, add
> > some benchmarks such that we can measure these changes.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  t/perf/p5400-receive-pack.sh | 97 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 97 insertions(+)
> >  create mode 100755 t/perf/p5400-receive-pack.sh
> >
> > diff --git a/t/perf/p5400-receive-pack.sh b/t/perf/p5400-receive-pack.sh
> > new file mode 100755
> > index 0000000000..a945e014a3
> > --- /dev/null
> > +++ b/t/perf/p5400-receive-pack.sh
> > @@ -0,0 +1,97 @@
> > +#!/bin/sh
> > +
> > +test_description="Tests performance of receive-pack"
> > +
> > +. ./perf-lib.sh
> > +
> > +test_perf_large_repo
> 
> From the runtime I think this just needs test_perf_default_repo, no?
> I.e. we should only have *_large_* for cases where git.git is too small
> to produce meaningful results.
> 
> Part of th problem is that git.git has become larger over time...

I did these tests for 3/3 with git.git first, and results were
significantly different. The performance issues I'm trying to fix with
the connectivity check really only start to show up with largish
repositories.

> > +test_expect_success 'setup' '
> > +	# Create a main branch such that we do not have to rely on any specific
> > +	# branch to exist in the perf repository.
> > +	git switch --force-create main &&
> > +
> > +	# Set up a pre-receive hook such that no refs will ever be changed.
> > +	# This easily allows multiple perf runs, but still exercises
> > +	# server-side reference negotiation and checking for consistency.
> > +	mkdir hooks &&
> > +	write_script hooks/pre-receive <<-EOF &&
> > +		#!/bin/sh
> 
> You don't need the #!/bin/sh here, and it won't be used. write_script()
> adds it (or the wanted shell path).

Makes sense.

> > +		echo "failed in pre-receive hook"
> > +		exit 1
> > +	EOF
> > +	cat >config <<-EOF &&
> > +		[core]
> > +			hooksPath=$(pwd)/hooks
> > +	EOF
> 
> Easier understood IMO as:
> 
>     git config -f config core.hooksPath ...

Yup, will change.

> > +	GIT_CONFIG_GLOBAL="$(pwd)/config" &&
> > +	export GIT_CONFIG_GLOBAL &&
> > +
> > +	git switch --create updated &&
> > +	test_commit --no-tag updated
> > +'
> > +
> > +setup_empty() {
> > +	git init --bare "$2"
> > +}
> 
> I searched ahead for setup_empty, looked unused, but...
> 
> > +setup_clone() {
> > +	git clone --bare --no-local --branch main "$1" "$2"
> > +}
> > +
> > +setup_clone_bitmap() {
> > +	git clone --bare --no-local --branch main "$1" "$2" &&
> > +	git -C "$2" repack -Adb
> > +}
> > +
> > +# Create a reference for each commit in the target repository with extra-refs.
> > +# While this may be an atypical setup, biggish repositories easily end up with
> > +# hundreds of thousands of refs, and this is a good enough approximation.
> > +setup_extrarefs() {
> > +	git clone --bare --no-local --branch main "$1" "$2" &&
> > +	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
> > +		git -C "$2" update-ref --stdin
> > +}
> > +
> > +# Create a reference for each commit in the target repository with extra-refs.
> > +# While this may be an atypical setup, biggish repositories easily end up with
> > +# hundreds of thousands of refs, and this is a good enough approximation.
> > +setup_extrarefs_bitmap() {
> > +	git clone --bare --no-local --branch main "$1" "$2" &&
> > +	git -C "$2" log --all --format="tformat:create refs/commit/%h %H" |
> > +		git -C "$2" update-ref --stdin &&
> > +	git -C "$2" repack -Adb
> > +}
> > +
> > +for repo in empty clone clone_bitmap extrarefs extrarefs_bitmap
> > +do
> > +	test_expect_success "$repo setup" '
> 
> > +		rm -rf target.git &&
> > +		setup_$repo "$(pwd)" target.git
> 
> ...here we use it via interpolation.
> 
> I'd find this whole pattern much easier to understand if the setups were
> just a bunch of test_expect_success that created a repo_empty.git,
> repo_extrarefs.git etc. Then this loop would be:
> 
>     for repo in repo*.git ...
> 
> I'd think that would also give you more meaningful perf data, as now the
> OS will churn between the clone & the subsequent push tests, better to
> do all the setup, then all the different perf tests.
> 
> Perhaps there's also a way to re-use this setup across different runs, I
> don't know/can't remember if t/perf has a less transient thing than the
> normal trash directory to use for that.

I originally had code like this, but the issue with first creating all
the repos is that it requires lots of disk space with large repos given
that we'll clone it once per setup. Combined with the fact that I
often run tests in tmpfs, this led to out-of-memory situations quite
fast given that I had 3x6GB repositories plus the seeded packfiles in
RAM.

This is why I've changed the setup to do the setup as we go, to bring
disk usage down to something sane.

Patrick