git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Josh Steadmon <steadmon@google.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: js/partial-clone-connectivity-check (was: What's cooking in git.git (Apr 2019, #05; Thu, 25))
Date: Thu, 2 May 2019 17:45:09 -0400	[thread overview]
Message-ID: <20190502214509.GA19188@sigill.intra.peff.net> (raw)
In-Reply-To: <20190502025206.GA25395@sigill.intra.peff.net>

On Wed, May 01, 2019 at 10:52:06PM -0400, Jeff King wrote:

> > > * js/partial-clone-connectivity-check (2019-04-21) 1 commit
> > >   (merged to 'next' on 2019-04-25 at ebd8b4bffd)
> > >  + clone: do faster object check for partial clones
> > > 
> > >  During an initial "git clone --depth=..." partial clone, it is
> > >  pointless to spend cycles for a large portion of the connectivity
> > >  check that enumerates and skips promisor objects (which by
> > >  definition is all objects fetched from the other side).  This has
> > >  been optimized out.
> > > 
> > >  Will merge to 'master'.
> > 
> > Peff asked for a perf test for this [1], but I haven't had time to write one
> > yet. I can do that in a separate patch if you still want to merge this
> > as-is.
> 
> I won't die without one, but it would be nice. It may also be that an
> existing perf test, but I don't think we cover partial clones in t/perf
> at all. Might be worth just a straight-up "git clone --filter=blob:none"
> test.

Here's what I came up with. Note that there's a bug in 'master' right
now which causes perf to produce nonsense results. It's due to my
0baf78e7bc (perf-lib.sh: rely on test-lib.sh for --tee handling,
2019-03-15). I'll fix that separately (the timing below is done with
that commit reverted).

-- >8 --
Subject: [PATCH] t/perf: add perf script for partial clones

We don't cover the partial clone feature at all in t/perf. Let's at
least run a few basic tests so that we'll notice any regressions.

We'll do a no-blob clone, and split it into two parts: the actual object
transfer, and the subsequent checkout (which will of course require
another transfer to get the blobs). That will help us more clearly
assess the performance of each.

There are obviously a lot more possibilities besides just a no-blob
partial clone, but this should serve as a canary that alerts us to any
generic slow-downs (and we can add more tests later for cases that
aren't exercised here).

There are a few non-ideal things here that make this not an entirely
accurate test, but are probably OK for our purposes:

  1. We have to do some extra prep/cleanup work inside the timing tests,
     since they impact the on-disk state and the perf harness may run
     each one multiple times.

     In practice this is probably OK, since these bits should be much
     less expensive than the operations we are measuring.

  2. The clone time is likely to be dominated by the server's object
     enumeration. In the real world, a repo large enough to drive people
     to partial clones is likely to have reachability bitmaps enabled.

     And in the opposite direction, our object transfer is happening at
     the speed of a local pipe, whereas in the real world it would
     bottle-neck on the network.

     So any percentage speedups should be taken with a grain of salt.
     But hopefully any regressions will produce enough of an effect to
     be noticeable.

This script also demonstrates the recent improvement from dfa33a298d
(clone: do faster object check for partial clones, 2019-04-19):

  Test                          dfa33a298d^         dfa33a298d
  -------------------------------------------------------------------------
  5600.2: clone without blobs   18.41(22.72+1.09)   6.83(11.65+0.50) -62.9%
  5600.3: checkout of result    1.82(3.24+0.26)     1.84(3.24+0.26) +1.1%

Signed-off-by: Jeff King <peff@peff.net>
---
The speedup from dfa33a298d was larger than I expected. Doing a full
`rev-list --objects --all` on a non-partial clone is only 3-4s. So how
did we manage to save 11s by removing it? The answer is that the
is_promisor check is super expensive: we have to walk all of those trees
in the partial pack to find which blobs are mentioned, only to then walk
them all again for the actual traversal and say "yep, this is in our
promisor list". Yikes.

Your patch makes all of that go away, but I do think there's room for us
to be more clever here (but that leads us back down the rabbit hole you
explored earlier, so I think it makes sense to take your patch now and
deal with other optimizations separately).

 t/perf/p5600-partial-clone.sh | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100755 t/perf/p5600-partial-clone.sh

diff --git a/t/perf/p5600-partial-clone.sh b/t/perf/p5600-partial-clone.sh
new file mode 100755
index 0000000000..3e04bd2ae1
--- /dev/null
+++ b/t/perf/p5600-partial-clone.sh
@@ -0,0 +1,26 @@
+#!/bin/sh
+
+test_description='performance of partial clones'
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'enable server-side config' '
+	git config uploadpack.allowFilter true &&
+	git config uploadpack.allowAnySHA1InWant true
+'
+
+test_perf 'clone without blobs' '
+	rm -rf bare.git &&
+	git clone --no-local --bare --filter=blob:none . bare.git
+'
+
+test_perf 'checkout of result' '
+	rm -rf worktree &&
+	mkdir -p worktree/.git &&
+	tar -C bare.git -cf - . | tar -C worktree/.git -xf - &&
+	git -C worktree config core.bare false &&
+	git -C worktree checkout -f
+'
+
+test_done
-- 
2.21.0.1314.g224b191707


  parent reply	other threads:[~2019-05-02 21:45 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-25 10:15 What's cooking in git.git (Apr 2019, #05; Thu, 25) Junio C Hamano
2019-04-25 14:50 ` Elijah Newren
2019-04-25 22:16 ` js/partial-clone-connectivity-check (was: What's cooking in git.git (Apr 2019, #05; Thu, 25)) Josh Steadmon
2019-05-02  2:52   ` Jeff King
2019-05-02 16:48     ` Josh Steadmon
2019-05-02 21:45     ` Jeff King [this message]
2019-05-02 22:24       ` Jeff King
2019-05-06 19:16         ` [PATCH] perf-lib.sh: make "./run <revisions>" use the correct gits Ævar Arnfjörð Bjarmason
2019-05-06 20:24           ` Jeff King
2019-05-06 23:23             ` [PATCH v2 0/2] perf-lib.sh: fix 0baf78e7bc regression, refactor & fix bugs Ævar Arnfjörð Bjarmason
2019-05-07  7:06               ` Jeff King
2019-05-07 10:54               ` [PATCH v3 0/6] " Ævar Arnfjörð Bjarmason
2019-05-07 21:36                 ` Jeff King
2019-05-08  1:59                   ` Junio C Hamano
2019-05-07 10:54               ` [PATCH v3 1/6] perf README: correct docs for 3c8f12c96c regression Ævar Arnfjörð Bjarmason
2019-05-07 10:54               ` [PATCH v3 2/6] perf aggregate: remove GIT_TEST_INSTALLED from --codespeed Ævar Arnfjörð Bjarmason
2019-05-07 10:54               ` [PATCH v3 3/6] perf-lib.sh: make "./run <revisions>" use the correct gits Ævar Arnfjörð Bjarmason
2019-05-07 10:54               ` [PATCH v3 4/6] perf-lib.sh: remove GIT_TEST_INSTALLED from perf-lib.sh Ævar Arnfjörð Bjarmason
2019-05-07 10:54               ` [PATCH v3 5/6] perf tests: add "bindir" prefix to git tree test results Ævar Arnfjörð Bjarmason
2019-05-07 10:54               ` [PATCH v3 6/6] perf-lib.sh: forbid the use of GIT_TEST_INSTALLED Ævar Arnfjörð Bjarmason
2019-05-06 23:23             ` [PATCH v2 1/2] perf-lib.sh: make "./run <revisions>" use the correct gits Ævar Arnfjörð Bjarmason
2019-05-07  7:19               ` Jeff King
2019-05-06 23:23             ` [PATCH v2 2/2] perf-lib.sh: remove GIT_TEST_INSTALLED from perf-lib.sh Ævar Arnfjörð Bjarmason
2019-05-07  7:16               ` Jeff King
2019-05-07  8:31                 ` Ævar Arnfjörð Bjarmason
2019-05-07 21:23                   ` Jeff King
2019-04-26  5:05 ` What's cooking in git.git (Apr 2019, #05; Thu, 25) Taylor Blau
2019-04-26  5:41   ` Junio C Hamano
2019-04-26  5:53     ` Taylor Blau
2019-04-26  6:01       ` Junio C Hamano
2019-04-26 17:58     ` Kaartic Sivaraam
2019-04-27  5:06       ` Junio C Hamano
2019-04-27  5:57         ` Kaartic Sivaraam
2019-05-02  3:09         ` Jeff King
2019-04-29 22:20 ` js/macos-gettext-build, was " Johannes Schindelin
2019-05-05  5:13   ` Junio C Hamano
2019-05-06  9:11 ` Duy Nguyen
2019-05-07  2:36   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190502214509.GA19188@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).