git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, jrnieder@gmail.com,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 0/5] Add a new "sparse" tree walk algorithm
Date: Wed, 28 Nov 2018 23:18:14 +0100
Message-ID: <874lc0zw0p.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <pull.89.git.gitgitgadget@gmail.com>


On Wed, Nov 28 2018, Derrick Stolee via GitGitGadget wrote:

> One of the biggest remaining pain points for users of very large
> repositories is the time it takes to run 'git push'. We inspected some slow
> pushes by our developers and found that the "Enumerating Objects" phase of a
> push was very slow. This is unsurprising, because this is why reachability
> bitmaps exist. However, reachability bitmaps are not available to us because
> of the single pack-file requirement. The bitmap approach is intended for
> servers anyway, and clients have a much different behavior pattern.
>
> Specifically, clients are normally pushing a very small number of objects
> compared to the entire working directory. A typical user changes only a
> small cone of the working directory, so let's use that to our benefit.
>
> Create a new "sparse" mode for 'git pack-objects' that uses the paths that
> introduce new objects to direct our search into the reachable trees. By
> collecting trees at each path, we can then recurse into a path only when
> there are uninteresting and interesting trees at that path. This gains a
> significant performance boost for small topics while presenting a
> possibility of packing extra objects.
>
> The main algorithm change is in patch 4, but is set up a little bit in
> patches 1 and 2.
>
> As demonstrated in the included test script, we see that the existing
> algorithm can send extra objects due to the way we specify the "frontier".
> But we can send even more objects if a user copies objects from one folder
> to another. I say "copy" because a rename would (usually) change the
> original folder and trigger a walk into that path, discovering the objects.
>
> In order to benefit from this approach, the user can opt-in using the
> pack.useSparse config setting. This setting can be overridden using the
> '--no-sparse' option.

This is really interesting. I tested this with:

    diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
    index 124b1bafc4..5c7615f06c 100644
    --- a/builtin/pack-objects.c
    +++ b/builtin/pack-objects.c
    @@ -3143 +3143 @@ static void get_object_list(int ac, const char **av)
    -       mark_edges_uninteresting(&revs, show_edge, sparse);
    +       mark_edges_uninteresting(&revs, show_edge, 1);

To emulate having a GIT_TEST_* mode for this, which seems like a good
idea since it turned up a lot of segfaults in pack-objects. I wasn't
able to get a backtrace for that since it always happens indirectly, and
I didn't dig enough to see how to manually invoke it the right way.

  parent reply index

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-28 21:52 Derrick Stolee via GitGitGadget
2018-11-28 21:52 ` [PATCH 1/5] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget
2018-11-28 21:52 ` [PATCH 2/5] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget
2018-11-28 21:52 ` [PATCH 3/5] pack-objects: add --sparse option Derrick Stolee via GitGitGadget
2018-11-28 22:11   ` Stefan Beller
2018-11-29 14:20     ` Derrick Stolee
2018-11-30  2:39       ` Junio C Hamano
2018-11-30 15:53         ` Derrick Stolee
2018-11-28 21:52 ` [PATCH 4/5] revision: implement sparse algorithm Derrick Stolee via GitGitGadget
2018-11-28 21:52 ` [PATCH 5/5] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget
2018-11-28 22:18 ` Ævar Arnfjörð Bjarmason [this message]
2018-11-29  4:05   ` [PATCH 0/5] Add a new "sparse" tree walk algorithm Derrick Stolee
2018-11-29 14:24 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 1/6] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 2/6] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 3/6] pack-objects: add --sparse option Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 4/6] revision: implement sparse algorithm Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 5/6] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget
2018-11-29 14:24   ` [PATCH v2 6/6] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget
2018-12-10 16:42   ` [PATCH v3 0/6] Add a new "sparse" tree walk algorithm Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 1/6] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 2/6] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 3/6] pack-objects: add --sparse option Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 4/6] revision: implement sparse algorithm Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 5/6] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget
2018-12-10 16:42     ` [PATCH v3 6/6] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget
2018-12-14 21:22     ` [PATCH v4 0/6] Add a new "sparse" tree walk algorithm Derrick Stolee via GitGitGadget
2018-12-14 21:22       ` [PATCH v4 1/6] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget
2019-01-11 19:43         ` Junio C Hamano
2019-01-11 20:25           ` Junio C Hamano
2019-01-11 22:05             ` Derrick Stolee
2018-12-14 21:22       ` [PATCH v4 2/6] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget
2019-01-11 23:20         ` Junio C Hamano
2018-12-14 21:22       ` [PATCH v4 3/6] pack-objects: add --sparse option Derrick Stolee via GitGitGadget
2019-01-11 22:30         ` Junio C Hamano
2019-01-15 15:06           ` Derrick Stolee
2019-01-15 18:23             ` Junio C Hamano
2018-12-14 21:22       ` [PATCH v4 4/6] revision: implement sparse algorithm Derrick Stolee via GitGitGadget
2018-12-14 23:32         ` Ævar Arnfjörð Bjarmason
2018-12-17 14:20           ` Derrick Stolee
2018-12-17 14:26             ` Ævar Arnfjörð Bjarmason
2018-12-17 14:50               ` Derrick Stolee
2019-01-11 23:20         ` Junio C Hamano
2018-12-14 21:22       ` [PATCH v4 5/6] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget
2018-12-14 21:22       ` [PATCH v4 6/6] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget
2019-01-16 18:25       ` [PATCH v5 0/5] Add a new "sparse" tree walk algorithm Derrick Stolee via GitGitGadget
2019-01-16 18:25         ` [PATCH v5 2/5] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget
2019-01-16 18:25         ` [PATCH v5 1/5] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget
2019-01-16 18:25         ` [PATCH v5 3/5] revision: implement sparse algorithm Derrick Stolee via GitGitGadget
2019-01-16 18:26         ` [PATCH v5 4/5] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget
2019-01-16 18:26         ` [PATCH v5 5/5] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874lc0zw0p.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git