From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, peff@peff.net, jrnieder@gmail.com,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 0/5] Add a new "sparse" tree walk algorithm
Date: Wed, 28 Nov 2018 23:18:14 +0100
Message-ID: <874lc0zw0p.fsf@evledraar.gmail.com>
In-Reply-To: <pull.89.git.gitgitgadget@gmail.com>
On Wed, Nov 28 2018, Derrick Stolee via GitGitGadget wrote:
> One of the biggest remaining pain points for users of very large
> repositories is the time it takes to run 'git push'. We inspected some slow
> pushes by our developers and found that the "Enumerating Objects" phase of a
> push was very slow. This is unsurprising, because this is why reachability
> bitmaps exist. However, reachability bitmaps are not available to us because
> of the single pack-file requirement. The bitmap approach is intended for
> servers anyway, and clients have a much different behavior pattern.
>
> Specifically, clients are normally pushing a very small number of objects
> compared to the entire working directory. A typical user changes only a
> small cone of the working directory, so let's use that to our benefit.
>
> Create a new "sparse" mode for 'git pack-objects' that uses the paths that
> introduce new objects to direct our search into the reachable trees. By
> collecting trees at each path, we can then recurse into a path only when
> there are uninteresting and interesting trees at that path. This gains a
> significant performance boost for small topics while presenting a
> possibility of packing extra objects.
>
> The main algorithm change is in patch 4, but is set up a little bit in
> patches 1 and 2.
>
> As demonstrated in the included test script, we see that the existing
> algorithm can send extra objects due to the way we specify the "frontier".
> But we can send even more objects if a user copies objects from one folder
> to another. I say "copy" because a rename would (usually) change the
> original folder and trigger a walk into that path, discovering the objects.
>
> In order to benefit from this approach, the user can opt-in using the
> pack.useSparse config setting. This setting can be overridden using the
> '--no-sparse' option.
This is really interesting. I tested this with:
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 124b1bafc4..5c7615f06c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3143 +3143 @@ static void get_object_list(int ac, const char **av)
- mark_edges_uninteresting(&revs, show_edge, sparse);
+ mark_edges_uninteresting(&revs, show_edge, 1);
This emulates having a GIT_TEST_* mode for this, which seems like a good
idea since it turned up a lot of segfaults in pack-objects. I wasn't
able to get a backtrace for them since they always happen indirectly, and
I didn't dig deep enough to see how to invoke it manually the right way.
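
For readers who want a concrete picture of the walk described in the
quoted cover letter, here is a minimal sketch in C. It is a toy model
under stated assumptions, not the code from the series: struct toy_tree,
TOY_MAX, toy_find(), toy_sparse_walk() and toy_example() are all
hypothetical names, blobs and object IDs are left out, and entry names
are assumed to be unique within a tree. It only illustrates the rule
"recurse into a path only when there are uninteresting and interesting
trees at that path".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8	/* toy limit: children per tree, trees per set */

/* Hypothetical stand-in for a tree object; not git's struct tree. */
struct toy_tree {
	const char *name;	/* path component of this tree */
	bool uninteresting;	/* the receiving side already has it */
	bool walked;		/* the walk looked at this tree's entries */
	size_t nr;		/* number of child trees */
	struct toy_tree *child[TOY_MAX];
};

/* Return the child of 't' called 'name', or NULL if there is none. */
static struct toy_tree *toy_find(struct toy_tree *t, const char *name)
{
	for (size_t i = 0; i < t->nr; i++)
		if (!strcmp(t->child[i]->name, name))
			return t->child[i];
	return NULL;
}

/*
 * 'set' holds every tree found at one path. Recurse below that path
 * only while the set mixes interesting and uninteresting trees.
 */
static void toy_sparse_walk(struct toy_tree **set, size_t nr)
{
	bool has_int = false, has_unint = false;

	if (nr > TOY_MAX)
		return;	/* outside the toy model's limits */
	for (size_t i = 0; i < nr; i++) {
		if (set[i]->uninteresting)
			has_unint = true;
		else
			has_int = true;
	}
	if (!has_int || !has_unint)
		return;	/* only one kind of tree here: do not open the path */

	for (size_t i = 0; i < nr; i++) {
		set[i]->walked = true;
		/* Everything below an uninteresting tree is uninteresting. */
		if (set[i]->uninteresting)
			for (size_t j = 0; j < set[i]->nr; j++)
				set[i]->child[j]->uninteresting = true;
	}

	/* Group the children by name and visit each name once. */
	for (size_t i = 0; i < nr; i++) {
		for (size_t j = 0; j < set[i]->nr; j++) {
			const char *name = set[i]->child[j]->name;
			struct toy_tree *sub[TOY_MAX];
			size_t sub_nr = 0;
			bool seen = false;

			for (size_t k = 0; k < i && !seen; k++)
				seen = toy_find(set[k], name) != NULL;
			if (seen)
				continue;	/* handled via an earlier tree */
			for (size_t k = i; k < nr; k++) {
				struct toy_tree *c = toy_find(set[k], name);
				if (c)
					sub[sub_nr++] = c;
			}
			toy_sparse_walk(sub, sub_nr);
		}
	}
}

/* Build a small two-root example and count the trees the walk opened. */
static int toy_example(void)
{
	struct toy_tree iax = { .name = "x" }, ibz = { .name = "z" };
	struct toy_tree uax = { .name = "x" }, uay = { .name = "y" };
	struct toy_tree ia = { .name = "a", .nr = 1, .child = { &iax } };
	struct toy_tree ib = { .name = "b", .nr = 1, .child = { &ibz } };
	struct toy_tree ua = { .name = "a", .nr = 2, .child = { &uax, &uay } };
	struct toy_tree i_root = { .name = "", .nr = 2, .child = { &ia, &ib } };
	struct toy_tree u_root = { .name = "", .uninteresting = true,
				   .nr = 1, .child = { &ua } };
	struct toy_tree *roots[] = { &i_root, &u_root };
	struct toy_tree *all[] = { &i_root, &u_root, &ia, &ib, &ua,
				   &iax, &ibz, &uax, &uay };
	int walked = 0;

	toy_sparse_walk(roots, 2);
	for (size_t i = 0; i < sizeof(all) / sizeof(all[0]); i++)
		walked += all[i]->walked;
	return walked;
}
```

In this example the path "a" exists under both roots, so the walk opens
it and continues into "a/x". The path "b" exists only on the interesting
side and "a/y" only on the uninteresting side, so neither is opened.
The quadratic grouping by name is there only to keep the sketch short;
a real implementation would presumably use a hash map keyed by path.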