From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> To: git@vger.kernel.org Cc: peff@peff.net, avarab@gmail.com, jrnieder@gmail.com, Junio C Hamano <gitster@pobox.com> Subject: [PATCH v2 0/6] Add a new "sparse" tree walk algorithm Date: Thu, 29 Nov 2018 06:24:00 -0800 (PST) Message-ID: <pull.89.v2.git.gitgitgadget@gmail.com> (raw) In-Reply-To: <pull.89.git.gitgitgadget@gmail.com> One of the biggest remaining pain points for users of very large repositories is the time it takes to run 'git push'. We inspected some slow pushes by our developers and found that the "Enumerating Objects" phase of a push was very slow. This is unsurprising, because this is why reachability bitmaps exist. However, reachability bitmaps are not available to us because of the single pack-file requirement. The bitmap approach is intended for servers anyway, and clients have a much different behavior pattern. Specifically, clients are normally pushing a very small number of objects compared to the entire working directory. A typical user changes only a small cone of the working directory, so let's use that to our benefit. Create a new "sparse" mode for 'git pack-objects' that uses the paths that introduce new objects to direct our search into the reachable trees. By collecting trees at each path, we can then recurse into a path only when there are uninteresting and interesting trees at that path. This gains a significant performance boost for small topics while presenting a possibility of packing extra objects. The main algorithm change is in patch 4, but is set up a little bit in patches 1 and 2. As demonstrated in the included test script, we see that the existing algorithm can send extra objects due to the way we specify the "frontier". But we can send even more objects if a user copies objects from one folder to another. I say "copy" because a rename would (usually) change the original folder and trigger a walk into that path, discovering the objects. In order to benefit from this approach, the user can opt-in using the pack.useSparse config setting. This setting can be overridden using the '--no-sparse' option. Update in V2: * Added GIT_TEST_PACK_SPARSE test option. * Fixed test breakages when GIT_TEST_PACK_SPARSE is enabled by adding null checks. Derrick Stolee (6): revision: add mark_tree_uninteresting_sparse list-objects: consume sparse tree walk pack-objects: add --sparse option revision: implement sparse algorithm pack-objects: create pack.useSparse setting pack-objects: create GIT_TEST_PACK_SPARSE Documentation/git-pack-objects.txt | 9 +- bisect.c | 2 +- builtin/pack-objects.c | 10 ++- builtin/rev-list.c | 2 +- http-push.c | 2 +- list-objects.c | 55 +++++++++++- list-objects.h | 4 +- revision.c | 121 +++++++++++++++++++++++++ revision.h | 2 + t/README | 4 + t/t5322-pack-objects-sparse.sh | 139 +++++++++++++++++++++++++++++ 11 files changed, 340 insertions(+), 10 deletions(-) create mode 100755 t/t5322-pack-objects-sparse.sh base-commit: a1598010f775d82b5adf12c29d0f5bc9b41434c6 Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-89%2Fderrickstolee%2Fpush%2Fsparse-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-89/derrickstolee/push/sparse-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/89 Range-diff vs v1: 1: b73b8de98c = 1: 60617681f7 revision: add mark_tree_uninteresting_sparse 2: 9bf04c748b ! 2: 4527addacb list-objects: consume sparse tree walk @@ -116,6 +116,10 @@ + for (parents = commit->parents; parents; parents = parents->next) { + struct commit *parent = parents->item; + struct tree *tree = get_commit_tree(parent); ++ ++ if (!tree) ++ continue; ++ + oidset_insert(set, &tree->object.oid); + + if (!(parent->object.flags & UNINTERESTING)) @@ -142,14 +146,14 @@ + for (list = revs->commits; list; list = list->next) { struct commit *commit = list->item; -+ + +- if (commit->object.flags & UNINTERESTING) { + if (sparse) { + struct tree *tree = get_commit_tree(commit); -+ ++ + if (commit->object.flags & UNINTERESTING) + tree->object.flags |= UNINTERESTING; - -- if (commit->object.flags & UNINTERESTING) { ++ + oidset_insert(&set, &tree->object.oid); + add_edge_parents(commit, revs, show_edge, &set); + } else if (commit->object.flags & UNINTERESTING) { @@ -189,3 +193,17 @@ struct oidset; struct list_objects_filter_options; + +diff --git a/revision.c b/revision.c +--- a/revision.c ++++ b/revision.c +@@ + while ((oid = oidset_iter_next(&iter))) { + struct tree *tree = lookup_tree(r, oid); + ++ if (!tree) ++ continue; ++ + if (tree->object.flags & UNINTERESTING) { + /* + * Remove the flag so the next call 3: 9d6b8f6d06 = 3: 9644f6ff04 pack-objects: add --sparse option 4: 0725aac4bb ! 4: c99957d06f revision: implement sparse algorithm @@ -157,6 +157,9 @@ + struct tree_desc desc; + struct name_entry entry; + ++ if (!tree) ++ return; ++ + if (parse_tree_gently(tree, 1) < 0) + return; + @@ -168,13 +171,15 @@ + + if (tree->object.flags & UNINTERESTING) { + struct tree *child = lookup_tree(r, entry.oid); -+ child->object.flags |= UNINTERESTING; ++ if (child) ++ child->object.flags |= UNINTERESTING; + } + break; + case OBJ_BLOB: + if (tree->object.flags & UNINTERESTING) { + struct blob *child = lookup_blob(r, entry.oid); -+ child->object.flags |= UNINTERESTING; ++ if (child) ++ child->object.flags |= UNINTERESTING; + } + break; + default: @@ -201,6 +206,9 @@ + (oid = oidset_iter_next(&iter))) { struct tree *tree = lookup_tree(r, oid); + if (!tree) + continue; + - if (tree->object.flags & UNINTERESTING) { - /* - * Remove the flag so the next call 5: bbc3f78182 = 5: d6912188be pack-objects: create pack.useSparse setting -: ---------- > 6: 3d394a9136 pack-objects: create GIT_TEST_PACK_SPARSE -- gitgitgadget
next prev parent reply other threads:[~2018-11-29 14:51 UTC|newest] Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-11-28 21:52 [PATCH 0/5] " Derrick Stolee via GitGitGadget 2018-11-28 21:52 ` [PATCH 1/5] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget 2018-11-28 21:52 ` [PATCH 2/5] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget 2018-11-28 21:52 ` [PATCH 3/5] pack-objects: add --sparse option Derrick Stolee via GitGitGadget 2018-11-28 22:11 ` Stefan Beller 2018-11-29 14:20 ` Derrick Stolee 2018-11-30 2:39 ` Junio C Hamano 2018-11-30 15:53 ` Derrick Stolee 2018-11-28 21:52 ` [PATCH 4/5] revision: implement sparse algorithm Derrick Stolee via GitGitGadget 2018-11-28 21:52 ` [PATCH 5/5] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget 2018-11-28 22:18 ` [PATCH 0/5] Add a new "sparse" tree walk algorithm Ævar Arnfjörð Bjarmason 2018-11-29 4:05 ` Derrick Stolee 2018-11-29 14:24 ` Derrick Stolee via GitGitGadget [this message] 2018-11-29 14:24 ` [PATCH v2 1/6] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget 2018-11-29 14:24 ` [PATCH v2 2/6] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget 2018-11-29 14:24 ` [PATCH v2 3/6] pack-objects: add --sparse option Derrick Stolee via GitGitGadget 2018-11-29 14:24 ` [PATCH v2 4/6] revision: implement sparse algorithm Derrick Stolee via GitGitGadget 2018-11-29 14:24 ` [PATCH v2 5/6] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget 2018-11-29 14:24 ` [PATCH v2 6/6] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget 2018-12-10 16:42 ` [PATCH v3 0/6] Add a new "sparse" tree walk algorithm Derrick Stolee via GitGitGadget 2018-12-10 16:42 ` [PATCH v3 1/6] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget 2018-12-10 16:42 ` [PATCH v3 2/6] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget 2018-12-10 16:42 ` [PATCH v3 3/6] pack-objects: add --sparse option Derrick Stolee via GitGitGadget 2018-12-10 16:42 ` [PATCH v3 4/6] revision: implement sparse algorithm Derrick Stolee via GitGitGadget 2018-12-10 16:42 ` [PATCH v3 5/6] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget 2018-12-10 16:42 ` [PATCH v3 6/6] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget 2018-12-14 21:22 ` [PATCH v4 0/6] Add a new "sparse" tree walk algorithm Derrick Stolee via GitGitGadget 2018-12-14 21:22 ` [PATCH v4 1/6] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget 2019-01-11 19:43 ` Junio C Hamano 2019-01-11 20:25 ` Junio C Hamano 2019-01-11 22:05 ` Derrick Stolee 2018-12-14 21:22 ` [PATCH v4 2/6] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget 2019-01-11 23:20 ` Junio C Hamano 2018-12-14 21:22 ` [PATCH v4 3/6] pack-objects: add --sparse option Derrick Stolee via GitGitGadget 2019-01-11 22:30 ` Junio C Hamano 2019-01-15 15:06 ` Derrick Stolee 2019-01-15 18:23 ` Junio C Hamano 2018-12-14 21:22 ` [PATCH v4 4/6] revision: implement sparse algorithm Derrick Stolee via GitGitGadget 2018-12-14 23:32 ` Ævar Arnfjörð Bjarmason 2018-12-17 14:20 ` Derrick Stolee 2018-12-17 14:26 ` Ævar Arnfjörð Bjarmason 2018-12-17 14:50 ` Derrick Stolee 2019-01-11 23:20 ` Junio C Hamano 2018-12-14 21:22 ` [PATCH v4 5/6] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget 2018-12-14 21:22 ` [PATCH v4 6/6] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget 2019-01-16 18:25 ` [PATCH v5 0/5] Add a new "sparse" tree walk algorithm Derrick Stolee via GitGitGadget 2019-01-16 18:25 ` [PATCH v5 2/5] list-objects: consume sparse tree walk Derrick Stolee via GitGitGadget 2019-01-16 18:25 ` [PATCH v5 1/5] revision: add mark_tree_uninteresting_sparse Derrick Stolee via GitGitGadget 2019-01-16 18:25 ` [PATCH v5 3/5] revision: implement sparse algorithm Derrick Stolee via GitGitGadget 2019-01-16 18:26 ` [PATCH v5 4/5] pack-objects: create pack.useSparse setting Derrick Stolee via GitGitGadget 2019-01-16 18:26 ` [PATCH v5 5/5] pack-objects: create GIT_TEST_PACK_SPARSE Derrick Stolee via GitGitGadget
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=pull.89.v2.git.gitgitgadget@gmail.com \ --to=gitgitgadget@gmail.com \ --cc=avarab@gmail.com \ --cc=git@vger.kernel.org \ --cc=gitster@pobox.com \ --cc=jrnieder@gmail.com \ --cc=peff@peff.net \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
git@vger.kernel.org list mirror (unofficial, one of many) This inbox may be cloned and mirrored by anyone: git clone --mirror https://public-inbox.org/git git clone --mirror http://ou63pmih66umazou.onion/git git clone --mirror http://czquwvybam4bgbro.onion/git git clone --mirror http://hjrcffqmbrq6wope.onion/git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V1 git git/ https://public-inbox.org/git \ git@vger.kernel.org public-inbox-index git Example config snippet for mirrors. Newsgroups are available over NNTP: nntp://news.public-inbox.org/inbox.comp.version-control.git nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git nntp://news.gmane.io/gmane.comp.version-control.git note: .onion URLs require Tor: https://www.torproject.org/ code repositories for the project(s) associated with this inbox: https://80x24.org/mirrors/git.git AGPL code for this site: git clone https://public-inbox.org/public-inbox.git