From: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
To: git@vger.kernel.org
Cc: Derrick Stolee <dstolee@microsoft.com>, Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Subject: [PATCH 0/4] Speed up repacking when lots of pack-kept objects
Date: Tue, 12 Mar 2019 13:18:54 +0000
Message-ID: <20190312131858.26115-1-nwf20@cl.cam.ac.uk>

This patch series improves handling of very large repositories, as
generated by, for example, bup (https://github.com/bup/bup). Prolonged
operation thereof creates quite a lot of small pack files; repacking
improves filesystem performance of the objects/pack directory, but is
quite expensive in terms of time and memory. We have adopted a strategy
that marks "large" (tens of GB) pack files as "kept" and defers
repacking until there are enough un-kept packs or enough bytes of
un-kept objects. (The first patch in the series will make our
accounting easier, replacing some terrible shell scripting with grep.)

While this strategy has generally improved our lives relative to either
extreme (not repacking, or repacking after every bup save operation), it
still leaves a good bit to be desired. Because our packs are marked as
kept, repacking will leave the objects therein alone, but it still must
instantiate in memory and walk the entire object graph. However, because
our kept packs are transitively closed, such that an object in one
necessarily references only objects in other kept packs, we should like
to avoid reasoning about them more or less altogether.

This series attempts to do just that. The middle patches are just some
groundwork for the last patch, which carries the punch line. This last
patch adds an option to builtin/repack to enumerate commit and tree
objects within kept packs as UNINTERESTING to its spawned
builtin/pack-objects command. Together with inducing the use of sparse
reachability, this speeds enumerating candidate objects for repacking
and thereby substantially reduces the runtime of our repack operations,
while producing identical results.

I am, however, rather a novice when it comes to git internals, so any
and all feedback is quite welcome.

Nathaniel Filardo (4):
  count-objects: report statistics about kept packs
  revision walk: optionally use sparse reachability
  repack: add --sparse and pass to pack-objects
  repack: optionally assume transitive kept packs

 Documentation/git-gc.txt         |  5 +++
 Documentation/git-repack.txt     | 25 +++++++++++++
 bisect.c                         |  2 +-
 blame.c                          |  2 +-
 builtin/checkout.c               |  2 +-
 builtin/commit.c                 |  2 +-
 builtin/count-objects.c          | 17 ++++++++-
 builtin/describe.c               |  2 +-
 builtin/fast-export.c            |  2 +-
 builtin/fmt-merge-msg.c          |  2 +-
 builtin/gc.c                     |  5 +++
 builtin/log.c                    | 10 ++---
 builtin/merge.c                  |  2 +-
 builtin/pack-objects.c           |  4 +-
 builtin/repack.c                 | 64 +++++++++++++++++++++++++++++++-
 builtin/rev-list.c               |  2 +-
 builtin/shortlog.c               |  2 +-
 bundle.c                         |  2 +-
 http-push.c                      |  2 +-
 merge-recursive.c                |  2 +-
 pack-bitmap-write.c              |  2 +-
 pack-bitmap.c                    |  4 +-
 reachable.c                      |  4 +-
 ref-filter.c                     |  2 +-
 remote.c                         |  2 +-
 revision.c                       | 10 +++--
 revision.h                       |  2 +-
 sequencer.c                      |  6 +--
 shallow.c                        |  2 +-
 submodule.c                      |  4 +-
 t/helper/test-revision-walking.c |  2 +-
 31 files changed, 154 insertions(+), 42 deletions(-)

-- 
2.17.1
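
The deferral strategy described in the cover letter (keep large packs,
repack only once enough un-kept packs or enough un-kept bytes have
accumulated) might be sketched roughly as below. This is an illustrative
Python sketch, not the scripts the author actually uses: the names Pack,
scan_pack_dir and should_repack, and both threshold parameters, are
hypothetical. It assumes only the usual git convention that a pack counts
as "kept" when a .keep file sits next to its .pack file, and it uses
on-disk pack size as a stand-in for "bytes of un-kept objects".

```python
import os
from typing import List, NamedTuple


class Pack(NamedTuple):
    name: str   # file name of the .pack file
    size: int   # on-disk size in bytes
    kept: bool  # True if a sibling .keep marker exists


def scan_pack_dir(pack_dir: str) -> List[Pack]:
    """List the pack files in pack_dir and note which ones are kept."""
    packs = []
    for entry in sorted(os.listdir(pack_dir)):
        if not entry.endswith(".pack"):
            continue  # skip .idx, .keep, and anything else
        path = os.path.join(pack_dir, entry)
        keep_marker = path[: -len(".pack")] + ".keep"
        packs.append(
            Pack(entry, os.path.getsize(path), os.path.exists(keep_marker))
        )
    return packs


def should_repack(
    packs: List[Pack], max_unkept_packs: int, max_unkept_bytes: int
) -> bool:
    """Repack once either hypothetical threshold is reached.

    Kept packs are ignored entirely; only un-kept packs count.
    """
    unkept = [p for p in packs if not p.kept]
    too_many = len(unkept) >= max_unkept_packs
    too_big = sum(p.size for p in unkept) >= max_unkept_bytes
    return too_many or too_big
```

A caller would run scan_pack_dir on the repository's objects/pack
directory after each save and invoke git repack only when should_repack
returns True; the threshold values are a site-specific choice that the
cover letter does not state.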