From: Jeff King <peff@peff.net> To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com> Cc: Eric Wong <e@80x24.org>, git@vger.kernel.org Subject: Re: [PATCH v3] repack: enable bitmaps by default on bare repos Date: Wed, 10 Apr 2019 18:57:21 -0400 Message-ID: <20190410225721.GA32262@sigill.intra.peff.net> (raw) In-Reply-To: <87zhoz8b9o.fsf@evledraar.gmail.com> On Tue, Apr 09, 2019 at 05:10:43PM +0200, Ævar Arnfjörð Bjarmason wrote: > I've found a case where turning bitmaps on does horrible things for > bitmap "push" performance. > [...] > I can't share the repo, but I had a report where just a "git push" of a > topic branch that was 2/58 ahead/behind took ~2 minutes just in > "Enumerating objects", but ~500ms without bitmaps. That's pretty bad, though I'm not terribly surprised. The worst cases are ones where we have to traverse a lot to fill in the bitmap. So either there are a lot of commits newer than the bitmapped pack, or we've done a bad job of picking old commits to bitmap, requiring us to walk back to find all of the reachable objects (until we find something that does have a bitmap). And "bad" here is somewhat subjective. The other side told us some "have" lines, and those are what we have to walk back from. _Usually_ those would correspond to ref tips, and the bitmap code tries to put a bitmap at each ref tip. But if you have tons of refs, it can't always do so. > I.e. almost all the time is in get_object_list_from_bitmap() and around > 1m30s in just this in pack-bitmap.c: > > haves_bitmap = find_objects(bitmap_git, revs, haves, NULL); > > And then another ~20s in: > > wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap); Yeah, that's where I'd expect the time to go. Inside find_objects() we'll call traverse_commit_list() to do the actual walk. The first walk for "haves" is looking mostly at old commits. Stuff that is mentioned in the pack, but for which we have to walk to find the bitmap. The second walk for wants is probably mostly looking at new commits. Things that don't have a bitmap yet, and for which we have to walk (though it's interesting that it's so much more expensive than the non-bitmap walk, which has to fully enumerate those trees itself; that implies that some of the "haves" are recent, too, with respect to the last pack). > This is with 10 packs and where only the largest (initial clone pack) > had a *.bitmap, but I can also reproduce with a 'git repack -A -d -b', > i.e. with only one pack with a *.bitmap, although that makes it a bit > better for the first bit, and almost completely cuts down on the time > spent in the second phase: Yeah, that makes sense. By repacking you've taken all those new commits and included them in on-disk bitmaps. So I'd expect the "wants" to get much shorter, but the "haves" phase staying long means we could do a better job of picking commits to have on-disk bitmaps. So two avenues for exploration I think: 1. I've long suspected that the bitmap selection code isn't ideal. Both in terms of what it picks, but also in its runtime (I think it ends up walking the same slices of history multiple times in some cases). 2. The answer we get from a bitmap versus a regular traversal are not apples-to-apples equivalent. The regular traversal walks down to the UNINTERESTING commits, marks the boundaries trees and blobs as UNINTERESTING, and then adds in all the interesting trees and blobs minus the UNINTERESTING parts. So it can sometimes give the wrong answer, claiming something is interesting when it is not. Whereas with bitmaps we fill in the trees and blobs as we walk, and you get the true answer. But it means we may open up a lot more trees than the equivalent traversal would. So one thing I've never really experimented with (and indeed, never really thought about until writing this email) is that the bitmaps could try to do that looser style of traversal, knowing that we might err on the side of calling things interesting in a few cases. But hopefully spending a lot less time opening trees. I'm not even 100% sure what that would look like in code, but just thinking about it from a high level, I don't there's a particular mathematical reason it couldn't work. -Peff
next prev parent reply other threads:[~2019-04-10 22:57 UTC|newest] Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-02-14 4:31 [PATCH 0/3] some prune optimizations Jeff King 2019-02-14 4:35 ` [PATCH 1/3] prune: lazily perform reachability traversal Jeff King 2019-02-14 10:54 ` Eric Sunshine 2019-02-14 11:07 ` Jeff King 2019-02-14 4:37 ` [PATCH 2/3] prune: use bitmaps for " Jeff King 2019-03-09 2:49 ` bitmaps by default? [was: prune: use bitmaps for reachability traversal] Eric Wong 2019-03-10 23:39 ` Jeff King 2019-03-12 3:13 ` [PATCH] repack: enable bitmaps by default on bare repos Eric Wong 2019-03-12 9:07 ` Ævar Arnfjörð Bjarmason 2019-03-12 10:49 ` Jeff King 2019-03-12 12:05 ` Jeff King 2019-03-13 1:51 ` Eric Wong 2019-03-13 14:54 ` Jeff King 2019-03-14 9:12 ` [PATCH v3] " Eric Wong 2019-03-14 16:02 ` Jeff King 2019-03-15 6:21 ` [PATCH 0/2] enable bitmap hash-cache by default Jeff King 2019-03-15 6:22 ` [PATCH 1/2] t5310: correctly remove bitmaps for jgit test Jeff King 2019-03-15 13:25 ` SZEDER Gábor 2019-03-15 18:36 ` Jeff King 2019-03-15 6:25 ` [PATCH 2/2] pack-objects: default to writing bitmap hash-cache Jeff King 2019-04-09 15:10 ` [PATCH v3] repack: enable bitmaps by default on bare repos Ævar Arnfjörð Bjarmason 2019-04-10 22:57 ` Jeff King [this message] 2019-04-25 7:16 ` Junio C Hamano 2019-05-04 1:37 ` Jeff King 2019-05-04 6:52 ` Ævar Arnfjörð Bjarmason 2019-05-04 13:23 ` SZEDER Gábor 2019-05-08 20:17 ` Ævar Arnfjörð Bjarmason 2019-05-09 4:24 ` Junio C Hamano 2019-05-07 7:45 ` Jeff King 2019-05-07 8:12 ` Ævar Arnfjörð Bjarmason 2019-05-08 7:11 ` Jeff King 2019-05-08 14:20 ` Derrick Stolee 2019-05-08 16:13 ` Ævar Arnfjörð Bjarmason 2019-05-08 22:25 ` Jeff King 2019-05-23 11:30 ` Jeff King 2019-05-23 12:53 ` Derrick Stolee 2019-05-24 7:24 ` Jeff King 2019-05-24 10:33 ` Derrick Stolee 2019-05-23 19:26 ` Ævar Arnfjörð Bjarmason 2019-05-24 7:27 ` Jeff King 2019-05-24 7:55 ` Ævar Arnfjörð Bjarmason 2019-05-24 8:26 ` Jeff King 2019-05-24 9:01 ` Ævar Arnfjörð Bjarmason 2019-05-24 9:29 ` SZEDER Gábor 2019-05-24 11:17 ` Ævar Arnfjörð Bjarmason 2019-05-24 11:41 ` SZEDER Gábor 2019-05-24 11:58 ` Ævar Arnfjörð Bjarmason 2019-05-24 12:34 ` SZEDER Gábor 2019-05-24 13:41 ` Ævar Arnfjörð Bjarmason 2019-05-24 11:31 ` [PATCH] pack-bitmap: look for an uninteresting bitmap Derrick Stolee 2019-04-15 15:00 ` [PATCH 2/3] prune: use bitmaps for reachability traversal Derrick Stolee 2019-04-18 19:49 ` Jeff King 2019-04-18 20:08 ` [PATCH] t5304: add a test for pruning with bitmaps Jeff King 2019-04-20 1:01 ` Derrick Stolee 2019-04-20 3:24 ` Jeff King 2019-04-20 21:01 ` Derrick Stolee 2019-02-14 4:38 ` [PATCH 3/3] prune: check SEEN flag for reachability Jeff King
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190410225721.GA32262@sigill.intra.peff.net \ --to=peff@peff.net \ --cc=avarab@gmail.com \ --cc=e@80x24.org \ --cc=git@vger.kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
git@vger.kernel.org list mirror (unofficial, one of many) This inbox may be cloned and mirrored by anyone: git clone --mirror https://public-inbox.org/git git clone --mirror http://ou63pmih66umazou.onion/git git clone --mirror http://czquwvybam4bgbro.onion/git git clone --mirror http://hjrcffqmbrq6wope.onion/git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V1 git git/ https://public-inbox.org/git \ git@vger.kernel.org public-inbox-index git Example config snippet for mirrors. Newsgroups are available over NNTP: nntp://news.public-inbox.org/inbox.comp.version-control.git nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git nntp://news.gmane.io/gmane.comp.version-control.git note: .onion URLs require Tor: https://www.torproject.org/ code repositories for the project(s) associated with this inbox: https://80x24.org/mirrors/git.git AGPL code for this site: git clone https://public-inbox.org/public-inbox.git