From: Jeff King <firstname.lastname@example.org> To: Rasmus Villemoes <email@example.com> Cc: "Git Mailing List" <firstname.lastname@example.org>, "Stefan Beller" <email@example.com>, "Nguyễn Thái Ngọc Duy" <firstname.lastname@example.org> Subject: [PATCH] packfile: actually set approximate_object_count_valid Date: Thu, 17 Sep 2020 12:47:43 -0400 [thread overview] Message-ID: <20200917164743.GA3731633@coredump.intra.peff.net> (raw) In-Reply-To: <20200917125333.GA3024501@coredump.intra.peff.net> On Thu, Sep 17, 2020 at 08:53:33AM -0400, Jeff King wrote: > Still not great, but caching the count did save us 15%. That seems worth > it to me (1000 packs is more than we'd hope for, but not uncommon in a > poorly maintained repo). The failure to set the flag is just a bug; > looks like mine from 8e3f52d778 (find_unique_abbrev: move logic out of > get_short_sha1(), 2016-10-03). > > You're right that caching runs the risk of the cache being invalidated. > But in this case I think we're covered. We'd generally modify packed_git > only via reprepare_packed_git(), prepare_packed_git(), and we do reset > the flag there. Plus count is only meant to be approximate, so even if > it ended up stale within a single process, I don't think it would be > that big a deal. So here it is wrapped up as a patch. I think it's worth fixing (as opposed to dropping the unused flag code). Thanks for finding it. -- >8 -- Subject: [PATCH] packfile: actually set approximate_object_count_valid The approximate_object_count() function tries to compute the count only once per process. But ever since it was introduced in 8e3f52d778 (find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03), we failed to actually set the "valid" flag, meaning we'd compute it fresh on every call. This turns out not to be _too_ bad, because we're only iterating through the packed_git list, and not making any system calls. But since it may get called for every abbreviated hash we output, even this can add up if you have many packs. Here are before-and-after timings for a new perf test which just asks rev-list to abbreviate each commit hash (the test repo is linux.git, with commit-graphs): Test origin HEAD ---------------------------------------------------------------------------- 5303.3: rev-list (1) 28.91(28.46+0.44) 29.03(28.65+0.38) +0.4% 5303.4: abbrev-commit (1) 1.18(1.06+0.11) 1.17(1.02+0.14) -0.8% 5303.7: rev-list (50) 28.95(28.56+0.38) 29.50(29.17+0.32) +1.9% 5303.8: abbrev-commit (50) 3.67(3.56+0.10) 3.57(3.42+0.15) -2.7% 5303.11: rev-list (1000) 30.34(29.89+0.43) 30.82(30.35+0.46) +1.6% 5303.12: abbrev-commit (1000) 86.82(86.52+0.29) 77.82(77.59+0.22) -10.4% 5303.15: load 10,000 packs 0.08(0.02+0.05) 0.08(0.02+0.06) +0.0% It doesn't help at all when we have 1 pack (5303.4), but we get a 10% speedup when there are 1000 packs (5303.12). That's a modest speedup for a case that's already slow and we'd hope to avoid in general (note how slow it is even after, because we have to look in each of those packs for abbreviations). But it's a one-line change that clearly matches the original intent, so it seems worth doing. The included perf test may also be useful for keeping an eye on any regressions in the overall abbreviation code. Reported-by: Rasmus Villemoes <email@example.com> Signed-off-by: Jeff King <firstname.lastname@example.org> --- packfile.c | 1 + t/perf/p5303-many-packs.sh | 4 ++++ 2 files changed, 5 insertions(+) diff --git a/packfile.c b/packfile.c index 9ef27508f2..e69012e7f2 100644 --- a/packfile.c +++ b/packfile.c @@ -923,6 +923,7 @@ unsigned long repo_approximate_object_count(struct repository *r) count += p->num_objects; } r->objects->approximate_object_count = count; + r->objects->approximate_object_count_valid = 1; } return r->objects->approximate_object_count; } diff --git a/t/perf/p5303-many-packs.sh b/t/perf/p5303-many-packs.sh index 7ee791669a..f4c2ab0584 100755 --- a/t/perf/p5303-many-packs.sh +++ b/t/perf/p5303-many-packs.sh @@ -73,6 +73,10 @@ do git rev-list --objects --all >/dev/null ' + test_perf "abbrev-commit ($nr_packs)" ' + git rev-list --abbrev-commit HEAD >/dev/null + ' + # This simulates the interesting part of the repack, which is the # actual pack generation, without smudging the on-disk setup # between trials. -- 188.8.131.522.gdd163d6eb1
next prev parent reply other threads:[~2020-09-17 16:48 UTC|newest] Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-09-17 8:20 approximate_object_count_valid never set? Rasmus Villemoes 2020-09-17 11:58 ` Ævar Arnfjörð Bjarmason 2020-09-17 12:53 ` Jeff King 2020-09-17 16:47 ` Jeff King [this message] 2020-09-17 16:53 ` [PATCH] packfile: actually set approximate_object_count_valid Taylor Blau 2020-09-17 18:26 ` Junio C Hamano
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20200917164743.GA3731633@coredump.intra.peff.net \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --subject='Re: [PATCH] packfile: actually set approximate_object_count_valid' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Code repositories for project(s) associated with this inbox: https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).