From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 1/2] pack-objects: break out of want_object loop early
Date: Mon, 25 Jul 2016 14:50:10 -0400
Message-ID: <20160725185010.GA12974@sigill.intra.peff.net>
In-Reply-To: <20160725184938.GA12871@sigill.intra.peff.net>

When pack-objects collects the list of objects to pack
(either from stdin, or via its internal rev-list), it
filters each one through want_object_in_pack().

This function loops through each existing packfile, looking
for the object. When we find it, we mark the pack/offset
combo for later use. However, we can't just return "yes, we
want it" at that point. If --honor-pack-keep is in effect,
we must keep looking to find it in _all_ packs, to make sure
none of them has a .keep. Likewise, if --local is in effect,
we must make sure it is not present in any non-local pack.
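
To make the two conditions concrete, here is a minimal sketch of the
decision as described above. It is a toy model, not the real code:
struct toy_pack, toy_pack_has() and toy_want_object() are hypothetical
names, and objects are plain integers rather than SHA-1s.

```c
#include <stddef.h>

/* Toy stand-in for a packfile; every name here is hypothetical. */
struct toy_pack {
	int pack_local;       /* 1 if the pack lives in this repository */
	int pack_keep;        /* 1 if the pack has a .keep file */
	const int *objects;   /* ids of the objects stored in the pack */
	size_t nr_objects;
};

/* Linear scan standing in for a real pack index lookup. */
int toy_pack_has(const struct toy_pack *p, int id)
{
	for (size_t i = 0; i < p->nr_objects; i++)
		if (p->objects[i] == id)
			return 1;
	return 0;
}

/* Returns 1 if the object should be packed, 0 if it should be skipped. */
int toy_want_object(const struct toy_pack *packs, size_t nr_packs,
		    int id, int local, int ignore_packed_keep)
{
	for (size_t i = 0; i < nr_packs; i++) {
		const struct toy_pack *p = &packs[i];

		if (!toy_pack_has(p, id))
			continue;
		/* --local: a copy in a non-local pack disqualifies it. */
		if (local && !p->pack_local)
			return 0;
		/* --honor-pack-keep: a copy in a kept local pack does too. */
		if (ignore_packed_keep && p->pack_local && p->pack_keep)
			return 0;
		/* No verdict yet: a later pack may still disqualify it. */
	}
	return 1;
}
```

An object that sits in both an ordinary pack and a kept pack is only
rejected once the loop reaches the kept pack, which is why finding the
first copy is not enough when either option is set.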

As a result, the sum effort of these calls is effectively
O(nr_objects * nr_packs). In an ordinary repository, we have
only a handful of packs, and this doesn't make a big
difference. But in pathological cases, it can slow the
counting phase to a crawl.

This patch detects the case where neither "--local" nor
"--honor-pack-keep" is in effect and breaks out of the loop
early, after finding the first instance. Note that our worst
case is still "objects * packs" (i.e., we might find each
object in the last pack we look in), but in practice we will
often break out early. On an "average" repo, my git.git with
8 packs, this shows a modest 2% (a few dozen milliseconds)
improvement in the counting-objects phase of "git
pack-objects --all <foo" (hackily instrumented by sticking
exit(0) right after list_objects).

But in a much more pathological case, it makes a bigger
difference. I ran the same command on a real-world example
with ~9 million objects across 1300 packs. The counting time
dropped from 413s to 45s, an improvement of about 89%.
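
The effect of the early break on the number of pack lookups can be
sketched with a small counting model. This is an illustration under
simplifying assumptions, not a measurement of git itself: each object
is assumed to live in exactly one pack, one lookup in one pack counts
as one probe, the early "return 0" exits are ignored, and
toy_count_probes() and pack_of are hypothetical names.

```c
#include <stddef.h>

/*
 * Hypothetical model: pack_of[i] is the index of the single pack that
 * holds object i. Returns the total number of pack probes needed to
 * filter all objects.
 */
size_t toy_count_probes(const size_t *pack_of, size_t nr_objects,
			size_t nr_packs, int local, int ignore_packed_keep)
{
	size_t probes = 0;

	for (size_t i = 0; i < nr_objects; i++) {
		for (size_t p = 0; p < nr_packs; p++) {
			probes++;
			if (pack_of[i] != p)
				continue;
			/* Found it: stop only if no option needs every pack. */
			if (!ignore_packed_keep && !local)
				break;
		}
	}
	return probes;
}
```

With four objects spread one per pack over four packs, this model
makes 10 probes when neither option is set and 16 when either one is.
If every object sits in the last pack, it makes 16 probes either way,
which is the unchanged worst case mentioned above.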

Note that this patch won't do anything by itself for a
normal "git gc", as it uses both --honor-pack-keep and
--local.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a2f8cfd..55ef5a8 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -981,6 +981,8 @@ static int want_object_in_pack(const unsigned char *sha1,
 				return 0;
 			if (ignore_packed_keep && p->pack_local && p->pack_keep)
 				return 0;
+			if (!ignore_packed_keep && !local)
+				break;
 		}
 	}
 
-- 
2.9.2.512.g8a06708


