git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 6/7] pack-objects: compute local/ignore_pack_keep early
Date: Fri, 29 Jul 2016 00:11:31 -0400	[thread overview]
Message-ID: <20160729041130.GF22408@sigill.intra.peff.net> (raw)
In-Reply-To: <20160729040422.GA19678@sigill.intra.peff.net>

In want_object_in_pack(), we can exit early from our loop if
neither "local" nor "ignore_pack_keep" are set. If they are,
however, we must examine each pack to see if it has the
object and is non-local or has a ".keep".

It's quite common for there to be no non-local or .keep
packs at all, in which case we know ahead of time that
looking further will be pointless. We can pre-compute this
by simply iterating over the list of packs ahead of time,
and dropping the flags if there are no packs that could
match.

Another similar strategy would be to modify the loop in
want_object_in_pack() to notice that we have already found
the object once, and that we are looping only to check for
"local" and "keep" attributes. If a pack has neither of
those, we can skip the call to find_pack_entry_one(), which
is the expensive part of the loop.

This has two advantages:

  - it isn't all-or-nothing; we still get some improvement
    when there's a small number of kept or non-local packs,
    and a large number of non-kept local packs

  - it eliminates any possible race where we add new
    non-local or kept packs after our initial scan. In
    practice, I don't think this race matters; we already
    cache the packed_git information, so somebody who adds a
    new pack or .keep file after we've started will not be
    noticed at all, unless we happen to need to call
    reprepare_packed_git() because a lookup fails.

    In other words, we're already racy, and the race is not
    a big deal (losing the race means we might include an
    object in the pack that would not otherwise be, which is
    an acceptable outcome).

However, it also has a disadvantage: we still loop over the
rest of the packs for each object to check their flags. This
is much less expensive than doing the object lookup, but
still not free. So if we wanted to implement that strategy
to cover the non-all-or-nothing cases, we could do so in
addition to this one (so you get the most speedup in the
all-or-nothing case, and the best we can do in the other
cases). But given that the all-or-nothing case is likely the
most common, it is probably not worth the trouble, and we
can revisit this later if evidence points otherwise.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8ad11f2..c4c2a3c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -46,6 +46,7 @@ static int keep_unreachable, unpack_unreachable, include_tag;
 static unsigned long unpack_unreachable_expiration;
 static int pack_loose_unreachable;
 static int local;
+static int have_non_local_packs;
 static int incremental;
 static int ignore_packed_keep;
 static int allow_ofs_delta;
@@ -990,7 +991,8 @@ static int want_object_in_pack(const unsigned char *sha1,
 			 * we just found is going to be packed, so break
 			 * out of the loop to return 1 now.
 			 */
-			if (!ignore_packed_keep && !local)
+			if (!ignore_packed_keep &&
+			    (!local || !have_non_local_packs))
 				break;
 
 			if (local && !p->pack_local)
@@ -2799,6 +2801,28 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		progress = 2;
 
 	prepare_packed_git();
+	if (ignore_packed_keep) {
+		struct packed_git *p;
+		for (p = packed_git; p; p = p->next)
+			if (p->pack_local && p->pack_keep)
+				break;
+		if (!p) /* no keep-able packs found */
+			ignore_packed_keep = 0;
+	}
+	if (local) {
+		/*
+		 * unlike ignore_packed_keep above, we do not want to
+		 * unset "local" based on looking at packs, as it
+		 * also covers non-local objects
+		 */
+		struct packed_git *p;
+		for (p = packed_git; p; p = p->next) {
+			if (!p->pack_local) {
+				have_non_local_packs = 1;
+				break;
+			}
+		}
+	}
 
 	if (progress)
 		progress_state = start_progress(_("Counting objects"), 0);
-- 
2.9.2.607.g98dce7b


  parent reply	other threads:[~2016-07-29  4:11 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-29  4:04 [PATCH v2 0/7] speed up pack-objects counting with many packs Jeff King
2016-07-29  4:06 ` [PATCH v2 1/7] t/perf: add tests for many-pack scenarios Jeff King
2016-07-29  4:06 ` [PATCH v2 2/7] sha1_file: drop free_pack_by_name Jeff King
2016-07-29  4:06 ` [PATCH v2 3/7] add generic most-recently-used list Jeff King
2016-07-29  4:09 ` [PATCH v2 4/7] find_pack_entry: replace last_found_pack with MRU cache Jeff King
2016-07-29  4:10 ` [PATCH v2 5/7] pack-objects: break out of want_object loop early Jeff King
2016-07-29  4:11 ` Jeff King [this message]
2016-07-29  4:15 ` [PATCH v2 7/7] pack-objects: use mru list when iterating over packs Jeff King
2016-07-29  5:45   ` Jeff King
2016-07-29 15:02     ` Junio C Hamano
2016-08-08 14:50       ` Jeff King
2016-08-08 16:28         ` Junio C Hamano
2016-08-08 16:51           ` Jeff King
2016-08-08 17:16             ` Junio C Hamano
2016-08-09 14:04               ` Jeff King
2016-08-09 17:45                 ` Jeff King
2016-08-09 18:06                   ` Junio C Hamano
2016-08-09 22:29                 ` Junio C Hamano
2016-08-10 11:52                   ` [PATCH v3 0/2] pack-objects mru Jeff King
2016-08-10 12:02                     ` [PATCH v3 1/2] pack-objects: break delta cycles before delta-search phase Jeff King
2016-08-10 20:17                       ` Junio C Hamano
2016-08-11  5:02                         ` Jeff King
2016-08-11  5:15                           ` [PATCH v4 " Jeff King
2016-08-11  6:57                           ` [PATCH v3 " Jeff King
2016-08-11  9:20                             ` [PATCH v5] pack-objects mru Jeff King
2016-08-11  9:24                               ` [PATCH v5 1/4] provide an initializer for "struct object_info" Jeff King
2016-08-11  9:25                               ` [PATCH v5 2/4] sha1_file: make packed_object_info public Jeff King
2016-08-11  9:26                               ` [PATCH v5 3/4] pack-objects: break delta cycles before delta-search phase Jeff King
2016-08-11  9:26                               ` [PATCH v5 4/4] pack-objects: use mru list when iterating over packs Jeff King
2016-08-11  9:57                               ` [PATCH v5] pack-objects mru Jeff King
2016-08-11 15:11                                 ` Junio C Hamano
2016-08-11 16:19                                   ` Jeff King
2016-08-10 12:03                     ` [PATCH v3 2/2] pack-objects: use mru list when iterating over packs Jeff King
2016-08-10 16:47                     ` [PATCH v3 0/2] pack-objects mru Junio C Hamano
2016-08-11  4:48                       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160729041130.GF22408@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).