git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: peff@peff.net, dstolee@microsoft.com, gitster@pobox.com
Subject: [PATCH v4 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic
Date: Mon, 22 Feb 2021 21:25:20 -0500	[thread overview]
Message-ID: <a116587fb2b7f6484b9206de68ff66d10bb2a2a2.1614047097.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1614047097.git.me@ttaylorr.com>

From: Jeff King <peff@peff.net>

Now that we have find_kept_pack_entry(), we don't have to manually keep
hunting through every pack to find a possible "kept" duplicate of the
object. This should be faster, assuming only a portion of your total
packs are actually kept.

Note that we have to re-order the logic a bit here; we can deal with the
disqualifying situations first (e.g., finding the object in a non-local
pack with --local), then "kept" situation(s), and then just fall back to
other "--local" conditions.

Here are the results from p5303 (measurements again taken on the
kernel):

  Test                                        HEAD^                   HEAD
  -----------------------------------------------------------------------------------------------
  5303.5: repack (1)                          57.26(54.59+10.84)      57.34(54.66+10.88) +0.1%
  5303.6: repack with kept (1)                57.33(54.80+10.51)      57.38(54.83+10.49) +0.1%
  5303.11: repack (50)                        71.54(88.57+4.84)       71.70(88.99+4.74) +0.2%
  5303.12: repack with kept (50)              85.12(102.05+4.94)      72.58(89.61+4.78) -14.7%
  5303.17: repack (1000)                      216.87(490.79+14.57)    217.19(491.72+14.25) +0.1%
  5303.18: repack with kept (1000)            665.63(938.87+15.76)    246.12(520.07+14.93) -63.0%

and the --stdin-packs timings:

  5303.7: repack with --stdin-packs (1)       0.01(0.01+0.00)         0.00(0.00+0.00) -100.0%
  5303.13: repack with --stdin-packs (50)     3.53(12.07+0.24)        3.43(11.75+0.24) -2.8%
  5303.19: repack with --stdin-packs (1000)   195.83(371.82+8.10)     130.50(307.15+7.66) -33.4%

So our repack with an empty .keep pack is roughly as fast as one without
a .keep pack up to 50 packs. But the --stdin-packs case scales a little
better, too.

Notably, it is faster than a repack of the same size and a kept pack. It
looks at fewer objects, of course, but the penalty for looking at many
packs isn't as costly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 131 ++++++++++++++++++++++++-----------------
 1 file changed, 78 insertions(+), 53 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6ee8e40665..8cb32763b7 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1188,7 +1188,8 @@ static int have_duplicate_entry(const struct object_id *oid,
 	return 1;
 }
 
-static int want_found_object(int exclude, struct packed_git *p)
+static int want_found_object(const struct object_id *oid, int exclude,
+			     struct packed_git *p)
 {
 	if (exclude)
 		return 1;
@@ -1204,27 +1205,82 @@ static int want_found_object(int exclude, struct packed_git *p)
 	 * make sure no copy of this object appears in _any_ pack that makes us
 	 * to omit the object, so we need to check all the packs.
 	 *
-	 * We can however first check whether these options can possible matter;
+	 * We can however first check whether these options can possibly matter;
 	 * if they do not matter we know we want the object in generated pack.
 	 * Otherwise, we signal "-1" at the end to tell the caller that we do
 	 * not know either way, and it needs to check more packs.
 	 */
-	if (!ignore_packed_keep_on_disk &&
-	    !ignore_packed_keep_in_core &&
-	    (!local || !have_non_local_packs))
-		return 1;
 
+	/*
+	 * Objects in packs borrowed from elsewhere are discarded regardless of
+	 * if they appear in other packs that weren't borrowed.
+	 */
 	if (local && !p->pack_local)
 		return 0;
-	if (p->pack_local &&
-	    ((ignore_packed_keep_on_disk && p->pack_keep) ||
-	     (ignore_packed_keep_in_core && p->pack_keep_in_core)))
-		return 0;
+
+	/*
+	 * Then handle .keep first, as we have a fast(er) path there.
+	 */
+	if (ignore_packed_keep_on_disk || ignore_packed_keep_in_core) {
+		/*
+		 * Set the flags for the kept-pack cache to be the ones we want
+		 * to ignore.
+		 *
+		 * That is, if we are ignoring objects in on-disk keep packs,
+		 * then we want to search through the on-disk keep and ignore
+		 * the in-core ones.
+		 */
+		unsigned flags = 0;
+		if (ignore_packed_keep_on_disk)
+			flags |= ON_DISK_KEEP_PACKS;
+		if (ignore_packed_keep_in_core)
+			flags |= IN_CORE_KEEP_PACKS;
+
+		if (ignore_packed_keep_on_disk && p->pack_keep)
+			return 0;
+		if (ignore_packed_keep_in_core && p->pack_keep_in_core)
+			return 0;
+		if (has_object_kept_pack(oid, flags))
+			return 0;
+	}
+
+	/*
+	 * At this point we know definitively that either we don't care about
+	 * keep-packs, or the object is not in one. Keep checking other
+	 * conditions...
+	 */
+	if (!local || !have_non_local_packs)
+		return 1;
 
 	/* we don't know yet; keep looking for more packs */
 	return -1;
 }
 
+static int want_object_in_pack_one(struct packed_git *p,
+				   const struct object_id *oid,
+				   int exclude,
+				   struct packed_git **found_pack,
+				   off_t *found_offset)
+{
+	off_t offset;
+
+	if (p == *found_pack)
+		offset = *found_offset;
+	else
+		offset = find_pack_entry_one(oid->hash, p);
+
+	if (offset) {
+		if (!*found_pack) {
+			if (!is_pack_valid(p))
+				return -1;
+			*found_offset = offset;
+			*found_pack = p;
+		}
+		return want_found_object(oid, exclude, p);
+	}
+	return -1;
+}
+
 /*
  * Check whether we want the object in the pack (e.g., we do not want
  * objects found in non-local stores if the "--local" option was used).
@@ -1252,7 +1308,7 @@ static int want_object_in_pack(const struct object_id *oid,
 	 * are present we will determine the answer right now.
 	 */
 	if (*found_pack) {
-		want = want_found_object(exclude, *found_pack);
+		want = want_found_object(oid, exclude, *found_pack);
 		if (want != -1)
 			return want;
 	}
@@ -1260,53 +1316,22 @@ static int want_object_in_pack(const struct object_id *oid,
 	for (m = get_multi_pack_index(the_repository); m; m = m->next) {
 		struct pack_entry e;
 		if (fill_midx_entry(the_repository, oid, &e, m)) {
-			struct packed_git *p = e.p;
-			off_t offset;
-
-			if (p == *found_pack)
-				offset = *found_offset;
-			else
-				offset = find_pack_entry_one(oid->hash, p);
-
-			if (offset) {
-				if (!*found_pack) {
-					if (!is_pack_valid(p))
-						continue;
-					*found_offset = offset;
-					*found_pack = p;
-				}
-				want = want_found_object(exclude, p);
-				if (want != -1)
-					return want;
-			}
-		}
-	}
-
-	list_for_each(pos, get_packed_git_mru(the_repository)) {
-		struct packed_git *p = list_entry(pos, struct packed_git, mru);
-		off_t offset;
-
-		if (p == *found_pack)
-			offset = *found_offset;
-		else
-			offset = find_pack_entry_one(oid->hash, p);
-
-		if (offset) {
-			if (!*found_pack) {
-				if (!is_pack_valid(p))
-					continue;
-				*found_offset = offset;
-				*found_pack = p;
-			}
-			want = want_found_object(exclude, p);
-			if (!exclude && want > 0)
-				list_move(&p->mru,
-					  get_packed_git_mru(the_repository));
+			want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset);
 			if (want != -1)
 				return want;
 		}
 	}
 
+	list_for_each(pos, get_packed_git_mru(the_repository)) {
+		struct packed_git *p = list_entry(pos, struct packed_git, mru);
+		want = want_object_in_pack_one(p, oid, exclude, found_pack, found_offset);
+		if (!exclude && want > 0)
+			list_move(&p->mru,
+				  get_packed_git_mru(the_repository));
+		if (want != -1)
+			return want;
+	}
+
 	if (uri_protocols.nr) {
 		struct configured_exclusion *ex =
 			oidmap_get(&configured_exclusions, oid);
-- 
2.30.0.667.g81c0cbc6fd


  parent reply	other threads:[~2021-02-23  2:26 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 23:23 [PATCH 00/10] repack: support repacking into a geometric sequence Taylor Blau
2021-01-19 23:24 ` [PATCH 01/10] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-01-20 13:40   ` Derrick Stolee
2021-01-20 14:38     ` Taylor Blau
2021-01-29  2:33   ` Junio C Hamano
2021-01-29 18:38     ` Taylor Blau
2021-01-29 19:31     ` Jeff King
2021-01-29 20:20       ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 02/10] revision: learn '--no-kept-objects' Taylor Blau
2021-01-29  3:10   ` Junio C Hamano
2021-01-29 19:13     ` Taylor Blau
2021-01-19 23:24 ` [PATCH 03/10] builtin/pack-objects.c: learn '--assume-kept-packs-closed' Taylor Blau
2021-01-29  3:21   ` Junio C Hamano
2021-01-29 19:19     ` Jeff King
2021-01-29 20:01       ` Taylor Blau
2021-01-29 20:25         ` Jeff King
2021-01-29 22:10           ` Taylor Blau
2021-01-29 22:57             ` Jeff King
2021-01-29 23:03             ` Junio C Hamano
2021-01-29 23:28               ` Taylor Blau
2021-02-02  3:04                 ` Taylor Blau
2021-01-29 23:31               ` Jeff King
2021-01-29 22:13           ` Junio C Hamano
2021-01-29 20:30       ` Junio C Hamano
2021-01-29 22:43         ` Jeff King
2021-01-29 22:53           ` Taylor Blau
2021-01-29 23:00             ` Jeff King
2021-01-29 23:10             ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 04/10] p5303: add missing &&-chains Taylor Blau
2021-01-19 23:24 ` [PATCH 05/10] p5303: measure time to repack with keep Taylor Blau
2021-01-29  3:40   ` Junio C Hamano
2021-01-29 19:32     ` Jeff King
2021-01-29 20:04       ` [PATCH] p5303: avoid sed GNU-ism Jeff King
2021-01-29 20:19         ` Eric Sunshine
2021-01-29 20:27           ` Jeff King
2021-01-29 20:36             ` Eric Sunshine
2021-01-29 22:11               ` Taylor Blau
2021-01-29 20:38       ` [PATCH 05/10] p5303: measure time to repack with keep Junio C Hamano
2021-01-29 22:10         ` Jeff King
2021-01-29 23:12           ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 06/10] pack-objects: rewrite honor-pack-keep logic Taylor Blau
2021-01-19 23:24 ` [PATCH 07/10] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-01-19 23:24 ` [PATCH 08/10] builtin/pack-objects.c: teach '--keep-pack-stdin' Taylor Blau
2021-01-19 23:24 ` [PATCH 09/10] builtin/repack.c: extract loose object handling Taylor Blau
2021-01-20 13:59   ` Derrick Stolee
2021-01-20 14:34     ` Taylor Blau
2021-01-20 15:51       ` Derrick Stolee
2021-01-21  3:45     ` Junio C Hamano
2021-01-19 23:24 ` [PATCH 10/10] builtin/repack.c: add '--geometric' option Taylor Blau
2021-01-20 14:05 ` [PATCH 00/10] repack: support repacking into a geometric sequence Derrick Stolee
2021-02-04  3:58 ` [PATCH v2 0/8] " Taylor Blau
2021-02-04  3:58   ` [PATCH v2 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-16 21:42     ` Jeff King
2021-02-16 21:48       ` Taylor Blau
2021-02-04  3:58   ` [PATCH v2 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-16 23:17     ` Jeff King
2021-02-17 18:35       ` Taylor Blau
2021-02-04  3:59   ` [PATCH v2 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-16 23:46     ` Jeff King
2021-02-17 18:59       ` Taylor Blau
2021-02-17 19:21         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-04  3:59   ` [PATCH v2 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-16 23:58     ` Jeff King
2021-02-17  0:02       ` Jeff King
2021-02-17 19:13       ` Taylor Blau
2021-02-17 19:25         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-17 16:05     ` Jeff King
2021-02-17 19:23       ` Taylor Blau
2021-02-17 19:29         ` Jeff King
2021-02-04  3:59   ` [PATCH v2 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-17 17:11     ` Jeff King
2021-02-17 19:54       ` Taylor Blau
2021-02-17 20:25         ` Jeff King
2021-02-17 20:29           ` Taylor Blau
2021-02-17 21:43             ` Jeff King
2021-02-04  3:59   ` [PATCH v2 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-17 18:17     ` Jeff King
2021-02-17 20:01       ` Taylor Blau
2021-02-17  0:01   ` [PATCH v2 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-17 18:18     ` Jeff King
2021-02-18  3:14 ` [PATCH v3 " Taylor Blau
2021-02-18  3:14   ` [PATCH v3 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-18  3:14   ` [PATCH v3 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-18  3:14   ` [PATCH v3 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-18  3:14   ` [PATCH v3 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-18  3:14   ` [PATCH v3 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-18  3:14   ` [PATCH v3 6/8] builtin/pack-objects.c: rewrite honor-pack-keep logic Taylor Blau
2021-02-18  3:14   ` [PATCH v3 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-18  3:14   ` [PATCH v3 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-23  0:31   ` [PATCH v3 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23  1:06     ` Taylor Blau
2021-02-23  1:42       ` Jeff King
2021-02-23  2:24 ` [PATCH v4 " Taylor Blau
2021-02-23  2:25   ` [PATCH v4 1/8] packfile: introduce 'find_kept_pack_entry()' Taylor Blau
2021-02-23  2:25   ` [PATCH v4 2/8] revision: learn '--no-kept-objects' Taylor Blau
2021-02-23  2:25   ` [PATCH v4 3/8] builtin/pack-objects.c: add '--stdin-packs' option Taylor Blau
2021-02-23  8:07     ` Junio C Hamano
2021-02-23 18:51       ` Jeff King
2021-02-23  2:25   ` [PATCH v4 4/8] p5303: add missing &&-chains Taylor Blau
2021-02-23  2:25   ` [PATCH v4 5/8] p5303: measure time to repack with keep Taylor Blau
2021-02-23  2:25   ` Taylor Blau [this message]
2021-02-23  2:25   ` [PATCH v4 7/8] packfile: add kept-pack cache for find_kept_pack_entry() Taylor Blau
2021-02-23  2:25   ` [PATCH v4 8/8] builtin/repack.c: add '--geometric' option Taylor Blau
2021-02-24 23:19     ` Junio C Hamano
2021-02-24 23:43       ` Junio C Hamano
2021-03-04 21:40         ` Taylor Blau
2021-03-04 21:55       ` Taylor Blau
2021-02-23  3:39   ` [PATCH v4 0/8] repack: support repacking into a geometric sequence Jeff King
2021-02-23  7:43   ` Junio C Hamano
2021-02-23 18:44     ` Jeff King
2021-02-23 19:54       ` Martin Fick
2021-02-23 20:06         ` Taylor Blau
2021-02-23 21:57           ` Martin Fick
2021-02-23 20:15         ` Jeff King
2021-02-23 21:41           ` Martin Fick
2021-02-23 21:53             ` Jeff King
2021-02-24 18:13               ` Martin Fick
2021-02-26  6:23                 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a116587fb2b7f6484b9206de68ff66d10bb2a2a2.1614047097.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).