git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal
Date: Fri, 5 May 2023 14:43:41 -0400	[thread overview]
Message-ID: <ZFVOXc6qvVWsI/2C@nand.local> (raw)
In-Reply-To: <e6e2401a-519a-4859-d20b-6b947e94e1ec@github.com>

On Fri, May 05, 2023 at 02:13:45PM -0400, Derrick Stolee wrote:
> On 5/5/2023 1:27 PM, Taylor Blau wrote:
>
> > +static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
> > +					    struct rev_info *revs,
> > +					    struct object_list *roots)
> > +{
> > +	struct bitmap_boundary_cb cb = { .bitmap_git = bitmap_git };
> > +	unsigned int i;
> > +	unsigned int tmp_blobs, tmp_trees, tmp_tags;
> > +	int any_missing = 0;
> > +
> > +	revs->ignore_missing_links = 1;
> > +
> > +	/*
> > +	 * OR in any existing reachability bitmaps among `roots` into `base`.
> > +	 */
> > +	while (roots) {
> > +		struct object *object = roots->item;
> > +		roots = roots->next;
> > +
> > +		if (object->type == OBJ_COMMIT &&
> > +		    !obj_in_bitmap(bitmap_git, object, cb.base) &&
> > +		    add_commit_to_bitmap(bitmap_git, &cb.base,
> > +					 (struct commit *)object)) {
> > +			object->flags |= SEEN;
> > +			continue;
> > +		}
> > +
> > +		any_missing = 1;
> > +	}
> > +
> > +	if (!any_missing)
> > +		goto cleanup;
>
> Here, we check if the list of roots are completely covered by existing
> bitmaps. This prevents doing the commit-only walk as well as the boundary
> checks.
>
> There's another confusing bit here: if we have a bitmap for the commit,
> then we mark it as SEEN. Does that mean that the later commit walk will
> skip walking its history? Would we then get a boundary that is actually
> further in history than necessary? ("A --not B C" would walk all of
> B..A if C has a bitmap, even if a lot of that region is reachable from C.)
>
> My initial thought here was that this is an unlikely case, so the
> optimization isn't worth the code complexity. But now, I'm a little
> concerned that it actually hurts the later walk in the case of multiple
> roots.

The point about commits with existing bitmaps as SEEN hurting the
boundary traversal is a good one that I hadn't considered. I agree with
what you're saying here, though I think that the optimization still
makes sense (without touching the SEEN bit).

If all of the haves have existing bitmaps, we don't want to bother
starting the boundary traversal at all, since we can just OR those
together and be done. So I think we'd just want to drop the SEEN bits
there, but let me know if I am missing something.

> > +	revs->boundary = 0;
> > +	revs->blob_objects = tmp_blobs;
> > +	revs->tree_objects = tmp_trees;
> > +	revs->tag_objects = tmp_tags;
>
> These would seem more natural if they were after the trace2_region_leave().

Good suggestion, thanks.

> > +			} else {
> > +				add_pending_object(revs, obj, "");
> > +				needs_walk = 1;
> > +			}
> > +		}
> > +
> > +		if (needs_walk)
> > +			cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);
>
> I wonder if fill_in_bitmap() already does "the right thing" when there
> are no pending objects or when all pending objects are already in the
> bitmap. Do we need to do these checks, or should we just put all boundary
> commits in the pending set?

Great suggestion. It does do the right thing via its `should_include`
checks, which see if the commit we're trying to process is already in
the bitmap, and if so, propagates SEEN to it and its parents.

Here's a diff that I generated after reading through this message, let
me know if you think I'm missing anything:

--- >8 ---
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 634fc10156..f57b74bcc0 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1141,7 +1141,6 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 		    !obj_in_bitmap(bitmap_git, object, cb.base) &&
 		    add_commit_to_bitmap(bitmap_git, &cb.base,
 					 (struct commit *)object)) {
-			object->flags |= SEEN;
 			continue;
 		}

@@ -1175,13 +1174,14 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 				      show_boundary_object,
 				      &cb, NULL);
 	revs->boundary = 0;
+	trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository);
+
 	revs->blob_objects = tmp_blobs;
 	revs->tree_objects = tmp_trees;
 	revs->tag_objects = tmp_tags;

 	reset_revision_walk();
 	clear_object_flags(UNINTERESTING);
-	trace2_region_leave("pack-bitmap", "boundary-traverse", the_repository);

 	/*
 	 * Then add the boundary commit(s) as fill-in traversal tips.
@@ -1189,26 +1189,21 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	trace2_region_enter("pack-bitmap", "boundary-fill-in", the_repository);
 	if (cb.boundary.nr) {
 		struct object *obj;
-		int needs_walk = 0;
-
 		for (i = 0; i < cb.boundary.nr; i++) {
 			obj = cb.boundary.objects[i].item;

-			if (obj_in_bitmap(bitmap_git, obj, cb.base)) {
+			if (obj_in_bitmap(bitmap_git, obj, cb.base))
 				obj->flags |= SEEN;
-			} else {
+			else
 				add_pending_object(revs, obj, "");
-				needs_walk = 1;
-			}
 		}

-		if (needs_walk)
-			cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);
+		cb.base = fill_in_bitmap(bitmap_git, revs, cb.base, NULL);
 	}
 	trace2_region_leave("pack-bitmap", "boundary-fill-in", the_repository);

-
 cleanup:
+	object_array_clear(&cb.boundary);
 	revs->ignore_missing_links = 0;

 	return cb.base;
--- 8< ---

Thanks,
Taylor

  reply	other threads:[~2023-05-05 18:44 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-25  0:00 [PATCH 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-04-25  0:00 ` [PATCH 1/3] revision: support tracking uninteresting commits Taylor Blau
2023-04-25 18:15   ` Derrick Stolee
2023-05-03 21:48     ` Taylor Blau
2023-05-04 13:46       ` Derrick Stolee
2023-05-03 22:08     ` Taylor Blau
2023-05-04 13:59       ` Derrick Stolee
2023-05-05 17:30         ` Taylor Blau
2023-04-25 18:22   ` Junio C Hamano
2023-04-25 18:48     ` Taylor Blau
2023-04-25  0:00 ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-04-25 18:32   ` Junio C Hamano
2023-04-25 18:51     ` [PATCH 2/3] pack-bitmap.c: extract `fill_in_bitmap()`t Taylor Blau
2023-04-25  0:00 ` [PATCH 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-04-25 18:52   ` Junio C Hamano
2023-05-02 21:31     ` Felipe Contreras
2023-05-03 21:42     ` Taylor Blau
2023-05-03 21:58       ` Junio C Hamano
2023-04-25 18:53   ` Derrick Stolee
2023-04-25 18:02 ` [PATCH 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-04-25 18:30   ` Derrick Stolee
2023-04-25 18:57     ` Taylor Blau
2023-04-25 19:52       ` Derrick Stolee
2023-05-03 21:43         ` Taylor Blau
2023-04-25 18:06 ` Derrick Stolee
2023-04-25 19:01   ` Taylor Blau
2023-04-25 20:27     ` Derrick Stolee
2023-05-01 22:08 ` Junio C Hamano
2023-05-02 23:52   ` Taylor Blau
2023-05-05 17:27 ` [PATCH v2 0/2] " Taylor Blau
2023-05-05 17:27   ` [PATCH v2 1/2] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-05 17:27   ` [PATCH v2 2/2] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-05 18:13     ` Derrick Stolee
2023-05-05 18:43       ` Taylor Blau [this message]
2023-05-05 17:33   ` [PATCH v2 0/2] pack-bitmap: boundary-based " Junio C Hamano
2023-05-05 17:59   ` Derrick Stolee
2023-05-05 18:46     ` [PATCH v2 0/2] pack-bitmap: boundary-based bitmap traversalt Taylor Blau
2023-05-05 20:45       ` Derrick Stolee
2023-05-08 17:38 ` [PATCH v3 0/3] pack-bitmap: boundary-based bitmap traversal Taylor Blau
2023-05-08 17:38   ` [PATCH v3 1/3] object: add object_array initializer helper function Taylor Blau
2023-05-08 17:38   ` [PATCH v3 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-05-08 17:38   ` [PATCH v3 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau
2023-05-08 20:53     ` Derrick Stolee
2023-05-08 22:12       ` Taylor Blau
2023-05-10 22:55   ` [PATCH v3 0/3] pack-bitmap: boundary-based " Junio C Hamano
2023-05-10 23:10     ` Taylor Blau
2023-05-11 15:23       ` Derrick Stolee
2023-06-08 16:25 ` [PATCH v4 " Taylor Blau
2023-06-08 16:25   ` [PATCH v4 1/3] object: add object_array initializer helper function Taylor Blau
2023-06-08 16:25   ` [PATCH v4 2/3] pack-bitmap.c: extract `fill_in_bitmap()` Taylor Blau
2023-06-08 16:25   ` [PATCH v4 3/3] pack-bitmap.c: use commit boundary during bitmap traversal Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZFVOXc6qvVWsI/2C@nand.local \
    --to=me@ttaylorr.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).