git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Josh Steadmon <steadmon@google.com>
Cc: git@vger.kernel.org, jonathantanmy@google.com, jrnieder@gmail.com
Subject: Re: [PATCH v2] rev-list: exclude promisor objects at walk time
Date: Thu, 4 Apr 2019 19:08:00 -0400
Message-ID: <20190404230759.GA26623@sigill.intra.peff.net>
In-Reply-To: <9f327d6d8dc5e71eb0039aef3ac76ea16c2adab3.1554417917.git.steadmon@google.com>

On Thu, Apr 04, 2019 at 03:53:56PM -0700, Josh Steadmon wrote:

> For large repositories, enumerating the list of all promisor objects (in
> order to exclude them from a rev-list walk) can take a significant
> amount of time.
> 
> When --exclude-promisor-objects is passed to rev-list, don't enumerate
> the promisor objects. Instead, filter them (and any children objects)
> during the actual graph walk.

Yeah, this is definitely the approach I was thinking of.

Did you (or anybody else) have any thoughts on the case where a given
object is referred to both by a promisor and a non-promisor (and we
don't have it)? That's the "shortcut" I think we're taking here: we
would no longer realize that it's available via the promisor when we
traverse to it from the non-promisor. I'm just not clear on whether that
can ever happen.

> Helped-By: Jonathan Tan <jonathantanmy@google.com>
> Helped-By: Jeff King <peff@peff.net>
> Helped-By: Jonathan Nieder <jrnieder@gmail.com>
> 
> Signed-off-by: Josh Steadmon <steadmon@google.com>

Minor nit, but these trailers should all be part of the same block, with
no blank line between the Helped-by lines and the Signed-off-by.

> diff --git a/list-objects.c b/list-objects.c
> index dc77361e11..d1eaa0999e 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -30,6 +30,7 @@ static void process_blob(struct traversal_context *ctx,
>  	struct object *obj = &blob->object;
>  	size_t pathlen;
>  	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
> +	struct object_info oi = OBJECT_INFO_INIT;
>  
>  	if (!ctx->revs->blob_objects)
>  		return;
> @@ -37,6 +38,11 @@ static void process_blob(struct traversal_context *ctx,
>  		die("bad blob object");
>  	if (obj->flags & (UNINTERESTING | SEEN))
>  		return;
> +	if (ctx->revs->exclude_promisor_objects &&
> +	    !oid_object_info_extended(the_repository, &obj->oid, &oi, 0) &&
> +	    oi.whence == OI_PACKED &&
> +	    oi.u.packed.pack->pack_promisor)
> +		return;

This conditional gets repeated a lot in your patch. Perhaps it's worth a
helper so we can say:

  if (skip_promisor_object(ctx->revs, &obj->oid))
	return;

in each place?
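
As a rough illustration of that helper, here is a self-contained sketch.
The struct definitions are simplified stand-ins rather than git's real
ones (the real object_info reaches the pack via oi.u.packed.pack), and
mock_object_info() with its mock_store table is a hypothetical
replacement for oid_object_info_extended() so that the snippet compiles
on its own. Only the name skip_promisor_object() comes from the
suggestion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for git's internal types (assumptions, not the real layout). */
struct object_id { unsigned char hash[20]; };
struct packed_git { bool pack_promisor; };
enum object_whence { OI_CACHED, OI_LOOSE, OI_PACKED };
struct object_info { enum object_whence whence; struct packed_git *pack; };
struct rev_info { bool exclude_promisor_objects; };

/* Hypothetical in-memory object table, used only to make the sketch testable. */
struct mock_entry {
	struct object_id oid;
	enum object_whence whence;
	struct packed_git *pack;
};
static struct mock_entry mock_store[8];
static size_t mock_store_nr;

/* Stand-in for oid_object_info_extended(): returns 0 if the object is found. */
static int mock_object_info(const struct object_id *oid, struct object_info *oi)
{
	for (size_t i = 0; i < mock_store_nr; i++) {
		if (!memcmp(mock_store[i].oid.hash, oid->hash, sizeof(oid->hash))) {
			oi->whence = mock_store[i].whence;
			oi->pack = mock_store[i].pack;
			return 0;
		}
	}
	return -1;
}

/*
 * Return true if the walk should skip this object: the caller asked to
 * exclude promisor objects, the object exists locally, and it lives in
 * a pack marked as a promisor pack.
 */
static bool skip_promisor_object(const struct rev_info *revs,
				 const struct object_id *oid)
{
	struct object_info oi = { OI_CACHED, NULL };

	assert(revs && oid);
	if (!revs->exclude_promisor_objects)
		return false; /* no lookup unless the caller asked for it */
	if (mock_object_info(oid, &oi))
		return false; /* object not found locally */
	return oi.whence == OI_PACKED && oi.pack && oi.pack->pack_promisor;
}
```

With something like this, each process_*() function would need only the
two-line check shown above instead of repeating the four-part
conditional.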

One other possible small optimization: we don't look up the object
unless the caller asked to exclude promisors, which is good. But we
could also keep a single flag for "is there a promisor pack at all?".
When there isn't, we know there's no point in looking for the object.

It might not matter much in practice. The main caller here is going to
be check_connected(), and it only passes --exclude-promisor-objects if
it's in a partial clone.
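
The "is there a promisor pack at all?" flag could look roughly like the
sketch below. It assumes a simplified packed_git linked list and a
hypothetical promisor_pack_cache struct; neither matches git's actual
data structures, and a real version would have to decide what happens
when packs are added after the flag is computed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for git's pack list (assumption, not the real struct). */
struct packed_git {
	struct packed_git *next;
	bool pack_promisor;
};

/* Hypothetical cache: remembers whether any promisor pack was seen. */
struct promisor_pack_cache {
	bool computed;
	bool any;
};

/*
 * Scan the pack list once and remember the answer. Later calls return
 * the cached value without walking the list again.
 */
static bool has_promisor_pack(struct promisor_pack_cache *cache,
			      const struct packed_git *packs)
{
	if (!cache->computed) {
		cache->any = false;
		for (const struct packed_git *p = packs; p; p = p->next) {
			if (p->pack_promisor) {
				cache->any = true;
				break;
			}
		}
		cache->computed = true;
	}
	return cache->any;
}
```

A per-object check could then bail out early when has_promisor_pack()
returns false, before doing any object lookup.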

> [...]

I didn't see any tweaks to the callers, which makes sense; we're already
passing --exclude-promisor-objects as necessary. Which means by itself,
this patch should be making things faster, right? Do you have timings to
show that off?

-Peff

Thread overview: 27+ messages
2019-04-03 17:27 [PATCH] clone: do faster object check for partial clones Josh Steadmon
2019-04-03 18:58 ` Jonathan Tan
2019-04-03 19:41 ` Jeff King
2019-04-03 20:57   ` Jonathan Tan
2019-04-04  0:21     ` Josh Steadmon
2019-04-04  1:33     ` Jeff King
2019-04-04 22:53 ` [PATCH v2] rev-list: exclude promisor objects at walk time Josh Steadmon
2019-04-04 23:08   ` Jeff King [this message]
2019-04-04 23:47     ` Josh Steadmon
2019-04-05  0:00       ` Jeff King
2019-04-05  0:09         ` Josh Steadmon
2019-04-08 20:59           ` Josh Steadmon
2019-04-08 21:06 ` [PATCH v3] " Josh Steadmon
2019-04-08 22:23   ` Christian Couder
2019-04-08 23:12     ` Josh Steadmon
2019-04-09 15:14   ` Junio C Hamano
2019-04-09 15:15     ` Jeff King
2019-04-09 15:43       ` Junio C Hamano
2019-04-09 16:35         ` Josh Steadmon
2019-04-09 18:04   ` SZEDER Gábor
2019-04-09 23:42     ` Josh Steadmon
2019-04-11  4:06       ` Jeff King
2019-04-12 22:38         ` Josh Steadmon
2019-04-13  5:34           ` Jeff King
2019-04-19 20:26             ` Josh Steadmon
2019-04-19 21:00 ` [PATCH v4] clone: do faster object check for partial clones Josh Steadmon
2019-04-22 21:31   ` Jeff King
