git@vger.kernel.org mailing list mirror (one of many)
From: Josh Steadmon <steadmon@google.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, jonathantanmy@google.com, jrnieder@gmail.com
Subject: Re: [PATCH v2] rev-list: exclude promisor objects at walk time
Date: Thu, 4 Apr 2019 16:47:26 -0700
Message-ID: <20190404234726.GG60888@google.com>
In-Reply-To: <20190404230759.GA26623@sigill.intra.peff.net>

On 2019.04.04 19:08, Jeff King wrote:
> On Thu, Apr 04, 2019 at 03:53:56PM -0700, Josh Steadmon wrote:
> 
> > For large repositories, enumerating the list of all promisor objects (in
> > order to exclude them from a rev-list walk) can take a significant
> > amount of time.
> > 
> > When --exclude-promisor-objects is passed to rev-list, don't enumerate
> > the promisor objects. Instead, filter them (and any children objects)
> > during the actual graph walk.
> 
> Yeah, this is definitely the approach I was thinking of.
> 
> Did you (or anybody else) have any thoughts on the case where a given
> object is referred to both by a promisor and a non-promisor (and we
> don't have it)? That's the "shortcut" I think we're taking here: we
> would no longer realize that it's available via the promisor when we
> traverse to it from the non-promisor. I'm just not clear on whether that
> can ever happen.

I am not sure either. In process_blob() and process_tree() there are
additional checks for whether missing blobs/trees are promisor objects
using is_promisor_object(), but calling that here would undo the
performance gains from this change.


> > Helped-By: Jonathan Tan <jonathantanmy@google.com>
> > Helped-By: Jeff King <peff@peff.net>
> > Helped-By: Jonathan Nieder <jrnieder@gmail.com>
> > 
> > Signed-off-by: Josh Steadmon <steadmon@google.com>
> 
> Minor nit, but these should all be part of the same block.

Will fix in v3.


> > diff --git a/list-objects.c b/list-objects.c
> > index dc77361e11..d1eaa0999e 100644
> > --- a/list-objects.c
> > +++ b/list-objects.c
> > @@ -30,6 +30,7 @@ static void process_blob(struct traversal_context *ctx,
> >  	struct object *obj = &blob->object;
> >  	size_t pathlen;
> >  	enum list_objects_filter_result r = LOFR_MARK_SEEN | LOFR_DO_SHOW;
> > +	struct object_info oi = OBJECT_INFO_INIT;
> >  
> >  	if (!ctx->revs->blob_objects)
> >  		return;
> > @@ -37,6 +38,11 @@ static void process_blob(struct traversal_context *ctx,
> >  		die("bad blob object");
> >  	if (obj->flags & (UNINTERESTING | SEEN))
> >  		return;
> > +	if (ctx->revs->exclude_promisor_objects &&
> > +	    !oid_object_info_extended(the_repository, &obj->oid, &oi, 0) &&
> > +	    oi.whence == OI_PACKED &&
> > +	    oi.u.packed.pack->pack_promisor)
> > +		return;
> 
> This conditional gets repeated a lot in your patch. Perhaps it's worth a
> helper so we can say:
> 
>   if (skip_promisor_object(ctx->revs, &obj->oid))
> 	return;
> 
> in each place?

Will fix in v3.
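
For illustration only, here is a minimal sketch of what such a helper could
look like. It is not the v3 code. The types are simplified stand-ins for git's
real structs, object IDs are plain ints, and the lookup callback and
demo_lookup() are hypothetical substitutes for oid_object_info_extended() so
that the snippet compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types (assumptions): far smaller than git's real structs. */
struct packed_git { bool pack_promisor; };

enum object_whence { OI_LOOSE, OI_PACKED };

struct object_info {
	enum object_whence whence;
	struct packed_git *pack;	/* only meaningful when whence == OI_PACKED */
};

struct rev_info { bool exclude_promisor_objects; };

/*
 * Hypothetical stand-in for oid_object_info_extended(): returns 0 and
 * fills *oi when the object exists locally, nonzero when it is missing.
 */
typedef int (*lookup_fn)(int oid, struct object_info *oi);

/* Return true if the walk should skip this object. */
bool skip_promisor_object(const struct rev_info *revs, int oid,
			  lookup_fn lookup)
{
	struct object_info oi = { OI_LOOSE, NULL };

	assert(revs && lookup);
	if (!revs->exclude_promisor_objects)
		return false;	/* caller did not ask for filtering */
	if (lookup(oid, &oi))
		return false;	/* object is missing locally: nothing to inspect */
	return oi.whence == OI_PACKED && oi.pack && oi.pack->pack_promisor;
}

/* Demo object store: oid 1 in a promisor pack, 2 in a normal pack, 3 loose. */
struct packed_git promisor_pack = { true };
struct packed_git normal_pack = { false };

int demo_lookup(int oid, struct object_info *oi)
{
	switch (oid) {
	case 1: oi->whence = OI_PACKED; oi->pack = &promisor_pack; return 0;
	case 2: oi->whence = OI_PACKED; oi->pack = &normal_pack; return 0;
	case 3: oi->whence = OI_LOOSE; oi->pack = NULL; return 0;
	default: return -1;	/* anything else is missing */
	}
}
```

With this shape, each call site in the walk reduces to a single
"if (skip_promisor_object(...)) return;" line. Note that a missing object is
never skipped by this check, which is the corner case raised above.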


> One other possible small optimization: we don't look up the object
> unless the caller asked to exclude promisors, which is good. But we
> could also keep a single flag for "is there a promisor pack at all?".
> When there isn't, we know there's no point in looking for the object.
> 
> It might not matter much in practice. The main caller here is going to
> be check_connected(), and it only passes --exclude-promisor-objects if
> it's in a partial clone.

I'm not necessarily opposed, but I'm leaning towards the "won't matter
much" side.

Where would such a flag live, in this case, and who would be responsible
for initializing it? I guess it would only matter for rev-list, so we
could initialize it in cmd_rev_list() if --exclude-promisor-objects is
passed?
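
To make the question concrete, one possible shape is a lazily computed flag
that is filled in the first time anyone asks. This is a sketch under
assumptions, not a proposal for where it should live in git: struct pack_store,
promisor_state, and has_promisor_pack() are hypothetical names, and the pack
list is a simplified stand-in for the real one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a pack list entry (assumption). */
struct packed_git {
	struct packed_git *next;
	bool pack_promisor;
};

/* Hypothetical container for the pack list plus the cached answer. */
struct pack_store {
	struct packed_git *packs;
	int promisor_state;	/* -1 = not computed, 0 = none, 1 = at least one */
};

/*
 * Scan the pack list once, cache the result, and answer later calls
 * from the cache. When this returns false, per-object lookups during
 * the walk can be skipped entirely.
 */
bool has_promisor_pack(struct pack_store *store)
{
	if (store->promisor_state < 0) {
		struct packed_git *p;

		store->promisor_state = 0;
		for (p = store->packs; p; p = p->next) {
			if (p->pack_promisor) {
				store->promisor_state = 1;
				break;
			}
		}
	}
	return store->promisor_state == 1;
}
```

Computing it lazily would sidestep the "who initializes it" question, at the
cost of the cache going stale if packs are added after the first call.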

> > [...]
> 
> I didn't see any tweaks to the callers, which makes sense; we're already
> passing --exclude-promisor-objects as necessary. Which means by itself,
> this patch should be making things faster, right? Do you have timings to
> show that off?

Yeah, for a partial clone of a large-ish Android repo [1], we see the
connectivity check go from >180s to ~7s.

[1]: https://android.googlesource.com/platform/frameworks/base/
