git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
Date: Tue, 4 Dec 2018 23:54:16 -0500	[thread overview]
Message-ID: <20181205045416.GB12284@sigill.intra.peff.net> (raw)
In-Reply-To: <20181204224238.50966-1-jonathantanmy@google.com>

On Tue, Dec 04, 2018 at 02:42:38PM -0800, Jonathan Tan wrote:

> diff --git a/revision.c b/revision.c
> index b5108b75ab..e7da2c57ab 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -212,7 +212,20 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
>  {
>  	struct object *object;
>  
> -	object = parse_object(revs->repo, oid);
> +	/*
> +	 * If the repository has commit graphs, repo_parse_commit() avoids
> +	 * reading the object buffer, so use it whenever possible.
> +	 */
> +	if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT) {
> +		struct commit *c = lookup_commit(revs->repo, oid);
> +		if (!repo_parse_commit(revs->repo, c))
> +			object = (struct object *) c;
> +		else
> +			object = NULL;
> +	} else {
> +		object = parse_object(revs->repo, oid);
> +	}

This seems like a reasonable thing to do, but I have sort of a
meta-comment. In several places we've started doing this kind of "if
it's this type of object, do X, otherwise do Y" optimization (e.g.,
handling large blobs for streaming).

And in the many cases we end up doubling the effort to do object
lookups: here we do one lookup to get the type, and then if it's not a
commit (or if we don't have a commit graph) we end up parsing it anyway.

I wonder if we could do better. In this instance, it might make sense
to first see if we actually have a commit graph available (it might not
have this object, of course, but at least we'd expect it to have most
commits). In general, it would be nice if we had a more incremental API
for accessing objects: open, get metadata, then read the data. That
would make these kinds of optimizations "free".

I don't have numbers for how much the extra lookups cost. The lookups
are probably dwarfed by parse_object() in general, so even if we save
only a few full object loads, it may be a win. It just seems a shame
that we may be making the "slow" paths (when our type-specific check
doesn't match) even slower.

-Peff

  parent reply	other threads:[~2018-12-05  4:54 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-04 22:42 [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference() Jonathan Tan
2018-12-04 23:12 ` Stefan Beller
2018-12-06 23:36   ` Jonathan Tan
2018-12-07 13:49     ` Derrick Stolee
2018-12-05  4:54 ` Jeff King [this message]
2018-12-06 23:54   ` Jonathan Tan
2018-12-07  8:53     ` Jeff King
2018-12-05 23:15 ` Junio C Hamano
2018-12-07 21:50 ` [PATCH on master v2] " Jonathan Tan
2018-12-09  0:51   ` Junio C Hamano
2018-12-09  1:49     ` Junio C Hamano
2018-12-11 10:54     ` Jeff King
2018-12-12 19:58       ` Jonathan Tan
2018-12-13  1:27         ` Jeff King
2018-12-13 16:20           ` Derrick Stolee
2018-12-13 18:54 ` [PATCH v3] " Jonathan Tan
2018-12-14  3:20   ` Junio C Hamano
2018-12-14  8:45   ` Jeff King
2019-01-25 15:33 ` Regression in: [PATCH on sb/more-repo-in-api] " SZEDER Gábor
2019-01-25 19:56   ` Stefan Beller
2019-01-25 22:01     ` Jonathan Tan
2019-01-25 22:14     ` SZEDER Gábor
2019-01-25 22:21       ` SZEDER Gábor
2019-01-27 13:08         ` [PATCH] object_as_type: initialize commit-graph-related fields of 'struct commit' SZEDER Gábor
2019-01-27 13:28           ` SZEDER Gábor
2019-01-27 18:40             ` Derrick Stolee
2019-01-28 16:15           ` Jeff King
2019-01-28 16:57           ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181205045416.GB12284@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).