git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>, sbeller@google.com
Cc: git@vger.kernel.org
Subject: Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
Date: Fri, 7 Dec 2018 08:49:21 -0500	[thread overview]
Message-ID: <aa0cd481-c135-47aa-2a69-e3dc71661caa@gmail.com> (raw)
In-Reply-To: <20181206233626.144072-1-jonathantanmy@google.com>

On 12/6/2018 6:36 PM, Jonathan Tan wrote:
>> AFAICT oid_object_info doesn't take advantage of the commit graph,
>> but just looks up the object header, which is still less than completely
>> parsing it. Then lookup_commit is overly strict, as it may return
>> NULL as when there still is a type mismatch (I don't think a mismatch
>> could happen here, as both rely on just the object store, and not the
>> commit graph.), so this would be just defensive programming for
>> the sake of it. I dunno.
>>
>>      struct commit *c;
>>
>>      if (oid_object_info(revs->repo, oid, NULL) == OBJ_COMMIT &&
>>          (c = lookup_commit(revs->repo, oid)) &&
>>          !repo_parse_commit(revs->repo, c))
>>              object = (struct object *) c;
>>      else
>>          object = parse_object(revs->repo, oid);
> I like this way better - I'll do it in the next version.

If we do _not_ have a commit-graph or if the commit-graph does not have
that commit, this will have the same performance problem, right?

Should we instead create a direct dependence on the commit-graph, and try
to parse the oid from the graph directly? If it succeeds, then we learn
that the object is a commit, in addition to all of the parsing work. This
means we could avoid oid_object_info() loading data if we succeed. We
would fall back to parse_object() if it fails.

I was thinking this should be a simple API call to parse_commit_in_graph(),
but that requires a struct commit filled with an oid, which is not the
best idea if we don't actually know it is a commit yet.

The approach I recommend would then be more detailed:

1. Modify find_commit_in_graph() to take a struct object_id instead of a
    struct commit. This helps find the integer position in the graph. That
    position can be used in fill_commit_in_graph() to load the commit
    contents. Keep find_commit_in_graph() static as it should not be a
    public function.

2. Create a public function with prototype

struct commit *try_parse_commit_from_graph(struct repository *r, struct 
object_id *oid)

    that returns a commit struct fully parsed if and only if the repository
    has that oid. It can call find_commit_in_graph(), then 
lookup_commit() and
    fill_commit_in_graph() to create the commit and parse the data.

3. In replace of the snippet above, do:

     struct commit *c;

     if ((c = try_parse_commit_from_graph(revs->repo, oid))
         object = (struct object *)c;
     else
         object = parse_object(revs->repo, oid);

A similar pattern _could_ be used in parse_object(), but I don't recommend
doing this pattern unless we have a reasonable suspicion that we are going
to parse commits more often than other objects. (It adds an O(log(# 
commits))
binary search to each object.)

A final thought: consider making this "try the commit graph first, but fall
back to parse_object()" a library function with a name like

     struct object *parse_probably_commit(struct repository *r, struct 
object_id *oid)

so other paths that are parsing a lot of commits (but also maybe tags) could
use the logic.

Thanks!
-Stolee

  reply	other threads:[~2018-12-07 13:50 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-04 22:42 [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference() Jonathan Tan
2018-12-04 23:12 ` Stefan Beller
2018-12-06 23:36   ` Jonathan Tan
2018-12-07 13:49     ` Derrick Stolee [this message]
2018-12-05  4:54 ` Jeff King
2018-12-06 23:54   ` Jonathan Tan
2018-12-07  8:53     ` Jeff King
2018-12-05 23:15 ` Junio C Hamano
2018-12-07 21:50 ` [PATCH on master v2] " Jonathan Tan
2018-12-09  0:51   ` Junio C Hamano
2018-12-09  1:49     ` Junio C Hamano
2018-12-11 10:54     ` Jeff King
2018-12-12 19:58       ` Jonathan Tan
2018-12-13  1:27         ` Jeff King
2018-12-13 16:20           ` Derrick Stolee
2018-12-13 18:54 ` [PATCH v3] " Jonathan Tan
2018-12-14  3:20   ` Junio C Hamano
2018-12-14  8:45   ` Jeff King
2019-01-25 15:33 ` Regression in: [PATCH on sb/more-repo-in-api] " SZEDER Gábor
2019-01-25 19:56   ` Stefan Beller
2019-01-25 22:01     ` Jonathan Tan
2019-01-25 22:14     ` SZEDER Gábor
2019-01-25 22:21       ` SZEDER Gábor
2019-01-27 13:08         ` [PATCH] object_as_type: initialize commit-graph-related fields of 'struct commit' SZEDER Gábor
2019-01-27 13:28           ` SZEDER Gábor
2019-01-27 18:40             ` Derrick Stolee
2019-01-28 16:15           ` Jeff King
2019-01-28 16:57           ` Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa0cd481-c135-47aa-2a69-e3dc71661caa@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).