From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>,
peff@peff.net, stolee@gmail.com, gitster@pobox.com
Subject: [PATCH v3] revision: use commit graph in get_reference()
Date: Thu, 13 Dec 2018 10:54:50 -0800 [thread overview]
Message-ID: <20181213185450.230953-1-jonathantanmy@google.com> (raw)
In-Reply-To: <20181204224238.50966-1-jonathantanmy@google.com>
When fetching into a repository, a connectivity check is first made by
check_exist_and_connected() in builtin/fetch.c that runs:
git rev-list --objects --stdin --not --all --quiet <(list of objects)
If the client repository has many refs, this command can be slow,
regardless of the nature of the server repository or what is being
fetched. A profiler reveals that most of the time is spent in
setup_revisions() (approx. 60/63), and of the time spent in
setup_revisions(), most of it is spent in parse_object() (approx.
49/60). This is because setup_revisions() parses the target of every ref
(from "--all"), and parse_object() reads the buffer of the object.
Reading the buffer is unnecessary if the repository has a commit graph
and if the ref points to a commit (which is typically the case). This
patch uses the commit graph wherever possible; on my computer, when I
run the above command with a list of 1 object on a many-ref repository,
I get a speedup from 1.8s to 1.0s.
Another way to accomplish this effect would be to modify parse_object()
to use the commit graph if possible; however, I did not want to change
parse_object()'s current behavior of always checking the object
signature of the returned object.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
This patch is still on master. Junio, let me know if you would rather
have me base it on sb/more-repo-in-api instead.
I mentioned [1] that I would unify parse_commit_in_graph_one() and
parse_commit_in_graph() so that one documentation comment could cover
all the functionality, but with the simpler API, I decided not to do
that to minimize the diff.
Change in v3: Now uses a simpler API with the algorithm suggested by
Peff in [2], except that I also retain the existing optimization that
checks if graph_pos is already set.
[1] https://public-inbox.org/git/20181212195812.232726-1-jonathantanmy@google.com/
[2] https://public-inbox.org/git/20181213012707.GC26210@sigill.intra.peff.net/
---
commit-graph.c | 44 ++++++++++++++++++++++++++------------
commit-graph.h | 11 +++++-----
commit.c | 4 +++-
revision.c | 5 ++++-
t/helper/test-repository.c | 8 ++-----
5 files changed, 45 insertions(+), 27 deletions(-)
diff --git a/commit-graph.c b/commit-graph.c
index 40c855f185..0aca7ec0fe 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -286,7 +286,8 @@ void close_commit_graph(struct repository *r)
r->objects->commit_graph = NULL;
}
-static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos)
+static int bsearch_graph(struct commit_graph *g, const struct object_id *oid,
+ uint32_t *pos)
{
return bsearch_hash(oid->hash, g->chunk_oid_fanout,
g->chunk_oid_lookup, g->hash_len, pos);
@@ -374,24 +375,42 @@ static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uin
}
}
-static int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item)
+static struct commit *parse_commit_in_graph_one(struct repository *r,
+ struct commit_graph *g,
+ const struct object_id *oid)
{
+ struct object *obj;
+ struct commit *commit;
uint32_t pos;
- if (item->object.parsed)
- return 1;
+ obj = lookup_object(r, oid->hash);
+ commit = obj && obj->type == OBJ_COMMIT ? (struct commit *) obj : NULL;
+ if (commit && obj->parsed)
+ return commit;
- if (find_commit_in_graph(item, g, &pos))
- return fill_commit_in_graph(item, g, pos);
+ if (commit && commit->graph_pos != COMMIT_NOT_FROM_GRAPH)
+ pos = commit->graph_pos;
+ else if (bsearch_graph(g, oid, &pos))
+ ; /* bsearch_graph sets pos */
+ else
+ return NULL;
- return 0;
+ if (!commit) {
+ commit = lookup_commit(r, oid);
+ if (!commit)
+ return NULL;
+ }
+
+ fill_commit_in_graph(commit, g, pos);
+ return commit;
}
-int parse_commit_in_graph(struct repository *r, struct commit *item)
+struct commit *parse_commit_in_graph(struct repository *r,
+ const struct object_id *oid)
{
if (!prepare_commit_graph(r))
- return 0;
- return parse_commit_in_graph_one(r->objects->commit_graph, item);
+ return NULL;
+ return parse_commit_in_graph_one(r, r->objects->commit_graph, oid);
}
void load_commit_graph_info(struct repository *r, struct commit *item)
@@ -1004,8 +1023,6 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
}
for (i = 0; i < g->num_commits; i++) {
- struct commit *graph_commit;
-
hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
@@ -1024,8 +1041,7 @@ int verify_commit_graph(struct repository *r, struct commit_graph *g)
cur_fanout_pos++;
}
- graph_commit = lookup_commit(r, &cur_oid);
- if (!parse_commit_in_graph_one(g, graph_commit))
+ if (!parse_commit_in_graph_one(r, g, &cur_oid))
graph_report("failed to parse %s from commit-graph",
oid_to_hex(&cur_oid));
}
diff --git a/commit-graph.h b/commit-graph.h
index 9db40b4d3a..b67aac1125 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -13,16 +13,17 @@ struct commit;
char *get_commit_graph_filename(const char *obj_dir);
/*
- * Given a commit struct, try to fill the commit struct info, including:
+ * If the given commit is in the commit graph, returns a commit struct
+ * including the following info:
* 1. tree object
* 2. date
* 3. parents.
*
- * Returns 1 if and only if the commit was found in the packed graph.
- *
- * See parse_commit_buffer() for the fallback after this call.
+ * If not, returns NULL. See parse_commit_buffer() for the fallback after this
+ * call.
*/
-int parse_commit_in_graph(struct repository *r, struct commit *item);
+struct commit *parse_commit_in_graph(struct repository *r,
+ const struct object_id *oid);
/*
* It is possible that we loaded commit contents from the commit buffer,
diff --git a/commit.c b/commit.c
index d13a7bc374..19ce5e34a2 100644
--- a/commit.c
+++ b/commit.c
@@ -456,7 +456,9 @@ int parse_commit_internal(struct commit *item, int quiet_on_missing, int use_com
return -1;
if (item->object.parsed)
return 0;
- if (use_commit_graph && parse_commit_in_graph(the_repository, item))
+ if (use_commit_graph &&
+ parse_commit_in_graph(the_repository, &item->object.oid) &&
+ item->object.parsed)
return 0;
buffer = read_object_file(&item->object.oid, &type, &size);
if (!buffer)
diff --git a/revision.c b/revision.c
index 13e0519c02..7f54f3b4c7 100644
--- a/revision.c
+++ b/revision.c
@@ -213,7 +213,10 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
{
struct object *object;
- object = parse_object(revs->repo, oid);
+ object = (struct object *) parse_commit_in_graph(revs->repo, oid);
+ if (!object)
+ object = parse_object(revs->repo, oid);
+
if (!object) {
if (revs->ignore_missing)
return object;
diff --git a/t/helper/test-repository.c b/t/helper/test-repository.c
index 6a84a53efb..35bfd1233d 100644
--- a/t/helper/test-repository.c
+++ b/t/helper/test-repository.c
@@ -20,9 +20,7 @@ static void test_parse_commit_in_graph(const char *gitdir, const char *worktree,
if (repo_init(&r, gitdir, worktree))
die("Couldn't init repo");
- c = lookup_commit(&r, commit_oid);
-
- if (!parse_commit_in_graph(&r, c))
+ if (!(c = parse_commit_in_graph(&r, commit_oid)))
die("Couldn't parse commit");
printf("%"PRItime, c->date);
@@ -46,13 +44,11 @@ static void test_get_commit_tree_in_graph(const char *gitdir,
if (repo_init(&r, gitdir, worktree))
die("Couldn't init repo");
- c = lookup_commit(&r, commit_oid);
-
/*
* get_commit_tree_in_graph does not automatically parse the commit, so
* parse it first.
*/
- if (!parse_commit_in_graph(&r, c))
+ if (!(c = parse_commit_in_graph(&r, commit_oid)))
die("Couldn't parse commit");
tree = get_commit_tree_in_graph(&r, c);
if (!tree)
--
2.19.0.271.gfe8321ec05.dirty
next prev parent reply other threads:[~2018-12-13 18:54 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-04 22:42 [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference() Jonathan Tan
2018-12-04 23:12 ` Stefan Beller
2018-12-06 23:36 ` Jonathan Tan
2018-12-07 13:49 ` Derrick Stolee
2018-12-05 4:54 ` Jeff King
2018-12-06 23:54 ` Jonathan Tan
2018-12-07 8:53 ` Jeff King
2018-12-05 23:15 ` Junio C Hamano
2018-12-07 21:50 ` [PATCH on master v2] " Jonathan Tan
2018-12-09 0:51 ` Junio C Hamano
2018-12-09 1:49 ` Junio C Hamano
2018-12-11 10:54 ` Jeff King
2018-12-12 19:58 ` Jonathan Tan
2018-12-13 1:27 ` Jeff King
2018-12-13 16:20 ` Derrick Stolee
2018-12-13 18:54 ` Jonathan Tan [this message]
2018-12-14 3:20 ` [PATCH v3] " Junio C Hamano
2018-12-14 8:45 ` Jeff King
2019-01-25 15:33 ` Regression in: [PATCH on sb/more-repo-in-api] " SZEDER Gábor
2019-01-25 19:56 ` Stefan Beller
2019-01-25 22:01 ` Jonathan Tan
2019-01-25 22:14 ` SZEDER Gábor
2019-01-25 22:21 ` SZEDER Gábor
2019-01-27 13:08 ` [PATCH] object_as_type: initialize commit-graph-related fields of 'struct commit' SZEDER Gábor
2019-01-27 13:28 ` SZEDER Gábor
2019-01-27 18:40 ` Derrick Stolee
2019-01-28 16:15 ` Jeff King
2019-01-28 16:57 ` Jonathan Tan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181213185450.230953-1-jonathantanmy@google.com \
--to=jonathantanmy@google.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).