From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, stolee@gmail.com
Subject: Re: [PATCH on sb/more-repo-in-api] revision: use commit graph in get_reference()
Date: Fri, 7 Dec 2018 03:53:34 -0500
Message-ID: <20181207085334.GA5167@sigill.intra.peff.net>
In-Reply-To: <20181206235446.147173-1-jonathantanmy@google.com>

On Thu, Dec 06, 2018 at 03:54:46PM -0800, Jonathan Tan wrote:

> This makes sense - I thought I shouldn't mention the commit graph in the
> code since it seems like a layering violation, but I felt the need to
> mention commit graph in a comment, so maybe the need to mention commit
> graph in the code is there too. Subsequently, maybe the lookup-for-type
> could be replaced by a lookup-in-commit-graph (maybe by using
> parse_commit_in_graph() directly), which should be at least slightly
> faster.

That makes more sense to me. If we don't have a commit graph at all,
it's a quick no-op. If we do, we might end up binary searching the list
of commits for an object that turns out not to be a commit. But that's
still strictly faster than finding the object's type (which involves a
binary search of a larger list, followed by actually accessing the type
info).
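
As a rough illustration of that cost argument (this is not code from
git.git), the lookup can be sketched as a binary search over a sorted
list of commit IDs that returns immediately when no graph is loaded. The
names `toy_graph`, `toy_graph_lookup`, `toy_commits`, and the 4-byte
`OID_LEN` are assumptions made up for this sketch; the real commit-graph
file has its own format and lookup code.

```c
#include <stddef.h>
#include <string.h>

#define OID_LEN 4 /* toy size for the sketch; real object IDs are longer */

/* Hypothetical stand-in for the commit graph's sorted list of commit IDs. */
struct toy_graph {
	const unsigned char (*oids)[OID_LEN];
	size_t nr;
};

/* Sample data, sorted by memcmp() order as the search below requires. */
static const unsigned char toy_commits[][OID_LEN] = {
	{ 0x01, 0x00, 0x00, 0x00 },
	{ 0x3a, 0x10, 0x00, 0x07 },
	{ 0x9f, 0xee, 0x12, 0x34 },
};

/*
 * Return the position of oid in the graph, or -1 if it is absent.
 * A NULL graph means "no commit graph at all": a quick no-op.
 */
static long toy_graph_lookup(const struct toy_graph *g,
			     const unsigned char *oid)
{
	size_t lo = 0, hi;

	if (!g)
		return -1;

	hi = g->nr;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = memcmp(oid, g->oids[mid], OID_LEN);

		if (!cmp)
			return (long)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1; /* a non-commit, or a commit the graph does not cover */
}
```

A caller would try `toy_graph_lookup()` first and fall back to the
regular object lookup only on -1, so a miss costs one search of the
(smaller) commit list and nothing else.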
> > In general, it would be nice if we had a more incremental API
> > for accessing objects: open, get metadata, then read the data. That
> > would make these kinds of optimizations "free".
>
> Would this be assuming that to read the data, you would (1) first need to
> read the metadata, and (2) there would be no redundancy in reading the
> two? It seems to me that for loose objects, you would want to perform
> all your reads at once, since any read requires opening the file, and
> for commit graphs, you just want to read what you want, since the
> metadata and the data are in separate places.

By metadata here, I don't mean the commit-graph data, but just the
object type and size. So I'm imagining an interface more like:

  - object_open() locates the object, and stores either the pack
    file/offset or a descriptor to a loose path in an opaque handle
    struct

  - object_size() and object_type() on that handle would do what you
    expect. For loose objects, these would parse the header (the
    equivalent of unpack_sha1_header()). For packed ones, they'd use the
    object header in the pack (and chase down the delta bits as needed).

  - object_contents() would return the full content

  - object_read() could sequentially read a subset of the file (this
    could replace the streaming interface we currently have)

We have most of the low-level bits for this already, if you poke into
what object_info_extended() is doing. We just don't have them packaged
in an interface which can persist across multiple calls.
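
To make the shape of such a handle concrete, here is a minimal sketch
over a toy in-memory store. Only the function names come from the list
above; the signatures, `struct object_handle`'s fields, `enum toy_type`,
and `toy_store` are assumptions invented for illustration, and nothing
here touches real packs or loose objects.

```c
#include <stdlib.h>
#include <string.h>

enum toy_type { TOY_COMMIT = 1, TOY_TREE, TOY_BLOB };

/* Toy in-memory object store standing in for packs and loose objects. */
struct toy_object {
	const char *name;
	enum toy_type type;
	const char *data;
};

static const struct toy_object toy_store[] = {
	{ "aaaa", TOY_BLOB, "hello, world\n" },
	{ "bbbb", TOY_COMMIT, "tree cccc\n" },
};

/* Opaque to callers: remembers where the object lives between calls. */
struct object_handle {
	const struct toy_object *obj;
	size_t pos; /* read cursor used by object_read() */
};

/* Locate the object once; later calls reuse the stored location. */
static int object_open(struct object_handle *oh, const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(toy_store) / sizeof(toy_store[0]); i++) {
		if (!strcmp(toy_store[i].name, name)) {
			oh->obj = &toy_store[i];
			oh->pos = 0;
			return 0;
		}
	}
	oh->obj = NULL;
	return -1; /* missing object */
}

/* Metadata queries: no content is copied. */
static enum toy_type object_type(const struct object_handle *oh)
{
	return oh->obj->type;
}

static size_t object_size(const struct object_handle *oh)
{
	return strlen(oh->obj->data);
}

/* Return a malloc'd copy of the full content; the caller frees it. */
static void *object_contents(const struct object_handle *oh)
{
	size_t len = object_size(oh);
	char *buf = malloc(len + 1);

	if (buf)
		memcpy(buf, oh->obj->data, len + 1);
	return buf;
}

/* Copy up to len bytes from the cursor; returns 0 at end of object. */
static size_t object_read(struct object_handle *oh, void *buf, size_t len)
{
	size_t left = object_size(oh) - oh->pos;

	if (len > left)
		len = left;
	memcpy(buf, oh->obj->data + oh->pos, len);
	oh->pos += len;
	return len;
}

static void object_close(struct object_handle *oh)
{
	oh->obj = NULL;
	oh->pos = 0;
}
```

The point of the handle is that a caller can open "aaaa" once, check
object_type() and object_size(), and then either loop over object_read()
in small chunks or call object_contents(), without repeating the lookup
for each step.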

With an interface like that, parse_object()'s large-blob check could be
something like the patch below.

But your case here is a bit more interesting. If we have a commit graph,
then we can avoid opening (or even finding!) the on-disk object at all.
So I actually think it makes sense to just check the commit-graph first,
as discussed above.

---
diff --git a/object.c b/object.c
index e54160550c..afce58c0bc 100644
--- a/object.c
+++ b/object.c
@@ -254,23 +254,31 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	const struct object_id *repl = lookup_replace_object(r, oid);
 	void *buffer;
 	struct object *obj;
+	struct object_handle oh;
 
 	obj = lookup_object(r, oid->hash);
 	if (obj && obj->parsed)
 		return obj;
 
-	if ((obj && obj->type == OBJ_BLOB && has_object_file(oid)) ||
-	    (!obj && has_object_file(oid) &&
-	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(repl, NULL, 0, NULL) < 0) {
+	if (object_open(&oh, oid) < 0)
+		return NULL; /* missing object */
+
+	if (object_type(&oh) == OBJ_BLOB) {
+		/* this will call object_read() on 4k chunks */
+		if (check_object_signature_stream(&oh, oid)) {
 			error(_("sha1 mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
+		object_close(&oh); /* we don't care about contents */
 		parse_blob_buffer(lookup_blob(r, oid), NULL, 0);
 		return lookup_object(r, oid->hash);
 	}
 
-	buffer = read_object_file(oid, &type, &size);
+	type = object_type(&oh);
+	size = object_size(&oh);
+	buffer = object_contents(&oh);
+	object_close(&oh);
+
 	if (buffer) {
 		if (check_object_signature(repl, buffer, size, type_name(type)) < 0) {
 			free(buffer);