git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	git@vger.kernel.org, peartben@gmail.com, benpeart@microsoft.com
Subject: Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info
Date: Tue, 28 Feb 2017 16:59:37 -0500	[thread overview]
Message-ID: <20170228215937.yd4juycjf7y3vish@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqq1suij8kr.fsf@gitster.mtv.corp.google.com>

On Tue, Feb 28, 2017 at 01:42:44PM -0800, Junio C Hamano wrote:

> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > It could be argued that in the future, Git might need to distinguish
> > tree_objects from blob_objects - in particular, a user might want
> > rev-list to print the trees but not the blobs. 
> 
> That was exactly why these bits were originally made to "appear
> independent but in practice nobody sets only one and leaves others
> off".  
> 
> And it didn't happen in the past 10 years, which tells us that we
> should take this patch.

I actually have a patch which uses the distinction. It's for
upload-archive doing reachability checks (which seems rather familiar to
what's going on here).

The whole series (from 2013!) is at:

  git://github.com/peff/git jk/archive-reachability

but the relevant commits are below.

I don't think the same logic holds for this case, though, because
somebody actually can ask for a single blob.

-- >8 --
From: Jeff King <peff@peff.net>
Date: Wed, 5 Jun 2013 17:57:02 -0400
Subject: [PATCH] list-objects: optimize "revs->blob_objects = 0" case

If we are traversing trees during a "--objects"
traversal, we may skip blobs if the "blob_objects" field of
rev_info is not set. But we do so as the first thing in
process_blob(), only after we have actually created the
"struct blob" object, incurring a hash lookup. We can
optimize out this no-op call completely.

This does not actually affect any current code, as all of
the current traversals always set blob_objects when looking
at objects, anyway.

Signed-off-by: Jeff King <peff@peff.net>
---
 list-objects.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/list-objects.c b/list-objects.c
index f3ca6aafb..58ad69557 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -117,7 +117,7 @@ static void process_tree(struct rev_info *revs,
 			process_gitlink(revs, entry.oid->hash,
 					show, base, entry.path,
 					cb_data);
-		else
+		else if (revs->blob_objects)
 			process_blob(revs,
 				     lookup_blob(entry.oid->hash),
 				     show, base, entry.path,
-- 
2.12.0.359.gd4c8c42e9

-- >8 --
From: Jeff King <peff@peff.net>
Date: Wed, 5 Jun 2013 18:02:42 -0400
Subject: [PATCH] archive: ignore blob objects when checking reachability

We cannot create an archive from a blob object, so we would
not expect anyone to provide one to us. And if they do, we
will fail anyway just after the reachability check.  We can
therefore optimize our reachability check to ignore blobs
completely, and not even create a "struct blob" for them.

Depending on the repository size and the exact place we find
the reachable object in the traversal, this can save 20-25%,
a we can avoid many lookups in the object hash.

The downside of this is that a blob provided to a remote
archive process will fail with "no such object" rather than
"object is not a tree" (we could organize the code to retain
the old message, but since we no longer know whether the
blob is reachable or not, we would potentially be leaking
information about the existence of unreachable objects).

Signed-off-by: Jeff King <peff@peff.net>
---
 archive.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/archive.c b/archive.c
index ef89b2556..489115f9f 100644
--- a/archive.c
+++ b/archive.c
@@ -383,6 +383,7 @@ static int object_is_reachable(const unsigned char *sha1)
 	save_commit_buffer = 0;
 	init_revisions(&data.revs, NULL);
 	setup_revisions(ARRAY_SIZE(argv) - 1, argv, &data.revs, NULL);
+	data.revs.blob_objects = 0;
 	if (prepare_revision_walk(&data.revs))
 		return 0;
 
-- 
2.12.0.359.gd4c8c42e9


  reply	other threads:[~2017-02-28 21:59 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-25  1:18 [PATCH 0/3] Test fetch-pack's ability to fetch arbitrary blobs Jonathan Tan
2017-02-25  1:18 ` [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info Jonathan Tan
2017-02-28 21:42   ` Junio C Hamano
2017-02-28 21:59     ` Jeff King [this message]
2017-03-02 18:36       ` Junio C Hamano
2017-02-28 22:06   ` Jeff King
2017-02-25  1:18 ` [PATCH 2/3] revision: exclude trees/blobs given commit Jonathan Tan
2017-02-28 21:44   ` Junio C Hamano
2017-02-28 22:12   ` Jeff King
2017-03-02 19:50     ` [PATCH] t/perf: export variable used in other blocks Jonathan Tan
2017-03-03  6:45       ` Jeff King
2017-03-03  7:14         ` [PATCH] t/perf: use $MODERN_GIT for all repo-copying steps Jeff King
2017-03-03  7:36           ` [PATCH] t/perf: add fallback for pre-bin-wrappers versions of git Jeff King
2017-03-03 18:51         ` [PATCH] t/perf: export variable used in other blocks Junio C Hamano
2017-03-03 22:31           ` Jeff King
2017-02-28 23:12   ` [PATCH 2/3] revision: exclude trees/blobs given commit Junio C Hamano
2017-02-25  1:18 ` [PATCH 3/3] upload-pack: compute blob reachability correctly Jonathan Tan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170228215937.yd4juycjf7y3vish@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=benpeart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=peartben@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).