git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, gitster@pobox.com, peartben@gmail.com,
	benpeart@microsoft.com
Subject: Re: [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info
Date: Tue, 28 Feb 2017 17:06:26 -0500
Message-ID: <20170228220626.at4cihedmvkqiq5c@sigill.intra.peff.net>
In-Reply-To: <06a84f8c77924b275606384ead8bb2fd7d75f7b6.1487984670.git.jonathantanmy@google.com>

On Fri, Feb 24, 2017 at 05:18:36PM -0800, Jonathan Tan wrote:

> Whenever tree_objects is set to 1 in revision.h's struct rev_info,
> blob_objects is likewise set, and vice versa. Combine those two fields
> into one.
> 
> Some of the existing code does not handle tree_objects being different
> from blob_objects properly. For example, "handle_commit" in revision.c
> recurses from an UNINTERESTING tree into its subtree if tree_objects ==
> 1, completely ignoring blob_objects; it probably should still recurse if
> tree_objects == 0 and blob_objects == 1 (to mark the blobs), and should
> behave differently depending on blob_objects (controlling the
> instantiation and marking of blob objects). This commit resolves the
> issue by forbidding tree_objects from being different to blob_objects.

Yeah, I agree that is awkward. I'm OK with the rule "if blob_objects is
set, then tree_objects must also be set". It's the other direction
(tree_objects set without blob_objects) that I care more about.

> It could be argued that in the future, Git might need to distinguish
> tree_objects from blob_objects - in particular, a user might want
> rev-list to print the trees but not the blobs. However, this results in
> a minor performance savings at best in that objects no longer need to be
> instantiated (causing memory allocations and hashtable insertions) - no
> disk reads are being done for objects whether blob_objects is set or
> not.

In a full object-graph traversal, we actually spend a big chunk of our
time in hash lookups. My measurements (admittedly from 2013, and not
repeated lately) showed something like a 20-25% speedup for this case,
i.e., traversing trees but not blobs.

My only use for a trees-but-not-blobs traversal (and the source of those
timings) was to compute archive reachability, which nobody seems to care
too much about. But I suspect we could speed up your case, too, when we
are just computing the reachability of a non-blob. That is, you should be
able to turn on the smallest subset of "commits only", "commits and
trees", and "commits, trees, and blobs" that covers what the other side
has asked for.

-Peff


Thread overview: 17+ messages
2017-02-25  1:18 [PATCH 0/3] Test fetch-pack's ability to fetch arbitrary blobs Jonathan Tan
2017-02-25  1:18 ` [PATCH 1/3] revision: unify {tree,blob}_objects in rev_info Jonathan Tan
2017-02-28 21:42   ` Junio C Hamano
2017-02-28 21:59     ` Jeff King
2017-03-02 18:36       ` Junio C Hamano
2017-02-28 22:06   ` Jeff King [this message]
2017-02-25  1:18 ` [PATCH 2/3] revision: exclude trees/blobs given commit Jonathan Tan
2017-02-28 21:44   ` Junio C Hamano
2017-02-28 22:12   ` Jeff King
2017-03-02 19:50     ` [PATCH] t/perf: export variable used in other blocks Jonathan Tan
2017-03-03  6:45       ` Jeff King
2017-03-03  7:14         ` [PATCH] t/perf: use $MODERN_GIT for all repo-copying steps Jeff King
2017-03-03  7:36           ` [PATCH] t/perf: add fallback for pre-bin-wrappers versions of git Jeff King
2017-03-03 18:51         ` [PATCH] t/perf: export variable used in other blocks Junio C Hamano
2017-03-03 22:31           ` Jeff King
2017-02-28 23:12   ` [PATCH 2/3] revision: exclude trees/blobs given commit Junio C Hamano
2017-02-25  1:18 ` [PATCH 3/3] upload-pack: compute blob reachability correctly Jonathan Tan
