From: Jonathan Nieder <jrnieder@gmail.com>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Subject: Re: [RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs
Date: Fri, 14 Jul 2017 14:30:18 -0700
Message-ID: <20170714213018.GK93855@aiede.mtv.corp.google.com>
In-Reply-To: <4d0849b0-1340-5b82-ba3c-03a1f5c42f33@jeffhostetler.com>

Jeff Hostetler wrote:
> On 7/13/2017 3:39 PM, Jonathan Tan wrote:

>> I know that discussion has shifted to the possibility of not having this
>> list at all, and not sending size information together with the fetch,
>> but going back to this...maybe omitting trees *is* the solution to both
>> the large local list and the large amount of size information needing to
>> be transferred.
>>
>> So the large-blob (e.g. Android) and many-blob (e.g. Windows) cases
>> would look like this:
>>
>>   * Large-blob repositories have no trees omitted and a few blobs
>>     omitted, and we have sizes for all of them.
>>   * Many-blob repositories have many trees omitted and either all
>>     blobs omitted (and we have size information for them, useful for FUSE
>>     or FUSE-like things, for example) or possibly no blobs omitted (for
>>     example, if shallow clones are going to be the norm, there won't be
>>     many blobs to begin with if trees are omitted).
>
> I'm not sure I understand what you're saying here.  Does omitting a tree
> object change the set of blob sizes we receive?  Are you saying that if
> we omit a tree, then we implicitly omit all the blobs it references and
> don't send size info those blobs?  So that the local list only has
> reachable objects?  So faulting-in a tree would also have to send size
> info for the newly referenced blobs?
>
> Would this make it more similar to a shallow clone (in that none of the
> have_object tests work for items beyond the cut point)?

Correct.  After the server sends a promise instead of a tree object, the
client has no reason to try to access blobs pointed to by that tree, any
more than it has reason to try to access commits on a branch it has not
fetched.  This means the client does not have to be aware of those blobs
until it fetches the tree and associated blob promises.
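
To make that concrete, here is a minimal sketch in Python. It is not
Git's implementation: the dict-based object graph and the name
`promises_to_send` are assumptions made up for illustration. The walk
stops at each omitted object, so only omitted objects referenced by
something that was actually sent get a promise, and blobs that are
reachable only through an omitted tree never enter the list.

```python
def promises_to_send(graph, wanted_roots, omitted):
    """Split reachable objects into those sent and those promised.

    graph: dict mapping object id -> list of ids it references (hypothetical).
    wanted_roots: ids the client asked for, e.g. branch tips.
    omitted: set of ids the server has decided not to send.
    """
    sent, promised = set(), set()
    stack = list(wanted_roots)
    while stack:
        oid = stack.pop()
        if oid in sent or oid in promised:
            continue
        if oid in omitted:
            # Promise the object but do not walk into it: whatever it
            # references stays unknown to the client until it is fetched.
            promised.add(oid)
            continue
        sent.add(oid)
        stack.extend(graph.get(oid, []))
    return sent, promised


# Example: the subtree is omitted, so its two blobs get no promise.
graph = {
    "commit": ["tree"],
    "tree": ["subtree", "blob_a"],
    "subtree": ["blob_b", "blob_c"],
}
sent, promised = promises_to_send(
    graph, ["commit"], {"subtree", "blob_b", "blob_c"}
)
```

In this example only "subtree" ends up promised. Fetching it later
would be the point at which promises for blob_b and blob_c arrive.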

[...]
> For the former case, if you just have a few omitted objects, then a
> second round-trip to mget their sizes isn't that much work.

For the client, that is true.  For the server, decreasing the number
of requests even when requests are small and fast can be valuable.

[...]
> I think for the latter, forcing a full promise-list on clone is just
> too much data to send -- data that we likely won't ever need.

What did you think of the suggestion to not send promises for objects
that are only referenced by objects that weren't sent?

[...]
>> What do you think of doing this:
>>   * add a "type" field to the list of promised objects (formerly the list
>>     of promised blobs)
>>   * retain mandatory size for blobs
>>   * retain single file containing list of promised objects (I don't feel
>>     too strongly about this, but it has a slight simplicity and
>>     in-between-GC performance advantage)
>
> The single promise-set is problematic.  I think it will grow too
> large (in our case) and will need all the usual lock juggling
> and merging.
>
> I still prefer my suggestion for a per-packfile promise-set for all
> of the reasons I stated the other day.  This can be computed quickly
> during index-pack, is (nearly) read-only, and doesn't require the
> whole file rewrite lock file.  It also has the benefit of being
> portable -- in that I can also copy the .promise file if I copy the
> .pack and .idx file to another repo.

Okay.
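
For reference, the per-packfile arrangement could look roughly like
the following. This is a minimal Python sketch and not a proposed
on-disk format: the class `PackPromises`, the function `find_promise`,
and the in-memory dict are all hypothetical. It only illustrates the
properties discussed above: each set is written once next to its pack,
carries a type per entry with a mandatory size for blobs, and is
consulted without rewriting or locking a shared file.

```python
from typing import Dict, Iterable, Optional, Tuple

# (type, size) for one promised object; size is required for blobs only.
Promise = Tuple[str, Optional[int]]


class PackPromises:
    """Promise set written once next to a pack, then treated as read-only."""

    def __init__(self, pack_name: str, entries: Dict[str, Promise]):
        for oid, (otype, size) in entries.items():
            if otype == "blob" and size is None:
                raise ValueError(f"blob promise {oid} is missing its size")
        self.pack_name = pack_name
        self._entries = dict(entries)

    def lookup(self, oid: str) -> Optional[Promise]:
        return self._entries.get(oid)


def find_promise(packs: Iterable[PackPromises], oid: str):
    """Search each pack's promise set in turn; no shared file, no lock."""
    for pack in packs:
        hit = pack.lookup(oid)
        if hit is not None:
            return pack.pack_name, hit
    return None
```

Copying a pack to another repository then amounts to carrying its
promise set along with it, since nothing outside the pack's own files
needs to be merged.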

Thanks,
Jonathan
