git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, peartben@gmail.com,
	Christian Couder <christian.couder@gmail.com>
Subject: Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3)
Date: Fri, 22 Sep 2017 17:32:00 -0400	[thread overview]
Message-ID: <7977bab0-09c3-0e43-4d6f-f2bf87a3fd9e@jeffhostetler.com> (raw)
In-Reply-To: <20170921160416.1c4c6e2c@twelve2.svl.corp.google.com>



On 9/21/2017 7:04 PM, Jonathan Tan wrote:
> On Thu, 21 Sep 2017 14:00:40 -0400
> Jeff Hostetler <git@jeffhostetler.com> wrote:
> 
>> (part 3)
>>
>> Additional overall comments on:
>> https://github.com/jonathantanmy/git/commits/partialclone2
>>
>> {} WRT the code in is_promised() [1]
>>
>> [1]
>> https://github.com/jonathantanmy/git/commit/7a9c2d9b6e2fce293817b595dee29a7eede0dddd#diff-5d5d5dc185ef37dc30bb7d9a7ae0c4e8R1960
>>
>>      {} it looked like it was adding ALL promisor- and
>> promised-objects to the "promised" OIDSET, rather than just
>> promised-objects, but I could be mistaken.
> 
> As far as I can tell, it is just adding the promised objects (some of
> which may also be promisor objects). If you're saying that you expected
> it to add the promisor objects as well, that might be a reasonable
> expectation...I'm thinking of doing that.
> 

It looked like it was adding both types.  I was concerned that that
it might be doing too much.  But I haven't run the code, that was from
an observation.

>>      {} Is this iterating over ALL promisor-packfiles?
> 
> Yes.
> 
>>      {} It looked like this was being used by fsck and rev-list.  I
>> have concerns about how big this OIDSET will get and how it will
>> scale, since if we start with a partial-clone all packfiles will be
>>         promisor-packfiles.
> 
> It's true that scaling is an issue. I'm not sure if omitting the oidset
> will solve anything, though - as it is, Git maintains an object hash and
> adds to it quite liberally.

I guess I'm afraid that the first call to is_promised() is going
cause a very long pause as it loads up a very large hash of objects.

Perhaps you could augment the OID lookup to remember where the object
was found (essentially a .promisor bit set).  Then you wouldn't need
to touch them all.

> 
> One thing that might help is some sort of flushing of objects in
> promisor packfiles from the local repository - that way, the oidset
> won't be so large.
> 
>>
>>      {} When iterating thru a tree object, you add everything that it
>>         references (everything in that folder).  This adds all of the
>>         child OIDs -- without regard to whether they are new to this
>>         version of the tree object. (Granted, this is hard to compute.)
> 
> The oidset will deduplicate OIDs.

Right, but you still have an entry for each object.  For a repo the
size of Windows, you may have 25M+ objects your copy of the ODB.

> 
>>         My concern is that this will add too many objects to the
>> OIDSET. That is, a new tree object (because of a recent change to
>> something in that folder) will also have the OIDs of the other
>> *unchanged* files which may be present in an earlier non-provisor
>> packfile from an earlier commit.
>>
>>         I worry that this will grow the OIDSET to essentially include
>>         everything.  And possibly defeating its own purpose.  I could
>> be wrong here, but that's my concern.
> 
> Same answer as above (about flushing of objects in promisor packfiles).
> 
>> {} I'm not opposed to the .promisor file concept, but I have concerns
>>      that in practice all packfiles after a partial clone will be
>>      promisor-packfiles and therefore short-cut during fsck, so fsck
>>      still won't gain anything.
>>
>>      It would help if there are also non-promisor packfiles present,
>>      but only for objects referenced by non-promisor packfiles.
>>
>>      But then I also have to wonder whether we can even support
>> non-promisor packfiles after starting with a partial clone -- because
>> of the properties of received thin-packs on a non-partial fetch.
> 
> Same reply as to your other e-mail - locally created objects are not in
> promisor packfiles. (Or were you thinking of a situation where locally
> created objects are immediately uploaded to the promisor remote, thus
> making them promisor objects too?)
> 

Thanks,
Jeff

  reply	other threads:[~2017-09-22 21:32 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-15 20:43 RFC: Design and code of partial clones (now, missing commits and trees OK) Jonathan Tan
2017-09-19  5:51 ` Junio C Hamano
2017-09-21 17:57 ` Jeff Hostetler
2017-09-21 22:42   ` Jonathan Tan
2017-09-22 21:02     ` Jeff Hostetler
2017-09-22 22:49       ` Jonathan Tan
2017-09-26 15:26     ` Michael Haggerty
2017-09-29 20:21       ` Jonathan Tan
2017-09-21 17:59 ` RFC: Design and code of partial clones (now, missing commits and trees OK) (part 2/3) Jeff Hostetler
2017-09-21 22:51   ` Jonathan Tan
2017-09-22 21:19     ` Jeff Hostetler
2017-09-22 22:52       ` Jonathan Tan
2017-09-26 14:03         ` Jeff Hostetler
2017-09-21 18:00 ` RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3) Jeff Hostetler
2017-09-21 23:04   ` Jonathan Tan
2017-09-22 21:32     ` Jeff Hostetler [this message]
2017-09-22 22:58       ` Jonathan Tan
2017-09-26 14:25         ` Jeff Hostetler
2017-09-26 17:32           ` Jonathan Tan
2017-09-29  0:53 ` RFC: Design and code of partial clones (now, missing commits and trees OK) Jonathan Tan
2017-09-29  2:03   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7977bab0-09c3-0e43-4d6f-f2bf87a3fd9e@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=peartben@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).