From: Jeff Hostetler <git@jeffhostetler.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, peartben@gmail.com,
Christian Couder <christian.couder@gmail.com>
Subject: Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3)
Date: Fri, 22 Sep 2017 17:32:00 -0400 [thread overview]
Message-ID: <7977bab0-09c3-0e43-4d6f-f2bf87a3fd9e@jeffhostetler.com> (raw)
In-Reply-To: <20170921160416.1c4c6e2c@twelve2.svl.corp.google.com>
On 9/21/2017 7:04 PM, Jonathan Tan wrote:
> On Thu, 21 Sep 2017 14:00:40 -0400
> Jeff Hostetler <git@jeffhostetler.com> wrote:
>
>> (part 3)
>>
>> Additional overall comments on:
>> https://github.com/jonathantanmy/git/commits/partialclone2
>>
>> {} WRT the code in is_promised() [1]
>>
>> [1]
>> https://github.com/jonathantanmy/git/commit/7a9c2d9b6e2fce293817b595dee29a7eede0dddd#diff-5d5d5dc185ef37dc30bb7d9a7ae0c4e8R1960
>>
>> {} it looked like it was adding ALL promisor- and
>> promised-objects to the "promised" OIDSET, rather than just
>> promised-objects, but I could be mistaken.
>
> As far as I can tell, it is just adding the promised objects (some of
> which may also be promisor objects). If you're saying that you expected
> it to add the promisor objects as well, that might be a reasonable
> expectation...I'm thinking of doing that.
>
It looked like it was adding both types. I was concerned that that
it might be doing too much. But I haven't run the code, that was from
an observation.
>> {} Is this iterating over ALL promisor-packfiles?
>
> Yes.
>
>> {} It looked like this was being used by fsck and rev-list. I
>> have concerns about how big this OIDSET will get and how it will
>> scale, since if we start with a partial-clone all packfiles will be
>> promisor-packfiles.
>
> It's true that scaling is an issue. I'm not sure if omitting the oidset
> will solve anything, though - as it is, Git maintains an object hash and
> adds to it quite liberally.
I guess I'm afraid that the first call to is_promised() is going
cause a very long pause as it loads up a very large hash of objects.
Perhaps you could augment the OID lookup to remember where the object
was found (essentially a .promisor bit set). Then you wouldn't need
to touch them all.
>
> One thing that might help is some sort of flushing of objects in
> promisor packfiles from the local repository - that way, the oidset
> won't be so large.
>
>>
>> {} When iterating thru a tree object, you add everything that it
>> references (everything in that folder). This adds all of the
>> child OIDs -- without regard to whether they are new to this
>> version of the tree object. (Granted, this is hard to compute.)
>
> The oidset will deduplicate OIDs.
Right, but you still have an entry for each object. For a repo the
size of Windows, you may have 25M+ objects your copy of the ODB.
>
>> My concern is that this will add too many objects to the
>> OIDSET. That is, a new tree object (because of a recent change to
>> something in that folder) will also have the OIDs of the other
>> *unchanged* files which may be present in an earlier non-provisor
>> packfile from an earlier commit.
>>
>> I worry that this will grow the OIDSET to essentially include
>> everything. And possibly defeating its own purpose. I could
>> be wrong here, but that's my concern.
>
> Same answer as above (about flushing of objects in promisor packfiles).
>
>> {} I'm not opposed to the .promisor file concept, but I have concerns
>> that in practice all packfiles after a partial clone will be
>> promisor-packfiles and therefore short-cut during fsck, so fsck
>> still won't gain anything.
>>
>> It would help if there are also non-promisor packfiles present,
>> but only for objects referenced by non-promisor packfiles.
>>
>> But then I also have to wonder whether we can even support
>> non-promisor packfiles after starting with a partial clone -- because
>> of the properties of received thin-packs on a non-partial fetch.
>
> Same reply as to your other e-mail - locally created objects are not in
> promisor packfiles. (Or were you thinking of a situation where locally
> created objects are immediately uploaded to the promisor remote, thus
> making them promisor objects too?)
>
Thanks,
Jeff
next prev parent reply other threads:[~2017-09-22 21:32 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-15 20:43 RFC: Design and code of partial clones (now, missing commits and trees OK) Jonathan Tan
2017-09-19 5:51 ` Junio C Hamano
2017-09-21 17:57 ` Jeff Hostetler
2017-09-21 22:42 ` Jonathan Tan
2017-09-22 21:02 ` Jeff Hostetler
2017-09-22 22:49 ` Jonathan Tan
2017-09-26 15:26 ` Michael Haggerty
2017-09-29 20:21 ` Jonathan Tan
2017-09-21 17:59 ` RFC: Design and code of partial clones (now, missing commits and trees OK) (part 2/3) Jeff Hostetler
2017-09-21 22:51 ` Jonathan Tan
2017-09-22 21:19 ` Jeff Hostetler
2017-09-22 22:52 ` Jonathan Tan
2017-09-26 14:03 ` Jeff Hostetler
2017-09-21 18:00 ` RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3) Jeff Hostetler
2017-09-21 23:04 ` Jonathan Tan
2017-09-22 21:32 ` Jeff Hostetler [this message]
2017-09-22 22:58 ` Jonathan Tan
2017-09-26 14:25 ` Jeff Hostetler
2017-09-26 17:32 ` Jonathan Tan
2017-09-29 0:53 ` RFC: Design and code of partial clones (now, missing commits and trees OK) Jonathan Tan
2017-09-29 2:03 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7977bab0-09c3-0e43-4d6f-f2bf87a3fd9e@jeffhostetler.com \
--to=git@jeffhostetler.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=jonathantanmy@google.com \
--cc=peartben@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).