From: Jeff Hostetler <git@jeffhostetler.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, peartben@gmail.com,
	Christian Couder <christian.couder@gmail.com>
Subject: Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3)
Date: Tue, 26 Sep 2017 10:25:16 -0400
Message-ID: <e05a7978-c312-9e26-79a2-fca3ae44f59a@jeffhostetler.com>
In-Reply-To: <20170922155802.ab79717818578a23cc31f6fe@google.com>



On 9/22/2017 6:58 PM, Jonathan Tan wrote:
> On Fri, 22 Sep 2017 17:32:00 -0400
> Jeff Hostetler <git@jeffhostetler.com> wrote:
> 
>> I guess I'm afraid that the first call to is_promised() is going
>> to cause a very long pause as it loads up a very large hash of objects.
> 
> Yes, the first call will cause a long pause. (I think fsck and gc can
> tolerate this, but a better solution is appreciated.)
> 
>> Perhaps you could augment the OID lookup to remember where the object
>> was found (essentially a .promisor bit set).  Then you wouldn't need
>> to touch them all.
> 
> Sorry - I don't understand this. Are you saying that missing promisor
> objects should go into the global object hashtable, so that we can set a
> flag on them?

I just meant: could we add a bit to "struct object_info" to indicate
that the object was found in a .promisor packfile?  This could
be set in sha1_object_info_extended().

Then the is_promised() calls in fsck and gc would just test that bit.

Given that that bit will be set on promisOR objects (and we won't
have object_info for missing objects), you may need to adjust the
iterator in the fsck/gc code slightly.

This is a bit of a handwave, but could something like that eliminate
the need to build this oidset?
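
As a rough sketch of the idea (toy types only, not Git's real
structs; every toy_* name and the from_promisor field are made up
for illustration), the lookup could record where it found the object,
so that callers test one bit instead of consulting a prebuilt set:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT Git's actual definitions. */
struct toy_pack {
	const char *name;
	bool is_promisor;       /* pack has a companion .promisor file */
	const char **oids;      /* hex object ids stored in this pack */
	size_t nr;
};

struct toy_object_info {
	const struct toy_pack *pack;    /* where the object was found */
	unsigned from_promisor : 1;     /* the proposed bit */
};

/*
 * Look up one object and record whether it came from a promisor pack.
 * Returns 0 if found, -1 if the object is missing locally; in the
 * missing case *oi is left untouched, since there is nothing to describe.
 */
int toy_object_info_extended(const struct toy_pack *packs, size_t nr_packs,
			     const char *oid, struct toy_object_info *oi)
{
	for (size_t i = 0; i < nr_packs; i++) {
		for (size_t j = 0; j < packs[i].nr; j++) {
			if (!strcmp(packs[i].oids[j], oid)) {
				oi->pack = &packs[i];
				oi->from_promisor = packs[i].is_promisor;
				return 0;
			}
		}
	}
	return -1;
}
```

Note that the bit only describes objects we actually have; a missing
object yields no info at all, which is why the fsck/gc iteration
would need adjusting.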


> 
>>> The oidset will deduplicate OIDs.
>>
>> Right, but you still have an entry for each object.  For a repo the
>> size of Windows, you may have 25M+ objects in your copy of the ODB.
> 
> We have entries only for the "frontier" objects (the objects directly
> referenced by any promisor object). For the Windows repo, for example, I
> foresee that many of the blobs, trees, and commits will be "hiding"
> behind objects that the repository user did not download into their
> repo.
> 
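
To make the "frontier" description quoted above concrete, here is a
minimal sketch assuming a toy object graph with small integer ids
(the toy_* names and the layout are hypothetical, not Git code).
Only ids directly referenced by a promisor object get an entry;
anything reachable solely through an object we never downloaded is
never visited:

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_REFS 4

/* One locally present object in the toy graph. */
struct toy_obj {
	bool is_promisor;           /* object lives in a promisor pack */
	int refs[TOY_MAX_REFS];     /* ids of directly referenced objects */
	size_t nr_refs;
};

/*
 * Mark in frontier[] (indexed by object id, nr_ids entries) every id
 * that is directly referenced by a promisor object.  Duplicate
 * references are counted once.  Returns the number of distinct ids marked.
 */
size_t toy_collect_frontier(const struct toy_obj *objs, size_t nr_objs,
			    bool *frontier, size_t nr_ids)
{
	size_t count = 0;

	for (size_t i = 0; i < nr_ids; i++)
		frontier[i] = false;

	for (size_t i = 0; i < nr_objs; i++) {
		if (!objs[i].is_promisor)
			continue;
		for (size_t j = 0; j < objs[i].nr_refs; j++) {
			int id = objs[i].refs[j];

			if (id < 0 || (size_t)id >= nr_ids)
				continue;       /* ignore out-of-range ids */
			if (!frontier[id]) {
				frontier[id] = true;
				count++;
			}
		}
	}
	return count;
}
```

The size of the result therefore scales with the references held by
local promisor objects, not with the total object count of the
upstream repository.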
