From: Elijah Newren <newren@gmail.com>
To: Avery Pennarun <apenwarr@gmail.com>
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
	"Nguyễn Thái Ngọc" <pclouds@gmail.com>,
	git@vger.kernel.org
Subject: Re: Sparse clones (Was: Re: [PATCH 1/2] upload-pack: support subtree  packing)
Date: Tue, 27 Jul 2010 21:31:55 -0600
Message-ID: <AANLkTi=8u5VROYQygAXoCS4c+eAoEoP8V4t5rJ=wXL8q@mail.gmail.com>
In-Reply-To: <AANLkTikMLOFet-VMT7MntPgoSkvqGAXPd8Z1aaDpY1xs@mail.gmail.com>

2010/7/27 Avery Pennarun <apenwarr@gmail.com>:
> Note that if you happen to want to implement it in a way that you'll
> also get all the commit objects from your submodules too (which I
> highly encourage :)) then downloading the trees is the easiest way.
> Otherwise you won't know which submodule commits you need.

Makes sense.  Seems like a good reason to include all the trees.

> Since downloading commits is so cheap anyway, I'd suggest just
> defaulting to downloading all the refs, as clone currently does.  If
> people don't like it, they can do what they currently do:
>
>   git init
>   git remote add ...
>   git fetch
>
> Not that pretty, but then again, it's rarely needed.

Would you suggest then parsing the limiting arguments passed to clone
and disallowing refs?  Or just making any refs given there pointless
by always appending "--all HEAD"?

>> 2) Sparse checkouts are automatically invoked with the path(s) from
>>   the specified rev-list arguments.
<snip>
> I don't totally understand what you mean here.  But I do think that if

Basically, I mean what you stated much more succinctly and eloquently
right here:
> I guess my point is, more complex exclusions could always be added
> later but they aren't so important right away.

>> 4) All revision-walking operations automatically use these limiting args.
<snip>
> It does sound sort of elegant: this way they *won't* run into the missing objects.
> Beware, however, that
>
>   git log -- Documentation
>
> outputs a different set of commits than just
>
>   git log

Yes, exactly.  In a sparse clone, why wouldn't one want the behavior
of the former automatically, without having to specify the paths on
the command line on every run of log (or rev-list, or fast-export,
etc.)?  That matters even more for someone who cloned N directories
rather than just one.

Actually, I can kind of see the desire to see the 'real' log since the
users do happen to have all commits locally, but it almost seems like
it should be the case that requires a special option to be passed to
git log ('--ignore-sparse-limiting'?).  But trying to get that option
to work in conjunction with other options (--stat, -S, -p, etc.) would
be really hard, if not impossible.
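
To make the proposed default concrete, here is a minimal sketch of a
log that applies stored limiting paths automatically.  It is a toy
model, not git code: the Commit class, the log() function and the
ignore_sparse_limiting flag (named after the hypothetical option
above) are all assumptions made for illustration, and real path
limiting also simplifies history, which this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    subject: str
    paths: frozenset  # paths changed by this commit


def touches(commit, limits):
    """True if the commit changes a path at or under one of the limits."""
    return any(
        path == limit or path.startswith(limit.rstrip("/") + "/")
        for path in commit.paths
        for limit in limits
    )


def log(history, stored_limits=(), ignore_sparse_limiting=False):
    """List subjects, applying the clone's stored limits unless told not to."""
    if ignore_sparse_limiting or not stored_limits:
        return [c.subject for c in history]
    return [c.subject for c in history if touches(c, stored_limits)]


history = [
    Commit("Update clone docs", frozenset({"Documentation/git-clone.txt"})),
    Commit("Fix upload-pack", frozenset({"upload-pack.c"})),
    Commit("Doc and code tweak",
           frozenset({"Documentation/git-log.txt", "builtin/log.c"})),
]

# Sparse clone of Documentation: the limit is applied without being typed.
sparse_view = log(history, stored_limits=("Documentation",))
# The "real" log requires the explicit opt-out.
full_view = log(history, stored_limits=("Documentation",),
                ignore_sparse_limiting=True)
```

In this model sparse_view omits the upload-pack commit, while
full_view lists all three.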

>> 5) "Densifying" a sparse clone can be done
<snip>
> I think this would work, but unless you want to re-download some
> (possibly lots of) objects you've already got, it would require some
> kind of extra support from the server, I think.  Maybe that's a rare
> enough case that few people will care and it could be fixed later.

For my first implementation, my plan was to simply re-download ALL
(not just some or lots of) objects I've already got in such a case.  A
bit wasteful to be sure, but I was hoping it was rare enough to
"densify" a clone that it wouldn't be a big deal...and that support
for smarter downloads could be added later.
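
As a rough illustration of the trade-off, the sketch below compares
the naive plan (fetch everything matching the new limits again) with a
smarter one that skips objects already present.  The function names,
the path-to-object-id mapping and the ids are hypothetical; a real
server deals in packs and would need the extra support mentioned
above.

```python
def under(path, limits):
    """True if path is at or under a limit; empty limits mean no limiting."""
    if not limits:
        return True
    return any(
        path == limit or path.startswith(limit.rstrip("/") + "/")
        for limit in limits
    )


def naive_densify(server_objects, new_limits):
    """Re-download every object matching the new limits."""
    return {oid for path, oid in server_objects.items()
            if under(path, new_limits)}


def smarter_densify(server_objects, new_limits, local_oids):
    """Download only matching objects that are not already local."""
    return naive_densify(server_objects, new_limits) - local_oids


# Hypothetical server contents: one blob per path, for simplicity.
server_objects = {
    "Documentation/a.txt": "b1",
    "Documentation/b.txt": "b2",
    "src/main.c": "b3",
}
local_oids = {"b1", "b2"}  # what a Documentation-only clone already has

# Densify with no limiting arguments, i.e. turn it into a full clone.
naive = naive_densify(server_objects, ())
smart = smarter_densify(server_objects, (), local_oids)
wasted = naive & local_oids  # objects fetched a second time
```

Here the naive plan fetches all three objects, two of them for the
second time, while the smarter plan fetches only b3.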

> I don't think the pull vs. fetch distinction is valid; I would be very
> surprised if pull un-sparsified my checkout, just as I would be
> surprised if merge did.  And pull is just fetch+merge.

Right, I don't think pull should un-sparsify either the checkout OR
the clone by default (it should have fetch pass the same limiting
arguments and only download an equivalently sparse set of updates).
Your point about pull=fetch+merge (or fetch+rebase) makes sense, which
I guess means that un-sparsifying a clone+checkout should be a
separate toplevel command ("densify"?) rather than a special option
for fetch/pull.

>> 6) Cloning-from/fetching-from/pushing-to sparse clones is supported.
>>
>> Future fetches and pushes also make use of the limiting arguments.
>> Receives do as well, but only to make sure the pack obtained is not
>> "more sparse" than what the receiving repository already has.
>> (uploads ignore the stored rev-list arguments, instead using the
>> rev-list arguments passed to it -- it will die if asked for content
>> not locally available to it.)
>
> This scares me a little.  It's a reminder that it's all-too-easy to
> get your repository into a really messed up state by going in and
> screwing with your sparseness parameters at the wrong time.

I don't follow.  Why would people be "screwing with sparseness parameters"?

My basic idea was that there would be only three ways to change
sparseness parameters for clones, with only the first two documented:
the initial clone command, the "densify" command (someone probably
needs to think of a better name), and reading the source code to
figure out what bits on your disk to change and changing them.


Here's why I want the clone-able/fetch-able/pull-able sparse clone
functionality:

I would like translators (who only need maybe one file) or technical
writers (who only need the Documentation/ subdirectory) or other
similar folks to be able to collaborate on the subset of the
repository that they need to do their work.  Thus, it makes sense for
them to be able to clone from, pull from, and push to each other.  The
only two rules that I think are necessary to enable such behavior are:

* No repository can provide information that it doesn't have (should
be pretty easy to enforce...)
* No repository accepts less data than it expects in its repository
(i.e. you can push to a sparse clone or a real clone, but need to
provide data that fulfills its rev-list limiting arguments)
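
The two rules above can be sketched as checks on sets of limiting
paths.  This is only a model under stated assumptions: limits are
plain path prefixes, an empty tuple stands for a real (full) clone,
and covers(), can_serve() and can_accept() are made-up names, not
anything in git.

```python
def covers(have, want):
    """True if every path limit in `want` lies at or under one in `have`.

    An empty tuple means "no limiting", i.e. the whole repository.
    """
    if not have:
        return True   # a full repository covers any request
    if not want:
        return False  # a sparse repository cannot cover everything
    return all(
        any(w == h or w.startswith(h.rstrip("/") + "/") for h in have)
        for w in want
    )


def can_serve(server_limits, requested_limits):
    """Rule 1: a repository cannot provide data it does not have."""
    return covers(server_limits, requested_limits)


def can_accept(receiver_limits, pushed_limits):
    """Rule 2: incoming data must not be more sparse than the receiver."""
    return covers(pushed_limits, receiver_limits)


docs = ("Documentation",)
full = ()

serve_subset = can_serve(docs, ("Documentation/howto",))  # narrower: fine
serve_all = can_serve(docs, full)        # sparse clone asked for everything
push_to_full = can_accept(full, docs)    # docs-only data to a real clone
push_to_docs = can_accept(docs, full)    # full data to a docs-only clone
```

Under these assumptions serve_subset and push_to_docs are True, while
serve_all and push_to_full are False.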

> It would make me more comfortable if there was some kind of "oh god,
> just fix it by downloading any objects you think are missing" mode :)
> In fact, git could benefit from that in general - every now and then
> someone on the list asks about a repository they managed to mangle by
> corrupting a pack or something, and there's no really good answer to
> that.

For sparse clones, isn't that mode just running the "densify" command
with no limiting arguments?

>> 7) Operations that need unavailable data simply error out
>>
>> Examples: merge, cherry-pick, rebase (and upload-pack in a sparse
>> clone).  However, hopefully the error messages state what extra
>> information needs to be downloaded so the user can appropriately
>> "densify" their repository.
>
> That sounds good to me.

Thanks for the detailed feedback.  :-)
