Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Stefan Beller <sbeller@google.com>
To: Jens Lehmann <Jens.Lehmann@web.de>
Cc: Jeff King <peff@peff.net>,
	Lars Schneider <larsxschneider@gmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Duy Nguyen <pclouds@gmail.com>
Subject: Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
Date: Mon, 16 Nov 2015 11:25:13 -0800	[thread overview]
Message-ID: <CAGZ79kbh_8oBRnQAmDzh3LANS6iGXNjLkYMLfuk9iysXghHQXg@mail.gmail.com> (raw)
In-Reply-To: <564A279C.6000802@web.de>

On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> Am 14.11.2015 um 01:10 schrieb Stefan Beller:
>>
>> On Fri, Nov 13, 2015 at 3:41 PM, Jeff King <peff@peff.net> wrote:
>>>
>>> On Fri, Nov 13, 2015 at 06:38:07PM -0500, Jeff King wrote:
>>>
>>>> On Fri, Nov 13, 2015 at 03:16:01PM -0800, Stefan Beller wrote:
>>>>
>>>>> Junio wrote on Oct 09, 2014:
>>>>>>
>>>>>> This is so non-standard a thing to do that I doubt it is worth
>>>>>> supporting with "git clone".  "git clone --branch", which is about
>>>>>
>>>>> "> I want to follow that particular branch", would not mesh well with
>>>>>>
>>>>>> "I want to see the history that leads to this exact commit", either.
>>>>>> You would not know which branch(es) is that exact commit is on in
>>>>>> the first place.
>>>>>
>>>>>
>>>>> I disagree with this. This is the *exact* thing you actually want to do
>>>>> when
>>>>> dealing with submodules. When fetching/cloning for a submodule, you
>>>>> want
>>>>> to obtain the exact sha1, instead of a branch (which happens to be
>>>>> supported
>>>>> too, but is not the original use case with submodules.)
>
>
> Yes, being able to fetch certain sha1s makes lots of sense for submodules
> (this has been discussed some time ago at a GitTogether). But - apart from
> the extra network load - it's rather helpful to get all the submodule
> branches too (though that could be limited to the branches the sha1 is on).

Ok, I did not attend that GitTogether ;)

>
>>>> I think this is already implemented in 68ee628 (upload-pack: optionally
>>>> allow fetching reachable sha1, 2015-05-21), isn't it?
>>>
>>>
>>> Note that this just implements the server side. I think to use this with
>>> submodules right now, you'd have to manually "git init && git fetch" in
>>> the submodule. It might make sense to teach clone to handle this, to
>>> avoid the submodule code duplicating what the clone code does.
>>
>>
>> Yes I want to add it to clone, as that is a prerequisite for making
>> git clone --recursive --depth 1 to work as you'd expect. (such that
>> the submodule can be cloned&checkout instead of rewriting that to be
>> init&fetch.
>
>
> Cool, that should help recursive fetch too.
>
>> Thanks for pointing out that we already have some kind of server support.
>>
>> I wonder if we should add an additional way to make fetching only some
>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>> where superprojects point{ed} to", even if you force push a superproject,
>> you want to want to only allow fetching all sha1s which exist in the
>> current
>> superprojects branch.)
>
>
> Me thinks the restrictions for sha1-fetching could come from the branches
> these sha1s are found in the upstream submodule: if the client is allowed
> to fetch a branch, it should be able to fetch any sha1 on that branch.

I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant
set, is not sufficient though.

To fetch an arbitrary sha1, you would need to check if that sha1 is part
of the history of any advertised branch and then allow fetching serverside,
which sounds like some work for the server, which we may want to avoid
by having smarter data structures there.

Instead of having to search all branches for the requested sha1, we could have
some sort of data structure to make it not an O(n) operation (n being
all objects
in the repo).

Maybe I overestimate the work which needs to be done, because the server has
bitmaps nowadays.

Maybe a lazy reverse-pointer graph can be established on the serverside.
So I guess when we add the feature to fetch arbitrary sha1s, reachable from
any branch, people using submodules will make use of the feature. (such as with
git fetch --recurse --depth 1 or via a new `git fetch --recursive
--up-to-submodule-tip-only`)

So once the server is asked for a certain sha1, it will do the
reachability check,
which takes some effort, but then stores the result in the form:
"If ${current tip sha} of ${branch} is reachable, so is requested $sha1."

So when the next fetch request for $sha1 arrives, the server only needs to
check for ${current tip sha} to be part of $branch, which is expected to be
a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap could
just tell you or at least shorten the walk even more)
If the ${branch} has changed, the next evaluation for $sha1 can update
the cache,
such that the reverse lookup is not expensive on expectation.

I assume this will mostly be used with submodules, so only a few sha1s need
this caching.

>
>> Maybe our emails crossed, but in the other mail I pointed out we could use
>> some sort of hidden ref (refs/superprojects/*) for that, which are
>> allowed to mark
>> any sort of sha1, which are allowed in the superproject/submodule context
>> to be fetched.
>>
>> So whenever you push to a superproject (a project that has a gitlink),
>> we would need to check serverside if that submodule is at us and mark the
>> correct sha1s in the submodule. Then you can disallow fetching most of the
>> sha1s
>> but still could have a correctly working submodule update mechanism.
>
>
> And what happens if the submodule isn't at us? Involving the serverside of
> a superproject in submodule fetching sounds wrong to me. Me thinks that
> the upstream of the submodule should always control if a sha1 is allowed
> to be fetched. Or did I understand you wrong?

Yes and no.
The serverside submodule repository should be responsible for the ultimate
decision if you are allowed to fetch that sha1. But maybe on pushing the
superproject, we can store a hint in the submodule, that this sha1 is legit.
Although I may be missguided in my thinking here as the superproject
should have no influence on the submodule.

next prev parent reply	other threads:[~2015-11-16 19:25 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-12  9:37 [PATCH v2] add test to demonstrate that shallow recursive clones fail larsxschneider
2015-11-12 23:34 ` Stefan Beller
2015-11-15 12:43   ` Lars Schneider
2015-11-13  5:35 ` Jeff King
2015-11-13 18:41   ` Stefan Beller
2015-11-13 23:16     ` Stefan Beller
2015-11-13 23:38       ` Jeff King
2015-11-13 23:41         ` Jeff King
2015-11-14  0:10           ` Stefan Beller
2015-11-16 18:59             ` Jens Lehmann
2015-11-16 19:25               ` Stefan Beller [this message]
2015-11-16 21:42                 ` Jens Lehmann
2015-11-16 22:56                   ` Stefan Beller
2015-11-17 19:46                     ` Jens Lehmann
2015-11-17 20:04                       ` Stefan Beller
2015-11-17 20:39                         ` Jens Lehmann
2015-11-17 20:49                           ` Stefan Beller
2015-11-17 21:00                             ` Jens Lehmann
2015-11-17 20:17                 ` Duy Nguyen
2015-11-17 21:43                   ` Jeff King
2015-11-18 12:32                     ` Duy Nguyen
2015-11-18 21:11                       ` Jeff King
2015-11-18 21:36                         ` Stefan Beller
     [not found]                           ` <CAOE36qj2m4e3hw73-QoLbbpGv4RiyhBt_ou7eN4i4q8pF15rdA@mail.gmail.com>
2015-11-19 13:58                             ` Jeff King
2015-11-30 18:11       ` Junio C Hamano
2015-12-01  0:47         ` Stefan Beller
2015-12-01 12:15           ` Duy Nguyen
2015-12-01 21:47             ` Junio C Hamano
2015-12-01 22:22               ` Junio C Hamano
2015-12-03 20:03                 ` Stefan Beller
2015-12-04 15:32                   ` Junio C Hamano
2015-12-04 16:15                   ` Junio C Hamano
2015-12-04  7:52                 ` Duy Nguyen
2015-11-15 12:53   ` Lars Schneider
2015-11-17 21:34     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kbh_8oBRnQAmDzh3LANS6iGXNjLkYMLfuk9iysXghHQXg@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).