Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Stefan Beller <sbeller@google.com>
To: Jens Lehmann <Jens.Lehmann@web.de>
Cc: Jeff King <peff@peff.net>,
	Lars Schneider <larsxschneider@gmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Duy Nguyen <pclouds@gmail.com>
Subject: Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
Date: Mon, 16 Nov 2015 14:56:21 -0800	[thread overview]
Message-ID: <CAGZ79kZhn59GTs9LRnAoPrSwf43jjgi0_qWibrQ8gCvjas_MKA@mail.gmail.com> (raw)
In-Reply-To: <564A4DB1.4070507@web.de>

On Mon, Nov 16, 2015 at 1:42 PM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
> Am 16.11.2015 um 20:25 schrieb Stefan Beller:
>>
>> On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de>
>> wrote:
>>>
>>> Am 14.11.2015 um 01:10 schrieb Stefan Beller:
>>>>
>>>> Thanks for pointing out that we already have some kind of server
>>>> support.
>>>>
>>>> I wonder if we should add an additional way to make fetching only some
>>>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>>>> where superprojects point{ed} to", even if you force push a
>>>> superproject,
>>>> you want to want to only allow fetching all sha1s which exist in the
>>>> current
>>>> superprojects branch.)
>>>
>>>
>>>
>>> Me thinks the restrictions for sha1-fetching could come from the branches
>>> these sha1s are found in the upstream submodule: if the client is allowed
>>> to fetch a branch, it should be able to fetch any sha1 on that branch.
>>
>>
>> I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant
>> set, is not sufficient though.
>>
>> To fetch an arbitrary sha1, you would need to check if that sha1 is part
>> of the history of any advertised branch and then allow fetching
>> serverside,
>> which sounds like some work for the server, which we may want to avoid
>> by having smarter data structures there.
>>
>> Instead of having to search all branches for the requested sha1, we could
>> have
>> some sort of data structure to make it not an O(n) operation (n being
>> all objects
>> in the repo).
>>
>> Maybe I overestimate the work which needs to be done, because the server
>> has
>> bitmaps nowadays.
>>
>> Maybe a lazy reverse-pointer graph can be established on the serverside.
>> So I guess when we add the feature to fetch arbitrary sha1s, reachable
>> from
>> any branch, people using submodules will make use of the feature. (such as
>> with
>> git fetch --recurse --depth 1 or via a new `git fetch --recursive
>> --up-to-submodule-tip-only`)
>>
>> So once the server is asked for a certain sha1, it will do the
>> reachability check,
>> which takes some effort, but then stores the result in the form:
>> "If ${current tip sha} of ${branch} is reachable, so is requested $sha1."
>>
>> So when the next fetch request for $sha1 arrives, the server only needs to
>> check for ${current tip sha} to be part of $branch, which is expected to
>> be
>> a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap
>> could
>> just tell you or at least shorten the walk even more)
>> If the ${branch} has changed, the next evaluation for $sha1 can update
>> the cache,
>> such that the reverse lookup is not expensive on expectation.
>
>
> Makes sense, although I do not know enough about the server side to tell if
> it would need such an optimization or will cope with the load just fine.
>
> But even if we'd enable such a feature without having to set an extra config
> option, a submodule fetch asking for certain sha1s would have to fall back
> to a simple "fetch all" like we do now when the server doesn't support that
> for backwards compatibility. But maybe that's just obvious.

It's not obvious to me.  Say you run the command:

    git clone --recursive --depth=1 ...

Currently the depth argument for the submodules is ignored, because it
doesn't work out conceptually. This is because recursive fetches fetch the
branch tips and not the submodule-specified sha1.

If we want to make it work, we would need to think about, what we want
to achieve here.
depth is usually used to reduce the transmit time/bandwidth required.

So if the server tells us it's not allowing fetching of arbitrary
sha1s by its cryptic message:

    $ git fetch origin 6f963a895a97d720c909fcf4eb0544a272ef7c49:refs/heads/copy
    error: no such remote ref 6f963a895a97d720c909fcf4eb0544a272ef7c49

we have two choices, either error out with

    die(_("Server doesn't support cloning of arbitrary sha1s"))

or we could pretend as if we know how to fix it by cloning regularly
with the whole history
attached and then present a tightened history by shallowing after
cloning. But that would
defeat the whole point of the depth argument in the first place, the
time and bandwidth would
have been wasted. So instead I'd rather have the user make the choice.



>
>> I assume this will mostly be used with submodules, so only a few sha1s
>> need
>> this caching.
>
>
> I won't bet on that, some of the submodules at $DAYJOB are rather busy and
> see almost the same traffic as their superprojects ;-)

But do you update the superproject with each submodules commit?
(We plan to update the superproject in Gerrit with each submodule eventually,
so yeah that point is nuts.)

>
> Submodules should never be aware of their superproject. But a superproject
> does know its submodules, so I don't think the influence you describe here
> is a problem per se. It's just looking like a corner case to me, as in a
> lot of scenarios submodules do not live on the same server. And even if
> they do, a superproject has no canonical way of finding their submodule's
> repos (except for submodules that use relative URLs). So I'd rather like
> to see a generic solution first, before we think about adding an optimized
> version for certain setups later ;-)

ok. :)

>
> The only real itch I have with the "superproject declaring submodule sha1s
> fetchable on the server" approach is that it smells like a security problem.
> The access rights of superprojects are often different from those of the
> submodules it contains and this feels like a privilege escalation waiting
> to happen.

Yeah, I agree. It was one of my first ideas, probably not the best
idea on this topic.
Currently I am amazed by the reverse lazy caching if that check should ever be
a problem on the server side.

next prev parent reply	other threads:[~2015-11-16 22:56 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-12  9:37 [PATCH v2] add test to demonstrate that shallow recursive clones fail larsxschneider
2015-11-12 23:34 ` Stefan Beller
2015-11-15 12:43   ` Lars Schneider
2015-11-13  5:35 ` Jeff King
2015-11-13 18:41   ` Stefan Beller
2015-11-13 23:16     ` Stefan Beller
2015-11-13 23:38       ` Jeff King
2015-11-13 23:41         ` Jeff King
2015-11-14  0:10           ` Stefan Beller
2015-11-16 18:59             ` Jens Lehmann
2015-11-16 19:25               ` Stefan Beller
2015-11-16 21:42                 ` Jens Lehmann
2015-11-16 22:56                   ` Stefan Beller [this message]
2015-11-17 19:46                     ` Jens Lehmann
2015-11-17 20:04                       ` Stefan Beller
2015-11-17 20:39                         ` Jens Lehmann
2015-11-17 20:49                           ` Stefan Beller
2015-11-17 21:00                             ` Jens Lehmann
2015-11-17 20:17                 ` Duy Nguyen
2015-11-17 21:43                   ` Jeff King
2015-11-18 12:32                     ` Duy Nguyen
2015-11-18 21:11                       ` Jeff King
2015-11-18 21:36                         ` Stefan Beller
     [not found]                           ` <CAOE36qj2m4e3hw73-QoLbbpGv4RiyhBt_ou7eN4i4q8pF15rdA@mail.gmail.com>
2015-11-19 13:58                             ` Jeff King
2015-11-30 18:11       ` Junio C Hamano
2015-12-01  0:47         ` Stefan Beller
2015-12-01 12:15           ` Duy Nguyen
2015-12-01 21:47             ` Junio C Hamano
2015-12-01 22:22               ` Junio C Hamano
2015-12-03 20:03                 ` Stefan Beller
2015-12-04 15:32                   ` Junio C Hamano
2015-12-04 16:15                   ` Junio C Hamano
2015-12-04  7:52                 ` Duy Nguyen
2015-11-15 12:53   ` Lars Schneider
2015-11-17 21:34     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kZhn59GTs9LRnAoPrSwf43jjgi0_qWibrQ8gCvjas_MKA@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).