git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jens Lehmann <Jens.Lehmann@web.de>
To: Stefan Beller <sbeller@google.com>
Cc: Jeff King <peff@peff.net>,
	Lars Schneider <larsxschneider@gmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Duy Nguyen <pclouds@gmail.com>
Subject: Re: [PATCH v2] add test to demonstrate that shallow recursive clones fail
Date: Mon, 16 Nov 2015 22:42:09 +0100	[thread overview]
Message-ID: <564A4DB1.4070507@web.de> (raw)
In-Reply-To: <CAGZ79kbh_8oBRnQAmDzh3LANS6iGXNjLkYMLfuk9iysXghHQXg@mail.gmail.com>

Am 16.11.2015 um 20:25 schrieb Stefan Beller:
> On Mon, Nov 16, 2015 at 10:59 AM, Jens Lehmann <Jens.Lehmann@web.de> wrote:
>> Am 14.11.2015 um 01:10 schrieb Stefan Beller:
>>> Thanks for pointing out that we already have some kind of server support.
>>>
>>> I wonder if we should add an additional way to make fetching only some
>>> sha1s possible. ("I don't want users to fetch any sha1, but only those
>>> where superprojects point{ed} to", even if you force push a superproject,
>>> you want to want to only allow fetching all sha1s which exist in the
>>> current
>>> superprojects branch.)
>>
>>
>> Me thinks the restrictions for sha1-fetching could come from the branches
>> these sha1s are found in the upstream submodule: if the client is allowed
>> to fetch a branch, it should be able to fetch any sha1 on that branch.
>
> I'd agree on that. The server side even with uploadpack.allowTipSHA1InWant
> set, is not sufficient though.
>
> To fetch an arbitrary sha1, you would need to check if that sha1 is part
> of the history of any advertised branch and then allow fetching serverside,
> which sounds like some work for the server, which we may want to avoid
> by having smarter data structures there.
>
> Instead of having to search all branches for the requested sha1, we could have
> some sort of data structure to make it not an O(n) operation (n being
> all objects
> in the repo).
>
> Maybe I overestimate the work which needs to be done, because the server has
> bitmaps nowadays.
>
> Maybe a lazy reverse-pointer graph can be established on the serverside.
> So I guess when we add the feature to fetch arbitrary sha1s, reachable from
> any branch, people using submodules will make use of the feature. (such as with
> git fetch --recurse --depth 1 or via a new `git fetch --recursive
> --up-to-submodule-tip-only`)
>
> So once the server is asked for a certain sha1, it will do the
> reachability check,
> which takes some effort, but then stores the result in the form:
> "If ${current tip sha} of ${branch} is reachable, so is requested $sha1."
>
> So when the next fetch request for $sha1 arrives, the server only needs to
> check for ${current tip sha} to be part of $branch, which is expected to be
> a shorter revwalk from the tip. (Because it is nearer to the tip, a bitmap could
> just tell you or at least shorten the walk even more)
> If the ${branch} has changed, the next evaluation for $sha1 can update
> the cache,
> such that the reverse lookup is not expensive on expectation.

Makes sense, although I do not know enough about the server side to tell if
it would need such an optimization or will cope with the load just fine.

But even if we'd enable such a feature without having to set an extra config
option, a submodule fetch asking for certain sha1s would have to fall back
to a simple "fetch all" like we do now when the server doesn't support that
for backwards compatibility. But maybe that's just obvious.

> I assume this will mostly be used with submodules, so only a few sha1s need
> this caching.

I won't bet on that, some of the submodules at $DAYJOB are rather busy and
see almost the same traffic as their superprojects ;-)

>>> Maybe our emails crossed, but in the other mail I pointed out we could use
>>> some sort of hidden ref (refs/superprojects/*) for that, which are
>>> allowed to mark
>>> any sort of sha1, which are allowed in the superproject/submodule context
>>> to be fetched.
>>>
>>> So whenever you push to a superproject (a project that has a gitlink),
>>> we would need to check serverside if that submodule is at us and mark the
>>> correct sha1s in the submodule. Then you can disallow fetching most of the
>>> sha1s
>>> but still could have a correctly working submodule update mechanism.
>>
>>
>> And what happens if the submodule isn't at us? Involving the serverside of
>> a superproject in submodule fetching sounds wrong to me. Me thinks that
>> the upstream of the submodule should always control if a sha1 is allowed
>> to be fetched. Or did I understand you wrong?
>
> Yes and no.
> The serverside submodule repository should be responsible for the ultimate
> decision if you are allowed to fetch that sha1. But maybe on pushing the
> superproject, we can store a hint in the submodule, that this sha1 is legit.
> Although I may be missguided in my thinking here as the superproject
> should have no influence on the submodule.

Submodules should never be aware of their superproject. But a superproject
does know its submodules, so I don't think the influence you describe here
is a problem per se. It's just looking like a corner case to me, as in a
lot of scenarios submodules do not live on the same server. And even if
they do, a superproject has no canonical way of finding their submodule's
repos (except for submodules that use relative URLs). So I'd rather like
to see a generic solution first, before we think about adding an optimized
version for certain setups later ;-)

The only real itch I have with the "superproject declaring submodule sha1s
fetchable on the server" approach is that it smells like a security problem.
The access rights of superprojects are often different from those of the
submodules it contains and this feels like a privilege escalation waiting
to happen.

  reply	other threads:[~2015-11-16 21:42 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-12  9:37 [PATCH v2] add test to demonstrate that shallow recursive clones fail larsxschneider
2015-11-12 23:34 ` Stefan Beller
2015-11-15 12:43   ` Lars Schneider
2015-11-13  5:35 ` Jeff King
2015-11-13 18:41   ` Stefan Beller
2015-11-13 23:16     ` Stefan Beller
2015-11-13 23:38       ` Jeff King
2015-11-13 23:41         ` Jeff King
2015-11-14  0:10           ` Stefan Beller
2015-11-16 18:59             ` Jens Lehmann
2015-11-16 19:25               ` Stefan Beller
2015-11-16 21:42                 ` Jens Lehmann [this message]
2015-11-16 22:56                   ` Stefan Beller
2015-11-17 19:46                     ` Jens Lehmann
2015-11-17 20:04                       ` Stefan Beller
2015-11-17 20:39                         ` Jens Lehmann
2015-11-17 20:49                           ` Stefan Beller
2015-11-17 21:00                             ` Jens Lehmann
2015-11-17 20:17                 ` Duy Nguyen
2015-11-17 21:43                   ` Jeff King
2015-11-18 12:32                     ` Duy Nguyen
2015-11-18 21:11                       ` Jeff King
2015-11-18 21:36                         ` Stefan Beller
     [not found]                           ` <CAOE36qj2m4e3hw73-QoLbbpGv4RiyhBt_ou7eN4i4q8pF15rdA@mail.gmail.com>
2015-11-19 13:58                             ` Jeff King
2015-11-30 18:11       ` Junio C Hamano
2015-12-01  0:47         ` Stefan Beller
2015-12-01 12:15           ` Duy Nguyen
2015-12-01 21:47             ` Junio C Hamano
2015-12-01 22:22               ` Junio C Hamano
2015-12-03 20:03                 ` Stefan Beller
2015-12-04 15:32                   ` Junio C Hamano
2015-12-04 16:15                   ` Junio C Hamano
2015-12-04  7:52                 ` Duy Nguyen
2015-11-15 12:53   ` Lars Schneider
2015-11-17 21:34     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=564A4DB1.4070507@web.de \
    --to=jens.lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=larsxschneider@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).