From mboxrd@z Thu Jan 1 00:00:00 1970 From: Heiko Voigt Subject: Re: Re: [RFD/PATCH] submodule doc: describe where we can configure them Date: Sun, 8 May 2016 23:54:08 +0200 Message-ID: <20160508215401.GA59077@book.hvoigt.net> References: <1462317985-640-1-git-send-email-sbeller@google.com> <20160503235652.GA395@google.com> <20160504150143.GA8575@sandbox> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jonathan Nieder , "git@vger.kernel.org" , Jens Lehmann To: Stefan Beller X-From: git-owner@vger.kernel.org Sun May 08 23:54:33 2016 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1azWeZ-0005qB-Ge for gcvg-git-2@plane.gmane.org; Sun, 08 May 2016 23:54:20 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750916AbcEHVyO (ORCPT ); Sun, 8 May 2016 17:54:14 -0400 Received: from smtprelay01.ispgateway.de ([80.67.31.39]:32980 "EHLO smtprelay01.ispgateway.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750828AbcEHVyN (ORCPT ); Sun, 8 May 2016 17:54:13 -0400 Received: from [95.90.185.119] (helo=book.hvoigt.net) by smtprelay01.ispgateway.de with esmtpsa (TLSv1.2:DHE-RSA-AES256-GCM-SHA384:256) (Exim 4.84) (envelope-from ) id 1azWeO-0003Ya-Ob; Sun, 08 May 2016 23:54:08 +0200 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) X-Df-Sender: aHZvaWd0QGh2b2lndC5uZXQ= Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Hi, On Wed, May 04, 2016 at 01:50:24PM -0700, Stefan Beller wrote: > On Wed, May 4, 2016 at 8:01 AM, Heiko Voigt wrote: > > On Tue, May 03, 2016 at 05:59:58PM -0700, Stefan Beller wrote: > >> On Tue, May 3, 2016 at 4:56 PM, Jonathan Nieder wrote: > >> > Stefan Beller wrote: > >> > > >> >> This is similar to the gitignore document, but doesn't mirror > >> >> the current situation. It is rather meant to start a discussion for > >> >> the right approach for mirroring repositories with submodules. > >> > > >> > Ooh. > >> > >> Thanks for writing such a detailed answer. :) > > > > BTW, here is a pointer to the discussion (and what I wrote down) about > > this from back in 2014: > > > > https://github.com/jlehmann/git-submod-enhancements/wiki/Ideas#special-ref-overriding-gitmodules-values > > Thanks for pointing at the prior discussion! > Although not much happened since then (code wise)? Yes, IIRC nothing happened code wise. It went so far that a rough consensus was made but nobody actually stepped in to scratch that itch. > > Jonathan you suggested to copy the content from a remote to > > .git/info//gitmodules locally. How would one get it to the > > remote side? It seems to me as if we would need to implement additional > > infrastructure to do this. Would it not be simpler if we just kept it on > > a ref on the local side as well? We already have the infrastructure to > > read those values from a ref. We only would need to add something to > > write them. Then a simple push, which could be aliased in form of a > > git-submodule subcommand, suffices to get the values to the remote. > > That is good idea! Thanks. > > That also solves issues when people clone from their working copy. > > > > I would like to think a little bit further about the conflict situation > > when two remotes are providing values. Configuring this looks to me like > > a nightmare for users. Maybe there is some sort of elegant solution? > > E.g. like we use the values from remote A during a fetch from A, the > > ones from B during a fetch from B and no values from a special ref in > > case there is no remote operation involved. Since the main goal is to > > support forking of submodules isn't there always a remote operation > > involved? > > Here is what I imagine > When B mirrors from A, B sets up this special ref for its repository, > e.g. refs/meta/submodule-B and have a symbolic ref pointing at that. > (e.g. SUBMODULE_CONFIG pointing at refs/meta/submodule-B, > which has a worktree which contains a .gitmodules files which > sets up > "submodule.baz.url = http://B/baz" > "submodule.relativeBase = http://A" > > That way anyone cloning from B would get > the superproject and the submodule baz from B while the > rest of the submodules are found at A. This sounds sensible. But my imagination of a conflict was in a different way. E.g. project A has a submodule B. And now A has a remote 1 where you publish and maybe another remote 2 where someone else (a colleague?) publishes. Which configuration do you use? Here the two remotes are independent instead of subsequent forks. In this case my solution would be to use the configuration branch from 1 for B when interacting with 1. I do not have completely checked whether we always have a remote at hand for such a resolution. > When C mirrors from A, they add another branch refs/meta/submodule-C, > which can either be a fork of refs/meta/submodule-B with some changes on > top of it or it can add a reference to refs/meta/submodule-B, i.e. the > configuration > would be: > > "submodule.baseConfig = refs/meta/submodule-B" > "submodule.foo.url = ssh://C/foo" > > and SUBMODULE_CONFIG would point at refs/meta/submodule-C. > > When cloning from C, the user would get > > * the superproject from C > * submodule foo from C > * submodule baz from B > * all other submodules from A > > By the inheriting property of the branch of B there are no conflicting values. > C could just overwrite submodule.baseConfig for example. So that means in the default case we create a chain of all previous forks embedded in repository database. I am not saying that this is necessarily a bad thing but I feel that it is a new property which we should think about. It helps because users will get updated values from sources that are in the chain. On the other hand it adds a lot of dependencies which are point of failures in case a remote disappears. I am undecided on this. I would prefer if we could let people play with it a little (maybe on pu?) and then decide if there are practical pitfalls with this. > > My suggested scheme above does not solve the currently quite typical use > > case where you might 'git fetch' without submodules first and then do > > the submodule fetches during a 'git submodule update'. On the other hand > > in the 'ideal future world' where submodules behave like "normal files" the > > fetch will be done during the superproject fetch so in that case we > > could solve such conflicts. > > > > The main thing which we could keep in mind is that we only allow certain > > values in such special refs. E.g. only the ones needed to support the > > fork workflow. BTW, do we actually need to change other values than the > > URL? Addtionally we ignore other values that are more related to the > > overall project structure. E.g. like submodule..ignore. > > Maybe we want to have a dedicated protocol field, eventually. > A,B,C may have different standards on what they use by default. > e.g. Use ssh at kernel.org, but http in a corporate mirror, because http is > the only protocol not blocked by firewall. So I could imagine that a > complete mirror of submodules with relative URLs wants to only replace > ssh by http. By this you mean 'submodule.relativeBase' that you introduced above right? Or something similar. These values I would still consider them URL'ish. But my question was more geared towards this direction: Are there other values than the ones used to assemble the URL that make sense to share? E.g.: Someone might want to fork a repository and might want to change the default set of submodules that are populated with 'git submodule update --init'. Is this something we should allow via these special refs or is this actually changing the project structure and should also be reflected in project history? IMO the latter is the case. Only things like the technical organisation (like the place where a repository can be found) justify to be outside of the repository IMO. A repository without submodules does have one collection of remote repository urls. To me adding proper fork support seems be the switch from one collection for one repository to many collections for many repositories. Since this one collection is already outside of the superproject it makes sense to do the same for the submodules. So my question reformulated could be: Are there more values we currently keep inside the repository for submodules that actually belong outside? A good indication could be that they are already outside in the superproject. I did not find any flaw in these statements yet, but maybe I am oversimplifying? Cheers Heiko