From: Nick Townsend <nick.townsend@mac.com>
To: git@vger.kernel.org
Subject: Fwd: [PATCH] submodule recursion in git-archive
Date: Mon, 02 Dec 2013 16:03:37 -0800
Message-ID: <D8D13DC5-0E93-4900-A738-A4A6700BC92F@mac.com>
In-Reply-To: 3651F1C2-741E-4170-9468-0EF07F120CB9@mac.com
Begin forwarded message:
> From: Nick Townsend <nick.townsend@mac.com>
> Subject: Re: [PATCH] submodule recursion in git-archive
> Date: 2 December 2013 16:00:50 GMT-8
> To: Junio C Hamano <gitster@pobox.com>
> Cc: René Scharfe <l.s.r@web.de>, Jens Lehmann <Jens.Lehmann@web.de>, git@vger.kernel.org, Jeff King <peff@peff.net>
>
>
> On 27 Nov 2013, at 11:43, Junio C Hamano <gitster@pobox.com> wrote:
>
>> Nick Townsend <nick.townsend@mac.com> writes:
>>
>>> On 26 Nov 2013, at 14:18, Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>>> Even if the code is run inside a repository with a working tree,
>>>> when producing a tarball out of an ancient commit that had a
>>>> submodule not at its current location, --recurse-submodules option
>>>> should do the right thing, so asking for working tree location of
>>>> that submodule to find its repository is wrong, I think. It may
>>>> happen to find one if the archived revision is close enough to what
>>>> is currently checked out, but that may not necessarily be the case.
>>>>
>>>> At that point when the code discovers an S_ISGITLINK entry, it
>>>> should have both a pathname to the submodule relative to the
>>>> toplevel and the commit object name bound to that submodule
>>>> location. What it should do, when it does not find the repository
>>>> at the given path (maybe because there is no working tree, or the
>>>> submodule directory has moved over time) is roughly:
>>>>
>>>> - Read from .gitmodules at the top-level from the tree it is
>>>>   creating the tarball out of;
>>>>
>>>> - Find "submodule.$name.path" entry that records that path to the
>>>>   submodule; and then
>>>>
>>>> - Using that $name, find the stashed-away location of the submodule
>>>>   repository in $GIT_DIR/modules/$name.
>>>>
>>>> or something like that.
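
The three steps above can be sketched roughly as follows. This is only an illustration in Python, not how git implements it: `parse_gitmodules`, `name_for_path` and `stashed_gitdir` are hypothetical helper names, the parser handles only the simple `[submodule "name"]` / `key = value` form, and the .gitmodules text is assumed to have already been read from the tree being archived (for example with `git show <tree-ish>:.gitmodules`).

```python
import posixpath
import re

# Hypothetical, simplified patterns; real git config syntax has more cases.
SECTION_RE = re.compile(r'^\[submodule\s+"(?P<name>[^"]+)"\]\s*$')
KEYVAL_RE = re.compile(r'^(?P<key>[A-Za-z][A-Za-z0-9-]*)\s*=\s*(?P<value>.*)$')


def parse_gitmodules(text):
    """Parse .gitmodules text into {submodule name: {key: value}}."""
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        section = SECTION_RE.match(line)
        if section:
            current = modules.setdefault(section.group("name"), {})
            continue
        if line.startswith("["):
            current = None  # some other section; ignored in this sketch
            continue
        keyval = KEYVAL_RE.match(line)
        if keyval and current is not None:
            # Config variable names are case-insensitive: store them lower-cased.
            current[keyval.group("key").lower()] = keyval.group("value").strip()
    return modules


def name_for_path(modules, path):
    """Return the submodule name whose recorded path equals `path`, or None."""
    wanted = path.rstrip("/")
    if not wanted:
        return None
    for name, settings in modules.items():
        if settings.get("path", "").rstrip("/") == wanted:
            return name
    return None


def stashed_gitdir(git_dir, name):
    """Build the $GIT_DIR/modules/$name location for a submodule name."""
    return posixpath.join(git_dir, "modules", name)


if __name__ == "__main__":
    text = '[submodule "bar"]\n\tURL = ../bar.git\n\tpath = barDir\n'
    modules = parse_gitmodules(text)
    name = name_for_path(modules, "barDir/")
    print(name, stashed_gitdir(".git", name))
```

The lookup deliberately goes path, then name, then directory, because the name is the only part that stays stable when the submodule is moved.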
>>>>
>>>> This is a related tangent, but when used in a repository that people
>>>> often use as their remote, the repository discovery may have to
>>>> interact with the relative URL. People often ship .gitmodules with
>>>>
>>>>     [submodule "bar"]
>>>>         URL = ../bar.git
>>>>         path = barDir
>>>>
>>>> for a top-level project "foo" that can be cloned thusly:
>>>>
>>>>     git clone git://site.xz/foo.git
>>>>
>>>> and host bar.git to be clonable with
>>>>
>>>>     git clone git://site.xz/bar.git barDir/
>>>>
>>>> inside the working tree of the foo project. In such a case, when
>>>> "archive --recurse-submodules" is running, it would find the
>>>> repository for the "bar" submodule at "../bar.git", I would think.
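
To make the relative-URL case concrete, here is a minimal sketch of resolving "../bar.git" against the location of the top-level repository. `resolve_submodule_url` is a hypothetical helper, not a git function, and it covers only slash-separated URLs and paths; git's own resolution handles more forms (scp-style `host:path` addresses, for instance) that this sketch ignores.

```python
def resolve_submodule_url(superproject_url, submodule_url):
    """Resolve a "./" or "../" submodule URL against the superproject's URL.

    URLs that do not start with "./" or "../" are returned unchanged.
    """
    if not submodule_url.startswith(("./", "../")):
        return submodule_url

    base = superproject_url.rstrip("/")
    # Never strip components past the "scheme://" prefix, if there is one.
    scheme_end = base.find("://")
    root = scheme_end + 3 if scheme_end != -1 else 0

    rest = submodule_url
    while True:
        if rest.startswith("./"):
            rest = rest[2:]
        elif rest.startswith("../"):
            rest = rest[3:]
            cut = base.rfind("/")
            if cut < root:
                raise ValueError("relative URL climbs above the base URL")
            base = base[:cut]  # drop the last path component
        else:
            break
    return base + "/" + rest


if __name__ == "__main__":
    print(resolve_submodule_url("git://site.xz/foo.git", "../bar.git"))
    print(resolve_submodule_url("/srv/git/foo.git", "../bar.git"))
```

The same function works for a filesystem location of foo.git on the hosting side, which is the situation "archive --recurse-submodules" would face there.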
>>>>
>>>> So this part needs a bit more thought, I am afraid.
>>>
>>> I see that there is a lot of potential complexity around setting up a submodule:
>>
>> No question about it.
>>
>>> * The .gitmodules file can be dirty (easy to flag, but should we
>>> allow archive to proceed?)
>>
>> As we are discussing "archive", which takes a tree object from the
>> top-level project that is recorded in the object database, the
>> information _about_ the submodule in question should come from the
>> given tree being archived. There is no reason for the .gitmodules
>> file that happens to be sitting in the working tree of the top-level
>> project to be involved in the decision, so its dirtiness should not
>> matter, I think. If the tree being archived has a submodule whose
>> name is "kernel" at path "linux/" (relative to the top-level
>> project), its repository should be at .git/modules/kernel in the
>> layout recent git-submodule prepares, and we should find that
>> path-and-name mapping from .gitmodules recorded in that tree object
>> we are archiving. The version that happens to be checked out to the
>> working tree may have moved the submodule to a new path "linux-3.0/"
>> and "linux-3.0/.git" may have "gitdir: .git/modules/kernel" in it,
>> but when archiving a tree that has the submodule at "linux/", it
>> would not help---we would not know to look at "linux-3.0/.git" to
>> learn that information anyway because .gitmodules in the working
>> tree would say that the submodule at path "linux-3.0/" is with name
>> "kernel", and would not tell us anything about "linux/".
>>
>>> * Users can mess with settings both prior to git submodule init
>>> and before git submodule update.
>>
>> I think this is irrelevant for exactly the same reason as above.
>>
>> What makes this trickier, however, is how to deal with an old-style
>> repository, where the submodule repositories are embedded in the
>> working tree that happens to be checked out. In that case, we may
>> have to read .gitmodules from two places, i.e.
>>
>> (1) We are archiving a tree with a submodule at "linux/";
>>
>> (2) We read .gitmodules from that tree and learn that the submodule
>>     has name "kernel";
>>
>> (3) There is no ".git/modules/kernel" because the repository uses
>>     the old layout (if the user never was interested in this
>>     submodule, .git/modules/kernel may also be missing, and we
>>     should tell these two cases apart by checking .git/config to
>>     see if a corresponding entry for the "kernel" submodule exists
>>     there);
>>
>> (4) In a repository that uses the old layout, there must be the
>>     repository somewhere embedded in the current working tree (this
>>     inability to remove is why we use the new layout these days).
>>     We can learn where it is by looking at .gitmodules in the
>>     working tree---map the name "kernel" we learned earlier, and
>>     map it to the current path ("linux-3.0/" if you have been
>>     following this example so far).
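
Steps (1) to (4) amount to a small decision procedure, which might be sketched as below. Everything here is an assumption made for illustration: `locate_submodule_repo` is a hypothetical name, the two .gitmodules files are assumed to be already parsed into `{path: name}` dictionaries with trailing slashes stripped, `configured_names` stands in for the submodule entries found in .git/config, and `exists` is an injected predicate so the sketch can be exercised without a real repository.

```python
import posixpath


def locate_submodule_repo(path_in_tree, tree_modules, worktree_modules,
                          git_dir, configured_names, exists):
    """Return the likely location of a submodule repository, or None.

    path_in_tree     -- submodule path in the tree being archived
    tree_modules     -- {path: name} from .gitmodules in that tree
    worktree_modules -- {path: name} from .gitmodules in the working tree
    git_dir          -- the top-level $GIT_DIR, e.g. ".git"
    configured_names -- submodule names with an entry in .git/config
    exists           -- callable(path) -> bool
    """
    # (1)+(2): map the archived path to the submodule's name.
    name = tree_modules.get(path_in_tree.rstrip("/"))
    if name is None:
        return None

    # New layout: the repository is stashed under $GIT_DIR/modules/$name.
    stashed = posixpath.join(git_dir, "modules", name)
    if exists(stashed):
        return stashed

    # (3): no stashed repository. Without an entry in .git/config the user
    # never was interested in this submodule, so there is nothing to find.
    if name not in configured_names:
        return None

    # (4): old layout. Map the name back to its current path using the
    # working tree's .gitmodules and look for the embedded repository.
    for current_path, current_name in worktree_modules.items():
        if current_name == name:
            embedded = posixpath.join(current_path.rstrip("/"), ".git")
            if exists(embedded):
                return embedded
    return None


if __name__ == "__main__":
    found = locate_submodule_repo(
        "linux/", {"linux": "kernel"}, {"linux-3.0": "kernel"},
        ".git", {"kernel"}, lambda p: p == "linux-3.0/.git")
    print(found)
```

The returned embedded path is relative to the top of the working tree; a real implementation would also have to cope with a ".git" that is a gitfile rather than a directory.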
>>
>> And in that fallback context, I would say that reading from a dirty
>> (or "messed with by the user") .gitmodules is the right thing to
>> do. Perhaps the user may be in the process of moving the submodule
>> in his working tree with
>>
>>     $ mv linux-3.0 linux-3.2
>>     $ git config -f .gitmodules submodule.kernel.path linux-3.2
>>
>> but hasn't committed the change yet.
>>
>>> For those reasons I deliberately decided not to reproduce the
>>> above logic all by myself.
>>
>> As I already hinted, I agree that the "how to find the location of
>> submodule repository, given a particular tree in the top-level
>> project the submodule belongs to and the path to the submodule in
>> question" deserves a separate thread to discuss with area experts.
>
> As per my email to Heiko on this thread, I’m happy to start such
> a discussion - I’ll use your notes as a starting point. I’m much more comfortable
> using a wiki for this - is this common or should I start a new mail thread
> with RFC in the title or similar?
>
> I did complete my work on my version of git-archive (for internal use) and added some regression tests
> for current behaviour. Also the add_submodule_odb patch should IMHO be incorporated
> anyway. I’ll resubmit those two for consideration in a new thread.
>
> Kind Regards
> Nick Townsend
>