From: Pierre Habouzit <madcoder@debian.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jakub Narebski <jnareb@gmail.com>,
	Avery Pennarun <apenwarr@gmail.com>,
	Nigel Magnay <nigel.magnay@gmail.com>,
	Git ML <git@vger.kernel.org>
Subject: Re: git submodules
Date: Sun, 17 Aug 2008 22:13:36 +0200
Message-ID: <20080817201336.GA17148@artemis>
In-Reply-To: <7vfxptpr76.fsf@gitster.siamese.dyndns.org>

On Mon, Jul 28, 2008 at 10:41:17PM +0000, Junio C Hamano wrote:
> I suspect the use of it may help the use case Pierre proposes, but its
> main attractiveness as I understood it back when we discussed the facility
> was that you could switch branches between 'maint' that did not have a
> submodule at "path" back then, and 'master' that does have one now,
> without losing the submodule repository.  When checking out 'master' (and
> that would probably mean you would update 'git-submodule init' and
> 'git-submodule update' implementation), you would instanciate subdirectory
> "path", create "path/.git" that is such a regular file that that points at
> somewhere inside the $GIT_DIR of superproject (say ".git/submodules/foo").
> By storing refs and object store are all safely away in the superproject
> $GIT_DIR, you can now safely switch back to 'maint', which would involve
> making sure there is no local change that will be lost and then removing
> the "path" and everything underneath it.

gitfiles look nifty for sure, but having thought about it a bit, I
suspect we want something a bit more powerful, though still in the same
vein.

If we look at submodules, I quite believe that we would benefit a lot
from sharing the object directory across the supermodule and all its
submodules, for the following reasons:

  * It could make things like git-blame better: at work, it's common for
    us to move files across submodules: we have a stable library shared
    across projects, and we move into it C modules that have been staged
    for quite some time in the applications and are stable enough. It's
    a pity to lose history then, whereas git could really guess about
    the move if it saw through GITLINKS in the same object repository.
    GITLINKS are not very different from trees actually if you can look
    through them, it's just a matter of dereferencing twice instead of
    once.

  * For people that have made a subdirectory become a submodule (and
    it's also something that can happen) it's likely that lots of blobs
    are shared. It would end up taking less disk space.

  * It helps people fix situations where they pushed a supermodule
    with a substate that never existed without seeing it. Since the
    object store is shared, this commit that actually never existed will
    never ever be pruned, and at _least_ one person on earth will never
    lose it. With detached heads everywhere it's very easy to not name a
    detached head, and have it pruned at some point.

  * I _believe_ (just a hunch) that it helps to know whether it's
    possible to perform a "recursive" (wrt submodules)
    checkout/reset/$whatever, without having to spawn subcommands and
    similarly unpleasant stuff.


Though we would not like to have submodules suffer from reachability
issues after a prune in the supermodule. That means that all references
and reflogs of the submodules shall be accessible from the supermodule,
so that anything that could mess with the object store by removing
objects cannot remove interesting objects (which should actually confine
the affected code paths to very few places).


So what I've thought about is extending gitfiles so that they can define
where to find not only the git_dir but also the object store, moving the
current "faked symlink" approach to a less terse file that looks like a
standard git-config one. E.g.:

    [gitfile]
        git_dir = some/path/.git/submodules/foo/
        objects = some/path/.git/objects
        # why not other settings in the future ?

This part is quite easy and straightforward (and it can be done while
keeping backward compatibility with the current way gitfiles work).
What I can't decide is how we deal with the reflogs and references. I
see two choices. Assuming the submodules git_dir's are under the
supermodule $GIT_DIR/submodules/$name_of_the_super_module/:

  (1) we do nothing more.

  (2) we melt the submodules reflogs and references into the supermodule
      ones with appropriate namespacing. For example, for a submodule
      named "foo/bar" we would have its reflogs live in the supermodule
      .git/logs/submodules/foo/bar/logs/* and its references under
      .git/refs/submodules/foo/bar/refs/*. For that we add 'logs =' and
      'refs =' to the gitfile.
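
The namespacing in (2) can be sketched as a plain path mapping. This is
only an illustration of the layout described above; the function names
are hypothetical and paths are relative to the supermodule's $GIT_DIR.

```python
import posixpath


def superproject_ref_path(submodule_name, ref):
    """Map a submodule ref such as "refs/heads/master" into the
    supermodule's refs/submodules/<name>/ namespace."""
    if not ref.startswith("refs/"):
        raise ValueError("expected a ref under refs/: %r" % ref)
    return posixpath.join("refs/submodules", submodule_name, ref)


def superproject_reflog_path(submodule_name, ref):
    """Map the reflog of a submodule ref (normally logs/<ref>) into
    the supermodule's logs/submodules/<name>/logs/ namespace."""
    if not ref.startswith("refs/"):
        raise ValueError("expected a ref under refs/: %r" % ref)
    return posixpath.join("logs/submodules", submodule_name, "logs", ref)
```

For the submodule named "foo/bar", refs/heads/master would then live at
refs/submodules/foo/bar/refs/heads/master, and its reflog at
logs/submodules/foo/bar/logs/refs/heads/master.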

The first approach needs us to be able to somehow recurse under
.git/submodules to work out what in there looks like a git_dir, and to
teach reachability commands to look at the refs inside them. It can be
quite a lot of work, especially since we can have submodules inside
submodules at some point.
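
To give an idea of what that recursion involves, here is a rough Python
sketch. The "has a HEAD file and a refs/ directory" test is my own
guess at what "looks like a git_dir" could mean (with a shared object
store there may be no objects/ to check for), and the assumption that
nested submodules sit under <git_dir>/submodules/ is hypothetical too.

```python
import os


def looks_like_git_dir(path):
    # Heuristic only: a HEAD file plus a refs/ directory.
    return (os.path.isfile(os.path.join(path, "HEAD"))
            and os.path.isdir(os.path.join(path, "refs")))


def find_submodule_git_dirs(root):
    """Return every directory under root that looks like a git_dir,
    including those of submodules nested inside submodules."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if looks_like_git_dir(dirpath):
            found.append(dirpath)
            # Inside a git_dir, only descend into its own submodules/
            # directory; refs/, logs/ and friends cannot hold git_dirs.
            dirnames[:] = [d for d in dirnames if d == "submodules"]
    return sorted(found)
```

Every reachability command would then have to walk the refs of each
returned directory, which is where most of the work would be.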

The second approach has the net benefit that no pruning command has to
be modified to work. Many commands that we want to act on the global
repository will just work. Though, we have to fix a couple of issues
too:
  (1) be able to have a references directory that is not .git/refs. I
      looked at the source, I believe only 3 or 4 places in the C code
      have to be fixed for that to work, maybe a bit more in the shell
      commands, but that should be fairly easy.

  (2) it will break reference packing, because the submodules won't see
      the supermodule packed-refs file, and we will probably have to
      draft a new packed-refs thingy because of this issue. A simple
      possibility I see is to move packed-refs to refs/.packed-refs (as
      a reference name cannot start with a dot). Then teach
      git-pack-refs to generate a .packed-refs each time it crosses a
      'refs/' directory name, and finally teach git how to load those
      (and no, it won't require recursing into the whole refs/: we can
      mark in the toplevel refs/.packed-refs that it has submodules and
      that there is a .packed-refs under
      refs/submodules/foo/bar/refs/.packed-refs and avoid the costly
      recursion).

  (3) we will have to teach for_each_ref to skip "/submodules",
      which I believe is fairly easy.
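
As an illustration of (3), the filtering amounts to something like the
Python sketch below. The real change would be in the C for_each_ref
code; the refs/submodules/ prefix is taken from the namespace used in
the second approach, and the function name is hypothetical.

```python
SUBMODULE_PREFIX = "refs/submodules/"


def supermodule_own_refs(all_refs):
    """Yield only the refs that belong to the supermodule itself,
    skipping everything namespaced under refs/submodules/."""
    for ref in all_refs:
        if ref.startswith(SUBMODULE_PREFIX):
            continue
        yield ref
```

Pruning code would keep iterating over the unfiltered list, so that
submodule refs still protect their objects.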


I personally like the second approach better because it will scale
better (I believe) when people do submodules into submodules into
submodules. But I'm unsure if it's too disruptive or not.

So .. comments, thoughts, remarks are welcome :)



Note: with enhanced gitfiles, and making workdirs use gitfiles, with any
      of those approaches, it's easy to make workdirs that won't have
      the "if we repack we may lose things referenced from other
      workdir's reflogs" problem anymore. Which is kind of a nifty side
      effect ;)
-- 
·O·  Pierre Habouzit
··O                                                madcoder@debian.org
OOO                                                http://www.madism.org
