git@vger.kernel.org mailing list mirror (one of many)
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jonathan Nieder <jrnieder@gmail.com>, "git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH] recursive submodules: detach HEAD from new state
Date: Tue, 25 Jul 2017 15:27:07 -0700
Message-ID: <CAGZ79kZdoktBRBuNxVk-zehZR3Z-egEPG81KQ9WqHTEtrm+5uw@mail.gmail.com> (raw)
In-Reply-To: <xmqqr2x5bhk7.fsf@gitster.mtv.corp.google.com>

On Mon, Jul 24, 2017 at 3:23 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>> Also, while I do agree with you that the problem exists, it is
>> unclear why this patch is a solution and not a hack that sweeps a
>> problem under the rug.
>>
>> It is unclear why this "silently detach HEAD without telling the
>> user" is a better solution than erroring out, for example [*1*].
>
> Just to avoid possible confusion; I am not claiming that it would be
> more (or less for that matter) sensible to error out than silently
> detaching HEAD, because I am not giving the reason to substantiate
> the claim and I do not have a strong opinion to favour which one (or
> another potential solution, if any).
>
> I am just saying that the patch that proposes a solution should be
> backed with an explanation why it is a good idea, especially when
> there are obvious alternatives that are not so clearly inferior.
>
> Thanks.

So I took a step back and wrote up different proposals for where
we want to go long term; see below. This should help us
figure out how to approach this bug correctly.
------



RFC: A new type of symbolic refs

A symbolic ref can currently only point at a ref or another symbolic ref.
This proposal showcases different scenarios for how this could change in
the future.



A: HEAD pointing at the superproject's index
============================================

Introduce a new symbolic ref that points at the gitlink entry in the
superproject's index. The format is

  "repo:" <superproject's gitdir> '\0' <gitlink-path> '\0'

Ref read operations
-------------------
  e.g. git log HEAD

Just like existing symrefs, the content of the ref will be read and followed.
On reading "repo:", the sha1 will be obtained by the equivalent of:

    git -C <superproject> ls-files -s <gitlink-path> | awk '{ print $2}'

In case of error
(superproject not found, gitlink path does not exist), the ref is broken and
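
A minimal shell sketch of this read path, assuming the superproject's
working tree and the gitlink path have already been extracted from the
"repo:" ref (parsing the ref itself is left out). The function name
resolve_gitlink_from_index is hypothetical; it only wraps the ls-files
pipeline above and fails when no gitlink entry is found, which is the
point at which a broken ref would be detected.

```shell
# Hypothetical helper: print the sha1 recorded for a gitlink in the
# superproject's index; return non-zero if no such entry exists
# (including the case where the superproject itself cannot be found).
resolve_gitlink_from_index () {
	superproject=$1
	gitlink_path=$2
	git -C "$superproject" ls-files -s -- "$gitlink_path" |
	awk '$1 == "160000" { print $2; found = 1 } END { exit !found }'
}
```

The mode check (160000) makes sure a regular file at the same path is
not mistaken for a gitlink.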

Ref write operations driven by the submodule, affecting symrefs
---------------------------------------------------------------
  e.g. git checkout <other branch> (in the submodule)

In this scenario only the HEAD is optionally attached to the superproject,
so we can rewrite the HEAD to be anything else, such as a branch just fine.
Once the HEAD is not pointing at the superproject any more, we'll leave the
submodule alone in operations driven by the superproject.

Ref write operations driven by the submodule, affecting target ref
------------------------------------------------------------------
  e.g. git commit, reset --hard, update-ref (in the submodule)

The HEAD stays the same, pointing at the superproject.
The gitlink is changed to the target sha1, using

  git -C <superproject> update-index --add \
      --cacheinfo 160000,$SHA1,<gitlink-path>

This changes the superproject's index, so a commit in the superproject
is then needed to record the new state.
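
As a rough sketch of that write path: the two helpers below are
hypothetical names, not existing git commands. The first records a new
sha1 for the gitlink, the second reports whether the superproject now
has staged changes waiting for a commit. It assumes the sha1 and paths
are supplied by the caller.

```shell
# Hypothetical helper: record a new submodule sha1 in the superproject's
# index, as a submodule-side "git commit" would under this proposal.
record_gitlink_in_index () {
	superproject=$1
	gitlink_path=$2
	sha1=$3
	git -C "$superproject" update-index --add \
		--cacheinfo "160000,$sha1,$gitlink_path"
}

# Hypothetical helper: succeed only if the superproject's index differs
# from its HEAD, i.e. a superproject commit is still needed.
superproject_needs_commit () {
	status=0
	git -C "$1" diff --cached --quiet || status=$?
	# "git diff --quiet" exits with 1 when there are differences
	test "$status" -eq 1
}
```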

Ref write operations driven by the superproject, changing the gitlink
---------------------------------------------------------------------
  e.g. git checkout <tree-ish>, git reset --hard (in the superproject)

This will change the gitlink in the superproject's index, such that the HEAD
in the submodule changes, which would trigger an update of the
submodule's working tree.

Consistency considerations (gc)
-------------------------------
  e.g. git gc --aggressive --prune=now

The repacking logic is already aware of a detached HEAD, such that
using this new symref mechanism would not generate problems as long as
we keep the HEAD attached to the superproject. However when commits/objects
are created while the HEAD is attached to the superproject and then HEAD
switches to a local branch, there are problems with the created objects
as they seem unreachable now.

This problem is not new as a superproject may record submodule objects
that are not reachable from any of the submodule branches. Such objects
fall prey to overzealous packing in the submodule.

This proposal however exposes this problem a lot more, as the submodule
has fewer needs for branches.




B: HEAD pointing at a superproject's branch
===========================================

Instead of pointing at the index of the superproject, we also
encode a branch name:

  "repo:" <superproject's gitdir> '\0' <gitlink-path> '\0' <branch> '\0'

Ref read operations
-------------------
  e.g. git log HEAD

This is similar to the case of pointing at the index, except that the reading
operation reads from the tip of the branch:

    git -C <superproject> ls-tree <superproject branch> -- \
        <gitlink-path> | awk '{ print $3}'
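
A minimal sketch of this lookup as a shell function, assuming the
superproject path, branch name and gitlink path have already been
extracted from the ref. The name resolve_gitlink_from_branch is
hypothetical; the function wraps the ls-tree pipeline above and returns
non-zero when the branch tip has no gitlink at that path.

```shell
# Hypothetical helper: print the sha1 recorded for a gitlink at the tip
# of a superproject branch; return non-zero if there is no such entry.
resolve_gitlink_from_branch () {
	superproject=$1
	branch=$2
	gitlink_path=$3
	git -C "$superproject" ls-tree "$branch" -- "$gitlink_path" |
	awk '$1 == "160000" && $2 == "commit" { print $3; found = 1 }
	     END { exit !found }'
}
```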

Ref write operations driven by the submodule, affecting symrefs
---------------------------------------------------------------
  e.g. git checkout <other branch> (in the submodule)

HEAD will be pointed at the local target branch, dropping the affiliation to
the superproject.

Ref write operations driven by the submodule, affecting target ref
------------------------------------------------------------------
  e.g. git commit, reset --hard, update-ref (in the submodule)

As we're pointing at the superproject's branch, this would have to create
a dummy(?) commit in the superproject that just changes the submodule
pointer in the superproject's branch, such that the operation of storing
a new sha1 for the submodule is equivalent to

  git -C <superproject> update-index --add \
      --cacheinfo 160000,$SHA1,<gitlink-path>
  git -C <superproject> commit -m "Update submodule"

This behavior in the superproject is similar to Gerrit's subscription model,
where superprojects are updated from the submodule.

Each operation in the submodule triggers a local superproject commit.

Ref write operations driven by the superproject, changing the gitlink
---------------------------------------------------------------------
  e.g. git merge, git pull (in the superproject)

This will change the gitlink in the superproject's index, such that the HEAD
in the submodule changes, which would trigger an update of the
submodule's working tree.

This would require a good merge strategy for submodules, i.e. on merge
the submodule would create a merge commit that is recorded in the
superprojects merge commit.

Consistency considerations (gc)
-------------------------------
  e.g. git gc --aggressive --prune=now

Unlike in the previous proposal, the repacking problem comes with a solution.
This is because any relevant commit in the submodule is recorded in the
superproject via a commit in a branch. Then even non-fast-forward histories
in the submodule can all be kept by walking the superproject and looking at
all gitlink entries of the submodule.



C: All branches are symbolic references to the superproject
===========================================================

Instead of having just HEAD pointed at a superproject, all(!) branches
in the submodule point at the superproject's branch of the same name.
Symbolic refs that resolve to a local sha1 are not allowed; any symbolic ref
ends up pointing at the superproject eventually.
For example, HEAD points at a submodule branch, which in turn points at
the superproject branch of the same name.

Ref read operations
-------------------
  e.g. git log

HEAD is read, which may be either (a) locally detached or (b) pointing at a
superproject branch. Resolve as in B.

Ref write operations driven by the submodule, affecting symrefs
---------------------------------------------------------------
  e.g. git checkout <other branch> (in the submodule)

As there is no other local branch, HEAD would point at the other submodule
branch, which then points at another branch in the superproject.

Ref write operations driven by the submodule, affecting target ref
------------------------------------------------------------------
  e.g. git commit, reset --hard, update-ref (in the submodule)

  same as B.

Ref write operations driven by the superproject, changing the gitlink
---------------------------------------------------------------------
  e.g. git merge, git pull (in the superproject)

  same as B.

Consistency considerations (gc)
-------------------------------
  e.g. git gc --aggressive --prune=now

As the superproject contains all knowledge, the gc starts with a
walk of all superproject branches, distilling the recorded gitlink entries,
and then starts walking in the submodule from all the recorded gitlinks
to create a pack.
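
The first half of that walk could look roughly like the sketch below.
The function name list_recorded_gitlinks is hypothetical, and the loop
is deliberately naive (one ls-tree per superproject commit); it only
illustrates which sha1s would serve as starting points for the walk in
the submodule.

```shell
# Hypothetical helper: list every sha1 recorded for a gitlink path in
# any commit reachable from a superproject branch, one per line.
list_recorded_gitlinks () {
	superproject=$1
	gitlink_path=$2
	git -C "$superproject" rev-list --branches |
	while read -r commit
	do
		git -C "$superproject" ls-tree "$commit" -- "$gitlink_path"
	done |
	awk '$1 == "160000" { print $3 }' |
	sort -u
}
```

The resulting list could then be fed to something like
"git rev-list --objects --stdin" in the submodule to enumerate what
has to be kept.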

gc and repacking would either be forbidden in the submodule or deflected
to the superproject.
