git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Stefan Beller <sbeller@google.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	Heiko Voigt <hvoigt@hvoigt.net>,
	Jens Lehmann <Jens.Lehmann@web.de>,
	Fredrik Gustafsson <iveqy@iveqy.com>,
	Leandro Lucarella <leandro.lucarella@sociomantic.com>
Subject: Re: [PATCHv2] push: change submodule default to check
Date: Wed, 24 Aug 2016 15:37:29 -0700	[thread overview]
Message-ID: <CAGZ79kYOBqQ0FF4J-+KbefSD8HRrUeMqpO27m_jprhm93aB+LA@mail.gmail.com> (raw)
In-Reply-To: <xmqqh9aaot49.fsf@gitster.mtv.corp.google.com>

On Wed, Aug 24, 2016 at 12:37 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
> This seems to be dropped from the list, probably due to no "To:"
> header in the original, which led to "no", "To-header" "on" and
> "input <" on YOUR recipient list, so I am quoting it in full without
> trimming.
>
>> On Wed, Aug 24, 2016 at 10:30:17AM -0700, Stefan Beller wrote:
>>
>>> When working with submodules, it is easy to forget to push the submodules.
>>> The setting 'check', which checks if any existing submodule is present on
>>> at least one remote of the submodule remotes, is designed to prevent this
>>> mistake.
>>>
>>> Flipping the default to check for submodules is safer than the current
>>> default of ignoring submodules while pushing.
>>
>> It is safer, and that's good. But it's also slower, because it requires
>> an extra traversal of all of the pushed commits. And now people will
>> have to pay the price even if they are not using submodules at all.
>>
>> For instance, try this from a checkout of linux.git:
>>
>>   for i in no check; do
>>       rm -rf dst.git
>>       git init --bare dst.git
>>       echo "==> Pushing with submodules=$i"
>>       time git push --recurse-submodules=$i dst.git HEAD
>>   done
>>
>> The second case takes 30-40 seconds longer. This is a full push of
>> history, so it's an extreme case[1], but it's still rather unfortunate.
>>
>> Can we tie this default to some sign that submodules are actually in
>> use? I don't think the presence of .gitmodules is perfect (because you
>> might be in a bare repo, for example, and have just fetched some other
>> history you are relaying), but it may be a good compromise.  I'm
>> envisioning something like "--recurse-submodules=auto-check" which
>> auto-detects common situations (e.g., presence of .gitmodules or
>> .git/modules checkouts) and enables "check", and then setting the
>> default to that in the long run.
>>
>> -Peff
>>
>> [1] Actually, there is another much worse case lurking there. Try:
>>
>>       git push --recurse-submodules=check --mirror dst.git
>>
>>     from the kernel. I didn't let it finish, but I'd estimate it would
>>     take on the order of 5 hours. The problem is that push feeds each
>>     updated ref tip to find_unpushed_submodules(), so we end up walking
>>     over the same history over and over.
>>
>>     I think it should feed all of the "before" and "after" ref tips it
>>     proposes to update to a _single_ revision traversal.
>
> That sounds massively ... broken.  So before even thinking about
> flipping it to default, this needs to be fixed first.
>

I agree. That sounds bad.

However having the --auto-check feels like papering over the
actual problem which to me sounds like a design problem.
However this may be a viable short term solution.

About the design issue:
We need to answer the question: Which submodule commits
are referenced by a given set of superproject commits.

This question is advancing a very similar question that we'd
have to ask in git-gc. In gc we would end up without having to
worry about a specific set, but rather the all reachable commits
of the superproject are in the given set.

So we could solve two issues at the same time if we had a quick
way to answer this question quickly.

And for that I considered introducing a ref in the submodule
(e.g.) refs/superproject/gc_protector, which records both the
superprojects commit as well as the submodule commit
in case the superproject changed the submodule pointer.

I have some rough patches for this, but I did not consider sending
that to the list as the patches are still rough and not quite
fully thought out on the design level.

--
Actually screw this. After an office discussion with Jonathan (cc'd):

1)  We need to have an "alternate refs namespace", which contains
    secondary data (as this can be derived from the primary data with
    lots of compute). So maybe we need a file similar to the packed refs
    file, "superproject-refs" that behaves as if it is storing refs,
but that file
    is safe to delete as we know all its contents can be recreated.

2) In this new alternate refs space, we can have
    refs/superproject/<sha1> in the submodule with the sha1 of the
    superproject that points to a commit which is a dirty merge of all
    submodule commits, that have no other parents.

e.g.
In the superproject we might have a history of:

  A <- B <- C

with A being origin/master and C our local master.

In the submodule we have:

  D <- E <- F
    \
      G

F is the master branch in the submodule, G is a dangling commit.

Now we could have the following git links:

    A -> D
    B -> G
    C -> F

Then the refs/superproject/<C>
would be a merge of F and G.

When pushing in the superproject (A..C) we would need to make sure
the submodule is updated as well, i.e. we'd look at
refs/superproject/<C> to know we need to advance the remote
submodule history to contain at least G and F.

Then we can do a standard rev walk starting at G and F downwards until we
hit a commit that is contained in remote/refs/*.

> Why wasn't it discovered that "push.recursesubmodules = check" is
> UNUSABLE since it was introduced is simply beyond me.

Maybe it is not compatible with your workflow as you push only once a day?
In a centralized server model you rarely exceed a few commits on push time,
but
    $ git rev-list a829490...c08db5a |wc -l
    23
    $ git rev-list f5df0f2...f3c0fa5 |wc -l
    61
    $ git rev-list 8f73021...de36215 |wc -l
    95

and some new branches, so you pushed around 200 commits.
I think this misbehaves strongly for typical maintainer work flows.

> I am mad (which is unusual for me, isn't it? -- I've seen somebody
> else saying "I am mad", but I do not think I ever said it here).

Please direct your anger at the design of submodules.
The assumption of the submodule being sort of 'independant' doesn't allow
for efficient data structures I would think. So maybe we need to teach the
submodules about the existence and state of the superproject?

/me ducks

  parent reply	other threads:[~2016-08-24 22:37 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-24 17:30 [PATCHv2] push: change submodule default to check Stefan Beller
2016-08-24 18:38 ` Junio C Hamano
     [not found] ` <20160824183112.ceekegpzavnbybxp@sigill.intra.peff.net>
2016-08-24 19:37   ` Junio C Hamano
2016-08-24 21:26     ` Junio C Hamano
2016-08-24 22:37     ` Stefan Beller [this message]
2016-08-24 23:01       ` Jeff King
2016-09-14 17:31         ` [PATCH 1/2] serialize collection of changed submodules Heiko Voigt
2016-09-14 22:30           ` Junio C Hamano
2016-09-15 12:10             ` [PATCH 3/2] batch check whether submodule needs pushing into one call Heiko Voigt
2016-09-15 21:08               ` Junio C Hamano
2016-09-16  9:40                 ` Heiko Voigt
2016-09-16 12:31                   ` Heiko Voigt
2016-09-16 18:13                     ` Junio C Hamano
2016-09-19 20:08                       ` Heiko Voigt
2016-09-16 17:59               ` Junio C Hamano
2016-09-19 19:58                 ` Heiko Voigt
2016-09-15 12:18             ` [PATCH 4/2] use actual start hashes for submodule push check instead of local refs Heiko Voigt
2016-09-16 17:27           ` [PATCH 1/2] serialize collection of changed submodules Junio C Hamano
2016-09-19 19:44             ` Heiko Voigt
2016-09-14 17:51         ` [PATCH 2/2] serialize collection of refs that contain submodule changes Heiko Voigt
2016-09-14 19:46           ` Heiko Voigt
2016-09-14 20:04             ` Stefan Beller
2016-09-16 17:47           ` Junio C Hamano
2016-09-19 19:51             ` Heiko Voigt
2016-09-19 20:09               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGZ79kYOBqQ0FF4J-+KbefSD8HRrUeMqpO27m_jprhm93aB+LA@mail.gmail.com \
    --to=sbeller@google.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hvoigt@hvoigt.net \
    --cc=iveqy@iveqy.com \
    --cc=leandro.lucarella@sociomantic.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).