git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Eddy Petrișor" <eddy.petrisor@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org, Stefan Beller <sbeller@google.com>
Subject: Re: [RFC PATCH] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default
Date: Thu, 8 Mar 2018 21:29:25 +0200	[thread overview]
Message-ID: <CAK0XTWdY6rfKu_s8Am6dh9njcFHqSAz_54PhW6V09DuGwE-h0g@mail.gmail.com> (raw)
In-Reply-To: <20180306202149.GA160269@aiede.svl.corp.google.com>

2018-03-06 22:21 GMT+02:00 Jonathan Nieder <jrnieder@gmail.com>:
>
> (cc list snipped)
> Hi,
>
> Eddy Petrișor wrote:
>
> > Cc: [a lot of people]
>
> Can you say a little about how this cc list was created?  E.g. should
> "git send-email" get a feature to warn about long cc lists?


I did it as advised by the  documentation, git blame:

https://github.com/git/git/blob/master/Documentation/SubmittingPatches#L264

I was suprised that there is no specialized script that does this, as
it is for the kernel or u-boot, so I ran first

    git log --pretty=format:'%an <%ae>' git-submodule.sh | sort -u >
mail-list-submodule

then configure git to use that custom output and ignore the patch
since it was trying to convert every line of it into a email address:

    git config sendemail.ccCmd 'cat mail-list-submodule # '

Then "git send-email 0001..." did the rest.


>
> > Signed-off-by: Eddy Petrișor <eddy.petrisor@gmail.com>
> > ---
> >
> > There are projects such as llvm/clang which use several repositories, and they
> > might be forked for providing support for various features such as adding Redox
> > awareness to the toolchain. This typically means the superproject will use
> > another branch than master, occasionally even use an old commit from that
> > non-master branch.
> >
> > Combined with the fact that when incorporating such a hierachy of repositories
> > usually the user is interested in just the exact commit specified in the
> > submodule info, it follows that a desireable usecase is to be also able to
> > provide '--depth 1' to avoid waiting for ages for the clone operation to
> > finish.
>
> Some previous discussion is at
> https://public-inbox.org/git/CAGZ79ka6UXKyVLmdLg_M5-sB1x96g8FRzRZy=ENy5aJBQf9_QA@mail.gmail.com/.
>
> In theory this should be straightforward: Git protocol allows fetching
> an arbitrary commit, so "git submodule update" and similar commands
> could fetch the submodule commit by SHA-1 instead of by refname.  Poof!
> Problem gone.
>
> In practice, some complications:
>
>  - some servers do not permit fetch-by-sha1.  For example, github does
>    not permit it.  This is governed by the
>    uploadpack.allowReachableSHA1InWant / uploadpack.allowAnySHA1InWant
>    configuration items.


That is the problem I have faced since I tried to clone this repo
which has at lest 2 levels of submodules:
https://github.com/redox-os/redox

The problematic modules are:
rust @ https://github.com/redox-os/rust/tree/81c2bf4e51647295d3d92952dbb0464b460df0c3
- first level

and

rust/src/llvm @
https://github.com/rust-lang/llvm/tree/ba2edd794c7def715007931fcd1b4ce62aa711c8


>
>    That should be surmountable by making the behavior conditional, but
>    it's a complication.


Which forced me to try to fetch a branch on which that commit exists,
but also consider providing --depth for the submodule checkout,
effectively minimizing the amount of brought in history.

>
>
>  - When the user passes --depth=<num>, do they mean that to apply to
>    the superproject, to the submodules, or both?  Documentation should
>    make the behavior clear.


The supermodule clone already exists and that works OK; I was after
providing something like 'git submodule update --depth 20 --recursive'
or evn providing that 'depth' info via the .gitmodules parameters
since 'shallow' is already used somehow, yet that is a bool value, not
an integer, like depth expects:


eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules
--list | egrep '(depth|shallow)'
submodule.src/llvm.shallow=true
submodule.src/rt/hoedown.shallow=true
submodule.src/jemalloc.shallow=true
submodule.src/liblibc.shallow=true
submodule.src/doc/nomicon.shallow=true
submodule.src/tools/cargo.shallow=true
submodule.src/doc/reference.shallow=true
submodule.src/doc/book.shallow=true
submodule.src/tools/rls.shallow=true
submodule.src/libcompiler_builtins.shallow=true
submodule.src/tools/clippy.shallow=true
submodule.src/tools/rustfmt.shallow=true
submodule.src/tools/miri.shallow=true
submodule.src/dlmalloc.shallow=true
submodule.src/binaryen.shallow=true
submodule.src/doc/rust-by-example.shallow=true
submodule.src/llvm-emscripten.shallow=true
submodule.src/tools/rust-installer.shallow=true


> > Git submodule seems to be very stubborn and cloning master, although the
> > wrapper script and the gitmodules-helper could work together to clone directly
> > the branch specified in the .gitmodules file, if specified.
>
> This could make sense.  For the same reason as --depth in the
> superproject gives ambiguous signals about what should happen in
> submodules, --single-branch in the superproject gives ambiguous
> signals about what branch to fetch in submodules.


I never thought of providing the branch in the command line, since
that's not versionable info, but to provide git-submodule a hint in
the .gitsubmodule config about on which branch the hash in question
should be found, like this:

eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules
--list | egrep branch
submodule.src/llvm.branch=rust-llvm-release-4-0-1
submodule.src/rt/hoedown.branch=rust-2015-09-21-do-not-delete

>
> > Another wrinkle is that when the commit is not the tip of the branch, the depth
> > parameter should somehow be stored in the .gitmodules info, but any change in
> > the submodule will break the supermodule submodule depth info sooner or later,
> > which is definitly frigile.
>
> Hm, this seems to go in another direction.  I don't think we should
> store the depth parameter in the .gitmodules file, since different
> users are likely to have different preferences about what to make
> shallow.  If we make --depth easy enough to use at the superproject
> level then the user can specify what they want there.


Yes, but
a) if the commit is not on the tip of that branch,
b) the server does not support sha1 fetching and
c) it is not expected to typically work in the submodules from the
superproject checkout (e.g submodule is under another team's control)

the typical usecase could be that the code is question is read-only
and it should be OK to typically fetch a shallow clone of the
submodule (by default). The only compromise at that point is to either
choose a afe margin for the defalt depth in case the module branch is
being modified.

I solved a similar problem at work where having 2 shallow branches
which had to be merged; since when trying to merge two branches
without apprent common history git will do a merge commit, when a
simple ff woudl wor, I wanted to find the merge-base.
SInce the brances were shallow and they had to have a merge-base, the
solution was to was to simply increase the --depth parameter in
predifined steps (of 10 commits) and fetch both branches, check if the
'git merge-base'  command manged to find a commit; if not the depthc
parameter woudl be increased and the whole process would be started
until that merge-base was found.

I wonder if the same couldn't be done for the submodules that have
submodule.<dir>.shallow setting, making the depth parameter hardcoding
unnecessary, and guranteeing the submodule could be cloned at any
given time, as long as the branch specified by submodule.<dir>.branch
is not removed from the submodule repo.

>
> > I tried digging into this section of the code and debugging with bashdb to see
> > where --depth might fit, but I got stuck on the shell-to-helper interaction and
> > the details of the submodule implementation, so I want to lay out this first
> > patch as starting point for the discussion in the hope somebody else picks it
> > up or can provide some inputs. I have the feeling there are multiple code paths
> > that are being ran, depending on the moment (initial clone, submodule
> > recursive, post-clone update etc.) and I have a gut feeling there shouldn't be
> > any code duplication just because the operation is different.
> >
> > This first patch is only trying to use a non-master branch, I have some changes
> > for the --depth part, but I stopped working on it due to the "default depth"
> > issue above.
> >
> > Does any of this sound reasonable?
> > Is this patch idea usable or did I managed to touch the part of the code that
> > should not be touched?
>
> I agree with the goal.  As mentioned above, I'm not confident about
> the particular mechanism --- e.g. something using fetch-by-sha1 seems
> likely to be more intuitive.
>
> Today, the 'branch' setting in .gitmodules is only for "git submodule
> update --remote".  This patch would be a significant expansion in
> scope for it.  Hopefully others on the list can talk more about how
> that fits into various workflows and whether it would work out well.


Thanks for the input, it was helpful anyway.

Now that I think of it, I think the "fetch more and more from the
branch until we find the sha1" could work; in the worst case you would
end up going to the beginning of the repo in case the submodule
history was rewritten or if the branch specified in the supermodule
submodule.<path>.branch was removed.

I am aware this is not ellegant at all (what is the default "step"?),
yet the submodule part of the code already struggles in that area
because it clearly looks like an afterthought, anyway.

>
> Thanks and hope that helps,
> Jonathan


-- 
Eddy Petrișor

  reply	other threads:[~2018-03-08 19:29 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1520366804-28233-1-git-send-email-eddy.petrisor@gmail.com>
2018-03-06 20:21 ` [RFC PATCH] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default Jonathan Nieder
2018-03-08 19:29   ` Eddy Petrișor [this message]
2018-03-08 19:41     ` Eddy Petrișor
2018-03-16 21:33     ` Thomas Gummerer
2018-03-16 21:44       ` Eric Sunshine
     [not found]         ` <CAK0XTWcNySGgwgFgCPDnZ+m=2hfEgswHbJKYeu+LQfuQ9_=shQ@mail.gmail.com>
2018-03-17 19:11           ` Thomas Gummerer
2018-03-18  1:43             ` Eric Sunshine
2018-03-26 23:06 ` Stefan Beller
     [not found]   ` <CAK0XTWd7QGtVDwm8FDXejZfbgVH6-1NprGY0xxAnC33QH8aCCQ@mail.gmail.com>
2018-03-29 20:54     ` Fwd: " Eddy Petrișor
     [not found] ` <20180329225502.20112-1-eddy.petrisor@gmail.com>
2018-03-29 22:59   ` [RFC PATCH v2] " Eddy Petrișor
2018-04-03 22:20     ` [RFC PATCH v3 1/2] " Eddy Petrișor
2018-04-03 22:20     ` [RFC PATCH v3 2/2] t7406: add test for non-default branch in submodule Eddy Petrișor
2018-04-03 22:26       ` Eddy Petrișor
2018-04-04 18:36         ` Stefan Beller
2018-04-04 20:37           ` Eddy Petrișor
2018-04-04 21:41             ` Stefan Beller
2018-04-18 22:35               ` [RFC PATCH v4 1/9] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default Eddy Petrișor
2018-04-18 23:53                 ` Stefan Beller
2018-04-19  5:43                   ` Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 2/9] t7406: add test for non-default branch in submodule Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 3/9] fixup:t7406: use test_commit instead of echo/add/commit as suggested by Stefan Beller Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 4/9] fixup:t7404:use 'git -C' instead of cd .. && git Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 5/9] fixup:t7406:cleanup chained submodules after test is done Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 6/9] fixup:t7406:don't call init after add, is redundant Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 7/9] fixup:t7406:supr5 commit is done before submodules chaining Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 8/9] fixup:t7406:use super_w instead of the existing super Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 9/9] fixup:t7406:change branches in submodules after the link is done Eddy Petrișor
2018-04-19  6:07               ` [RFC PATCH v3 2/2] t7406: add test for non-default branch in submodule Eddy Petrișor
2018-04-19 17:52                 ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAK0XTWdY6rfKu_s8Am6dh9njcFHqSAz_54PhW6V09DuGwE-h0g@mail.gmail.com \
    --to=eddy.petrisor@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).