git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Eddy Petrișor" <eddy.petrisor@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org, Stefan Beller <sbeller@google.com>
Subject: Re: [RFC PATCH] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default
Date: Thu, 8 Mar 2018 21:41:24 +0200	[thread overview]
Message-ID: <CAK0XTWepPHe__qO6foFrh8-5mVJgWFzUsSw_4GZ89MFOmTrXbA@mail.gmail.com> (raw)
In-Reply-To: <CAK0XTWdY6rfKu_s8Am6dh9njcFHqSAz_54PhW6V09DuGwE-h0g@mail.gmail.com>

2018-03-08 21:29 GMT+02:00 Eddy Petrișor <eddy.petrisor@gmail.com>:
> 2018-03-06 22:21 GMT+02:00 Jonathan Nieder <jrnieder@gmail.com>:
>>
>> (cc list snipped)
>> Hi,
>>
>> Eddy Petrișor wrote:
>>
>> > Cc: [a lot of people]
>>
>> Can you say a little about how this cc list was created?  E.g. should
>> "git send-email" get a feature to warn about long cc lists?
>
>
> I did it as advised by the  documentation, git blame:
>
> https://github.com/git/git/blob/master/Documentation/SubmittingPatches#L264
>
> I was suprised that there is no specialized script that does this, as
> it is for the kernel or u-boot, so I ran first
>
>     git log --pretty=format:'%an <%ae>' git-submodule.sh | sort -u >
> mail-list-submodule
>
> then configure git to use that custom output and ignore the patch
> since it was trying to convert every line of it into a email address:
>
>     git config sendemail.ccCmd 'cat mail-list-submodule # '
>
> Then "git send-email 0001..." did the rest.
>
>
>>
>> > Signed-off-by: Eddy Petrișor <eddy.petrisor@gmail.com>
>> > ---
>> >
>> > There are projects such as llvm/clang which use several repositories, and they
>> > might be forked for providing support for various features such as adding Redox
>> > awareness to the toolchain. This typically means the superproject will use
>> > another branch than master, occasionally even use an old commit from that
>> > non-master branch.
>> >
>> > Combined with the fact that when incorporating such a hierachy of repositories
>> > usually the user is interested in just the exact commit specified in the
>> > submodule info, it follows that a desireable usecase is to be also able to
>> > provide '--depth 1' to avoid waiting for ages for the clone operation to
>> > finish.
>>
>> Some previous discussion is at
>> https://public-inbox.org/git/CAGZ79ka6UXKyVLmdLg_M5-sB1x96g8FRzRZy=ENy5aJBQf9_QA@mail.gmail.com/.
>>
>> In theory this should be straightforward: Git protocol allows fetching
>> an arbitrary commit, so "git submodule update" and similar commands
>> could fetch the submodule commit by SHA-1 instead of by refname.  Poof!
>> Problem gone.
>>
>> In practice, some complications:
>>
>>  - some servers do not permit fetch-by-sha1.  For example, github does
>>    not permit it.  This is governed by the
>>    uploadpack.allowReachableSHA1InWant / uploadpack.allowAnySHA1InWant
>>    configuration items.
>
>
> That is the problem I have faced since I tried to clone this repo
> which has at lest 2 levels of submodules:
> https://github.com/redox-os/redox
>
> The problematic modules are:
> rust @ https://github.com/redox-os/rust/tree/81c2bf4e51647295d3d92952dbb0464b460df0c3
> - first level
>
> and
>
> rust/src/llvm @
> https://github.com/rust-lang/llvm/tree/ba2edd794c7def715007931fcd1b4ce62aa711c8
>
>
>>
>>    That should be surmountable by making the behavior conditional, but
>>    it's a complication.
>
>
> Which forced me to try to fetch a branch on which that commit exists,
> but also consider providing --depth for the submodule checkout,
> effectively minimizing the amount of brought in history.
>
>>
>>
>>  - When the user passes --depth=<num>, do they mean that to apply to
>>    the superproject, to the submodules, or both?  Documentation should
>>    make the behavior clear.
>
>
> The supermodule clone already exists and that works OK; I was after
> providing something like 'git submodule update --depth 20 --recursive'
> or evn providing that 'depth' info via the .gitmodules parameters
> since 'shallow' is already used somehow, yet that is a bool value, not
> an integer, like depth expects:
>
>
> eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules
> --list | egrep '(depth|shallow)'
> submodule.src/llvm.shallow=true
> submodule.src/rt/hoedown.shallow=true
> submodule.src/jemalloc.shallow=true
> submodule.src/liblibc.shallow=true
> submodule.src/doc/nomicon.shallow=true
> submodule.src/tools/cargo.shallow=true
> submodule.src/doc/reference.shallow=true
> submodule.src/doc/book.shallow=true
> submodule.src/tools/rls.shallow=true
> submodule.src/libcompiler_builtins.shallow=true
> submodule.src/tools/clippy.shallow=true
> submodule.src/tools/rustfmt.shallow=true
> submodule.src/tools/miri.shallow=true
> submodule.src/dlmalloc.shallow=true
> submodule.src/binaryen.shallow=true
> submodule.src/doc/rust-by-example.shallow=true
> submodule.src/llvm-emscripten.shallow=true
> submodule.src/tools/rust-installer.shallow=true
>
>
>> > Git submodule seems to be very stubborn and cloning master, although the
>> > wrapper script and the gitmodules-helper could work together to clone directly
>> > the branch specified in the .gitmodules file, if specified.
>>
>> This could make sense.  For the same reason as --depth in the
>> superproject gives ambiguous signals about what should happen in
>> submodules, --single-branch in the superproject gives ambiguous
>> signals about what branch to fetch in submodules.
>
>
> I never thought of providing the branch in the command line, since
> that's not versionable info, but to provide git-submodule a hint in
> the .gitsubmodule config about on which branch the hash in question
> should be found, like this:
>
> eddy@feodora:~/usr/src/redox/rust-eddy$ git config -f .gitmodules
> --list | egrep branch
> submodule.src/llvm.branch=rust-llvm-release-4-0-1
> submodule.src/rt/hoedown.branch=rust-2015-09-21-do-not-delete
>
>>
>> > Another wrinkle is that when the commit is not the tip of the branch, the depth
>> > parameter should somehow be stored in the .gitmodules info, but any change in
>> > the submodule will break the supermodule submodule depth info sooner or later,
>> > which is definitly frigile.
>>
>> Hm, this seems to go in another direction.  I don't think we should
>> store the depth parameter in the .gitmodules file, since different
>> users are likely to have different preferences about what to make
>> shallow.  If we make --depth easy enough to use at the superproject
>> level then the user can specify what they want there.
>
>
> Yes, but
> a) if the commit is not on the tip of that branch,
> b) the server does not support sha1 fetching and
> c) it is not expected to typically work in the submodules from the
> superproject checkout (e.g submodule is under another team's control)
>
> the typical usecase could be that the code is question is read-only
> and it should be OK to typically fetch a shallow clone of the
> submodule (by default). The only compromise at that point is to either
> choose a afe margin for the defalt depth in case the module branch is
> being modified.
>
> I solved a similar problem at work where having 2 shallow branches
> which had to be merged; since when trying to merge two branches
> without apprent common history git will do a merge commit, when a
> simple ff woudl wor, I wanted to find the merge-base.
> SInce the brances were shallow and they had to have a merge-base, the
> solution was to was to simply increase the --depth parameter in
> predifined steps (of 10 commits) and fetch both branches, check if the
> 'git merge-base'  command manged to find a commit; if not the depthc
> parameter woudl be increased and the whole process would be started
> until that merge-base was found.

Another optimization could be for submodule.<path>.shallow to imply
--single-branch, too.

> I wonder if the same couldn't be done for the submodules that have
> submodule.<dir>.shallow setting, making the depth parameter hardcoding
> unnecessary, and guranteeing the submodule could be cloned at any
> given time, as long as the branch specified by submodule.<dir>.branch
> is not removed from the submodule repo.
>
>>
>> > I tried digging into this section of the code and debugging with bashdb to see
>> > where --depth might fit, but I got stuck on the shell-to-helper interaction and
>> > the details of the submodule implementation, so I want to lay out this first
>> > patch as starting point for the discussion in the hope somebody else picks it
>> > up or can provide some inputs. I have the feeling there are multiple code paths
>> > that are being ran, depending on the moment (initial clone, submodule
>> > recursive, post-clone update etc.) and I have a gut feeling there shouldn't be
>> > any code duplication just because the operation is different.
>> >
>> > This first patch is only trying to use a non-master branch, I have some changes
>> > for the --depth part, but I stopped working on it due to the "default depth"
>> > issue above.
>> >
>> > Does any of this sound reasonable?
>> > Is this patch idea usable or did I managed to touch the part of the code that
>> > should not be touched?
>>
>> I agree with the goal.  As mentioned above, I'm not confident about
>> the particular mechanism --- e.g. something using fetch-by-sha1 seems
>> likely to be more intuitive.
>>
>> Today, the 'branch' setting in .gitmodules is only for "git submodule
>> update --remote".  This patch would be a significant expansion in
>> scope for it.  Hopefully others on the list can talk more about how
>> that fits into various workflows and whether it would work out well.
>
>
> Thanks for the input, it was helpful anyway.
>
> Now that I think of it, I think the "fetch more and more from the
> branch until we find the sha1" could work; in the worst case you would
> end up going to the beginning of the repo in case the submodule
> history was rewritten or if the branch specified in the supermodule
> submodule.<path>.branch was removed.

Actually, the branch removal in the submodule would lead to a failure
to fetch from the get go, so the worst case is if history is rewritten
because the sha1 would be missing fromt he branch.
Antoher non-fatal perfomance issue would be if the desired commit is
very close to the initial commit of the repo.

> I am aware this is not ellegant at all (what is the default "step"?),
> yet the submodule part of the code already struggles in that area
> because it clearly looks like an afterthought, anyway.


-- 
Eddy Petrișor

  reply	other threads:[~2018-03-08 19:47 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1520366804-28233-1-git-send-email-eddy.petrisor@gmail.com>
2018-03-06 20:21 ` [RFC PATCH] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default Jonathan Nieder
2018-03-08 19:29   ` Eddy Petrișor
2018-03-08 19:41     ` Eddy Petrișor [this message]
2018-03-16 21:33     ` Thomas Gummerer
2018-03-16 21:44       ` Eric Sunshine
     [not found]         ` <CAK0XTWcNySGgwgFgCPDnZ+m=2hfEgswHbJKYeu+LQfuQ9_=shQ@mail.gmail.com>
2018-03-17 19:11           ` Thomas Gummerer
2018-03-18  1:43             ` Eric Sunshine
2018-03-26 23:06 ` Stefan Beller
     [not found]   ` <CAK0XTWd7QGtVDwm8FDXejZfbgVH6-1NprGY0xxAnC33QH8aCCQ@mail.gmail.com>
2018-03-29 20:54     ` Fwd: " Eddy Petrișor
     [not found] ` <20180329225502.20112-1-eddy.petrisor@gmail.com>
2018-03-29 22:59   ` [RFC PATCH v2] " Eddy Petrișor
2018-04-03 22:20     ` [RFC PATCH v3 1/2] " Eddy Petrișor
2018-04-03 22:20     ` [RFC PATCH v3 2/2] t7406: add test for non-default branch in submodule Eddy Petrișor
2018-04-03 22:26       ` Eddy Petrișor
2018-04-04 18:36         ` Stefan Beller
2018-04-04 20:37           ` Eddy Petrișor
2018-04-04 21:41             ` Stefan Beller
2018-04-18 22:35               ` [RFC PATCH v4 1/9] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default Eddy Petrișor
2018-04-18 23:53                 ` Stefan Beller
2018-04-19  5:43                   ` Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 2/9] t7406: add test for non-default branch in submodule Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 3/9] fixup:t7406: use test_commit instead of echo/add/commit as suggested by Stefan Beller Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 4/9] fixup:t7404:use 'git -C' instead of cd .. && git Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 5/9] fixup:t7406:cleanup chained submodules after test is done Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 6/9] fixup:t7406:don't call init after add, is redundant Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 7/9] fixup:t7406:supr5 commit is done before submodules chaining Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 8/9] fixup:t7406:use super_w instead of the existing super Eddy Petrișor
2018-04-18 22:35               ` [RFC PATCH v4 9/9] fixup:t7406:change branches in submodules after the link is done Eddy Petrișor
2018-04-19  6:07               ` [RFC PATCH v3 2/2] t7406: add test for non-default branch in submodule Eddy Petrișor
2018-04-19 17:52                 ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAK0XTWepPHe__qO6foFrh8-5mVJgWFzUsSw_4GZ89MFOmTrXbA@mail.gmail.com \
    --to=eddy.petrisor@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).