git@vger.kernel.org mailing list mirror (one of many)
* in case you want a use-case with lots of submodules
@ 2017-06-19 15:59 Yaroslav Halchenko
  2017-06-19 19:30 ` Stefan Beller
  0 siblings, 1 reply; 4+ messages in thread
From: Yaroslav Halchenko @ 2017-06-19 15:59 UTC
  To: git; +Cc: Stefan Beller

Hi All,

On a recent trip I've listened to the git minutes podcast episode and
got excited to hear  Stefan Beller (CCed just in case) describing
ongoing work on submodules mechanism.  I got excited, since e.g.
performance improvements would be of great benefit to us too.

In our project, http://datalad.org, git submodules is the basic
mechanism to bring multiple "datasets" (mix of git and git-annex'ed
repositories)  under the same roof so we could non-ambiguously
version them all at any level.

http://datasets.datalad.org ATM provides quite a sizeable (ATM 370
repositories, up to 4 levels deep) hierarchy of git/git-annex
repositories all tied together via git submodules mechanism.  And as the
collection grows, interactions with it become slower, so additional
options (such as --ignore-submodules=dirty  to status) become our
friends.
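
For readers who want to try the option mentioned above, here is a minimal
sketch in Python. It is an illustration only, not DataLad's actual code; the
helper names status_command and run_status are hypothetical. It assembles a
"git status" call whose submodule checks are limited through
--ignore-submodules.

```python
import subprocess


def status_command(ignore="dirty"):
    """Build a `git status` argv that limits submodule checks.

    `ignore` is handed to --ignore-submodules, which accepts
    none, untracked, dirty, or all.
    """
    allowed = {"none", "untracked", "dirty", "all"}
    if ignore not in allowed:
        raise ValueError(f"unsupported --ignore-submodules value: {ignore!r}")
    return ["git", "status", "--short", f"--ignore-submodules={ignore}"]


def run_status(repo_path, ignore="dirty"):
    """Run the command inside repo_path and return its stdout (needs git)."""
    result = subprocess.run(
        status_command(ignore),
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With "dirty", git still reports a submodule whose recorded commit differs
from the superproject, but skips scanning each submodule's work tree, which
is where much of the time tends to go in a large hierarchy.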

So I thought I would share this as a use-case in case you need more
motivation or just a real-world test-bed for your work.  And thank
you again for making Git even Greater.

P.S. Please CC me in your replies (if any); I am not on the list.

With best regards,
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


* Re: in case you want a use-case with lots of submodules
  2017-06-19 15:59 in case you want a use-case with lots of submodules Yaroslav Halchenko
@ 2017-06-19 19:30 ` Stefan Beller
  2017-06-19 20:20   ` Yaroslav Halchenko
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Beller @ 2017-06-19 19:30 UTC
  To: Yaroslav Halchenko, Prathamesh Chavan; +Cc: git@vger.kernel.org

On Mon, Jun 19, 2017 at 8:59 AM, Yaroslav Halchenko <yoh@onerussian.com> wrote:
> Hi All,
>
> On a recent trip I've listened to the git minutes podcast episode and
> got excited to hear  Stefan Beller (CCed just in case) describing
> ongoing work on submodules mechanism.  I got excited, since e.g.
> performance improvements would be of great benefit to us too.

If you're mostly interested in performance improvements of the status
quo (i.e. "make git-submodule fast"), then the work of Prathamesh
Chavan (cc'd) might be more interesting to you than what I do.
He is porting git-submodule (which is mostly a shell script nowadays)
to C, such that we can save a lot of process invocations and can do
processing within one process.

> In our project, http://datalad.org, git submodules is the basic
> mechanism to bring multiple "datasets" (mix of git and git-annex'ed
> repositories)  under the same roof so we could non-ambiguously
> version them all at any level.

Cool, glad to hear about submodules being useful. :)

> http://datasets.datalad.org ATM provides quite a sizeable (ATM 370
> repositories, up to 4 levels deep) hierarchy of git/git-annex
> repositories all tied together via git submodules mechanism.  And as the
> collection grows, interactions with it become slower, so additional
> options (such as --ignore-submodules=dirty  to status) become our
> friends.

I am not so much concerned about the 370 number as about the
4 layers of nesting. In my experience the nested submodule case
is a little bit error-prone, and the bug reports are not that frequent
since there are not as many users of nesting, yet(?)

In a neighboring thread on the mailing list we have a discussion
on the usefulness of being on branches rather than in detached HEAD
in the submodules.
https://public-inbox.org/git/0092CDD27C5F9D418B0F3E9B5D05BE08010287DF@SBS2011.opfingen.plc2.de/

This would not break the non-ambiguous versioning; rather, it would
add ease of use.

> So I thought I would share this as a use-case in case you need more
> motivation or just a real-world test-bed for your work.  And thank
> you again for making Git even Greater.

Thanks for the motivation. :)

> P.S. Please CC me in your replies (if any); I am not on the list.
>
> With best regards,

Cheers,
Stefan


* Re: in case you want a use-case with lots of submodules
  2017-06-19 19:30 ` Stefan Beller
@ 2017-06-19 20:20   ` Yaroslav Halchenko
  2017-06-20  5:43     ` Stefan Beller
  0 siblings, 1 reply; 4+ messages in thread
From: Yaroslav Halchenko @ 2017-06-19 20:20 UTC
  To: Stefan Beller; +Cc: Prathamesh Chavan, git@vger.kernel.org


On Mon, 19 Jun 2017, Stefan Beller wrote:

> On Mon, Jun 19, 2017 at 8:59 AM, Yaroslav Halchenko <yoh@onerussian.com> wrote:
> > Hi All,

> > On a recent trip I've listened to the git minutes podcast episode and
> > got excited to hear  Stefan Beller (CCed just in case) describing
> > ongoing work on submodules mechanism.  I got excited, since e.g.
> > performance improvements would be of great benefit to us too.

> If you're mostly interested in performance improvements of the status
> quo (i.e. "make git-submodule fast"), then the work of Prathamesh
> Chavan (cc'd) might be more interesting to you than what I do.
> He is porting git-submodule (which is mostly a shell script nowadays)
> to C, such that we can save a lot of process invocations and can do
> processing within one process.

ah -- cool.  I would be eager to test it out, thanks!  would be
interesting to see if it positively affects our overall performance.
Pointers to that development would be welcome!

> > http://datasets.datalad.org ATM provides quite a sizeable (ATM 370
> > repositories, up to 4 levels deep) hierarchy of git/git-annex
> > repositories all tied together via git submodules mechanism.  And as the
> > collection grows, interactions with it become slower, so additional
> > options (such as --ignore-submodules=dirty  to status) become our
> > friends.

> I am not so much concerned about the 370 number as about the
> 4 layers of nesting. In my experience the nested submodule case
> is a little bit error-prone, and the bug reports are not that frequent
> since there are not as many users of nesting, yet(?)

well -- part of the story here is that we are forced to use/have full
blown .git/ directories (for git-annex symlinks to content files to
work) within submodules instead of a .git file with a reference under
the parent's .git/modules.   So we can 'slice' at any level, and I
guess that is why we may be avoiding some possible issues due to nesting
and the "parent has all .git/modules" approach.

> In a neighboring thread on the mailing list we have a discussion
> on the usefulness of being on branches rather than in detached HEAD
> in the submodules.
> https://public-inbox.org/git/0092CDD27C5F9D418B0F3E9B5D05BE08010287DF@SBS2011.opfingen.plc2.de/

> This would not break the non-ambiguous versioning; rather, it would
> add ease of use.

that is indeed a common caveat... I am not sure if any heuristic
approach would provide a 'bullet proof' solution.  I might even prefer a
hardcoded 'branch-name' to be listed/associated with each submodule
within .gitmodules.  In the datalad case, detached HEAD is common
whenever someone installs an "outdated" submodule (one whose branch has
progressed forward).  In this case we just check if the branch after "git
clone"  (but before git submodule update) includes the commit pointed to
by "Subproject commit", and if so -- we announce that it must be the branch
(so far it is always "master" branch anyways ;) )
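
The check described above can be sketched roughly as follows. This is an
assumption about how such a test might be written, not DataLad's
implementation; the function name branch_contains and the injectable run
parameter are hypothetical. It relies on "git merge-base --is-ancestor",
which exits with 0 when the first commit is an ancestor of the second and
with 1 when it is not.

```python
import subprocess


def branch_contains(commit, branch, repo_path=".", run=subprocess.run):
    """Return True if `commit` is reachable from `branch`.

    `run` defaults to subprocess.run and is injectable so the logic
    can be exercised without a real repository.
    """
    result = run(
        ["git", "merge-base", "--is-ancestor", commit, branch],
        cwd=repo_path,
    )
    if result.returncode == 0:
        return True   # commit is an ancestor of (or equal to) the branch tip
    if result.returncode == 1:
        return False  # commit is not reachable from the branch
    # Any other exit status signals an error, e.g. an unknown revision.
    raise RuntimeError(f"git merge-base failed with {result.returncode}")
```

If this returns True for the commit recorded in the superproject, one could
claim the checked-out state belongs to that branch even though HEAD is
detached after "git submodule update".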

> > So I thought I would share this as a use-case in case you need more
> > motivation or just a real-world test-bed for your work.  And thank
> > you again for making Git even Greater.

> Thanks for the motivation. :)

the least I could do ;)

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


* Re: in case you want a use-case with lots of submodules
  2017-06-19 20:20   ` Yaroslav Halchenko
@ 2017-06-20  5:43     ` Stefan Beller
  0 siblings, 0 replies; 4+ messages in thread
From: Stefan Beller @ 2017-06-20  5:43 UTC
  To: Yaroslav Halchenko; +Cc: Prathamesh Chavan, git@vger.kernel.org

On Mon, Jun 19, 2017 at 1:20 PM, Yaroslav Halchenko <yoh@onerussian.com> wrote:
>
> On Mon, 19 Jun 2017, Stefan Beller wrote:
>
>> On Mon, Jun 19, 2017 at 8:59 AM, Yaroslav Halchenko <yoh@onerussian.com> wrote:
>> > Hi All,
>
>> > On a recent trip I've listened to the git minutes podcast episode and
>> > got excited to hear  Stefan Beller (CCed just in case) describing
>> > ongoing work on submodules mechanism.  I got excited, since e.g.
>> > performance improvements would be of great benefit to us too.
>
>> If you're mostly interested in performance improvements of the status
>> quo (i.e. "make git-submodule fast"), then the work of Prathamesh
>> Chavan (cc'd) might be more interesting to you than what I do.
>> He is porting git-submodule (which is mostly a shell script nowadays)
>> to C, such that we can save a lot of process invocations and can do
>> processing within one process.
>
> ah -- cool.  I would be eager to test it out, thanks!  would be
> interesting to see if it positively affects our overall performance.
> Pointers to that development would be welcome!

The latest from today:
https://public-inbox.org/git/CAME+mvUQJFneV7b1G7zmAidP-5L=nimvY43V0ug-Gtesr83tzg@mail.gmail.com/


>
>> > http://datasets.datalad.org ATM provides quite a sizeable (ATM 370
>> > repositories, up to 4 levels deep) hierarchy of git/git-annex
>> > repositories all tied together via git submodules mechanism.  And as the
>> > collection grows, interactions with it become slower, so additional
>> > options (such as --ignore-submodules=dirty  to status) become our
>> > friends.
>
>> I am not so much concerned about the 370 number as about the
>> 4 layers of nesting. In my experience the nested submodule case
>> is a little bit error-prone, and the bug reports are not that frequent
>> since there are not as many users of nesting, yet(?)
>
> well -- part of the story here is that we are forced to use/have full
> blown .git/ directories (for git-annex symlinks to content files to
> work) within submodules instead of a .git file with a reference under
> the parent's .git/modules.   So we can 'slice' at any level, and I
> guess that is why we may be avoiding some possible issues due to nesting
> and the "parent has all .git/modules" approach.

That sounds like you either want to configure git to keep the submodules'
git dirs in-place or you want to convince git-annex to learn about the
gitdir pointer files.
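
To make the two layouts concrete, here is a small sketch (the function name
git_entry_kind is hypothetical) that tells a full .git/ directory apart from
a gitdir pointer file, i.e. a plain file named .git whose content is a single
"gitdir: <path>" line.

```python
from pathlib import Path


def git_entry_kind(worktree):
    """Classify the .git entry found in `worktree`.

    Returns ("directory", path) for an in-place git dir,
    ("gitfile", target) for a pointer file, ("unknown", None) for a
    .git file without a gitdir line, and ("missing", None) otherwise.
    """
    entry = Path(worktree) / ".git"
    if entry.is_dir():
        return ("directory", entry)
    if entry.is_file():
        text = entry.read_text().strip()
        prefix = "gitdir:"
        if text.startswith(prefix):
            target = Path(text[len(prefix):].strip())
            # Relative targets are resolved against the worktree itself.
            if not target.is_absolute():
                target = entry.parent / target
            return ("gitfile", target)
        return ("unknown", None)
    return ("missing", None)
```

A tool that needs in-place git dirs could walk the submodule paths and flag
every entry reported as "gitfile".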

>
>> In a neighboring thread on the mailing list we have a discussion
>> on the usefulness of being on branches rather than in detached HEAD
>> in the submodules.
>> https://public-inbox.org/git/0092CDD27C5F9D418B0F3E9B5D05BE08010287DF@SBS2011.opfingen.plc2.de/
>
>> This would not break the non-ambiguous versioning; rather, it would
>> add ease of use.
>
> that is indeed a common caveat... I am not sure if any heuristic
> approach would provide a 'bullet proof' solution.  I might even prefer a
> hardcoded 'branch-name' to be listed/associated with each submodule
> within .gitmodules.

hardcoded as submodule.NAME.branch, maybe?
https://git-scm.com/docs/gitmodules
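
As an illustration of reading that setting, the sketch below queries
submodule.NAME.branch from a .gitmodules file through "git config --file".
The function name configured_branch and the injectable run parameter are
hypothetical; whether a tool should trust this value as the branch to be on
is a policy question the sketch does not answer.

```python
import subprocess


def configured_branch(name, gitmodules=".gitmodules", run=subprocess.run):
    """Return the branch recorded for submodule `name`, or None if unset.

    `run` defaults to subprocess.run and is injectable for testing.
    """
    key = f"submodule.{name}.branch"
    result = run(
        ["git", "config", "--file", gitmodules, "--get", key],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return result.stdout.strip()
    if result.returncode == 1:
        return None  # git config --get exits with 1 when the key is absent
    raise RuntimeError(f"git config failed with {result.returncode}")
```

The value can be written with
"git config --file .gitmodules submodule.NAME.branch master"; git itself
consults it for "git submodule update --remote".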

>  In the datalad case, detached HEAD is common

So you are accustomed to detached HEADs and would not
gain much from being back on a branch?  That's cool, too.


> whenever someone installs an "outdated" submodule (one whose branch has
> progressed forward).  In this case we just check if the branch after "git
> clone"  (but before git submodule update) includes the commit pointed to
> by "Subproject commit", and if so -- we announce that it must be the branch
> (so far it is always "master" branch anyways ;) )

heh, having just one branch. That is retro-style. :)
