git@vger.kernel.org mailing list mirror (one of many)
* GSoC Project | Submodules related work
From: Prathamesh Chavan @ 2017-03-15  9:33 UTC
  To: git; +Cc: Stefan Beller, Christian Couder, bmwill, me

Hey everyone,

I am Prathamesh. I am studying Computer Science and Engineering at IIT
Kharagpur. I am interested in participating in Google Summer of Code 2017
under the Git organization. I attempted "Avoid pipes in git related commands
for test suite" as my microproject[1].

As part of GSoC, I would like to work on git submodules. The projects I
have looked at are:
        1. "git -C sub add ." might behave just like "git add sub"
        2. Teach "git -C <submodule-path> status" in an unpopulated
           submodule to report that the submodule is unpopulated instead
           of falling back to the superproject.
        3. Teach "git log -- <path/into/submodule/and/further>" to behave
           like "git -C <path/into/submodule> log -- <and/further>"

I went through the mail thread[2] related to projects 1 and 2 to get a
better picture of them. As the projects are closely interrelated, I think
together they may make a complete GSoC project.

The conclusions I was able to draw from the thread[2] are:

1. We are catching commands typed by the user in an unpopulated or
   even an uninitialized submodule.

2. We first check whether we are in the superproject's root directory.
   If a .git directory is present, we check for the presence of a
   .gitmodules file, from which we can check whether the given path is
   inside some submodule.
   *When a .git file containing just a gitlink is present instead, I am
   not sure, but I think even in this case the root folder contains a
   .gitmodules file in the case of submodules (please correct me if I'm
   wrong), so we may still carry out the same procedure as above.

3. Once we can detect whether the $cwd is in a submodule, we can output
   "The submodule 'sub' is not initialized. To init ..." for all the
   commands which do not initialize and populate the submodule.

4. As similar detection could be used in the third project listed above,
   I wished to include it as well.
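
For project 3, once a submodule path is known, the pathspec has to be
split into the submodule directory and the part below it. A minimal C
sketch of just that split, assuming the submodule path was already found
by the detection above; split_submodule_path() is a hypothetical helper,
not an existing Git function:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: if PATHSPEC lies inside the submodule at SUB,
 * store the part below SUB in REST ("" for SUB itself) and return 1,
 * so "git log -- SUB/REST" could become "git -C SUB log -- REST".
 * Returns 0 when PATHSPEC is outside SUB or REST is too small.
 */
static int split_submodule_path(const char *pathspec, const char *sub,
				char *rest, size_t rest_len)
{
	size_t n = strlen(sub);
	int written;

	assert(rest != NULL && rest_len > 0);
	if (n == 0 || strncmp(pathspec, sub, n) != 0)
		return 0;
	if (pathspec[n] == '\0') {
		rest[0] = '\0';		/* the submodule root itself */
		return 1;
	}
	if (pathspec[n] != '/')
		return 0;		/* "sub" must not match "subdir/x" */
	written = snprintf(rest, rest_len, "%s", pathspec + n + 1);
	return written >= 0 && (size_t)written < rest_len;
}
```

With SUB = "path/into/submodule", the pathspec
"path/into/submodule/and/further" yields REST = "and/further", while
"path/into/submoduleX/a" is rejected because the match must end at a '/'.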

What are your suggestions about these projects? Also, would it be
reasonable to consider them as one complete project for GSoC?
I think this might interfere with Valery's proposal[3] to convert
submodule-related code from shell to C. What do you all think?
If it does interfere, could we both work on something in common?

Thanks,
Prathamesh

[1]: https://public-inbox.org/git/20170313065148.10707-1-pc44800@gmail.com/T/#u
[2]: https://public-inbox.org/git/CAGZ79kYW1zS3-9AYPaiUfBGTFygyg1ZVd3YyOctp3gihfEpHeg@mail.gmail.com/T/#u
[3]: https://public-inbox.org/git/20170310211348.18887-1-me@vtolstov.org/T/#u


* Re: GSoC Project | Submodules related work
From: Stefan Beller @ 2017-03-15 18:12 UTC
  To: Prathamesh Chavan; +Cc: git, Christian Couder, Brandon Williams, Valery Tolstov

On Wed, Mar 15, 2017 at 2:33 AM, Prathamesh Chavan <pc44800@gmail.com> wrote:
> Hey everyone,
>
> I am Prathamesh. I am studying Computer Science and Engineering at IIT
> Kharagpur. I am interested in participating in Google Summer of Code 2017
> under the Git organization. I attempted "Avoid pipes in git related commands
> for test suite" as my microproject[1].
>
> As part of GSoC, I would like to work on git submodules. The projects I
> have looked at are:
>         1. "git -C sub add ." might behave just like "git add sub"
>         2. Teach "git -C <submodule-path> status" in an unpopulated
>            submodule to report that the submodule is unpopulated instead
>            of falling back to the superproject.
>         3. Teach "git log -- <path/into/submodule/and/further>" to behave
>            like "git -C <path/into/submodule> log -- <and/further>"
>
> I went through the mail thread[2] related to projects 1 and 2 to get a
> better picture of them. As the projects are closely interrelated, I think
> together they may make a complete GSoC project.

Sounds reasonable.

> The conclusions I was able to draw from the thread[2] are:
>
> 1. We are catching commands typed by the user in an unpopulated or
>    even an uninitialized submodule.

What do you mean by catch here?

What happens is that Git is summoned in e.g. path/super/sub/ and when
Git wakes up, it has to find out what is going on, e.g. which repository
it should work on.

0) It checks whether it is inside a .git directory by looking for files
    like HEAD, config and objects/.
1) It looks for a ".git" file or directory in the current directory.
2) If there is none, it goes up one directory and checks again,
    repeating this step until either a repository is found or a
    filesystem boundary is reached.

As uninitialized submodules do not have a .git, we'll find the
superproject repository. Once the superproject is found, the
subcommand itself is invoked (e.g. cmd_add for "git add", in
builtin/add.c). Its signature is just like main() except that
it has an additional prefix parameter, which holds the path
from the discovered repository root to the original invocation
point; i.e. when invoked in path/super/sub/, with the superproject
found at path/super/, the prefix is sub/.
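
The prefix itself is just a string operation on the two paths. A minimal
C sketch, assuming both paths are absolute with no trailing '/';
compute_prefix() is a hypothetical helper, not Git's code, and it returns
"" at the top level where Git itself passes a NULL prefix:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write CWD relative to the repository root ROOT,
 * with a trailing '/' ("sub/"), into OUT; write "" when CWD is ROOT.
 * Returns 0 on success, -1 if CWD is outside ROOT or OUT is too small.
 */
static int compute_prefix(const char *root, const char *cwd,
			  char *out, size_t out_len)
{
	size_t rlen = strlen(root);
	const char *rest;
	int n;

	assert(out != NULL && out_len > 0);
	if (rlen == 1 && root[0] == '/')
		rlen = 0;		/* repository at "/" itself */
	if (strncmp(cwd, root, rlen) != 0)
		return -1;
	rest = cwd + rlen;
	if (*rest != '\0' && *rest != '/')
		return -1;		/* root "/a/b" vs cwd "/a/bc" */
	while (*rest == '/')
		rest++;
	if (*rest == '\0') {
		out[0] = '\0';		/* invoked at the top level */
		return 0;
	}
	n = snprintf(out, out_len, "%s/", rest);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

For ROOT = "/path/super" and CWD = "/path/super/sub" this produces "sub/",
matching the example above.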

> 2. We first check whether we are in the superproject's root directory.

After the repo discovery described above, we're at the root of
*a* repository, and we have a prefix, which may or may not be
an uninitialized submodule.

>     we check for the presence of a .gitmodules file,

We have an API for that. :)
See submodule_from_path() in submodule-config.h, which
either returns a struct submodule or NULL if there is no
submodule.

However, to detect whether there is a submodule, we can check two
things: whether there is a .gitmodules entry and whether there is a
gitlink entry recorded in the tree. Depending on the command, we'd want
to do one before the other. E.g. git-add most likely doesn't need to
load .gitmodules, but may have the objects loaded already.
So checking whether a given path is a gitlink is cheap compared
to loading the .gitmodules file, and we'd probably want to do the
cheap thing first.
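
A toy C model of this "cheap thing first" ordering, under stated
assumptions: the index is a small hypothetical array instead of Git's
struct index_state, .gitmodules is a NULL-terminated list of paths, and
the gitlink test mirrors Git's S_ISGITLINK() check against mode 0160000.
None of these names are Git APIs. The third result covers a gitlink that
has no .gitmodules entry at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODE_TYPE_MASK 0170000	/* same value as S_IFMT */
#define MODE_GITLINK   0160000	/* mode Git records for gitlinks */

/* Hypothetical stand-in for an index entry. */
struct toy_entry {
	const char *path;
	unsigned int mode;
};

enum sub_state { NOT_A_SUBMODULE, SUBMODULE, GITLINK_WITHOUT_GITMODULES };

static int gitmodules_loads;	/* counts how often the expensive step runs */

/* Expensive step (stubbed): is PATH listed in .gitmodules? */
static bool gitmodules_lists(const char *const *paths, const char *path)
{
	gitmodules_loads++;
	for (; *paths; paths++)
		if (strcmp(*paths, path) == 0)
			return true;
	return false;
}

static enum sub_state classify_path(const struct toy_entry *index, size_t nr,
				    const char *const *gitmodules,
				    const char *path)
{
	bool gitlink = false;
	size_t i;

	assert(path != NULL);
	/* Cheap step first: is PATH recorded as a gitlink? Git's index is
	 * sorted and bisected; a linear scan keeps this sketch short. */
	for (i = 0; i < nr; i++) {
		if (strcmp(index[i].path, path) == 0) {
			gitlink = (index[i].mode & MODE_TYPE_MASK) == MODE_GITLINK;
			break;
		}
	}
	if (!gitlink)
		return NOT_A_SUBMODULE;	/* .gitmodules never loaded */
	return gitmodules_lists(gitmodules, path) ? SUBMODULE
						  : GITLINK_WITHOUT_GITMODULES;
}
```

For ordinary paths the .gitmodules step never runs; only gitlinks pay for
it. A command that has .gitmodules loaded anyway could reverse the order.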

>    from which we can check whether the given path is inside some
>    submodule.
>    *When a .git file containing just a gitlink is present instead, I am
>    not sure, but I think even in this case the root folder contains a
>    .gitmodules file in the case of submodules (please correct me if I'm
>    wrong), so we may still carry out the same procedure as above.

I think even when the .gitmodules file is missing, we want to have
some sort of warning here, as running git from an uninitialized
gitlink'd repository is a confusing state. The user may assume they
are running the command in the gitlink'd repo, so it may be better to
bail out.

>
> 3. Once we can detect whether the $cwd is in a submodule, we can output
>    "The submodule 'sub' is not initialized. To init ..." for all the
>    commands which do not initialize and populate the submodule.

What we want to do depends on the command; for most commands
this seems to be the right choice. For git-log we need to do
something different, as you mention in 4).

> 4. As similar detection could be used in the third project listed above,
>    I wished to include it as well.
>
> What are your suggestions about these projects? Also, would it be
> reasonable to consider them as one complete project for GSoC?

I think it is sensible to consider enhancing multiple commands, as
one command is a very small bite for a GSoC project.
And once you have the first command done, you'd generally have a feel
for it and the next commands ought to be easier (though we'd
still need to figure out the different outcomes, e.g. step 3 or 4 above).

> I think this might interfere with Valery's proposal[3] to convert
> submodule-related code from shell to C. What do you all think?

I do not think there is interference with Valery's proposal, as this
proposal would mostly work in builtin/{add,log,commit}.c
(cmd_status is in builtin/commit.c for whatever reason),
whereas Valery's proposal would mostly revolve around working
in git-submodule.sh (deleting lines there) and
builtin/submodule--helper.c (adding the deleted lines back there,
in another language).

> If it does interfere, could we both work on something in common?

I really do not see a lot of interference between these two proposals.

Thanks,
Stefan


* Re: GSoC Project | Submodules related work
From: Prathamesh @ 2017-03-15 21:28 UTC
  To: Stefan Beller; +Cc: git, Christian Couder, Brandon Williams, Valery Tolstov

>> The conclusions I was able to draw from the thread[2] are:
>>
>> 1. We are catching commands typed by the user in an unpopulated or
>>    even an uninitialized submodule.
>
> What do you mean by catch here?

By catching commands, I meant that we identify that the user has entered
the command in an unpopulated or uninitialized submodule and respond
to the user accordingly.

>
> What happens is that Git is summoned in e.g. path/super/sub/ and when
> Git wakes up, it has to find out what is going on, e.g. which repository
> it should work on.
>
> 0) It checks whether it is inside a .git directory by looking for files
>     like HEAD, config and objects/.
> 1) It looks for a ".git" file or directory in the current directory.
> 2) If there is none, it goes up one directory and checks again,
>     repeating this step until either a repository is found or a
>     filesystem boundary is reached.
>
> As uninitialized submodules do not have a .git, we'll find the
> superproject repository. Once the superproject is found, the
> subcommand itself is invoked (e.g. cmd_add for "git add", in
> builtin/add.c). Its signature is just like main() except that
> it has an additional prefix parameter, which holds the path
> from the discovered repository root to the original invocation
> point; i.e. when invoked in path/super/sub/, with the superproject
> found at path/super/, the prefix is sub/.
>
>> 2. We first check whether we are in the superproject's root directory.
>
> After the repo discovery described above, we're at the root of
> *a* repository, and we have a prefix, which may or may not be
> an uninitialized submodule.
>
>>     we check for the presence of a .gitmodules file,
>
> We have an API for that. :)
> See submodule_from_path() in submodule-config.h, which
> either returns a struct submodule or NULL if there is no
> submodule.
>
> However, to detect whether there is a submodule, we can check two
> things: whether there is a .gitmodules entry and whether there is a
> gitlink entry recorded in the tree. Depending on the command, we'd want
> to do one before the other. E.g. git-add most likely doesn't need to
> load .gitmodules, but may have the objects loaded already.
> So checking whether a given path is a gitlink is cheap compared
> to loading the .gitmodules file, and we'd probably want to do the
> cheap thing first.

Adding to this, we may use the functions is_submodule_populated()
and is_submodule_initialized() from submodule.c here.

>
>>    from which we can check whether the given path is inside some
>>    submodule.
>>    *When a .git file containing just a gitlink is present instead, I am
>>    not sure, but I think even in this case the root folder contains a
>>    .gitmodules file in the case of submodules (please correct me if I'm
>>    wrong), so we may still carry out the same procedure as above.
>
> I think even when the .gitmodules file is missing, we want to have
> some sort of warning here, as running git from an uninitialized
> gitlink'd repository is a confusing state. The user may assume they
> are running the command in the gitlink'd repo, so it may be better to
> bail out.

Can you please give an example of such a situation? I would like to
reproduce it and think about it further.
(Even in the case where the superproject is initialized using a gitlink,
.gitmodules is in the same folder as the .git file containing the
GIT_DIR path.)

>
>>
>> 3. Once we can detect whether the $cwd is in a submodule, we can output
>>    "The submodule 'sub' is not initialized. To init ..." for all the
>>    commands which do not initialize and populate the submodule.
>
> What we want to do depends on the command; for most commands
> this seems to be the right choice. For git-log we need to do
> something different, as you mention in 4).
>
>> 4. As similar detection could be used in the third project listed above,
>>    I wished to include it as well.
>>
>> What are your suggestions about these projects? Also, would it be
>> reasonable to consider them as one complete project for GSoC?
>
> I think it is sensible to consider enhancing multiple commands, as
> one command is a very small bite for a GSoC project.
> And once you have the first command done, you'd generally have a feel
> for it and the next commands ought to be easier (though we'd
> still need to figure out the different outcomes, e.g. step 3 or 4 above).
>

Thank you for your suggestions. I will also look for more such cases
where I could enhance other commands, so as to expand my project.

>> I think this might interfere with Valery's proposal[3] to convert
>> submodule-related code from shell to C. What do you all think?
>
> I do not think there is interference with Valery's proposal, as this
> proposal would mostly work in builtin/{add,log,commit}.c
> (cmd_status is in builtin/commit.c for whatever reason),
> whereas Valery's proposal would mostly revolve around working
> in git-submodule.sh (deleting lines there) and
> builtin/submodule--helper.c (adding the deleted lines back there,
> in another language).
>
>> If it does interfere, could we both work on something in common?
>
> I really do not see a lot of interference between these two proposals.

Thank you for confirming that. Now I can carry on working on my
proposal for the project :) Also, if possible, I would like to
work on a smaller task related to my project first, as it will help me
understand the project better and also help me write
my proposal for the project.

Thanks,
Prathamesh


* Re: GSoC Project | Submodules related work
From: Stefan Beller @ 2017-03-15 22:13 UTC
  To: Prathamesh; +Cc: git, Christian Couder, Brandon Williams, Valery Tolstov

On Wed, Mar 15, 2017 at 2:28 PM, Prathamesh <pc44800@gmail.com> wrote:
>> What do you mean by catch here?
>
> By catching commands, I meant that we identify that the user has entered
> the command in an unpopulated or uninitialized submodule and respond
> to the user accordingly.

Well, in that sense, we do not do that yet. I see what you mean.

>> However, to detect whether there is a submodule, we can check two
>> things: whether there is a .gitmodules entry and whether there is a
>> gitlink entry recorded in the tree. Depending on the command, we'd want
>> to do one before the other. E.g. git-add most likely doesn't need to
>> load .gitmodules, but may have the objects loaded already.
>> So checking whether a given path is a gitlink is cheap compared
>> to loading the .gitmodules file, and we'd probably want to do the
>> cheap thing first.
>
> Adding to this, we may use the functions is_submodule_populated()
> and is_submodule_initialized() from submodule.c here.

Not quite, IMO.

is_submodule_initialized checks for the existence of
submodule.<name>.url in .git/config, but it sounds as if we want to
check for the existence of submodule.<any name>.path in .gitmodules
instead. So we'd end up using only

    module = submodule_from_path(null_sha1, path);

from that function.
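
To illustrate the kind of lookup meant here (is there a
submodule.<any name>.path equal to the given path in .gitmodules?), a
self-contained C sketch that scans .gitmodules text. It is a hypothetical
stand-in, not submodule_from_path(): it assumes one key per line,
unquoted values, and no includes or continuation lines, all of which
Git's real config parser handles.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the [submodule "<name>"] section of the
 * .gitmodules text TEXT whose path value equals PATH. On success copy
 * <name> into NAME and return 1; return 0 if no section matches.
 */
static int find_submodule_by_path(const char *text, const char *path,
				  char *name, size_t name_len)
{
	char section[256] = "";
	const char *line = text;

	assert(text && path && name && name_len > 0);
	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		char buf[512];
		size_t n = len < sizeof(buf) - 1 ? len : sizeof(buf) - 1;
		char *s = buf;

		memcpy(buf, line, n);
		buf[n] = '\0';
		line += len;		/* advance to the next line */
		if (*line == '\n')
			line++;

		while (*s == ' ' || *s == '\t')
			s++;
		if (sscanf(s, "[submodule \"%255[^\"]\"]", section) == 1)
			continue;	/* entered a new submodule section */
		if (*s == '[') {
			section[0] = '\0';	/* some other section */
			continue;
		}
		if (section[0] && strncmp(s, "path", 4) == 0 &&
		    (s[4] == ' ' || s[4] == '\t' || s[4] == '=')) {
			char *v = strchr(s, '=');
			char *end;

			if (!v)
				continue;
			for (v++; *v == ' ' || *v == '\t'; v++)
				;	/* skip blanks after '=' */
			end = v + strlen(v);
			while (end > v && (end[-1] == ' ' || end[-1] == '\t' ||
					   end[-1] == '\r'))
				*--end = '\0';	/* trim trailing blanks */
			if (strcmp(v, path) == 0) {
				snprintf(name, name_len, "%s", section);
				return 1;
			}
		}
	}
	return 0;
}
```

Note that the lookup is keyed by path, while the section is keyed by
name; that is why the name has to be recovered from the section header.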

is_submodule_populated checks whether there is a .git file/directory at
the given path, which at this point we would already know is not the case?

We'd roughly need to call
module_list_compute(... prefix = "", pathspec = prefix, ...),
i.e.
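    /* assumes read_cache() was called; prefix is e.g. "sub/" or NULL */
    int pos = (prefix && *prefix) ?
              cache_name_pos(prefix, strlen(prefix) - 1) : -1;

    if (pos >= 0 && S_ISGITLINK(active_cache[pos]->ce_mode)) {
        /* this is an uninitialized submodule */
    } else {
        /* this is just a normal prefix */
    }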

>> I think even when the .gitmodules file is missing, we want to have
>> some sort of warning here, as running git from an uninitialized
>> gitlink'd repository is a confusing state. The user may assume they
>> are running the command in the gitlink'd repo, so it may be better to
>> bail out.
>
> Can you please give an example of such a situation? I would like to
> reproduce it and think about it further.

I think you can create such a situation via

    git init tmp
    cd tmp
    git init gitlink
    git -C gitlink commit --allow-empty -m "initial commit"
    # plain "git add", not "git submodule add":
    git add gitlink
    git commit -m "add 'gitlink' as a gitlink"
    rm -rf gitlink
    mkdir gitlink
    git -C gitlink status

Note that we used "git add" instead of "git submodule add". git-add
doesn't care about submodules, i.e. it doesn't create a .gitmodules entry
for you (unlike "git submodule add").

Also note that the "rm -rf && mkdir" is just a placeholder to produce this
state. An alternative ending after the commit could have been

    cd ..
    git clone tmp tmp2
    cd tmp2
    git -C gitlink status



> (Even in the case where the superproject is initialized using a gitlink,
> .gitmodules is in the same folder as the .git file containing the
> GIT_DIR path.)

I do not understand this?

> Also, if possible, I would like to
> work on a smaller task related to my project first, as it will help me
> understand the project better and also help me write
> my proposal for the project.

Heh, that is the beauty of open source: you don't have to ask for permission. ;)
But I guess this is meant as a question about what this smaller project
could be? Well, as this proposal is heavy on path computation, I'd
look for pathspec-related leftovers at
https://git-blame.blogspot.com/p/leftover-bits.html

Thanks,
Stefan


Thread overview: 4 messages
2017-03-15  9:33 GSoC Project | Submodules related work Prathamesh Chavan
2017-03-15 18:12 ` Stefan Beller
2017-03-15 21:28   ` Prathamesh
2017-03-15 22:13     ` Stefan Beller
