git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [GSoC] Draft Proposal (Convert submodule to builtin)
@ 2021-04-01 11:40 Chinmoy Chakraborty
  2021-04-02  5:32 ` Bagas Sanjaya
  0 siblings, 1 reply; 10+ messages in thread
From: Chinmoy Chakraborty @ 2021-04-01 11:40 UTC (permalink / raw)
  To: git

Greetings mentors,

This is my draft proposal for the GSoC project, Convert submodule to 
builtin.

It would be really helpful if you could review the draft and present your

views on it :).


Regards,

Chinmoy Chakraborty.


Link: https://github.com/chinmoy12c/GSoC/blob/master/GSoC_Git_proposal.pdf


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC] Draft Proposal (Convert submodule to builtin)
  2021-04-01 11:40 [GSoC] Draft Proposal (Convert submodule to builtin) Chinmoy Chakraborty
@ 2021-04-02  5:32 ` Bagas Sanjaya
  2021-04-02  6:31   ` Christian Couder
  2021-04-02  7:04   ` Chinmoy Chakraborty
  0 siblings, 2 replies; 10+ messages in thread
From: Bagas Sanjaya @ 2021-04-02  5:32 UTC (permalink / raw)
  To: Chinmoy Chakraborty; +Cc: git

On 01/04/21 18.40, Chinmoy Chakraborty wrote:
> Greetings mentors,
> 
> This is my draft proposal for the GSoC project, Convert submodule to builtin.
> 
> It would be really helpful if you could review the draft and present your
> 
> views on it :).
> 
> 
> Regards,
> 
> Chinmoy Chakraborty.
> 
> 
> Link: https://github.com/chinmoy12c/GSoC/blob/master/GSoC_Git_proposal.pdf
> 
Looking from your proposal, you seems like you're using Kali Linux as
daily driver. Why do you do that? AFAIK Kali is for penetration testing
purposes. I assumed that you know what you're doing if you use this
distro as your general-purpose distro. And why not using Ubuntu/Mint/Debian/
Arch? Please see [1].

[1]: https://www.kali.org/docs/introduction/should-i-use-kali-linux/

But anyway, because we're at Git ML, let me review your proposal:

> The following helper functions would be required to be implemented● A function to guess the directory name from the repository string.
> ● A function for normalizing path, that is, removing multiple //;
> leading ./; /./; /../; trailing / .
> ● A function to check for tracked directories properly as pointed out
> by Kaartic Sivaraam on the thread:
> https://lore.kernel.org/git/ce151a1408291bb0991ce89459e36ee1
> 3ccdfa52.camel@gmail.com/.
> ● A function to check if the path exists and is already a git repo,
> else clone it.
> ● A function to set the submodule config properly.
> After implementation of all these helper methods, the main
> `module_add()` function would be implemented using the helper
> functions listed above as well as those helper functions which are
> predefined.

I know these helper functions will be required for porting `git-submodule.sh`
and used by `module_add()`, but do you have any justification for these?

Anyway, I copy-pasted the quotation above from your proposal in PDF format.
but when I pasted it into text-only mail, it looks rather ugly.
So next time when sending GSoC proposal for Git, I would like to see
proposals in Markdown format (as well as PDF) for the reason above.

Thanks.

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC] Draft Proposal (Convert submodule to builtin)
  2021-04-02  5:32 ` Bagas Sanjaya
@ 2021-04-02  6:31   ` Christian Couder
  2021-04-02  7:05     ` Chinmoy Chakraborty
  2021-04-03 14:14     ` Chinmoy Chakraborty
  2021-04-02  7:04   ` Chinmoy Chakraborty
  1 sibling, 2 replies; 10+ messages in thread
From: Christian Couder @ 2021-04-02  6:31 UTC (permalink / raw)
  To: Chinmoy Chakraborty; +Cc: git, Bagas Sanjaya

On Fri, Apr 2, 2021 at 7:44 AM Bagas Sanjaya <bagasdotme@gmail.com> wrote:
>
> On 01/04/21 18.40, Chinmoy Chakraborty wrote:
> > Greetings mentors,
> >
> > This is my draft proposal for the GSoC project, Convert submodule to builtin.
> >
> > It would be really helpful if you could review the draft and present your
> >
> > views on it :).

Thanks for letting us know about your proposal!

> > Link: https://github.com/chinmoy12c/GSoC/blob/master/GSoC_Git_proposal.pdf

[...]

> Anyway, I copy-pasted the quotation above from your proposal in PDF format.
> but when I pasted it into text-only mail, it looks rather ugly.
> So next time when sending GSoC proposal for Git, I would like to see
> proposals in Markdown format (as well as PDF) for the reason above.

I agree that proposals should be sent to the mailing list in a textual
format inline. It's ok if you also send a link to a better formatted
version of your proposal, so that people who prefer to read the better
formatted version can do so. It's ok too, of course, if the final
version uploaded into the GSoC web site is the better formatted
version.

Sending the textual version to the mailing list inline could make it
easier for people who want to comment and discuss it, like patches
are, which will help you make a better proposal. Later it might also
help people searching the mailing list.

Best,
Christian.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC] Draft Proposal (Convert submodule to builtin)
  2021-04-02  5:32 ` Bagas Sanjaya
  2021-04-02  6:31   ` Christian Couder
@ 2021-04-02  7:04   ` Chinmoy Chakraborty
  1 sibling, 0 replies; 10+ messages in thread
From: Chinmoy Chakraborty @ 2021-04-02  7:04 UTC (permalink / raw)
  To: Bagas Sanjaya, git


On 4/2/21 11:02 AM, Bagas Sanjaya wrote:
> Looking from your proposal, you seems like you're using Kali Linux as
> daily driver. Why do you do that?

Well I've tried various Linux distros (including ubuntu, mint). I have been

using Kali linux for 4 years now, and at this point of time it's rather just

a personal preference. I am interested in pen testing too as a general

hobby, so I need the tool-set provided by Kali every once in a while :).


> I know these helper functions will be required for porting 
> `git-submodule.sh`
> and used by `module_add()`, but do you have any justification for these?

All the functions listed above was based on the previous work and 
discussions.

I read through all the versions of the patches and studied the issues faced

and then structured a proper skeleton of the design. You may follow the

previous work through :

https://lore.kernel.org/git/20201214231939.644175-1-periperidip@gmail.com/


> Anyway, I copy-pasted the quotation above from your proposal in PDF 
> format.
> but when I pasted it into text-only mail, it looks rather ugly.
> So next time when sending GSoC proposal for Git, I would like to see
> proposals in Markdown format (as well as PDF) for the reason above.
>
> Thanks.

Sure I'll do that :). Thanks for the review.


Regards,

Chinmoy Chakraborty.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC] Draft Proposal (Convert submodule to builtin)
  2021-04-02  6:31   ` Christian Couder
@ 2021-04-02  7:05     ` Chinmoy Chakraborty
  2021-04-03 14:14     ` Chinmoy Chakraborty
  1 sibling, 0 replies; 10+ messages in thread
From: Chinmoy Chakraborty @ 2021-04-02  7:05 UTC (permalink / raw)
  To: Christian Couder, git


On 4/2/21 12:01 PM, Christian Couder wrote:
> I agree that proposals should be sent to the mailing list in a textual
> format inline. It's ok if you also send a link to a better formatted
> version of your proposal, so that people who prefer to read the better
> formatted version can do so. It's ok too, of course, if the final
> version uploaded into the GSoC web site is the better formatted
> version.
>
> Sending the textual version to the mailing list inline could make it
> easier for people who want to comment and discuss it, like patches
> are, which will help you make a better proposal. Later it might also
> help people searching the mailing list.
>
> Best,
> Christian.


Sure, I'll send a textual format of the proposal as well.

Also, I've submitted the draft proposal through the GSoC website

as well. Please have a look through it and suggest any required

changes :).


Thanks.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC] Draft Proposal (Convert submodule to builtin)
  2021-04-02  6:31   ` Christian Couder
  2021-04-02  7:05     ` Chinmoy Chakraborty
@ 2021-04-03 14:14     ` Chinmoy Chakraborty
  2021-04-05 14:44       ` Christian Couder
  1 sibling, 1 reply; 10+ messages in thread
From: Chinmoy Chakraborty @ 2021-04-03 14:14 UTC (permalink / raw)
  To: Christian Couder, git

Here is a textual format of the draft proposal.


Convert submodule to builtin
March 2021

==================================================
##Personal Information##


Name - Chinmoy Chakraborty
E-mail - chinmoy12c@gmail.com
Github - https://github.com/chinmoy12c
Linkedin - https://www.linkedin.com/in/chinmoy12c/
Major - Information Technology
Time Zone - IST (UTC+05:30)
=================================================

##Work Environment##

I am fluent in C, Java, Python, and Shell script. I use Git as my VCS, 
Visual Studio Code
as my primary code editor, and Kali Linux as my primary OS.
=================================================

##Git Contributions##

[Microproject] Replace instances of `the_repository` with ‘r’. (Learning 
the ropes)
Pull request: https://github.com/gitgitgadget/git/pull/915
Mailing List: 
https://lore.kernel.org/git/pull.915.git.1616701733901.gitgitgadget@gmail.com/


[column, range-diff] downcase option description
Pull request: https://github.com/gitgitgadget/git/pull/920
Mailing List: 
https://lore.kernel.org/git/pull.920.git.1616913298624.gitgitgadget@gmail.com/


[Documentation] updated documentation for git commit --date
Pull request: https://github.com/gitgitgadget/git/pull/918
Mailing List: 
https://lore.kernel.org/git/pull.918.git.1616926790227.gitgitgadget@gmail.com/
=================================================

##Project Outline##

A few components of git, like `git-submodule.sh`
are in the form of shell scripts. This causes
problems in production code in multiple platforms
like windows. The goal of this project is to
convert the shell script version of `git-submodule.sh`
to portable c code. The end goal would be
to completely remove `git-submodule.sh` and rename
`builtin/submodule--helper.c` to `builtin/submodule.c`.
=================================================

##Why is the project required?##

"Issues with the portability of code"

The submodule shell script uses shell commands like
`echo`, `grep`, `test`, `printf` etc. When switching
to non-POSIX compliant systems, one will have
to re-implement these commands specifically for the
system. There are also POSIX-to-Windows path conversion
issues. To fix these issues, it was decided to convert
these scripts into portable C code.

"Large overhead in calling the command"

The commands implemented in shell scripts are not builtins, so
they call `fork()` and `exec()` multiple times, hence creating
additional shells. This adds to the overhead in using the
commands in terms of time and memory.

"No access to low-level API"

The shell commands don’t have access to low level commands
like `git hash-object`, `git cat-file` etc. As these commands
are internally required for submodule commands to work, the shell
script needs to spawn a separate shell to execute these commands.
=================================================

##How have I prepared?##

I have gone through all the previous works and read through their
code to make myself accustomed to the intricacies of the code.
I have also structured my workflow based on the observation of
the previous discussions on those patches, and taken into
consideration the issues faced previously.

=================================================

##Previous Work##

A large part of the `git submodule--helper.c` has already been
converted by Stefan Beller​, Prathamesh Chavan​ in his GSoC project
in 2017, and Shourya Shukla in his GSoC project in 2020. This is
the list of already ported commands.

set-branch
set-url
summary
status
init
deinit
update
foreach
sync
absorbgitdirs
=================================================

##Work to be done##

The only command that is left to be ported is `git submodule add`.
The previous work on this by Shourya Shukla in GSoC 2020, did
not reach a successful merge due to some issues in design and
was kicked out because it has been stale for so long. The first
and foremost aim of the project will be to finish porting this
command. Thereafter, the end goal would be to completely replace
the shell script (git-submodule.sh) with an efficient c code.

Before porting the `git submodule add` command the initial work
would be dedicated to the implementation of small helper functions
in the form of small patches, which would be directly used by the
`add` command. This workflow is based on the suggestion by
Junio C Hamano on the thread:
https://lore.kernel.org/git/xmqqd01sugrg.fsf@gitster.c.googlers.com/.

This workflow would help in the following ways:

- It would help in sending patches in a small digestible format.
- It would help the reviewers easily review those small units
   of patches in a single sitting.
- It would help keep small logical units of code in different clean commits.

An additional test tweak would also be required in 
`t7400-submodule-basic.sh`,
to prepend the keyword ‘fatal’ since the command dies out in case
of absence of commits.


The following helper functions would be required to be implemented -

- A function to guess the directory name from the repository string.
- A function for normalizing path, that is, removing multiple
   //; leading ./; /./; /../; trailing / .

- A function to check for tracked directories properly as pointed
   out by Kaartic Sivaraam on the thread:
https://lore.kernel.org/git/ce151a1408291bb0991ce89459e36ee13ccdfa52.camel@gmail.com/.

- A function to check if the path exists and is already a git
   repo, else clone it.

- A function to set the submodule config properly.

- After implementation of all these helper methods, the main
   `module_add()` function would be implemented using the helper
   functions listed above as well as those helper functions which
   are predefined.
=================================================

##Project Timeline##

"Present Day - May 17"
I’ll utilize this time in exploring the codebase more properly and
solving more issues, which would help me properly familiarize
myself with the codebase. I’ll also try to structure a more
solidified, detailed workflow and come up with a draft patch
based on the previous work and discussions.

"May 17 - June 7 (Community bonding period)"
- Get familiar with the community.
- Discuss proper workflow with mentors.
- Make changes in the timeline, if necessary.
- Discuss the structure of the series of patches.

"June 7 - June 25 (Initial coding phase)"
- Finish implementation of the helper functions.
- Work on a proper structure of the implementation of the
   `submodule add` command and implement additional helper
   functions if required.
- Update proper documentation of the helper functions.

"June 25 - June 5 (Binding the code)"
This time would be used to code the main `submodule add`
command using all the helper functions implemented in the
initial phase of coding. This includes binding all the code
together and then completing the command through incremental
reviews. Also, the necessary documentation would be updated
parallelly.

"July 5 - July 12 (Initiate porting of command)"
This time would be utilized in initiating the step-by-step
conversion of the `git submodule add` command.

"July 12 - July 16 (Phase 1 evaluation)"

"July 16 - July 26 (Semester exams)"
I will be taking my semester examinations during this
time. As such, I’ll try to be in touch with the mentors
and take out as much time as possible (around 20 hours a week).

"July 26 - August 10 (Porting the complete script)"
This period would be utilized in the complete conversion of
`git-submodule.sh` into c code and combine it with
`submodule--helper.c` to make a single `builtin/submodule.c`.
As I’ll be completely free from academics during this period,
I’ll try to compensate as much time as possible for the above
period of July 16 - July 26.

"August 10 - August 16 (Final review and evaluation)"
- Final review by the mentors.
- Apply necessary changes and touch-ups.
- Updating documentation, if any left.

"August 16 - August 23 (Submission of final report)"

Additionally: There are places in the original shell script
and c code tagged as `NEEDSWORK`. My aim would be to resolve
these issues within the GSoC period if time permits.
=================================================

##Post GSoC##

After the GSoC period, I plan to continue my contributions
for git and look for other issues to work on. I’d look into
the conversion of other commands which are pending conversion,
as well as work on the `NEEDSWORK` part of the code (If I’m
unable to finish it within the GSoC period itself). I plan on
mentoring new contributors to git and help the contributors
by doing code reviews and solving their doubts and helping
them out.

Regards,
Chinmoy Chakraborty.

=================================================



Thanks,

Chinmoy


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC] Draft Proposal (Convert submodule to builtin)
  2021-04-03 14:14     ` Chinmoy Chakraborty
@ 2021-04-05 14:44       ` Christian Couder
  2021-04-06 19:53         ` Chinmoy Chakraborty
  2021-04-08  9:11         ` Chinmoy Chakraborty
  0 siblings, 2 replies; 10+ messages in thread
From: Christian Couder @ 2021-04-05 14:44 UTC (permalink / raw)
  To: Chinmoy Chakraborty; +Cc: git

On Sat, Apr 3, 2021 at 4:13 PM Chinmoy Chakraborty <chinmoy12c@gmail.com> wrote:
>
> Here is a textual format of the draft proposal.
>
>
> Convert submodule to builtin
> March 2021
>
> ==================================================
> ##Personal Information##
>
>
> Name - Chinmoy Chakraborty
> E-mail - chinmoy12c@gmail.com
> Github - https://github.com/chinmoy12c
> Linkedin - https://www.linkedin.com/in/chinmoy12c/
> Major - Information Technology
> Time Zone - IST (UTC+05:30)
> =================================================
>
> ##Work Environment##
>
> I am fluent in C, Java, Python, and Shell script. I use Git as my VCS,
> Visual Studio Code
> as my primary code editor, and Kali Linux as my primary OS.
> =================================================
>
> ##Git Contributions##
>
> [Microproject] Replace instances of `the_repository` with ‘r’. (Learning
> the ropes)
> Pull request: https://github.com/gitgitgadget/git/pull/915
> Mailing List:
> https://lore.kernel.org/git/pull.915.git.1616701733901.gitgitgadget@gmail.com/
>
>
> [column, range-diff] downcase option description
> Pull request: https://github.com/gitgitgadget/git/pull/920
> Mailing List:
> https://lore.kernel.org/git/pull.920.git.1616913298624.gitgitgadget@gmail.com/
>
>
> [Documentation] updated documentation for git commit --date
> Pull request: https://github.com/gitgitgadget/git/pull/918
> Mailing List:
> https://lore.kernel.org/git/pull.918.git.1616926790227.gitgitgadget@gmail.com/
> =================================================

Ok.

> ##Project Outline##
>
> A few components of git, like `git-submodule.sh`
> are in the form of shell scripts. This causes
> problems in production code in multiple platforms
> like windows. The goal of this project is to
> convert the shell script version of `git-submodule.sh`
> to portable c code. The end goal would be

s/c/C/

> to completely remove `git-submodule.sh` and rename
> `builtin/submodule--helper.c` to `builtin/submodule.c`.

You could add something like "so that the `git submodule` is fully
implemented using C".

> =================================================
>
> ##Why is the project required?##
>
> "Issues with the portability of code"
>
> The submodule shell script uses shell commands like
> `echo`, `grep`, `test`, `printf` etc. When switching
> to non-POSIX compliant systems, one will have
> to re-implement these commands specifically for the
> system. There are also POSIX-to-Windows path conversion
> issues. To fix these issues, it was decided to convert
> these scripts into portable C code.
>
> "Large overhead in calling the command"
>
> The commands implemented in shell scripts are not builtins, so
> they call `fork()` and `exec()` multiple times, hence creating
> additional shells. This adds to the overhead in using the
> commands in terms of time and memory.
>
> "No access to low-level API"
>
> The shell commands don’t have access to low level commands
> like `git hash-object`, `git cat-file` etc. As these commands
> are internally required for submodule commands to work, the shell
> script needs to spawn a separate shell to execute these commands.
> =================================================

Ok.

> ##How have I prepared?##
>
> I have gone through all the previous works and read through their
> code to make myself accustomed to the intricacies of the code.
> I have also structured my workflow based on the observation of
> the previous discussions on those patches, and taken into
> consideration the issues faced previously.
>
> =================================================
>
> ##Previous Work##
>
> A large part of the `git submodule--helper.c` has already been
> converted by Stefan Beller, Prathamesh Chavan in his GSoC project
> in 2017, and Shourya Shukla in his GSoC project in 2020. This is
> the list of already ported commands.
>
> set-branch
> set-url
> summary
> status
> init
> deinit
> update
> foreach
> sync
> absorbgitdirs
> =================================================
>
> ##Work to be done##
>
> The only command that is left to be ported is `git submodule add`.
> The previous work on this by Shourya Shukla in GSoC 2020, did
> not reach a successful merge due to some issues in design and
> was kicked out because it has been stale for so long.

Maybe you could explain a bit what "kicked out" means, or replace it
with something more explicit, for people who don't know well how Git
development works.

> The first
> and foremost aim of the project will be to finish porting this
> command.

You mean the "add" sub-command, or the full "submodule" command?

> Thereafter, the end goal would be to completely replace
> the shell script (git-submodule.sh) with an efficient c code.

s/c/C/

> Before porting the `git submodule add` command the initial work
> would be dedicated to the implementation of small helper functions
> in the form of small patches, which would be directly used by the
> `add` command. This workflow is based on the suggestion by
> Junio C Hamano on the thread:
> https://lore.kernel.org/git/xmqqd01sugrg.fsf@gitster.c.googlers.com/.
>
> This workflow would help in the following ways:
>
> - It would help in sending patches in a small digestible format.
> - It would help the reviewers easily review those small units
>    of patches in a single sitting.
> - It would help keep small logical units of code in different clean commits.

Yeah, nice!

How does this compare with Shourya's work? Will this avoid the design
issues in Shourya's work?

> An additional test tweak would also be required in
> `t7400-submodule-basic.sh`,
> to prepend the keyword ‘fatal’ since the command dies out in case
> of absence of commits.
>
>
> The following helper functions would be required to be implemented -
>
> - A function to guess the directory name from the repository string.
> - A function for normalizing path, that is, removing multiple
>    //; leading ./; /./; /../; trailing / .
>
> - A function to check for tracked directories properly as pointed
>    out by Kaartic Sivaraam on the thread:
> https://lore.kernel.org/git/ce151a1408291bb0991ce89459e36ee13ccdfa52.camel@gmail.com/.
>
> - A function to check if the path exists and is already a git
>    repo, else clone it.
>
> - A function to set the submodule config properly.

Nice! Maybe you could give an example and tell:
- how you would name one of the above function,
- what would be its arguments,
- perhaps how you would test it
...

> - After implementation of all these helper methods, the main
>    `module_add()` function would be implemented using the helper
>    functions listed above as well as those helper functions which
>    are predefined.
> =================================================

Ok, I will review the other parts later.

Thanks!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC] Draft Proposal (Convert submodule to builtin)
  2021-04-05 14:44       ` Christian Couder
@ 2021-04-06 19:53         ` Chinmoy Chakraborty
  2021-04-08  9:11         ` Chinmoy Chakraborty
  1 sibling, 0 replies; 10+ messages in thread
From: Chinmoy Chakraborty @ 2021-04-06 19:53 UTC (permalink / raw)
  To: Christian Couder, git


On 4/5/21 8:14 PM, Christian Couder wrote:
> You could add something like "so that the `git submodule` is fully
> implemented using C".
>
Sure I'll add this.


>> ##Work to be done##
>>
>> The only command that is left to be ported is `git submodule add`.
>> The previous work on this by Shourya Shukla in GSoC 2020, did
>> not reach a successful merge due to some issues in design and
>> was kicked out because it has been stale for so long.
> Maybe you could explain a bit what "kicked out" means, or replace it
> with something more explicit, for people who don't know well how Git
> development works.

Okay, I'll explain it in simple terms. Actually in the docs version

I added a link to the `kicked out` phrase directing to :

https://github.com/git/git/blob/1861aa482a38ae84f597cede48167ab43e7e50a3/whats-cooking.txt#L1158-L1169

I'll add this in the textual format too.


>> The first
>> and foremost aim of the project will be to finish porting this
>> command.
> You mean the "add" sub-command, or the full "submodule" command?

The "add" command.


> Before porting the `git submodule add` command the initial work
>> would be dedicated to the implementation of small helper functions
>> in the form of small patches, which would be directly used by the
>> `add` command. This workflow is based on the suggestion by
>> Junio C Hamano on the thread:
>> https://lore.kernel.org/git/xmqqd01sugrg.fsf@gitster.c.googlers.com/.
>>
>> This workflow would help in the following ways:
>>
>> - It would help in sending patches in a small digestible format.
>> - It would help the reviewers easily review those small units
>>     of patches in a single sitting.
>> - It would help keep small logical units of code in different clean commits.
> Yeah, nice!
>
> How does this compare with Shourya's work? Will this avoid the design
> issues in Shourya's work?

It would help in faster and more thorough reviews by the reviewers.

Since last time, the patches sent were quite large in size, it became quite

tedious and difficult to review the patch at once. Due to this, the review

and changes cycle slacked off and hence the patch never made it to master.

Sending the patches in small chunks and separate logic units would speed up

the review cycle.


> Nice! Maybe you could give an example and tell:
> - how you would name one of the above function,
> - what would be its arguments,
> - perhaps how you would test it
> ...

Sure I'll add example prototype of the functions.


>> - After implementation of all these helper methods, the main
>>     `module_add()` function would be implemented using the helper
>>     functions listed above as well as those helper functions which
>>     are predefined.
>> =================================================
> Ok, I will review the other parts later.
>
> Thanks!
Thanks for the review, really helpful :).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC] Draft Proposal (Convert submodule to builtin)
  2021-04-05 14:44       ` Christian Couder
  2021-04-06 19:53         ` Chinmoy Chakraborty
@ 2021-04-08  9:11         ` Chinmoy Chakraborty
  2021-04-10 12:03           ` Christian Couder
  1 sibling, 1 reply; 10+ messages in thread
From: Chinmoy Chakraborty @ 2021-04-08  9:11 UTC (permalink / raw)
  To: Christian Couder, git

This is the V2 of the draft proposal after applying the

changes suggested by Christian Couder and Kaartik Sivaraam.

Thank you for you reviews :).

==================================================

Convert submodule to builtin
March 2021

==================================================
##Personal Information##


Name - Chinmoy Chakraborty
E-mail - chinmoy12c@gmail.com
Github - https://github.com/chinmoy12c
Linkedin - https://www.linkedin.com/in/chinmoy12c/
Major - Information Technology
Time Zone - IST (UTC+05:30)
=================================================

##Work Environment##

I am fluent in C, Java, Python, and Shell script. I use Git
as my VCS, Visual Studio Code as my primary code editor, and
Kali Linux as my primary OS.
=================================================

##Git Contributions##

[Microproject] Replace instances of `the_repository` with ‘r’. (Learning 
the ropes)
Pull request: https://github.com/gitgitgadget/git/pull/915
Mailing List: 
https://lore.kernel.org/git/pull.915.git.1616701733901.gitgitgadget@gmail.com/


[column, range-diff] downcase option description
Pull request: https://github.com/gitgitgadget/git/pull/920
Mailing List: 
https://lore.kernel.org/git/pull.920.git.1616913298624.gitgitgadget@gmail.com/


[Documentation] updated documentation for git commit --date
Pull request: https://github.com/gitgitgadget/git/pull/918
Mailing List: 
https://lore.kernel.org/git/pull.918.git.1616926790227.gitgitgadget@gmail.com/
=================================================

##Project Outline##

A few components of git, like `git-submodule.sh`
are in the form of shell scripts. This causes
problems in production code in multiple platforms
like windows. The goal of this project is to
convert the shell script version of `git-submodule.sh`
to portable c code. The end goal would be
to completely remove `git-submodule.sh` and rename
`builtin/submodule--helper.c` to `builtin/submodule.c`
so that the `git submodule` is fully implemented using C.
=================================================

##Why is the project required?##

"Issues with the portability of code"

The submodule shell script uses shell commands like
`echo`, `grep`, `test`, `printf` etc. When switching
to non-POSIX compliant systems, one will have
to re-implement these commands specifically for the
system. There are also POSIX-to-Windows path conversion
issues. To fix these issues, it was decided to convert
these scripts into portable C code.

"Large overhead in calling the command"

The commands implemented in shell scripts are not builtins, so
they call `fork()` and `exec()` multiple times, hence creating
additional shells. This adds to the overhead in using the
commands in terms of time and memory.

"No access to low-level API"

The shell commands don’t have access to low level commands
like `git hash-object`, `git cat-file` etc. As these commands
are internally required for submodule commands to work, the shell
script needs to spawn a separate shell to execute these commands.
=================================================

##How have I prepared?##

I have gone through all the previous works and read through their
code to make myself accustomed to the intricacies of the code.
I have also structured my workflow based on the observation of
the previous discussions on those patches, and taken into
consideration the issues faced previously.

=================================================

##Previous Work##

A large part of the `git submodule--helper.c` has already been
converted by Stefan Beller​, Prathamesh Chavan​ in his GSoC project
in 2017, and Shourya Shukla in his GSoC project in 2020. This is
the list of already ported commands.

set-branch
set-url
summary
status
init
deinit
update
foreach
sync
absorbgitdirs
=================================================

##Work to be done##

The only command that is left to be ported is `git submodule add`.
The previous work on this by Shourya Shukla in GSoC 2020, did
not reach a successful merge due to some issues in design and
the patch was dropped as it had been stale for long.
See: 
https://github.com/git/git/blob/1861aa482a38ae84f597cede48167ab43e7e50a3/whats-cooking.txt#L1158-L1169
The first and foremost aim of the project will be to finish
porting the `add` command. Thereafter, the end goal would be to
completely replace the shell script (git-submodule.sh) with
an efficient c code.

Before porting the `git submodule add` command the initial work
would be dedicated to the implementation of small helper functions
in the form of small patches, which would be directly used by the
`add` command. This workflow is based on the suggestion by
Junio C Hamano on the thread:
https://lore.kernel.org/git/xmqqd01sugrg.fsf@gitster.c.googlers.com/.

This workflow would help in the following ways:

- It would help in sending patches in a small digestible format.
- It would help the reviewers easily review those small units
   of patches in a single sitting.
- It would help keep small logical units of code in different clean commits.

An additional test tweak would also be required in 
`t7400-submodule-basic.sh`,
to prepend the keyword ‘fatal’ since the command dies out in case
of absence of commits as pointed out by Kaartic Sivaraam on the thread:
https://lore.kernel.org/git/ce151a1408291bb0991ce89459e36ee13ccdfa52.camel@gmail.com/.


The following helper functions would be required to be implemented -

- A function to guess the path name from the repository string.
   Example prototype: static char *guess_dir_name(const char *repo)
   Returns the path name.

- A function to assure repo is absolute or relative to parent.
   Example prototype: static char *get_real_repo(const char *repo)
   Returns the correct repo name.

- A function for normalizing path, that is, removing multiple
   //; leading ./; /./; /../; trailing / .
   Example prototype: static char *normalize_path(const char *path)

- A function to check if the path exists and is already a git
   repo, else clone it.
   Example prototype: static int add_submodule(struct add_data *info)
   `add_data` is a struct which stores the config of the submodule.

- A function to set the submodule config properly.
   Example prototype: static void config_added_submodule(struct add_data 
*info)

- After implementation of all these helper methods, the main
   `module_add()` function would be implemented using the helper
   functions listed above as well as those helper functions which
   are predefined.
=================================================

##Project Timeline##

"Present Day - May 17"
I’ll utilize this time in exploring the codebase more properly and
solving more issues, which would help me properly familiarize
myself with the codebase. I’ll also try to structure a more
solidified, detailed workflow and come up with a draft patch
based on the previous work and discussions.

"May 17 - June 7 (Community bonding period)"
- Get familiar with the community.
- Discuss proper workflow with mentors.
- Make changes in the timeline, if necessary.
- Discuss the structure of the series of patches.

"June 7 - June 25 (Initial coding phase)"
- Finish implementation of the helper functions.
- Work on a proper structure of the implementation of the
   `submodule add` command and implement additional helper
   functions if required.
- Update proper documentation of the helper functions.

"June 25 - July 5 (Binding the code)"
This time would be used to code the main `submodule add`
command using all the helper functions implemented in the
initial phase of coding. This includes binding all the code
together and then completing the command through incremental
reviews. Also, the necessary documentation would be updated
parallelly.

"July 5 - July 12 (Initiate porting of command)"
- Discuss how to go about porting the entire submodule script.
- Initiate porting of the `git-submodule.sh` script.

"July 12 - July 16 (Phase 1 evaluation)"

"July 16 - July 26 (Semester exams)"
I will be taking my semester examinations during this
time. As such, I’ll try to be in touch with the mentors
and take out as much time as possible (around 20 hours a week).

"July 26 - August 10 (Porting the complete script)"
This period would be utilized in the complete conversion of
`git-submodule.sh` into c code and combine it with
`submodule--helper.c` to make a single `builtin/submodule.c`.
As I’ll be completely free from academics during this period,
I’ll try to compensate as much time as possible for the above
period of July 16 - July 26.

"August 10 - August 16 (Final review and evaluation)"
- Final review by the mentors.
- Apply necessary changes and touch-ups.
- Updating documentation, if any left.

"August 16 - August 23 (Submission of final report)"

Additionally: There are places in the original shell script
and c code tagged as `NEEDSWORK`. My aim would be to resolve
these issues within the GSoC period if time permits.
=================================================

##Post GSoC##

After the GSoC period, I plan to continue my contributions
for git and look for other issues to work on. I’d look into
the conversion of other commands which are pending conversion,
as well as work on the `NEEDSWORK` part of the code (If I’m
unable to finish it within the GSoC period itself). I plan on
mentoring new contributors to git and help the contributors
by doing code reviews and solving their doubts and helping
them out.

Regards,
Chinmoy Chakraborty.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [GSoC] Draft Proposal (Convert submodule to builtin)
  2021-04-08  9:11         ` Chinmoy Chakraborty
@ 2021-04-10 12:03           ` Christian Couder
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Couder @ 2021-04-10 12:03 UTC (permalink / raw)
  To: Chinmoy Chakraborty; +Cc: git, Kaartic Sivaraam

On Thu, Apr 8, 2021 at 11:10 AM Chinmoy Chakraborty
<chinmoy12c@gmail.com> wrote:
>
> This is the V2 of the draft proposal after applying the
>
> changes suggested by Christian Couder and Kaartik Sivaraam.

Please add all the reviewers in Cc. I added Kaartic.

> ##Git Contributions##
>
> [Microproject] Replace instances of `the_repository` with ‘r’. (Learning
> the ropes)
> Pull request: https://github.com/gitgitgadget/git/pull/915
> Mailing List:
> https://lore.kernel.org/git/pull.915.git.1616701733901.gitgitgadget@gmail.com/
>
>
> [column, range-diff] downcase option description
> Pull request: https://github.com/gitgitgadget/git/pull/920
> Mailing List:
> https://lore.kernel.org/git/pull.920.git.1616913298624.gitgitgadget@gmail.com/
>
>
> [Documentation] updated documentation for git commit --date
> Pull request: https://github.com/gitgitgadget/git/pull/918
> Mailing List:
> https://lore.kernel.org/git/pull.918.git.1616926790227.gitgitgadget@gmail.com/

Adding a status to let us know if each of your contributions have been
merged to seen, next or master, or even already released (in which
version?) would be nice. You could also add a GitHub or GitLab link to
the merge commit that merged your contribution.

> ##Project Outline##
>
> A few components of git, like `git-submodule.sh`
> are in the form of shell scripts. This causes
> problems in production code in multiple platforms
> like windows. The goal of this project is to
> convert the shell script version of `git-submodule.sh`
> to portable c code. The end goal would be

s/c/C/

> to completely remove `git-submodule.sh` and rename
> `builtin/submodule--helper.c` to `builtin/submodule.c`
> so that the `git submodule` is fully implemented using C.
> =================================================
>
> ##Why is the project required?##
>
> "Issues with the portability of code"
>
> The submodule shell script uses shell commands like
> `echo`, `grep`, `test`, `printf` etc. When switching
> to non-POSIX compliant systems, one will have
> to re-implement these commands specifically for the
> system. There are also POSIX-to-Windows path conversion
> issues. To fix these issues, it was decided to convert
> these scripts into portable C code.
>
> "Large overhead in calling the command"
>
> The commands implemented in shell scripts are not builtins, so
> they call `fork()` and `exec()` multiple times, hence creating
> additional shells.

s/shells/processes/

> This adds to the overhead in using the
> commands in terms of time and memory.
>
> "No access to low-level API"
>
> The shell commands don’t have access to low level commands
> like `git hash-object`, `git cat-file` etc.

I am not sure it's correct to say that they don't have access to such
low level commands. I think it would be better to say that they don't
have direct access to the internal functions behind such low level
commands.

> As these commands
> are internally required for submodule commands to work, the shell
> script needs to spawn a separate shell to execute these commands.

s/separate shell/separate process/

> =================================================
>
> ##How have I prepared?##
>
> I have gone through all the previous works and read through their
> code to make myself accustomed to the intricacies of the code.
> I have also structured my workflow based on the observation of
> the previous discussions on those patches, and taken into
> consideration the issues faced previously.
>
> =================================================
>
> ##Previous Work##
>
> A large part of the `git submodule--helper.c` has already been

I would say that it's 'git-submodule.sh' that has been converted. 'git
submodule--helper.c' is the result of that conversion.

> converted by Stefan Beller, Prathamesh Chavan in his GSoC project
> in 2017, and Shourya Shukla in his GSoC project in 2020. This is
> the list of already ported commands.
>
> set-branch
> set-url
> summary
> status
> init
> deinit
> update
> foreach
> sync
> absorbgitdirs
> =================================================
>
> ##Work to be done##
>
> The only command that is left to be ported is `git submodule add`.
> The previous work on this by Shourya Shukla in GSoC 2020, did
> not reach a successful merge due to some issues in design and
> the patch was dropped as it had been stale for long.
> See:
> https://github.com/git/git/blob/1861aa482a38ae84f597cede48167ab43e7e50a3/whats-cooking.txt#L1158-L1169
> The first and foremost aim of the project will be to finish
> porting the `add` command. Thereafter, the end goal would be to
> completely replace the shell script (git-submodule.sh) with
> an efficient c code.

s/c/C/

> Before porting the `git submodule add` command the initial work
> would be dedicated to the implementation of small helper functions
> in the form of small patches, which would be directly used by the
> `add` command. This workflow is based on the suggestion by
> Junio C Hamano on the thread:
> https://lore.kernel.org/git/xmqqd01sugrg.fsf@gitster.c.googlers.com/.
>
> This workflow would help in the following ways:
>
> - It would help in sending patches in a small digestible format.
> - It would help the reviewers easily review those small units
>    of patches in a single sitting.
> - It would help keep small logical units of code in different clean commits.
>
> An additional test tweak would also be required in
> `t7400-submodule-basic.sh`,
> to prepend the keyword ‘fatal’ since the command dies out in case
> of absence of commits as pointed out by Kaartic Sivaraam on the thread:
> https://lore.kernel.org/git/ce151a1408291bb0991ce89459e36ee13ccdfa52.camel@gmail.com/.

Ok.

> The following helper functions would be required to be implemented -
>
> - A function to guess the path name from the repository string.
>    Example prototype: static char *guess_dir_name(const char *repo)
>    Returns the path name.
>
> - A function to assure repo is absolute or relative to parent.
>    Example prototype: static char *get_real_repo(const char *repo)
>    Returns the correct repo name.
>
> - A function for normalizing path, that is, removing multiple
>    //; leading ./; /./; /../; trailing / .
>    Example prototype: static char *normalize_path(const char *path)
>
> - A function to check if the path exists and is already a git
>    repo, else clone it.
>    Example prototype: static int add_submodule(struct add_data *info)
>    `add_data` is a struct which stores the config of the submodule.
>
> - A function to set the submodule config properly.
>    Example prototype: static void config_added_submodule(struct add_data
> *info)

How would you test these helper functions? Would they be used by any
code in 'builtin/submodule--helper.c'?

> - After implementation of all these helper methods, the main
>    `module_add()` function would be implemented using the helper
>    functions listed above as well as those helper functions which
>    are predefined.
> =================================================
>
> ##Project Timeline##
>
> "Present Day - May 17"
> I’ll utilize this time in exploring the codebase more properly and
> solving more issues, which would help me properly familiarize
> myself with the codebase. I’ll also try to structure a more
> solidified, detailed workflow and come up with a draft patch

A draft patch of what? I thought that you wanted to create many
patches, each one implementing a small step, maybe a single function
that you described above.

> based on the previous work and discussions.
>
> "May 17 - June 7 (Community bonding period)"
> - Get familiar with the community.
> - Discuss proper workflow with mentors.
> - Make changes in the timeline, if necessary.
> - Discuss the structure of the series of patches.
>
> "June 7 - June 25 (Initial coding phase)"
> - Finish implementation of the helper functions.
> - Work on a proper structure of the implementation of the
>    `submodule add` command

I am not sure what this means.

> and implement additional helper
>    functions if required.
> - Update proper documentation of the helper functions.
>
> "June 25 - July 5 (Binding the code)"
> This time would be used to code the main `submodule add`
> command using all the helper functions implemented in the
> initial phase of coding.

So how would the helper functions be useful and tested before this step?

> This includes binding all the code
> together and then completing the command through incremental
> reviews. Also, the necessary documentation would be updated
> parallelly.
>
> "July 5 - July 12 (Initiate porting of command)"
> - Discuss how to go about porting the entire submodule script.
> - Initiate porting of the `git-submodule.sh` script.
>
> "July 12 - July 16 (Phase 1 evaluation)"
>
> "July 16 - July 26 (Semester exams)"
> I will be taking my semester examinations during this
> time. As such, I’ll try to be in touch with the mentors
> and take out as much time as possible (around 20 hours a week).
>
> "July 26 - August 10 (Porting the complete script)"
> This period would be utilized in the complete conversion of
> `git-submodule.sh` into c code and combine it with

s/c code/C code/

> `submodule--helper.c` to make a single `builtin/submodule.c`.
> As I’ll be completely free from academics during this period,
> I’ll try to compensate as much time as possible for the above
> period of July 16 - July 26.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-04-10 12:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-01 11:40 [GSoC] Draft Proposal (Convert submodule to builtin) Chinmoy Chakraborty
2021-04-02  5:32 ` Bagas Sanjaya
2021-04-02  6:31   ` Christian Couder
2021-04-02  7:05     ` Chinmoy Chakraborty
2021-04-03 14:14     ` Chinmoy Chakraborty
2021-04-05 14:44       ` Christian Couder
2021-04-06 19:53         ` Chinmoy Chakraborty
2021-04-08  9:11         ` Chinmoy Chakraborty
2021-04-10 12:03           ` Christian Couder
2021-04-02  7:04   ` Chinmoy Chakraborty

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).