git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [GSoC] Proposal: turn git-add--interactive.perl into a builtin
@ 2017-03-26  3:15 Daniel Ferreira (theiostream)
  2017-03-28 18:05 ` Stefan Beller
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Ferreira (theiostream) @ 2017-03-26  3:15 UTC (permalink / raw)
  To: Git Mailing List; +Cc: johannes.schindelin, Stefan Beller, christian.couder

Hi there. First of all, I'd like to thank all of the support up to now
with my microproject :). Here's a first draft of my proposal for
Google Summer of Code '17, based on the "Convert scripts to builtins"
idea. Please let me know what you think.

---

SYNOPSIS
There are many advantages to converting parts of git that are still
scripts to C builtins, among which execution speed, improved
compatibility and code deduplication. This proposal aims to apply this
to git-add--interactive, one of the most useful features of Git.

FEASIBILITY
Many git scripts have attracted attention for being turned into
builtins. There is ongoing work on git-stash
(https://public-inbox.org/git/20170321053135.thk77soxc4irxbqj@sigill.intra.peff.net/),
and porting interactive rebase is one of the ideas for this edition of
GSoC. Not as much attention, however, has been directed to
git-add--interactive.

There was only one discussion regarding the feasibility of its porting
(https://public-inbox.org/git/CAP8UFD2PcBsU6=FK4OHVrB7E98ycohS_0pYcbCBar=of1HLx+Q@mail.gmail.com/).
It resulted in a consensus that doing it would be a task too large –
although interesting – for GSoC 2015 based on the amount of its lines
of code. It is, however, only a few lines larger than
git-rebase--interactive, which has been considered an appropriate
idea. As such, it looks like a possible project for three months of
full-time work.

Aside from the benefits cited above, turning git-add--interactive into
a builtin can reduce Git's dependency on Perl to the point where no
"common" command would continue to rely on it.

PROJECTED TIMELINE
- Prior to May 4
-- Refine my basic knowledge of Perl
-- Craft one or two small patches to some of Git's Perl components
(preferentially to git-add--interactive itself) to improve my
understanding of the language and of how Git's Perl scripts actually
work

- May 4 - May 30
-- Clarify implementation details with my mentor, and work on a more
detailed roadmap for the project
-- Investigate roughly how to replace command invocations from the
script with actual builtin functions; which Git APIs in Perl already
have functional equivalents in C; which parts will require a full
rewrite.

- May 30 - June 30 (start of coding period)
-- Define the architecture of the builtin within git (which
functions/interfaces will it have? where will its code reside?).
-- Implement a small subset of the builtin (to be defined with my
mentor) and glue it into the existing Perl script. Present this as a
first patch to get feedback early regarding the implementation and
avoid piling up mistakes early.
-- Do necessary changes based on this initial review.
-- Have roughly 1/3 of the script's functionality ported to C.

- June 30 - July 28
-- Port the remainder of the script to a builtin.
-- Have a weekly roadmap, sending a part of the patch every 15 days to
the mailing list for review and to avoid massive commits by the end of
GSoC.
-- Apply suggestions from community reviews when possible; if not,
save them for doing toward the end of GSoC (see below).
(Note: due to a previous commitment, during a five-day period of July
I will only be able to work part-time on GSoC. The actual week will be
known over the next weeks.)

- July 28 - August 29
-- By the start of this period, send a patch with the builtin fully
implemented to the mailing list.
-- Fix bugs, test extensively, possibly extend test coverage for
git-add--interactive.
-- Respond to the (predictably big) community feedback regarding the change.

I currently work full-time in a payments company (see below), but in
case of being accepted I am willing to quit my job some months early
to dedicate myself fully to GSoC starting June.

BIOGRAPHICAL INFORMATION
My name is Daniel Ferreira and I'm a student from São Paulo, Brazil. I
was accepted by Stanford University last year and I will start college
this fall. I started coding C about six years ago writing up system
modifications ("tweaks") for jailbroken iPhones. Since then, I have
written/contributed to a couple of open-source projects like an IRC
bot and other assorted things – all of them tracked on Git
(https://github.com/theiostream). I have also developed a
(closed-source) library in C for interacting with payment terminals in
the company I have worked for over the last two years (Pagar.me).
There, we use Git extensively for managing projects with around 20
people working concurrently.

MICROPROJECT
I have sent a series of patches to complete the microproject of
converting recursive calls to readdir() into calls to dir_iterator.
The most recent version can be found in
https://public-inbox.org/git/1490465551-71056-2-git-send-email-bnmvco@gmail.com/T/#u.

Thanks,
-- Daniel.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GSoC] Proposal: turn git-add--interactive.perl into a builtin
  2017-03-26  3:15 [GSoC] Proposal: turn git-add--interactive.perl into a builtin Daniel Ferreira (theiostream)
@ 2017-03-28 18:05 ` Stefan Beller
  2017-03-30  0:01   ` Johannes Schindelin
  0 siblings, 1 reply; 7+ messages in thread
From: Stefan Beller @ 2017-03-28 18:05 UTC (permalink / raw)
  To: Daniel Ferreira (theiostream)
  Cc: Git Mailing List, Johannes Schindelin, Christian Couder

On Sat, Mar 25, 2017 at 8:15 PM, Daniel Ferreira (theiostream)
<bnmvco@gmail.com> wrote:

> SYNOPSIS
> There are many advantages to converting parts of git that are still
> scripts to C builtins, among which execution speed, improved
> compatibility and code deduplication.

agreed.

> git-add--interactive, one of the most useful features of Git.

knee jerk reaction: I never used it, so it cannot be that important ;)
(I use git-gui, which is essentially the same workflow. There are tons
of ways to accomplish a given goal using Git, so I guess we don't
want to get in an argument here).

>
> FEASIBILITY
>
> There was only one discussion regarding the feasibility of its porting
> (https://public-inbox.org/git/CAP8UFD2PcBsU6=FK4OHVrB7E98ycohS_0pYcbCBar=of1HLx+Q@mail.gmail.com/).
> It resulted in a consensus that doing it would be a task too large –
> although interesting – for GSoC 2015 based on the amount of its lines
> of code. It is, however, only a few lines larger than
> git-rebase--interactive, which has been considered an appropriate
> idea. As such, it looks like a possible project for three months of
> full-time work.

ok, it sounds a challenging project. (currently counting 1750 lines of
code). Scrolling over the source code, there are quite a couple of
functions, where the direct equivalent in C springs to mind.

run_cmd_pipe -> see run-command.h
unquote_path -> unquote_c_style ?
refresh -> update_index_if_able()
list_modified -> iterate over "const struct cache_entry *ce = active_cache[i];"


> PROJECTED TIMELINE
> - Prior to May 4
> -- Refine my basic knowledge of Perl
> -- Craft one or two small patches to some of Git's Perl components
> (preferentially to git-add--interactive itself) to improve my
> understanding of the language and of how Git's Perl scripts actually
> work



>
> - May 4 - May 30
> -- Clarify implementation details with my mentor, and work on a more
> detailed roadmap for the project
> -- Investigate roughly how to replace command invocations from the
> script with actual builtin functions; which Git APIs in Perl already
> have functional equivalents in C; which parts will require a full
> rewrite.

There are different approaches for replacing functionality in another
language. Examples:
* Implement the functionality in C and then have a "flag-day" commit
  783d7e865e (builtin-am: remove redirection to git-am.sh, 2015-08-04)
  This only works when the whole functionality was replaced in prior commits
* Implement partial functionality in C and call it via a helper function.
  3604242f08 (submodule: port init from shell to C, 2016-04-15)
  This works well for only partial conversions (the larger the thing to
  convert the more appealing this is, as it gets code shipped early.)
  When choosing this strategy, this part of the Project would be to
  identify parts that could be ported on its own without much
  additional glue-code.

> - May 30 - June 30 (start of coding period)
> -- Define the architecture of the builtin within git (which
> functions/interfaces will it have? where will its code reside?).
> -- Implement a small subset of the builtin (to be defined with my
> mentor) and glue it into the existing Perl script. Present this as a
> first patch to get feedback early regarding the implementation and
> avoid piling up mistakes early.
> -- Do necessary changes based on this initial review.
> -- Have roughly 1/3 of the script's functionality ported to C.
>
> - June 30 - July 28
> -- Port the remainder of the script to a builtin.
> -- Have a weekly roadmap, sending a part of the patch every 15 days to
> the mailing list for review and to avoid massive commits by the end of
> GSoC.

yeah; send early, send often. ;)

> -- Apply suggestions from community reviews when possible; if not,
> save them for doing toward the end of GSoC (see below).

Please do not underestimate the discussion by community, finding
consensus on list consumes a bit of time in some cases.

> (Note: due to a previous commitment, during a five-day period of July
> I will only be able to work part-time on GSoC. The actual week will be
> known over the next weeks.)

Maybe you want to shift the schedule up to here by one week then?
(e.g. the first period would be April 27  - May 23)

>
> - July 28 - August 29
> -- By the start of this period, send a patch with the builtin fully
> implemented to the mailing list.

/a patch/a patch series consisting of many patches/
Experience shows that smaller patches are easier to review as
it is more focused. Consider e.g. e86ab2c1cd (wt-status: convert
to struct object_id, 2017-02-21) and the parents leading up to this
commit. They work on the same big topic, but focus on very
regional areas to ease review.

> -- Fix bugs, test extensively, possibly extend test coverage for
> git-add--interactive.

AFAICT ('$ git grep "git add -i"') there is only t3701 testing the
interactive add. Maybe we need to add tests first to document
current behavior, before attempting a conversion?

This could go well into the period "May 4 - May 30", as writing
tests would count as "Clarify implementation details".

Hope this helps,
Stefan

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GSoC] Proposal: turn git-add--interactive.perl into a builtin
  2017-03-28 18:05 ` Stefan Beller
@ 2017-03-30  0:01   ` Johannes Schindelin
  2017-03-31  5:07     ` Daniel Ferreira (theiostream)
  0 siblings, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2017-03-30  0:01 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Daniel Ferreira (theiostream), Git Mailing List, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 8148 bytes --]

Hi Stefan & Daniel,

On Tue, 28 Mar 2017, Stefan Beller wrote:

> On Sat, Mar 25, 2017 at 8:15 PM, Daniel Ferreira (theiostream)
> <bnmvco@gmail.com> wrote:
> 
> > SYNOPSIS
> > There are many advantages to converting parts of git that are still
> > scripts to C builtins, among which execution speed, improved
> > compatibility and code deduplication.
> 
> agreed.

I would even add portability. But yeah, speed is a big thing. I am an
extensive user of `git add -p` (which is backed by
git-add--interactive.perl) and it is slow as molasses on Windows, just
because it is a Perl script (and the Perl interpreter needs to emulate
POSIX functionality that is frequently not even needed, such as: copying
all memory and reopening all file descriptors in a fork() call only to
exec() git.exe right away, tossing all of the diligently work into the
dustbin).

> > git-add--interactive, one of the most useful features of Git.
> 
> knee jerk reaction: I never used it, so it cannot be that important ;)
> (I use git-gui, which is essentially the same workflow. There are tons
> of ways to accomplish a given goal using Git, so I guess we don't
> want to get in an argument here).

Well, I make up for your lack of `git add -i` usage.

Of course, since you use git-gui, you are simply using another dependency
that bloats Git for Windows: Tcl/Tk.

> > FEASIBILITY
> >
> > There was only one discussion regarding the feasibility of its porting
> > (https://public-inbox.org/git/CAP8UFD2PcBsU6=FK4OHVrB7E98ycohS_0pYcbCBar=of1HLx+Q@mail.gmail.com/).
> > It resulted in a consensus that doing it would be a task too large –
> > although interesting – for GSoC 2015 based on the amount of its lines
> > of code. It is, however, only a few lines larger than
> > git-rebase--interactive, which has been considered an appropriate
> > idea. As such, it looks like a possible project for three months of
> > full-time work.
> 
> ok, it sounds a challenging project. (currently counting 1750 lines of
> code). Scrolling over the source code, there are quite a couple of
> functions, where the direct equivalent in C springs to mind.
> 
> run_cmd_pipe -> see run-command.h
> unquote_path -> unquote_c_style ?
> refresh -> update_index_if_able()
> list_modified -> iterate over "const struct cache_entry *ce = active_cache[i];"

Yes, I think it would be more important to acquaint oneself with the
idiosynchracies of Git's internal "API" than to get familiar with Perl:
interpreting what obscure Perl code does is something I would gladly do as
a mentor.

> > PROJECTED TIMELINE
> > - Prior to May 4
> > -- Refine my basic knowledge of Perl
> > -- Craft one or two small patches to some of Git's Perl components
> > (preferentially to git-add--interactive itself) to improve my
> > understanding of the language and of how Git's Perl scripts actually
> > work

As I mentioned above, the Perl code should be fairly intuitive for the
most part, with maybe a couple of pieces of code using more advanced Perl
techniques. Given the scope of the project, I would recommend working
closely together with the mentor(s) to clarify what those code parts do.

> > - May 4 - May 30
> > -- Clarify implementation details with my mentor, and work on a more
> > detailed roadmap for the project
> > -- Investigate roughly how to replace command invocations from the
> > script with actual builtin functions; which Git APIs in Perl already
> > have functional equivalents in C; which parts will require a full
> > rewrite.
> 
> There are different approaches for replacing functionality in another
> language. Examples:
> * Implement the functionality in C and then have a "flag-day" commit
>   783d7e865e (builtin-am: remove redirection to git-am.sh, 2015-08-04)
>   This only works when the whole functionality was replaced in prior commits
> * Implement partial functionality in C and call it via a helper function.
>   3604242f08 (submodule: port init from shell to C, 2016-04-15)
>   This works well for only partial conversions (the larger the thing to
>   convert the more appealing this is, as it gets code shipped early.)
>   When choosing this strategy, this part of the Project would be to
>   identify parts that could be ported on its own without much
>   additional glue-code.

To offer my perspective: I strongly prefer the latter approach. Not only
does it yield earlier results, it also makes it substantially easier to
handle the project even if it should turn out to be a little larger than
just 3 months.

> > - May 30 - June 30 (start of coding period)
> > -- Define the architecture of the builtin within git (which
> > functions/interfaces will it have? where will its code reside?).
> > -- Implement a small subset of the builtin (to be defined with my
> > mentor) and glue it into the existing Perl script. Present this as a
> > first patch to get feedback early regarding the implementation and
> > avoid piling up mistakes early.
> > -- Do necessary changes based on this initial review.
> > -- Have roughly 1/3 of the script's functionality ported to C.
> >
> > - June 30 - July 28
> > -- Port the remainder of the script to a builtin.
> > -- Have a weekly roadmap, sending a part of the patch every 15 days to
> > the mailing list for review and to avoid massive commits by the end of
> > GSoC.
> 
> yeah; send early, send often. ;)

Even better: push multiple times a day to a public repository, say, on
GitHub. That allows for "Work In Progress" commits that not only serve as
a backup but also as transparent report what was done.

> > -- Apply suggestions from community reviews when possible; if not,
> > save them for doing toward the end of GSoC (see below).
> 
> Please do not underestimate the discussion by community, finding
> consensus on list consumes a bit of time in some cases.

I agree with this statement. Ideally, the first patch series would be
ready to submit very soon into the project, something like 2 weeks (which
is another point in favor of the helper approach outlined by Stefan).

The smaller the patch series are, the more likely they will get in
quickly. And the easier is it to address reviewers' comments, because you
won't have to send out gazillion unchanged patches of a multi-dozen patch
series just to address one typo in one patch.

> > (Note: due to a previous commitment, during a five-day period of July
> > I will only be able to work part-time on GSoC. The actual week will be
> > known over the next weeks.)
> 
> Maybe you want to shift the schedule up to here by one week then?
> (e.g. the first period would be April 27  - May 23)
> 
> >
> > - July 28 - August 29
> > -- By the start of this period, send a patch with the builtin fully
> > implemented to the mailing list.
> 
> /a patch/a patch series consisting of many patches/
> Experience shows that smaller patches are easier to review as
> it is more focused.

Not only that. Reviewers' time is scarce. I frequently have to triage
which mails I can read in the time I have for the mailing list. If I see a
patch series that is too large, I simply cannot review it. I gather that
is the same issue for pretty much all available reviewers.

> Consider e.g. e86ab2c1cd (wt-status: convert to struct object_id,
> 2017-02-21) and the parents leading up to this commit. They work on the
> same big topic, but focus on very regional areas to ease review.
> 
> > -- Fix bugs, test extensively, possibly extend test coverage for
> > git-add--interactive.
> 
> AFAICT ('$ git grep "git add -i"') there is only t3701 testing the
> interactive add. Maybe we need to add tests first to document
> current behavior, before attempting a conversion?

That should indeed be the first step, before doing much else.

However, it does look as if t3701's 40 test cases provide fairly thorough
coverage.

> This could go well into the period "May 4 - May 30", as writing
> tests would count as "Clarify implementation details".

I agree with that, too.

Thanks for working on this!
Johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GSoC] Proposal: turn git-add--interactive.perl into a builtin
  2017-03-30  0:01   ` Johannes Schindelin
@ 2017-03-31  5:07     ` Daniel Ferreira (theiostream)
  2017-03-31 19:06       ` Daniel Ferreira (theiostream)
  2017-04-02 19:43       ` Johannes Schindelin
  0 siblings, 2 replies; 7+ messages in thread
From: Daniel Ferreira (theiostream) @ 2017-03-31  5:07 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Stefan Beller, Git Mailing List, Christian Couder

Hi Stefan & Johannes,

Thank you for the precious feedback on the proposal. I don't see much
sense in sending a full "v2" of it and have you read it all over
again, so I'll just answer to your comments directly.

Also, although the GSoC website allows me to send a "proposal draft"
to you through the website, since I've already sent it here that
shouldn't be necessary, correct? I intend to use it just to send the
final thing.

On Wed, Mar 29, 2017 at 9:01 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> On Tue, 28 Mar 2017, Stefan Beller wrote:
>
>> On Sat, Mar 25, 2017 at 8:15 PM, Daniel Ferreira (theiostream)
>> <bnmvco@gmail.com> wrote:
>>
>> > SYNOPSIS
>> > There are many advantages to converting parts of git that are still
>> > scripts to C builtins, among which execution speed, improved
>> > compatibility and code deduplication.
>>
>> agreed.
>
> I would even add portability. But yeah, speed is a big thing. I am an
> extensive user of `git add -p` (which is backed by
> git-add--interactive.perl) and it is slow as molasses on Windows, just
> because it is a Perl script (and the Perl interpreter needs to emulate
> POSIX functionality that is frequently not even needed, such as: copying
> all memory and reopening all file descriptors in a fork() call only to
> exec() git.exe right away, tossing all of the diligently work into the
> dustbin).

Thanks for this example – it hadn't come to my mind since I don't use
Git on Windows. I'll be sure to complement the synopsis with it. :)

>
>> > FEASIBILITY
>> >
>> > There was only one discussion regarding the feasibility of its porting
>> > (https://public-inbox.org/git/CAP8UFD2PcBsU6=FK4OHVrB7E98ycohS_0pYcbCBar=of1HLx+Q@mail.gmail.com/).
>> > It resulted in a consensus that doing it would be a task too large –
>> > although interesting – for GSoC 2015 based on the amount of its lines
>> > of code. It is, however, only a few lines larger than
>> > git-rebase--interactive, which has been considered an appropriate
>> > idea. As such, it looks like a possible project for three months of
>> > full-time work.
>>
>> ok, it sounds a challenging project. (currently counting 1750 lines of
>> code). Scrolling over the source code, there are quite a couple of
>> functions, where the direct equivalent in C springs to mind.
>>
>> run_cmd_pipe -> see run-command.h
>> unquote_path -> unquote_c_style ?
>> refresh -> update_index_if_able()
>> list_modified -> iterate over "const struct cache_entry *ce = active_cache[i];"

Thank you for these functions. I don't think I will be able to specify
them in detail as part of the projected timeline (e.g. "June 1:
convert calls to refresh() to use update_index_if_able()") already
because there is not enough time prior to the proposal deadline to
study their behavior in detail, and I like to avoid talking about
things I don't fully understand. Although I think I can cite them as
examples for a thesis I had put elsewhere in the proposal that "Git
APIs in Perl already have functional equivalents in C".

Also, they will be great for my early investigation stage into
git-add--interactive. :) Once more, thanks for having listed them.

> Yes, I think it would be more important to acquaint oneself with the
> idiosynchracies of Git's internal "API" than to get familiar with Perl:
> interpreting what obscure Perl code does is something I would gladly do as
> a mentor.

That's really nice! I usually don't get stuck when trying to
understand code in languages I'm not too well acquainted with, but I
figured getting more familiar with Perl would speed development up.
But it does make sense that this "prior to May 4" might be better
invested learning about git's internals than Perl.

Question: do you suggest any pending bugfix to git-add--interactive or
to something related that might give some useful knowledge in advance?
(for the pre-code period). My microproject involves playing with the
dir_iterator interface, which is a great exercise in code refactoring
but really does not teach me too much about Git's architecture.

Even if you do not have an answer to this, I'm pretty sure I'll keep
this commitment to submitting some patch series somehow related to
git-add before GSoC begins, especially after this comment from
Johannes.

>
>> > PROJECTED TIMELINE
>> > - Prior to May 4
>> > -- Refine my basic knowledge of Perl
>> > -- Craft one or two small patches to some of Git's Perl components
>> > (preferentially to git-add--interactive itself) to improve my
>> > understanding of the language and of how Git's Perl scripts actually
>> > work

So yeah, I think this could be rewritten as:

- Prior to May 4
-- Craft two or three small patch series to git-add--interactive or
related components to improve my understanding of Git's internal
architecture, especially that related to git-add.

>
>> > - May 4 - May 30
>> > -- Clarify implementation details with my mentor, and work on a more
>> > detailed roadmap for the project
>> > -- Investigate roughly how to replace command invocations from the
>> > script with actual builtin functions; which Git APIs in Perl already
>> > have functional equivalents in C; which parts will require a full
>> > rewrite.
>> There are different approaches for replacing functionality in another
>> language. Examples:
>> * Implement the functionality in C and then have a "flag-day" commit
>>   783d7e865e (builtin-am: remove redirection to git-am.sh, 2015-08-04)
>>   This only works when the whole functionality was replaced in prior commits
>> * Implement partial functionality in C and call it via a helper function.
>>   3604242f08 (submodule: port init from shell to C, 2016-04-15)
>>   This works well for only partial conversions (the larger the thing to
>>   convert the more appealing this is, as it gets code shipped early.)
>>   When choosing this strategy, this part of the Project would be to
>>   identify parts that could be ported on its own without much
>>   additional glue-code.
>
> To offer my perspective: I strongly prefer the latter approach. Not only
> does it yield earlier results, it also makes it substantially easier to
> handle the project even if it should turn out to be a little larger than
> just 3 months.

I agree. I even think that the latter parts of this projected timeline
imply a choice for the second approach.

For now, Stefan's implementation in 3604242f08 (create a helper
builtin command, call it from a script, do it to other functions until
the whole script is replaced) seems the most natural way to do it. But
sadly, I still do not have enough knowledge about the project to be
able to specify in this proposal how (e.g. in what order) exactly I
intend to "divide" the code to do this. But I'm sure the answer to
this question will be clearer through contact with my mentor and with
growing experience in the project.

>
>> > - May 30 - June 30 (start of coding period)
>> > -- Define the architecture of the builtin within git (which
>> > functions/interfaces will it have? where will its code reside?).
>> > -- Implement a small subset of the builtin (to be defined with my
>> > mentor) and glue it into the existing Perl script. Present this as a
>> > first patch to get feedback early regarding the implementation and
>> > avoid piling up mistakes early.
>> > -- Do necessary changes based on this initial review.
>> > -- Have roughly 1/3 of the script's functionality ported to C.
>> >
>> > - June 30 - July 28
>> > -- Port the remainder of the script to a builtin.
>> > -- Have a weekly roadmap, sending a part of the patch every 15 days to
>> > the mailing list for review and to avoid massive commits by the end of
>> > GSoC.
>>
>> yeah; send early, send often. ;)
>
> Even better: push multiple times a day to a public repository, say, on
> GitHub. That allows for "Work In Progress" commits that not only serve as
> a backup but also as transparent report what was done.

Great idea. It can also be a good way to report my progress to my
mentor. (I'll do my best to avoid "[WIP]" commits with barely any
explanation).

As for the "timeline", I think it can be improved by some better
specification regarding the gradual process of the implementation
opposed to switching from the script to a builtin all at once.

>> > -- Apply suggestions from community reviews when possible; if not,
>> > save them for doing toward the end of GSoC (see below).
>>
>> Please do not underestimate the discussion by community, finding
>> consensus on list consumes a bit of time in some cases.
> I agree with this statement. Ideally, the first patch series would be
> ready to submit very soon into the project, something like 2 weeks (which
> is another point in favor of the helper approach outlined by Stefan).
> The smaller the patch series are, the more likely they will get in
> quickly. And the easier is it to address reviewers' comments, because you
> won't have to send out gazillion unchanged patches of a multi-dozen patch
> series just to address one typo in one patch.

Seems reasonable. I'll amend the proposal to commit to a first patch
series in 2 weeks and to sending a new one every week or less (to
avoid series getting massive). I'll also refrain from postponing
responding to feedback and, when necessary, work in parallel between
amending patches and working on new code. After all, I know how my
schedule will be during the whole GSoC period, but I don't know how
available those in the list will be to review it, so it's probably
best to respond early.

>> > (Note: due to a previous commitment, during a five-day period of July
>> > I will only be able to work part-time on GSoC. The actual week will be
>> > known over the next weeks.)
>>
>> Maybe you want to shift the schedule up to here by one week then?
>> (e.g. the first period would be April 27  - May 23)
>>

I cannot shift the schedule one week earlier during the pre-code
period because Google only releases mentor information on May 4, and
the things I'm planning for the second period depend on the mentor.
However, my classes only start in September 22, so I would have
absolutely no problem continuing to work full-time on Git after the
GSoC period is officially over to compensate for these five days.

>> >
>> > - July 28 - August 29
>> > -- By the start of this period, send a patch with the builtin fully
>> > implemented to the mailing list.
>>
>> /a patch/a patch series consisting of many patches/
>> Experience shows that smaller patches are easier to review as
>> it is more focused.

I will reword it.

>> Consider e.g. e86ab2c1cd (wt-status: convert to struct object_id,
>> 2017-02-21) and the parents leading up to this commit. They work on the
>> same big topic, but focus on very regional areas to ease review.
>>
>> > -- Fix bugs, test extensively, possibly extend test coverage for
>> > git-add--interactive.
>>
>> AFAICT ('$ git grep "git add -i"') there is only t3701 testing the
>> interactive add. Maybe we need to add tests first to document
>> current behavior, before attempting a conversion?
>
> That should indeed be the first step, before doing much else.
>
> However, it does look as if t3701's 40 test cases provide fairly thorough
> coverage.
>
>> This could go well into the period "May 4 - May 30", as writing
>> tests would count as "Clarify implementation details".
>
> I agree with that, too.

Seems reasonable. As I said above, working on patches (if possible)
related to git-add--interactive will probably lead me to discovering
coverage issues, which could then be dealt with in this pre-code
period already with some guidance from my mentor.

> Thanks for working on this!

Thank *you*, and Stefan, for the careful review of it. It means a lot.

-- Daniel.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GSoC] Proposal: turn git-add--interactive.perl into a builtin
  2017-03-31  5:07     ` Daniel Ferreira (theiostream)
@ 2017-03-31 19:06       ` Daniel Ferreira (theiostream)
  2017-04-02 19:43       ` Johannes Schindelin
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Ferreira (theiostream) @ 2017-03-31 19:06 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Stefan Beller, Git Mailing List, Christian Couder

Well, Google requires me to have a draft on a Google Doc anyway for
the proposal, and I am unsure who exactly it will reach. Since it *is*
part of the discussion regarding my proposal, I suppose it is worth
posting here for anyone to comment:
https://docs.google.com/document/d/1dvF2PNRQvvZ351jCdKzOLs7tzaDqhR7ci7TDgzYQg9I/edit?usp=sharing.

-- Daniel.

On Fri, Mar 31, 2017 at 2:07 AM, Daniel Ferreira (theiostream)
<bnmvco@gmail.com> wrote:
> Hi Stefan & Johannes,
>
> Thank you for the precious feedback on the proposal. I don't see much
> sense in sending a full "v2" of it and have you read it all over
> again, so I'll just answer to your comments directly.
>
> Also, although the GSoC website allows me to send a "proposal draft"
> to you through the website, since I've already sent it here that
> shouldn't be necessary, correct? I intend to use it just to send the
> final thing.
>
> On Wed, Mar 29, 2017 at 9:01 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>> On Tue, 28 Mar 2017, Stefan Beller wrote:
>>
>>> On Sat, Mar 25, 2017 at 8:15 PM, Daniel Ferreira (theiostream)
>>> <bnmvco@gmail.com> wrote:
>>>
>>> > SYNOPSIS
>>> > There are many advantages to converting parts of git that are still
>>> > scripts to C builtins, among which execution speed, improved
>>> > compatibility and code deduplication.
>>>
>>> agreed.
>>
>> I would even add portability. But yeah, speed is a big thing. I am an
>> extensive user of `git add -p` (which is backed by
>> git-add--interactive.perl) and it is slow as molasses on Windows, just
>> because it is a Perl script (and the Perl interpreter needs to emulate
>> POSIX functionality that is frequently not even needed, such as: copying
>> all memory and reopening all file descriptors in a fork() call only to
>> exec() git.exe right away, tossing all of the diligently work into the
>> dustbin).
>
> Thanks for this example – it hadn't come to my mind since I don't use
> Git on Windows. I'll be sure to complement the synopsis with it. :)
>
>>
>>> > FEASIBILITY
>>> >
>>> > There was only one discussion regarding the feasibility of its porting
>>> > (https://public-inbox.org/git/CAP8UFD2PcBsU6=FK4OHVrB7E98ycohS_0pYcbCBar=of1HLx+Q@mail.gmail.com/).
>>> > It resulted in a consensus that doing it would be a task too large –
>>> > although interesting – for GSoC 2015 based on the amount of its lines
>>> > of code. It is, however, only a few lines larger than
>>> > git-rebase--interactive, which has been considered an appropriate
>>> > idea. As such, it looks like a possible project for three months of
>>> > full-time work.
>>>
>>> ok, it sounds a challenging project. (currently counting 1750 lines of
>>> code). Scrolling over the source code, there are quite a couple of
>>> functions, where the direct equivalent in C springs to mind.
>>>
>>> run_cmd_pipe -> see run-command.h
>>> unquote_path -> unquote_c_style ?
>>> refresh -> update_index_if_able()
>>> list_modified -> iterate over "const struct cache_entry *ce = active_cache[i];"
>
> Thank you for these functions. I don't think I will be able to specify
> them in detail as part of the projected timeline (e.g. "June 1:
> convert calls to refresh() to use update_index_if_able()") already
> because there is not enough time prior to the proposal deadline to
> study their behavior in detail, and I like to avoid talking about
> things I don't fully understand. Although I think I can cite them as
> examples for a thesis I had put elsewhere in the proposal that "Git
> APIs in Perl already have functional equivalents in C".
>
> Also, they will be great for my early investigation stage into
> git-add--interactive. :) Once more, thanks for having listed them.
>
>> Yes, I think it would be more important to acquaint oneself with the
>> idiosynchracies of Git's internal "API" than to get familiar with Perl:
>> interpreting what obscure Perl code does is something I would gladly do as
>> a mentor.
>
> That's really nice! I usually don't get stuck when trying to
> understand code in languages I'm not too well acquainted with, but I
> figured getting more familiar with Perl would speed development up.
> But it does make sense that this "prior to May 4" might be better
> invested learning about git's internals than Perl.
>
> Question: do you suggest any pending bugfix to git-add--interactive or
> to something related that might give some useful knowledge in advance?
> (for the pre-code period). My microproject involves playing with the
> dir_iterator interface, which is a great exercise in code refactoring
> but really does not teach me too much about Git's architecture.
>
> Even if you do not have an answer to this, I'm pretty sure I'll keep
> this commitment to submitting some patch series somehow related to
> git-add before GSoC begins, especially after this comment from
> Johannes.
>
>>
>>> > PROJECTED TIMELINE
>>> > - Prior to May 4
>>> > -- Refine my basic knowledge of Perl
>>> > -- Craft one or two small patches to some of Git's Perl components
>>> > (preferentially to git-add--interactive itself) to improve my
>>> > understanding of the language and of how Git's Perl scripts actually
>>> > work
>
> So yeah, I think this could be rewritten as:
>
> - Prior to May 4
> -- Craft two or three small patch series to git-add--interactive or
> related components to improve my understanding of Git's internal
> architecture, especially that related to git-add.
>
>>
>>> > - May 4 - May 30
>>> > -- Clarify implementation details with my mentor, and work on a more
>>> > detailed roadmap for the project
>>> > -- Investigate roughly how to replace command invocations from the
>>> > script with actual builtin functions; which Git APIs in Perl already
>>> > have functional equivalents in C; which parts will require a full
>>> > rewrite.
>>> There are different approaches for replacing functionality in another
>>> language. Examples:
>>> * Implement the functionality in C and then have a "flag-day" commit
>>>   783d7e865e (builtin-am: remove redirection to git-am.sh, 2015-08-04)
>>>   This only works when the whole functionality was replaced in prior commits
>>> * Implement partial functionality in C and call it via a helper function.
>>>   3604242f08 (submodule: port init from shell to C, 2016-04-15)
>>>   This works well for only partial conversions (the larger the thing to
>>>   convert the more appealing this is, as it gets code shipped early.)
>>>   When choosing this strategy, this part of the Project would be to
>>>   identify parts that could be ported on its own without much
>>>   additional glue-code.
>>
>> To offer my perspective: I strongly prefer the latter approach. Not only
>> does it yield earlier results, it also makes it substantially easier to
>> handle the project even if it should turn out to be a little larger than
>> just 3 months.
>
> I agree. I even think that the latter parts of this projected timeline
> imply a choice for the second approach.
>
> For now, Stefan's implementation in 3604242f08 (create a helper
> builtin command, call it from a script, do it to other functions until
> the whole script is replaced) seems the most natural way to do it. But
> sadly, I still do not have enough knowledge about the project to be
> able to specify in this proposal how (e.g. in what order) exactly I
> intend to "divide" the code to do this. But I'm sure the answer to
> this question will be clearer through contact with my mentor and with
> growing experience in the project.
>
>>
>>> > - May 30 - June 30 (start of coding period)
>>> > -- Define the architecture of the builtin within git (which
>>> > functions/interfaces will it have? where will its code reside?).
>>> > -- Implement a small subset of the builtin (to be defined with my
>>> > mentor) and glue it into the existing Perl script. Present this as a
>>> > first patch to get feedback early regarding the implementation and
>>> > avoid piling up mistakes early.
>>> > -- Do necessary changes based on this initial review.
>>> > -- Have roughly 1/3 of the script's functionality ported to C.
>>> >
>>> > - June 30 - July 28
>>> > -- Port the remainder of the script to a builtin.
>>> > -- Have a weekly roadmap, sending a part of the patch every 15 days to
>>> > the mailing list for review and to avoid massive commits by the end of
>>> > GSoC.
>>>
>>> yeah; send early, send often. ;)
>>
>> Even better: push multiple times a day to a public repository, say, on
>> GitHub. That allows for "Work In Progress" commits that not only serve as
>> a backup but also as transparent report what was done.
>
> Great idea. It can also be a good way to report my progress to my
> mentor. (I'll do my best to avoid "[WIP]" commits with barely any
> explanation).
>
> As for the "timeline", I think it can be improved by some better
> specification regarding the gradual process of the implementation
> opposed to switching from the script to a builtin all at once.
>
>>> > -- Apply suggestions from community reviews when possible; if not,
>>> > save them for doing toward the end of GSoC (see below).
>>>
>>> Please do not underestimate the discussion by community, finding
>>> consensus on list consumes a bit of time in some cases.
>> I agree with this statement. Ideally, the first patch series would be
>> ready to submit very soon into the project, something like 2 weeks (which
>> is another point in favor of the helper approach outlined by Stefan).
>> The smaller the patch series are, the more likely they will get in
>> quickly. And the easier is it to address reviewers' comments, because you
>> won't have to send out gazillion unchanged patches of a multi-dozen patch
>> series just to address one typo in one patch.
>
> Seems reasonable. I'll amend the proposal to commit to a first patch
> series in 2 weeks and to sending a new one every week or less (to
> avoid series getting massive). I'll also refrain from postponing
> responding to feedback and, when necessary, work in parallel between
> amending patches and working on new code. After all, I know how my
> schedule will be during the whole GSoC period, but I don't know how
> available those in the list will be to review it, so it's probably
> best to respond early.
>
>>> > (Note: due to a previous commitment, during a five-day period of July
>>> > I will only be able to work part-time on GSoC. The actual week will be
>>> > known over the next weeks.)
>>>
>>> Maybe you want to shift the schedule up to here by one week then?
>>> (e.g. the first period would be April 27  - May 23)
>>>
>
> I cannot shift the schedule one week earlier during the pre-code
> period because Google only releases mentor information on May 4, and
> the things I'm planning for the second period depend on the mentor.
> However, my classes only start in September 22, so I would have
> absolutely no problem continuing to work full-time on Git after the
> GSoC period is officially over to compensate for these five days.
>
>>> >
>>> > - July 28 - August 29
>>> > -- By the start of this period, send a patch with the builtin fully
>>> > implemented to the mailing list.
>>>
>>> /a patch/a patch series consisting of many patches/
>>> Experience shows that smaller patches are easier to review as
>>> it is more focused.
>
> I will reword it.
>
>>> Consider e.g. e86ab2c1cd (wt-status: convert to struct object_id,
>>> 2017-02-21) and the parents leading up to this commit. They work on the
>>> same big topic, but focus on very regional areas to ease review.
>>>
>>> > -- Fix bugs, test extensively, possibly extend test coverage for
>>> > git-add--interactive.
>>>
>>> AFAICT ('$ git grep "git add -i"') there is only t3701 testing the
>>> interactive add. Maybe we need to add tests first to document
>>> current behavior, before attempting a conversion?
>>
>> That should indeed be the first step, before doing much else.
>>
>> However, it does look as if t3701's 40 test cases provide fairly thorough
>> coverage.
>>
>>> This could go well into the period "May 4 - May 30", as writing
>>> tests would count as "Clarify implementation details".
>>
>> I agree with that, too.
>
> Seems reasonable. As I said above, working on patches (if possible)
> related to git-add--interactive will probably lead me to discovering
> coverage issues, which could then be dealt with in this pre-code
> period already with some guidance from my mentor.
>
>> Thanks for working on this!
>
> Thank *you*, and Stefan, for the careful review of it. It means a lot.
>
> -- Daniel.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GSoC] Proposal: turn git-add--interactive.perl into a builtin
  2017-03-31  5:07     ` Daniel Ferreira (theiostream)
  2017-03-31 19:06       ` Daniel Ferreira (theiostream)
@ 2017-04-02 19:43       ` Johannes Schindelin
  2017-04-02 22:50         ` Daniel Ferreira (theiostream)
  1 sibling, 1 reply; 7+ messages in thread
From: Johannes Schindelin @ 2017-04-02 19:43 UTC (permalink / raw)
  To: Daniel Ferreira (theiostream)
  Cc: Stefan Beller, Git Mailing List, Christian Couder

Hi Daniel,

On Fri, 31 Mar 2017, Daniel Ferreira (theiostream) wrote:

> Question: do you suggest any pending bugfix to git-add--interactive or
> to something related that might give some useful knowledge in advance?
> (for the pre-code period). My microproject involves playing with the
> dir_iterator interface, which is a great exercise in code refactoring
> but really does not teach me too much about Git's architecture.

We ask to accomplish a microproject before evaluating the proposals for
one reason: to have a good understanding how well the students would
interact with the project if they were accepted. As such, the
microprojects really are about the flow of the contribution, not to tackle
the project already.

Meaning: I would recommend staying with your microproject, in particular
if it is already in full swing.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [GSoC] Proposal: turn git-add--interactive.perl into a builtin
  2017-04-02 19:43       ` Johannes Schindelin
@ 2017-04-02 22:50         ` Daniel Ferreira (theiostream)
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel Ferreira (theiostream) @ 2017-04-02 22:50 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Stefan Beller, Git Mailing List, Christian Couder

On Sun, Apr 2, 2017 at 4:43 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> We ask to accomplish a microproject before evaluating the proposals for
> one reason: to have a good understanding how well the students would
> interact with the project if they were accepted. As such, the
> microprojects really are about the flow of the contribution, not to tackle
> the project already.
> Meaning: I would recommend staying with your microproject, in particular
> if it is already in full swing.

Oh, when I mentioned these bugfixes I meant I'd be willing to do those
*plus* my microproject before the coding period begins as a "warm-up"
to the project. I'm certainly staying with the microproject until the
end!

-- Daniel.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-04-02 22:51 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-26  3:15 [GSoC] Proposal: turn git-add--interactive.perl into a builtin Daniel Ferreira (theiostream)
2017-03-28 18:05 ` Stefan Beller
2017-03-30  0:01   ` Johannes Schindelin
2017-03-31  5:07     ` Daniel Ferreira (theiostream)
2017-03-31 19:06       ` Daniel Ferreira (theiostream)
2017-04-02 19:43       ` Johannes Schindelin
2017-04-02 22:50         ` Daniel Ferreira (theiostream)

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).