git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC][GSoC] Project proposal: convert interactive rebase to C
@ 2018-03-17 19:14 Alban Gruin
  2018-03-17 20:29 ` Christian Couder
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Alban Gruin @ 2018-03-17 19:14 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Johannes Schindelin

Hi,

here is my first draft of my proposal for the GSoC, about the "convert 
interactive rebase to C" project. Any feedback is welcome :)

---
ABSTRACT
git is a modular source control management software, and all of its 
subcommands are programs on their own. A lot of them are written in C, but a 
couple of them are shell or Perl scripts. This is the case of git-rebase--
interactive (or interactive rebase), which is a shell script. Rewriting it in 
C would improve its performance, its portability, and maybe its robustness.


ABOUT `git-rebase{,--interactive}`

git-rebase allows to re-apply changes on top of another branch. For instance, 
when a local branch and a remote branch have diverged, git-rebase can re-unify 
them, applying each change made on the local branch on top of the remote 
branch.

git-rebase--interactive is used to reorganize commits by reordering, 
rewording, or squashing them. To achieve this purpose, git opens the list of 
commits to be modified in a text editor (hence the interactivity), as well as 
the actions to be performed for each of them.


PROJECT GOALS

The goal of this project is to rewrite git-rebase--interactive in C as it has 
been discussed on the git mailing list[1], for multiple reasons :

Performance improvements
Shell scripts are inherently slow. That’s because each command is a program by 
itself. So, for each command, the shell interpreter has to spawn a new process 
and to load a new program.

Those commands can be other git commands. Sometimes, they are wrappers to call 
internal C functions (eg. git-rebase--helper), something shell scripts can’t 
do natively. These wrappers basically parse the parameters, then start the 
appropriate function, which is
obviously slower than just calling a function from C.

Other commands can be POSIX utilities (eg. sed, cut, etc.). They have their 
own problems (speed aside), namely portability.

Portability improvements
Shell scripts often relies on many of those POSIX utilities, which are not 
necessarily natively available on all platforms (most notably, Windows), or 
may have more or less features depending on the implementation.


APPROXIMATIVE TIMELINE

Community bonding — April 23, 2018 – May 14, 2018
During the community bonding, I would like to dive into git’s codebase, and to 
understand what git-rebase--interactive does under the hood. At the same time, 
I’d communicate with the community and my mentor, seeking for clarifications, 
and asking questions about how things should or should not be done.

Weeks 1 & 2 — May 14, 2018 – May 28, 2018
First, I would refactor --preserve-merges in its own shell script, as 
described in Dscho’s email.

Weeks 3 & 4 — May 18, 2018 – June 11, 2018
Then, I would start to rewrite git-rebase--interactive, and get rid of git-
rebase--helper.

Weeks 5 to 9 — June 11, 2018 – July 15, 2018
During this period, I would continue to rewrite git-rebase--interactive.

Weeks 10 & 11 — July 16, 2018 – July 29, 2018
In the second half of July, I would look for bugs in the new code, test it, 
and improve its coverage.

Weeks 12 — July 30, 2018 – August 5, 2018
In the last week, I would polish the code where needed, in order to improve 
for performance or to make the code more readable.


ABOUT ME

My name is Alban Gruin, I am an undergraduate at the Paul Sabatier University 
in Toulouse, France, where I have been studying Computer Sciences for the past 
year and a half. My timezone currently is UTC+01:00, but will be UTC+02:00 
starting from March 25th, because of the daylight saving time in Europe.

I have been programming in C for the last 5 years. I learned using freely 
available resources online, and by attending class ever since last year.

I am also quite familiar with shell scripts, and I have been using git for the 
last 3 years. 

My e-mail address is alban <dot> gruin <at> gmail <dot> com. My IRC nick is 
abngrn.

My micro-project was "userdiff: add built-in pattern for golang"[2][3].

---

You can find the Google Doc version here[4].

Regards,
Alban Gruin

[1] https://public-inbox.org/git/alpine.DEB.2.20.1609021432070.129229@virtualbox/
[2] https://public-inbox.org/git/20180228172906.30582-1-alban.gruin@gmail.com/
[3] https://git.kernel.org/pub/scm/git/git.git/commit/?id=1dbf0c0a
[4] https://docs.google.com/document/d/1Jx0w867tVAht7QI1_prieiXg_iQ_nTloOyaIIOnm85g/edit?usp=sharing



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
  2018-03-17 19:14 [RFC][GSoC] Project proposal: convert interactive rebase to C Alban Gruin
@ 2018-03-17 20:29 ` Christian Couder
  2018-03-20 16:29 ` Johannes Schindelin
  2018-03-22 22:03 ` Alban Gruin
  2 siblings, 0 replies; 8+ messages in thread
From: Christian Couder @ 2018-03-17 20:29 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Johannes Schindelin

Hi,

On Sat, Mar 17, 2018 at 8:14 PM, Alban Gruin <alban.gruin@gmail.com> wrote:
>
> Weeks 3 & 4 — May 18, 2018 – June 11, 2018
> Then, I would start to rewrite git-rebase--interactive, and get rid of git-
> rebase--helper.

Usually to rewrite a shell script in C, we first rewrite shell
functions into option arguments in a C builtin helper and make the
schell script call the builtin helper (instead of the original shell
functions). Eventually when the shell script is mostly only calling
the builtin helper, we add what is needed into the builtin helper and
we rename it to make it fully replace the shell script.

See for example 0cce4a2756 (rebase -i -x: add exec commands via the
rebase--helper, 2017-12-05) or b903674b35 (bisect--helper:
`is_expected_rev` & `check_expected_revs` shell function in C,
2017-09-29). These examples show that we can do step by step rewrites.

I would suggest planning to use the same approach, and describing in
your proposal which shell functions you would like to rewrite into the
C builtin helper in which order, before planning to fully replace the
current git-rebase--interactive.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
  2018-03-17 19:14 [RFC][GSoC] Project proposal: convert interactive rebase to C Alban Gruin
  2018-03-17 20:29 ` Christian Couder
@ 2018-03-20 16:29 ` Johannes Schindelin
  2018-03-21 11:04   ` Alban Gruin
  2018-03-22 22:03 ` Alban Gruin
  2 siblings, 1 reply; 8+ messages in thread
From: Johannes Schindelin @ 2018-03-20 16:29 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 2639 bytes --]

Hi Alban,

thank you for your proposal!

I will only comment on the parts that I feel could use improvement, the
rest is fine by me.

On Sat, 17 Mar 2018, Alban Gruin wrote:

> APPROXIMATIVE TIMELINE
> 
> Community bonding — April 23, 2018 – May 14, 2018
> During the community bonding, I would like to dive into git’s codebase,
> and to understand what git-rebase--interactive does under the hood. At
> the same time, I’d communicate with the community and my mentor, seeking
> for clarifications, and asking questions about how things should or
> should not be done.
> 
> Weeks 1 & 2 — May 14, 2018 – May 28, 2018
> First, I would refactor --preserve-merges in its own shell script, as
> described in Dscho’s email.

Could you go into detail a bit here? Like, describe what parts of the
git-rebase--interactive.sh script would need to be duplicated, which ones
would be truly moved, etc

> Weeks 3 & 4 — May 18, 2018 – June 11, 2018
> Then, I would start to rewrite git-rebase--interactive, and get rid of git-
> rebase--helper.

I think this is a bit premature, as the rebase--helper would not only
serve the --interactive part, but in later conversions also --am and
--merge, and only in the very end, when git-rebase.sh gets converted,
would we be able to simply rename the rebase--helper to rebase.

Also, I have a hunch that there is actually almost nothing left to rewrite
after my sequencer improvements that made it into Git v2.13.0, together
with the upcoming changes (which are on top of the --recreate-merges patch
series, hence I did not send them to the mailing list yet) in
https://github.com/dscho/git/commit/c261f17a4a3e

So I would like to see more details here... ;-)

> Weeks 5 to 9 — June 11, 2018 – July 15, 2018
> During this period, I would continue to rewrite git-rebase--interactive.

It would be good if the proposal said what parts of the conversion are
tricky, to merit spending a month on them.

> Weeks 10 & 11 — July 16, 2018 – July 29, 2018
> In the second half of July, I would look for bugs in the new code, test it, 
> and improve its coverage.

As I mentioned in a related mail, the test suite coverage would be most
sensibly extended *before* starting to rewrite code in C, as it helps
catching bugs early and avoids having to merge buggy code that needs to be
fixed immediately.

> Weeks 12 — July 30, 2018 – August 5, 2018
> In the last week, I would polish the code where needed, in order to
> improve for performance or to make the code more readable.

Thank you for sharing this draft with us!
Johannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
  2018-03-20 16:29 ` Johannes Schindelin
@ 2018-03-21 11:04   ` Alban Gruin
  2018-03-21 23:51     ` Johannes Schindelin
  0 siblings, 1 reply; 8+ messages in thread
From: Alban Gruin @ 2018-03-21 11:04 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Christian Couder

Hi Johannes,

Le mardi 20 mars 2018 17:29:28 CET, vous avez écrit :
> > Weeks 1 & 2 — May 14, 2018 – May 28, 2018
> > First, I would refactor --preserve-merges in its own shell script, as
> > described in Dscho’s email.
> 
> Could you go into detail a bit here? Like, describe what parts of the
> git-rebase--interactive.sh script would need to be duplicated, which ones
> would be truly moved, etc
> 

It would lead to duplicate a good chunk of git_rebase__interactive(), 
apparently. The moved parts would be everything in `if test t = 
"$preserve_merges"; then …; fi` statements. That is, about 50 lines of shell 
code.

Judging by that, beginning by that is probably not the right thing to do. 
Also, somebody is already working on that[1]. 

> > Weeks 3 & 4 — May 18, 2018 – June 11, 2018
> > Then, I would start to rewrite git-rebase--interactive, and get rid of
> > git-
> > rebase--helper.
> 
> I think this is a bit premature, as the rebase--helper would not only
> serve the --interactive part, but in later conversions also --am and
> --merge, and only in the very end, when git-rebase.sh gets converted,
> would we be able to simply rename the rebase--helper to rebase.
> 

Yes, Christian Couder told me that it would not be a good methodology too.

> Also, I have a hunch that there is actually almost nothing left to rewrite
> after my sequencer improvements that made it into Git v2.13.0, together
> with the upcoming changes (which are on top of the --recreate-merges patch
> series, hence I did not send them to the mailing list yet) in
> https://github.com/dscho/git/commit/c261f17a4a3e

One year ago, you said[2] that converting this script "will fill up 3 month, 
very easily". Is this not accurate anymore?

> 
> So I would like to see more details here... ;-)

Yep, I’m working on that. 

> > Weeks 5 to 9 — June 11, 2018 – July 15, 2018
> > During this period, I would continue to rewrite git-rebase--interactive.
> 
> It would be good if the proposal said what parts of the conversion are
> tricky, to merit spending a month on them.
> 
> > Weeks 10 & 11 — July 16, 2018 – July 29, 2018
> > In the second half of July, I would look for bugs in the new code, test
> > it,
> > and improve its coverage.
> 
> As I mentioned in a related mail, the test suite coverage would be most
> sensibly extended *before* starting to rewrite code in C, as it helps
> catching bugs early and avoids having to merge buggy code that needs to be
> fixed immediately.

Makes sense.

> 
> > Weeks 12 — July 30, 2018 – August 5, 2018
> > In the last week, I would polish the code where needed, in order to
> > improve for performance or to make the code more readable.
> 
> Thank you for sharing this draft with us!
> Johannes

I’ll send a new draft as soon as possible (hopefully this afternoon).

Thank you for your enthousiasm :)
Alban

[1] https://public-inbox.org/git/20180320204507.12623-1-wink@saville.com/
[2] https://public-inbox.org/git/alpine.DEB.
2.20.1703231827060.3767@virtualbox/



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
  2018-03-21 11:04   ` Alban Gruin
@ 2018-03-21 23:51     ` Johannes Schindelin
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Schindelin @ 2018-03-21 23:51 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Christian Couder

[-- Attachment #1: Type: text/plain, Size: 2158 bytes --]

Hi Alban,

On Wed, 21 Mar 2018, Alban Gruin wrote:

> Le mardi 20 mars 2018 17:29:28 CET, vous avez écrit :
> 
> > Also, I have a hunch that there is actually almost nothing left to
> > rewrite after my sequencer improvements that made it into Git v2.13.0,
> > together with the upcoming changes (which are on top of the
> > --recreate-merges patch series, hence I did not send them to the
> > mailing list yet) in
> > https://github.com/dscho/git/commit/c261f17a4a3e
> 
> One year ago, you said[2] that converting this script "will fill up 3
> month, very easily". Is this not accurate anymore?

Let me read that mail ;-)

*goes and reads*

Well, I was talking about two different aspects to Ivan and to you. I
should have been clearer. So let me try again:

To convert `git-rebase--interactive.sh`, I think the most important part
is to factor out the preserve-merges code into its own script. After that,
there is little I can think of (apart from support for --root, which a
not-yet-contributed patch in my sequencer-shears branch on
https://github.com/dscho/git addresses) that still needs to be converted.
For somebody familiar with Git's source code, I would estimate one week
(and therefore 3 weeks would be a realistic estimate :-)).

Come to think of it, a better approach might be to leave the
preserve-merges stuff in, and teach `git-rebase.sh` to call the sequencer
directly for --interactive without --preserve-merges, then rename the
script to git-rebase--preserve.sh

The other aspect, the one I thought would take up to 3 months, easily, was
to convert the entirety of rebase -i into C. That would entail also the
option parsing, for which you would have to convert also git-rebase.sh
(and if you do not convert git-rebase--am.sh and git-rebase--merge.sh
first, you would then have to teach builtin/rebase.c to populate the
environment variables expected by those shell scripts while spawning
them).

I still think that the latter is too big a task for a single GSoC.

> I’ll send a new draft as soon as possible (hopefully this afternoon).

I look forward to reading it!

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
  2018-03-17 19:14 [RFC][GSoC] Project proposal: convert interactive rebase to C Alban Gruin
  2018-03-17 20:29 ` Christian Couder
  2018-03-20 16:29 ` Johannes Schindelin
@ 2018-03-22 22:03 ` Alban Gruin
  2018-03-24  7:43   ` Christian Couder
  2018-03-25  0:43   ` [RFC v3][GSoC] " Alban Gruin
  2 siblings, 2 replies; 8+ messages in thread
From: Alban Gruin @ 2018-03-22 22:03 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Johannes Schindelin

Hi,

here is my second draft of my proposal. As last time, any feedback is
welcome :)

I did not write my phone number and address here for obvious reasons,
but they will be in the “about me” section of the final proposal.

Apart from that, do you think there is something to add?

---
ABSTRACT

git is a modular source control management software, and all of its
subcommands are programs on their own. A lot of them are written in C,
but a couple of them are shell or Perl scripts. This is the case of
git-rebase--interactive (or interactive rebase), which is a shell
script. Rewriting it in C would improve its performance, its
portability, and maybe its robustness.


ABOUT `git-rebase` and `git-rebase--interactive`

git-rebase allows to re-apply changes on top of another branch. For
instance, when a local branch and a remote branch have diverged,
git-rebase can re-unify them, applying each change made on the
local branch on top of the remote branch.

git-rebase--interactive is used to reorganize commits by reordering,
rewording, or squashing them. To achieve this purpose, git opens the
list of commits to be modified in a text editor (hence the
interactivity), as well as the actions to be performed for each of
them.


PROJECT GOALS

The goal of this project is to rewrite git-rebase--interactive in C
as it has been discussed on the git mailing list[1], for multiple
reasons :


Performance improvements
Shell scripts are inherently slow. That’s because each command is a
program by itself. So, for each command, the shell interpreter has to
spawn a new process and to load a new program (with fork() and
exec() syscalls), which is an expensive process.

Those commands can be other git commands. Sometimes, they are
wrappers to call internal C functions (eg. git-rebase--helper),
something shell scripts can’t do natively. These wrappers basically
parse the parameters, then start the appropriate function, which is
obviously slower than just calling a function from C.

Other commands can be POSIX utilities (eg. sed, cut, etc.). They
have their own problems (speed aside), namely portability.

Portability improvements
Shell scripts often relies on many of those POSIX utilities, which are
not necessarily natively available on all platforms (most notably,
Windows), or may have more or less features depending on the
implementation.

Although C is not perfect portability-wise, it’s still better than
shell scripts. For instance, the resulting binaries will not
necessarily depend on third-party programs or libraries.


RISKS

Of course, rewriting a piece of software takes time, and can lead to
regressions (ie. new bugs). To mitigate that risk, I should understand
well the functions I want to rewrite, run tests on a regular basis and
write new if needed, and of course discuss about my code with the
community during reviews.


APPROXIMATIVE TIMELINE

Community bonding -- April 23, 2018 - May 14, 2018
During the community bonding, I would like to dive into git’s
codebase, and to understand what git-rebase--interactive does under
the hood. At the same time, I’d communicate with the community and my
mentor, seeking for clarifications, and asking questions about how
things should or should not be done.

Weeks 1 & 2 -- May 14, 2018 - May 27, 2018
/From May 14 to 18, I have exams at the university, so I won’t be able
to work full time./

I would search for edge cases not covered by current tests and write
some if needed.

Week 3 -- May 28, 2018 - June 3, 2018
At the same time, I would refactor --preserve-merges in its own
shell script (as described in Dscho’s email[1]), if it has
not been deprecated or moved in the meantime. This operation is not
really tricky by itself, as --preserve-merges is about only 50 lines
of code into git_rebase__interactive().

Weeks 4 to 7 -- May 4, 2018 - July 1, 2018
Then, I would start to incrementally rewrite
git-rebase--interactive.sh functions in C, and move them
git-rebase--helper.c (as in commits 0cce4a2756[2] (rebase -i
-x: add exec commands via the rebase--helper) and b903674b35[3]
(bisect--helper: `is_expected_rev` & `check_expected_revs` shell
function in C)).

There is a lot of functions into git-rebase--interactive.sh to
rewrite. Most of them are small, and some of them are even wrappers
for a single command (eg. is_merge_commit()), so they shouldn’t be
really problematic.

A couple of them are quite long (eg. pick_one()), and will probably
be even longer once rewritten in C due to the low-level nature of the
language. They also tend to depend a lot on other smaller functions.

The plan here would be to start rewriting the smaller functions when
applicable (ie. they’re not a simple command wrapper) before
working on the biggest of them.

Week 8 -- July 2, 2018 - July 8, 2018
When all majors functions from git-rebase--interactive.sh have been
rewritten in C, I would retire the script in favor of a builtin.

Weeks 9 & 10 -- July 9, 2018 - July 22, 2018
I plan to spend theses two weeks to improve the code coverage where
needed.

Weeks 11 & 12 -- July 23, 2018 - August 5, 2018
In the last two weeks, I would polish the code where needed, in order
to improve its performance or to make it more readable.

I also plan to use this time as a backup if I am late on my planning.


IF TIME PERMITS

 * I see there is a known breakage in test t3404. Maybe I could
   investigate and try to fix it.
 * Add an option to include the patch in the message of commits to be
   reworded, as proposed by Ævar Arnfjörð Bjarmason[4].


ABOUT ME

My name is Alban Gruin. I am an undergraduate at the Paul Sabatier
University in Toulouse, France, where I have been studying Computer
Sciences for a year and a half. My timezone currently is UTC+01:00,
but will be UTC+02:00 starting from March 25th, because of the
daylight saving time in Europe.

I have been programming in C for the last 5 years. I learned using
freely available resources online, and by attending class ever since
last year.

I am also quite familiar with shell scripts, and I have been using
git for the last 3 years.

My e-mail address is alban <dot> gruin <at> gmail <dot> com, my IRC
nick is abngrn.

My micro-project was "userdiff: add built-in pattern for
golang"[5][6].

---

You can find the Google Doc version here[7].


Regards,
Alban Gruin

[1]
https://public-inbox.org/git/alpine.DEB.2.20.1609021432070.129229@virtualbox/
[2] https://git.kernel.org/pub/scm/git/git.git/commit/?id=0cce4a2756
[3] https://git.kernel.org/pub/scm/git/git.git/commit/?id=b903674b35
[4] https://public-inbox.org/git/87in9ucsbb.fsf@evledraar.gmail.com/
[5]
https://public-inbox.org/git/20180228172906.30582-1-alban.gruin@gmail.com/
[6] https://git.kernel.org/pub/scm/git/git.git/commit/?id=1dbf0c0a
[7]
https://docs.google.com/document/d/1Jx0w867tVAht7QI1_prieiXg_iQ_nTloOyaIIOnm85g/edit?usp=sharing

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][GSoC] Project proposal: convert interactive rebase to C
  2018-03-22 22:03 ` Alban Gruin
@ 2018-03-24  7:43   ` Christian Couder
  2018-03-25  0:43   ` [RFC v3][GSoC] " Alban Gruin
  1 sibling, 0 replies; 8+ messages in thread
From: Christian Couder @ 2018-03-24  7:43 UTC (permalink / raw)
  To: Alban Gruin; +Cc: git, Johannes Schindelin

Hi,

On Thu, Mar 22, 2018 at 11:03 PM, Alban Gruin <alban.gruin@gmail.com> wrote:
> Hi,
>
> here is my second draft of my proposal. As last time, any feedback is
> welcome :)
>
> I did not write my phone number and address here for obvious reasons,
> but they will be in the “about me” section of the final proposal.
>
> Apart from that, do you think there is something to add?

Please take a look at the comments that have been made on other's
proposal. Many proposals could be improved in the same way and it is a
bit annoying for us to repeat the same things many times.

[...]

> The goal of this project is to rewrite git-rebase--interactive in C
> as it has been discussed on the git mailing list[1], for multiple
> reasons :

In general when the project or some issues related to the project have
already been worked on or discussed on the mailing list, it is a good
thing to summarize those discussions and link to them in your
proposal. It shows that you want to take the time to gather existing
information, to understand that information and to take it into
account in your proposal, and it can also make your proposal easier to
read and understand.

More specifically your proposal has some links which is nice, but I
think it would be better if it summarized a bit more what the links
contain.

[...]

> Weeks 1 & 2 -- May 14, 2018 - May 27, 2018
> /From May 14 to 18, I have exams at the university, so I won’t be able
> to work full time./
>
> I would search for edge cases not covered by current tests and write
> some if needed.
>
> Week 3 -- May 28, 2018 - June 3, 2018
> At the same time, I would refactor --preserve-merges in its own
> shell script (as described in Dscho’s email[1]), if it has
> not been deprecated or moved in the meantime.

Here for example it is better if we could get a better idea about how
you plan to do it without having to read Dscho's email.

> This operation is not
> really tricky by itself, as --preserve-merges is about only 50 lines
> of code into git_rebase__interactive().
>
> Weeks 4 to 7 -- May 4, 2018 - July 1, 2018
> Then, I would start to incrementally rewrite
> git-rebase--interactive.sh functions in C, and move them
> git-rebase--helper.c (as in commits 0cce4a2756[2] (rebase -i
> -x: add exec commands via the rebase--helper) and b903674b35[3]
> (bisect--helper: `is_expected_rev` & `check_expected_revs` shell
> function in C)).

I know what you mean but I would still appreciate if you could summarize it.

> There is a lot of functions into git-rebase--interactive.sh to
> rewrite. Most of them are small, and some of them are even wrappers
> for a single command (eg. is_merge_commit()), so they shouldn’t be
> really problematic.
>
> A couple of them are quite long (eg. pick_one()), and will probably
> be even longer once rewritten in C due to the low-level nature of the
> language. They also tend to depend a lot on other smaller functions.
>
> The plan here would be to start rewriting the smaller functions when
> applicable (ie. they’re not a simple command wrapper) before
> working on the biggest of them.
>
> Week 8 -- July 2, 2018 - July 8, 2018
> When all majors functions from git-rebase--interactive.sh have been
> rewritten in C, I would retire the script in favor of a builtin.
>
> Weeks 9 & 10 -- July 9, 2018 - July 22, 2018
> I plan to spend theses two weeks to improve the code coverage where
> needed.
>
> Weeks 11 & 12 -- July 23, 2018 - August 5, 2018
> In the last two weeks, I would polish the code where needed, in order
> to improve its performance or to make it more readable.

We like to have big improvements be split into batches of patches,
also called patch series, and polishing of the first batches happening
as soon as possible so that they can be ready to be merged soon. The
patch series that are sent to the mailing list often need a number of
round of reviews and improvements which can take a long time, so it is
better if this process starts as soon as possible.

Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC v3][GSoC] Project proposal: convert interactive rebase to C
  2018-03-22 22:03 ` Alban Gruin
  2018-03-24  7:43   ` Christian Couder
@ 2018-03-25  0:43   ` Alban Gruin
  1 sibling, 0 replies; 8+ messages in thread
From: Alban Gruin @ 2018-03-25  0:43 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Johannes Schindelin

Hi,
this is the first draft of my proposal.

---
ABSTRACT

git is a modular source control management software, and all of its
subcommands are programs on their own. A lot of them are written in C,
but a couple of them are shell or Perl scripts. This is the case of
=git-rebase--interactive= (or interactive rebase), which is a shell
script. Rewriting it in C would improve its performance, its
portability, and maybe its robustness.


ABOUT `git-rebase` AND `git-rebase--interactive`

git-rebase allows to re-apply changes on top of another branch. For
instance, when a local branch and a remote branch have diverged,
git-rebase can re-unify them, applying each change made on the
local branch on top of the remote branch.

git-rebase--interactive is used to reorganize commits by reordering,
rewording, or squashing them. To achieve this purpose, =git= opens the
list of commits to be modified in a text editor (hence the
interactivity), as well as the actions to be performed for each of
them.


PROJECT GOALS

Back in 2016, Johannes Schindelin discussed[1] about retiring
git-rebase.sh (described here as a “hacky shell script”) in favor of
a builtin. He explained that, as it’s only a command-line parser for
git-rebase--am.sh, git-rebase--interactive.sh, and
git-rebase--merge.sh, these 3 scripts should be rewritten first.

The goal of this project is to rewrite git-rebase--interactive.sh in
C, for multiple reasons :

Performance improvements
Shell scripts are inherently slow. When Johannes Schindelin began to
rewrite some parts of git-rebase--interactive in C, its performance
increased from 3 to 5 times, depending on the platform[2].

That’s because each command is a program by itself. So, for each
command, the shell interpreter has to spawn a new process and to load
a new program (with fork() and exec() syscalls), which is an
expensive process.

Those commands can be other git commands. Sometimes, they are
wrappers to call internal C functions (eg. git-rebase--helper),
something shell scripts can’t do natively. These wrappers basically
parse the parameters, then start the appropriate function, which is
obviously slower than just calling a function from C.

Other commands can be POSIX utilities (eg. sed, cut, etc.). They
have their own problems (speed aside), namely portability.

Portability improvements
Shell scripts often relies on many of those POSIX utilities, which are
not necessarily natively available on all platforms (most notably,
Windows), or may have more or less features depending on the
implementation.

Although C is not perfect portability-wise, it’s still better than
shell scripts. For instance, the resulting binaries will not
necessarily depend on third-party programs or libraries.


RISKS

Of course, rewriting a piece of software takes time, and can lead to
regressions (ie. new bugs). To mitigate that risk, I should understand
well the functions I want to rewrite, run tests on a regular basis and
write new if needed, and of course discuss about my code with the
community during reviews.


APPROXIMATIVE TIMELINE

Normally, I would be able to work 35 to 40 hours a week. When I have
courses or exams at university, I could work between 20 and 25 hours a
week.

Community bonding --- April 23, 2018 -- May 14, 2018
/I’ll still have courses at the university during this period./

During the community bonding, I would like to dive into git’s
codebase, and to understand what git-rebase--interactive does under
the hood. At the same time, I’d communicate with the community and my
mentor, seeking for clarifications, and asking questions about how
things should or should not be done.

Weeks 1 & 2 --- May 14, 2018 -- May 27, 2018
/From May 14 to 18, I have exams at the university, so I won’t be able
to work full time./

I would search for edge cases not covered by current tests and write
some if needed.

Week 3 --- May 28, 2018 -- June 3, 2018
At the same time, I would refactor --preserve-merges in its own
shell script, if it has not been deprecated or moved in the
meantime. Dscho explained that this would be the first step of the
conversion[1]. This operation is not really tricky by itself,
as --preserve-merges is about only 50 lines of code into
git_rebase__interactive(), plus some specific functions
(eg. pick_one()).

Weeks 4 to 7 --- June 4, 2018 -- July 1, 2018
Then, I would start to incrementally rewrite
git-rebase--interactive.sh functions in C, and move them
git-rebase--helper.c. Newly-rewritten C functions are then
associated to command-line parameters to be able to use them from
shell scripts.

Examples of such conversion can be found in commits
0cce4a2756[3] (rebase -i -x: add exec commands via the
rebase--helper) and b903674b35[4] (bisect--helper:
`is_expected_rev` & `check_expected_revs` shell function in C).

There is a lot of functions into git-rebase--interactive.sh to
rewrite. Most of them are small, and some of them are even wrappers
for a single command (eg. do_next()), so they shouldn’t be really
problematic.

A couple of them are quite long (eg. pick_one()), and will probably
be even longer once rewritten in C due to the low-level nature of the
language. They also tend to depend a lot on other smaller functions.

The plan here would be to start rewriting the smaller functions when
applicable (ie. they’re not a simple command wrapper) before
working on the biggest of them.

Of course, rewriting a function should not cause any breakage in the
test suite.

Week 8 --- July 2, 2018 -- July 8, 2018
Then, I plan to polish the new code, in order to improve its
performance or to make it more readable.

Benchmarking will be done using the existing script in t/perf.

Weeks 9 & 10 --- July 9, 2018 -- July 22, 2018
When all majors functions from git-rebase--interactive.sh have been
rewritten in C, I would retire the script in favor of a builtin.

Weeks 11 & 12 --- July 23, 2018 -- August 5, 2018
In the last two weeks, I would improve the code coverage where needed.

I also plan to use this time as a backup if I am late on my planning.


IF TIMES PERMITS

 * Add an option to include the patch in the message of commits to be
   reworded, as proposed by Ævar Arnfjörð Bjarmason[5].


COMMUNICATING ABOUT MY WORK

I will communicate every week about the state of my work on the
mailing list.


ABOUT ME

My name is Alban Gruin. I am an undergraduate at the Paul Sabatier
University in Toulouse, France, where I have been studying Computer
Sciences for a year and a half. My timezone is UTC+02:00.

I have been programming in C for the last 5 years. I learned using
freely available resources online, and by attending class ever since
last year.

I am also quite familiar with shell scripts, and I have been using
git for the last 3 years.

My e-mail address is alban <dot> gruin <at> gmail <dot> com, my IRC
nick is abngrn.

My micro-project was “userdiff: add built-in pattern for
golang”[6][7].
---

You can find the Google Doc version here[8].

Regards,
Alban Gruin

[1]
https://public-inbox.org/git/alpine.DEB.2.20.1609021432070.129229@virtualbox/
[2]
https://public-inbox.org/git/cover.1472457609.git.johannes.schindelin@gmx.de/
[3] https://git.kernel.org/pub/scm/git/git.git/commit/?id=0cce4a2756
[4] https://git.kernel.org/pub/scm/git/git.git/commit/?id=b903674b35
[5] https://public-inbox.org/git/87in9ucsbb.fsf@evledraar.gmail.com/
[6]
https://public-inbox.org/git/20180228172906.30582-1-alban.gruin@gmail.com/
[7] https://git.kernel.org/pub/scm/git/git.git/commit/?id=1dbf0c0a
[8]
https://docs.google.com/document/d/1Jx0w867tVAht7QI1_prieiXg_iQ_nTloOyaIIOnm85g/edit?usp=sharing

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-03-25  0:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-17 19:14 [RFC][GSoC] Project proposal: convert interactive rebase to C Alban Gruin
2018-03-17 20:29 ` Christian Couder
2018-03-20 16:29 ` Johannes Schindelin
2018-03-21 11:04   ` Alban Gruin
2018-03-21 23:51     ` Johannes Schindelin
2018-03-22 22:03 ` Alban Gruin
2018-03-24  7:43   ` Christian Couder
2018-03-25  0:43   ` [RFC v3][GSoC] " Alban Gruin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).