git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / Atom feed
* Git in Outreachy December 2019?
@ 2019-08-27  5:17 Jeff King
  2019-08-31  7:58 ` Christian Couder
                   ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Jeff King @ 2019-08-27  5:17 UTC (permalink / raw)
  To: git; +Cc: Olga Telezhnaya, Christian Couder, Elijah Newren, Thomas Gummerer

Do we have interested mentors for the next round of Outreachy?

The deadline for Git to apply to the program is September 5th. The
deadline for mentors to have submitted project descriptions is September
24th. Intern applications would start on October 1st.

If there are mentors who want to participate, I can handle the project
application and can start asking around for funding.

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-08-27  5:17 Git in Outreachy December 2019? Jeff King
@ 2019-08-31  7:58 ` Christian Couder
  2019-08-31 19:44   ` Olga Telezhnaya
  2019-09-04 19:41 ` Jeff King
  2019-09-13 20:03 ` Jonathan Tan
  2 siblings, 1 reply; 62+ messages in thread
From: Christian Couder @ 2019-08-31  7:58 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Olga Telezhnaya, Elijah Newren, Thomas Gummerer

On Tue, Aug 27, 2019 at 7:17 AM Jeff King <peff@peff.net> wrote:
>
> Do we have interested mentors for the next round of Outreachy?

I am interested to co-mentor.

> The deadline for Git to apply to the program is September 5th. The
> deadline for mentors to have submitted project descriptions is September
> 24th. Intern applications would start on October 1st.
>
> If there are mentors who want to participate, I can handle the project
> application and can start asking around for funding.

That would be really nice, thank you!

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-08-31  7:58 ` Christian Couder
@ 2019-08-31 19:44   ` Olga Telezhnaya
  0 siblings, 0 replies; 62+ messages in thread
From: Olga Telezhnaya @ 2019-08-31 19:44 UTC (permalink / raw)
  To: Christian Couder; +Cc: Jeff King, git, Elijah Newren, Thomas Gummerer

сб, 31 авг. 2019 г. в 10:58, Christian Couder <christian.couder@gmail.com>:
>
> On Tue, Aug 27, 2019 at 7:17 AM Jeff King <peff@peff.net> wrote:
> >
> > Do we have interested mentors for the next round of Outreachy?
>
> I am interested to co-mentor.

I am not ready to give the answer right now, but I will definitely
think about the project proposals, I want to help with that part. By
the way, we can take some projects (with description) from this summer
round of GSoC. 3 of 4 projects were not selected.

>
> > The deadline for Git to apply to the program is September 5th. The
> > deadline for mentors to have submitted project descriptions is September
> > 24th. Intern applications would start on October 1st.
> >
> > If there are mentors who want to participate, I can handle the project
> > application and can start asking around for funding.
>
> That would be really nice, thank you!

Thank you!

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-08-27  5:17 Git in Outreachy December 2019? Jeff King
  2019-08-31  7:58 ` Christian Couder
@ 2019-09-04 19:41 ` Jeff King
  2019-09-05  7:24   ` Christian Couder
                     ` (3 more replies)
  2019-09-13 20:03 ` Jonathan Tan
  2 siblings, 4 replies; 62+ messages in thread
From: Jeff King @ 2019-09-04 19:41 UTC (permalink / raw)
  To: git
  Cc: Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

On Tue, Aug 27, 2019 at 01:17:57AM -0400, Jeff King wrote:

> Do we have interested mentors for the next round of Outreachy?
> 
> The deadline for Git to apply to the program is September 5th. The
> deadline for mentors to have submitted project descriptions is September
> 24th. Intern applications would start on October 1st.
> 
> If there are mentors who want to participate, I can handle the project
> application and can start asking around for funding.

Funding is still up in the air, but in the meantime I've tentatively
signed us up (we have until the 24th to have the funding committed).
Next we need mentors to submit projects, as well as first-time
contribution micro-projects.

Project proposals can be made here:

  https://www.outreachy.org/communities/cfp/git/

If you want to know more about the program, there's a mentor FAQ here:

  https://www.outreachy.org/mentor/mentor-faq/

or just ask in this thread.

The project page has a section to point people in the right direction
for first-time contributions. I've left it blank for now, but I think it
makes sense to point one (or both) of:

  - https://git-scm.com/docs/MyFirstContribution

  - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git

as well as a list of micro-projects (or at least instructions on how to
find #leftoverbits, though we'd definitely have to step up our labeling,
as I do not recall having seen one for a while).

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-04 19:41 ` Jeff King
@ 2019-09-05  7:24   ` Christian Couder
  2019-09-05 19:39   ` Emily Shaffer
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 62+ messages in thread
From: Christian Couder @ 2019-09-05  7:24 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Olga Telezhnaya, Elijah Newren, Thomas Gummerer,
	Matheus Tavares Bernardino

On Wed, Sep 4, 2019 at 9:41 PM Jeff King <peff@peff.net> wrote:
>
> Funding is still up in the air, but in the meantime I've tentatively
> signed us up (we have until the 24th to have the funding committed).
> Next we need mentors to submit projects, as well as first-time
> contribution micro-projects.

Great! Thanks for signing up and for all the information!

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-04 19:41 ` Jeff King
  2019-09-05  7:24   ` Christian Couder
@ 2019-09-05 19:39   ` Emily Shaffer
  2019-09-06 11:55     ` Carlo Arenas
  2019-09-07  6:36     ` Jeff King
  2019-09-08 14:56   ` Pratyush Yadav
  2019-09-23 18:07   ` SZEDER Gábor
  3 siblings, 2 replies; 62+ messages in thread
From: Emily Shaffer @ 2019-09-05 19:39 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

On Wed, Sep 04, 2019 at 03:41:15PM -0400, Jeff King wrote:
> On Tue, Aug 27, 2019 at 01:17:57AM -0400, Jeff King wrote:
> 
> > Do we have interested mentors for the next round of Outreachy?
> > 
> > The deadline for Git to apply to the program is September 5th. The
> > deadline for mentors to have submitted project descriptions is September
> > 24th. Intern applications would start on October 1st.
> > 
> > If there are mentors who want to participate, I can handle the project
> > application and can start asking around for funding.
> 
> Funding is still up in the air, but in the meantime I've tentatively
> signed us up (we have until the 24th to have the funding committed).
> Next we need mentors to submit projects, as well as first-time
> contribution micro-projects.

I'm interested to mentor too, but I haven't done anything like this -
official mentoring, intern hosting, anything - so I will need to learn
:)

 - Emily

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-05 19:39   ` Emily Shaffer
@ 2019-09-06 11:55     ` Carlo Arenas
  2019-09-07  6:39       ` Jeff King
  2019-09-07  6:36     ` Jeff King
  1 sibling, 1 reply; 62+ messages in thread
From: Carlo Arenas @ 2019-09-06 11:55 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Jeff King, git, Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

I'm interested to mentor/help too, but I am definitely not a (some
people would even argue against "reliable") contributor but I might be
better than nothing and could pass my "lessons learned" along, so
hopefully next contributors are less of a pain to deal with than I am

Carlo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-05 19:39   ` Emily Shaffer
  2019-09-06 11:55     ` Carlo Arenas
@ 2019-09-07  6:36     ` Jeff King
  1 sibling, 0 replies; 62+ messages in thread
From: Jeff King @ 2019-09-07  6:36 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: git, Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

On Thu, Sep 05, 2019 at 12:39:59PM -0700, Emily Shaffer wrote:

> > Funding is still up in the air, but in the meantime I've tentatively
> > signed us up (we have until the 24th to have the funding committed).
> > Next we need mentors to submit projects, as well as first-time
> > contribution micro-projects.
> 
> I'm interested to mentor too, but I haven't done anything like this -
> official mentoring, intern hosting, anything - so I will need to learn
> :)

Great! I think a big challenge is coming up with an appropriately scoped
project for the intern. You can take a look at previous Outreachy and
GSoC proposals[1] to get the general idea, and then think about some
area you feel comfortable working in. You're welcome to send ideas or
drafts to the list to get feedback.

If you're feeling overwhelmed (or even if you're not), it might make
sense to try to partner with somebody else as a co-mentor, especially if
they've participated in the past. (It might also be that you can offer
to co-mentor with somebody else's project proposal).

-Peff

[1] Older materials can be found at https://git.github.io/, though
    these days the proposed projects go directly into the Outreachy
    system.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-06 11:55     ` Carlo Arenas
@ 2019-09-07  6:39       ` Jeff King
  2019-09-07 10:13         ` Carlo Arenas
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2019-09-07  6:39 UTC (permalink / raw)
  To: Carlo Arenas
  Cc: Emily Shaffer, git, Olga Telezhnaya, Christian Couder,
	Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino

On Fri, Sep 06, 2019 at 04:55:46AM -0700, Carlo Arenas wrote:

> I'm interested to mentor/help too, but I am definitely not a (some
> people would even argue against "reliable") contributor but I might be
> better than nothing and could pass my "lessons learned" along, so
> hopefully next contributors are less of a pain to deal with than I am

I just wrote a response to Emily, but I think a lot of it applies to
you, as well.

In particular, I think both of you are a bit newer to the project than
most of the other people who have mentored in the past. In some ways
that may be a good thing, as it likely makes it easier to see things
from the intern's perspective. :) But it may also introduce some
complications if you're working in an area of the code you're not
familiar with. Co-mentoring may help with that.

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-07  6:39       ` Jeff King
@ 2019-09-07 10:13         ` Carlo Arenas
  0 siblings, 0 replies; 62+ messages in thread
From: Carlo Arenas @ 2019-09-07 10:13 UTC (permalink / raw)
  To: Jeff King
  Cc: Emily Shaffer, git, Olga Telezhnaya, Christian Couder,
	Elijah Newren, Thomas Gummerer, Matheus Tavares Bernardino

On Fri, Sep 6, 2019 at 11:40 PM Jeff King <peff@peff.net> wrote:
>
> I just wrote a response to Emily, but I think a lot of it applies to
> you, as well.

With the exception of course that Emily can definitely write better
code than my attempted hacks

> In particular, I think both of you are a bit newer to the project than
> most of the other people who have mentored in the past. In some ways
> that may be a good thing, as it likely makes it easier to see things
> from the intern's perspective. :) But it may also introduce some
> complications if you're working in an area of the code you're not
> familiar with. Co-mentoring may help with that.

agree, and glad to help any way I can, specially if I can help offset
some of the boring work required that would free resources somewhere
else; then again by replying to Emily, I didn't meant we were both in
the same level both technically and resource wise (after all she wrote
the documentation on how to contribute and I only found a silly bug on
her code that will prevent really obsolete systems to compile and
didn't even wrote the patch)

Carlo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-04 19:41 ` Jeff King
  2019-09-05  7:24   ` Christian Couder
  2019-09-05 19:39   ` Emily Shaffer
@ 2019-09-08 14:56   ` Pratyush Yadav
  2019-09-09 17:00     ` Jeff King
  2019-09-23 18:07   ` SZEDER Gábor
  3 siblings, 1 reply; 62+ messages in thread
From: Pratyush Yadav @ 2019-09-08 14:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Hi Jeff,

On 04/09/19 03:41PM, Jeff King wrote:
[snip]
> The project page has a section to point people in the right direction
> for first-time contributions. I've left it blank for now, but I think it
> makes sense to point one (or both) of:
> 
>   - https://git-scm.com/docs/MyFirstContribution
> 
>   - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git
> 
> as well as a list of micro-projects (or at least instructions on how to
> find #leftoverbits, though we'd definitely have to step up our labeling,
> as I do not recall having seen one for a while).

I'd like to put out a proposal regarding first contributions and micro 
projects.

I have a small list of small isolated features and bug fixes that
_I think_ git-gui would benefit with. And other people using it can 
probably add their pet peeves and issues as well. My question is, are 
these something new contributors should try to work on as an 
introduction to the community? Since most of these features and fixes 
are small and isolated, they should be pretty easy to work on. And I 
think people generally find UI apps a little easier to work on.

But I'll play the devil's advocate on my proposal and point out some 
problems/flaws:
- Git-gui is written in Tcl, and git in C (and other languages too, but 
  not Tcl). That means while people do get a feel of the community and 
  general workflow, they don't necessarily get a feel of the actual git 
  internal codebase.
- Since I don't see a git-gui related project worth being into the 
  Outreachy program, it essentially means they will likely not work on 
  anything related to their project.
- Git-gui is essentially a wrapper on top of git, so people won't get 
  exposure to the git internals.

I'd like to hear your and the rest of the community's thoughts about 
this proposal, and whether it will be a good idea or not.

If people do like this idea, I can do a write up on "things to fix in 
git-gui" that people can add to (and they get a chance to call me stupid 
for even thinking feature X is a good idea ;)).

-- 
Regards,
Pratyush Yadav

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-08 14:56   ` Pratyush Yadav
@ 2019-09-09 17:00     ` Jeff King
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff King @ 2019-09-09 17:00 UTC (permalink / raw)
  To: Pratyush Yadav; +Cc: git

On Sun, Sep 08, 2019 at 08:26:10PM +0530, Pratyush Yadav wrote:

> I'd like to put out a proposal regarding first contributions and micro 
> projects.
> 
> I have a small list of small isolated features and bug fixes that
> _I think_ git-gui would benefit with. And other people using it can 
> probably add their pet peeves and issues as well. My question is, are 
> these something new contributors should try to work on as an 
> introduction to the community? Since most of these features and fixes 
> are small and isolated, they should be pretty easy to work on. And I 
> think people generally find UI apps a little easier to work on.
> 
> But I'll play the devil's advocate on my proposal and point out some 
> problems/flaws:
> - Git-gui is written in Tcl, and git in C (and other languages too, but 
>   not Tcl). That means while people do get a feel of the community and 
>   general workflow, they don't necessarily get a feel of the actual git 
>   internal codebase.
> - Since I don't see a git-gui related project worth being into the 
>   Outreachy program, it essentially means they will likely not work on 
>   anything related to their project.
> - Git-gui is essentially a wrapper on top of git, so people won't get 
>   exposure to the git internals.
> 
> I'd like to hear your and the rest of the community's thoughts about 
> this proposal, and whether it will be a good idea or not.

Right, I came up with similar devil's advocate arguments. :) I'm not
totally opposed, because part of the point of these microprojects just
getting people familiar with interacting with the community and
submitting a patch. They're not always in the same area the intern
intends to work, just because there's not always a trivial problem to be
solved there.

So we do look at it mostly as a "can you do this basic test" test, and
not necessarily as a prelude to the project.

But it would be nice if it were at least in the same _language_ that the
ultimate project will be done in. Because we're evaluating the
applicant's ability to write code in that language, too.

So I dunno. I am on the fence.

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-08-27  5:17 Git in Outreachy December 2019? Jeff King
  2019-08-31  7:58 ` Christian Couder
  2019-09-04 19:41 ` Jeff King
@ 2019-09-13 20:03 ` Jonathan Tan
  2019-09-13 20:51   ` Jeff King
  2 siblings, 1 reply; 62+ messages in thread
From: Jonathan Tan @ 2019-09-13 20:03 UTC (permalink / raw)
  To: peff; +Cc: git, Jonathan Tan

> Do we have interested mentors for the next round of Outreachy?
> 
> The deadline for Git to apply to the program is September 5th. The
> deadline for mentors to have submitted project descriptions is September
> 24th. Intern applications would start on October 1st.
> 
> If there are mentors who want to participate, I can handle the project
> application and can start asking around for funding.

I probably should have replied earlier, but if Git has applied to the
program, feel free to include me as a mentor.

There was a discussion about mentors/co-mentors possibly working in a
part of a codebase that they are not familiar with [1] - firstly, I
think that's possible and even likely for most of us. :-) If any
question arises, maybe it would be sufficient for the mentors to just
help formulate the question (or pose the question themselves) to the
mailing list. If "[Outreachy]" appears in the subject, I'll make it a
higher priority for myself to answer those.

[1] https://public-inbox.org/git/20190907063958.GB28860@sigill.intra.peff.net/

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-13 20:03 ` Jonathan Tan
@ 2019-09-13 20:51   ` Jeff King
  2019-09-16 18:42     ` Emily Shaffer
  2019-09-20 17:04     ` Jonathan Tan
  0 siblings, 2 replies; 62+ messages in thread
From: Jeff King @ 2019-09-13 20:51 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Fri, Sep 13, 2019 at 01:03:17PM -0700, Jonathan Tan wrote:

> > Do we have interested mentors for the next round of Outreachy?
> > 
> > The deadline for Git to apply to the program is September 5th. The
> > deadline for mentors to have submitted project descriptions is September
> > 24th. Intern applications would start on October 1st.
> > 
> > If there are mentors who want to participate, I can handle the project
> > application and can start asking around for funding.
> 
> I probably should have replied earlier, but if Git has applied to the
> program, feel free to include me as a mentor.

Great!  See my followup here:

  https://public-inbox.org/git/20190904194114.GA31398@sigill.intra.peff.net/

Prospective mentors need to sign up on that site, and should propose a
project they'd be willing to mentor.

> There was a discussion about mentors/co-mentors possibly working in a
> part of a codebase that they are not familiar with [1] - firstly, I
> think that's possible and even likely for most of us. :-) If any
> question arises, maybe it would be sufficient for the mentors to just
> help formulate the question (or pose the question themselves) to the
> mailing list. If "[Outreachy]" appears in the subject, I'll make it a
> higher priority for myself to answer those.

I do think it's OK for mentors to not be intimately familiar with the
part of the code that is being touched, as long as the project is simple
enough that they can pick up the technical details easily as-needed. A
lot of what mentors will help mentees with is the overall process (both
Git-specific parts, but also more general development issues). But I
think the proposed projects do need to be feasible.

I'm happy to discuss possible projects if anybody has an idea but isn't
sure how to develop it into a proposal.

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-13 20:51   ` Jeff King
@ 2019-09-16 18:42     ` Emily Shaffer
  2019-09-16 21:33       ` Eric Wong
                         ` (2 more replies)
  2019-09-20 17:04     ` Jonathan Tan
  1 sibling, 3 replies; 62+ messages in thread
From: Emily Shaffer @ 2019-09-16 18:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, git

On Fri, Sep 13, 2019 at 04:51:49PM -0400, Jeff King wrote:
> On Fri, Sep 13, 2019 at 01:03:17PM -0700, Jonathan Tan wrote:
> 
> > > Do we have interested mentors for the next round of Outreachy?
> > > 
> > > The deadline for Git to apply to the program is September 5th. The
> > > deadline for mentors to have submitted project descriptions is September
> > > 24th. Intern applications would start on October 1st.
> > > 
> > > If there are mentors who want to participate, I can handle the project
> > > application and can start asking around for funding.
> > 
> > I probably should have replied earlier, but if Git has applied to the
> > program, feel free to include me as a mentor.
> 
> Great!  See my followup here:
> 
>   https://public-inbox.org/git/20190904194114.GA31398@sigill.intra.peff.net/
> 
> Prospective mentors need to sign up on that site, and should propose a
> project they'd be willing to mentor.
> 
> > There was a discussion about mentors/co-mentors possibly working in a
> > part of a codebase that they are not familiar with [1] - firstly, I
> > think that's possible and even likely for most of us. :-) If any
> > question arises, maybe it would be sufficient for the mentors to just
> > help formulate the question (or pose the question themselves) to the
> > mailing list. If "[Outreachy]" appears in the subject, I'll make it a
> > higher priority for myself to answer those.
> 
> I do think it's OK for mentors to not be intimately familiar with the
> part of the code that is being touched, as long as the project is simple
> enough that they can pick up the technical details easily as-needed. A
> lot of what mentors will help mentees with is the overall process (both
> Git-specific parts, but also more general development issues). But I
> think the proposed projects do need to be feasible.
> 
> I'm happy to discuss possible projects if anybody has an idea but isn't
> sure how to develop it into a proposal.

Hi Peff,

Jonathan Tan, Jonathan Nieder, Josh Steadmon and I met on Friday to talk
about projects and we came up with a trimmed list; not sure what more
needs to be done to make them into fully-fledged proposals.

For starter microprojects, we came up with:

 - cleanup a test script (although we need to identify particularly
   which ones and what counts as "clean")
 - moving doc from documentation/technical/api-* to comments in the
   appropriate header instead
 - teach a command which currently handles its own argv how to use
   parse-options instead
 - add a user.timezone option which Git can use if present rather than
   checking system local time

For the longer projects, we came up with a few more:

 - find places where we can pass in the_repository as arg instead of
   using global the_repository
 - convert sh/pl commands to C, including:
   - git-submodules.sh
   - git-bisect.sh
   - rebase --preserve-merges
   - add -i
   (We were afraid this might be too boring, though.)
 - reduce/eliminate use of fetch_if_missing global
 - create a better difftool/mergetool for format of choice (this one
   ends up existing outside of the Git codebase, but still may be pretty
   adjacent and big impact)
 - training wheels/intro/tutorial mode? (We thought it may be useful to
   make available a very basic "I just want to make a single PR and not
   learn graph theory" mode, toggled by config switch)
 - "did you mean?" for common use cases, e.g. commit with a dirty
   working tree and no staged files - either offer a hint or offer a
   prompt to continue ("Stage changed files and commit? [Y/n]")
 - new `git partial-clone` command to interactively set a filter,
   configure other partial clone settings
 - add progress bars in various situations
 - add a TUI to deal more easily with the mailing list. Jonathan Tan has
   a strong idea of what this TUI would do... This one would also end up
   external but adjacent to the Git codebase.
 - try and make progress towards running many tests from a single test
   file in parallel - maybe this is too big, I'm not sure if we know how
   many of our tests are order-dependent within a file for now...

It might make sense to only focus on scoping the ones we feel most
interested in. We came up with a pretty big list because we had some
other programs in mind, so I suppose it's not necessary to develop all
of them for this program.

 - Emily

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-16 18:42     ` Emily Shaffer
@ 2019-09-16 21:33       ` Eric Wong
  2019-09-16 21:44       ` SZEDER Gábor
  2019-09-17 11:23       ` Johannes Schindelin
  2 siblings, 0 replies; 62+ messages in thread
From: Eric Wong @ 2019-09-16 21:33 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Jeff King, Jonathan Tan, git, Konstantin Ryabitsev

Emily Shaffer <emilyshaffer@google.com> wrote:
> Jonathan Tan, Jonathan Nieder, Josh Steadmon and I met on Friday to talk
> about projects and we came up with a trimmed list; not sure what more
> needs to be done to make them into fully-fledged proposals.

<snip>

> For the longer projects, we came up with a few more:

<snip>

>  - add a TUI to deal more easily with the mailing list. Jonathan Tan has
>    a strong idea of what this TUI would do... This one would also end up
>    external but adjacent to the Git codebase.

AFAIK, Konstantin is/was interested in exploring some of these
ideas with Linux Foundation, too (but he's on vacation atm)

<snip>

> It might make sense to only focus on scoping the ones we feel most
> interested in. We came up with a pretty big list because we had some
> other programs in mind, so I suppose it's not necessary to develop all
> of them for this program.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-16 18:42     ` Emily Shaffer
  2019-09-16 21:33       ` Eric Wong
@ 2019-09-16 21:44       ` SZEDER Gábor
  2019-09-16 23:13         ` Jonathan Nieder
  2019-09-17 11:23       ` Johannes Schindelin
  2 siblings, 1 reply; 62+ messages in thread
From: SZEDER Gábor @ 2019-09-16 21:44 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Jeff King, Jonathan Tan, git

On Mon, Sep 16, 2019 at 11:42:08AM -0700, Emily Shaffer wrote:
>  - try and make progress towards running many tests from a single test
>    file in parallel - maybe this is too big, I'm not sure if we know how
>    many of our tests are order-dependent within a file for now...

Forget it, too many (most?) of them are order-dependent.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-16 21:44       ` SZEDER Gábor
@ 2019-09-16 23:13         ` Jonathan Nieder
  2019-09-17  0:59           ` Jeff King
  0 siblings, 1 reply; 62+ messages in thread
From: Jonathan Nieder @ 2019-09-16 23:13 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

SZEDER Gábor wrote:
> On Mon, Sep 16, 2019 at 11:42:08AM -0700, Emily Shaffer wrote:

>>  - try and make progress towards running many tests from a single test
>>    file in parallel - maybe this is too big, I'm not sure if we know how
>>    many of our tests are order-dependent within a file for now...
>
> Forget it, too many (most?) of them are order-dependent.

Hm, I remember a conversation about this with Thomas Rast a while ago.
It seemed possible at the time.

Most tests use "setup" or "set up" in the names of test assertions
that are required by later tests.  It's very helpful for debugging and
maintenance to be able to skip or reorder some tests, so I've been
able to rely on this a bit.  Of course there's no automated checking
in place for that, so there are plenty of test scripts that are
exceptions to it.

If we introduce a test_setup helper, then we would not have to rely on
convention any more.  A test_setup test assertion would represent a
"barrier" that all later tests in the file can rely on.  We could
introduce some automated checking that these semantics are respected,
and then we get a maintainability improvement in every test script
that uses test_setup.  (In scripts without any test_setup, treat all
test assertions as barriers since they haven't been vetted.)

With such automated tests in place, we can then try updating all tests
that say "setup" or "set up" to use test_setup and see what fails.

Some other tests cannot run in parallel for other reasons (e.g. HTTP
tests).  These can be declared as such, and then we have the ability
to run arbitrary individual tests in parallel.

Most of the time in a test run involves multiple test scripts running
in parallel already, so this isn't a huge win for the time to complete
a normal test run.  It helps more with expensive runs like --valgrind.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-16 23:13         ` Jonathan Nieder
@ 2019-09-17  0:59           ` Jeff King
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff King @ 2019-09-17  0:59 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: SZEDER Gábor, Emily Shaffer, Jonathan Tan, git

On Mon, Sep 16, 2019 at 04:13:49PM -0700, Jonathan Nieder wrote:

> Most tests use "setup" or "set up" in the names of test assertions
> that are required by later tests.  It's very helpful for debugging and
> maintenance to be able to skip or reorder some tests, so I've been
> able to rely on this a bit.  Of course there's no automated checking
> in place for that, so there are plenty of test scripts that are
> exceptions to it.
> 
> If we introduce a test_setup helper, then we would not have to rely on
> convention any more.  A test_setup test assertion would represent a
> "barrier" that all later tests in the file can rely on.  We could
> introduce some automated checking that these semantics are respected,
> and then we get a maintainability improvement in every test script
> that uses test_setup.  (In scripts without any test_setup, treat all
> test assertions as barriers since they haven't been vetted.)
> 
> With such automated tests in place, we can then try updating all tests
> that say "setup" or "set up" to use test_setup and see what fails.
> 
> Some other tests cannot run in parallel for other reasons (e.g. HTTP
> tests).  These can be declared as such, and then we have the ability
> to run arbitrary individual tests in parallel.

This isn't quite the same, but couldn't we get most of the gain just by
splitting the tests into more scripts? As you note, we already run those
in parallel, so it increases the granularity of our parallelism. And you
don't have to worry about skipping tests 1 through 18 if they're in
another file; you just don't consider them at all. It also Just Works
with things like HTTP, which choose ports under the assumption that the
other tests are running simultaneously.

It doesn't help with the case that test 1 does setup, and then tests 2,
3, and 4 are logically independent (and some could be skipped or not).

If anybody is interested in splitting up scripts, the obvious ones to
look at are the ones that take the longest (t9001 takes 55s on my
system, though the whole suite runs in only 95s). Of course you can get
most of the parallelism benefit by using "prove --state=slow,save",
which ends up with lots of short scripts at the end (rather than one
slow one chewing one CPU while the rest sit idle).

> Most of the time in a test run involves multiple test scripts running
> in parallel already, so this isn't a huge win for the time to complete
> a normal test run.  It helps more with expensive runs like --valgrind.

Two easier suggestions than trying to make --valgrind faster:

  - use SANITIZE=address, which is way cheaper than valgrind (and
    catches more things!)

  - use --valgrind-only=17 to run everything else in "fast" mode, but
    check the test you care about

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-16 18:42     ` Emily Shaffer
  2019-09-16 21:33       ` Eric Wong
  2019-09-16 21:44       ` SZEDER Gábor
@ 2019-09-17 11:23       ` Johannes Schindelin
  2019-09-17 12:02         ` SZEDER Gábor
                           ` (3 more replies)
  2 siblings, 4 replies; 62+ messages in thread
From: Johannes Schindelin @ 2019-09-17 11:23 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Jeff King, Jonathan Tan, git

Hi Emily,

On Mon, 16 Sep 2019, Emily Shaffer wrote:

> Jonathan Tan, Jonathan Nieder, Josh Steadmon and I met on Friday to
> talk about projects and we came up with a trimmed list; not sure what
> more needs to be done to make them into fully-fledged proposals.

Thank you for doing this!

> For starter microprojects, we came up with:
>
>  - cleanup a test script (although we need to identify particularly
>    which ones and what counts as "clean")
>  - moving doc from documentation/technical/api-* to comments in the
>    appropriate header instead
>  - teach a command which currently handles its own argv how to use
>    parse-options instead
>  - add a user.timezone option which Git can use if present rather than
>    checking system local time

Nice projects, all. There are a couple more ideas on
https://github.com/gitgitgadget/git/issues, they could probably use some
tagging.

> For the longer projects, we came up with a few more:
>
>  - find places where we can pass in the_repository as arg instead of
>    using global the_repository

Good project, if a bit boring ;-) Also, `the_index` is used a lot in
`builtin/*.c`, still.

>  - convert sh/pl commands to C, including:
>    - git-submodules.sh

I am of two minds there. But mostly, I am of the "friends don't let
friends use submodules" camp, so I would not even want to mentor for
this project: it just makes me shudder too much every time I have to
work with/on submodules.

Of course, if others are interested, I'd hardly object to turn this into
a built-in.

>    - git-bisect.sh

That would be my top recommendation, especially given how much effort
Tanushree put in last winter to make this conversion to C so much more
achievable than before.

>    - rebase --preserve-merges

No. `rebase -p` is already deprecated in favor of `rebase -r` (which
_is_ already built-in).

I already have patches lined up to drop that rebase backend. Let's not
waste effort on converting this script to C.

>    - add -i

Please see PRs #170-#175 on https://github.com/gitgitgadget/git/pulls,
and please do help by adding your review of #170 (which was already
submitted as v4:
https://public-inbox.org/git/pull.170.v4.git.gitgitgadget@gmail.com/

In other words: this project is well under way. In fact, Git for Windows
users enjoy this as an opt-in already.

>    (We were afraid this might be too boring, though.)

Converting shell/Perl scripts into built-in C never looks as much fun as
open-ended projects with lots of playing around, but the advantage of
the former is that they can be easily structured, offer a lot of
opportunity for learning, and they are ultimately more rewarding because
the goals are much better defined than many other projects'.

Another script that would _really_ benefit from being converted to C:
`mergetool`. Especially on Windows, where the over-use of spawned
processes really hurts, it is awfully slow a command.

To complete the list of sh/pl commands:

- git-merge-octopus.sh
- git-merge-one-file.sh
- git-merge-resolve.sh

These seem to be good candidates for conversion to built-ins. Their
functionality is well-exercised in the test suite, their complexity is
quite manageable, and there is no good reason that these should be
scripted.

The only slightly challenging aspect might be that `merge-one-file` is
actually not a merge strategy, but it is used as helper to be passed to
`git merge-index` via the `-o <helper>` option, which makes it slightly
awkward to be implemented as a built-in. A better approach would
therefore be to special-case this value in `merge-index` and execute the
C code directly, without the detour of spawning a built-in.

- git-difftool--helper.sh
- git-mergetool--lib.sh

These would be converted as part of making `mergetool` a built-in, I
believe.

- git-filter-branch.sh

This one is in the process of being deprecated in favor of `git
filter-repo` (which is an external tool), so I don't think there would
be much use in wasting energy on trying to convert it to C. Especially
given that it wants to call shell script snippets all over the place,
and those shell script snippets are supposed to run in the same context,
which might actually make it completely impossible to convert this to C
at all.

- git-legacy-stash.sh

This will go away once the built-in stash is considered good enough.

- git-instaweb.sh
- git-request-pull.sh
- git-send-email.perl
- git-web--browse.sh

I don't think that any of these should be converted. They are just too
unimportant from a performance point of view, and obscure enough that
even their portability issues don't matter too much.

As to `send-email` in particular: I would not want anybody to drag in
all the dependencies required to convert `send-email` to a built-in to
begin with.

- git-archimport.perl
- git-cvsexportcommit.perl
- git-cvsimport.perl
- git-cvsserver.perl
- git-quiltimport.sh
- git-svn.perl

These are all connectors of some sort to other version control software.
It also feels like they become less and less important, as Git really
takes over the world.

At some stage, I think, it would make sense to push those scripts out
into their own repositories, looking for new maintainers (and if none
can be found, then there really is not enough need for them to begin
with, and they can be archived).

>  - reduce/eliminate use of fetch_if_missing global
>  - create a better difftool/mergetool for format of choice (this one
>    ends up existing outside of the Git codebase, but still may be pretty
>    adjacent and big impact)
>  - training wheels/intro/tutorial mode? (We thought it may be useful to
>    make available a very basic "I just want to make a single PR and not
>    learn graph theory" mode, toggled by config switch)
>  - "did you mean?" for common use cases, e.g. commit with a dirty
>    working tree and no staged files - either offer a hint or offer a
>    prompt to continue ("Stage changed files and commit? [Y/n]")
>  - new `git partial-clone` command to interactively set a filter,
>    configure other partial clone settings
>  - add progress bars in various situations
>  - add a TUI to deal more easily with the mailing list. Jonathan Tan has
>    a strong idea of what this TUI would do... This one would also end up
>    external but adjacent to the Git codebase.

I don't think that this would be a good project for anybody except
people who are already really, really familiar with our mailing
list-centric workflow.

>  - try and make progress towards running many tests from a single test
>    file in parallel - maybe this is too big, I'm not sure if we know how
>    many of our tests are order-dependent within a file for now...

Another, potentially more rewarding, project would be to modernize our
test suite framework, so that it is not based on Unix shell scripting,
but on C instead.

The fact that it is based on Unix shell scripting not only costs a lot
of speed, especially on Windows, it also limits us quite a bit, and I am
talking about a lot more than just the awkwardness of having to think
about options of BSD vs GNU variants of common command-line tools.

For example, many, many, if not all, test cases, spend the majority of
their code on setting up specific scenarios. I don't know about you,
but personally I have to dive into many of them when things fail (and I
_dread_ the numbers 0021, 0025 and 3070, let me tell you) and I really
have to say that most of that code is hard to follow and does not make
it easy to form a mental model of what the code tries to accomplish.

To address this, a while ago Thomas Rast started to use `fast-export`ed
commit histories in test scripts (see e.g. `t/t3206/history.export`). I
still find that this fails to make it easier for occasional readers to
understand the ideas underlying the test cases.

Another approach is to document heavily the ideas first, then use code
to implement them. For example, t3430 starts with this:

	[...]

	Initial setup:

	    -- B --                   (first)
	   /       \
	 A - C - D - E - H            (master)
	   \    \       /
	    \    F - G                (second)
	     \
	      Conflicting-G

	[...]

	test_commit A &&
	git checkout -b first &&
	test_commit B &&
	git checkout master &&
	test_commit C &&
	test_commit D &&
	git merge --no-commit B &&
	test_tick &&
	git commit -m E &&
	git tag -m E E &&
	git checkout -b second C &&
	test_commit F &&
	test_commit G &&
	git checkout master &&
	git merge --no-commit G &&
	test_tick &&
	git commit -m H &&
	git tag -m H H &&
	git checkout A &&
	test_commit conflicting-G G.t

	[...]

While this is _somewhat_ better than having only the code, I am still
unhappy about it: this wall of `test_commit` lines interspersed with
other commands is very hard to follow.

If we were to (slowly) convert our test suite framework to C, we could
change that.

One idea would be to allow recreating commit history from something that
looks like the output of `git log`, or even `git log --graph --oneline`,
much like `git mktree` (which really should have been a test helper
instead of a Git command, but I digress) takes something that looks like
the output of `git ls-tree` and creates a tree object from it.

Another thing that would be much easier if we moved more and more parts
of the test suite framework to C: we could implement more powerful
assertions, a lot more easily. For example, the trace output of a failed
`test_i18ngrep` (or `mingw_test_cmp`!!!) could be made a lot more
focused on what is going wrong than on cluttering the terminal window
with almost useless lines which are tedious to sift through.

Likewise, having a framework in C would make it a lot easier to improve
debugging, e.g. by making test scripts "resumable" (guarded by an
option, it could store a complete state, including a copy of the trash
directory, before executing commands, which would allow "going back in
time" and calling a failing command with a debugger, or with valgrind, or
just seeing whether the command would still fail, i.e. whether the test
case is flaky).

Also, things like the code tracing via `-x` (which relies on Bash
functionality in order to work properly, and which _still_ does not work
as intended if your test case evaluates a lazy prereq that has not been
evaluated before) could be "done right".

In many ways, our current test suite seems to test Git's functionality
as much as (core) contributors' abilities to implement test cases in
Unix shell script, _correctly_, and maybe also contributors' patience.
You could say that it tests for the wrong thing at least half of the
time, by design.

It might look like a somewhat less important project, but given that we
exercise almost 150,000 test cases with every CI build, I think it does
make sense to grind our axe for a while, so to say.

Therefore, it might be a really good project to modernize our test
suite. To take ideas from modern test frameworks such as Jest and try to
bring them to C. Which means that new contributors would probably be
better suited to work on this project than Git old-timers!

And the really neat thing about this project is that it could be done
incrementally.

> It might make sense to only focus on scoping the ones we feel most
> interested in. We came up with a pretty big list because we had some
> other programs in mind, so I suppose it's not necessary to develop all
> of them for this program.

I don't find that list particularly big, to be honest ;-)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-17 11:23       ` Johannes Schindelin
@ 2019-09-17 12:02         ` SZEDER Gábor
  2019-09-23 12:47           ` Johannes Schindelin
  2019-09-17 15:10         ` Christian Couder
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 62+ messages in thread
From: SZEDER Gábor @ 2019-09-17 12:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote:
> Also, things like the code tracing via `-x` (which relies on Bash
> functionality in order to work properly,

Not really.

> and which _still_ does not work
> as intended if your test case evaluates a lazy prereq that has not been
> evaluated before

I don't see any striking differences between the trace output of a test
involving a lazy prereq from Bash or dash:

  $ cat t9999-test.sh 
  #!/bin/sh
  
  test_description='test'
  
  . ./test-lib.sh
  
  test_lazy_prereq DUMMY_PREREQ '
          : lazily evaluating a dummy prereq
  '
  
  test_expect_success DUMMY_PREREQ 'test' '
          true
  '
  
  test_done
  $ ./t9999-test.sh -x
  Initialized empty Git repository in /home/szeder/src/git/t/trash directory.t9999-test/.git/
  checking prerequisite: DUMMY_PREREQ
  
  mkdir -p "$TRASH_DIRECTORY/prereq-test-dir" &&
  (
          cd "$TRASH_DIRECTORY/prereq-test-dir" &&
          : lazily evaluating a dummy prereq
  
  )
  + mkdir -p /home/szeder/src/git/t/trash directory.t9999-test/prereq-test-dir
  + cd /home/szeder/src/git/t/trash directory.t9999-test/prereq-test-dir
  + : lazily evaluating a dummy prereq
  prerequisite DUMMY_PREREQ ok
  expecting success of 9999.1 'test': 
          true
  
  + true
  ok 1 - test
  
  # passed all 1 test(s)
  1..1
  $ bash ./t9999-test.sh -x
  Initialized empty Git repository in /home/szeder/src/git/t/trash directory.t9999-test/.git/
  checking prerequisite: DUMMY_PREREQ
  
  mkdir -p "$TRASH_DIRECTORY/prereq-test-dir" &&
  (
          cd "$TRASH_DIRECTORY/prereq-test-dir" &&
          : lazily evaluating a dummy prereq
  
  )
  ++ mkdir -p '/home/szeder/src/git/t/trash directory.t9999-test/prereq-test-dir'
  ++ cd '/home/szeder/src/git/t/trash directory.t9999-test/prereq-test-dir'
  ++ : lazily evaluating a dummy prereq
  prerequisite DUMMY_PREREQ ok
  expecting success of 9999.1 'test': 
          true
  
  ++ true
  ok 1 - test
  
  # passed all 1 test(s)
  1..1


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-17 11:23       ` Johannes Schindelin
  2019-09-17 12:02         ` SZEDER Gábor
@ 2019-09-17 15:10         ` Christian Couder
  2019-09-23 12:50           ` Johannes Schindelin
  2019-09-23 19:30           ` Jeff King
  2019-09-23 18:07         ` Jeff King
  2019-09-24  0:55         ` Eric Wong
  3 siblings, 2 replies; 62+ messages in thread
From: Christian Couder @ 2019-09-17 15:10 UTC (permalink / raw)
  To: Emily Shaffer, Johannes Schindelin; +Cc: Jeff King, Jonathan Tan, git

Hi Emily and Dscho,

On Tue, Sep 17, 2019 at 1:28 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> On Mon, 16 Sep 2019, Emily Shaffer wrote:
>
> > Jonathan Tan, Jonathan Nieder, Josh Steadmon and I met on Friday to
> > talk about projects and we came up with a trimmed list; not sure what
> > more needs to be done to make them into fully-fledged proposals.
>
> Thank you for doing this!

Yeah, great!

> > For starter microprojects, we came up with:
> >
> >  - cleanup a test script (although we need to identify particularly
> >    which ones and what counts as "clean")
> >  - moving doc from documentation/technical/api-* to comments in the
> >    appropriate header instead
> >  - teach a command which currently handles its own argv how to use
> >    parse-options instead
> >  - add a user.timezone option which Git can use if present rather than
> >    checking system local time
>
> Nice projects, all. There are a couple more ideas on
> https://github.com/gitgitgadget/git/issues, they could probably use some
> tagging.

Thanks! Maybe we should have a page with Outreachy microprojects on
https://git.github.io/

I will see if I find the time to create one soon with the above information.

> > For the longer projects, we came up with a few more:

[...]

> >    - git-bisect.sh
>
> That would be my top recommendation, especially given how much effort
> Tanushree put in last winter to make this conversion to C so much more
> achievable than before.

I just added a project in the Outreachy system about it. I would have
added the link but Outreachy asks to not share the link publicly. I am
willing to co-mentor (or maybe mentor alone if no one else wants to
co-mentor) it. Anyone willing to co-mentor can register on the
outreachy website.

Thanks for the other suggestions by the way.

> Converting shell/Perl scripts into built-in C never looks as much fun as
> open-ended projects with lots of playing around, but the advantage of
> the former is that they can be easily structured, offer a lot of
> opportunity for learning, and they are ultimately more rewarding because
> the goals are much better defined than many other projects'.

I agree. Outreachy also suggest avoiding projects that have to be
discussed a lot or are not clear enough.

> >  - reduce/eliminate use of fetch_if_missing global

I like this one!

> > It might make sense to only focus on scoping the ones we feel most
> > interested in. We came up with a pretty big list because we had some
> > other programs in mind, so I suppose it's not necessary to develop all
> > of them for this program.

I agree as I don't think we will have enough mentors or co-mentors for
a big number of projects.

Best,
Christian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-13 20:51   ` Jeff King
  2019-09-16 18:42     ` Emily Shaffer
@ 2019-09-20 17:04     ` Jonathan Tan
  2019-09-21  1:47       ` Emily Shaffer
                         ` (2 more replies)
  1 sibling, 3 replies; 62+ messages in thread
From: Jonathan Tan @ 2019-09-20 17:04 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git

> Prospective mentors need to sign up on that site, and should propose a
> project they'd be willing to mentor.

[snip]

> I'm happy to discuss possible projects if anybody has an idea but isn't
> sure how to develop it into a proposal.

I'm new to Outreachy and programs like this, so does anyone have an
opinion on my draft proposal below? It does not have any immediate
user-facing benefit, but it does have a definite end point.

Also let me know if an Outreachy proposal should have more detail, etc.

    Refactor "git index-pack" logic into library code

    Currently, whenever any Git code needs a pack to be indexed, it
    needs to spawn a new "git index-pack" process, passing command-line
    arguments and communicating with it using file descriptors (standard
    input and output), much like an end-user would if invoking "git
    index-pack" directly. Refactor the pack indexing logic into library
    code callable from other Git code, make "git index-pack" a thin
    wrapper around that library code, and (to demonstrate that the
    refactoring works) change fetch-pack.c to use the library code
    instead of spawning the "git index-pack" process.

    This allows the pack indexing code to communicate with its callers
    with the full power of C (structs, callbacks, etc.) instead of being
    restricted to command-line arguments and file descriptors. It also
    simplifies debugging in that there will no longer be 2
    inter-communicating processes to deal with, only 1.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-20 17:04     ` Jonathan Tan
@ 2019-09-21  1:47       ` Emily Shaffer
  2019-09-23 14:23         ` Christian Couder
  2019-09-23 19:40         ` Jeff King
  2019-09-23 11:49       ` Christian Couder
  2019-09-23 19:15       ` Jeff King
  2 siblings, 2 replies; 62+ messages in thread
From: Emily Shaffer @ 2019-09-21  1:47 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: peff, git

On Fri, Sep 20, 2019 at 10:04:48AM -0700, Jonathan Tan wrote:
> > Prospective mentors need to sign up on that site, and should propose a
> > project they'd be willing to mentor.
> 
> [snip]
> 
> > I'm happy to discuss possible projects if anybody has an idea but isn't
> > sure how to develop it into a proposal.
> 
> I'm new to Outreachy and programs like this, so does anyone have an
> opinion on my draft proposal below? It does not have any immediate
> user-facing benefit, but it does have a definite end point.

I'd appreciate similar opinion if anybody has it - and I'd also really
feel more comfortable with a co-mentor.

"""
"Did You Mean..?"

There are some situations where it's fairly clear what a user meant to
do, even though they did not do that thing correctly. For example, if a
user runs `git commit` with tracked, modified, unstaged files in their
worktree, but no staged files at all, it's fairly likely that they
simply forgot to add the files they wanted. In this case, the error
message is slightly obtuse:

$ git commit
On branch master
Changes not staged for commit:
	modified:   foo.txt

no changes added to commit


Since we have an idea of what the user _meant_ to do, we can offer
something more like:

$ git commit
On branch master
Changes not staged for commit:
	modified:   foo.txt

Stage listed changes and continue? [Y/n]

While the above case is a good starting place, other similar cases can
be added afterwards if time permits. These helper prompts should be
enabled/disabled via a config option so that people who are used to
their current workflow won't be impacted.
"""

Thanks in advance for feedback.

 - Emily

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-20 17:04     ` Jonathan Tan
  2019-09-21  1:47       ` Emily Shaffer
@ 2019-09-23 11:49       ` Christian Couder
  2019-09-23 17:58         ` Jonathan Tan
  2019-09-23 19:15       ` Jeff King
  2 siblings, 1 reply; 62+ messages in thread
From: Christian Couder @ 2019-09-23 11:49 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Jeff King, git

On Mon, Sep 23, 2019 at 10:53 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > Prospective mentors need to sign up on that site, and should propose a
> > project they'd be willing to mentor.

Yeah, you are very welcome to sign up soon if you haven't already done
so as I think the deadline is really soon.

> > I'm happy to discuss possible projects if anybody has an idea but isn't
> > sure how to develop it into a proposal.
>
> I'm new to Outreachy and programs like this, so does anyone have an
> opinion on my draft proposal below? It does not have any immediate
> user-facing benefit, but it does have a definite end point.

No need for user-facing benefits. Refactoring or improving the code in
other useful ways are very good subjects (as I already said in my
reply to Emily and Dscho).

> Also let me know if an Outreachy proposal should have more detail, etc.
>
>     Refactor "git index-pack" logic into library code
>
>     Currently, whenever any Git code needs a pack to be indexed, it
>     needs to spawn a new "git index-pack" process, passing command-line
>     arguments and communicating with it using file descriptors (standard
>     input and output), much like an end-user would if invoking "git
>     index-pack" directly. Refactor the pack indexing logic into library
>     code callable from other Git code, make "git index-pack" a thin
>     wrapper around that library code, and (to demonstrate that the
>     refactoring works) change fetch-pack.c to use the library code
>     instead of spawning the "git index-pack" process.
>
>     This allows the pack indexing code to communicate with its callers
>     with the full power of C (structs, callbacks, etc.) instead of being
>     restricted to command-line arguments and file descriptors. It also
>     simplifies debugging in that there will no longer be 2
>     inter-communicating processes to deal with, only 1.

I think this is really great, both the idea and the description! No
need for more details.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-17 12:02         ` SZEDER Gábor
@ 2019-09-23 12:47           ` Johannes Schindelin
  2019-09-23 16:58             ` SZEDER Gábor
  2019-09-23 18:19             ` Jeff King
  0 siblings, 2 replies; 62+ messages in thread
From: Johannes Schindelin @ 2019-09-23 12:47 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

[-- Attachment #1: Type: text/plain, Size: 883 bytes --]

Hi,

On Tue, 17 Sep 2019, SZEDER Gábor wrote:

> On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote:
> > Also, things like the code tracing via `-x` (which relies on Bash
> > functionality in order to work properly,
>
> Not really.

To work properly. What I meant was the trick we need to play with
`BASH_XTRACEFD`.

> > and which _still_ does not work as intended if your test case
> > evaluates a lazy prereq that has not been evaluated before
>
> I don't see any striking differences between the trace output of a test
> involving a lazy prereq from Bash or dash:
>
> [...]

The evaluation of the lazy prereq is indeed not different between Bash
or dash. It is nevertheless quite disruptive in the trace of a test
script, especially when it is evaluated for a test case that is skipped
explicitly via the `--run` option.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-17 15:10         ` Christian Couder
@ 2019-09-23 12:50           ` Johannes Schindelin
  2019-09-23 19:30           ` Jeff King
  1 sibling, 0 replies; 62+ messages in thread
From: Johannes Schindelin @ 2019-09-23 12:50 UTC (permalink / raw)
  To: Christian Couder; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

Hi,

On Tue, 17 Sep 2019, Christian Couder wrote:

> On Tue, Sep 17, 2019 at 1:28 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> > On Mon, 16 Sep 2019, Emily Shaffer wrote:
> >
> > >  - reduce/eliminate use of fetch_if_missing global
>
> I like this one!

It looks as if a (non-Outreachy) contributor also does like this one
already: https://github.com/git/git/pull/650

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-21  1:47       ` Emily Shaffer
@ 2019-09-23 14:23         ` Christian Couder
  2019-09-23 19:40         ` Jeff King
  1 sibling, 0 replies; 62+ messages in thread
From: Christian Couder @ 2019-09-23 14:23 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Jonathan Tan, Jeff King, git

On Mon, Sep 23, 2019 at 3:35 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> On Fri, Sep 20, 2019 at 10:04:48AM -0700, Jonathan Tan wrote:
>
> > I'm new to Outreachy and programs like this, so does anyone have an
> > opinion on my draft proposal below? It does not have any immediate
> > user-facing benefit, but it does have a definite end point.
>
> I'd appreciate similar opinion if anybody has it - and I'd also really
> feel more comfortable with a co-mentor.

First as the deadline is tomorrow, I think it is important to submit
projects to Outreachy even if they are not perfect and even if we
would like a co-mentor (which is also my case by the way).

We can hopefully improve the projects after the deadline or perhaps
drop them if we cannot find enough co-mentors or if we don't agree
with the goal or find it too difficult.

> """
> "Did You Mean..?"
>
> There are some situations where it's fairly clear what a user meant to
> do, even though they did not do that thing correctly. For example, if a
> user runs `git commit` with tracked, modified, unstaged files in their
> worktree, but no staged files at all, it's fairly likely that they
> simply forgot to add the files they wanted. In this case, the error
> message is slightly obtuse:
>
> $ git commit
> On branch master
> Changes not staged for commit:
>         modified:   foo.txt
>
> no changes added to commit
>
>
> Since we have an idea of what the user _meant_ to do, we can offer
> something more like:
>
> $ git commit
> On branch master
> Changes not staged for commit:
>         modified:   foo.txt
>
> Stage listed changes and continue? [Y/n]
>
> While the above case is a good starting place, other similar cases can
> be added afterwards if time permits. These helper prompts should be
> enabled/disabled via a config option so that people who are used to
> their current workflow won't be impacted.
> """

I agree that it might help. There could be significant discussion
about what the UI should be though. For example maybe we could just
promote `git commit -p` in the tutorials instead of doing the above.
Or have a commit.patch config option if we haven't one already.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 12:47           ` Johannes Schindelin
@ 2019-09-23 16:58             ` SZEDER Gábor
  2019-09-26 11:04               ` Johannes Schindelin
  2019-09-23 18:19             ` Jeff King
  1 sibling, 1 reply; 62+ messages in thread
From: SZEDER Gábor @ 2019-09-23 16:58 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

On Mon, Sep 23, 2019 at 02:47:23PM +0200, Johannes Schindelin wrote:
> Hi,
> 
> On Tue, 17 Sep 2019, SZEDER Gábor wrote:
> 
> > On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote:
> > > Also, things like the code tracing via `-x` (which relies on Bash
> > > functionality in order to work properly,
> >
> > Not really.
> 
> To work properly. What I meant was the trick we need to play with
> `BASH_XTRACEFD`.

I'm still unsure what BASH_XTRACEFD trick you mean.  AFAICT we don't
play any tricks with it to make '-x' work properly, and indeed '-x'
tracing works properly even without BASH_XTRACEFD (and to achive that
we did have to play some tricks, but not any with BASH_XTRACEFD;
perhaps these tricks are what you meant?).

> > > and which _still_ does not work as intended if your test case
> > > evaluates a lazy prereq that has not been evaluated before
> >
> > I don't see any striking differences between the trace output of a test
> > involving a lazy prereq from Bash or dash:
> >
> > [...]
> 
> The evaluation of the lazy prereq is indeed not different between Bash
> or dash. It is nevertheless quite disruptive in the trace of a test
> script, especially when it is evaluated for a test case that is skipped
> explicitly via the `--run` option.

But then the actual issue is the unnecessary evaluation of the prereq
even when the test framework could know in advance that the test case
should be skipped anyway, and the trace from it is a mere side effect,
no?


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 11:49       ` Christian Couder
@ 2019-09-23 17:58         ` Jonathan Tan
  2019-09-23 19:27           ` Jeff King
  0 siblings, 1 reply; 62+ messages in thread
From: Jonathan Tan @ 2019-09-23 17:58 UTC (permalink / raw)
  To: christian.couder; +Cc: jonathantanmy, peff, git

> No need for user-facing benefits. Refactoring or improving the code in
> other useful ways are very good subjects (as I already said in my
> reply to Emily and Dscho).

Thanks!

> I think this is really great, both the idea and the description! No
> need for more details.

Thanks! I've just submitted the project proposal - hopefully it will be
approved soon. In any case, it seems that project information can be
edited after submission.

There was a "How can applicants make a contribution to your project?"
question and a few questions about communication channels. I answered
them as best I could but if anyone has already answered them, it would
be great to just use the same answer everywhere. (I can't see all
project information of other projects since I haven't filled out a
"short initial application", but I don't think that applies to me.)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-04 19:41 ` Jeff King
                     ` (2 preceding siblings ...)
  2019-09-08 14:56   ` Pratyush Yadav
@ 2019-09-23 18:07   ` SZEDER Gábor
  2019-09-26  9:47     ` SZEDER Gábor
  2019-09-26 11:42     ` Johannes Schindelin
  3 siblings, 2 replies; 62+ messages in thread
From: SZEDER Gábor @ 2019-09-23 18:07 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

On Wed, Sep 04, 2019 at 03:41:15PM -0400, Jeff King wrote:
> The project page has a section to point people in the right direction
> for first-time contributions. I've left it blank for now, but I think it
> makes sense to point one (or both) of:
> 
>   - https://git-scm.com/docs/MyFirstContribution
> 
>   - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git
> 
> as well as a list of micro-projects (or at least instructions on how to
> find #leftoverbits, though we'd definitely have to step up our labeling,
> as I do not recall having seen one for a while).

And we should make sure that all microprojects are indeed micro in
size.  Matheus sent v8 of a 10 patch series in July that started out
as a microproject back in February...

Here is one more idea for microprojects:

  Find a group of related preprocessor constants and turn them into an
  enum.  Also find where those constants are stored in variables and
  in structs and passed around as function parameters, and change the
  type of those variables, fields and parameters to the new enum.




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-17 11:23       ` Johannes Schindelin
  2019-09-17 12:02         ` SZEDER Gábor
  2019-09-17 15:10         ` Christian Couder
@ 2019-09-23 18:07         ` Jeff King
  2019-09-24 14:25           ` Johannes Schindelin
  2019-09-24  0:55         ` Eric Wong
  3 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2019-09-23 18:07 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Emily Shaffer, Jonathan Tan, git

On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote:

> The only slightly challenging aspect might be that `merge-one-file` is
> actually not a merge strategy, but it is used as helper to be passed to
> `git merge-index` via the `-o <helper>` option, which makes it slightly
> awkward to be implemented as a built-in. A better approach would
> therefore be to special-case this value in `merge-index` and execute the
> C code directly, without the detour of spawning a built-in.

I think it could make sense for merge-index to be able to directly run
the merge-one-file code[1]. But I think we'd want to keep its ability to
run an arbitrary script, and for people to call merge-one-file
separately, since right now you can do:

  git merge-index my-script

and have "my-script" do some processing of its own, then hand off more
work to merge-one-file.

So the weird calling convention is actually a user-visible and
potentially useful interface. So it would need a deprecation period
before being removed.

-Peff

[1] Certainly doing it in-process would be faster for the common case of
    "git merge-index git-merge-one-file", but I wonder if anybody really
    does that. These days most people would just merge-recursive anyway.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 12:47           ` Johannes Schindelin
  2019-09-23 16:58             ` SZEDER Gábor
@ 2019-09-23 18:19             ` Jeff King
  2019-09-24 14:30               ` Johannes Schindelin
  1 sibling, 1 reply; 62+ messages in thread
From: Jeff King @ 2019-09-23 18:19 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: SZEDER Gábor, Emily Shaffer, Jonathan Tan, git

On Mon, Sep 23, 2019 at 02:47:23PM +0200, Johannes Schindelin wrote:

> The evaluation of the lazy prereq is indeed not different between Bash
> or dash. It is nevertheless quite disruptive in the trace of a test
> script, especially when it is evaluated for a test case that is skipped
> explicitly via the `--run` option.

That sounds like a bug: if we know we are not going to run the test
anyway due to --run or GIT_TEST_SKIP, we should probably avoid checking
the prereq at all.

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-20 17:04     ` Jonathan Tan
  2019-09-21  1:47       ` Emily Shaffer
  2019-09-23 11:49       ` Christian Couder
@ 2019-09-23 19:15       ` Jeff King
  2019-09-23 20:38         ` Jonathan Tan
  2 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2019-09-23 19:15 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Fri, Sep 20, 2019 at 10:04:48AM -0700, Jonathan Tan wrote:

> > I'm happy to discuss possible projects if anybody has an idea but isn't
> > sure how to develop it into a proposal.
> 
> I'm new to Outreachy and programs like this, so does anyone have an
> opinion on my draft proposal below? It does not have any immediate
> user-facing benefit, but it does have a definite end point.
> 
> Also let me know if an Outreachy proposal should have more detail, etc.
> 
>     Refactor "git index-pack" logic into library code
> 
>     Currently, whenever any Git code needs a pack to be indexed, it
>     needs to spawn a new "git index-pack" process, passing command-line
>     arguments and communicating with it using file descriptors (standard
>     input and output), much like an end-user would if invoking "git
>     index-pack" directly. Refactor the pack indexing logic into library
>     code callable from other Git code, make "git index-pack" a thin
>     wrapper around that library code, and (to demonstrate that the
>     refactoring works) change fetch-pack.c to use the library code
>     instead of spawning the "git index-pack" process.
> 
>     This allows the pack indexing code to communicate with its callers
>     with the full power of C (structs, callbacks, etc.) instead of being
>     restricted to command-line arguments and file descriptors. It also
>     simplifies debugging in that there will no longer be 2
>     inter-communicating processes to deal with, only 1.

I think this is an OK level of detail. I'm not sure quite sure about the
goal of the project, though. In particular:

  - I'm not clear what we'd hope to gain. I.e., what richer information
    would we want to pass back and forth between index-pack and the
    other processes? It might also be more efficient, but I'm not sure
    it's measurably so (we save a single process, and we save some pipe
    traffic, but the sideband demuxer would probably end up passing it
    over a self-pipe anyway).

  - index-pack is prone to dying on bad input, and we wouldn't want it
    to take down the outer fetch-pack or receive-pack, which are what
    produce useful messages to the user. That's something that could be
    fixed as part of the libification, but I suspect the control flow
    might be a little tricky.

  - we don't always call index-pack, but sometimes call unpack-objects.
    I suppose we could continue to call an external unpack-objects in
    that path, but that eliminates the utility of having richer
    communication if we sometimes have to take the "dumb" path. A while
    ago I took a stab at teaching index-pack to unpack. It works, but
    there are a few ugly bits, as discussed in:

      https://github.com/peff/git/commit/7df82454a855281e9c147f3023225f8a6f72e303

    Maybe that would be worth making part of the project?

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 17:58         ` Jonathan Tan
@ 2019-09-23 19:27           ` Jeff King
  2019-09-23 20:48             ` Jonathan Tan
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2019-09-23 19:27 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: christian.couder, git

On Mon, Sep 23, 2019 at 10:58:05AM -0700, Jonathan Tan wrote:

> > I think this is really great, both the idea and the description! No
> > need for more details.
> 
> Thanks! I've just submitted the project proposal - hopefully it will be
> approved soon. In any case, it seems that project information can be
> edited after submission.

I approved this. I did leave some comments elsewhere in the thread, but
I think we can continue to iterate on the idea.

> There was a "How can applicants make a contribution to your project?"
> question and a few questions about communication channels. I answered
> them as best I could but if anyone has already answered them, it would
> be great to just use the same answer everywhere. (I can't see all
> project information of other projects since I haven't filled out a
> "short initial application", but I don't think that applies to me.)

What you wrote there looks pretty good to me. Copying it here for
purposes of discussion:

> Please introduce yourself on the public project chat:
>
> IRC - Follow this link to join this project's public chat. If you are
> asked for username, pick any username! You will not need a password to
> join this chat.
>
> Git mailing list - Follow this link to join this project's public
> chat.

where the IRC link goes to Freenode's webchat, and the mailing list link
goes to https://git-scm.com/community.

The other proposal is from Christian, who wrote:

> Please introduce yourself on the public project chat:
>
>     a mailing list - Once you join the project's communication
>     channel, the mentors have some additional instructions for you to
>     follow:
> 
>     Start the subject of your emails to the mailing list with "[Outreachy]" so that mentors and people interested in Outreachy can easily notice your emails.
>     Follow this link to join this project's public chat.
> 
>     Send an email to majordomo@vger.kernel.org with "subscribe git" in
>     the body of the email.

I think the "[Outreachy]" advice is good, and worth adding to yours. I
think linking to the "community" page is a good idea for Christian's, as
it has more tips on using the mailing list.

I'd leave it up to individual mentors whether they want to mention IRC
(based on whether or not they actually use IRC themselves).

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-17 15:10         ` Christian Couder
  2019-09-23 12:50           ` Johannes Schindelin
@ 2019-09-23 19:30           ` Jeff King
  1 sibling, 0 replies; 62+ messages in thread
From: Jeff King @ 2019-09-23 19:30 UTC (permalink / raw)
  To: Christian Couder; +Cc: Emily Shaffer, Johannes Schindelin, Jonathan Tan, git

On Tue, Sep 17, 2019 at 05:10:43PM +0200, Christian Couder wrote:

> > Nice projects, all. There are a couple more ideas on
> > https://github.com/gitgitgadget/git/issues, they could probably use some
> > tagging.
> 
> Thanks! Maybe we should have a page with Outreachy microprojects on
> https://git.github.io/
> 
> I will see if I find the time to create one soon with the above information.

Please let me know if you do; there's a spot in the project page for:

  Description of your first time contribution tutorial:

  If your applicants need to complete a tutorial before working on
  contributions with mentors, please provide a description and the URL
  for the tutorial. For example, the Linux kernel asks applicants to
  complete a tutorial for compiling and installing a custom kernel, and
  sending in a simple whitespace change patch. Once applicants complete
  this tutorial, they can start to work with mentors on more complex
  contributions.

which could link to that list (and the list should probably discuss how
we consider micro-projects, etc, and link to MyFirstContribution or
similar).

I think this suggestion is a good one to add, as well:

  https://public-inbox.org/git/20190923180649.GA2886@szeder.dev/

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-21  1:47       ` Emily Shaffer
  2019-09-23 14:23         ` Christian Couder
@ 2019-09-23 19:40         ` Jeff King
  2019-09-23 22:29           ` Philip Oakley
  1 sibling, 1 reply; 62+ messages in thread
From: Jeff King @ 2019-09-23 19:40 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Jonathan Tan, git

On Fri, Sep 20, 2019 at 06:47:01PM -0700, Emily Shaffer wrote:

> """
> "Did You Mean..?"
> 
> There are some situations where it's fairly clear what a user meant to
> do, even though they did not do that thing correctly. For example, if a
> user runs `git commit` with tracked, modified, unstaged files in their
> worktree, but no staged files at all, it's fairly likely that they
> simply forgot to add the files they wanted. In this case, the error
> message is slightly obtuse:
> 
> $ git commit
> On branch master
> Changes not staged for commit:
> 	modified:   foo.txt
> 
> no changes added to commit
> 
> 
> Since we have an idea of what the user _meant_ to do, we can offer
> something more like:
> 
> $ git commit
> On branch master
> Changes not staged for commit:
> 	modified:   foo.txt
> 
> Stage listed changes and continue? [Y/n]
> 
> While the above case is a good starting place, other similar cases can
> be added afterwards if time permits. These helper prompts should be
> enabled/disabled via a config option so that people who are used to
> their current workflow won't be impacted.
> """

This is an interesting idea. At first I thought it might be too small
for a project, but I think it could be expanded or contracted as much as
the time allows by just looking for more "did you mean" spots.

I have mixed feelings on making things interactive. For one, it gets
awkward when Git commands are called as part of a script or other
program (and a lot of programs like git-commit ride the line of plumbing
and porcelain). I know this would kick in only when a config option is
set, but I think that might things even _more_ confusing, as something
that works for one user (without the config) would start behaving
weirdly for another.

I also think it might be an opportunity to educate. Instead of giving a
yes/no prompt, we can actually recommend one (or more!) sets of commands
to get the desired effect. I _thought_ we already did for this case by
default (triggered by advice.statusHints, which is true by default). But
it looks like those don't get printed for git-commit?

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 19:15       ` Jeff King
@ 2019-09-23 20:38         ` Jonathan Tan
  2019-09-23 21:28           ` Jeff King
  0 siblings, 1 reply; 62+ messages in thread
From: Jonathan Tan @ 2019-09-23 20:38 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git

> I think this is an OK level of detail. I'm not sure quite sure about the
> goal of the project, though. In particular:
> 
>   - I'm not clear what we'd hope to gain. I.e., what richer information
>     would we want to pass back and forth between index-pack and the
>     other processes? It might also be more efficient, but I'm not sure
>     it's measurably so (we save a single process, and we save some pipe
>     traffic, but the sideband demuxer would probably end up passing it
>     over a self-pipe anyway).

I didn't have any concrete ideas so I didn't include those, but some
unrefined ideas:

 - index-pack has the CLI option to specify a message to be written into
   the .promisor file, but in my patch to write fetched refs to
   .promisor [1], I ended up making fetch-pack.c write the information
   because I didn't know how many refs were going to be written (and I
   didn't want to bump into CLI argument length limits). If we had this
   feature, I might have been able to pass a callback to index-pack that
   writes the list of refs once we have the fd into .promisor,
   eliminating some code duplication (but I haven't verified this).

 - In your reply [2] to the above [1], you mentioned the possibility of
   keeping a list of cutoff points. One way of doing this, as I state in
   [3], is my original suggestion back in 2017 of one such
   repository-wide list. If we do this, it would be better for
   fetch-pack to handle this instead of index-pack, and it seems more
   efficient to me to have index-pack be able to pass objects to
   fetch-pack as they are inflated instead of fetch-pack rereading the
   compressed forms on disk (but again, I haven't verified this).

[1] https://public-inbox.org/git/20190826214737.164132-1-jonathantanmy@google.com/
[2] https://public-inbox.org/git/20190905070153.GE21450@sigill.intra.peff.net/
[3] https://public-inbox.org/git/20190905183926.137490-1-jonathantanmy@google.com/

There are also the debuggability improvements of not having to deal with
2 processes.

>   - index-pack is prone to dying on bad input, and we wouldn't want it
>     to take down the outer fetch-pack or receive-pack, which are what
>     produce useful messages to the user. That's something that could be
>     fixed as part of the libification, but I suspect the control flow
>     might be a little tricky.

Good point.

>   - we don't always call index-pack, but sometimes call unpack-objects.
>     I suppose we could continue to call an external unpack-objects in
>     that path, but that eliminates the utility of having richer
>     communication if we sometimes have to take the "dumb" path. A while
>     ago I took a stab at teaching index-pack to unpack. It works, but
>     there are a few ugly bits, as discussed in:
> 
>       https://github.com/peff/git/commit/7df82454a855281e9c147f3023225f8a6f72e303
> 
>     Maybe that would be worth making part of the project?

I'm reluctant to do so because I don't want to increase the scope too
much - although if my project has relatively narrow scope for an
Outreachy project, we can do so. As for eliminating the utility of
having richer communication, I don't think so, because in the situations
where we require richer communication (right now, situations to do with
partial clone), we specifically run index-pack anyway.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 19:27           ` Jeff King
@ 2019-09-23 20:48             ` Jonathan Tan
  0 siblings, 0 replies; 62+ messages in thread
From: Jonathan Tan @ 2019-09-23 20:48 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, christian.couder, git

> I approved this. I did leave some comments elsewhere in the thread, but
> I think we can continue to iterate on the idea.

Thanks.

> > There was a "How can applicants make a contribution to your project?"
> > question and a few questions about communication channels. I answered
> > them as best I could but if anyone has already answered them, it would
> > be great to just use the same answer everywhere. (I can't see all
> > project information of other projects since I haven't filled out a
> > "short initial application", but I don't think that applies to me.)
> 
> What you wrote there looks pretty good to me. Copying it here for
> purposes of discussion:
> 
> > Please introduce yourself on the public project chat:
> >
> > IRC - Follow this link to join this project's public chat. If you are
> > asked for username, pick any username! You will not need a password to
> > join this chat.
> >
> > Git mailing list - Follow this link to join this project's public
> > chat.
> 
> where the IRC link goes to Freenode's webchat, and the mailing list link
> goes to https://git-scm.com/community.

Thanks. For the record, most of the text is from Outreachy - I just
supplied the links.

> The other proposal is from Christian, who wrote:
> 
> > Please introduce yourself on the public project chat:
> >
> >     a mailing list - Once you join the project's communication
> >     channel, the mentors have some additional instructions for you to
> >     follow:
> > 
> >     Start the subject of your emails to the mailing list with "[Outreachy]" so that mentors and people interested in Outreachy can easily notice your emails.
> >     Follow this link to join this project's public chat.
> > 
> >     Send an email to majordomo@vger.kernel.org with "subscribe git" in
> >     the body of the email.
> 
> I think the "[Outreachy]" advice is good, and worth adding to yours. I
> think linking to the "community" page is a good idea for Christian's, as
> it has more tips on using the mailing list.

Good idea on the "[Outreachy]" - I've added it to my project as well.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 20:38         ` Jonathan Tan
@ 2019-09-23 21:28           ` Jeff King
  2019-09-24 17:07             ` Jonathan Tan
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2019-09-23 21:28 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Mon, Sep 23, 2019 at 01:38:54PM -0700, Jonathan Tan wrote:

> I didn't have any concrete ideas so I didn't include those, but some
> unrefined ideas:

One risk to a mentoring project like this is that the intern does a good
job of steps 1-5, and then in step 6 we realize that the whole thing is
not useful, and upstream doesn't want it. Which isn't to say the intern
didn't learn something, and the project didn't benefit. Negative results
can be useful; but it can also be demoralizing.

I'm not arguing that's going to be the case here. But I do think it's
worth talking through these things a bit as part of thinking about
proposals.

>  - index-pack has the CLI option to specify a message to be written into
>    the .promisor file, but in my patch to write fetched refs to
>    .promisor [1], I ended up making fetch-pack.c write the information
>    because I didn't know how many refs were going to be written (and I
>    didn't want to bump into CLI argument length limits). If we had this
>    feature, I might have been able to pass a callback to index-pack that
>    writes the list of refs once we have the fd into .promisor,
>    eliminating some code duplication (but I haven't verified this).

That makes some sense. We could pass the data over a pipe, but obviously
stdin is already in use to receive the pack here. Ideally we'd be able
to pass multiple streams between the programs, but I think due to
Windows support, we can't assume that arbitrary pipe descriptors will
make it across the run-command boundary. So I think we'd be left with
communicating via temporary files (which really isn't the worst thing in
the world, but has its own complications).

>  - In your reply [2] to the above [1], you mentioned the possibility of
>    keeping a list of cutoff points. One way of doing this, as I state in
>    [3], is my original suggestion back in 2017 of one such
>    repository-wide list. If we do this, it would be better for
>    fetch-pack to handle this instead of index-pack, and it seems more
>    efficient to me to have index-pack be able to pass objects to
>    fetch-pack as they are inflated instead of fetch-pack rereading the
>    compressed forms on disk (but again, I haven't verified this).

And this is the flip-side problem: we need to get data back, but we have
only stdout, which is already in use (so we need some kind of protocol).
That leads to things like the horrible NUL-byte added by 83558686ce
(receive-pack: send keepalives during quiet periods, 2016-07-15).

> There are also the debuggability improvements of not having to deal with
> 2 processes.

I think it can sometimes be easier to debug with two separate processes,
because the input to index-pack is well-defined and can be repeated
without hitting the network (though you do have to figure out how to
record the network response, which can be non-trivial). I've also done
similar things for running performance simulations.

We'll still have the stand-alone index-pack command, so it can be used
for those cases. But as we add more features that utilize the in-process
interface, that may eventually stop being feasible.

> > [dropping unpack-objects]
> >     Maybe that would be worth making part of the project?
> 
> I'm reluctant to do so because I don't want to increase the scope too
> much - although if my project has relatively narrow scope for an
> Outreachy project, we can do so. As for eliminating the utility of
> having richer communication, I don't think so, because in the situations
> where we require richer communication (right now, situations to do with
> partial clone), we specifically run index-pack anyway.

Yeah, we're in kind of a weird situation there, where unpack-objects is
used less and less. I wonder how many surprises are lurking where
somebody reasoned about index-pack behavior, but unpack-objects may do
something slightly differently (I know this came up when we looked at
fsck-ing incoming objects for submodule vulnerabilities).

I kind of wonder if it would be reasonable to just always use index-pack
for the sake of simplicity, even if it never learns to actually unpack
objects. We've been doing that for years on the server side at GitHub
without ill effects (I think the unpack route is slightly more efficient
for a thin pack, but since it only kicks in when there are few objects
anyway, I wonder how big an advantage it is in general).

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 19:40         ` Jeff King
@ 2019-09-23 22:29           ` Philip Oakley
  0 siblings, 0 replies; 62+ messages in thread
From: Philip Oakley @ 2019-09-23 22:29 UTC (permalink / raw)
  To: Jeff King, Emily Shaffer; +Cc: Jonathan Tan, git

On 23/09/2019 20:40, Jeff King wrote:
> On Fri, Sep 20, 2019 at 06:47:01PM -0700, Emily Shaffer wrote:
>
>> """
>> "Did You Mean..?"
>>
>> There are some situations where it's fairly clear what a user meant to
>> do, even though they did not do that thing correctly. For example, if a
>> user runs `git commit` with tracked, modified, unstaged files in their
>> worktree, but no staged files at all, it's fairly likely that they
>> simply forgot to add the files they wanted. In this case, the error
>> message is slightly obtuse:
>>
>> $ git commit
>> On branch master
>> Changes not staged for commit:
>> 	modified:   foo.txt
>>
>> no changes added to commit
>>
>>
>> Since we have an idea of what the user _meant_ to do, we can offer
>> something more like:
>>
>> $ git commit
>> On branch master
>> Changes not staged for commit:
>> 	modified:   foo.txt
>>
>> Stage listed changes and continue? [Y/n]
>>
>> While the above case is a good starting place, other similar cases can
>> be added afterwards if time permits. These helper prompts should be
>> enabled/disabled via a config option so that people who are used to
>> their current workflow won't be impacted.
>> """
> This is an interesting idea. At first I thought it might be too small
> for a project, but I think it could be expanded or contracted as much as
> the time allows by just looking for more "did you mean" spots.
>
> I have mixed feelings on making things interactive. For one, it gets
> awkward when Git commands are called as part of a script or other
> program (and a lot of programs like git-commit ride the line of plumbing
> and porcelain). I know this would kick in only when a config option is
> set, but I think that might things even _more_ confusing, as something
> that works for one user (without the config) would start behaving
> weirdly for another.
>
> I also think it might be an opportunity to educate. Instead of giving a
> yes/no prompt, we can actually recommend one (or more!) sets of commands
> to get the desired effect. I _thought_ we already did for this case by
> default (triggered by advice.statusHints, which is true by default). But
> it looks like those don't get printed for git-commit?
>
> -Peff
Also there is a lot of common problems and issues that can be mined from 
StackOverflow for similar "Did You Mean..?"user problems.

Philip

  


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-17 11:23       ` Johannes Schindelin
                           ` (2 preceding siblings ...)
  2019-09-23 18:07         ` Jeff King
@ 2019-09-24  0:55         ` Eric Wong
  2019-09-26 12:45           ` Johannes Schindelin
  2019-09-28  4:01           ` Junio C Hamano
  3 siblings, 2 replies; 62+ messages in thread
From: Eric Wong @ 2019-09-24  0:55 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Mon, 16 Sep 2019, Emily Shaffer wrote:
> >  - try and make progress towards running many tests from a single test
> >    file in parallel - maybe this is too big, I'm not sure if we know how
> >    many of our tests are order-dependent within a file for now...
> 
> Another, potentially more rewarding, project would be to modernize our
> test suite framework, so that it is not based on Unix shell scripting,
> but on C instead.

I worry more C would reduce the amount of contributors (some of
the C rewrites already scared me off hacking years ago).  I
figure more users are familiar with sh than C.

It would also increase the disparity between tests and use of
actual users from the command-line.

> The fact that it is based on Unix shell scripting not only costs a lot
> of speed, especially on Windows, it also limits us quite a bit, and I am
> talking about a lot more than just the awkwardness of having to think
> about options of BSD vs GNU variants of common command-line tools.

I agree that it costs a lot of time, and I'm even on Linux using
dash as /bin/sh + eatmydata (but ancient laptop)

> For example, many, many, if not all, test cases, spend the majority of
> their code on setting up specific scenarios. I don't know about you,
> but personally I have to dive into many of them when things fail (and I
> _dread_ the numbers 0021, 0025 and 3070, let me tell you) and I really
> have to say that most of that code is hard to follow and does not make
> it easy to form a mental model of what the code tries to accomplish.
> 
> To address this, a while ago Thomas Rast started to use `fast-export`ed
> commit histories in test scripts (see e.g. `t/t3206/history.export`). I
> still find that this fails to make it easier for occasional readers to
> understand the ideas underlying the test cases.
> 
> Another approach is to document heavily the ideas first, then use code
> to implement them. For example, t3430 starts with this:
> 
> 	[...]
> 
> 	Initial setup:
> 
> 	    -- B --                   (first)
> 	   /       \
> 	 A - C - D - E - H            (master)
> 	   \    \       /
> 	    \    F - G                (second)
> 	     \
> 	      Conflicting-G
> 
> 	[...]
> 
> 	test_commit A &&
> 	git checkout -b first &&
> 	test_commit B &&
> 	git checkout master &&
> 	test_commit C &&
> 	test_commit D &&
> 	git merge --no-commit B &&
> 	test_tick &&
> 	git commit -m E &&
> 	git tag -m E E &&
> 	git checkout -b second C &&
> 	test_commit F &&
> 	test_commit G &&
> 	git checkout master &&
> 	git merge --no-commit G &&
> 	test_tick &&
> 	git commit -m H &&
> 	git tag -m H H &&
> 	git checkout A &&
> 	test_commit conflicting-G G.t
> 
> 	[...]
> 
> While this is _somewhat_ better than having only the code, I am still
> unhappy about it: this wall of `test_commit` lines interspersed with
> other commands is very hard to follow.

Agreed.  More on the readability part below...

As far as speeding that up, I think moving some parts
of test setup to Makefiles + fast-import/fast-export would give
us a nice balance of speed + maintainability:

1. initial setup is done using normal commands (or graph drawing tool)
2. the result of setup is "built" with fast-export
3. test uses fast-import

Makefile rules would prevent subsequent test runs from repeating
1. and 2.

> If we were to (slowly) convert our test suite framework to C, we could
> change that.
> 
> One idea would be to allow recreating commit history from something that
> looks like the output of `git log`, or even `git log --graph --oneline`,
> much like `git mktree` (which really should have been a test helper
> instead of a Git command, but I digress) takes something that looks like
> the output of `git ls-tree` and creates a tree object from it.

I've been playing with Graph::Easy (Perl5 module) in other
projects, and I also think the setup could be more easily
expressed with a declarative language (e.g. GNU make)

> Another thing that would be much easier if we moved more and more parts
> of the test suite framework to C: we could implement more powerful
> assertions, a lot more easily. For example, the trace output of a failed
> `test_i18ngrep` (or `mingw_test_cmp`!!!) could be made a lot more
> focused on what is going wrong than on cluttering the terminal window
> with almost useless lines which are tedious to sift through.

I fail to see how language choice here matters.  But then again,
I have plenty of experience writing bad code in ALL languages I
know :>

> Likewise, having a framework in C would make it a lot easier to improve
> debugging, e.g. by making test scripts "resumable" (guarded by an
> option, it could store a complete state, including a copy of the trash
> directory, before executing commands, which would allow "going back in
> time" and calling a failing command with a debugger, or with valgrind, or
> just seeing whether the command would still fail, i.e. whether the test
> case is flaky).

Resumability sounds like a perfect job for GNU make.
(that said, I don't know if you use make or something else to build gfw)

> In many ways, our current test suite seems to test Git's functionality
> as much as (core) contributors' abilities to implement test cases in
> Unix shell script, _correctly_, and maybe also contributors' patience.
> You could say that it tests for the wrong thing at least half of the
> time, by design.

Basic (not advanced) sh is already a prerequisite for using git.

Writing correct code and tests in ANY language is still a
challenge for me; but I'm least convinced a low-level language
such as C is the right language for writing integration tests in.

C is fine for unit tests, and maybe we can use more unit tests
and less integration tests.

> It might look like a somewhat less important project, but given that we
> exercise almost 150,000 test cases with every CI build, I think it does
> make sense to grind our axe for a while, so to say.

Something that would benefit both users and regular contributors
is the use and adoption of more batch and eval-friendly interfaces.
e.g. fast-import/export, cat-file --batch, for-each-ref --perl...

I haven't used hg since 2005, but I know "hg server" exists
nowadays to get rid of a lot of startup overhead in Mercurial,
and maybe git could steal that idea, too...

> Therefore, it might be a really good project to modernize our test
> suite. To take ideas from modern test frameworks such as Jest and try to
> bring them to C. Which means that new contributors would probably be
> better suited to work on this project than Git old-timers!
> 
> And the really neat thing about this project is that it could be done
> incrementally.

I hope to find time to hack some more batch/eval-friendly stuff
that can make scripting git more performant; but no idea on my
availability :<

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 18:07         ` Jeff King
@ 2019-09-24 14:25           ` Johannes Schindelin
  2019-09-24 15:33             ` Jeff King
  0 siblings, 1 reply; 62+ messages in thread
From: Johannes Schindelin @ 2019-09-24 14:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Emily Shaffer, Jonathan Tan, git

Hi Peff,

On Mon, 23 Sep 2019, Jeff King wrote:

> On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote:
>
> > The only slightly challenging aspect might be that `merge-one-file` is
> > actually not a merge strategy, but it is used as helper to be passed to
> > `git merge-index` via the `-o <helper>` option, which makes it slightly
> > awkward to be implemented as a built-in. A better approach would
> > therefore be to special-case this value in `merge-index` and execute the
> > C code directly, without the detour of spawning a built-in.
>
> I think it could make sense for merge-index to be able to directly run
> the merge-one-file code[1]. But I think we'd want to keep its ability to
> run an arbitrary script, and for people to call merge-one-file
> separately, since right now you can do:
>
>   git merge-index my-script
>
> and have "my-script" do some processing of its own, then hand off more
> work to merge-one-file.

Oh, sorry, I did not mean to say that we should do away with this at
all! Rather, I meant to say that `merge-index` could detect when it was
asked to run `git-merge-one-file` and re-route to internal code instead
of spawning a process. If any other script/program was specified, it
should be spawned off, just like it is done today.

Thanks,
Dscho

> So the weird calling convention is actually a user-visible and
> potentially useful interface. So it would need a deprecation period
> before being removed.
>
> -Peff
>
> [1] Certainly doing it in-process would be faster for the common case of
>     "git merge-index git-merge-one-file", but I wonder if anybody really
>     does that. These days most people would just merge-recursive anyway.
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 18:19             ` Jeff King
@ 2019-09-24 14:30               ` Johannes Schindelin
  0 siblings, 0 replies; 62+ messages in thread
From: Johannes Schindelin @ 2019-09-24 14:30 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Emily Shaffer, Jonathan Tan, git

Hi Peff,

On Mon, 23 Sep 2019, Jeff King wrote:

> On Mon, Sep 23, 2019 at 02:47:23PM +0200, Johannes Schindelin wrote:
>
> > The evaluation of the lazy prereq is indeed not different between Bash
> > or dash. It is nevertheless quite disruptive in the trace of a test
> > script, especially when it is evaluated for a test case that is skipped
> > explicitly via the `--run` option.
>
> That sounds like a bug: if we know we are not going to run the test
> anyway due to --run or GIT_TEST_SKIP, we should probably avoid checking
> the prereq at all.

Yep. Likewise, the `&&` chain checker seems to run even if the test case
is skipped.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-24 14:25           ` Johannes Schindelin
@ 2019-09-24 15:33             ` Jeff King
  2019-09-28  3:56               ` Junio C Hamano
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2019-09-24 15:33 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Emily Shaffer, Jonathan Tan, git

On Tue, Sep 24, 2019 at 04:25:45PM +0200, Johannes Schindelin wrote:

> > I think it could make sense for merge-index to be able to directly run
> > the merge-one-file code[1]. But I think we'd want to keep its ability to
> > run an arbitrary script, and for people to call merge-one-file
> > separately, since right now you can do:
> >
> >   git merge-index my-script
> >
> > and have "my-script" do some processing of its own, then hand off more
> > work to merge-one-file.
> 
> Oh, sorry, I did not mean to say that we should do away with this at
> all! Rather, I meant to say that `merge-index` could detect when it was
> asked to run `git-merge-one-file` and re-route to internal code instead
> of spawning a process. If any other script/program was specified, it
> should be spawned off, just like it is done today.

OK, great, then we are completely on the same page. :)

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 21:28           ` Jeff King
@ 2019-09-24 17:07             ` Jonathan Tan
  2019-09-26  7:09               ` Jeff King
  0 siblings, 1 reply; 62+ messages in thread
From: Jonathan Tan @ 2019-09-24 17:07 UTC (permalink / raw)
  To: peff; +Cc: jonathantanmy, git

> On Mon, Sep 23, 2019 at 01:38:54PM -0700, Jonathan Tan wrote:
> 
> > I didn't have any concrete ideas so I didn't include those, but some
> > unrefined ideas:
> 
> One risk to a mentoring project like this is that the intern does a good
> job of steps 1-5, and then in step 6 we realize that the whole thing is
> not useful, and upstream doesn't want it. Which isn't to say the intern
> didn't learn something, and the project didn't benefit. Negative results
> can be useful; but it can also be demoralizing.

That's true. I think that libification is in itself a useful and
non-controversial goal.

> I'm not arguing that's going to be the case here. But I do think it's
> worth talking through these things a bit as part of thinking about
> proposals.

[snip]

> >  - index-pack has the CLI option to specify a message to be written into
> >    the .promisor file, but in my patch to write fetched refs to
> >    .promisor [1], I ended up making fetch-pack.c write the information
> >    because I didn't know how many refs were going to be written (and I
> >    didn't want to bump into CLI argument length limits). If we had this
> >    feature, I might have been able to pass a callback to index-pack that
> >    writes the list of refs once we have the fd into .promisor,
> >    eliminating some code duplication (but I haven't verified this).
> 
> That makes some sense. We could pass the data over a pipe, but obviously
> stdin is already in use to receive the pack here. Ideally we'd be able
> to pass multiple streams between the programs, but I think due to
> Windows support, we can't assume that arbitrary pipe descriptors will
> make it across the run-command boundary. So I think we'd be left with
> communicating via temporary files (which really isn't the worst thing in
> the world, but has its own complications).
> 
> >  - In your reply [2] to the above [1], you mentioned the possibility of
> >    keeping a list of cutoff points. One way of doing this, as I state in
> >    [3], is my original suggestion back in 2017 of one such
> >    repository-wide list. If we do this, it would be better for
> >    fetch-pack to handle this instead of index-pack, and it seems more
> >    efficient to me to have index-pack be able to pass objects to
> >    fetch-pack as they are inflated instead of fetch-pack rereading the
> >    compressed forms on disk (but again, I haven't verified this).
> 
> And this is the flip-side problem: we need to get data back, but we have
> only stdout, which is already in use (so we need some kind of protocol).
> That leads to things like the horrible NUL-byte added by 83558686ce
> (receive-pack: send keepalives during quiet periods, 2016-07-15).

Sounds good. With this, do you think that there is enough likelihood of
acceptance that we can move ahead with my proposed project?

Besides discussing the likelihood of patches being accepted/rejected,
should we record the result of discussion somewhere (or, if only the
mentor should give their ideas, for me to write in more detail)? I don't
recall a place in the Outreachy form to write this, so I just mentioned
the benefits in outline, but maybe I can just include it somewhere
anyway.

> > There are also the debuggability improvements of not having to deal with
> > 2 processes.
> 
> I think it can sometimes be easier to debug with two separate processes,
> because the input to index-pack is well-defined and can be repeated
> without hitting the network (though you do have to figure out how to
> record the network response, which can be non-trivial). I've also done
> similar things for running performance simulations.

Hmm...that's true, but I think this is a matter of degree. The input to
a lib function for index-pack can be similarly simple and well-defined
(a C interface that we can exercise using a throwaway patch to
test-tool, for example), but I agree that it usually won't be as simple
as input to CLI (but this is because of limitations that the CLI
imposes).

> > > [dropping unpack-objects]
> > >     Maybe that would be worth making part of the project?
> > 
> > I'm reluctant to do so because I don't want to increase the scope too
> > much - although if my project has relatively narrow scope for an
> > Outreachy project, we can do so. As for eliminating the utility of
> > having richer communication, I don't think so, because in the situations
> > where we require richer communication (right now, situations to do with
> > partial clone), we specifically run index-pack anyway.
> 
> Yeah, we're in kind of a weird situation there, where unpack-objects is
> used less and less. I wonder how many surprises are lurking where
> somebody reasoned about index-pack behavior, but unpack-objects may do
> something slightly differently (I know this came up when we looked at
> fsck-ing incoming objects for submodule vulnerabilities).
> 
> I kind of wonder if it would be reasonable to just always use index-pack
> for the sake of simplicity, even if it never learns to actually unpack
> objects. We've been doing that for years on the server side at GitHub
> without ill effects (I think the unpack route is slightly more efficient
> for a thin pack, but since it only kicks in when there are few objects
> anyway, I wonder how big an advantage it is in general).

This sounds reasonable to me.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-24 17:07             ` Jonathan Tan
@ 2019-09-26  7:09               ` Jeff King
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff King @ 2019-09-26  7:09 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git

On Tue, Sep 24, 2019 at 10:07:46AM -0700, Jonathan Tan wrote:

> > >  - In your reply [2] to the above [1], you mentioned the possibility of
> > >    keeping a list of cutoff points. One way of doing this, as I state in
> > >    [3], is my original suggestion back in 2017 of one such
> > >    repository-wide list. If we do this, it would be better for
> > >    fetch-pack to handle this instead of index-pack, and it seems more
> > >    efficient to me to have index-pack be able to pass objects to
> > >    fetch-pack as they are inflated instead of fetch-pack rereading the
> > >    compressed forms on disk (but again, I haven't verified this).
> > 
> > And this is the flip-side problem: we need to get data back, but we have
> > only stdout, which is already in use (so we need some kind of protocol).
> > That leads to things like the horrible NUL-byte added by 83558686ce
> > (receive-pack: send keepalives during quiet periods, 2016-07-15).
> 
> Sounds good. With this, do you think that there is enough likelihood of
> acceptance that we can move ahead with my proposed project?
> 
> Besides discussing the likelihood of patches being accepted/rejected,
> should we record the result of discussion somewhere (or, if only the
> mentor should give their ideas, for me to write in more detail)? I don't
> recall a place in the Outreachy form to write this, so I just mentioned
> the benefits in outline, but maybe I can just include it somewhere
> anyway.

Yeah, I think it's OK to go ahead. I think an intern who is interested
in the project would get in touch with you either directly, or via the
list. So that gives some opportunity to discuss the ideas with them, and
to go into more detail on the proposal in an interactive way (it would
also be fine to point at this thread, too, of course).

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 18:07   ` SZEDER Gábor
@ 2019-09-26  9:47     ` SZEDER Gábor
  2019-09-26 19:32       ` Johannes Schindelin
  2019-09-26 11:42     ` Johannes Schindelin
  1 sibling, 1 reply; 62+ messages in thread
From: SZEDER Gábor @ 2019-09-26  9:47 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

On Mon, Sep 23, 2019 at 08:07:09PM +0200, SZEDER Gábor wrote:
> Here is one more idea for microprojects:
> 
>   Find a group of related preprocessor constants and turn them into an
>   enum.  Also find where those constants are stored in variables and
>   in structs and passed around as function parameters, and change the
>   type of those variables, fields and parameters to the new enum.

Peff thought elsewhere in the thread that this is a good idea, so I
wanted to try out how this microproject would work in practice, and to 
add a commit that we can show as a good example, and therefore set out 
to convert 'cache_entry->ce_flags' to an enum...  and will soon send
out a RFH patch, because I hit a snag, and am not sure what to do
about it :)  Anyway:

  - Finding a group of related preprocessor constants is trivial: the
    common prefixes and vertically aligned values of related constants
    stand out in output of 'git grep #define'.  Converting them to an
    enum is fairly trivial as well.

  - Converting various integer types of variables, struct fields, and
    function parameters to the new enum is... well, I wouldn't say
    that it's hard, but it's tedious (but 'ce_flags' with about 20
    related constants is perhaps the biggest we have).  OTOH, it's all 
    fairly mechanical, and doesn't require any understanding of Git
    internals.  Overall I think that this is indeed a micro-sized
    microproject, but...

  - The bad news is that I expect that reviewing the variable, etc.
    type conversions will be just as tedious, and it's quite easy to
    miss a conversion or three, so I'm afraid that several rerolls
    will be necessary.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 16:58             ` SZEDER Gábor
@ 2019-09-26 11:04               ` Johannes Schindelin
  2019-09-26 13:28                 ` SZEDER Gábor
  0 siblings, 1 reply; 62+ messages in thread
From: Johannes Schindelin @ 2019-09-26 11:04 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

[-- Attachment #1: Type: text/plain, Size: 2681 bytes --]

Hi,

On Mon, 23 Sep 2019, SZEDER Gábor wrote:

> On Mon, Sep 23, 2019 at 02:47:23PM +0200, Johannes Schindelin wrote:
> >
> > On Tue, 17 Sep 2019, SZEDER Gábor wrote:
> >
> > > On Tue, Sep 17, 2019 at 01:23:18PM +0200, Johannes Schindelin wrote:
> > > > Also, things like the code tracing via `-x` (which relies on Bash
> > > > functionality in order to work properly,
> > >
> > > Not really.
> >
> > To work properly. What I meant was the trick we need to play with
> > `BASH_XTRACEFD`.
>
> I'm still unsure what BASH_XTRACEFD trick you mean.  AFAICT we don't
> play any tricks with it to make '-x' work properly, and indeed '-x'
> tracing works properly even without BASH_XTRACEFD (and to achive that
> we did have to play some tricks, but not any with BASH_XTRACEFD;
> perhaps these tricks are what you meant?).

It works okay some of the time. But IIRC `-x -V` requires the
`BASH_XTRACEFD` trick.

However, I start to feel like I am distracted deliberately from my main
argument: that shell scripting is simply an awful language to implement
a highly reliable test framework. That we need to rely on Bash, at least
some of the time, is just _one_ of the many shortcomings.

> > > > and which _still_ does not work as intended if your test case
> > > > evaluates a lazy prereq that has not been evaluated before
> > >
> > > I don't see any striking differences between the trace output of a test
> > > involving a lazy prereq from Bash or dash:
> > >
> > > [...]
> >
> > The evaluation of the lazy prereq is indeed not different between Bash
> > or dash. It is nevertheless quite disruptive in the trace of a test
> > script, especially when it is evaluated for a test case that is skipped
> > explicitly via the `--run` option.
>
> But then the actual issue is the unnecessary evaluation of the prereq
> even when the test framework could know in advance that the test case
> should be skipped anyway, and the trace from it is a mere side effect,
> no?

I forgot a crucial tidbit: if you run with `-x` and a lazy prereq is
evaluated, not only is the output disruptive, the trace is also turned
off after the lazy prereq, _before_ the actual test case is run. So you
don't see any trace of the actual test case.

In any case, I really do not want to see this thread derailed into
specifics of Bashisms and bugs in our test framework.

My main point should not be diluted: a test framework should be
implemented in a language that offers speedy execution of even
complicated logic, proper error checking, and higher data types (i.e.
other than "everything is a string"). Unix shell script is not it.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-23 18:07   ` SZEDER Gábor
  2019-09-26  9:47     ` SZEDER Gábor
@ 2019-09-26 11:42     ` Johannes Schindelin
  1 sibling, 0 replies; 62+ messages in thread
From: Johannes Schindelin @ 2019-09-26 11:42 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff King, git, Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

[-- Attachment #1: Type: text/plain, Size: 1346 bytes --]

Hi,

On Mon, 23 Sep 2019, SZEDER Gábor wrote:

> On Wed, Sep 04, 2019 at 03:41:15PM -0400, Jeff King wrote:
> > The project page has a section to point people in the right direction
> > for first-time contributions. I've left it blank for now, but I think it
> > makes sense to point one (or both) of:
> >
> >   - https://git-scm.com/docs/MyFirstContribution
> >
> >   - https://matheustavares.gitlab.io/posts/first-steps-contributing-to-git
> >
> > as well as a list of micro-projects (or at least instructions on how to
> > find #leftoverbits, though we'd definitely have to step up our labeling,
> > as I do not recall having seen one for a while).
>
> And we should make sure that all microprojects are indeed micro in
> size.  Matheus sent v8 of a 10 patch series in July that started out
> as a microproject back in February...

Indeed.

> Here is one more idea for microprojects:
>
>   Find a group of related preprocessor constants and turn them into an
>   enum.  Also find where those constants are stored in variables and
>   in structs and passed around as function parameters, and change the
>   type of those variables, fields and parameters to the new enum.

I agree that this is a good suggestion, and turned this #leftoverbits
into https://github.com/gitgitgadget/git/issues/357.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-24  0:55         ` Eric Wong
@ 2019-09-26 12:45           ` Johannes Schindelin
  2019-09-30  8:55             ` Eric Wong
  2019-09-28  4:01           ` Junio C Hamano
  1 sibling, 1 reply; 62+ messages in thread
From: Johannes Schindelin @ 2019-09-26 12:45 UTC (permalink / raw)
  To: Eric Wong; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

Hi Eric,

On Tue, 24 Sep 2019, Eric Wong wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > On Mon, 16 Sep 2019, Emily Shaffer wrote:
> > >  - try and make progress towards running many tests from a single test
> > >    file in parallel - maybe this is too big, I'm not sure if we know how
> > >    many of our tests are order-dependent within a file for now...
> >
> > Another, potentially more rewarding, project would be to modernize our
> > test suite framework, so that it is not based on Unix shell scripting,
> > but on C instead.
>
> I worry more C would reduce the amount of contributors (some of
> the C rewrites already scared me off hacking years ago).  I
> figure more users are familiar with sh than C.

Seeing as most of the patches/patch series require contributors not only
to write test cases in Unix shell script, but also to patch or implement
code in C, I fail to be concerned about that.

> It would also increase the disparity between tests and use of
> actual users from the command-line.

I find it really endearing whenever I hear anybody talking about Git
users as if every single one of them crafted extensive shell scripts
around Git. In my experience, most of the people who do that gather on
this here mailing list, or on a nearby one, and that is but a tiny
fraction of Git users.

Besides, it was my understanding that Git's test suite tries to prevent
regressions in Git's code, whether it is called in scripts or not. As
such, it would not matter _how_ that functionality is tested. Does that
not match your understanding?

> > The fact that it is based on Unix shell scripting not only costs a lot
> > of speed, especially on Windows, it also limits us quite a bit, and I am
> > talking about a lot more than just the awkwardness of having to think
> > about options of BSD vs GNU variants of common command-line tools.
>
> I agree that it costs a lot of time, and I'm even on Linux using
> dash as /bin/sh + eatmydata (but ancient laptop)

One thing that I meant to play with (but which is hampered by too many
parts of Git being implemented in Unix shell/Perl, still, therefore
making code coverage analysis hard) is Test Impact Analysis
(https://docs.microsoft.com/en-us/azure/devops/pipelines/test/test-impact-analysis).
In short, a way to avoid running tests the code touched by them was
already tested before. Example: if I change the `README.md`, no
regression test needs to be run at all. If I change `git-p4.py`, the
majority of test scripts can be skipped, only t98*.sh need to be run
(and maybe not even all of them).

There is a lot of work to be done on the built-in'ification, still,
before that becomes feasible, of course.

> > For example, many, many, if not all, test cases, spend the majority of
> > their code on setting up specific scenarios. I don't know about you,
> > but personally I have to dive into many of them when things fail (and I
> > _dread_ the numbers 0021, 0025 and 3070, let me tell you) and I really
> > have to say that most of that code is hard to follow and does not make
> > it easy to form a mental model of what the code tries to accomplish.
> >
> > To address this, a while ago Thomas Rast started to use `fast-export`ed
> > commit histories in test scripts (see e.g. `t/t3206/history.export`). I
> > still find that this fails to make it easier for occasional readers to
> > understand the ideas underlying the test cases.
> >
> > Another approach is to document heavily the ideas first, then use code
> > to implement them. For example, t3430 starts with this:
> >
> > 	[...]
> >
> > 	Initial setup:
> >
> > 	    -- B --                   (first)
> > 	   /       \
> > 	 A - C - D - E - H            (master)
> > 	   \    \       /
> > 	    \    F - G                (second)
> > 	     \
> > 	      Conflicting-G
> >
> > 	[...]
> >
> > 	test_commit A &&
> > 	git checkout -b first &&
> > 	test_commit B &&
> > 	git checkout master &&
> > 	test_commit C &&
> > 	test_commit D &&
> > 	git merge --no-commit B &&
> > 	test_tick &&
> > 	git commit -m E &&
> > 	git tag -m E E &&
> > 	git checkout -b second C &&
> > 	test_commit F &&
> > 	test_commit G &&
> > 	git checkout master &&
> > 	git merge --no-commit G &&
> > 	test_tick &&
> > 	git commit -m H &&
> > 	git tag -m H H &&
> > 	git checkout A &&
> > 	test_commit conflicting-G G.t
> >
> > 	[...]
> >
> > While this is _somewhat_ better than having only the code, I am still
> > unhappy about it: this wall of `test_commit` lines interspersed with
> > other commands is very hard to follow.
>
> Agreed.  More on the readability part below...
>
> As far as speeding that up, I think moving some parts
> of test setup to Makefiles + fast-import/fast-export would give
> us a nice balance of speed + maintainability:
>
> 1. initial setup is done using normal commands (or graph drawing tool)
> 2. the result of setup is "built" with fast-export
> 3. test uses fast-import

I actually talked about this in my mail. If you find it easy to deduce
the intent behind a commit history that was exported via fast-export,
more power to you. (Was the committer name crucial? The file name? Or
the temporal order of the commits?)

In contrast, I find it very challenging, myself. And please keep in mind
that the first thing any contributor needs to do who sees a failing
regression test (where the failure is most likely caused by the patch
they plan on contributing): understand what the heck the regression test
case is trying to ensure. The harder the code makes that, the worse it
does its primary job: to (help) prevent regressions.

So no, I am not at all on board with moving to fast-imported commit
histories in Git's test suite. They provide some convenience to the
authors of those regression tests, which is not the audience you need to
cater for in this case: instead, regression tests should make it not
only easy to catch, but _especially_ easy to fix, regressions. And that
audience would pay dearly for that erstwhile convenience.

> Makefile rules would prevent subsequent test runs from repeating
> 1. and 2.

That is a cute idea, until you realize that the number of developers
fluent in `make` is even smaller than the number of developers fluent in
`C`. In other words, you would again _increase_ the the number of
prerequisites instead of reducing it.

> > If we were to (slowly) convert our test suite framework to C, we could
> > change that.
> >
> > One idea would be to allow recreating commit history from something that
> > looks like the output of `git log`, or even `git log --graph --oneline`,
> > much like `git mktree` (which really should have been a test helper
> > instead of a Git command, but I digress) takes something that looks like
> > the output of `git ls-tree` and creates a tree object from it.
>
> I've been playing with Graph::Easy (Perl5 module) in other
> projects, and I also think the setup could be more easily
> expressed with a declarative language (e.g. GNU make)

I am dubious. But hey, if you show me something that looks _dead_ easy
to understand, and even easier to write for new contributors, who am I
to object?

But mind, I am not fluent in Perl. I can probably hack my way through
`git-svn`, but that's a far cry from knowing what I am doing there.

And wouldn't that _again_ increase the number of prerequisites on
contributors? I mean, you sounded genuinely concerned about that at the
beginning of the mail. And I share that concern.

> > Another thing that would be much easier if we moved more and more parts
> > of the test suite framework to C: we could implement more powerful
> > assertions, a lot more easily. For example, the trace output of a failed
> > `test_i18ngrep` (or `mingw_test_cmp`!!!) could be made a lot more
> > focused on what is going wrong than on cluttering the terminal window
> > with almost useless lines which are tedious to sift through.
>
> I fail to see how language choice here matters.

In Unix shell script, we either trace (everything) or we don't. In C,
you can be a lot more fine-grained with log messages. A *lot*.

We even already have such a fine-grained log machinery in place, see
`trace.h`.

Also, because of the ability to perform more sophisticated locking, lazy
prerequisites can easily be cached in C, whereas that is not so easy in
Unix shell (and hence it is not done).

> > Likewise, having a framework in C would make it a lot easier to improve
> > debugging, e.g. by making test scripts "resumable" (guarded by an
> > option, it could store a complete state, including a copy of the trash
> > directory, before executing commands, which would allow "going back in
> > time" and calling a failing command with a debugger, or with valgrind, or
> > just seeing whether the command would still fail, i.e. whether the test
> > case is flaky).
>
> Resumability sounds like a perfect job for GNU make.

Umm.

So let me give you an example of something I had to debug recently. A
`git stash apply` marked files outside the sparse checkout as deleted,
when they actually had staged changes during the `git stash` call.

If this had been a regression test case, it would have looked like this:

test_expect_success 'stash handles skip-worktree entries nicely' '
        test_commit A &&
	echo changed >A.t &&
	git add A.t &&
	git update-index --skip-worktree A.t &&
	rm A.t &&
	git stash &&

	: this should not mark A.t as deleted &&
	git stash apply &&
	test -n "$(git ls-files A.t)"
'

Now, the problem would have occurred in the very last command: `A.t`
would have been missing from the index.

In order to debug this, you would have had to put in "breakpoints"
(inserting `false &&` at strategic places), or prefix commands with
`debug` to start them in GDB, then re-run the test case.

Lather, rinse and repeat, until you figured out that `git stash` was
failing to record `A.t` properly.

Then dig into that, recreating the same worktree situation every time
you run `git stash`, until you find out that there is a call to `git
update-index -z --add --remove --stdin` that removes that file.

Further investigation would show you that this command pretty much does
what it is told to, because it is fed `A.t` in its `stdin`.

The next step would probably be to go back even one more step and see
that `diff_index()` reported this file as modified between the index and
`HEAD`.

Slowly, you would form an understanding of what is going wrong, and you
would have to go back and forth between blaming `diff_index()` and
`update-index`, or the options `git stash` passes to them.

You would have to recreate the worktree many, many times, in order to
dig in deep, and of course you would need to understand the intention
not only of the regression test, but also of the code it calls.

In this instance, it is merely tedious, but possible. I know, because I
did it. For flaky tests, not so much.

*That* is the scenario I tried to get at.

Writing the test cases in `make` would not help that. Not one bit. It
would actually make things a lot worse.

> (that said, I don't know if you use make or something else to build
> gfw)

We are talking about the test suite, yes?

Not about building Git? Because it does not matter whether we use `make`
or not (we do by default, although we also have an option to build in
Visual Studio and/or via MSBuild).

On Windows, just like on Linux, we use a Unix shell interpreter (Bash).
Sure, to run the entire test suite, we use `make` (sometimes in
conjunction with `prove`), but the tests themselves are written in Unix
shell script, so I have a hard time imagining a different method to run
them -- whether on Windows or not -- than to use a Unix shell
interpreter such as Bash.

> > In many ways, our current test suite seems to test Git's
> > functionality as much as (core) contributors' abilities to implement
> > test cases in Unix shell script, _correctly_, and maybe also
> > contributors' patience.  You could say that it tests for the wrong
> > thing at least half of the time, by design.
>
> Basic (not advanced) sh is already a prerequisite for using git.

Well, if you are happy with that prerequisite to be set in stone, I am
not. Why should any Git user *need* to know sh?

> Writing correct code and tests in ANY language is still a
> challenge for me; but I'm least convinced a low-level language
> such as C is the right language for writing integration tests in.

I would be delighted to go for a proper, well-maintained test framework
such as Jest. But of course, that would not be accepted in this project.
So I won't even think about it.

> C is fine for unit tests, and maybe we can use more unit tests and
> less integration tests.

We don't have integration tests.

Unless your concept of what constitutes an "integration test" is very
different from mine. For me, an integration test would set up an
environment that involves multiple points of failure, e.g. setting up an
SSH server on Linux and accessing that via Git from Ubuntu. Or setting
up a web server with a self-signed certificate, import the public key
into the Windows Certificate Store and then accessing the server both
using OpenSSL and Secure Channel (the native Windows way to communicate
via TLS).

There is nothing even close to that in Git's test suite.

> > It might look like a somewhat less important project, but given that we
> > exercise almost 150,000 test cases with every CI build, I think it does
> > make sense to grind our axe for a while, so to say.
>
> Something that would benefit both users and regular contributors
> is the use and adoption of more batch and eval-friendly interfaces.
> e.g. fast-import/export, cat-file --batch, for-each-ref --perl...

Given how hard it is to deduce the intention behind such invocations, I
am rather doubtful that this would improve our test suite.

> I haven't used hg since 2005, but I know "hg server" exists
> nowadays to get rid of a lot of startup overhead in Mercurial,
> and maybe git could steal that idea, too...

I have no idea what `hg server` does. Care to enlighten me?

> > Therefore, it might be a really good project to modernize our test
> > suite. To take ideas from modern test frameworks such as Jest and try to
> > bring them to C. Which means that new contributors would probably be
> > better suited to work on this project than Git old-timers!
> >
> > And the really neat thing about this project is that it could be done
> > incrementally.
>
> I hope to find time to hack some more batch/eval-friendly stuff
> that can make scripting git more performant; but no idea on my
> availability :<

Knock yourself out, if you enjoy that type of project. And who knows,
maybe you will convince me yet that it benefits the tests...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-26 11:04               ` Johannes Schindelin
@ 2019-09-26 13:28                 ` SZEDER Gábor
  2019-09-26 19:39                   ` Johannes Schindelin
  0 siblings, 1 reply; 62+ messages in thread
From: SZEDER Gábor @ 2019-09-26 13:28 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

On Thu, Sep 26, 2019 at 01:04:48PM +0200, Johannes Schindelin wrote:
> > > > > Also, things like the code tracing via `-x` (which relies on Bash
> > > > > functionality in order to work properly,
> > > >
> > > > Not really.
> > >
> > > To work properly. What I meant was the trick we need to play with
> > > `BASH_XTRACEFD`.
> >
> > I'm still unsure what BASH_XTRACEFD trick you mean.  AFAICT we don't
> > play any tricks with it to make '-x' work properly, and indeed '-x'
> > tracing works properly even without BASH_XTRACEFD (and to achive that
> > we did have to play some tricks, but not any with BASH_XTRACEFD;
> > perhaps these tricks are what you meant?).
> 
> It works okay some of the time.

As far as I can tell it works all the time.

(Well, Ok, with the exception of t1510, but only because back then I
couldn't be bothered to figure out how that test script works.  But
even that script handles '-x' without BASH_XTRACEFD gracefully, and
it's safe to run the whole test suite with '-x'.)

>  But IIRC `-x -V` requires the `BASH_XTRACEFD` trick.

No, it doesn't; '-V' should have no effect on the '-x' trace
whatsoever.

As soon as I fixed running the test suite with '-x' and /bin/sh I
added GIT_TEST_OPTS="--verbose-log -x" to my 'config.mak' and to our
CI scripts.  The default shell running the test suite in our Linux CI
jobs is dash, and in our macOS jobs it's an ancient Bash version that
doesn't yet have BASH_XTRACEFD.  As far as I know they all work as
they should.

> However, I start to feel like I am distracted deliberately from my main
> argument

That was definitely not my intention.  However, if there are any open
issues with '-x', then I do want to know about it and fix it sooner
rather than later.  Alas, I still don't have the slightest clue about
what your issue actually is.

> I forgot a crucial tidbit: if you run with `-x` and a lazy prereq is
> evaluated, not only is the output disruptive, the trace is also turned
> off after the lazy prereq, _before_ the actual test case is run. So you
> don't see any trace of the actual test case.

Tracing is always turned on before running the test case,
independently from whether a lazy prereq was evaluated or not, so we
do always see the trace of the actual test case.  Notice the '+ true'
and '++ true' lines in my earlier reply including the test traces:
those lines are the trace of the actual test case.

  https://public-inbox.org/git/20190917120230.GA27531@szeder.dev/


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-26  9:47     ` SZEDER Gábor
@ 2019-09-26 19:32       ` Johannes Schindelin
  2019-09-26 21:54         ` SZEDER Gábor
  0 siblings, 1 reply; 62+ messages in thread
From: Johannes Schindelin @ 2019-09-26 19:32 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Jeff King, git, Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

[-- Attachment #1: Type: text/plain, Size: 1876 bytes --]

Hi,

On Thu, 26 Sep 2019, SZEDER Gábor wrote:

> On Mon, Sep 23, 2019 at 08:07:09PM +0200, SZEDER Gábor wrote:
> > Here is one more idea for microprojects:
> >
> >   Find a group of related preprocessor constants and turn them into an
> >   enum.  Also find where those constants are stored in variables and
> >   in structs and passed around as function parameters, and change the
> >   type of those variables, fields and parameters to the new enum.
>
> Peff thought elsewhere in the thread that this is a good idea, so I
> wanted to try out how this microproject would work in practice, and to
> add a commit that we can show as a good example, and therefore set out
> to convert 'cache_entry->ce_flags' to an enum...  and will soon send
> out a RFH patch, because I hit a snag, and am not sure what to do
> about it :)  Anyway:
>
>   - Finding a group of related preprocessor constants is trivial: the
>     common prefixes and vertically aligned values of related constants
>     stand out in output of 'git grep #define'.  Converting them to an
>     enum is fairly trivial as well.
>
>   - Converting various integer types of variables, struct fields, and
>     function parameters to the new enum is... well, I wouldn't say
>     that it's hard, but it's tedious (but 'ce_flags' with about 20
>     related constants is perhaps the biggest we have).  OTOH, it's all
>     fairly mechanical, and doesn't require any understanding of Git
>     internals.  Overall I think that this is indeed a micro-sized
>     microproject, but...
>
>   - The bad news is that I expect that reviewing the variable, etc.
>     type conversions will be just as tedious, and it's quite easy to
>     miss a conversion or three, so I'm afraid that several rerolls
>     will be necessary.

I thought Coccinelle could help with that?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-26 13:28                 ` SZEDER Gábor
@ 2019-09-26 19:39                   ` Johannes Schindelin
  2019-09-26 21:44                     ` SZEDER Gábor
  0 siblings, 1 reply; 62+ messages in thread
From: Johannes Schindelin @ 2019-09-26 19:39 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

[-- Attachment #1: Type: text/plain, Size: 1740 bytes --]

Hi,

On Thu, 26 Sep 2019, SZEDER Gábor wrote:

> On Thu, Sep 26, 2019 at 01:04:48PM +0200, Johannes Schindelin wrote:
> > > > > > Also, things like the code tracing via `-x` (which relies on Bash
> > > > > > functionality in order to work properly,
> > > > >
> > > > > Not really.
> > > >
> > > > To work properly. What I meant was the trick we need to play with
> > > > `BASH_XTRACEFD`.
> > >
> > > I'm still unsure what BASH_XTRACEFD trick you mean.  AFAICT we don't
> > > play any tricks with it to make '-x' work properly, and indeed '-x'
> > > tracing works properly even without BASH_XTRACEFD (and to achive that
> > > we did have to play some tricks, but not any with BASH_XTRACEFD;
> > > perhaps these tricks are what you meant?).
> >
> > It works okay some of the time.
>
> As far as I can tell it works all the time.

I must be misinterpreting this part of `t/test-lib.sh`, then:

-- snipsnap --
if test -n "$trace" && test -n "$test_untraceable"
then
	# '-x' tracing requested, but this test script can't be reliably
	# traced, unless it is run with a Bash version supporting
	# BASH_XTRACEFD (introduced in Bash v4.1).
	#
	# Perform this version check _after_ the test script was
	# potentially re-executed with $TEST_SHELL_PATH for '--tee' or
	# '--verbose-log', so the right shell is checked and the
	# warning is issued only once.
	if test -n "$BASH_VERSION" && eval '
	     test ${BASH_VERSINFO[0]} -gt 4 || {
	       test ${BASH_VERSINFO[0]} -eq 4 &&
	       test ${BASH_VERSINFO[1]} -ge 1
	     }
	   '
	then
		: Executed by a Bash version supporting BASH_XTRACEFD.  Good.
	else
		echo >&2 "warning: ignoring -x; '$0' is untraceable without BASH_XTRACEFD"
		trace=
	fi
fi

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-26 19:39                   ` Johannes Schindelin
@ 2019-09-26 21:44                     ` SZEDER Gábor
  2019-09-27 22:18                       ` Jeff King
  0 siblings, 1 reply; 62+ messages in thread
From: SZEDER Gábor @ 2019-09-26 21:44 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

On Thu, Sep 26, 2019 at 09:39:58PM +0200, Johannes Schindelin wrote:
> Hi,
> 
> On Thu, 26 Sep 2019, SZEDER Gábor wrote:
> 
> > On Thu, Sep 26, 2019 at 01:04:48PM +0200, Johannes Schindelin wrote:
> > > > > > > Also, things like the code tracing via `-x` (which relies on Bash
> > > > > > > functionality in order to work properly,
> > > > > >
> > > > > > Not really.
> > > > >
> > > > > To work properly. What I meant was the trick we need to play with
> > > > > `BASH_XTRACEFD`.
> > > >
> > > > I'm still unsure what BASH_XTRACEFD trick you mean.  AFAICT we don't
> > > > play any tricks with it to make '-x' work properly, and indeed '-x'
> > > > tracing works properly even without BASH_XTRACEFD (and to achive that
> > > > we did have to play some tricks, but not any with BASH_XTRACEFD;
> > > > perhaps these tricks are what you meant?).
> > >
> > > It works okay some of the time.
> >
> > As far as I can tell it works all the time.
> 
> I must be misinterpreting this part of `t/test-lib.sh`, then:

Ok, let me try to clarify.

There are a couple of things that we can't do in our tests without
BASH_XTRACEFD, e.g. redirecting the standard error of a subshell or a
loop to a file and then check that file with 'test_cmp' or
'test_must_be_empty'.  With tracing enabled but without BASH_XTRACEFD,
the trace of the commands executed within the subshell or loop end up
in that file as well, and cause failure (grepping through that file is
mostly ok, though).  Back then we had 23 test cases failing because
they were doing things like this and needed to be fixed, so
considering the total number of test cases we only rarely used such
problematic constructs.

Still, as I recall, Peff was concerned that these limitations might
lead to maintenance burden on the long run, so I decided to add an
escape hatch, just in case someone constructs such an elaborate test
script, where redirecting the stderr of a compound command could
considerably simplify the tests. 

That snippet of code that you copied is this escape hatch: if 
$test_untraceable is set to a non-empty value before sourcing
'test-lib.sh', then tracing will only be enabled if BASH_XTRACEFD is
available.

All that was over a year and a half ago, and these limitations weren't
a maintenance burden at all so far, and nobody needed that escape
hatch.

Well, nobody except me, that is :)  When I saw back then that t1510
saves the stderr of nested function calls with 7 parameters, I
shrugged in disgust, admitted defeat, and simply reached for that
escape hatch: partly because I couldn't be bothered to figure out how
that test script works, but more importantly because I didn't want to
risk that any cleanup inadvertently hides a bug in the future.

So that's the only user that piece of code ever had, and I certainly
hope that no other test script will ever grow so complicated that it
will need this escape hatch.  I would actually prefer to remove it,
but t1510 must be cleaned up first...  so I'm afraid it will be with
us for a while.


> -- snipsnap --
> if test -n "$trace" && test -n "$test_untraceable"
> then
> 	# '-x' tracing requested, but this test script can't be reliably
> 	# traced, unless it is run with a Bash version supporting
> 	# BASH_XTRACEFD (introduced in Bash v4.1).
> 	#
> 	# Perform this version check _after_ the test script was
> 	# potentially re-executed with $TEST_SHELL_PATH for '--tee' or
> 	# '--verbose-log', so the right shell is checked and the
> 	# warning is issued only once.
> 	if test -n "$BASH_VERSION" && eval '
> 	     test ${BASH_VERSINFO[0]} -gt 4 || {
> 	       test ${BASH_VERSINFO[0]} -eq 4 &&
> 	       test ${BASH_VERSINFO[1]} -ge 1
> 	     }
> 	   '
> 	then
> 		: Executed by a Bash version supporting BASH_XTRACEFD.  Good.
> 	else
> 		echo >&2 "warning: ignoring -x; '$0' is untraceable without BASH_XTRACEFD"
> 		trace=
> 	fi
> fi


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-26 19:32       ` Johannes Schindelin
@ 2019-09-26 21:54         ` SZEDER Gábor
  0 siblings, 0 replies; 62+ messages in thread
From: SZEDER Gábor @ 2019-09-26 21:54 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Jeff King, git, Olga Telezhnaya, Christian Couder, Elijah Newren,
	Thomas Gummerer, Matheus Tavares Bernardino

On Thu, Sep 26, 2019 at 09:32:35PM +0200, Johannes Schindelin wrote:
> Hi,
> 
> On Thu, 26 Sep 2019, SZEDER Gábor wrote:
> 
> > On Mon, Sep 23, 2019 at 08:07:09PM +0200, SZEDER Gábor wrote:
> > > Here is one more idea for microprojects:
> > >
> > >   Find a group of related preprocessor constants and turn them into an
> > >   enum.  Also find where those constants are stored in variables and
> > >   in structs and passed around as function parameters, and change the
> > >   type of those variables, fields and parameters to the new enum.
> >
> > Peff thought elsewhere in the thread that this is a good idea, so I
> > wanted to try out how this microproject would work in practice, and to
> > add a commit that we can show as a good example, and therefore set out
> > to convert 'cache_entry->ce_flags' to an enum...  and will soon send
> > out a RFH patch, because I hit a snag, and am not sure what to do
> > about it :)  Anyway:
> >
> >   - Finding a group of related preprocessor constants is trivial: the
> >     common prefixes and vertically aligned values of related constants
> >     stand out in output of 'git grep #define'.  Converting them to an
> >     enum is fairly trivial as well.
> >
> >   - Converting various integer types of variables, struct fields, and
> >     function parameters to the new enum is... well, I wouldn't say
> >     that it's hard, but it's tedious (but 'ce_flags' with about 20
> >     related constants is perhaps the biggest we have).  OTOH, it's all
> >     fairly mechanical, and doesn't require any understanding of Git
> >     internals.  Overall I think that this is indeed a micro-sized
> >     microproject, but...
> >
> >   - The bad news is that I expect that reviewing the variable, etc.
> >     type conversions will be just as tedious, and it's quite easy to
> >     miss a conversion or three, so I'm afraid that several rerolls
> >     will be necessary.
> 
> I thought Coccinelle could help with that?

Maybe it could, I don't know.  I mean, it should be able to e.g.
change the data type of the function parameter 'param' if the function
body contains a

  param & <any named value of the new enum or their combinarion>

statement, or similar statements with operators '|=', '&=', or '='.
I have no idea how to tell Coccinelle to do that.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-26 21:44                     ` SZEDER Gábor
@ 2019-09-27 22:18                       ` Jeff King
  2019-10-09 17:25                         ` SZEDER Gábor
  0 siblings, 1 reply; 62+ messages in thread
From: Jeff King @ 2019-09-27 22:18 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Johannes Schindelin, Emily Shaffer, Jonathan Tan, git

On Thu, Sep 26, 2019 at 11:44:48PM +0200, SZEDER Gábor wrote:

> All that was over a year and a half ago, and these limitations weren't
> a maintenance burden at all so far, and nobody needed that escape
> hatch.
> 
> Well, nobody except me, that is :)  When I saw back then that t1510
> saves the stderr of nested function calls with 7 parameters, I
> shrugged in disgust, admitted defeat, and simply reached for that
> escape hatch: partly because I couldn't be bothered to figure out how
> that test script works, but more importantly because I didn't want to
> risk that any cleanup inadvertently hides a bug in the future.
> 
> So that's the only user that piece of code ever had, and I certainly
> hope that no other test script will ever grow so complicated that it
> will need this escape hatch.  I would actually prefer to remove it,
> but t1510 must be cleaned up first...  so I'm afraid it will be with
> us for a while.

I'm actually surprised we haven't run into it more. We have some custom
test scripts in our fork of Git at GitHub. We usually just use
TEST_SHELL_PATH=bash, but curious, I tried running with dash and "-x",
and three of them failed.

Probably they'd be easy enough to fix (and they're out of tree anyway),
so I'm not really arguing against the escape hatch exactly. Mostly I'm
just surprised that if I introduced 3 cases (out of probably a dozen
scripts), I'm surprised that more contributors aren't accidentally doing
so upstream.

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-24 15:33             ` Jeff King
@ 2019-09-28  3:56               ` Junio C Hamano
  0 siblings, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2019-09-28  3:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Emily Shaffer, Jonathan Tan, git

Jeff King <peff@peff.net> writes:

> On Tue, Sep 24, 2019 at 04:25:45PM +0200, Johannes Schindelin wrote:
>
>> > I think it could make sense for merge-index to be able to directly run
>> > the merge-one-file code[1]. But I think we'd want to keep its ability to
>> > run an arbitrary script, and for people to call merge-one-file
>> > separately, since right now you can do:
>> >
>> >   git merge-index my-script
>> >
>> > and have "my-script" do some processing of its own, then hand off more
>> > work to merge-one-file.
>> 
>> Oh, sorry, I did not mean to say that we should do away with this at
>> all! Rather, I meant to say that `merge-index` could detect when it was
>> asked to run `git-merge-one-file` and re-route to internal code instead
>> of spawning a process. If any other script/program was specified, it
>> should be spawned off, just like it is done today.
>
> OK, great, then we are completely on the same page. :)

I wondered briefly if we want tospecial case the string
'git-merge-one-file' (iow, teaching merge-index a new option
"--use-builtin-merge-one-file" and update our own use in
git-merge-octopus.sh, git-merge-resolve.sh, etc. would be safer).

But "git merge-index git-merge-one-file" called in the context of
scrpted Porcelain has the directory that has *OUR*
git-merge-one-file as the first element on $PATH, I think, so it may
be perfectly safe to use the "ah, the command name we are asked to
run happens to be git-merge-one-file, so let's use the internal
version instead without spawning" short-cut.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-24  0:55         ` Eric Wong
  2019-09-26 12:45           ` Johannes Schindelin
@ 2019-09-28  4:01           ` Junio C Hamano
  1 sibling, 0 replies; 62+ messages in thread
From: Junio C Hamano @ 2019-09-28  4:01 UTC (permalink / raw)
  To: Eric Wong
  Cc: Johannes Schindelin, Emily Shaffer, Jeff King, Jonathan Tan, git

Eric Wong <e@80x24.org> writes:

> C is fine for unit tests, and maybe we can use more unit tests
> and less integration tests.

Nicely put.  I often find it somewhat disturbing that what some of
the t/helper/ tests are trying to exercise is at too low a level
that the distance from the real-world observable effect is too many
hops detached.  For unit tests (of an API, for example), that is
exactly what we want.  For a test of an entire command, it feels
like scratching foot from outside while still wearing a shoe.

> I hope to find time to hack some more batch/eval-friendly stuff
> that can make scripting git more performant; but no idea on my
> availability :<


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-26 12:45           ` Johannes Schindelin
@ 2019-09-30  8:55             ` Eric Wong
  0 siblings, 0 replies; 62+ messages in thread
From: Eric Wong @ 2019-09-30  8:55 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Emily Shaffer, Jeff King, Jonathan Tan, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Tue, 24 Sep 2019, Eric Wong wrote:
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > > On Mon, 16 Sep 2019, Emily Shaffer wrote:
> > > >  - try and make progress towards running many tests from a single test
> > > >    file in parallel - maybe this is too big, I'm not sure if we know how
> > > >    many of our tests are order-dependent within a file for now...
> > >
> > > Another, potentially more rewarding, project would be to modernize our
> > > test suite framework, so that it is not based on Unix shell scripting,
> > > but on C instead.
> >
> > I worry more C would reduce the amount of contributors (some of
> > the C rewrites already scared me off hacking years ago).  I
> > figure more users are familiar with sh than C.
> 
> Seeing as most of the patches/patch series require contributors not only
> to write test cases in Unix shell script, but also to patch or implement
> code in C, I fail to be concerned about that.

My point was that it was easier to experiment using a less
tedious and verbose language than C.  git already has a lot of
commands and options, but they're mostly documented.  Having to
learn our internal C APIs on top of that is a lot of cognitive
overhead (and I'm still learning our APIs, myself).

> > It would also increase the disparity between tests and use of
> > actual users from the command-line.
> 
> I find it really endearing whenever I hear anybody talking about Git
> users as if every single one of them crafted extensive shell scripts
> around Git. In my experience, most of the people who do that gather on
> this here mailing list, or on a nearby one, and that is but a tiny
> fraction of Git users.

I only mean stuff like: "git add foo && git commit -m ..."
in tests, which is standard command-line usage and using
our documented UI.

Fwiw, I don't have extensive scripts or customizations
around git or any of the tools I use, either.

> Besides, it was my understanding that Git's test suite tries to prevent
> regressions in Git's code, whether it is called in scripts or not. As
> such, it would not matter _how_ that functionality is tested. Does that
> not match your understanding?

That matches my understanding.  My concern that most of our
functionality is exposed to users as commands; so new developers
would feel more comfortable recreating tests off failures
they encounter under normal use, instead of having to learn
C API internals.

> > > The fact that it is based on Unix shell scripting not only costs a lot
> > > of speed, especially on Windows, it also limits us quite a bit, and I am
> > > talking about a lot more than just the awkwardness of having to think
> > > about options of BSD vs GNU variants of common command-line tools.
> >
> > I agree that it costs a lot of time, and I'm even on Linux using
> > dash as /bin/sh + eatmydata (but ancient laptop)
> 
> One thing that I meant to play with (but which is hampered by too many
> parts of Git being implemented in Unix shell/Perl, still, therefore
> making code coverage analysis hard) is Test Impact Analysis
> (https://docs.microsoft.com/en-us/azure/devops/pipelines/test/test-impact-analysis).
> In short, a way to avoid running tests the code touched by them was
> already tested before. Example: if I change the `README.md`, no
> regression test needs to be run at all. If I change `git-p4.py`, the
> majority of test scripts can be skipped, only t98*.sh need to be run
> (and maybe not even all of them).

Cool.  I wonder how much effort it would take to do with gcov +
Devel::Cover.

> There is a lot of work to be done on the built-in'ification, still,
> before that becomes feasible, of course.
> 
> > > For example, many, many, if not all, test cases, spend the majority of
> > > their code on setting up specific scenarios. I don't know about you,
> > > but personally I have to dive into many of them when things fail (and I
> > > _dread_ the numbers 0021, 0025 and 3070, let me tell you) and I really
> > > have to say that most of that code is hard to follow and does not make
> > > it easy to form a mental model of what the code tries to accomplish.
> > >
> > > To address this, a while ago Thomas Rast started to use `fast-export`ed
> > > commit histories in test scripts (see e.g. `t/t3206/history.export`). I
> > > still find that this fails to make it easier for occasional readers to
> > > understand the ideas underlying the test cases.
> > >
> > > Another approach is to document heavily the ideas first, then use code
> > > to implement them. For example, t3430 starts with this:
> > >
> > > 	[...]
> > >
> > > 	Initial setup:
> > >
> > > 	    -- B --                   (first)
> > > 	   /       \
> > > 	 A - C - D - E - H            (master)
> > > 	   \    \       /
> > > 	    \    F - G                (second)
> > > 	     \
> > > 	      Conflicting-G
> > >
> > > 	[...]
> > >
> > > 	test_commit A &&
> > > 	git checkout -b first &&
> > > 	test_commit B &&
> > > 	git checkout master &&
> > > 	test_commit C &&
> > > 	test_commit D &&
> > > 	git merge --no-commit B &&
> > > 	test_tick &&
> > > 	git commit -m E &&
> > > 	git tag -m E E &&
> > > 	git checkout -b second C &&
> > > 	test_commit F &&
> > > 	test_commit G &&
> > > 	git checkout master &&
> > > 	git merge --no-commit G &&
> > > 	test_tick &&
> > > 	git commit -m H &&
> > > 	git tag -m H H &&
> > > 	git checkout A &&
> > > 	test_commit conflicting-G G.t
> > >
> > > 	[...]
> > >
> > > While this is _somewhat_ better than having only the code, I am still
> > > unhappy about it: this wall of `test_commit` lines interspersed with
> > > other commands is very hard to follow.
> >
> > Agreed.  More on the readability part below...
> >
> > As far as speeding that up, I think moving some parts
> > of test setup to Makefiles + fast-import/fast-export would give
> > us a nice balance of speed + maintainability:
> >
> > 1. initial setup is done using normal commands (or graph drawing tool)
> > 2. the result of setup is "built" with fast-export
> > 3. test uses fast-import
> 
> I actually talked about this in my mail. If you find it easy to deduce
> the intent behind a commit history that was exported via fast-export,
> more power to you. (Was the committer name crucial? The file name? Or
> the temporal order of the commits?)
> 
> In contrast, I find it very challenging, myself. And please keep in mind
> that the first thing any contributor needs to do who sees a failing
> regression test (where the failure is most likely caused by the patch
> they plan on contributing): understand what the heck the regression test
> case is trying to ensure. The harder the code makes that, the worse it
> does its primary job: to (help) prevent regressions.
> 
> So no, I am not at all on board with moving to fast-imported commit
> histories in Git's test suite. They provide some convenience to the
> authors of those regression tests, which is not the audience you need to
> cater for in this case: instead, regression tests should make it not
> only easy to catch, but _especially_ easy to fix, regressions. And that
> audience would pay dearly for that erstwhile convenience.

Sorry I wasn't clear.  I only want fast-export/import to be used
as a local cache mechanism for tests.  I don't want to be
writing or maintaining fast-import histories by hand, either :>

> > Makefile rules would prevent subsequent test runs from repeating
> > 1. and 2.
> 
> That is a cute idea, until you realize that the number of developers
> fluent in `make` is even smaller than the number of developers fluent in
> `C`. In other words, you would again _increase_ the the number of
> prerequisites instead of reducing it.

Wouldn't any C developer need to know SOME build system?

We already use make, and make uses sh.  I'm not especially
fluent in make or sh, either; but I can hack my way through
them.

> > > If we were to (slowly) convert our test suite framework to C, we could
> > > change that.
> > >
> > > One idea would be to allow recreating commit history from something that
> > > looks like the output of `git log`, or even `git log --graph --oneline`,
> > > much like `git mktree` (which really should have been a test helper
> > > instead of a Git command, but I digress) takes something that looks like
> > > the output of `git ls-tree` and creates a tree object from it.
> >
> > I've been playing with Graph::Easy (Perl5 module) in other
> > projects, and I also think the setup could be more easily
> > expressed with a declarative language (e.g. GNU make)
> 
> I am dubious. But hey, if you show me something that looks _dead_ easy
> to understand, and even easier to write for new contributors, who am I
> to object?
> 
> But mind, I am not fluent in Perl. I can probably hack my way through
> `git-svn`, but that's a far cry from knowing what I am doing there.
> 
> And wouldn't that _again_ increase the number of prerequisites on
> contributors? I mean, you sounded genuinely concerned about that at the
> beginning of the mail. And I share that concern.

I'm not saying we use Graph::Easy, but we could probably take
inspiration from it for creating something that makes creating
histories easy and maintainable.

But it could also be writing better comments, using "git commit"
as normal, and letting fast-export cache to speed up subsequent
runs.

> > > Another thing that would be much easier if we moved more and more parts
> > > of the test suite framework to C: we could implement more powerful
> > > assertions, a lot more easily. For example, the trace output of a failed
> > > `test_i18ngrep` (or `mingw_test_cmp`!!!) could be made a lot more
> > > focused on what is going wrong than on cluttering the terminal window
> > > with almost useless lines which are tedious to sift through.
> >
> > I fail to see how language choice here matters.
> 
> In Unix shell script, we either trace (everything) or we don't. In C,
> you can be a lot more fine-grained with log messages. A *lot*.
> 
> We even already have such a fine-grained log machinery in place, see
> `trace.h`.
> 
> Also, because of the ability to perform more sophisticated locking, lazy
> prerequisites can easily be cached in C, whereas that is not so easy in
> Unix shell (and hence it is not done).

`make` seems good at driving stuff to cache.

> > > Likewise, having a framework in C would make it a lot easier to improve
> > > debugging, e.g. by making test scripts "resumable" (guarded by an
> > > option, it could store a complete state, including a copy of the trash
> > > directory, before executing commands, which would allow "going back in
> > > time" and calling a failing command with a debugger, or with valgrind, or
> > > just seeing whether the command would still fail, i.e. whether the test
> > > case is flaky).
> >
> > Resumability sounds like a perfect job for GNU make.
> 
> Umm.
> 
> So let me give you an example of something I had to debug recently. A
> `git stash apply` marked files outside the sparse checkout as deleted,
> when they actually had staged changes during the `git stash` call.
> 
> If this had been a regression test case, it would have looked like this:
> 
> test_expect_success 'stash handles skip-worktree entries nicely' '
>         test_commit A &&
> 	echo changed >A.t &&
> 	git add A.t &&
> 	git update-index --skip-worktree A.t &&
> 	rm A.t &&
> 	git stash &&
> 
> 	: this should not mark A.t as deleted &&
> 	git stash apply &&
> 	test -n "$(git ls-files A.t)"
> '
> 
> Now, the problem would have occurred in the very last command: `A.t`
> would have been missing from the index.
> 
> In order to debug this, you would have had to put in "breakpoints"
> (inserting `false &&` at strategic places), or prefix commands with
> `debug` to start them in GDB, then re-run the test case.
> 
> Lather, rinse and repeat, until you figured out that `git stash` was
> failing to record `A.t` properly.
> 
> Then dig into that, recreating the same worktree situation every time
> you run `git stash`, until you find out that there is a call to `git
> update-index -z --add --remove --stdin` that removes that file.
> 
> Further investigation would show you that this command pretty much does
> what it is told to, because it is fed `A.t` in its `stdin`.
> 
> The next step would probably be to go back even one more step and see
> that `diff_index()` reported this file as modified between the index and
> `HEAD`.
> 
> Slowly, you would form an understanding of what is going wrong, and you
> would have to go back and forth between blaming `diff_index()` and
> `update-index`, or the options `git stash` passes to them.
> 
> You would have to recreate the worktree many, many times, in order to
> dig in deep, and of course you would need to understand the intention
> not only of the regression test, but also of the code it calls.
> 
> In this instance, it is merely tedious, but possible. I know, because I
> did it. For flaky tests, not so much.
> 
> *That* is the scenario I tried to get at.
> 
> Writing the test cases in `make` would not help that. Not one bit. It
> would actually make things a lot worse.

I agree debugging can be tedious, but I'm not sure how to get
around that besides making test cases smaller, more isolated
(which can also lead to missing coverage), and better
documented.

How would your ideal test framework approach it?

Was using git tracing in the above scenarios not enough?

I've never cared much about project-specific tracers; but
instead prefer to rely on project-agnostic (but OS-specific)
facilities such as truss/strace since I can reuse that knowledge
across languages/projects.

> > (that said, I don't know if you use make or something else to build
> > gfw)
> 
> We are talking about the test suite, yes?
> 
> Not about building Git? Because it does not matter whether we use `make`
> or not (we do by default, although we also have an option to build in
> Visual Studio and/or via MSBuild).
> 
> On Windows, just like on Linux, we use a Unix shell interpreter (Bash).
> Sure, to run the entire test suite, we use `make` (sometimes in
> conjunction with `prove`), but the tests themselves are written in Unix
> shell script, so I have a hard time imagining a different method to run
> them -- whether on Windows or not -- than to use a Unix shell
> interpreter such as Bash.
> 
> > > In many ways, our current test suite seems to test Git's
> > > functionality as much as (core) contributors' abilities to implement
> > > test cases in Unix shell script, _correctly_, and maybe also
> > > contributors' patience.  You could say that it tests for the wrong
> > > thing at least half of the time, by design.
> >
> > Basic (not advanced) sh is already a prerequisite for using git.
> 
> Well, if you are happy with that prerequisite to be set in stone, I am
> not. Why should any Git user *need* to know sh?

For chaining commands together, why not?

I expect anybody who's learned C to also be able to figure out
some basic scripting to tie a series of commands together in the
same way you can poke around in Perl without being fluent.

And there's plenty of overlap between C and sh when it comes to
with control flow (if/else/while/&&/for).

> > Writing correct code and tests in ANY language is still a
> > challenge for me; but I'm least convinced a low-level language
> > such as C is the right language for writing integration tests in.
> 
> I would be delighted to go for a proper, well-maintained test framework
> such as Jest. But of course, that would not be accepted in this project.
> So I won't even think about it.
> 
> > C is fine for unit tests, and maybe we can use more unit tests and
> > less integration tests.
> 
> We don't have integration tests.
> 
> Unless your concept of what constitutes an "integration test" is very
> different from mine. For me, an integration test would set up an
> environment that involves multiple points of failure, e.g. setting up an
> SSH server on Linux and accessing that via Git from Ubuntu. Or setting
> up a web server with a self-signed certificate, import the public key
> into the Windows Certificate Store and then accessing the server both
> using OpenSSL and Secure Channel (the native Windows way to communicate
> via TLS).

I think of integration tests as anything which covers
real-world use cases (e.g. running documented commands as
a normal user would).

Unit tests would be t/helper/*.c in my mind...

But maybe my terminology is all wrong *shrug*

> There is nothing even close to that in Git's test suite.
> 
> > > It might look like a somewhat less important project, but given that we
> > > exercise almost 150,000 test cases with every CI build, I think it does
> > > make sense to grind our axe for a while, so to say.
> >
> > Something that would benefit both users and regular contributors
> > is the use and adoption of more batch and eval-friendly interfaces.
> > e.g. fast-import/export, cat-file --batch, for-each-ref --perl...
> 
> Given how hard it is to deduce the intention behind such invocations, I
> am rather doubtful that this would improve our test suite.
> 
> > I haven't used hg since 2005, but I know "hg server" exists
> > nowadays to get rid of a lot of startup overhead in Mercurial,
> > and maybe git could steal that idea, too...
> 
> I have no idea what `hg server` does. Care to enlighten me?

Sorry, "hg serve" (no 'r').  It starts a long-lived server to
expose the API over a pipe or socket:
https://www.mercurial-scm.org/wiki/CommandServer
Closest we have is fast-import and "cat-file --batch*"

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-09-27 22:18                       ` Jeff King
@ 2019-10-09 17:25                         ` SZEDER Gábor
  2019-10-11  6:34                           ` Jeff King
  0 siblings, 1 reply; 62+ messages in thread
From: SZEDER Gábor @ 2019-10-09 17:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Emily Shaffer, Jonathan Tan, git

On Fri, Sep 27, 2019 at 06:18:58PM -0400, Jeff King wrote:
> On Thu, Sep 26, 2019 at 11:44:48PM +0200, SZEDER Gábor wrote:
> 
> > All that was over a year and a half ago, and these limitations weren't
> > a maintenance burden at all so far, and nobody needed that escape
> > hatch.

> I'm actually surprised we haven't run into it more. We have some custom
> test scripts in our fork of Git at GitHub. We usually just use
> TEST_SHELL_PATH=bash, but curious, I tried running with dash and "-x",
> and three of them failed.

I try to avoid using TEST_SHELL_PATH at all costs, it is far too keen
to not do what I naively expect.

> Probably they'd be easy enough to fix (and they're out of tree anyway),
> so I'm not really arguing against the escape hatch exactly. Mostly I'm
> just surprised that if I introduced 3 cases (out of probably a dozen
> scripts), I'm surprised that more contributors aren't accidentally doing
> so upstream.

I see it a bit differently.  Over a decade we gathered about
twenty-something such tests cases: that's about two cases per year.
You added three such cases in about a year and a half: that's two
cases per year.  The numbers add up perfectly, you singlehandedly took
care of everything ;)


Anyway, I did some more digging, and, unfortunately, it turned out
that Dscho is somewhat right.  While the situation is not as bad as he
made it look like ("We need Bash!"), it's not as good as I thought it
is ("But it Just Works!!") either.

  - Some shells do include file descriptor redirections in the trace
    output, and it varies between implementations to which fd the
    trace of the redirection goes.
    
      - 'ksh/ksh93' and NetBSD's /bin/sh send the trace of
        redirections to the "wrong" fd, in the sense that e.g. the
        trace of commands invoked in 'test_must_fail' goes to the
        function's standard error, and checking its stderr with
        'test_cmp' would then fail.
 
        (But 'ksh/ksh93' doesn't really matter, because they don't
        support the 'local' keyword, so they fail a bunch of tests
        even without '-x' anyway.)

        I don't think we can do anything about these shells.

      - 'mksh/lksh' send the trace of redirections to the "right" fd,
        so they won't pollute the stderr of test helper functions.
        And indeed the test suite passes when run with 'mksh' (well,
        at least the subset of the test suite that I usually run).

  - We do call 'test_have_prereq' from within test cases as well,
    notably from the 'test_i18ngrep', 'test_i18ncmp' and
    'test_ln_s_add' helper functions.  In those cases all trace output
    from 'test_have_prereq' is included in the test case's trace
    output, which means that during the first invocation:

      - there is lots of distracting and confusing trace output, as
        the script evaluating the prereq is passed around to a bunch
        of functions.

      - after running the script evaluating the prereq 'test_eval_'
        does indeed turn off tracing, so there will be no trace from
        the remainder of that test case (except with 'mksh': while it
        does run 'set +x' in 'test_eval_', that somehow doesn't turn
        off tracing...  I have no idea whether that's a bug or a
        feature).

    As far as 'test_i18ngrep' is concerned, which accounts for the
    majority of 'test_have_prereq' invocations within test cases, I
    don't understand why it uses 'test_have_prereq' in the first place
    instead of checking the GIT_TEST_GETTEXT_POISON environment
    variable; and 6cdccfce1e (i18n: make GETTEXT_POISON a runtime
    option, 2018-11-08) doesn't give me any insight.

    I recall that some months ago we discussed the idea of how to
    disable trace output from within test helper functions; that would
    help with this 'test_have_prereq' issue as well, at least in case
    of the more "common" shells.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Git in Outreachy December 2019?
  2019-10-09 17:25                         ` SZEDER Gábor
@ 2019-10-11  6:34                           ` Jeff King
  0 siblings, 0 replies; 62+ messages in thread
From: Jeff King @ 2019-10-11  6:34 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Johannes Schindelin, Emily Shaffer, Jonathan Tan, git

On Wed, Oct 09, 2019 at 07:25:51PM +0200, SZEDER Gábor wrote:

> > Probably they'd be easy enough to fix (and they're out of tree anyway),
> > so I'm not really arguing against the escape hatch exactly. Mostly I'm
> > just surprised that if I introduced 3 cases (out of probably a dozen
> > scripts), I'm surprised that more contributors aren't accidentally doing
> > so upstream.
> 
> I see it a bit differently.  Over a decade we gathered about
> twenty-something such tests cases: that's about two cases per year.
> You added three such cases in about a year and a half: that's two
> cases per year.  The numbers add up perfectly, you singlehandedly took
> care of everything ;)

Those cases are actually much older than that. I just didn't bother to
clean them up until recently. So my rate is even lower :)

>   - Some shells do include file descriptor redirections in the trace
>     output, and it varies between implementations to which fd the
>     trace of the redirection goes.
>     
>       - 'ksh/ksh93' and NetBSD's /bin/sh send the trace of
>         redirections to the "wrong" fd, in the sense that e.g. the
>         trace of commands invoked in 'test_must_fail' goes to the
>         function's standard error, and checking its stderr with
>         'test_cmp' would then fail.
>  
>         (But 'ksh/ksh93' doesn't really matter, because they don't
>         support the 'local' keyword, so they fail a bunch of tests
>         even without '-x' anyway.)
> 
>         I don't think we can do anything about these shells.

Yeah, unless somebody is complaining, I don't know that it's worth
worrying about too much. The test suite is certainly useful without
being able to use "-x" on every single test run (you can still run it
without "-x" obviously, or selectively use "-x" to debug a single test
or script). So if it is only unreliable on a few tests on a few obscure
shells, we can probably live with it until somebody demonstrates a
real-world problem (e.g., that they're running automated CI on an
obscure platform that is stuck with an old shell, and really want "-x
--verbose-log" to get more verbose failures).

>   - We do call 'test_have_prereq' from within test cases as well,
>     notably from the 'test_i18ngrep', 'test_i18ncmp' and
>     'test_ln_s_add' helper functions.  In those cases all trace output
>     from 'test_have_prereq' is included in the test case's trace
>     output, which means that during the first invocation:
> 
>       - there is lots of distracting and confusing trace output, as
>         the script evaluating the prereq is passed around to a bunch
>         of functions.

Yeah, I think this is probably an issue even with bash.

>     As far as 'test_i18ngrep' is concerned, which accounts for the
>     majority of 'test_have_prereq' invocations within test cases, I
>     don't understand why it uses 'test_have_prereq' in the first place
>     instead of checking the GIT_TEST_GETTEXT_POISON environment
>     variable; and 6cdccfce1e (i18n: make GETTEXT_POISON a runtime
>     option, 2018-11-08) doesn't give me any insight.

I think it's just that checking the environment variable is non-trivial:
we invoke env--helper to handle bool interpretation. So we'd prefer to
cache the result (and not to run it at all if a test script doesn't use
i18ngrep, though it's perhaps ubiquitous enough that we should just run
it up front for every script).

>     I recall that some months ago we discussed the idea of how to
>     disable trace output from within test helper functions; that would
>     help with this 'test_have_prereq' issue as well, at least in case
>     of the more "common" shells.

That might be worth doing, though IIRC it got kind of hairy. :)

-Peff

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, back to index

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-27  5:17 Git in Outreachy December 2019? Jeff King
2019-08-31  7:58 ` Christian Couder
2019-08-31 19:44   ` Olga Telezhnaya
2019-09-04 19:41 ` Jeff King
2019-09-05  7:24   ` Christian Couder
2019-09-05 19:39   ` Emily Shaffer
2019-09-06 11:55     ` Carlo Arenas
2019-09-07  6:39       ` Jeff King
2019-09-07 10:13         ` Carlo Arenas
2019-09-07  6:36     ` Jeff King
2019-09-08 14:56   ` Pratyush Yadav
2019-09-09 17:00     ` Jeff King
2019-09-23 18:07   ` SZEDER Gábor
2019-09-26  9:47     ` SZEDER Gábor
2019-09-26 19:32       ` Johannes Schindelin
2019-09-26 21:54         ` SZEDER Gábor
2019-09-26 11:42     ` Johannes Schindelin
2019-09-13 20:03 ` Jonathan Tan
2019-09-13 20:51   ` Jeff King
2019-09-16 18:42     ` Emily Shaffer
2019-09-16 21:33       ` Eric Wong
2019-09-16 21:44       ` SZEDER Gábor
2019-09-16 23:13         ` Jonathan Nieder
2019-09-17  0:59           ` Jeff King
2019-09-17 11:23       ` Johannes Schindelin
2019-09-17 12:02         ` SZEDER Gábor
2019-09-23 12:47           ` Johannes Schindelin
2019-09-23 16:58             ` SZEDER Gábor
2019-09-26 11:04               ` Johannes Schindelin
2019-09-26 13:28                 ` SZEDER Gábor
2019-09-26 19:39                   ` Johannes Schindelin
2019-09-26 21:44                     ` SZEDER Gábor
2019-09-27 22:18                       ` Jeff King
2019-10-09 17:25                         ` SZEDER Gábor
2019-10-11  6:34                           ` Jeff King
2019-09-23 18:19             ` Jeff King
2019-09-24 14:30               ` Johannes Schindelin
2019-09-17 15:10         ` Christian Couder
2019-09-23 12:50           ` Johannes Schindelin
2019-09-23 19:30           ` Jeff King
2019-09-23 18:07         ` Jeff King
2019-09-24 14:25           ` Johannes Schindelin
2019-09-24 15:33             ` Jeff King
2019-09-28  3:56               ` Junio C Hamano
2019-09-24  0:55         ` Eric Wong
2019-09-26 12:45           ` Johannes Schindelin
2019-09-30  8:55             ` Eric Wong
2019-09-28  4:01           ` Junio C Hamano
2019-09-20 17:04     ` Jonathan Tan
2019-09-21  1:47       ` Emily Shaffer
2019-09-23 14:23         ` Christian Couder
2019-09-23 19:40         ` Jeff King
2019-09-23 22:29           ` Philip Oakley
2019-09-23 11:49       ` Christian Couder
2019-09-23 17:58         ` Jonathan Tan
2019-09-23 19:27           ` Jeff King
2019-09-23 20:48             ` Jonathan Tan
2019-09-23 19:15       ` Jeff King
2019-09-23 20:38         ` Jonathan Tan
2019-09-23 21:28           ` Jeff King
2019-09-24 17:07             ` Jonathan Tan
2019-09-26  7:09               ` Jeff King

git@vger.kernel.org list mirror (unofficial, one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox