Re: Git in Outreachy December 2019?

git@vger.kernel.org mailing list mirror (one of many)

From: Eric Wong <e@80x24.org>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Emily Shaffer <emilyshaffer@google.com>,
	Jeff King <peff@peff.net>,
	Jonathan Tan <jonathantanmy@google.com>,
	git@vger.kernel.org
Subject: Re: Git in Outreachy December 2019?
Date: Tue, 24 Sep 2019 00:55:29 +0000
Message-ID: <20190924005529.GA8354@dcvr>
In-Reply-To: <nycvar.QRO.7.76.6.1909171158090.15067@tvgsbejvaqbjf.bet>

Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Mon, 16 Sep 2019, Emily Shaffer wrote:
> >  - try and make progress towards running many tests from a single test
> >    file in parallel - maybe this is too big, I'm not sure if we know how
> >    many of our tests are order-dependent within a file for now...
> 
> Another, potentially more rewarding, project would be to modernize our
> test suite framework, so that it is not based on Unix shell scripting,
> but on C instead.

I worry that more C would reduce the number of contributors (some of
the C rewrites already scared me off hacking on git years ago).  I
figure more users are familiar with sh than with C.

It would also widen the gap between what the tests exercise and
how actual users run git from the command line.

> The fact that it is based on Unix shell scripting not only costs a lot
> of speed, especially on Windows, it also limits us quite a bit, and I am
> talking about a lot more than just the awkwardness of having to think
> about options of BSD vs GNU variants of common command-line tools.

I agree that it costs a lot of time; I notice it even on Linux
with dash as /bin/sh plus eatmydata (albeit on an ancient laptop).

> For example, many, many, if not all, test cases, spend the majority of
> their code on setting up specific scenarios. I don't know about you,
> but personally I have to dive into many of them when things fail (and I
> _dread_ the numbers 0021, 0025 and 3070, let me tell you) and I really
> have to say that most of that code is hard to follow and does not make
> it easy to form a mental model of what the code tries to accomplish.
> 
> To address this, a while ago Thomas Rast started to use `fast-export`ed
> commit histories in test scripts (see e.g. `t/t3206/history.export`). I
> still find that this fails to make it easier for occasional readers to
> understand the ideas underlying the test cases.
> 
> Another approach is to document heavily the ideas first, then use code
> to implement them. For example, t3430 starts with this:
> 
> 	[...]
> 
> 	Initial setup:
> 
> 	    -- B --                   (first)
> 	   /       \
> 	 A - C - D - E - H            (master)
> 	   \    \       /
> 	    \    F - G                (second)
> 	     \
> 	      Conflicting-G
> 
> 	[...]
> 
> 	test_commit A &&
> 	git checkout -b first &&
> 	test_commit B &&
> 	git checkout master &&
> 	test_commit C &&
> 	test_commit D &&
> 	git merge --no-commit B &&
> 	test_tick &&
> 	git commit -m E &&
> 	git tag -m E E &&
> 	git checkout -b second C &&
> 	test_commit F &&
> 	test_commit G &&
> 	git checkout master &&
> 	git merge --no-commit G &&
> 	test_tick &&
> 	git commit -m H &&
> 	git tag -m H H &&
> 	git checkout A &&
> 	test_commit conflicting-G G.t
> 
> 	[...]
> 
> While this is _somewhat_ better than having only the code, I am still
> unhappy about it: this wall of `test_commit` lines interspersed with
> other commands is very hard to follow.

Agreed.  More on the readability part below...

As for speeding that up, I think moving some parts of the test
setup into Makefiles plus fast-import/fast-export would give us a
nice balance of speed and maintainability:

1. the initial setup is done using normal commands (or a graph-drawing tool)
2. the result of that setup is "built" into a file with fast-export
3. the test itself loads that file with fast-import

Makefile rules would prevent subsequent test runs from repeating
steps 1 and 2, since the exported file only needs to be rebuilt
when its setup changes.
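A minimal sketch of those three steps in plain sh, assuming a
throwaway directory from mktemp; setup_history and history.export
are hypothetical names, and the "if the file is missing" check only
stands in for the timestamp comparison a real Makefile rule would do:

```shell
#!/bin/sh
# Sketch only: setup_history and history.export are hypothetical names.
# In a real setup, a Makefile rule would own the cached export file.
set -e

export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="C O Mitter" GIT_COMMITTER_EMAIL="committer@example.com"

work=$(mktemp -d)
cache="$work/history.export"

# Step 1: the slow but readable setup, using ordinary porcelain commands
setup_history () {
	git init -q "$work/setup"
	(
		cd "$work/setup"
		for name in A B C; do
			echo "$name" >"$name.t"
			git add "$name.t"
			git commit -q -m "$name"
			git tag "$name"
		done
	)
}

# Step 2: "build" the setup into a fast-export stream, only if it is missing
# (this existence check stands in for make's timestamp comparison)
if ! test -f "$cache"; then
	setup_history
	git -C "$work/setup" fast-export --all >"$cache"
fi

# Step 3: each test run starts from the cached stream via fast-import
git init -q "$work/test"
git -C "$work/test" fast-import --quiet <"$cache"
git -C "$work/test" log --oneline --all
```

Only step 3 would run on every test invocation; steps 1 and 2 are
paid once until the setup changes.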

> If we were to (slowly) convert our test suite framework to C, we could
> change that.
> 
> One idea would be to allow recreating commit history from something that
> looks like the output of `git log`, or even `git log --graph --oneline`,
> much like `git mktree` (which really should have been a test helper
> instead of a Git command, but I digress) takes something that looks like
> the output of `git ls-tree` and creates a tree object from it.

I've been playing with Graph::Easy (a Perl 5 module) in other
projects, and I also think the setup could be expressed more easily
in a declarative language (e.g. GNU make).
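As a rough illustration that a declarative description does not
require C, here is a sketch that takes the t3430 graph quoted above as
one "name parents..." line per commit and creates it with existing
plumbing (git commit-tree, git tag).  build_history is a hypothetical
helper, not part of the test library, and every commit records the
empty tree because only the graph shape matters here:

```shell
#!/bin/sh
# Sketch only: build_history is a hypothetical helper, not a test-lib function.
set -e

export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="C O Mitter" GIT_COMMITTER_EMAIL="committer@example.com"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Read "name parent..." lines; create one commit per line and tag it "name".
build_history () {
	# every commit records the empty tree; only the graph shape matters
	tree=$(git hash-object -w -t tree /dev/null)
	while read -r name parents; do
		test -n "$name" || continue
		set --
		for p in $parents; do
			set -- "$@" -p "refs/tags/$p"
		done
		commit=$(git commit-tree "$@" -m "$name" "$tree")
		git tag "$name" "$commit"
	done
}

# Same shape as the t3430 "Initial setup" drawing (tags only, no branches)
build_history <<EOF
A
B A
C A
D C
E D B
F C
G F
H E G
conflicting-G A
EOF

git log --oneline --graph H
```

Parent order follows the line order, so "E D B" makes D the first
parent and B the second, matching "git merge B" run on top of D.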

> Another thing that would be much easier if we moved more and more parts
> of the test suite framework to C: we could implement more powerful
> assertions, a lot more easily. For example, the trace output of a failed
> `test_i18ngrep` (or `mingw_test_cmp`!!!) could be made a lot more
> focused on what is going wrong than on cluttering the terminal window
> with almost useless lines which are tedious to sift through.

I fail to see how language choice here matters.  But then again,
I have plenty of experience writing bad code in ALL languages I
know :>

> Likewise, having a framework in C would make it a lot easier to improve
> debugging, e.g. by making test scripts "resumable" (guarded by an
> option, it could store a complete state, including a copy of the trash
> directory, before executing commands, which would allow "going back in
> time" and calling a failing command with a debugger, or with valgrind, or
> just seeing whether the command would still fail, i.e. whether the test
> case is flaky).

Resumability sounds like a perfect job for GNU make.
(That said, I don't know whether you use make or something else to
build Git for Windows.)
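The idea make would bring here is the stamp file: a step that
succeeded leaves a marker, so a re-run skips finished steps and
resumes at the one that failed.  A minimal sh sketch of that idiom,
where run_step, the stamp directory and the failing "third" step are
all hypothetical:

```shell
#!/bin/sh
# Sketch of make-style resumability with stamp files; run_step and the
# stamp directory are hypothetical names, not existing test-lib features.

stamps=$(mktemp -d)

# Run "$@" unless the step named $1 already completed; stamp it on success.
run_step () {
	name=$1
	shift
	if test -f "$stamps/$name.done"; then
		echo "skip $name"
		return 0
	fi
	echo "run $name"
	"$@" || return 1
	: >"$stamps/$name.done"
}

flaky_ok=false            # pretend the third step fails on the first attempt
third () { $flaky_ok; }

pipeline () {
	run_step setup true &&
	run_step build true &&
	run_step check third
}

pipeline || echo "failed; re-running resumes at the failing step"
flaky_ok=true
pipeline
```

On the second run, setup and build are skipped and only check is
retried, which is the "going back in time" behaviour, minus the
trash-directory snapshot.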

> In many ways, our current test suite seems to test Git's functionality
> as much as (core) contributors' abilities to implement test cases in
> Unix shell script, _correctly_, and maybe also contributors' patience.
> You could say that it tests for the wrong thing at least half of the
> time, by design.

Basic (not advanced) sh is already a prerequisite for using git.

Writing correct code and tests in ANY language is still a
challenge for me, but I'm far from convinced that a low-level
language such as C is the right one for writing integration tests.

C is fine for unit tests, and maybe we can use more unit tests
and fewer integration tests.

> It might look like a somewhat less important project, but given that we
> exercise almost 150,000 test cases with every CI build, I think it does
> make sense to grind our axe for a while, so to say.

Something that would benefit both users and regular contributors
is wider use and adoption of batch- and eval-friendly interfaces,
e.g. fast-import/fast-export, cat-file --batch, for-each-ref --perl...
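To make the batch point concrete: the difference is one process per
query versus one long-running process fed on stdin.  A small sketch
with cat-file, using a throwaway repository with three commits (the
repository contents are made up for illustration):

```shell
#!/bin/sh
# Sketch: compare per-object processes with a single batch process.
set -e

export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="C O Mitter" GIT_COMMITTER_EMAIL="committer@example.com"

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
for n in 1 2 3; do
	echo "$n" >"file$n"
	git add "file$n"
	git commit -q -m "commit $n"
done

# One process per object: git startup cost is paid for every commit
for oid in $(git rev-list --all); do
	git cat-file -t "$oid"
done

# One process for all objects: names on stdin,
# "<oid> <type> <size>" per line on stdout
git rev-list --all --objects | cut -d' ' -f1 | git cat-file --batch-check
```

With a handful of objects the difference is invisible, but the first
loop's cost grows with one fork/exec per object, which is the kind of
overhead the test suite pays everywhere.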

I haven't used hg since 2005, but I know Mercurial nowadays has a
command server ("hg serve --cmdserver", used by chg) to get rid of a
lot of its startup overhead, and maybe git could steal that idea, too...

> Therefore, it might be a really good project to modernize our test
> suite. To take ideas from modern test frameworks such as Jest and try to
> bring them to C. Which means that new contributors would probably be
> better suited to work on this project than Git old-timers!
> 
> And the really neat thing about this project is that it could be done
> incrementally.

I hope to find time to hack on more batch/eval-friendly features
that make scripting git faster, but I have no idea about my
availability :<

