From: Luke Diamand <luke@diamand.org>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: Lars Schneider <larsxschneider@gmail.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: Occasional git p4 test failures because of stray fast-import processes
Date: Wed, 27 Feb 2019 15:03:30 +0000 [thread overview]
Message-ID: <CAE5ih78MV1qGTHBmCaN5k+VGv8Cy-RnFwOU0yuJBPEyn37C_4w@mail.gmail.com> (raw)
In-Reply-To: <20190227094926.GE19739@szeder.dev>
On Wed, 27 Feb 2019 at 09:49, SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> Hi Luke,
>
> I saw rare failures in test 6 'git p4 sync uninitialized repo' in
> 't9800-git-p4-basic.sh' on Travis CI, because the 'cleanup_git'
> function failed to do its job. The (redacted) trace looks like this:
Thanks for the *very* detailed analysis!
I think you are right - git-p4 should wait() for all of its children,
and that ought to fix this.
I think I may have even added the bit of code you mention (about a
decade ago now).
I'll have a look and see what can be done.
Thanks!
Luke
>
> + cleanup_git
> + retry_until_success rm -r /home/szeder/src/git/t/trash directory.t9800-git-p4-basic/git
> + time_in_seconds
> + cd /
> + /usr/bin/python -c import time; print(int(time.time()))
> + timeout=1551233042
> + rm -r /home/szeder/src/git/t/trash directory.t9800-git-p4-basic/git
> + test_must_fail test -d /home/szeder/src/git/t/trash directory.t9800-git-p4-basic/git
> test_must_fail: command succeeded: test -d /home/szeder/src/git/t/trash directory.t9800-git-p4-basic/git
> + eval_ret=1
> + :
> not ok 6 - git p4 sync uninitialized repo
>
> Trying to reproduce this failure with stock Git can be tricky: I've
> seen
>
> ./t9800-git-p4-basic.sh --stress -r 1,2,6,22
>
> fail within the first 10 tries, but it also run overnight without a
> single failure... However, the following two-liner patch can reliably
> trigger this failure:
>
> diff --git a/fast-import.c b/fast-import.c
> index b7ba755c2b..54715018b3 100644
> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -3296,6 +3296,7 @@ int cmd_main(int argc, const char **argv)
> rc_free[i].next = &rc_free[i + 1];
> rc_free[cmd_save - 1].next = NULL;
>
> + sleep(1);
> start_packfile();
> set_die_routine(die_nicely);
> set_checkpoint_signal();
> diff --git a/t/lib-git-p4.sh b/t/lib-git-p4.sh
> index b3be3ba011..2d2ef50bfa 100644
> --- a/t/lib-git-p4.sh
> +++ b/t/lib-git-p4.sh
> @@ -190,6 +190,7 @@ kill_p4d () {
>
> cleanup_git () {
> retry_until_success rm -r "$git"
> + sleep 2
> test_must_fail test -d "$git" &&
> retry_until_success mkdir "$git"
> }
>
>
> What's going on is this: 'git p4' invokes 'git fast-import' via
> Python's subprocess.Popen(), and then there are two chain of events
> racing with each other:
>
> - In the foreground:
>
> - 'git p4' notices that there are no p4 branches and die()s as
> expected, but leaves its child fast-import behind
> - 'test_i18ngrep' quickly verifies that 'git p4' died with the
> expected error message
> - the cleanup function removes the "$TRASH_DIRECTORY/git"
> directory, and
> - checks that the directory is indeed gone.
>
> - Meanwhile in the background:
>
> - 'git fast-import' starts up, kicks off the dashed external
> 'git-fast-import' process,
> - which opens a tmp packfile in "$TRASH_DIRECTORY/git" for writing
> (the start_packfile() call in the patch context above), creating
> any leading directories if necessary,
> - notices that its stdin is closed (because 'git p4' died) and it
> has nothing left to do, so
> - it cleans up and exits. Note that this cleanup only removes the
> tmp packfile, but leaves any newly created leading directories
> behind.
>
> While unlikely, it does apparently happen from time to time that the
> test's cleanup function first removes "$TRASH_DIRECTORY/git", but then
> 'git fast-import' re-creates it for its packfile before the cleanup
> function checks the result of the removal. The two well-placed
> sleep()s in the patch above force such a problematic scheduling.
>
> There are 4 test cases running 'test_must_fail git p4 sync': the above
> patch triggers a failure in 3 of them, and with a bit of extra modding
> I could trigger a failure in the fourth one as well.
>
> We could work this around by waiting for 'git p4' and all its child
> processes in the affected tests themselves, using the same shell
> trickery as in ef09036cf3 (t6500: wait for detached auto gc at the end
> of the test script, 2017-04-13), but this feels like, well, a
> workaround.
>
> I think the proper solution would be to ensure that 'git p4' kills and
> waits for all its child processes when die()ing. Alternatively (or in
> addition?), it could perform all the necessary sanity-checking (and
> potential die()ing) before starting the 'git fast-import' process in
> the first place.
>
> I've glanced through all subprocess.Popen() callsites in 'git p4' and
> found most of them OK, in the sense that they wait for whatever child
> process they create. Alas, there was one exception: p4CmdList() can
> invoke an optional callback function before wait()ing on its 'p4'
> child, and the streamP4FilesCb() callback function can die() without
> waiting for the 'p4' process (but it does wait() for 'git
> fast-import'!).
>
> On a related note: this check for the just-removed directory was added
> in 23aee4199a (git-p4: retry kill/cleanup operations in tests with
> timeout, 2015-11-19), which mentions flaky cleanup operations.
> Perhaps this is the same flakiness?!
>
> Anyway, as I mentioned elsewhere before, I have no idea why/how 'git
> p4' works, so I'll leave it up to you how it's best to deal with this
> issue...
>
>
> Gábor
>
next prev parent reply other threads:[~2019-02-27 15:03 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-27 9:49 Occasional git p4 test failures because of stray fast-import processes SZEDER Gábor
2019-02-27 15:03 ` Luke Diamand [this message]
2019-06-02 9:19 ` SZEDER Gábor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAE5ih78MV1qGTHBmCaN5k+VGv8Cy-RnFwOU0yuJBPEyn37C_4w@mail.gmail.com \
--to=luke@diamand.org \
--cc=git@vger.kernel.org \
--cc=larsxschneider@gmail.com \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).