git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "SZEDER Gábor" <szeder.dev@gmail.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Duy Nguyen <pclouds@gmail.com>,
	Thomas Gummerer <t.gummerer@gmail.com>, Jeff King <peff@peff.net>
Subject: Re: [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index
Date: Thu, 6 Sep 2018 17:14:39 +0200	[thread overview]
Message-ID: <20180906151439.GA8016@localhost> (raw)
In-Reply-To: <87va7ireuu.fsf@evledraar.gmail.com>

On Thu, Sep 06, 2018 at 02:26:49PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Sep 06 2018, SZEDER Gábor wrote:
> > Several tests failed occasionally when the test suite was run with
> > 'GIT_TEST_SPLIT_INDEX=yes'.  Here are those that I managed to trace
> > back to this racy split index problem, starting with those failing
> > more frequently, with a link to a failing Travis CI build job for
> > each.  The highlighted line shows when the racy file was written,
> > which is not always in the failing test (but note that those lines are
> > in the 'after failure' fold, and your browser might unhelpfully fold
> > it up before you could take a good look).
> 
> Thanks for working on this. When I package up git I run the tests
> under a few different modes, in the case of split index I've been
> doing:
> 
>     GIT_TEST_SPLIT_INDEX=true GIT_SKIP_TESTS="t3903 t4015.77"

Yeah, I noticed that you mentioned this in an unrelated thread the
other day, that's why I put you on Cc.  ... and that's why I pulled
this series from the backburner and cleaned it up for submission.
(Gah, most of these commits have an author date back in February...)

> Since those were the ones I spotted failing under that mode, but
> I still had occasional other failures, I don't have a record of
> which, maybe some of these other tests you mention, maybe not.

I poked around the Travis CI API, and managed to get the logs of all
build jobs that failed with 'GIT_TEST_SPLIT_INDEX=yes' but succeeded
without it.  Here is the list of failed test scripts, along with how
many times they failed:

      1 t0027-auto-crlf.sh
      1 t0090-cache-tree.sh
      1 t3404-rebase-interactive.sh
      1 t5520-pull.sh
      1 t5573-pull-verify-signatures.sh
      1 t5608-clone-2gb.sh
      1 t7406-submodule-update.sh
      2 t2200-add-update.sh
      2 t4002-diff-basic.sh
      2 t5504-fetch-receive-strict.sh
      3 t0028-working-tree-encoding.sh
      4 t1000-read-tree-m-3way.sh
      6 t4015-diff-whitespace.sh
     10 t4024-diff-optimize-common.sh
     17 t3030-merge-recursive.sh
     17 t3402-rebase-merge.sh
     17 t3501-revert-cherry-pick.sh
     17 t6022-merge-rename.sh
     17 t6032-merge-large-rename.sh
     17 t6034-merge-rename-nocruft.sh
     17 t6042-merge-rename-corner-cases.sh
     17 t6043-merge-rename-directories.sh
     17 t6046-merge-skip-unneeded-updates.sh
     17 t7003-filter-branch.sh
     17 t7601-merge-pull-config.sh
     23 t3903-stash.sh

I doubt that this racy split index problem is responsible for all
these failures; e.g. I looked at the failures of a few merge-related
test scripts, and these logs show that they tend to fail because of
memory corruption, i.e. with 'Aborted (core dumped)' or 'Segmentation
fault (core dumped)'.

> To test how this this series improves things, I've been running
> this on a 56 core CentOS 7.5 machine:
> 
>     while true; do GIT_TEST_SPLIT_INDEX=yes prove -j$(parallel --number-of-cores) t3903-stash.sh t4024-diff-optimize-common.sh t4015-diff-whitespace.sh t2200-add-update.sh t0090-cache-tree.sh && echo "OK $(date) $(git describe)" >>log2 || echo "FAIL $(date) $(git describe)" >>log2; done
> 
> While, in another window to get some load on the machine (these seem to
> fail more under load):
> 
>     while true; do prove -j$(parallel --number-of-cores) t[156789]*.sh; done
> 
> The results with this series applied up to 4/5. I.e. without the actual
> fix:
> 
>      92 OK v2.19.0-rc2-6-ged839bd155
>       8 FAIL v2.19.0-rc2-6-ged839bd155
> 
> I.e. when running this 100 times, I got 8 failures. So 8%.

Lucky you ;)

I could only get t3903 to fail on me, but not the others.  That was
enough to eventually track down and fix the problem (fun! ;), and then
I looked at the logs of failed git/git Travis CI build jobs to see,
what other failures might have been caused by it.

> With this patch applied:
> 
>     389 OK v2.19.0-rc2-5-g05a5a13935
>      11 FAIL v2.19.0-rc2-5-g05a5a13935
> 
> This time I ran the tests 400 times, and got 11 failures, i.e. a
> ~2.8% failure rate. I don't have a full account of what stuff
> failed (this was just scrolling past in my terminal), but most
> were:
> 
>     t0090-cache-tree.sh          (Wstat: 256 Tests: 21 Failed: 3)
>       Failed tests:  10-12
>       Non-zero exit status: 1
> 
> I.e. these tests:
> 
>     ok 10 - commit --interactive gives cache-tree on partial commit
>     ok 11 - commit in child dir has cache-tree
>     ok 12 - reset --hard gives cache-tree

But hey, 't0090 --verbose-log -x -i' just failed on me with the fix
applied while writing this email, yay!  In its logs I see the
following errors in all three tests you mention:

  error: index uses ?½þA extension, which we do not understand
  fatal: index file corrupt

Test 13 then starts with 'rm -f .git/index', and then all is well for
the remaining tests.

There was a recent discussion about a similar error starting at:

  https://public-inbox.org/git/20180901214157.hxlqmbz3fds7hsdl@ltop.local/

and leading to a fix in 6c003d6ffb (reopen_tempfile(): truncate opened
file, 2018-09-04).  I certainly hope that not my fix is buggy, but
combined with 'GIT_TEST_SPLIT_INDEX=yes' it can occasionally trigger
the same error, and Peff's (Cc'd) fix will resolve it as well.

Could you run your stress tests with Peff's fix merged with mine?
Apparetly your setup is much more capable of triggering these issues
than mine...


  reply	other threads:[~2018-09-06 15:14 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-06  2:48 [RFC PATCH 0/5] Fix the racy split index problem SZEDER Gábor
2018-09-06  2:48 ` [PATCH 1/5] t1700-split-index: drop unnecessary 'grep' SZEDER Gábor
2018-09-06 21:24   ` Junio C Hamano
2018-09-08 13:50   ` Duy Nguyen
2018-09-06  2:48 ` [PATCH 2/5] t0090: disable GIT_TEST_SPLIT_INDEX for the test checking split index SZEDER Gábor
2018-09-06  8:03   ` Ævar Arnfjörð Bjarmason
2018-09-06  2:48 ` [RFC PATCH 3/5] split index: add a test to demonstrate the racy split index problem SZEDER Gábor
2018-09-06  2:48 ` [RFC PATCH 4/5] t1700-split-index: date back files to avoid racy situations SZEDER Gábor
2018-09-06  8:02   ` Ævar Arnfjörð Bjarmason
2018-09-06  9:15     ` SZEDER Gábor
2018-09-06  9:20       ` Ævar Arnfjörð Bjarmason
2018-09-06  2:48 ` [RFC PATCH 5/5] split-index: smudge and add racily clean cache entries to split index SZEDER Gábor
2018-09-06 10:26   ` Ævar Arnfjörð Bjarmason
2018-09-06 12:26   ` Ævar Arnfjörð Bjarmason
2018-09-06 15:14     ` SZEDER Gábor [this message]
2018-09-06 15:26       ` Ævar Arnfjörð Bjarmason
2018-09-06 17:53         ` Ævar Arnfjörð Bjarmason
2018-09-07  3:49           ` SZEDER Gábor
2018-09-10 22:12           ` Paul-Sebastian Ungureanu
2018-09-08 16:45   ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180906151439.GA8016@localhost \
    --to=szeder.dev@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=t.gummerer@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).