* t5559-http-fetch-smart-http2 failures
From: Todd Zullinger @ 2022-11-30 22:09 UTC
  To: git; +Cc: Jeff King, Taylor Blau, Junio C Hamano

Hi,

I started seeing failures in rc0 but I was distracted by the
long weekend.

The changes in 73c49a4474 (t: run t5551 tests with both HTTP
and HTTP/2, 2022-11-11) seem to rather frequently trigger
test failures on Fedora.

The most frequent test to fail is "large fetch-pack requests
can be sent using chunked encoding" (t5559.30), but earlier
tests have also failed on occasion.  For the common failure,
the test exits with:

    expecting success of 5559.30 'large fetch-pack requests can be sent using chunked encoding': 
	    GIT_TRACE_CURL=true git -c http.postbuffer=65536 \
		    clone --bare "$HTTPD_URL/smart/repo.git" split.git 2>err &&
	    {
		    test_have_prereq HTTP2 ||
		    grep "^=> Send header: Transfer-Encoding: chunked" err
	    }
    +++ GIT_TRACE_CURL=true
    +++ git -c http.postbuffer=65536 clone --bare http://127.0.0.1:5559/smart/repo.git split.git
    error: last command exited with $?=128
    not ok 30 - large fetch-pack requests can be sent using chunked encoding

(Less frequently, I've seen the last command exit 141.)

I haven't been able to track down the problem, which happens
a lot on the Fedora build system but less often during local
builds.  Obviously, the bug is too shy to show itself while
anyone is watching. ;)

I can easily disable the failing tests for a bit, but does
anyone have an idea what might be the cause or how to better
debug it when it occurs on a buildsystem without direct
access?
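
If I do end up disabling them, it would probably be via
GIT_SKIP_TESTS in the spec, along these lines:

    # skip the whole script, or just the usual offender (t5559.30)
    GIT_SKIP_TESTS='t5559' make test
    GIT_SKIP_TESTS='t5559.30' make test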

Thanks,

-- 
Todd

* Re: t5559-http-fetch-smart-http2 failures
From: Jeff King @ 2022-12-01 20:25 UTC
  To: Todd Zullinger; +Cc: git, Taylor Blau, Junio C Hamano

On Wed, Nov 30, 2022 at 05:09:34PM -0500, Todd Zullinger wrote:

> The changes in 73c49a4474 (t: run t5551 tests with both HTTP
> and HTTP/2, 2022-11-11) seem to rather frequently trigger
> test failures on Fedora.

Thanks for the report. I can't seem to reproduce here on my Debian
system, even with --stress.

I can try a Fedora chroot or vm, but it may take some time to set up. It
also sounds like you're not able to reproduce it locally on a Fedora
system? Have you tried "./t5559-* --stress"?
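
For reference, I mean something along these lines, run from the t/
directory (the --stress-jobs count is just an example; IIRC it
defaults to twice the number of CPUs if you leave it off):

    ./t5559-http-fetch-smart-http2.sh --stress --stress-jobs=8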

> The most frequent test to fail is "large fetch-pack requests
> can be sent using chunked encoding" (t5559.30), but earlier
> tests have also failed on occasion.  For the common failure,
> the test exits with:
> 
>     expecting success of 5559.30 'large fetch-pack requests can be sent using chunked encoding': 
> 	    GIT_TRACE_CURL=true git -c http.postbuffer=65536 \
> 		    clone --bare "$HTTPD_URL/smart/repo.git" split.git 2>err &&
> 	    {
> 		    test_have_prereq HTTP2 ||
> 		    grep "^=> Send header: Transfer-Encoding: chunked" err
> 	    }
>     +++ GIT_TRACE_CURL=true
>     +++ git -c http.postbuffer=65536 clone --bare http://127.0.0.1:5559/smart/repo.git split.git
>     error: last command exited with $?=128
>     not ok 30 - large fetch-pack requests can be sent using chunked encoding
> 
> (Less frequently, I've seen the last command exit 141.)

Interesting. 141 implies SIGPIPE, but that may be a red herring. The
operation isn't supposed to fail, but if it does due to the server
bailing, then the client sometimes hitting SIGPIPE is a known unrelated
problem.

Is it possible to see the contents of "err" here? That will have the
message from the client, but also the curl trace, which is probably
helpful.

It might also be interesting to see the server's view, which would be in
httpd/error.log in the trash directory (also access.log, but probably
that is not interesting).
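
A failed run should leave the trash directory in place (and --stress
keeps the failing job's copy), so something like this ought to locate
both files, assuming the usual layout:

    ls 'trash directory.t5559-http-fetch-smart-http2'*/err \
       'trash directory.t5559-http-fetch-smart-http2'*/httpd/error.log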

> I can easily disable the failing tests for a bit, but does
> anyone have an idea what might be the cause or how to better
> debug it when it occurs on a buildsystem without direct
> access?

If t5551 isn't failing similarly, then presumably the problem is either:

  1. There is some race or bad interaction between curl and apache
     speaking http2 (possibly a buggy version of one?)

  2. http2 requires using mpm_event instead of mpm_prefork. Maybe this
     introduces problems with apache on this system?

  3. There is a bug in git. ;) This is always a possibility, of course,
     but we do literally nothing differently on our end, except ask curl
     for HTTP/2 via CURLOPT_HTTP_VERSION.

It would be interesting to see if doing this:

diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 0294739a77..b24ca44a95 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -73,16 +73,9 @@ LockFile accept.lock
 	LoadModule unixd_module modules/mod_unixd.so
 </IfModule>
 
-<IfDefine HTTP2>
 <IfModule !mod_mpm_event.c>
 	LoadModule mpm_event_module modules/mod_mpm_event.so
 </IfModule>
-</IfDefine>
-<IfDefine !HTTP2>
-<IfModule !mod_mpm_prefork.c>
-	LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
-</IfModule>
-</IfDefine>
 </IfVersion>
 
 PassEnv GIT_VALGRIND

also causes t5551 to start failing. If so, then we can blame mpm_event,
and not http2.

-Peff

* Re: t5559-http-fetch-smart-http2 failures
From: Todd Zullinger @ 2022-12-09  2:20 UTC
  To: Jeff King; +Cc: git, Taylor Blau, Junio C Hamano

Sorry for the delay.  I wanted to gather some data but could
only do it in small chunks at a time.  Hopefully that
doesn't make this too rambling and/or disjointed.

One upside is that I finally got around to teaching the
Fedora builds not just to include the failing test output,
but also to tar up the contents of test-results and any
trash directories on failures.  That makes it easier to poke
around when a build fails on a remote build system where I
can't get a shell. :)
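
The spec change amounts to roughly this in %check (a hand-waved
sketch; the real thing is a little more involved):

    make test || {
            tar -cJf test-artifacts.tar.xz \
                    t/test-results t/trash\ directory.* ||:
            exit 1
    }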

Jeff King wrote:
> On Wed, Nov 30, 2022 at 05:09:34PM -0500, Todd Zullinger wrote:
> 
>> The changes in 73c49a4474 (t: run t5551 tests with both HTTP
>> and HTTP/2, 2022-11-11) seem to rather frequently trigger
>> test failures on Fedora.
> 
> Thanks for the report. I can't seem to reproduce here on my Debian
> system, even with --stress.

One notable difference between Debian/Ubuntu and Fedora is
that Debian/Ubuntu uses mod_http2 included with the upstream
Apache httpd source.  Fedora is using the newer, stand-alone
module (which is the upstream source for the http2 module).

Ubuntu 22.04.1 has httpd-2.4.52 with mod_http2-1.15.26 (per
MOD_HTTP2_VERSION in modules/http2/h2_version.h).

Fedora 37, 36, and rawhide have httpd-2.4.54 with
mod_http2-2.0.9.  They also have curl-7.86.0 in rawhide (where
I've done the most testing), 7.85.0 in 37, and 7.82.0 in 36.

Interestingly, if I build the same git source rpm against
RHEL-9 which has httpd-2.4.53 and mod_http2-1.15.19, I don't
see any failures.

I'm tempted to do a build of mod_http2-1.x or the embedded
mod_http2 and test with that, but I have not yet made time
to do so.

Searching for errors similar to what I found in the test
failures, I found an old curl issue and a related Fedora
bug:

    https://github.com/curl/curl/issues/3750
    https://bugzilla.redhat.com/show_bug.cgi?id=1690971

Of course, these are from 2019 so you'd think this
particular issue was long-gone.  But perhaps with mod_http2
still being actively developed, similar bugs can crop up.

> I can try a Fedora chroot or vm, but it may take some time to set up. It
> also sounds like you're not able to reproduce it locally on a Fedora
> system? Have you tried "./t5559-* --stress"?

I had, but at the time the failures I got seemed unrelated,
particularly because they triggered on earlier tests and
somewhat randomly.

After I changed the test to append $test_count to err so I
could see the reason for each failure more clearly, I think
they are more similar to the buildsystem failures than I
first thought.

With --stress, http2 fails at various points with errors
like:

    expecting success of 5559.6 'fetch changes via http': 
	    echo content >>file &&
	    git commit -a -m two &&
	    git push public &&
	    (cd clone && git pull) &&
	    test_cmp file clone/file

    +++ echo content
    +++ git commit -a -m two
    [main 5647d5f] two
     Author: A U Thor <author@example.com>
     1 file changed, 1 insertion(+)
    +++ git push public
    To /builddir/build/BUILD/git-2.39.0.rc2/t/trash directory.t5559-http-fetch-smart-http2.stress-13/httpd/www/repo.git
       f39a923..5647d5f  main -> main
    +++ cd clone
    +++ git pull
    error: RPC failed; curl 16 Error in the HTTP2 framing layer
    fatal: expected flush after ref listing
    error: last command exited with $?=1
    not ok 6 - fetch changes via http

or sometimes with the same error in test 30, as below.

>> The most frequent test to fail is "large fetch-pack requests
>> can be sent using chunked encoding" (t5559.30), but earlier
>> tests have also failed on occasion.  For the common failure,
>> the test exits with:
>> 
>>     expecting success of 5559.30 'large fetch-pack requests can be sent using chunked encoding': 
>> 	    GIT_TRACE_CURL=true git -c http.postbuffer=65536 \
>> 		    clone --bare "$HTTPD_URL/smart/repo.git" split.git 2>err &&
>> 	    {
>> 		    test_have_prereq HTTP2 ||
>> 		    grep "^=> Send header: Transfer-Encoding: chunked" err
>> 	    }
>>     +++ GIT_TRACE_CURL=true
>>     +++ git -c http.postbuffer=65536 clone --bare http://127.0.0.1:5559/smart/repo.git split.git
>>     error: last command exited with $?=128
>>     not ok 30 - large fetch-pack requests can be sent using chunked encoding
>> 
>> (Less frequently, I've seen the last command exit 141.)
> 
> Interesting. 141 implies SIGPIPE, but that may be a red herring. The
> operation isn't supposed to fail, but if it does due to the server
> bailing, then the client sometimes hitting SIGPIPE is a known unrelated
> problem.
> 
> Is it possible to see the contents of "err" here? That will have the
> message from the client, but also the curl trace, which is probably
> helpful.

Appending -$test_count to err in all the places it's used in
the test helped here, e.g.:

    sed -Ei 's/\berr\b/&-$test_count/' t/t5551-http-fetch-smart.sh

With that, one of the more common errors is:

    error: RPC failed; HTTP 101 curl 92 HTTP/2 stream 1 was not closed cleanly before end of the underlying stream

others are:

    error: RPC failed; HTTP 101 curl 16 Error in the HTTP2 framing layer
    fatal: expected flush after ref listing

    error: RPC failed; HTTP 101 curl 16 Error in the HTTP2 framing layer

    error: RPC failed; curl 16 Error in the HTTP2 framing layer 
    fatal: expected 'packfile'

The full err output is usually around 5k lines, so it's not
ideal for inclusion here, but I've saved a number of them
and can either extract portions or post them somewhere and
link to them, if you think they might be useful.

> It might also be interesting to see the server's view, which would be in
> httpd/error.log in the trash directory (also access.log, but probably
> that is not interesting).

The error.log looks the same for the failed runs I've
collected:

    [Tue Dec 06 16:35:40.840250 2022] [mpm_event:notice] [pid 298654:tid 298654] AH00489: Apache/2.4.54 (Fedora Linux) configured -- resuming normal operations
    [Tue Dec 06 16:35:40.840388 2022] [core:notice] [pid 298654:tid 298654] AH00094: Command line: '/usr/sbin/httpd -d /tmp/git-t.oOfy/trash directory.t5559-http-fetch-smart-http2/httpd -f /builddir/build/BUILD/git-2.39.0.rc2/t/lib-httpd/apache.conf -D HTTP2 -c Listen 127.0.0.1:5559'
    [Tue Dec 06 16:35:42.372099 2022] [cgi:error] [pid 298658:tid 298768] [remote 127.0.0.1:55644] AH01215: stderr from /builddir/build/BUILD/git-2.39.0.rc2/git-http-backend: Unsupported service: getanyfile
    [Tue Dec 06 16:35:46.209758 2022] [cgi:error] [pid 300664:tid 300695] [client 127.0.0.1:60712] AH01215: stderr from /builddir/build/BUILD/git-2.39.0.rc2/git-http-backend: fatal: git upload-pack: not our ref 51c0379e3f0bea4537b50664334a941865f61b94
    [Tue Dec 06 16:35:46.912816 2022] [cgi:error] [pid 298659:tid 298663] [remote 127.0.0.1:60718] AH01215: stderr from /builddir/build/BUILD/git-2.39.0.rc2/git-http-backend: fatal: git upload-pack: not our ref 51c0379e3f0bea4537b50664334a941865f61b94
    [Tue Dec 06 16:35:53.962296 2022] [auth_basic:error] [pid 300664:tid 300707] [client 127.0.0.1:60926] AH01618: user bogus not found: /auth/smart/repo.git/info/refs
    [Tue Dec 06 16:35:56.627946 2022] [auth_basic:error] [pid 300664:tid 300713] [client 127.0.0.1:37644] AH01617: user user@host: authentication failure for "/auth/smart/repo.git/info/refs": Password Mismatch
    [Tue Dec 06 16:35:58.185699 2022] [mpm_event:notice] [pid 298654:tid 298654] AH00491: caught SIGTERM, shutting down

The LogLevel might need to be adjusted to get more useful
output there, perhaps?
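
If it would help, I could bump mod_http2's verbosity for another run,
maybe like this ("http2:trace2" is my guess at a useful level, going
by the httpd docs):

    echo 'LogLevel warn http2:trace2' >>t/lib-httpd/apache.conf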

> If t5551 isn't failing similarly, then presumably the problem is either:
> 
>   1. There is some race or bad interaction between curl and apache
>      speaking http2 (possibly a buggy version of one?)
> 
>   2. http2 requires using mpm_event instead of mpm_prefork. Maybe this
>      introduces problems with apache on this system?
> 
>   3. There is a bug in git. ;) This is always a possibility, of course,
>      but we do literally nothing differently on our end, except ask curl
>      for HTTP/2 via CURLOPT_HTTP_VERSION.
> 
> It would be interesting to see if doing this:
> 
> diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
> index 0294739a77..b24ca44a95 100644
> --- a/t/lib-httpd/apache.conf
> +++ b/t/lib-httpd/apache.conf
> @@ -73,16 +73,9 @@ LockFile accept.lock
>  	LoadModule unixd_module modules/mod_unixd.so
>  </IfModule>
>  
> -<IfDefine HTTP2>
>  <IfModule !mod_mpm_event.c>
>  	LoadModule mpm_event_module modules/mod_mpm_event.so
>  </IfModule>
> -</IfDefine>
> -<IfDefine !HTTP2>
> -<IfModule !mod_mpm_prefork.c>
> -	LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
> -</IfModule>
> -</IfDefine>
>  </IfVersion>
>  
>  PassEnv GIT_VALGRIND
> 
> also causes t5551 to start failing. If so, then we can blame mpm_event,
> and not http2.

Good idea.  With that applied, I've still not seen a failure
in t5551, not even when run via --stress for some minutes.

I'm not sure whether any of this points to a bug in Git's
http2 code at all.  It _seems_ like it's going to be
elsewhere, in curl and/or httpd/mod_http2.  In other words,
your 1 above.

If nothing jumps out to point to a possible issue in git,
I'll extract a reproduction recipe from the test suite and
file a Fedora bug.  Maybe the folks who have looked at
similar issues in curl and httpd/mod_http2 will spot
something.

Thanks!

-- 
Todd

* Re: t5559-http-fetch-smart-http2 failures
From: Jeff King @ 2022-12-09 20:59 UTC
  To: Todd Zullinger; +Cc: git, Taylor Blau, Junio C Hamano

On Thu, Dec 08, 2022 at 09:20:00PM -0500, Todd Zullinger wrote:

> Sorry for the delay.  I wanted to gather some data but could
> only do it in small chunks at a time.  Hopefully that
> doesn't make this too rambling and/or disjointed.

No problem. I would be happy if all bug reports were this thorough and
clear. :)

> > Thanks for the report. I can't seem to reproduce here on my Debian
> > system, even with --stress.
> 
> One notable difference between Debian/Ubuntu and Fedora is
> that Debian/Ubuntu uses mod_http2 included with the upstream
> Apache httpd source.  Fedora is using the newer, stand-alone
> module (which is the upstream source for the http2 module).
> 
> Ubuntu 22.04.1 has httpd-2.4.52 with mod_http2-1.15.26 (per
> MOD_HTTP2_VERSION in modules/http2/h2_version.h).

Makes sense. I'm on Debian unstable, which is httpd-2.4.54 and
mod_http2-1.15.28 (the latter I pulled from "strings mod_http2.so", so
uh...it's probably right).

> Fedora 37, 36, and rawhide have httpd-2.4.54 with
> mod_http2-2.0.9.  They also have curl-7.86.0 in rawhide (where
> I've done the most testing), 7.85.0 in 37, and 7.82.0 in 36.

So that's the same httpd and curl version (7.86.0) I'm using, which
implies to me the problem is in the newer mod_http2 version. And...

> Interestingly, if I build the same git source rpm against
> RHEL-9 which has httpd-2.4.53 and mod_http2-1.15.19, I don't
> see any failures.

...that data point would be consistent with the theory.

> I'm tempted to do a build of mod_http2-1.x or the embedded
> mod_http2 and test with that, but I have not yet made time
> to do so.

Yeah, I suspect it would make the problem go away. In theory it may even
be possible to bisect within mod_http2, but I don't know how painful it
is to do a build+test cycle.
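
A rough sketch of what that could look like, with plenty of
assumptions baked in (the standalone repo at
https://github.com/icing/mod_h2, the tag names, that "make install"
drops the module where Fedora's httpd and our test harness pick it
up, and that ten minutes of --stress is enough to reproduce):

    git clone https://github.com/icing/mod_h2 &&
    cd mod_h2 &&
    git bisect start v2.0.9 v1.15.26 &&
    git bisect run sh -c '
            autoreconf -i && ./configure && make && sudo make install ||
            exit 125                  # could not build/install; skip commit
            cd /path/to/git/t || exit 125
            timeout 600 ./t5559-http-fetch-smart-http2.sh --stress
            test $? -eq 124           # survived the whole window => good
    '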

> With that, one of the more common errors is:
> 
>     error: RPC failed; HTTP 101 curl 92 HTTP/2 stream 1 was not closed cleanly before end of the underlying stream
> 
> others are:
> 
>     error: RPC failed; HTTP 101 curl 16 Error in the HTTP2 framing layer
>     fatal: expected flush after ref listing
> 
>     error: RPC failed; HTTP 101 curl 16 Error in the HTTP2 framing layer
> 
>     error: RPC failed; curl 16 Error in the HTTP2 framing layer 
>     fatal: expected 'packfile'

OK, that makes me pretty confident that there's nothing Git is doing
wrong here. I don't think we could stimulate a failure at that layer of
curl even if we wanted to. And those errors plus the racy nature of what
you're seeing really make it look like there is just some race
condition or other bug in mod_http2 (possibly coupled with the mpm).

> The error.log looks the same for the failed runs I've
> collected:
> [...]
> The LogLevel might need to be adjusted to get more useful
> output there, perhaps?

Yeah, that output is not particularly enlightening. In a sense it's not
surprising, though. If there's a bug on the server side, we're not
likely to get a log line saying "sending garbage output to the client".
It's the client who sees the problem and hangs up. Turning up the log
level could help, but I'd be surprised.

> > also causes t5551 to start failing. If so, then we can blame mpm_event,
> > and not http2.
> 
> Good idea.  With that applied, I've still not seen a failure
> in t5551, not even when run via --stress for some minutes.

OK. So I think that rules out mpm_event being a problem by itself. It's
possible there's some bad interaction between mpm_event and mod_http2,
but it seems more likely there's simply a bug or race in mod_http2.

> I'm not sure whether any of this points to a bug in Git's
> http2 code at all.  It _seems_ like it's going to be
> elsewhere, in curl and/or httpd/mod_http2.  In other words,
> your 1 above.

My best guess is a bug in mod_http2. But one thing that it still _could_
be is that Git's server-side CGI interacts badly with mod_http2 somehow
(or maybe only some versions of it).

I guess some other things to try would be:

  1. Take Git out of the mix completely. If we stress-test command-line
     curl hitting our test apache, serving up static files, can we
     trigger the problem? If not, then...

  2. Try the same thing, but hit endpoints that trigger git-http-backend
     on the server side. If that fails, then we've absolved Git's client
     code, and the bug is either in mod_http2 or some bad interaction
     with Git's CGI output (a rough loop covering both is sketched below).
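
Something along these lines, with the caveat that the paths assume
the /dumb/ (static) and /smart/ (git-http-backend) aliases from
t/lib-httpd/apache.conf, and an apache started the same way as the
command line in your error.log above:

    url=http://127.0.0.1:5559
    for i in $(seq 1 1000)
    do
            # (1) static file; no Git code on either end
            curl --http2 -sf -o /dev/null "$url/dumb/repo.git/HEAD" ||
            echo "static request $i failed"
            # (2) the same endpoint the failing clone hits, via the CGI
            curl --http2 -sf -o /dev/null \
                    "$url/smart/repo.git/info/refs?service=git-upload-pack" ||
            echo "smart request $i failed"
    done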

> If nothing jumps out to point to a possible issue in git,
> I'll extract a reproduction recipe from the test suite and
> file a Fedora bug.  Maybe the folks who have looked at
> similar issues in curl and httpd/mod_http2 will spot
> something.

You might try running the failing tests with GIT_TRACE_CURL set in the
environment. That should get you a pretty detailed view of what curl is
seeing, which would probably be helpful for a bug report.
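
For a standalone recipe outside the test scripts, something like this
would capture the client's view (http.version is the config that maps
to CURLOPT_HTTP_VERSION; substitute whatever URL your recipe serves):

    GIT_TRACE_CURL=1 git -c http.version=HTTP/2 \
            clone --bare "$HTTPD_URL/smart/repo.git" split.git 2>trace.out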

-Peff
