git@vger.kernel.org mailing list mirror (one of many)
* [PATCH v2] travis-ci: retry if Git for Windows CI returns HTTP error 502 or 503
@ 2017-05-03 21:50 Lars Schneider
  2017-05-04  9:19 ` Johannes Schindelin
  2017-05-09  6:31 ` Junio C Hamano
  0 siblings, 2 replies; 5+ messages in thread
From: Lars Schneider @ 2017-05-03 21:50 UTC (permalink / raw)
  To: git; +Cc: gitster, Johannes.Schindelin

The Git for Windows CI web app sometimes returns HTTP errors of
"502 bad gateway" or "503 service unavailable" [1]. We also need to
check the HTTP content because the GfW web app seems to pass through
(error) results from other Azure calls with HTTP code 200.
Wait a little and retry the request if this happens.

[1] https://docs.microsoft.com/en-in/azure/app-service-web/app-service-web-troubleshoot-http-502-http-503

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---

Hi Junio,

I can't really test this as my TravisCI account does not have the
extended timeout and I am unable to reproduce the error.

It would be great if we could test this a little bit in pu.

Thanks,
Lars

Notes:
    Base Ref: next
    Web-Diff: https://github.com/larsxschneider/git/commit/af0f0f0eb8
    Checkout: git fetch https://github.com/larsxschneider/git travisci/win-retry-v2 && git checkout af0f0f0eb8

    Interdiff (v1..v2):

    diff --git a/ci/run-windows-build.sh b/ci/run-windows-build.sh
    index 7a9aa9c6a7..3e5a0abee0 100755
    --- a/ci/run-windows-build.sh
    +++ b/ci/run-windows-build.sh
    @@ -14,26 +14,33 @@ COMMIT=$2

     gfwci () {
     	local CURL_ERROR_CODE HTTP_CODE
    -	exec 3>&1
    +	CONTENT_FILE=$(mktemp -t "git-windows-ci-XXXXXX")
     	while test -z $HTTP_CODE
     	do
     	HTTP_CODE=$(curl \
     		-H "Authentication: Bearer $GFW_CI_TOKEN" \
     		--silent --retry 5 --write-out '%{HTTP_CODE}' \
    -		--output >(sed "$(printf '1s/^\xef\xbb\xbf//')" >cat >&3) \
    +		--output >(sed "$(printf '1s/^\xef\xbb\xbf//')" >$CONTENT_FILE) \
     		"https://git-for-windows-ci.azurewebsites.net/api/TestNow?$1" \
     	)
     	CURL_ERROR_CODE=$?
     		# The GfW CI web app sometimes returns HTTP errors of
     		# "502 bad gateway" or "503 service unavailable".
    -		# Wait a little and retry if it happens. More info:
    +		# We also need to check the HTTP content because the GfW web
    +		# app seems to pass through (error) results from other Azure
    +		# calls with HTTP code 200.
    +		# Wait a little and retry if we detect this error. More info:
     		# https://docs.microsoft.com/en-in/azure/app-service-web/app-service-web-troubleshoot-http-502-http-503
    -		if test $HTTP_CODE -eq 502 || test $HTTP_CODE -eq 503
    +		if test $HTTP_CODE -eq 502 ||
    +		   test $HTTP_CODE -eq 503 ||
    +		   grep "502 - Web server received an invalid response" $CONTENT_FILE >/dev/null
     		then
     			sleep 10
     			HTTP_CODE=
     		fi
     	done
    +	cat $CONTENT_FILE
    +	rm $CONTENT_FILE
     	if test $CURL_ERROR_CODE -ne 0
     	then
     		return $CURL_ERROR_CODE

 ci/run-windows-build.sh | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/ci/run-windows-build.sh b/ci/run-windows-build.sh
index e043440799..3e5a0abee0 100755
--- a/ci/run-windows-build.sh
+++ b/ci/run-windows-build.sh
@@ -14,14 +14,33 @@ COMMIT=$2

 gfwci () {
 	local CURL_ERROR_CODE HTTP_CODE
-	exec 3>&1
+	CONTENT_FILE=$(mktemp -t "git-windows-ci-XXXXXX")
+	while test -z $HTTP_CODE
+	do
 	HTTP_CODE=$(curl \
 		-H "Authentication: Bearer $GFW_CI_TOKEN" \
 		--silent --retry 5 --write-out '%{HTTP_CODE}' \
-		--output >(sed "$(printf '1s/^\xef\xbb\xbf//')" >cat >&3) \
+		--output >(sed "$(printf '1s/^\xef\xbb\xbf//')" >$CONTENT_FILE) \
 		"https://git-for-windows-ci.azurewebsites.net/api/TestNow?$1" \
 	)
 	CURL_ERROR_CODE=$?
+		# The GfW CI web app sometimes returns HTTP errors of
+		# "502 bad gateway" or "503 service unavailable".
+		# We also need to check the HTTP content because the GfW web
+		# app seems to pass through (error) results from other Azure
+		# calls with HTTP code 200.
+		# Wait a little and retry if we detect this error. More info:
+		# https://docs.microsoft.com/en-in/azure/app-service-web/app-service-web-troubleshoot-http-502-http-503
+		if test $HTTP_CODE -eq 502 ||
+		   test $HTTP_CODE -eq 503 ||
+		   grep "502 - Web server received an invalid response" $CONTENT_FILE >/dev/null
+		then
+			sleep 10
+			HTTP_CODE=
+		fi
+	done
+	cat $CONTENT_FILE
+	rm $CONTENT_FILE
 	if test $CURL_ERROR_CODE -ne 0
 	then
 		return $CURL_ERROR_CODE

base-commit: 1ea7e62026c5dde4d8be80b2544696fc6aa70121
--
2.12.2
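
The retry loop in the patch above can be tried without access to the Git
for Windows CI service. What follows is a minimal sketch, not the patch
itself: `fake_request`, `fetch_with_retry`, `ATTEMPT_FILE` and `RETRY_DELAY`
are hypothetical names, and `fake_request` stands in for the real curl call.
It answers with a 503, then with a 200 whose body carries the 502 error
page, then with a normal 200.

```shell
#!/bin/sh
# Hypothetical stand-in for the curl call (an assumption, not part of the patch).
# Writes the response body to the file named by $1 and prints the HTTP code.
# The attempt counter lives in a file because the function runs in a subshell.
ATTEMPT_FILE=$(mktemp)
echo 0 >"$ATTEMPT_FILE"

fake_request () {
	n=$(($(cat "$ATTEMPT_FILE") + 1))
	echo "$n" >"$ATTEMPT_FILE"
	case $n in
	1) echo "Service Unavailable" >"$1"; echo 503 ;;
	2) echo "502 - Web server received an invalid response" >"$1"; echo 200 ;;
	*) echo "build 42 queued" >"$1"; echo 200 ;;
	esac
}

# Retry while the status code is 502/503 or while a 200 response
# carries the 502 error page in its body, as the patch does.
fetch_with_retry () {
	content_file=$(mktemp)
	http_code=
	while test -z "$http_code"
	do
		http_code=$(fake_request "$content_file")
		if test "$http_code" -eq 502 ||
		   test "$http_code" -eq 503 ||
		   grep -q "502 - Web server received an invalid response" "$content_file"
		then
			# RETRY_DELAY is a hypothetical knob; the patch sleeps 10 seconds.
			sleep "${RETRY_DELAY:-10}"
			http_code=
		fi
	done
	cat "$content_file"
	rm -f "$content_file"
}
```

With `RETRY_DELAY=0`, calling `fetch_with_retry` makes three requests and
prints only the body of the third. Like the patch, the sketch retries
without an upper bound on the number of attempts.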



* Re: [PATCH v2] travis-ci: retry if Git for Windows CI returns HTTP error 502 or 503
  2017-05-03 21:50 [PATCH v2] travis-ci: retry if Git for Windows CI returns HTTP error 502 or 503 Lars Schneider
@ 2017-05-04  9:19 ` Johannes Schindelin
  2017-05-09  6:31 ` Junio C Hamano
  1 sibling, 0 replies; 5+ messages in thread
From: Johannes Schindelin @ 2017-05-04  9:19 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, gitster

Hi Lars,


On Wed, 3 May 2017, Lars Schneider wrote:

> The Git for Windows CI web app sometimes returns HTTP errors of
> "502 bad gateway" or "503 service unavailable" [1]. We also need to
> check the HTTP content because the GfW web app seems to pass through
> (error) results from other Azure calls with HTTP code 200.
> Wait a little and retry the request if this happens.

Thanks. In theory, it would be better to fix the web app so that it also
passes through the 502 error code; in practice, I have a hard time finding
the time to make it so ;-)

Therefore, I would be very much in favor of the current version of the
patch.

Ciao,
Dscho


* Re: [PATCH v2] travis-ci: retry if Git for Windows CI returns HTTP error 502 or 503
  2017-05-03 21:50 [PATCH v2] travis-ci: retry if Git for Windows CI returns HTTP error 502 or 503 Lars Schneider
  2017-05-04  9:19 ` Johannes Schindelin
@ 2017-05-09  6:31 ` Junio C Hamano
  2017-05-09 17:40   ` Lars Schneider
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2017-05-09  6:31 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, Johannes.Schindelin

Lars Schneider <larsxschneider@gmail.com> writes:

> The Git for Windows CI web app sometimes returns HTTP errors of
> "502 bad gateway" or "503 service unavailable" [1]. We also need to
> check the HTTP content because the GfW web app seems to pass through
> (error) results from other Azure calls with HTTP code 200.
> Wait a little and retry the request if this happens.
>
> [1] https://docs.microsoft.com/en-in/azure/app-service-web/app-service-web-troubleshoot-http-502-http-503
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
>
> Hi Junio,
>
> I can't really test this as my TravisCI account does not have the
> extended timeout and I am unable to reproduce the error.
>
> It would be great if we could test this a little bit in pu.

This has been in 'pu' for a while.  

As the patch simply discards 502 (and others), it is unclear if the
failing test on 'next' is now gone, or the attempt to run 'pu'
happened to be lucky not to get one, from the output we can see in
https://travis-ci.org/git/git/jobs/229867212

Are you comfortable enough to move this forward?  It's not like a
possible breakage in this patch will harm anything (the relaying to
the Windows CI is flaky if the build server cannot deal with the
load anyway), so I would rather have this early in 'next', while we
deal with a few other topics that Windows build is not happy with
that are on 'pu'.
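
One way to make the question above answerable from the job log would be to
report each retry on stderr. The following is only a sketch of that idea
under stated assumptions: `should_retry` is a hypothetical helper that is
not part of the patch, and the message wording is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): should_retry HTTP_CODE CONTENT_FILE
# Returns 0 if the response should be retried and says why on stderr,
# so that the reason shows up in the CI job log. Returns 1 otherwise.
should_retry () {
	http_code=$1 content_file=$2
	case $http_code in
	502|503)
		echo "retrying: HTTP $http_code" >&2
		return 0 ;;
	esac
	# Same body check as the patch: an error page delivered with another code.
	if grep -q "502 - Web server received an invalid response" "$content_file"
	then
		echo "retrying: HTTP $http_code with 502 error page in body" >&2
		return 0
	fi
	return 1
}
```

A job log that contains no "retrying:" lines would then indicate that the
run simply did not hit the error, rather than that the retry hid it.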




* Re: [PATCH v2] travis-ci: retry if Git for Windows CI returns HTTP error 502 or 503
  2017-05-09  6:31 ` Junio C Hamano
@ 2017-05-09 17:40   ` Lars Schneider
  2017-05-09 23:50     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Lars Schneider @ 2017-05-09 17:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes.Schindelin


> On 09 May 2017, at 07:31, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>> The Git for Windows CI web app sometimes returns HTTP errors of
>> "502 bad gateway" or "503 service unavailable" [1]. We also need to
>> check the HTTP content because the GfW web app seems to pass through
>> (error) results from other Azure calls with HTTP code 200.
>> Wait a little and retry the request if this happens.
>> 
>> [1] https://docs.microsoft.com/en-in/azure/app-service-web/app-service-web-troubleshoot-http-502-http-503
>> 
>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>> ---
>> 
>> Hi Junio,
>> 
>> I can't really test this as my TravisCI account does not have the
>> extended timeout and I am unable to reproduce the error.
>> 
>> It would be great if we could test this a little bit in pu.
> 
> This has been in 'pu' for a while.  
> 
> As the patch simply discards 502 (and others), it is unclear if the
> failing test on 'next' is now gone, or the attempt to run 'pu'
> happened to be lucky not to get one, from the output we can see in
> https://travis-ci.org/git/git/jobs/229867212
> 
> Are you comfortable enough to move this forward?

Yes, please move it forward. I haven't seen a "502 - Web server 
received an invalid response" on pu for a while. That means the
patch should work as expected.


Unrelated to this patch I have, however, seen two kinds of timeouts:

(1) Timeout in the "notStarted" state. This job eventually finished
with a failure but it did start only *after* 3h:
https://travis-ci.org/git/git/jobs/230225611

(2) Timeout in the "in progress" state. This job eventually finished
successfully but it took longer than 3h:
https://travis-ci.org/git/git/jobs/229867248

Right now the timeout generates potential false negative results. 
I would like to change that and respond with a successful build 
*before* we approach the 3h timeout. This means we could generate
false positives. Although this is not ideal, I think that is the better 
compromise as a failing Windows build would usually fail quickly 
(e.g. in the compile step).

What do you guys think? Would you be OK with that reasoning?
If the Git for Windows builds get more stable over time then
we could reevaluate this compromise.


- Lars


* Re: [PATCH v2] travis-ci: retry if Git for Windows CI returns HTTP error 502 or 503
  2017-05-09 17:40   ` Lars Schneider
@ 2017-05-09 23:50     ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2017-05-09 23:50 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, Johannes.Schindelin

Lars Schneider <larsxschneider@gmail.com> writes:

>>> It would be great if we could test this a little bit in pu.
>> 
>> This has been in 'pu' for a while.  
>> 
>> As the patch simply discards 502 (and others), it is unclear if the
>> failing test on 'next' is now gone, or the attempt to run 'pu'
>> happened to be lucky not to get one, from the output we can see in
>> https://travis-ci.org/git/git/jobs/229867212
>> 
>> Are you comfortable enough to move this forward?
>
> Yes, please move it forward. I haven't seen a "502 - Web server 
> received an invalid response" on pu for a while. That means the
> patch should work as expected.

Will do, thanks.

> Unrelated to this patch I have, however, seen two kinds of timeouts:
>
> (1) Timeout in the "notStarted" state. This job eventually finished
> with a failure but it did start only *after* 3h:
> https://travis-ci.org/git/git/jobs/230225611
>
> (2) Timeout in the "in progress" state. This job eventually finished
> successfully but it took longer than 3h:
> https://travis-ci.org/git/git/jobs/229867248
>
> Right now the timeout generates potential false negative results. 
> I would like to change that and respond with a successful build 
> *before* we approach the 3h timeout. This means we could generate
> false positives. Although this is not ideal, I think that is the better 
> compromise as a failing Windows build would usually fail quickly 
> (e.g. in the compile step).
>
> What do you guys think? Would you be OK with that reasoning?
> If the Git for Windows builds get more stable over time then
> we could reevaluate this compromise.

I'd rather see a false breakage on the Windows build (i.e. "this might
have succeeded given enough time, but it didn't finish within the
allotted time") than a false success (i.e. "we successfully launched
and the build is still running, so let's assume the test succeeds").
Because I do not pay attention to what the overall build page [*1*]
says about a particular branch tip, and instead look at the
summary list of the individual "Build Jobs" (e.g. [*2*]), seeing
errored/failed on [*1*] does not bother me personally, if that is
what you are getting at.


[References]

*1* https://travis-ci.org/git/git/builds/
*2* https://travis-ci.org/git/git/builds/230235081
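
The preference stated above, failing on a timeout instead of assuming
success, can be sketched as a polling loop with a deadline. This is an
illustration under assumptions, not code from the patch: `wait_for_build`
is a hypothetical function, the status command passed to it is a
placeholder for whatever queries the CI service, and the state names
"inProgress", "succeeded" and "failed" are assumed spellings.

```shell
#!/bin/sh
# Hypothetical sketch: wait_for_build STATUS_CMD TIMEOUT_SECONDS POLL_SECONDS
# STATUS_CMD is assumed to print the current build state, e.g.
# "notStarted", "inProgress", "succeeded" or "failed".
# POLL_SECONDS must be at least 1, otherwise the loop never advances.
wait_for_build () {
	status_cmd=$1 timeout=$2 poll=$3
	elapsed=0
	status=unknown
	while test "$elapsed" -lt "$timeout"
	do
		status=$($status_cmd)
		case $status in
		succeeded) return 0 ;;
		failed) echo "build failed" >&2; return 1 ;;
		esac
		sleep "$poll"
		# Approximate: ignores the time spent in the status command itself.
		elapsed=$((elapsed + poll))
	done
	# Deadline reached: report a failure instead of assuming success.
	echo "timed out after ${timeout}s in state '$status'" >&2
	return 1
}
```

Returning 0 from the timeout branch instead would give the other behavior
discussed in this thread, where a build that is still running shortly
before the limit is reported as successful.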

