git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH/RFC] push: anonymize URL in error output
@ 2017-08-23  9:49 Ivan Vyshnevskyi
  2017-08-23 10:57 ` Lars Schneider
  2017-08-23 15:58 ` Jeff King
  0 siblings, 2 replies; 7+ messages in thread
From: Ivan Vyshnevskyi @ 2017-08-23  9:49 UTC (permalink / raw)
  To: git; +Cc: Brandon Williams, Jeff King

Commits 47abd85 (fetch: Strip usernames from url's before storing them,
2009-04-17) and later 882d49c (push: anonymize URL in status output,
2016-07-14) made fetch and push strip the authentication part of the
remote URLs when used in the merge-commit messages or status outputs.
The URLs that are part of the error messages were not anonymized.

A commonly used pattern for storing artifacts from a build server in a
remote repository utilizes a "secure" environment variable with
credentials to embed them in the URL and execute a push. Given enough
runs, an intermittent network failure will cause a push to fail, leaving
a non-anonymized URL in the build log.

To prevent that, reuse the same anonymizing function to scrub
credentials from URL in the push error output.

Signed-off-by: Ivan Vyshnevskyi <sainaen@gmail.com>
---

This is my first attempt to propose a patch, sorry if I did something wrong!

I've tested my changes on Travis CI, and the build is green [1].

Not sure how much of the background should be included in the commit message.
The "commonly used pattern" I mention could be found in the myriad of
online tutorials and looks something like this:

    git push -fq https://$GIT_CREDS@github.com/$REPO_SLUG

Note, that a lot of developers mistakenly assume that '--quiet' ('-q') will
suppress all output. Sometimes, they would redirect standard output to
/dev/null, without realizing that errors are printed out to stderr.

I didn't mention this in the commit, but another typical offender is a tool that
calls 'git push' as part of its execution. This is a spectrum that starts with
shell scripts, includes simple one-task apps in Python or Ruby, and ends with
the plugins for JavaScript, Java, Groovy, and Scala build tools. (I'd like to
avoid shaming their authors publicly, but could send you a few examples
privately.)

I gathered the data by going through build logs of popular open source projects
(and projects of their contributors) hosted on GitHub and built by Travis CI.
I found about 2.3k unique credentials (but only about nine hundred were active
at the time), and more than a half of those were exposed by a failed push. See
the advisory from Travis CI [2] for results of their own scan.

While the issue is public for several months now and addressed on Travis CI,
I keep finding build logs with credentials on the internet. So I think it's
worth fixing in the git itself.

[1]: https://travis-ci.org/sainaen/git/builds/267180560
[2]: https://blog.travis-ci.com/2017-05-08-security-advisory

 builtin/push.c             |  2 +-
 t/t5541-http-push-smart.sh | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/builtin/push.c b/builtin/push.c
index 03846e837..59f3bc975 100644
--- a/builtin/push.c
+++ b/builtin/push.c
@@ -336,7 +336,7 @@ static int push_with_options(struct transport *transport, int flags)
 	err = transport_push(transport, refspec_nr, refspec, flags,
 			     &reject_reasons);
 	if (err != 0)
-		error(_("failed to push some refs to '%s'"), transport->url);
+		error(_("failed to push some refs to '%s'"), transport_anonymize_url(transport->url));
 
 	err |= transport_disconnect(transport);
 	if (!err)
diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
index d38bf3247..0b6fb6252 100755
--- a/t/t5541-http-push-smart.sh
+++ b/t/t5541-http-push-smart.sh
@@ -377,5 +377,23 @@ test_expect_success 'push status output scrubs password' '
 	grep "^To $HTTPD_URL/smart/test_repo.git" status
 '
 
+cat >"$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update" <<EOF
+#!/bin/sh
+exit 1
+EOF
+chmod a+x "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
+
+cat >exp <<EOF
+error: failed to push some refs to '$HTTPD_URL/smart/test_repo.git'
+EOF
+
+test_expect_success 'failed push status output scrubs password' '
+	cd "$ROOT_PATH"/test_repo_clone &&
+	test_must_fail git push "$HTTPD_URL_USER_PASS/smart/test_repo.git" +HEAD:scrub_err 2>stderr &&
+	grep "^error: failed to push some refs" stderr >act &&
+	test_i18ncmp exp act
+'
+rm -f "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
+
 stop_httpd
 test_done
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC] push: anonymize URL in error output
  2017-08-23  9:49 [PATCH/RFC] push: anonymize URL in error output Ivan Vyshnevskyi
@ 2017-08-23 10:57 ` Lars Schneider
  2017-08-24 18:58   ` Ivan Vyshnevskyi
  2017-08-23 15:58 ` Jeff King
  1 sibling, 1 reply; 7+ messages in thread
From: Lars Schneider @ 2017-08-23 10:57 UTC (permalink / raw)
  To: Ivan Vyshnevskyi; +Cc: git, Brandon Williams, Jeff King


> On 23 Aug 2017, at 11:49, Ivan Vyshnevskyi <sainaen@gmail.com> wrote:
> 
> Commits 47abd85 (fetch: Strip usernames from url's before storing them,
> 2009-04-17) and later 882d49c (push: anonymize URL in status output,
> 2016-07-14) made fetch and push strip the authentication part of the
> remote URLs when used in the merge-commit messages or status outputs.
> The URLs that are part of the error messages were not anonymized.
> 
> A commonly used pattern for storing artifacts from a build server in a
> remote repository utilizes a "secure" environment variable with
> credentials to embed them in the URL and execute a push. Given enough
> runs, an intermittent network failure will cause a push to fail, leaving
> a non-anonymized URL in the build log.
> 
> To prevent that, reuse the same anonymizing function to scrub
> credentials from URL in the push error output.
> 
> Signed-off-by: Ivan Vyshnevskyi <sainaen@gmail.com>
> ---
> 
> This is my first attempt to propose a patch, sorry if I did something wrong!
> 
> I've tested my changes on Travis CI, and the build is green [1].
> 
> Not sure how much of the background should be included in the commit message.
> The "commonly used pattern" I mention could be found in the myriad of
> online tutorials and looks something like this:
> 
>    git push -fq https://$GIT_CREDS@github.com/$REPO_SLUG
> 
> Note, that a lot of developers mistakenly assume that '--quiet' ('-q') will
> suppress all output. Sometimes, they would redirect standard output to
> /dev/null, without realizing that errors are printed out to stderr.
> 
> I didn't mention this in the commit, but another typical offender is a tool that
> calls 'git push' as part of its execution. This is a spectrum that starts with
> shell scripts, includes simple one-task apps in Python or Ruby, and ends with
> the plugins for JavaScript, Java, Groovy, and Scala build tools. (I'd like to
> avoid shaming their authors publicly, but could send you a few examples
> privately.)
> 
> I gathered the data by going through build logs of popular open source projects
> (and projects of their contributors) hosted on GitHub and built by Travis CI.
> I found about 2.3k unique credentials (but only about nine hundred were active
> at the time), and more than a half of those were exposed by a failed push. See
> the advisory from Travis CI [2] for results of their own scan.
> 
> While the issue is public for several months now and addressed on Travis CI,
> I keep finding build logs with credentials on the internet. So I think it's
> worth fixing in the git itself.
> 
> [1]: https://travis-ci.org/sainaen/git/builds/267180560
> [2]: https://blog.travis-ci.com/2017-05-08-security-advisory
> 

This sounds very reasonable to me! Thanks for the contribution!

I wonder if we should even extend this. Consider this case:

  $ git push https://lars:secret@server/org/repo1
  remote: Invalid username or password.
  fatal: Authentication failed for 'https://lars:secret@server/org/repo1/'

I might not have valid credentials for repo1 but my credentials could
very well be valid for repo2.

- Lars


> builtin/push.c             |  2 +-
> t/t5541-http-push-smart.sh | 18 ++++++++++++++++++
> 2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/push.c b/builtin/push.c
> index 03846e837..59f3bc975 100644
> --- a/builtin/push.c
> +++ b/builtin/push.c
> @@ -336,7 +336,7 @@ static int push_with_options(struct transport *transport, int flags)
> 	err = transport_push(transport, refspec_nr, refspec, flags,
> 			     &reject_reasons);
> 	if (err != 0)
> -		error(_("failed to push some refs to '%s'"), transport->url);
> +		error(_("failed to push some refs to '%s'"), transport_anonymize_url(transport->url));
> 
> 	err |= transport_disconnect(transport);
> 	if (!err)
> diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
> index d38bf3247..0b6fb6252 100755
> --- a/t/t5541-http-push-smart.sh
> +++ b/t/t5541-http-push-smart.sh
> @@ -377,5 +377,23 @@ test_expect_success 'push status output scrubs password' '
> 	grep "^To $HTTPD_URL/smart/test_repo.git" status
> '
> 
> +cat >"$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update" <<EOF
> +#!/bin/sh
> +exit 1
> +EOF
> +chmod a+x "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
> +
> +cat >exp <<EOF
> +error: failed to push some refs to '$HTTPD_URL/smart/test_repo.git'
> +EOF
> +
> +test_expect_success 'failed push status output scrubs password' '
> +	cd "$ROOT_PATH"/test_repo_clone &&
> +	test_must_fail git push "$HTTPD_URL_USER_PASS/smart/test_repo.git" +HEAD:scrub_err 2>stderr &&
> +	grep "^error: failed to push some refs" stderr >act &&
> +	test_i18ncmp exp act
> +'
> +rm -f "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
> +
> stop_httpd
> test_done
> -- 
> 2.14.1
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC] push: anonymize URL in error output
  2017-08-23  9:49 [PATCH/RFC] push: anonymize URL in error output Ivan Vyshnevskyi
  2017-08-23 10:57 ` Lars Schneider
@ 2017-08-23 15:58 ` Jeff King
  2017-08-24 19:01   ` Ivan Vyshnevskyi
  1 sibling, 1 reply; 7+ messages in thread
From: Jeff King @ 2017-08-23 15:58 UTC (permalink / raw)
  To: Ivan Vyshnevskyi; +Cc: git, Brandon Williams

On Wed, Aug 23, 2017 at 12:49:29PM +0300, Ivan Vyshnevskyi wrote:

> Commits 47abd85 (fetch: Strip usernames from url's before storing them,
> 2009-04-17) and later 882d49c (push: anonymize URL in status output,
> 2016-07-14) made fetch and push strip the authentication part of the
> remote URLs when used in the merge-commit messages or status outputs.
> The URLs that are part of the error messages were not anonymized.
> 
> A commonly used pattern for storing artifacts from a build server in a
> remote repository utilizes a "secure" environment variable with
> credentials to embed them in the URL and execute a push. Given enough
> runs, an intermittent network failure will cause a push to fail, leaving
> a non-anonymized URL in the build log.
> 
> To prevent that, reuse the same anonymizing function to scrub
> credentials from URL in the push error output.

This makes sense. I suspect that most errors we output should be using
the anonymized URL. Did you poke around for other calls?

The general structure of the patch looks good, but I have a few minor
comments below.

> Not sure how much of the background should be included in the commit message.
> The "commonly used pattern" I mention could be found in the myriad of
> online tutorials and looks something like this:

My knee-jerk reaction is if it's worth writing after the dashes, it's
worth putting in the commit message.

However, in the case I think it is OK as-is (the motivation of "we
already avoid leaking auth info to stdout, so we should do the same for
error messages" seems self-contained and reasonable).

> diff --git a/builtin/push.c b/builtin/push.c
> index 03846e837..59f3bc975 100644
> --- a/builtin/push.c
> +++ b/builtin/push.c
> @@ -336,7 +336,7 @@ static int push_with_options(struct transport *transport, int flags)
>  	err = transport_push(transport, refspec_nr, refspec, flags,
>  			     &reject_reasons);
>  	if (err != 0)
> -		error(_("failed to push some refs to '%s'"), transport->url);
> +		error(_("failed to push some refs to '%s'"), transport_anonymize_url(transport->url));

This leaks the return value. That's probably not a _huge_ deal since the
program is likely to exit, but it's a bad pattern. I wonder if we should
be setting up transport->anonymous_url preemptively, and just let its
memory belong to the transport struct.

> diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
> index d38bf3247..0b6fb6252 100755
> --- a/t/t5541-http-push-smart.sh
> +++ b/t/t5541-http-push-smart.sh
> @@ -377,5 +377,23 @@ test_expect_success 'push status output scrubs password' '
>  	grep "^To $HTTPD_URL/smart/test_repo.git" status
>  '
>  
> +cat >"$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update" <<EOF
> +#!/bin/sh
> +exit 1
> +EOF
> +chmod a+x "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
> +
> +cat >exp <<EOF
> +error: failed to push some refs to '$HTTPD_URL/smart/test_repo.git'
> +EOF

I know the t5541 script, which is old and messy, led you into these bad
constructs. But usually in modern tests we:

 1. Try to keep all commands inside test_expect blocks to catch
    unexpected failures or unwanted output.

 2. Use write_script for writing scripts, like:

      write_script "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update" <<-\EOF
      exit 1
      EOF

 3. Backslash our here-doc delimiter to suppress interpolation.

> +test_expect_success 'failed push status output scrubs password' '
> +	cd "$ROOT_PATH"/test_repo_clone &&
> +	test_must_fail git push "$HTTPD_URL_USER_PASS/smart/test_repo.git" +HEAD:scrub_err 2>stderr &&
> +	grep "^error: failed to push some refs" stderr >act &&
> +	test_i18ncmp exp act
> +'
> +rm -f "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"

Similarly, this "rm" should probably be a test_when_finished in the
block with the write_script (unless you really need to carry it over
several test_expect blocks, in which case there should be an explicit
test_expect cleaning it up).

Instead of grepping for the exact error, should we instead grep for the
password to make sure it is not present on _any_ line?

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC] push: anonymize URL in error output
  2017-08-23 10:57 ` Lars Schneider
@ 2017-08-24 18:58   ` Ivan Vyshnevskyi
  0 siblings, 0 replies; 7+ messages in thread
From: Ivan Vyshnevskyi @ 2017-08-24 18:58 UTC (permalink / raw)
  To: Lars Schneider; +Cc: git, Brandon Williams, Jeff King

On 23/08/17 13:57, Lars Schneider wrote:
> 
>> On 23 Aug 2017, at 11:49, Ivan Vyshnevskyi <sainaen@gmail.com> wrote:
>>
>> Commits 47abd85 (fetch: Strip usernames from url's before storing them,
>> 2009-04-17) and later 882d49c (push: anonymize URL in status output,
>> 2016-07-14) made fetch and push strip the authentication part of the
>> remote URLs when used in the merge-commit messages or status outputs.
>> The URLs that are part of the error messages were not anonymized.
>>
>> A commonly used pattern for storing artifacts from a build server in a
>> remote repository utilizes a "secure" environment variable with
>> credentials to embed them in the URL and execute a push. Given enough
>> runs, an intermittent network failure will cause a push to fail, leaving
>> a non-anonymized URL in the build log.
>>
>> To prevent that, reuse the same anonymizing function to scrub
>> credentials from URL in the push error output.
>>
>> Signed-off-by: Ivan Vyshnevskyi <sainaen@gmail.com>
>> ---
>>
>> This is my first attempt to propose a patch, sorry if I did something wrong!
>>
>> I've tested my changes on Travis CI, and the build is green [1].
>>
>> Not sure how much of the background should be included in the commit message.
>> The "commonly used pattern" I mention could be found in the myriad of
>> online tutorials and looks something like this:
>>
>>    git push -fq https://$GIT_CREDS@github.com/$REPO_SLUG
>>
>> Note, that a lot of developers mistakenly assume that '--quiet' ('-q') will
>> suppress all output. Sometimes, they would redirect standard output to
>> /dev/null, without realizing that errors are printed out to stderr.
>>
>> I didn't mention this in the commit, but another typical offender is a tool that
>> calls 'git push' as part of its execution. This is a spectrum that starts with
>> shell scripts, includes simple one-task apps in Python or Ruby, and ends with
>> the plugins for JavaScript, Java, Groovy, and Scala build tools. (I'd like to
>> avoid shaming their authors publicly, but could send you a few examples
>> privately.)
>>
>> I gathered the data by going through build logs of popular open source projects
>> (and projects of their contributors) hosted on GitHub and built by Travis CI.
>> I found about 2.3k unique credentials (but only about nine hundred were active
>> at the time), and more than a half of those were exposed by a failed push. See
>> the advisory from Travis CI [2] for results of their own scan.
>>
>> While the issue is public for several months now and addressed on Travis CI,
>> I keep finding build logs with credentials on the internet. So I think it's
>> worth fixing in the git itself.
>>
>> [1]: https://travis-ci.org/sainaen/git/builds/267180560
>> [2]: https://blog.travis-ci.com/2017-05-08-security-advisory
>>
> 
> This sounds very reasonable to me! Thanks for the contribution!>
Thank you!

> I wonder if we should even extend this. Consider this case:
> 
>   $ git push https://lars:secret@server/org/repo1
>   remote: Invalid username or password.
>   fatal: Authentication failed for 'https://lars:secret@server/org/repo1/'
> 
> I might not have valid credentials for repo1 but my credentials could
> very well be valid for repo2.
> 
> - Lars
> 
Yeah, you're right. In fact, there was a similar scenario:
1. maintainer creates a "<something>-bot" GitHub account to use for
pushing from CI back to the repository, but forgets to add this new
account to the project's organization
2. then they update the build to do the push with new credentials
3. auto-triggered build fails because the "*-bot" has no access yet

After that, they'd typically revoke exposed token and create a new one,
but sometimes they forget, and so the active token stays in the build log.

I found the place where this and couple of other errors seem to be
emitted ('discover_refs()' in remote-curl.c), but, to be honest, it just
seemed harder to figure out for the first patch: do I include the
transport.h here just to use this one function or should I copy it over?
Or move it to url.c (I guess?) and replace other usages?

Though, with some help, I'd be happy to tackle harder cases too. :)

> 
>> builtin/push.c             |  2 +-
>> t/t5541-http-push-smart.sh | 18 ++++++++++++++++++
>> 2 files changed, 19 insertions(+), 1 deletion(-)
>>
>> diff --git a/builtin/push.c b/builtin/push.c
>> index 03846e837..59f3bc975 100644
>> --- a/builtin/push.c
>> +++ b/builtin/push.c
>> @@ -336,7 +336,7 @@ static int push_with_options(struct transport *transport, int flags)
>> 	err = transport_push(transport, refspec_nr, refspec, flags,
>> 			     &reject_reasons);
>> 	if (err != 0)
>> -		error(_("failed to push some refs to '%s'"), transport->url);
>> +		error(_("failed to push some refs to '%s'"), transport_anonymize_url(transport->url));
>>
>> 	err |= transport_disconnect(transport);
>> 	if (!err)
>> diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
>> index d38bf3247..0b6fb6252 100755
>> --- a/t/t5541-http-push-smart.sh
>> +++ b/t/t5541-http-push-smart.sh
>> @@ -377,5 +377,23 @@ test_expect_success 'push status output scrubs password' '
>> 	grep "^To $HTTPD_URL/smart/test_repo.git" status
>> '
>>
>> +cat >"$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update" <<EOF
>> +#!/bin/sh
>> +exit 1
>> +EOF
>> +chmod a+x "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
>> +
>> +cat >exp <<EOF
>> +error: failed to push some refs to '$HTTPD_URL/smart/test_repo.git'
>> +EOF
>> +
>> +test_expect_success 'failed push status output scrubs password' '
>> +	cd "$ROOT_PATH"/test_repo_clone &&
>> +	test_must_fail git push "$HTTPD_URL_USER_PASS/smart/test_repo.git" +HEAD:scrub_err 2>stderr &&
>> +	grep "^error: failed to push some refs" stderr >act &&
>> +	test_i18ncmp exp act
>> +'
>> +rm -f "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
>> +
>> stop_httpd
>> test_done
>> -- 
>> 2.14.1
>>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC] push: anonymize URL in error output
  2017-08-23 15:58 ` Jeff King
@ 2017-08-24 19:01   ` Ivan Vyshnevskyi
  2017-08-25 19:37     ` Brandon Williams
  0 siblings, 1 reply; 7+ messages in thread
From: Ivan Vyshnevskyi @ 2017-08-24 19:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Brandon Williams

On 23/08/17 18:58, Jeff King wrote:
> On Wed, Aug 23, 2017 at 12:49:29PM +0300, Ivan Vyshnevskyi wrote:
> 
>> Commits 47abd85 (fetch: Strip usernames from url's before storing them,
>> 2009-04-17) and later 882d49c (push: anonymize URL in status output,
>> 2016-07-14) made fetch and push strip the authentication part of the
>> remote URLs when used in the merge-commit messages or status outputs.
>> The URLs that are part of the error messages were not anonymized.
>>
>> A commonly used pattern for storing artifacts from a build server in a
>> remote repository utilizes a "secure" environment variable with
>> credentials to embed them in the URL and execute a push. Given enough
>> runs, an intermittent network failure will cause a push to fail, leaving
>> a non-anonymized URL in the build log.
>>
>> To prevent that, reuse the same anonymizing function to scrub
>> credentials from URL in the push error output.
> 
> This makes sense. I suspect that most errors we output should be using
> the anonymized URL. Did you poke around for other calls?
Yes, I tried to check and unfortunately there are couple of places with
possible leaks:
* 'discover_refs()' in remote-curl.c when there's a HTTP error (see a
real-life scenario with an authz error in my response to Lars) -- is it
ok to include transport.h just to use one function or is there a cleaner
way?
* 'setup_push_upstream()' in push.c when a command doesn't have a branch
names (haven't saw problems with this in the wild, but could occur
during the CI setup) -- for this one, probably anonymization should
happen when the 'remote->name' field is set in the 'make_remote()'; same
question though, is it ok to include transport.h here?

Also there's an case of verbose output: I'm not sure I should change it,
but it does print out the non-anonymized URLs at least during push.
> 
> The general structure of the patch looks good, but I have a few minor
> comments below.
> 
>> Not sure how much of the background should be included in the commit message.
>> The "commonly used pattern" I mention could be found in the myriad of
>> online tutorials and looks something like this:
> 
> My knee-jerk reaction is if it's worth writing after the dashes, it's
> worth putting in the commit message.
> 
> However, in the case I think it is OK as-is (the motivation of "we
> already avoid leaking auth info to stdout, so we should do the same for
> error messages" seems self-contained and reasonable)
Well, I tend to be wordy, and most of the commit messages I saw were
rather short, so decided to split. Wonder, if maybe example command
should be included without the rest of it. Would it be useful?
> 
>> diff --git a/builtin/push.c b/builtin/push.c
>> index 03846e837..59f3bc975 100644
>> --- a/builtin/push.c
>> +++ b/builtin/push.c
>> @@ -336,7 +336,7 @@ static int push_with_options(struct transport *transport, int flags)
>>  	err = transport_push(transport, refspec_nr, refspec, flags,
>>  			     &reject_reasons);
>>  	if (err != 0)
>> -		error(_("failed to push some refs to '%s'"), transport->url);
>> +		error(_("failed to push some refs to '%s'"), transport_anonymize_url(transport->url));
> 
> This leaks the return value. That's probably not a _huge_ deal since the
> program is likely to exit, but it's a bad pattern. I wonder if we should
> be setting up transport->anonymous_url preemptively, and just let its
> memory belong to the transport struct.
Ah. Thanks! I knew I'd fail in the memory management even with the
one-line patch. :)

About 'transport->anonymous_url': not sure if it's worth it. There are
four calls to 'transport_anonymize_url' right now and it looks like the
new one in my patch is the first that has a transport struct instance
near by. The next likely candidate for update 'discover_refs()' also
gets the url as an argument.
> 
>> diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
>> index d38bf3247..0b6fb6252 100755
>> --- a/t/t5541-http-push-smart.sh
>> +++ b/t/t5541-http-push-smart.sh
>> @@ -377,5 +377,23 @@ test_expect_success 'push status output scrubs password' '
>>  	grep "^To $HTTPD_URL/smart/test_repo.git" status
>>  '
>>  
>> +cat >"$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update" <<EOF
>> +#!/bin/sh
>> +exit 1
>> +EOF
>> +chmod a+x "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
>> +
>> +cat >exp <<EOF
>> +error: failed to push some refs to '$HTTPD_URL/smart/test_repo.git'
>> +EOF
> 
> I know the t5541 script, which is old and messy, led you into these bad
> constructs. But usually in modern tests we:
> 
>  1. Try to keep all commands inside test_expect blocks to catch
>     unexpected failures or unwanted output.
> 
>  2. Use write_script for writing scripts, like:
> 
>       write_script "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update" <<-\EOF
>       exit 1
>       EOF
> 
>  3. Backslash our here-doc delimiter to suppress interpolation.
> 
>> +test_expect_success 'failed push status output scrubs password' '
>> +	cd "$ROOT_PATH"/test_repo_clone &&
>> +	test_must_fail git push "$HTTPD_URL_USER_PASS/smart/test_repo.git" +HEAD:scrub_err 2>stderr &&
>> +	grep "^error: failed to push some refs" stderr >act &&
>> +	test_i18ncmp exp act
>> +'
>> +rm -f "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
> 
> Similarly, this "rm" should probably be a test_when_finished in the
> block with the write_script (unless you really need to carry it over
> several test_expect blocks, in which case there should be an explicit
> test_expect cleaning it up).
Thanks! You're right. I just followed examples in the file.
Updated [1], will send with the next patch version.

> 
> Instead of grepping for the exact error, should we instead grep for the
> password to make sure it is not present on _any_ line?
> 
> -Peff
> 
One possible issue I see is that this will make it overlap with the
'push status output scrubs password' case above. But if it's not a
problem, I can replace last two lines with just a 'test_i18ngrep !'

[1]:
https://github.com/sainaen/git/blob/af17713/t/t5541-http-push-smart.sh#L380-L392

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC] push: anonymize URL in error output
  2017-08-24 19:01   ` Ivan Vyshnevskyi
@ 2017-08-25 19:37     ` Brandon Williams
  2017-08-26 19:00       ` Jeff King
  0 siblings, 1 reply; 7+ messages in thread
From: Brandon Williams @ 2017-08-25 19:37 UTC (permalink / raw)
  To: Ivan Vyshnevskyi; +Cc: Jeff King, git

On 08/24, Ivan Vyshnevskyi wrote:
> On 23/08/17 18:58, Jeff King wrote:
> > On Wed, Aug 23, 2017 at 12:49:29PM +0300, Ivan Vyshnevskyi wrote:
> > 
> >> Commits 47abd85 (fetch: Strip usernames from url's before storing them,
> >> 2009-04-17) and later 882d49c (push: anonymize URL in status output,
> >> 2016-07-14) made fetch and push strip the authentication part of the
> >> remote URLs when used in the merge-commit messages or status outputs.
> >> The URLs that are part of the error messages were not anonymized.
> >>
> >> A commonly used pattern for storing artifacts from a build server in a
> >> remote repository utilizes a "secure" environment variable with
> >> credentials to embed them in the URL and execute a push. Given enough
> >> runs, an intermittent network failure will cause a push to fail, leaving
> >> a non-anonymized URL in the build log.
> >>
> >> To prevent that, reuse the same anonymizing function to scrub
> >> credentials from URL in the push error output.
> > 
> > This makes sense. I suspect that most errors we output should be using
> > the anonymized URL. Did you poke around for other calls?
> Yes, I tried to check and unfortunately there are couple of places with
> possible leaks:
> * 'discover_refs()' in remote-curl.c when there's a HTTP error (see a
> real-life scenario with an authz error in my response to Lars) -- is it
> ok to include transport.h just to use one function or is there a cleaner
> way?
> * 'setup_push_upstream()' in push.c when a command doesn't have a branch
> names (haven't saw problems with this in the wild, but could occur
> during the CI setup) -- for this one, probably anonymization should
> happen when the 'remote->name' field is set in the 'make_remote()'; same
> question though, is it ok to include transport.h here?
> 
> Also there's an case of verbose output: I'm not sure I should change it,
> but it does print out the non-anonymized URLs at least during push.
> > 
> > The general structure of the patch looks good, but I have a few minor
> > comments below.
> > 
> >> Not sure how much of the background should be included in the commit message.
> >> The "commonly used pattern" I mention could be found in the myriad of
> >> online tutorials and looks something like this:
> > 
> > My knee-jerk reaction is if it's worth writing after the dashes, it's
> > worth putting in the commit message.
> > 
> > However, in the case I think it is OK as-is (the motivation of "we
> > already avoid leaking auth info to stdout, so we should do the same for
> > error messages" seems self-contained and reasonable)
> Well, I tend to be wordy, and most of the commit messages I saw were
> rather short, so decided to split. Wonder, if maybe example command
> should be included without the rest of it. Would it be useful?

I'm guilty of writing short commit messages (something I need to work
on) but when looking through logs I much prefer to see longer messages
explaining rationals and trade-offs.

> > 
> >> diff --git a/builtin/push.c b/builtin/push.c
> >> index 03846e837..59f3bc975 100644
> >> --- a/builtin/push.c
> >> +++ b/builtin/push.c
> >> @@ -336,7 +336,7 @@ static int push_with_options(struct transport *transport, int flags)
> >>  	err = transport_push(transport, refspec_nr, refspec, flags,
> >>  			     &reject_reasons);
> >>  	if (err != 0)
> >> -		error(_("failed to push some refs to '%s'"), transport->url);
> >> +		error(_("failed to push some refs to '%s'"), transport_anonymize_url(transport->url));
> > 
> > This leaks the return value. That's probably not a _huge_ deal since the
> > program is likely to exit, but it's a bad pattern. I wonder if we should
> > be setting up transport->anonymous_url preemptively, and just let its
> > memory belong to the transport struct.
> Ah. Thanks! I knew I'd fail in the memory management even with the
> one-line patch. :)
> 
> About 'transport->anonymous_url': not sure if it's worth it. There are
> four calls to 'transport_anonymize_url' right now and it looks like the
> new one in my patch is the first that has a transport struct instance
> near by. The next likely candidate for update 'discover_refs()' also
> gets the url as an argument.
> > 
> >> diff --git a/t/t5541-http-push-smart.sh b/t/t5541-http-push-smart.sh
> >> index d38bf3247..0b6fb6252 100755
> >> --- a/t/t5541-http-push-smart.sh
> >> +++ b/t/t5541-http-push-smart.sh
> >> @@ -377,5 +377,23 @@ test_expect_success 'push status output scrubs password' '
> >>  	grep "^To $HTTPD_URL/smart/test_repo.git" status
> >>  '
> >>  
> >> +cat >"$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update" <<EOF
> >> +#!/bin/sh
> >> +exit 1
> >> +EOF
> >> +chmod a+x "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
> >> +
> >> +cat >exp <<EOF
> >> +error: failed to push some refs to '$HTTPD_URL/smart/test_repo.git'
> >> +EOF
> > 
> > I know the t5541 script, which is old and messy, led you into these bad
> > constructs. But usually in modern tests we:
> > 
> >  1. Try to keep all commands inside test_expect blocks to catch
> >     unexpected failures or unwanted output.
> > 
> >  2. Use write_script for writing scripts, like:
> > 
> >       write_script "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update" <<-\EOF
> >       exit 1
> >       EOF
> > 
> >  3. Backslash our here-doc delimiter to suppress interpolation.
> > 
> >> +test_expect_success 'failed push status output scrubs password' '
> >> +	cd "$ROOT_PATH"/test_repo_clone &&
> >> +	test_must_fail git push "$HTTPD_URL_USER_PASS/smart/test_repo.git" +HEAD:scrub_err 2>stderr &&
> >> +	grep "^error: failed to push some refs" stderr >act &&
> >> +	test_i18ncmp exp act
> >> +'
> >> +rm -f "$HTTPD_DOCUMENT_ROOT_PATH/test_repo.git/hooks/update"
> > 
> > Similarly, this "rm" should probably be a test_when_finished in the
> > block with the write_script (unless you really need to carry it over
> > several test_expect blocks, in which case there should be an explicit
> > test_expect cleaning it up).
> Thanks! You're right. I just followed examples in the file.
> Updated [1], will send with the next patch version.
> 
> > 
> > Instead of grepping for the exact error, should we instead grep for the
> > password to make sure it is not present on _any_ line?
> > 
> > -Peff
> > 
> One possible issue I see is that this will make it overlap with the
> 'push status output scrubs password' case above. But if it's not a
> problem, I can replace last two lines with just a 'test_i18ngrep !'
> 
> [1]:
> https://github.com/sainaen/git/blob/af17713/t/t5541-http-push-smart.sh#L380-L392

-- 
Brandon Williams

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC] push: anonymize URL in error output
  2017-08-25 19:37     ` Brandon Williams
@ 2017-08-26 19:00       ` Jeff King
  0 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2017-08-26 19:00 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Ivan Vyshnevskyi, git

On Fri, Aug 25, 2017 at 12:37:43PM -0700, Brandon Williams wrote:

> > > My knee-jerk reaction is if it's worth writing after the dashes, it's
> > > worth putting in the commit message.
> > > 
> > > However, in the case I think it is OK as-is (the motivation of "we
> > > already avoid leaking auth info to stdout, so we should do the same for
> > > error messages" seems self-contained and reasonable)
> > Well, I tend to be wordy, and most of the commit messages I saw were
> > rather short, so decided to split. Wonder, if maybe example command
> > should be included without the rest of it. Would it be useful?
> 
> I'm guilty of writing short commit messages (something I need to work
> on) but when looking through logs I much prefer to see longer messages
> explaining rationals and trade-offs.

I think as with all writing, there is both "too short" and "too long".
I like having those extra bits in the commit message, too, but you have
to make sure they don't drown out the main points.

I find that when a message gets long, it often benefits from revising
and re-ordering. E.g., having an intro paragraph that explains the
motivation and solution in one sentence, and _then_ go into the details
for people who are really digging into the details. Good organization
lets readers get just the level they need and skip the rest.

Easier said than done, of course. :)

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-08-26 19:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-23  9:49 [PATCH/RFC] push: anonymize URL in error output Ivan Vyshnevskyi
2017-08-23 10:57 ` Lars Schneider
2017-08-24 18:58   ` Ivan Vyshnevskyi
2017-08-23 15:58 ` Jeff King
2017-08-24 19:01   ` Ivan Vyshnevskyi
2017-08-25 19:37     ` Brandon Williams
2017-08-26 19:00       ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).