git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Martin Langhoff" <martin.langhoff@gmail.com>,
	"Git Mailing List" <git@vger.kernel.org>
Subject: [PATCH 1/3] transport_anonymize_url(): support retaining username
Date: Sun, 19 May 2019 01:10:32 -0400	[thread overview]
Message-ID: <20190519051031.GA19434@sigill.intra.peff.net> (raw)
In-Reply-To: <20190519050724.GA26179@sigill.intra.peff.net>

When we anonymize URLs to show in messages, we strip out both the
username and password (if any). But there are also contexts where we
should strip out the password (to avoid leaking it) but retain the
username.

Let's generalize transport_anonymize_url() to support both cases. We'll
give it a new name since the password-only mode isn't really
"anonymizing", but keep the old name as a synonym to avoid disrupting
existing callers.

Note that there are actually three places we parse URLs, and this
functionality _could_ go into any of them:

  - transport_anonymize_url(), which we modify here

  - the urlmatch.c code parses a URL into its constituent parts, from
    which we could easily remove the elements we want to drop and
    re-format it as a single URL. But its parsing also normalizes
    elements (e.g., downcasing hostnames).  This isn't wrong, but it's
    more friendly if we can leave the rest of the URL untouched.

  - credential_form_url() parses a URL and decodes the specific
    elements, but it's hard to convert it back into a regular URL. It
    treats "host:port" as a single unit, meaning it needs to be
    re-encoded specially (since a colon would otherwise end
    percent-encoded).

Since transport_anonymize_url() seemed closest to what we want here, I
used that as the base.

Signed-off-by: Jeff King <peff@peff.net>
---
I think it would be beneficial to unify these three cases under a single
parser, but it seemed like too big a rabbit hole for this topic. Of the
three, the urlmatch one seems the most mature. I think if we could
simply separate the normalization from the parsing/decoding, the others
could build on top of it. It might also require some careful thinking
about how pseudo-urls like ssh "host:path" interact.

I won't call that a #leftoverbits, because it's more of a feast. :)

 transport.c | 21 ++++++++++++++-------
 transport.h | 11 ++++++++++-
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/transport.c b/transport.c
index f1fcd2c4b0..ba61e57295 100644
--- a/transport.c
+++ b/transport.c
@@ -1335,11 +1335,7 @@ int transport_disconnect(struct transport *transport)
 	return ret;
 }
 
-/*
- * Strip username (and password) from a URL and return
- * it in a newly allocated string.
- */
-char *transport_anonymize_url(const char *url)
+char *transport_strip_url(const char *url, int strip_user)
 {
 	char *scheme_prefix, *anon_part;
 	size_t anon_len, prefix_len = 0;
@@ -1348,7 +1344,10 @@ char *transport_anonymize_url(const char *url)
 	if (url_is_local_not_ssh(url) || !anon_part)
 		goto literal_copy;
 
-	anon_len = strlen(++anon_part);
+	anon_len = strlen(anon_part);
+	if (strip_user)
+		anon_part++;
+
 	scheme_prefix = strstr(url, "://");
 	if (!scheme_prefix) {
 		if (!strchr(anon_part, ':'))
@@ -1373,7 +1372,15 @@ char *transport_anonymize_url(const char *url)
 		cp = strchr(scheme_prefix + 3, '/');
 		if (cp && cp < anon_part)
 			goto literal_copy;
-		prefix_len = scheme_prefix - url + 3;
+
+		if (strip_user)
+			prefix_len = scheme_prefix - url + 3;
+		else {
+			cp = strchr(scheme_prefix + 3, ':');
+			if (cp && cp > anon_part)
+				goto literal_copy; /* username only */
+			prefix_len = cp - url;
+		}
 	}
 	return xstrfmt("%.*s%.*s", (int)prefix_len, url,
 		       (int)anon_len, anon_part);
diff --git a/transport.h b/transport.h
index 06e06d3d89..6d8c99ac91 100644
--- a/transport.h
+++ b/transport.h
@@ -243,10 +243,19 @@ const struct ref *transport_get_remote_refs(struct transport *transport,
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_pack(struct transport *transport);
 int transport_disconnect(struct transport *transport);
-char *transport_anonymize_url(const char *url);
 void transport_take_over(struct transport *transport,
 			 struct child_process *child);
 
+/*
+ * Strip password and optionally username from a URL and return
+ * it in a newly allocated string (even if nothing was stripped).
+ */
+char *transport_strip_url(const char *url, int strip_username);
+static inline char *transport_anonymize_url(const char *url)
+{
+	return transport_strip_url(url, 1);
+}
+
 int transport_connect(struct transport *transport, const char *name,
 		      const char *exec, int fd[2]);
 
-- 
2.22.0.rc0.583.g23d90da2b3


  reply	other threads:[~2019-05-19 18:37 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-15 17:49 Git ransom campaign incident report - May 2019 Martin Langhoff
2019-05-15 18:59 ` Ævar Arnfjörð Bjarmason
2019-05-16  4:27   ` Jeff King
2019-05-17 19:39     ` Johannes Schindelin
2019-05-17 22:20       ` Jeff King
2019-05-17 23:13         ` Martin Langhoff
2019-05-19  5:07         ` Jeff King
2019-05-19  5:10           ` Jeff King [this message]
2019-05-19 23:28             ` [PATCH 1/3] transport_anonymize_url(): support retaining username Eric Sunshine
2019-05-20 16:14             ` René Scharfe
2019-05-20 16:36             ` Johannes Schindelin
2019-05-20 16:43             ` Johannes Schindelin
2019-05-19  5:12           ` [PATCH 2/3] clone: avoid storing URL passwords in config Jeff King
2019-05-19  5:16           ` [PATCH 3/3] clone: auto-enable git-credential-store when necessary Jeff King
2019-05-20 11:28             ` Eric Sunshine
2019-05-20 12:31               ` Jeff King
2019-05-20 16:48                 ` Johannes Schindelin
2019-05-20 13:56             ` Ævar Arnfjörð Bjarmason
2019-05-20 14:08               ` Jeff King
2019-05-20 15:17                 ` Ævar Arnfjörð Bjarmason
2019-05-20 15:24                   ` Jeff King
2019-05-20 17:08             ` Ævar Arnfjörð Bjarmason
2019-05-20 14:43           ` Git ransom campaign incident report - May 2019 Johannes Schindelin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190519051031.GA19434@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=martin.langhoff@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).