git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Elia Pinto <gitter.spiros@gmail.com>
Cc: git@vger.kernel.org, tboegi@web.de, ramsay@ramsayjones.plus.com,
	gitster@pobox.com, sunshine@sunshineco.com
Subject: Re: [PATCHv4 1/2] http.c: implement the GIT_TRACE_CURL environment variable
Date: Thu, 28 Apr 2016 10:47:10 -0400
Message-ID: <20160428144710.GD25364@sigill.intra.peff.net>
In-Reply-To: <20160428115748.37177-2-gitter.spiros@gmail.com>

On Thu, Apr 28, 2016 at 11:57:47AM +0000, Elia Pinto wrote:

> +static void curl_dump(const char *text, unsigned char *ptr, size_t size, char nohex, char nopriv)

We usually use "int" for our boolean flags. Space savings don't matter
outside of a struct (and if they did, you should be using a single flags
field), and this way the user does not have to guess whether the "char"
is significant.
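
For illustration, a single flags field could look something like the
sketch below. The DUMP_* names and dump_width() are made up for this
example and are not part of the patch:

```c
/* Hypothetical flag names, for illustration only. */
#define DUMP_NOHEX  (1u << 0)
#define DUMP_NOPRIV (1u << 1)

/* Pick the output width from a single "flags" argument. */
static unsigned int dump_width(unsigned int flags)
{
	/* without the hex output, we can fit more on screen */
	return (flags & DUMP_NOHEX) ? 80 : 16;
}
```

A caller would then pass "DUMP_NOHEX | DUMP_NOPRIV" instead of two
separate "char" arguments.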

It looks like we never pass anything but "1" for nohex. Can we drop this
parameter entirely? But see below...

> +{
> +	size_t i;
> +	struct strbuf out = STRBUF_INIT;
> +	unsigned int width = 0x10;
> +
> +	/* without the hex output, we can fit more on screen */
> +	if (nohex) width = 0x50;

Maybe it is just me, but I think this is more readable using decimal
constants. I mind it less in checking ASCII values like 0x20, but here I
think just saying "80" is more customary.

> +	for (i = 0; i < size; i += width) {
> +		size_t w;
> +		strbuf_addf(&out, "%s: ", text);

I really like this new format. Doing:

  GIT_TRACE_CURL=1 git ... 2>&1 | grep '=> Send header: '

is very readable.

However, I did run into an interesting case. The output looks like:

  10:24:04.540803 http.c:527              => Send header: Host: github.com
  10:24:04.540809 http.c:527              => Send header: x
  10:24:04.540811 http.c:527              => Send header: User-Agent: git/2.8.1.341.g2caf4c9.dirty

What's that weird "x" line?

It turns out that the line before it is:

  Authorization: Basic some-really-long-opaque-token-that-ends-in-x

Since we break at a newline _or_ at the width, the tail of that header
spills onto the following line. The first chunk contains
"Authorization:" and hits the code below that suppresses the output, but
the spilled-over tail does not.

So not only do I find the breaking of the line hard to read, but it
means we may leak data from the Authorization line that got broken into
the next chunk (here it was only one character, but with a sufficiently
long header, it could be real data).

So I think we probably want to _just_ break at newlines, however long
the lines are.

But that probably isn't a good idea for binary data. So I'd suggest that
sending/receiving headers break on newlines, and actual body data should
respect the width field (we may still have line-oriented data in the
body which would be easier to read without line-breaking, but if you are
debugging that you are better off with GIT_TRACE_PACKET anyway).
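
A rough, standalone sketch of what the line-oriented half might look
like is below. It is not the code from the patch: dump_header_lines() is
a hypothetical helper, it writes to a stdio stream rather than through
the trace machinery, and it assumes the header data is text.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not from the patch: print each line of a header
 * block on its own output line, however long it is. Lines end at "\n";
 * a "\r" just before it is dropped. Returns the number of lines printed.
 */
static int dump_header_lines(FILE *fp, const char *text,
			     const char *ptr, size_t size)
{
	size_t start = 0;
	int lines = 0;

	while (start < size) {
		const char *nl = memchr(ptr + start, '\n', size - start);
		size_t end = nl ? (size_t)(nl - ptr) : size;
		size_t len = end - start;

		if (len && ptr[start + len - 1] == '\r')
			len--;
		fprintf(fp, "%s: %.*s\n", text, (int)len, ptr + start);
		lines++;
		start = end + 1; /* skip past the newline, if any */
	}
	return lines;
}
```

With this shape there is no width to compensate for, and a long
Authorization line can never be split across two output lines.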

> +		 for (w = 0; (w < width) && (i + w < size); w++) {
> +			   if (nohex && (i + w + 1 < size) && ptr[i + w] == '\r'
> +				&& ptr[i + w + 1] == '\n') {
> +				    i += (w + 2 - width);
> +				    break;
> +			   }

This loop puzzled me for a bit. When we end early due to a newline, we
subtract out the width here. I guess that's to accommodate the "i +=
width" that the outer for-loop is going to do.
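
To spell out the arithmetic, here is a tiny standalone sketch
(next_offset() is a made-up name, used only for this illustration).
Note that "w + 2 - width" is usually "negative"; since these are
unsigned types it wraps around, and adding the width back afterwards
wraps it back to the intended value:

```c
#include <stddef.h>

/*
 * Hypothetical illustration of the offset bookkeeping in the quoted
 * loop: "i" is the chunk start, "w" the offset of the "\r" within the
 * chunk, "width" the chunk width.
 */
static size_t next_offset(size_t i, size_t w, unsigned int width)
{
	i += (w + 2 - width);	/* inner loop; may wrap, which is well-defined for unsigned */
	i += width;		/* what the outer for-loop does next */
	return i;		/* net effect: i + w + 2, just past the "\r\n" */
}
```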

If you follow my suggestion above to split the code paths for
line-oriented and fixed-width data, then this all gets much simpler.

> +		/* if we are called with nopriv we skip the Authorization field if present
> +		 * and print a blank line
> +		*/
> +		if ( nopriv && strstr(out.buf, "Authorization:"))
> +			strbuf_reset(&out);

Style: multi-line comments should look like:

  /*
   * the comment
   * goes here
   */

and there should be no whitespace after the opening "(".

Removing the field entirely may be a bit confusing when you're
debugging. Instead, perhaps we can just redact the interesting bits,
like:

diff --git a/http.c b/http.c
index 8ab0adc..30e8858 100644
--- a/http.c
+++ b/http.c
@@ -481,7 +481,11 @@ static void curl_dump(const char *text, unsigned char *ptr, size_t size, char no
 
 	for (i = 0; i < size; i += width) {
 		size_t w;
+		size_t prefix_len;
+		const char *header;
+
 		strbuf_addf(&out, "%s: ", text);
+		prefix_len = out.len;
 		if (!nohex) {
 			for (w = 0; w < width; w++)
 				if (i + w < size)
@@ -507,8 +511,17 @@ static void curl_dump(const char *text, unsigned char *ptr, size_t size, char no
 		/* if we are called with nopriv we skip the Authorization field if present
 		 * and print a blank line
 		*/
-		if ( nopriv && strstr(out.buf, "Authorization:"))
-			strbuf_reset(&out);
+		if (nopriv &&
+		    skip_prefix(out.buf + prefix_len, "Authorization:", &header)) {
+			/* The first token is the type, which is OK to log */
+			while (isspace(*header))
+				header++;
+			while (*header && !isspace(*header))
+				header++;
+			/* Everything else is opaque and possibly sensitive */
+			strbuf_setlen(&out, header - out.buf);
+			strbuf_addstr(&out, " <redacted>");
+		}
 
 		strbuf_addch(&out, '\n');
 		trace_strbuf(&trace_curl, &out);

That tells the viewer that we did in fact send the header (which is
useful to know), and which type it used.
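
For anyone who wants to play with the idea outside of git, a minimal
standalone version might look like the sketch below. It is only an
approximation of the diff above: redact_authorization() is a
hypothetical name, it works on a plain NUL-terminated buffer instead of
a strbuf, and like the diff it matches the header name case-sensitively.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical standalone version of the redaction idea. Keeps the
 * header name and the auth type, and overwrites everything after that
 * with " <redacted>" (truncated if the buffer is too small).
 * Returns 1 if the line was redacted, 0 if it was left alone.
 */
static int redact_authorization(char *line, size_t bufsize)
{
	static const char name[] = "Authorization:";
	char *p;

	if (strncmp(line, name, strlen(name)))
		return 0;
	p = line + strlen(name);

	/* The first token is the type, which is OK to log */
	while (isspace((unsigned char)*p))
		p++;
	while (*p && !isspace((unsigned char)*p))
		p++;

	/* Everything else is opaque and possibly sensitive */
	snprintf(p, bufsize - (size_t)(p - line), "%s", " <redacted>");
	return 1;
}
```

Given "Authorization: Basic c2VjcmV0", this leaves
"Authorization: Basic <redacted>" in the buffer.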

> +		strbuf_addch(&out, '\n');
> +		trace_strbuf(&trace_curl, &out);
> +		strbuf_release(&out);
> +	}
> +}

This is the only strbuf_release() in the function, and it's inside the
loop. Yet we use the strbuf to print the initial line (and do a reset()
after). So if the field we get is 0 bytes, we'll leak the strbuf memory
used by the initial line.

I don't know if that's possible with curl or not. But just in case, we
could structure the loop more like:

  ... output initial line ...
  for (i = 0; i < size; i += width) {
	strbuf_reset(&out);
	... output data line ...
  }
  strbuf_release(&out);

That has the added bonus that we do not have to reallocate for each
iteration of the loop (we just reset the length back to zero each time,
and then free the memory at the very end).
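
Here is a self-contained sketch of that loop shape. It does not use
git's strbuf; "struct buf" and the buf_*() helpers are a minimal
stand-in invented for this example, and dump_chunks() is a hypothetical
name. The data is written with fwrite() so embedded NULs do not
truncate a line.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for a growable buffer; git's strbuf is richer. */
struct buf {
	char *data;
	size_t len, alloc;
};

static void buf_add(struct buf *b, const char *s, size_t n)
{
	if (b->len + n > b->alloc) {
		size_t alloc = (b->len + n) * 2;
		char *data = realloc(b->data, alloc);
		if (!data)
			abort();
		b->data = data;
		b->alloc = alloc;
	}
	memcpy(b->data + b->len, s, n);
	b->len += n;
}

/* Forget the contents but keep the allocation for the next line. */
static void buf_reset(struct buf *b)
{
	b->len = 0;
}

static void buf_release(struct buf *b)
{
	free(b->data);
	memset(b, 0, sizeof(*b));
}

/* Returns the number of lines written, including the initial one. */
static int dump_chunks(FILE *fp, const char *text,
		       const char *ptr, size_t size, size_t width)
{
	struct buf out = { NULL, 0, 0 };
	size_t i;
	int lines = 1;

	if (!width)
		width = 1; /* avoid looping forever on a zero width */

	/* initial line; the buffer is allocated here even if size is 0 */
	buf_add(&out, text, strlen(text));
	buf_add(&out, "\n", 1);
	fwrite(out.data, 1, out.len, fp);

	for (i = 0; i < size; i += width) {
		size_t n = size - i < width ? size - i : width;

		buf_reset(&out);
		buf_add(&out, text, strlen(text));
		buf_add(&out, ": ", 2);
		buf_add(&out, ptr + i, n);
		buf_add(&out, "\n", 1);
		fwrite(out.data, 1, out.len, fp);
		lines++;
	}

	/* single release, reached even when the loop body never runs */
	buf_release(&out);
	return lines;
}
```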

-Peff
