From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jeff King Subject: Re: [PATCHv4 1/2] http.c: implement the GIT_TRACE_CURL environment variable Date: Thu, 28 Apr 2016 10:47:10 -0400 Message-ID: <20160428144710.GD25364@sigill.intra.peff.net> References: <20160428115748.37177-1-gitter.spiros@gmail.com> <20160428115748.37177-2-gitter.spiros@gmail.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Cc: git@vger.kernel.org, tboegi@web.de, ramsay@ramsayjones.plus.com, gitster@pobox.com, sunshine@sunshineco.com To: Elia Pinto X-From: git-owner@vger.kernel.org Thu Apr 28 16:47:47 2016 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1avnEI-0005KP-LU for gcvg-git-2@plane.gmane.org; Thu, 28 Apr 2016 16:47:47 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752861AbcD1OrZ (ORCPT ); Thu, 28 Apr 2016 10:47:25 -0400 Received: from cloud.peff.net ([50.56.180.127]:58375 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1752819AbcD1OrY (ORCPT ); Thu, 28 Apr 2016 10:47:24 -0400 Received: (qmail 29089 invoked by uid 102); 28 Apr 2016 14:47:13 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Thu, 28 Apr 2016 10:47:13 -0400 Received: (qmail 10031 invoked by uid 107); 28 Apr 2016 14:47:14 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Thu, 28 Apr 2016 10:47:14 -0400 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Thu, 28 Apr 2016 10:47:10 -0400 Content-Disposition: inline In-Reply-To: <20160428115748.37177-2-gitter.spiros@gmail.com> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: On Thu, Apr 28, 2016 at 11:57:47AM +0000, Elia Pinto wrote: > +static void curl_dump(const char *text, unsigned char *ptr, size_t size, char nohex, char nopriv) We usually use "int" for our boolean flags. Space savings don't matter outside of a struct (and if they did, you should be using a single flags field), and this way the user does not have to guess whether the "char" is significant. It looks like we never pass anything but "1" for nohex. Can we drop this parameter entirely? But see below... > +{ > + size_t i; > + struct strbuf out = STRBUF_INIT; > + unsigned int width = 0x10; > + > + /* without the hex output, we can fit more on screen */ > + if (nohex) width = 0x50; Maybe it is just me, but I think this is more readable using decimal constants. I mind it less in checking ASCII values like 0x20, but here I think just saying "80" is more customary. > + for (i = 0; i < size; i += width) { > + size_t w; > + strbuf_addf(&out, "%s: ", text); I really like this new format. Doing: GIT_TRACE_CURL=1 git ... 2>&1 | grep '=> Send header: ' is very readable. However, I did run into an interesting case. The output looks like: 10:24:04.540803 http.c:527 => Send header: Host: github.com 10:24:04.540809 http.c:527 => Send header: x 10:24:04.540811 http.c:527 => Send header: User-Agent: git/2.8.1.341.g2caf4c9.dirty What's that weird "x" line? It turns out that the line before it is: Authorization: Basic some-really-long-opaque-token-that-ends-in-x Since we break at a newline _or_ at the width, that gets broken onto the following line. The Authorization line hits the code below to suppress the output. So not only do I find the breaking of the line hard to read, but it means we may leak data from the Authorization line that got broken into the next chunk (here it was only one character, but with a sufficiently long header, it could be real data). So I think we probably want to _just_ break at newlines, however long they are. But that probably isn't a good idea for binary data. So I'd suggest that sending/receiving headers break on newlines, and actual body data should respect the width field (we may still have line-oriented data in the body which would be easier to read without line-breaking, but if you are debugging that you are better off with GIT_TRACE_PACKET anyway). > + for (w = 0; (w < width) && (i + w < size); w++) { > + if (nohex && (i + w + 1 < size) && ptr[i + w] == '\r' > + && ptr[i + w + 1] == '\n') { > + i += (w + 2 - width); > + break; > + } This loop puzzled me for a bit. When we end early due to a newline, we subtract out the width here. I guess that's to accomodate the "i += width" that the outer for-loop is going to do. If you follow my suggestion above to split the code paths for line-oriented and fixed-width data, then this all gets much simpler. > + /* if we are called with nopriv we skip the Authorization field if present > + * and print a blank line > + */ > + if ( nopriv && strstr(out.buf, "Authorization:")) > + strbuf_reset(&out); Style: multi-line comments should look like: /* * the comment * goes here */ and there should be no whitespace after the opening "(". Removing the field entirely may be a bit confusing when you're debugging. Instead, perhaps we can just redact the interesting bits, like: diff --git a/http.c b/http.c index 8ab0adc..30e8858 100644 --- a/http.c +++ b/http.c @@ -481,7 +481,11 @@ static void curl_dump(const char *text, unsigned char *ptr, size_t size, char no for (i = 0; i < size; i += width) { size_t w; + size_t prefix_len; + const char *header; + strbuf_addf(&out, "%s: ", text); + prefix_len = out.len; if (!nohex) { for (w = 0; w < width; w++) if (i + w < size) @@ -507,8 +511,17 @@ static void curl_dump(const char *text, unsigned char *ptr, size_t size, char no /* if we are called with nopriv we skip the Authorization field if present * and print a blank line */ - if ( nopriv && strstr(out.buf, "Authorization:")) - strbuf_reset(&out); + if (nopriv && + skip_prefix(out.buf + prefix_len, "Authorization:", &header)) { + /* The first token is the type, which is OK to log */ + while (isspace(*header)) + header++; + while (*header && !isspace(*header)) + header++; + /* Everything else is opaque and possibly sensitive */ + strbuf_setlen(&out, header - out.buf); + strbuf_addstr(&out, " "); + } strbuf_addch(&out, '\n'); trace_strbuf(&trace_curl, &out); That tells the viewer that we did in fact send the header (which is useful to know), and which type it used. > + strbuf_addch(&out, '\n'); > + trace_strbuf(&trace_curl, &out); > + strbuf_release(&out); > + } > +} This is the only strbuf_release() in the function, and it's inside the loop. Yet we use the strbuf to print the initial line (and do a reset() after). So if the field we get is 0 bytes, we'll leak the strbuf memory used by the initial line. I don't know if that's possible with curl or not. But just in case, we could structure the loop more like: ... output initial line ... for (i = 0; i < size; i += width) { strbuf_reset(&out); ... output data line ... } strbuf_release(&out); That has the added bonus that we do not have to reallocate for each iteration of the loop (we just reset the length back to zero each time, and then free the memory at the very end). -Peff