git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] sanitizing error message contents
@ 2017-01-11 14:01 Jeff King
  2017-01-11 14:02 ` [PATCH 1/2] Revert "vreportf: avoid intermediate buffer" Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jeff King @ 2017-01-11 14:01 UTC (permalink / raw)
  To: git

When adding a warning() call in 50d341374 (http: make redirects more
obvious, 2016-12-06), somebody brought up that evil servers can redirect
you to something like:

  https://evil.example.com/some/repo?unused=\rwarning:+rainbows+and_unicorns_ahead

(where "\r" is a literal CR), and instead of seeing:

  warning: redirecting to https://evil.example.com/...

you just get:

  warning: rainbows and unicorns ahead

or whatever innocuous looking line they prefer (probably just ANSI
"clear to beginning of line" would be even more effective).

Since it's hard to figure out which error messages could potentially
contain malicious contents, and since spewing control characters to the
terminal is generally bad anyway, this series sanitizes at the lowest
level.

Note that this doesn't cover "remote:" lines coming over the sideband.
Those are already covered for "\r", as we have to parse it to handle
printing "remote:" consistently. But you can play tricks like putting:

  printf '\0331K\033[0Efatal: this looks local\n'

into a pre-receive hook. I'm not sure if we would want to do more
sanitizing there. The goal of this series is not so much that a remote
can't send funny strings that may look local, but that they can't
prevent local strings from being displayed. OTOH, I suspect clever use
of ANSI codes (moving the cursor, clearing lines, etc) could get you
pretty far.

I'd be hesitant to disallow control codes entirely, though, as I suspect
some servers do send colors over the sideband. So I punted on that here,
but I think this is at least an incremental improvement.

  [1/2]: Revert "vreportf: avoid intermediate buffer"
  [2/2]: vreport: sanitize ASCII control chars

 usage.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] Revert "vreportf: avoid intermediate buffer"
  2017-01-11 14:01 [PATCH 0/2] sanitizing error message contents Jeff King
@ 2017-01-11 14:02 ` Jeff King
  2017-01-11 14:02 ` [PATCH 2/2] vreport: sanitize ASCII control chars Jeff King
  2017-01-11 14:07 ` [RFC PATCH 3/2] vreportf: add prefix to each line Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2017-01-11 14:02 UTC (permalink / raw)
  To: git

This reverts commit f4c3edc0b156362a92bf9de4f0ec794e90a757fc.

The purpose of that commit was to let us write errors of
arbitrary length to stderr by skipping the intermediate
buffer and sending our varargs straight to fprintf. That
works, but it comes with a downside: we do not get access to
the varargs before they are sent to stderr.

On balance, it's not a good tradeoff. Error messages larger
than our 4K buffer are quite uncommon, and we've lost the
ability to make any modifications to the output (e.g., to
remove non-printable characters).

The only way to have both would be one of:

  1. Write into a dynamic buffer. But this is a bad idea for
     a low-level function that may be called when malloc()
     has failed.

  2. Do our own printf-format varargs parsing. This is too
     complex to be worth the trouble.

Let's just revert that change and go back to a fixed buffer.

Signed-off-by: Jeff King <peff@peff.net>
---
 usage.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/usage.c b/usage.c
index 17f52c1b5..b1cbe6799 100644
--- a/usage.c
+++ b/usage.c
@@ -7,21 +7,13 @@
 #include "cache.h"
 
 static FILE *error_handle;
-static int tweaked_error_buffering;
 
 void vreportf(const char *prefix, const char *err, va_list params)
 {
+	char msg[4096];
 	FILE *fh = error_handle ? error_handle : stderr;
-
-	fflush(fh);
-	if (!tweaked_error_buffering) {
-		setvbuf(fh, NULL, _IOLBF, 0);
-		tweaked_error_buffering = 1;
-	}
-
-	fputs(prefix, fh);
-	vfprintf(fh, err, params);
-	fputc('\n', fh);
+	vsnprintf(msg, sizeof(msg), err, params);
+	fprintf(fh, "%s%s\n", prefix, msg);
 }
 
 static NORETURN void usage_builtin(const char *err, va_list params)
@@ -93,7 +85,6 @@ void set_die_is_recursing_routine(int (*routine)(void))
 void set_error_handle(FILE *fh)
 {
 	error_handle = fh;
-	tweaked_error_buffering = 0;
 }
 
 void NORETURN usagef(const char *err, ...)
-- 
2.11.0.627.gfa6151259


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] vreport: sanitize ASCII control chars
  2017-01-11 14:01 [PATCH 0/2] sanitizing error message contents Jeff King
  2017-01-11 14:02 ` [PATCH 1/2] Revert "vreportf: avoid intermediate buffer" Jeff King
@ 2017-01-11 14:02 ` Jeff King
  2017-01-11 14:07 ` [RFC PATCH 3/2] vreportf: add prefix to each line Jeff King
  2 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2017-01-11 14:02 UTC (permalink / raw)
  To: git

Our error() and die() calls may report messages with
arbitrary data (e.g., filenames or even data from a remote
server). Let's make it harder to cause confusion with
mischievous filenames. E.g., try:

  git rev-parse "$(printf "\rfatal: this argument is too sneaky")" --

or

  git rev-parse "$(printf "\x1b[5mblinky\x1b[0m")" --

Let's block all ASCII control characters, with the exception
of TAB and LF. We use both in our own messages (and we are
necessarily sanitizing the complete output of snprintf here,
as we do not have access to the individual varargs). And TAB
and LF are unlikely to cause confusion (you could put
"\nfatal: sneaky\n" in your filename, but it would at least
not _cover up_ the message leading to it, unlike "\r").

We'll replace the characters with a "?", which is similar to
how "ls" behaves. It might be nice to do something less
lossy, like converting them to "\x" hex codes. But replacing
with a single character makes it easy to do in-place and
without worrying about length limitations. This feature
should kick in rarely enough that the "?" marks are almost
never seen.

We'll leave high-bit characters as-is, as they are likely to
be UTF-8 (though there may be some Unicode mischief you
could cause, which may require further patches).

Signed-off-by: Jeff King <peff@peff.net>
---
 usage.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/usage.c b/usage.c
index b1cbe6799..ad6d2910f 100644
--- a/usage.c
+++ b/usage.c
@@ -12,7 +12,13 @@ void vreportf(const char *prefix, const char *err, va_list params)
 {
 	char msg[4096];
 	FILE *fh = error_handle ? error_handle : stderr;
+	char *p;
+
 	vsnprintf(msg, sizeof(msg), err, params);
+	for (p = msg; *p; p++) {
+		if (iscntrl(*p) && *p != '\t' && *p != '\n')
+			*p = '?';
+	}
 	fprintf(fh, "%s%s\n", prefix, msg);
 }
 
-- 
2.11.0.627.gfa6151259

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [RFC PATCH 3/2] vreportf: add prefix to each line
  2017-01-11 14:01 [PATCH 0/2] sanitizing error message contents Jeff King
  2017-01-11 14:02 ` [PATCH 1/2] Revert "vreportf: avoid intermediate buffer" Jeff King
  2017-01-11 14:02 ` [PATCH 2/2] vreport: sanitize ASCII control chars Jeff King
@ 2017-01-11 14:07 ` Jeff King
  2017-01-11 22:11   ` Junio C Hamano
  2 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2017-01-11 14:07 UTC (permalink / raw)
  To: git

On Wed, Jan 11, 2017 at 09:01:38AM -0500, Jeff King wrote:

>   [1/2]: Revert "vreportf: avoid intermediate buffer"
>   [2/2]: vreport: sanitize ASCII control chars

We've talked before about repeating the "error:" header for multi-line
messages. The reversion in patch 1 makes this easy to play with, so I
did. I kind of hate it. But here it is, for your viewing pleasure.

-- >8 --
Subject: vreportf: add prefix to each line

Some error and warning messages are multi-line, but we put
the prefix only on the first line. This means that either
the subsequent lines don't have any indication that they're
connected to the original error, or the caller has to make
multiple calls to error() or warning(). And that's not even
possible for die().

Instead, let's put the prefix at the start of each line,
just as advise() does.

Note that this patch doesn't update the tests yet, so it
causes tons of failures. This is just to see what it might
look like.

It's actually kind of ugly.  For instance, a failing test in
t3600 now produces:

   error: the following files have staged content different from both the
   error: file and the HEAD:
   error:     bar.txt
   error:     foo.txt
   error: (use -f to force removal)

which seems a bit aggressive.  It also screws up messages
which indent with tabs (the prefix eats up some of the
tabstop, making the indentation smaller). And the result is
a little harder if you need to cut-and-paste the file lines
(if your terminal lets you triple-click to select a whole
line, now you have this "error:" cruft on the front).

It may be possible to address some of that by using some
other kind of continuation marker (instead of just repeating
the prefix), and expanding initial tabs.

Signed-off-by: Jeff King <peff@peff.net>
---
 usage.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/usage.c b/usage.c
index ad6d2910f..8a1f6ff4e 100644
--- a/usage.c
+++ b/usage.c
@@ -8,18 +8,30 @@
 
 static FILE *error_handle;
 
+static void show_line(FILE *fh, const char *prefix, const char *line, int len)
+{
+	fprintf(fh, "%s%.*s\n", prefix, len, line);
+}
+
 void vreportf(const char *prefix, const char *err, va_list params)
 {
 	char msg[4096];
 	FILE *fh = error_handle ? error_handle : stderr;
+	const char *base;
 	char *p;
 
 	vsnprintf(msg, sizeof(msg), err, params);
+
+	base = msg;
 	for (p = msg; *p; p++) {
-		if (iscntrl(*p) && *p != '\t' && *p != '\n')
+		if (*p == '\n') {
+			show_line(fh, prefix, base, p - base);
+			base = p + 1;
+		} else if (iscntrl(*p) && *p != '\t')
 			*p = '?';
 	}
-	fprintf(fh, "%s%s\n", prefix, msg);
+
+	show_line(fh, prefix, base, p - base);
 }
 
 static NORETURN void usage_builtin(const char *err, va_list params)
-- 
2.11.0.627.gfa6151259


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 3/2] vreportf: add prefix to each line
  2017-01-11 14:07 ` [RFC PATCH 3/2] vreportf: add prefix to each line Jeff King
@ 2017-01-11 22:11   ` Junio C Hamano
  2017-01-12  6:59     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2017-01-11 22:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Wed, Jan 11, 2017 at 09:01:38AM -0500, Jeff King wrote:
>
>>   [1/2]: Revert "vreportf: avoid intermediate buffer"
>>   [2/2]: vreport: sanitize ASCII control chars
>
> We've talked before about repeating the "error:" header for multi-line
> messages. The reversion in patch 1 makes this easy to play with, so I
> did. I kind of hate it. But here it is, for your viewing pleasure.

Thanks.

>
> -- >8 --
> Subject: vreportf: add prefix to each line
>
> Some error and warning messages are multi-line, but we put
> the prefix only on the first line. This means that either
> the subsequent lines don't have any indication that they're
> connected to the original error, or the caller has to make
> multiple calls to error() or warning(). And that's not even
> possible for die().
>
> Instead, let's put the prefix at the start of each line,
> just as advise() does.
>
> Note that this patch doesn't update the tests yet, so it
> causes tons of failures. This is just to see what it might
> look like.
>
> It's actually kind of ugly.  For instance, a failing test in
> t3600 now produces:
>
>    error: the following files have staged content different from both the
>    error: file and the HEAD:
>    error:     bar.txt
>    error:     foo.txt
>    error: (use -f to force removal)
>
> which seems a bit aggressive.  

I agree that it is ugly, but one reason I was hoping to do this
myself (or have somebody else do it by procrastinating) was that I
thought it may help i18n.  That is, for an original

	error(_("we found an error"))

a .po file may translate the string to a multi-line string that has
LFs in it and the output would look correct.  The translator already
can do so by indenting the second and subsequent lines by the same
column-width as "error: " (or its translation in their language, if
we are going to i18n these headers), but that (1) is extra work for
them, and (2) makes it impossible to have the same message appear in
different contexts (i.e. "error:" vs "warning:" that have different
column-widths).

> It also screws up messages which indent with tabs (the prefix eats
> up some of the tabstop, making the indentation smaller).

This is unavoidable and at the same time is a non-issue, isn't it?
Messages that indent the second and subsequent lines with tabs are
compensating the lack of the multi-line support of vreportf(), which
this RFC patch fixes.  They may need to be adjusted to the new world
order, but that is a good thing.  New multi-line messages no longer
have to worry about the prefix that is added only to the first line
when continuing the message to multiple lines.

> And the result is
> a little harder if you need to cut-and-paste the file lines
> (if your terminal lets you triple-click to select a whole
> line, now you have this "error:" cruft on the front).

This might be an issue, but I personally do not care (as I know how
to drive my "screen", namely, use 'c' while cutting) ;-).

> It may be possible to address some of that by using some
> other kind of continuation marker (instead of just repeating
> the prefix), and expanding initial tabs.

Yes indeed.  The "some other kind of continuation marker" could just
be a run of spaces that fill the same column as the "error: " or
other prefix given to the first line.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  usage.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/usage.c b/usage.c
> index ad6d2910f..8a1f6ff4e 100644
> --- a/usage.c
> +++ b/usage.c
> @@ -8,18 +8,30 @@
>  
>  static FILE *error_handle;
>  
> +static void show_line(FILE *fh, const char *prefix, const char *line, int len)
> +{
> +	fprintf(fh, "%s%.*s\n", prefix, len, line);
> +}
> +
>  void vreportf(const char *prefix, const char *err, va_list params)
>  {
>  	char msg[4096];
>  	FILE *fh = error_handle ? error_handle : stderr;
> +	const char *base;
>  	char *p;
>  
>  	vsnprintf(msg, sizeof(msg), err, params);
> +
> +	base = msg;
>  	for (p = msg; *p; p++) {
> -		if (iscntrl(*p) && *p != '\t' && *p != '\n')
> +		if (*p == '\n') {
> +			show_line(fh, prefix, base, p - base);
> +			base = p + 1;
> +		} else if (iscntrl(*p) && *p != '\t')
>  			*p = '?';
>  	}
> -	fprintf(fh, "%s%s\n", prefix, msg);
> +
> +	show_line(fh, prefix, base, p - base);
>  }
>  
>  static NORETURN void usage_builtin(const char *err, va_list params)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH 3/2] vreportf: add prefix to each line
  2017-01-11 22:11   ` Junio C Hamano
@ 2017-01-12  6:59     ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2017-01-12  6:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Jan 11, 2017 at 02:11:42PM -0800, Junio C Hamano wrote:

> > It's actually kind of ugly.  For instance, a failing test in
> > t3600 now produces:
> >
> >    error: the following files have staged content different from both the
> >    error: file and the HEAD:
> >    error:     bar.txt
> >    error:     foo.txt
> >    error: (use -f to force removal)
> >
> > which seems a bit aggressive.  
> 
> I agree that it is ugly, but one reason I was hoping to do this
> myself (or have somebody else do it by procrastinating) was that I
> thought it may help i18n.  That is, for an original
> 
> 	error(_("we found an error"))
> 
> a .po file may translate the string to a multi-line string that has
> LFs in it and the output would look correct.  The translator already
> can do so by indenting the second and subsequent lines by the same
> column-width as "error: " (or its translation in their language, if
> we are going to i18n these headers), but that (1) is extra work for
> them, and (2) makes it impossible to have the same message appear in
> different contexts (i.e. "error:" vs "warning:" that have different
> column-widths).

Yes, I agree that would be a functional benefit. I'm just hoping we can
do it in a way that is visually pleasing.

> > It also screws up messages which indent with tabs (the prefix eats
> > up some of the tabstop, making the indentation smaller).
> 
> This is unavoidable and at the same time is a non-issue, isn't it?
> Messages that indent the second and subsequent lines with tabs are
> compensating the lack of the multi-line support of vreportf(), which
> this RFC patch fixes.  They may need to be adjusted to the new world
> order, but that is a good thing.  New multi-line messages no longer
> have to worry about the prefix that is added only to the first line
> when continuing the message to multiple lines.

I'm not so sure it is just about compensating. Look at the message
quoted above. The original looks like:

  error: the following files have staged content different from both the
  file and the HEAD:
      bar.txt
      foo.txt
  (use -f to force removal)

The leading whitespace is visually separating the list of files, not
just from the line with "error:", but from the other lines.

Though I think if we replaced tabs with spaces in this instance, then
they would still be bumped relative to the rest of the text.

> > It may be possible to address some of that by using some
> > other kind of continuation marker (instead of just repeating
> > the prefix), and expanding initial tabs.
> 
> Yes indeed.  The "some other kind of continuation marker" could just
> be a run of spaces that fill the same column as the "error: " or
> other prefix given to the first line.

I tried that, along with several other variants, but it gets
confusing/ugly when mixed with indentation that is significant to the
line. For example, the t3600 message becomes:

  error: the following files have staged content different from both the
         file and the HEAD:
             bar.txt
             foo.txt
         (use -f to force removal)

Which is arguably better than what we have now, but still looks pretty
bad to me.

I wonder if it would help for the marker to end in a non-whitespace
character. Like:

  error: the following files have staged content different from both the
       :  file and the HEAD:
       :     bar.txt
       :     foo.txt
       : (use -f to force removal)

or something. The ":" is a bit sparse looking. Maybe there are better
options. That does ruin line-by-line selection for cut-and-paste,
though.

So I dunno. I am open to ideas, but I didn't find one that I really
liked (this patch is actually a leftover from before Christmas, so I
don't even remember all the things I tried, just that I didn't like any
of them ;) ).

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-01-12  6:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-11 14:01 [PATCH 0/2] sanitizing error message contents Jeff King
2017-01-11 14:02 ` [PATCH 1/2] Revert "vreportf: avoid intermediate buffer" Jeff King
2017-01-11 14:02 ` [PATCH 2/2] vreport: sanitize ASCII control chars Jeff King
2017-01-11 14:07 ` [RFC PATCH 3/2] vreportf: add prefix to each line Jeff King
2017-01-11 22:11   ` Junio C Hamano
2017-01-12  6:59     ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).