git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Consider escaping special characters like 'less' does
@ 2017-10-15 13:33 Joris Valette
  2017-10-15 14:15 ` Jason Pyeron
  0 siblings, 1 reply; 13+ messages in thread
From: Joris Valette @ 2017-10-15 13:33 UTC (permalink / raw)
  To: git

The pager 'less' escapes some characters when calling 'git diff'. This
is what I might get:

$ git diff --cached
diff --git a/some_file b/some_file
new file mode 100644
index 0000000..357323f
--- /dev/null
+++ b/some_file
@@ -0,0 +1 @@
+<U+FEFF>Hello
\ No newline at end of file

This example is a simple file encoded in UTF-8 *with BOM*.
On the other hand, the built-in git output shows this:

$ git --no-pager diff --cached
diff --git a/some_file b/some_file
new file mode 100644
index 0000000..357323f
--- /dev/null
+++ b/some_file
@@ -0,0 +1 @@
+Hello
\ No newline at end of file




You can see 'less' shows <U+FEFF> explicitly, while it is implicit and
hidden with git's built-in output.
Other characters behave the same, like ZERO WIDTH SPACE <U+200B>, and
I believe many others.
This is particularly annoying when there are other changes on the same
line. Here I add ' world!' but also remove the BOM:

$ git diff
diff --git a/some_file b/some_file
index 357323f..6769dd6 100644
--- a/some_file
+++ b/some_file
@@ -1 +1 @@
-<U+FEFF>Hello
\ No newline at end of file
+Hello world!
\ No newline at end of file

Compare with:

$ git --no-pager diff
diff --git a/some_file b/some_file
index 357323f..6769dd6 100644
--- a/some_file
+++ b/some_file
@@ -1 +1 @@
-Hello
\ No newline at end of file
+Hello world!
\ No newline at end of file




In the first example I can see both changes, in the second example the
BOM removal is hidden and I won't notice it because there is indeed
another genuine change on the same line.

I'll add a third example with CRLF line-terminators:

$ git diff
diff --git a/some_file b/some_file
index 357323f..dfd6895 100644
--- a/some_file
+++ b/some_file
@@ -1 +1 @@
-<U+FEFF>Hello
\ No newline at end of file
+Hello world!^M

and:

$ git --no-pager diff
diff --git a/some_file b/some_file
index 357323f..dfd6895 100644
--- a/some_file
+++ b/some_file
@@ -1 +1 @@
-Hello
\ No newline at end of file
+Hello world!




My thoughts are that git should add this feature and tend towards
less's behavior. A diff output should show as much as possible.
I would happily use 'less' all the time, but unfortunately I don't
think I can. For instance, 'git add --patch', which I use quite often,
prints through git's built-in output, and I miss less's features.

Joris Valette.

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* RE: Consider escaping special characters like 'less' does
  2017-10-15 13:33 Consider escaping special characters like 'less' does Joris Valette
@ 2017-10-15 14:15 ` Jason Pyeron
  2017-10-15 15:46   ` Andreas Schwab
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Pyeron @ 2017-10-15 14:15 UTC (permalink / raw)
  To: 'Joris Valette', git

> -----Original Message-----
> From: Joris Valette
> Sent: Sunday, October 15, 2017 9:34 AM
> To: git@vger.kernel.org
> Subject: Consider escaping special characters like 'less' does
> 
> The pager 'less' escapes some characters when calling 'git diff'. This
> is what I might get:
> 
> $ git diff --cached
> diff --git a/some_file b/some_file
> new file mode 100644
> index 0000000..357323f
> --- /dev/null
> +++ b/some_file
> @@ -0,0 +1 @@
> +<U+FEFF>Hello
> \ No newline at end of file
> 
> This example is a simple file encoded in UTF-8 *with BOM*.
> On the other hand, the built-in git output shows this:
> 
> $ git --no-pager diff --cached
> diff --git a/some_file b/some_file
> new file mode 100644
> index 0000000..357323f
> --- /dev/null
> +++ b/some_file
> @@ -0,0 +1 @@
> +?Hello
> \ No newline at end of file

It is your terminal, not git's fault that you get a ? rendered.

$ git diff
diff --git a/test.txt b/test.txt
index af9f93a..294e397 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1 @@
-the quick brown fox jumps over the lazy dog
+<U+FEFF>the quick brown fox jumps over the lazy dog

$ git diff | hexdump.exe -C
00000000  64 69 66 66 20 2d 2d 67  69 74 20 61 2f 74 65 73  |diff --git a/tes|
00000010  74 2e 74 78 74 20 62 2f  74 65 73 74 2e 74 78 74  |t.txt b/test.txt|
00000020  0a 69 6e 64 65 78 20 61  66 39 66 39 33 61 2e 2e  |.index af9f93a..|
00000030  32 39 34 65 33 39 37 20  31 30 30 36 34 34 0a 2d  |294e397 100644.-|
00000040  2d 2d 20 61 2f 74 65 73  74 2e 74 78 74 0a 2b 2b  |-- a/test.txt.++|
00000050  2b 20 62 2f 74 65 73 74  2e 74 78 74 0a 40 40 20  |+ b/test.txt.@@ |
00000060  2d 31 20 2b 31 20 40 40  0a 2d 74 68 65 20 71 75  |-1 +1 @@.-the qu|
00000070  69 63 6b 20 62 72 6f 77  6e 20 66 6f 78 20 6a 75  |ick brown fox ju|
00000080  6d 70 73 20 6f 76 65 72  20 74 68 65 20 6c 61 7a  |mps over the laz|

Note: 0a is newline, 2b is + ef bb bf are the specials added to the file, 74 68 65 20 71 75 is 'the qu'

00000090  79 20 64 6f 67 0a 2b ef  bb bf 74 68 65 20 71 75  |y dog.+...the qu|
000000a0  69 63 6b 20 62 72 6f 77  6e 20 66 6f 78 20 6a 75  |ick brown fox ju|
000000b0  6d 70 73 20 6f 76 65 72  20 74 68 65 20 6c 61 7a  |mps over the laz|
000000c0  79 20 64 6f 67 0a                                 |y dog.|
000000c6

$ git diff | cat
diff --git a/test.txt b/test.txt
index af9f93a..294e397 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1 @@
-the quick brown fox jumps over the lazy dog
+the quick brown fox jumps over the lazy dog

$ git --no-pager diff
diff --git a/test.txt b/test.txt
index af9f93a..294e397 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1 @@
-the quick brown fox jumps over the lazy dog
+the quick brown fox jumps over the lazy dog

And my UTF-8 capable terminal displays it fine.

What do you get when you pipe to hexdump?

-Jason


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Consider escaping special characters like 'less' does
  2017-10-15 14:15 ` Jason Pyeron
@ 2017-10-15 15:46   ` Andreas Schwab
  2017-10-15 16:39     ` Joris Valette
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Schwab @ 2017-10-15 15:46 UTC (permalink / raw)
  To: Jason Pyeron; +Cc: 'Joris Valette', git

On Okt 15 2017, "Jason Pyeron" <jpyeron@pdinc.us> wrote:

>> -----Original Message-----
>> From: Joris Valette
>> Sent: Sunday, October 15, 2017 9:34 AM
>> To: git@vger.kernel.org
>> Subject: Consider escaping special characters like 'less' does
>> 
>> The pager 'less' escapes some characters when calling 'git diff'. This
>> is what I might get:
>> 
>> $ git diff --cached
>> diff --git a/some_file b/some_file
>> new file mode 100644
>> index 0000000..357323f
>> --- /dev/null
>> +++ b/some_file
>> @@ -0,0 +1 @@
>> +<U+FEFF>Hello
>> \ No newline at end of file
>> 
>> This example is a simple file encoded in UTF-8 *with BOM*.
>> On the other hand, the built-in git output shows this:
>> 
>> $ git --no-pager diff --cached
>> diff --git a/some_file b/some_file
>> new file mode 100644
>> index 0000000..357323f
>> --- /dev/null
>> +++ b/some_file
>> @@ -0,0 +1 @@
>> +?Hello
>> \ No newline at end of file
>
> It is your terminal, not git's fault that you get a ? rendered.

It's your MUA's fault that you get a ?, the mail didn't contain any.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Consider escaping special characters like 'less' does
  2017-10-15 15:46   ` Andreas Schwab
@ 2017-10-15 16:39     ` Joris Valette
  2017-10-15 20:06       ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Joris Valette @ 2017-10-15 16:39 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Jason Pyeron, git

2017-10-15 17:46 GMT+02:00 Andreas Schwab <schwab@linux-m68k.org>:
> On Okt 15 2017, "Jason Pyeron" <jpyeron@pdinc.us> wrote:
>
>>> -----Original Message-----
>>> From: Joris Valette
>>> Sent: Sunday, October 15, 2017 9:34 AM
>>> To: git@vger.kernel.org
>>> Subject: Consider escaping special characters like 'less' does
>>>
>>> The pager 'less' escapes some characters when calling 'git diff'. This
>>> is what I might get:
>>>
>>> $ git diff --cached
>>> diff --git a/some_file b/some_file
>>> new file mode 100644
>>> index 0000000..357323f
>>> --- /dev/null
>>> +++ b/some_file
>>> @@ -0,0 +1 @@
>>> +<U+FEFF>Hello
>>> \ No newline at end of file
>>>
>>> This example is a simple file encoded in UTF-8 *with BOM*.
>>> On the other hand, the built-in git output shows this:
>>>
>>> $ git --no-pager diff --cached
>>> diff --git a/some_file b/some_file
>>> new file mode 100644
>>> index 0000000..357323f
>>> --- /dev/null
>>> +++ b/some_file
>>> @@ -0,0 +1 @@
>>> +?Hello
>>> \ No newline at end of file
>>
>> It is your terminal, not git's fault that you get a ? rendered.
>
> It's your MUA's fault that you get a ?, the mail didn't contain any.

Actually, the original mail contained the special BOM sequence but
it's generally invisible. His MUA shows it with a '?', mine doesn't
show anything, neither does Firefox on the mailing list page.

The question remains: could escaping be done?

Joris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Consider escaping special characters like 'less' does
  2017-10-15 16:39     ` Joris Valette
@ 2017-10-15 20:06       ` Jeff King
  2017-10-15 21:37         ` Joris Valette
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2017-10-15 20:06 UTC (permalink / raw)
  To: Joris Valette; +Cc: Andreas Schwab, Jason Pyeron, git

On Sun, Oct 15, 2017 at 06:39:45PM +0200, Joris Valette wrote:

> > It's your MUA's fault that you get a ?, the mail didn't contain any.
> 
> Actually, the original mail contained the special BOM sequence but
> it's generally invisible. His MUA shows it with a '?', mine doesn't
> show anything, neither does Firefox on the mailing list page.
> 
> The question remains: could escaping be done?

Git's diff generation will never do such escaping by default, because it
means creating a patch that cannot be applied to get back the original
content.

There _are_ already options to create diffs that cannot be applied, like
--textconv. So it would be possible to add a similar option for
escaping. But I don't think we really need or want a separate option,
when you can already do one of:

  1. If your files have special binary characters that are hard to see,
     you can use the existing textconv system to do whatever escaping
     you like. And then the Git will diff the result of the escaping,
     which means you get readable diffs when they change.

  2. Put the raw output of git's diff through a filter that escapes. We
     already do this most of the time by piping through less. The most
     noticeable exception is "add --patch". There you can set up a
     program to filter as well. There's more information in a recent
     thread here:

       https://public-inbox.org/git/20171012184736.rglkbyryauwuvn2a@sigill.intra.peff.net/

It doesn't seem out of the question to me to have an out-of-the-box
default for interactive.diffFilter which does some basic escaping (we
could even implement it inside the perl script for efficiency).

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Consider escaping special characters like 'less' does
  2017-10-15 20:06       ` Jeff King
@ 2017-10-15 21:37         ` Joris Valette
  2017-10-16 22:13           ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Joris Valette @ 2017-10-15 21:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Andreas Schwab, Jason Pyeron, git

2017-10-15 22:06 GMT+02:00 Jeff King <peff@peff.net>:
> Git's diff generation will never do such escaping by default, because it
> means creating a patch that cannot be applied to get back the original
> content.

Indeed this would be a problem. That's where my knowledge of git's
source code ends, but in that case, can't the output be discriminated
against the command that was executed?
Command that outputs an applicable patch -> don't escape
Command that outputs a diff to see changes -> escape

> There _are_ already options to create diffs that cannot be applied, like
> --textconv. So it would be possible to add a similar option for
> escaping. But I don't think we really need or want a separate option,
> when you can already do one of:
>
>   1. If your files have special binary characters that are hard to see,
>      you can use the existing textconv system to do whatever escaping
>      you like. And then the Git will diff the result of the escaping,
>      which means you get readable diffs when they change.
>
>   2. Put the raw output of git's diff through a filter that escapes. We
>      already do this most of the time by piping through less. The most
>      noticeable exception is "add --patch". There you can set up a
>      program to filter as well. There's more information in a recent
>      thread here:
>
>        https://public-inbox.org/git/20171012184736.rglkbyryauwuvn2a@sigill.intra.peff.net/
>
> It doesn't seem out of the question to me to have an out-of-the-box
> default for interactive.diffFilter which does some basic escaping (we
> could even implement it inside the perl script for efficiency).

Yes I read this thread, but I was left unsatisfied because I would
like something out-of-the-box.
Your suggestion might be the best solution then: implement some
default escaping for interactive.diffFilter.

Joris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Consider escaping special characters like 'less' does
  2017-10-15 21:37         ` Joris Valette
@ 2017-10-16 22:13           ` Jeff King
  2017-10-16 22:47             ` Andreas Schwab
                               ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Jeff King @ 2017-10-16 22:13 UTC (permalink / raw)
  To: Joris Valette; +Cc: Andreas Schwab, Jason Pyeron, git

On Sun, Oct 15, 2017 at 11:37:04PM +0200, Joris Valette wrote:

> 2017-10-15 22:06 GMT+02:00 Jeff King <peff@peff.net>:
> > Git's diff generation will never do such escaping by default, because it
> > means creating a patch that cannot be applied to get back the original
> > content.
> 
> Indeed this would be a problem. That's where my knowledge of git's
> source code ends, but in that case, can't the output be discriminated
> against the command that was executed?
> Command that outputs an applicable patch -> don't escape
> Command that outputs a diff to see changes -> escape

Speaking in a general sense, people use "git diff" for both purposes,
and we can't necessarily tell which[1]. As a matter of fact, that's a
potential problem with textconv filters as well (which are enabled by
default for git-diff, but not for plumbing like diff-tree, format-patch,
etc). I think the feature isn't widely used enough for people to run
into problems (and they also use it only on things that they don't
_expect_ to be able to make patches for, since they're binaries).

[1] Of course we can come up with heuristics, like "is stdout going to a
    a terminal"? But in such a case we already kick in the pager, and it
    does the exact escaping you're asking for. :)

For a command like add--interactive, it knows which invocations are for
showing to the user and which are for applying (and it already uses
"--color" selectively, for instance). So if there were an "escape this"
option in git's diff generation, we could selectively pass it. But
again, I think the right solution there is not building the escaping
into Git, but just passing it through another filter.

> > It doesn't seem out of the question to me to have an out-of-the-box
> > default for interactive.diffFilter which does some basic escaping (we
> > could even implement it inside the perl script for efficiency).
> 
> Yes I read this thread, but I was left unsatisfied because I would
> like something out-of-the-box.
> Your suggestion might be the best solution then: implement some
> default escaping for interactive.diffFilter.

Alternatively, I suppose we could just always escape in
add--interactive. I'm trying to think of a case where somebody would
really want their diffFilter to see the raw bytes (or vice versa, to
show raw bytes produced by their filter), but I'm having trouble coming
up with one.

I.e., something like this would probably help your case without hurting
anybody:

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index 28b325d754..d44e5ea459 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -714,6 +714,16 @@ sub parse_diff {
 		push @{$hunk[-1]{DISPLAY}},
 			(@colored ? $colored[$i] : $diff[$i]);
 	}
+
+	foreach my $hunk (@hunk) {
+		foreach my $line (@{$hunk->{DISPLAY}}) {
+			# all control chars minus newline and ESC (for color)
+			if ($line =~ s/[\000-\011\013-\032\034-\037]/?/g) {
+				$hunk->{CONTROLCHARS} = 1;
+			}
+		}
+	}
+
 	return @hunk;
 }
 
@@ -1407,6 +1417,9 @@ sub patch_update_file {
 		if ($hunk[$ix]{TYPE} eq 'hunk') {
 			$other .= ',e';
 		}
+		if ($hunk[$ix]->{CONTROLCHARS}) {
+			print "warning: control characters in hunk have been replaced by '?'\n";
+		}
 		for (@{$hunk[$ix]{DISPLAY}}) {
 			print;
 		}

I can't help but feel this is the tip of a larger iceberg, though. E.g.,
what about characters outside of the terminal's correct encoding? Or
broken UTF-8 characters?

-Peff

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Consider escaping special characters like 'less' does
  2017-10-16 22:13           ` Jeff King
@ 2017-10-16 22:47             ` Andreas Schwab
  2017-10-16 22:58               ` Jeff King
  2017-10-17  1:13             ` Junio C Hamano
  2017-10-22 12:27             ` Jason Pyeron
  2 siblings, 1 reply; 13+ messages in thread
From: Andreas Schwab @ 2017-10-16 22:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Joris Valette, Jason Pyeron, git

On Okt 16 2017, Jeff King <peff@peff.net> wrote:

> I can't help but feel this is the tip of a larger iceberg, though. E.g.,
> what about characters outside of the terminal's correct encoding? Or
> broken UTF-8 characters?

Or correctly encoded UTF-8 characters that look confusing?  Or blobs
with embedded escape sequences?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Consider escaping special characters like 'less' does
  2017-10-16 22:47             ` Andreas Schwab
@ 2017-10-16 22:58               ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2017-10-16 22:58 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Joris Valette, Jason Pyeron, git

On Tue, Oct 17, 2017 at 12:47:01AM +0200, Andreas Schwab wrote:

> On Okt 16 2017, Jeff King <peff@peff.net> wrote:
> 
> > I can't help but feel this is the tip of a larger iceberg, though. E.g.,
> > what about characters outside of the terminal's correct encoding? Or
> > broken UTF-8 characters?
> 
> Or correctly encoded UTF-8 characters that look confusing?  Or blobs
> with embedded escape sequences?

Yes, leaving ESC unfiltered here made me hesitate, too. We must allow it
to show colors, but showing diffs with raw terminal codes can be quite
confusing.

My general advice on committing unprintable characters is: don't.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Consider escaping special characters like 'less' does
  2017-10-16 22:13           ` Jeff King
  2017-10-16 22:47             ` Andreas Schwab
@ 2017-10-17  1:13             ` Junio C Hamano
  2017-10-17  2:45               ` Jeff King
  2017-10-22 12:27             ` Jason Pyeron
  2 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2017-10-17  1:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Joris Valette, Andreas Schwab, Jason Pyeron, git

Jeff King <peff@peff.net> writes:

> Alternatively, I suppose we could just always escape in
> add--interactive. I'm trying to think of a case where somebody would
> really want their diffFilter to see the raw bytes (or vice versa, to
> show raw bytes produced by their filter), but I'm having trouble coming
> up with one.

Your patch below only implements the "tame the raw bytes that come
out of their filter", which is quite agreeable.

> I.e., something like this would probably help your case without hurting
> anybody:
> ...
>
> I can't help but feel this is the tip of a larger iceberg, though. E.g.,
> what about characters outside of the terminal's correct encoding? Or
> broken UTF-8 characters?

Hmph.  If you use it as a "built-in" that is a fallback for
diffFilter, i.e. use it only when the end user does not have one,
then users can override whatever wrong thing the built-in logic does
so... ;-)


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Consider escaping special characters like 'less' does
  2017-10-17  1:13             ` Junio C Hamano
@ 2017-10-17  2:45               ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2017-10-17  2:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Joris Valette, Andreas Schwab, Jason Pyeron, git

On Tue, Oct 17, 2017 at 10:13:34AM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Alternatively, I suppose we could just always escape in
> > add--interactive. I'm trying to think of a case where somebody would
> > really want their diffFilter to see the raw bytes (or vice versa, to
> > show raw bytes produced by their filter), but I'm having trouble coming
> > up with one.
> 
> Your patch below only implements the "tame the raw bytes that come
> out of their filter", which is quite agreeable.

Yes. I think that is probably OK, especially given that we continue to
allow terminal escapes (certainly some filters would want to use colors;
I don't know if any would want to use more exotic control codes).

> > I can't help but feel this is the tip of a larger iceberg, though. E.g.,
> > what about characters outside of the terminal's correct encoding? Or
> > broken UTF-8 characters?
> 
> Hmph.  If you use it as a "built-in" that is a fallback for
> diffFilter, i.e. use it only when the end user does not have one,
> then users can override whatever wrong thing the built-in logic does
> so... ;-)

Yes, and maybe that is the best way to do it. It just seems like it is
opening a can of worms about exactly which things should be filtered and
how.

I also wondered if people would be annoyed that by using a filter, they
don't get the benefit of the escaping, unless their filter implements it
separately on top (and the original purpose of the filter option was for
things like diff-highlight and diff-so-fancy, which do not do such
escaping).

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: Consider escaping special characters like 'less' does
  2017-10-16 22:13           ` Jeff King
  2017-10-16 22:47             ` Andreas Schwab
  2017-10-17  1:13             ` Junio C Hamano
@ 2017-10-22 12:27             ` Jason Pyeron
  2017-10-23  0:45               ` Jeff King
  2 siblings, 1 reply; 13+ messages in thread
From: Jason Pyeron @ 2017-10-22 12:27 UTC (permalink / raw)
  To: 'Jeff King', 'Joris Valette'
  Cc: 'Andreas Schwab', git

> -----Original Message-----
> From: Jeff King
> Sent: Monday, October 16, 2017 6:13 PM
> To: Joris Valette
> Cc: Andreas Schwab; Jason Pyeron; git@vger.kernel.org
> Subject: Re: Consider escaping special characters like 'less' does
> 
<snip/>
> 
> I.e., something like this would probably help your case 
> without hurting
> anybody:
> 
> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index 28b325d754..d44e5ea459 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -714,6 +714,16 @@ sub parse_diff {
>  		push @{$hunk[-1]{DISPLAY}},
>  			(@colored ? $colored[$i] : $diff[$i]);
>  	}
> +
> +	foreach my $hunk (@hunk) {
> +		foreach my $line (@{$hunk->{DISPLAY}}) {
> +			# all control chars minus newline and 
> ESC (for color)
> +			if ($line =~ s/[\000-\011\013-\032\034-\037]/?/g) {

What about CR [0x0D] ?

> +				$hunk->{CONTROLCHARS} = 1;
> +			}
> +		}
> +	}
> +
>  	return @hunk;
>  }
>  
> @@ -1407,6 +1417,9 @@ sub patch_update_file {
>  		if ($hunk[$ix]{TYPE} eq 'hunk') {
>  			$other .= ',e';
>  		}
> +		if ($hunk[$ix]->{CONTROLCHARS}) {
> +			print "warning: control characters in 
> hunk have been replaced by '?'\n";
> +		}
>  		for (@{$hunk[$ix]{DISPLAY}}) {
>  			print;
>  		}
> 
> I can't help but feel this is the tip of a larger iceberg, 
> though. E.g.,
> what about characters outside of the terminal's correct encoding? Or
> broken UTF-8 characters?
> 
> -Peff
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Consider escaping special characters like 'less' does
  2017-10-22 12:27             ` Jason Pyeron
@ 2017-10-23  0:45               ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2017-10-23  0:45 UTC (permalink / raw)
  To: Jason Pyeron; +Cc: 'Joris Valette', 'Andreas Schwab', git

On Sun, Oct 22, 2017 at 08:27:20AM -0400, Jason Pyeron wrote:

> > ESC (for color)
> > +			if ($line =~ s/[\000-\011\013-\032\034-\037]/?/g) {
> 
> What about CR [0x0D] ?

I assumed that CR was one of the things we'd want to avoid (and it was
in fact what I used to test this). E.g., try:

  echo base >file
  git add file
  printf 'foo\r+bar\n' >file

Running through "less" shows something like:

  diff --git a/file b/file
  index df967b9..5b6ee80 100644
  --- a/file
  +++ b/file
  @@ -1 +1 @@
  -base
  +foo^M+bar

but "git add -p" shows:

  $ git add -p
  diff --git a/file b/file
  index df967b9..5b6ee80 100644
  --- a/file
  +++ b/file
  @@ -1 +1 @@
  -base
  +bar
  Stage this hunk [y,n,q,a,d,/,e,?]? 

For systems where CRLF is the native line ending, we'd still expect to
see only LFs here when line-ending conversion is on (since the diff
shows the canonical in-repo form).

For files where the CRs must remain for some reason, they'd show as a
"?" at the end. The "^M" shown by "less" is a bit more informative. If
we really do want to pursue this direction, it might make sense to use
more descriptive placeholders.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-10-23  0:45 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-15 13:33 Consider escaping special characters like 'less' does Joris Valette
2017-10-15 14:15 ` Jason Pyeron
2017-10-15 15:46   ` Andreas Schwab
2017-10-15 16:39     ` Joris Valette
2017-10-15 20:06       ` Jeff King
2017-10-15 21:37         ` Joris Valette
2017-10-16 22:13           ` Jeff King
2017-10-16 22:47             ` Andreas Schwab
2017-10-16 22:58               ` Jeff King
2017-10-17  1:13             ` Junio C Hamano
2017-10-17  2:45               ` Jeff King
2017-10-22 12:27             ` Jason Pyeron
2017-10-23  0:45               ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).