git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Matthew Coleman <matt@1eanda.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Stephon Harris" <theonestep4@gmail.com>,
	git@vger.kernel.org, "Duy Nguyen" <pclouds@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [PATCH] specify encoding for sed command
Date: Thu, 12 Apr 2018 18:12:12 -0400	[thread overview]
Message-ID: <CBA1FB4A-C586-48E0-A64E-371CCD2F6AC4@1eanda.com> (raw)
In-Reply-To: <3FE7BFB6-769A-4F11-9C3B-86D681B3502F@gmail.com>

I did a little more digging into this issue today.

> On Apr 11, 2018, at 4:42 PM, Matt Coleman <cat.moleman@gmail.com> wrote:
> 
> I found another (possibly better) way to fix this:
> 
>> On Apr 10, 2018, at 3:18 AM, Matt Coleman <cat.moleman@gmail.com> wrote:
>> 
>>> 1) What platform OS / version / sed version is this on?
>> I'm experiencing this on macOS Sierra (10.12.6). The issue occurs with the OS's native sed, which is FreeBSD sed so the version number is kind of ambiguous.
>> 
>> The error goes away if I set LANG=C or LC_ALL=C or change it to use GNU sed (installed via homebrew as gsed).
> 
> If I change it to use awk instead of sed, it works with mawk, gawk, and macOS awk:
> unset $(set | awk -F '=' '/^__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*=/ {print $1}') 2>/dev/null
> 
> I compared sed vs. awk on Linux and Mac and they all take about the same amount of time to run (within 0.002ms).

The issue isn't actually caused by `sed`: it's caused by the way that the `set` builtin outputs characters in the Unicode Private Use Area (PUA) in the build of Bash 3.2.57 that macOS Sierra ships with.

Powerline uses several PUA code points to make some of its pretty text UI elements:

Code point (hex value): description
U+E0A0 (0xEE82A0): Version control branch
U+E0A1 (0xEE82A1): LN (line) symbol
U+E0A2 (0xEE82A2): Closed padlock
U+E0B0 (0xEE82B0): Rightwards black arrowhead
U+E0B1 (0xEE82B1): Rightwards arrowhead
U+E0B2 (0xEE82B2): Leftwards black arrowhead
U+E0B3 (0xEE82B3): Leftwards arrowhead

macOS Bash 3.2.57's `set` builtin has garbled output where Powerline's special symbols should be in the PS1 variable, but Bash 4.4.19 (installed on macOS via homebrew) and Bash 4.3.38 (Ubuntu 16.04) both display it correctly in the output of `set`. `echo $PS1` does display the symbols correctly on these versions of Bash.

So...

> 3) There's other invocations of "sed" in the file, aren't those affected as well?
The short answer: no. Slightly longer: not by the same thing that's affecting the line in the patch, at least.

Long: As described above, the problem isn't actually `sed`: it's the `set` builtin in macOS's build of Bash. The other invocations of `sed` should be safe, because `sed` properly handles the PUA code points on its own:

$ echo $'\xee\x82\xb0' | sed 's/./@/'
@

The way that `set` is displaying the PS1 variable seems to be sending them to `sed` individually or somehow split up:

$ for character in $'\xee' $'\x82' $'\xb0' $'\xee\x82' $'\x82\xb0'; do echo $character | sed 's/./@/'; done
sed: RE error: illegal byte sequence
sed: RE error: illegal byte sequence
sed: RE error: illegal byte sequence
sed: RE error: illegal byte sequence
sed: RE error: illegal byte sequence

Interestingly, Bash 3.2.25's `set` builtin on CentOS 5 correctly displays the octal values for the symbols (prompt edited to keep this email ASCII):

$ PS1=$'\xee\x82\xb0 '
>  set | grep PS1
PS1=$'\356\202\260 '

I haven't started digging through Bash's codebase, but it could either be a bug that was introduced between 3.2.25 and 3.2.57 or Apple used some silly flags when compiling Bash. Ideally, this should be fixed in Bash, but since Apple's using such an old version of Bash for license reasons, I think it's unlikely that they'll fix the issue.

I think the best way to move forward with this is a new patch that uses `awk` instead of `sed`: I tested several `awk` variants and the command was portable without requiring any changes to LANG or LC_ALL.

Does that sound like a good plan?

  reply	other threads:[~2018-04-12 22:20 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-05  3:10 [PATCH] specify encoding for sed command Stephon Harris
2018-04-05  6:53 ` Ævar Arnfjörð Bjarmason
2018-04-05  9:42   ` Eric Sunshine
2018-04-05  9:43   ` SZEDER Gábor
2018-04-10  7:18   ` Matt Coleman
2018-04-11 20:42     ` Matt Coleman
2018-04-12 22:12       ` Matthew Coleman [this message]
2018-04-13  0:01         ` SZEDER Gábor
2018-04-13  3:00           ` Matthew Coleman
2018-04-13 10:30             ` [PATCH] completion: reduce overhead of clearing cached --options SZEDER Gábor
2018-04-13 21:44               ` Jakub Narebski
2018-04-13 22:23                 ` SZEDER Gábor
2018-04-14 13:27                   ` Jakub Narebski
2018-04-16 18:23                     ` Jacob Keller
2018-04-16 20:35                       ` Matthew Coleman
2018-04-16  5:10                   ` Junio C Hamano
2018-04-16 13:15                     ` SZEDER Gábor
2018-04-16 13:29                       ` Jakub Narębski
2018-04-16 22:43                         ` Junio C Hamano
2018-04-17 22:02                           ` [PATCH v2] " SZEDER Gábor
2018-04-17 23:45                             ` Junio C Hamano
2018-05-07 15:05                               ` Matthew Coleman
2018-05-08  2:28                                 ` Todd Zullinger
2018-05-08  3:28                                   ` Junio C Hamano
2018-06-07  5:48                             ` Jonathan Nieder
2018-06-07 12:07                               ` Dave Borowitz
2018-06-07 12:41                               ` Rick van Hattem
2018-06-08 21:16                               ` SZEDER Gábor
2018-06-08 21:23                                 ` Jonathan Nieder
2018-06-08 21:41                                   ` SZEDER Gábor
2018-06-08 21:52                                     ` Jonathan Nieder
2018-06-08 21:58                                       ` SZEDER Gábor
2018-06-11 17:53                                   ` Junio C Hamano
2018-06-11 18:20                                 ` [PATCH] completion: correct zsh detection when run from git-completion.zsh (Re: [PATCH v2] completion: reduce overhead of clearing cached --options) Jonathan Nieder
2018-06-12  8:46                                   ` SZEDER Gábor
2018-06-12  9:48                                   ` SZEDER Gábor
2018-06-12  9:51                                   ` Rick van Hattem
2018-04-08 23:17 ` [PATCH] specify encoding for sed command Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CBA1FB4A-C586-48E0-A64E-371CCD2F6AC4@1eanda.com \
    --to=matt@1eanda.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    --cc=szeder.dev@gmail.com \
    --cc=theonestep4@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).