git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Broken interactive rebase text after some UTF-8 characters
@ 2019-01-15 19:29 Michal Nowak
  2019-01-16 10:33 ` Phillip Wood
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Nowak @ 2019-01-15 19:29 UTC (permalink / raw)
  To: git

Hello,

on OpenIndiana 2018.10 (illumos kernel) line of the interactive rebase 
text after a particular name (Gergő Mihály Doma) is broken:

pick 1ea94c756c 10202 loader: use screen-#rows to find bottom left 
co-ordinates Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: 
Gergő Mihály Doma <domag02@gmail.com> Approved by: Robert Mustacchi 
<rm@joyent.com>
p
pick cadd68ea00 10078 smatch fixes for UCB Reviewed by: Andy Fiddaman 
<omnios@citrus-it.net> Reviewed by: Toomas Soome <tsoome@me.com> 
Reviewed by: Peter Tribble <peter.tribble@gmail.com> Approved by: Robert 
Mustacchi <rm@joyent.com>


Source: https://github.com/illumos/illumos-gate.git

See the second item which only has "p".

This is with LC_ALL=en_US.UTF-8 (same with cs_CZ.UTF-8), with LC_ALL=C 
the text is fine.

I checked various editors (vim, nano, cat), they are all the same.

`git log` (PAGER is set to `/usr/bin/less -ins`) shows the particular 
commit correctly.

I tested following git version: 2.16.5, 2.19.2, 2.20.1, git.next branch.

Here's the configure output: 
https://paste.ec/paste/oDdydNxZ#r3avan8BL+8lldFMo928cw3eXSServTsSkGPW4jRBOd

Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-01-15 19:29 Broken interactive rebase text after some UTF-8 characters Michal Nowak
@ 2019-01-16 10:33 ` Phillip Wood
  2019-01-16 21:36   ` Michal Nowak
  0 siblings, 1 reply; 16+ messages in thread
From: Phillip Wood @ 2019-01-16 10:33 UTC (permalink / raw)
  To: Michal Nowak, git

Dear Michal

Thanks for the bug report, unfortunately I'm unable to reproduce it here 
using git 2.20.1 (see below). Knowing a little about how the todo list 
is created I cannot think how multibyte characters would break it. What 
command line were you using to start the rebase?

Best Wishes

Phillip

$ git clone --shallow-since='11-1-2019' 
https://github.com/illumos/illumos-gate.git
Cloning into 'illumos-gate'...
remote: Enumerating objects: 54348, done.
remote: Counting objects: 100% (54348/54348), done.
remote: Compressing objects: 100% (39008/39008), done.
remote: Total 54348 (delta 19671), reused 31187 (delta 11884), pack-reused 0
Receiving objects: 100% (54348/54348), 148.91 MiB | 1.18 MiB/s, done.
Resolving deltas: 100% (19671/19671), done.
Checking out files: 100% (47352/47352), done.

$ cd ~/src/illumos-gate
$ git log --oneline -1
f482e26c (HEAD -> master, origin/master, origin/HEAD) 10233 dboot: 
process_module() is missing newline at the end of the string Reviewed 
by: Sebastian Wiedenroth <wiedi@frubar.net> Reviewed by: Jason King 
<jbk@joyent.com> Reviewed by: Andy Stormont 
<astormont@racktopsystems.com> Approved by: Robert Mustacchi <rm@joyent.com>

$ GIT_SEQUENCE_EDITOR=cat git rebase -i 
08487eea68a2fa501b5042131c6db068089f82e1

hint: Waiting for your editor to close the file...
pick 1ea94c75 10202 loader: use screen-#rows to find bottom left 
co-ordinates Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: 
Gergő Mihály Doma <domag02@gmail.com> Approved by: Robert Mustacchi 
<rm@joyent.com>
pick cadd68ea 10078 smatch fixes for UCB Reviewed by: Andy Fiddaman 
<omnios@citrus-it.net> Reviewed by: Toomas Soome <tsoome@me.com> 
Reviewed by: Peter Tribble <peter.tribble@gmail.com> Approved by: Robert 
Mustacchi <rm@joyent.com>
pick 05ede3db 10079 smatch Makefile changes for usr/src/lib Reviewed by: 
Andy Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
<rm@joyent.com>
pick 5661bb76 10080 smatch Makefile changes for usr/src/cmd Reviewed by: 
Andy Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
<rm@joyent.com>
pick 15c07adc 10081 smatch indenting fixes for usr/src/uts Reviewed by: 
Toomas Soome <tsoome@me.com> Reviewed by: Peter Tribble 
<peter.tribble@gmail.com> Reviewed by: Andy Fiddaman <andy@omniosce.org> 
Approved by: Robert Mustacchi <rm@joyent.com>
pick c653bb47 10197 build smatch in parallel Reviewed by: Andy Fiddaman 
<andy@omniosce.org> Approved by: Robert Mustacchi <rm@joyent.com>
pick 161294fe 10217 mdb: r9w isn't r8w 10218 CONV_CAP_VAL_HW2_BUFSIZE 
wasn't updated 10219 AV_386_2_AVX512VPOPCDQ elfcap name should be 
consistent Reviewed by: Dan McDonald <danmcd@joyent.com> Reviewed by: 
Rob Johnston <rob.johnston@joyent.com> Reviewed by: Patrick Mooney 
<patrick.mooney@joyent.com> Reviewed by: Andy Fiddaman 
<andy@omniosce.org> Approved by: Dan McDonald <danmcd@joyent.com>
pick f482e26c 10233 dboot: process_module() is missing newline at the 
end of the string Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net> 
Reviewed by: Jason King <jbk@joyent.com> Reviewed by: Andy Stormont 
<astormont@racktopsystems.com> Approved by: Robert Mustacchi <rm@joyent.com>

# Rebase 08487eea..f482e26c onto 08487eea (8 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup <commit> = like "squash", but discard this commit's log message
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# .       create a merge commit using the original merge commit's
# .       message (or the oneline, if no original merge commit was
# .       specified). Use -c <commit> to reword the commit message.
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
Successfully rebased and updated refs/heads/master.


On 15/01/2019 19:29, Michal Nowak wrote:
> Hello,
> 
> on OpenIndiana 2018.10 (illumos kernel) line of the interactive rebase 
> text after a particular name (Gergő Mihály Doma) is broken:
> 
> pick 1ea94c756c 10202 loader: use screen-#rows to find bottom left 
> co-ordinates Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: 
> Gergő Mihály Doma <domag02@gmail.com> Approved by: Robert Mustacchi 
> <rm@joyent.com>
> p
> pick cadd68ea00 10078 smatch fixes for UCB Reviewed by: Andy Fiddaman 
> <omnios@citrus-it.net> Reviewed by: Toomas Soome <tsoome@me.com> 
> Reviewed by: Peter Tribble <peter.tribble@gmail.com> Approved by: Robert 
> Mustacchi <rm@joyent.com>
> 
> 
> Source: https://github.com/illumos/illumos-gate.git
> 
> See the second item which only has "p".
> 
> This is with LC_ALL=en_US.UTF-8 (same with cs_CZ.UTF-8), with LC_ALL=C 
> the text is fine.
> 
> I checked various editors (vim, nano, cat), they are all the same.
> 
> `git log` (PAGER is set to `/usr/bin/less -ins`) shows the particular 
> commit correctly.
> 
> I tested following git version: 2.16.5, 2.19.2, 2.20.1, git.next branch.
> 
> Here's the configure output: 
> https://paste.ec/paste/oDdydNxZ#r3avan8BL+8lldFMo928cw3eXSServTsSkGPW4jRBOd
> 
> Michal
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-01-16 10:33 ` Phillip Wood
@ 2019-01-16 21:36   ` Michal Nowak
  2019-01-17 11:04     ` Phillip Wood
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Nowak @ 2019-01-16 21:36 UTC (permalink / raw)
  To: phillip.wood, git

Hello Phillip, thanks for reaching out.

This is what I see with your suggested command:


$ GIT_SEQUENCE_EDITOR=cat git rebase -i 
08487eea68a2fa501b5042131c6db068089f82e1
hint: Waiting for your editor to close the file... pick 1ea94c756c 10202 
loader: use screen-#rows to find bottom left co-ordinates Reviewed by: 
Toomas Soome <tsoome@me.com> Reviewed by: Gergő Mihály Doma 
<domag02@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com>
p
pick cadd68ea00 10078 smatch fixes for UCB Reviewed by: Andy Fiddaman 
<omnios@citrus-it.net> Reviewed by: Toomas Soome <tsoome@me.com> 
Reviewed by: Peter Tribble <peter.tribble@gmail.com> Approved by: Robert 
Mustacchi <rm@joyent.com>
pick 05ede3db5e 10079 smatch Makefile changes for usr/src/lib Reviewed 
by: Andy Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
<rm@joyent.com>
pick 5661bb7641 10080 smatch Makefile changes for usr/src/cmd Reviewed 
by: Andy Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
<rm@joyent.com>
pick 15c07adc1c 10081 smatch indenting fixes for usr/src/uts Reviewed 
by: Toomas Soome <tsoome@me.com> Reviewed by: Peter Tribble 
<peter.tribble@gmail.com> Reviewed by: Andy Fiddaman <andy@omniosce.org> 
Approved by: Robert Mustacchi <rm@joyent.com>
pick c653bb4713 10197 build smatch in parallel Reviewed by: Andy 
Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi <rm@joyent.com>

# Rebase 08487eea68..c653bb4713 onto 08487eea68 (6 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup <commit> = like "squash", but discard this commit's log message
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# .       create a merge commit using the original merge commit's
# .       message (or the oneline, if no original merge commit was
# .       specified). Use -c <commit> to reword the commit message.
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
error: missing arguments for pick
error: invalid line 2: p
You can fix this with 'git rebase --edit-todo' and then run 'git rebase 
--continue'.
Or you can abort the rebase with 'git rebase --abort'.


Let me know, if there's anything I can try myself to gather more 
information. Should you want to reproduce this on OpenIndiana on your 
own, Vagrant is the more straightforward way to deploy OpenIndiana.

Thanks,
Michal

On 01/16/19 11:33 AM, Phillip Wood wrote:
> Dear Michal
> 
> Thanks for the bug report, unfortunately I'm unable to reproduce it here 
> using git 2.20.1 (see below). Knowing a little about how the todo list 
> is created I cannot think how multibyte characters would break it. What 
> command line were you using to start the rebase?
> 
> Best Wishes
> 
> Phillip
> 
> $ git clone --shallow-since='11-1-2019' 
> https://github.com/illumos/illumos-gate.git
> Cloning into 'illumos-gate'...
> remote: Enumerating objects: 54348, done.
> remote: Counting objects: 100% (54348/54348), done.
> remote: Compressing objects: 100% (39008/39008), done.
> remote: Total 54348 (delta 19671), reused 31187 (delta 11884), 
> pack-reused 0
> Receiving objects: 100% (54348/54348), 148.91 MiB | 1.18 MiB/s, done.
> Resolving deltas: 100% (19671/19671), done.
> Checking out files: 100% (47352/47352), done.
> 
> $ cd ~/src/illumos-gate
> $ git log --oneline -1
> f482e26c (HEAD -> master, origin/master, origin/HEAD) 10233 dboot: 
> process_module() is missing newline at the end of the string Reviewed 
> by: Sebastian Wiedenroth <wiedi@frubar.net> Reviewed by: Jason King 
> <jbk@joyent.com> Reviewed by: Andy Stormont 
> <astormont@racktopsystems.com> Approved by: Robert Mustacchi 
> <rm@joyent.com>
> 
> $ GIT_SEQUENCE_EDITOR=cat git rebase -i 
> 08487eea68a2fa501b5042131c6db068089f82e1
> 
> hint: Waiting for your editor to close the file...
> pick 1ea94c75 10202 loader: use screen-#rows to find bottom left 
> co-ordinates Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: 
> Gergő Mihály Doma <domag02@gmail.com> Approved by: Robert Mustacchi 
> <rm@joyent.com>
> pick cadd68ea 10078 smatch fixes for UCB Reviewed by: Andy Fiddaman 
> <omnios@citrus-it.net> Reviewed by: Toomas Soome <tsoome@me.com> 
> Reviewed by: Peter Tribble <peter.tribble@gmail.com> Approved by: Robert 
> Mustacchi <rm@joyent.com>
> pick 05ede3db 10079 smatch Makefile changes for usr/src/lib Reviewed by: 
> Andy Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
> <rm@joyent.com>
> pick 5661bb76 10080 smatch Makefile changes for usr/src/cmd Reviewed by: 
> Andy Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
> <rm@joyent.com>
> pick 15c07adc 10081 smatch indenting fixes for usr/src/uts Reviewed by: 
> Toomas Soome <tsoome@me.com> Reviewed by: Peter Tribble 
> <peter.tribble@gmail.com> Reviewed by: Andy Fiddaman <andy@omniosce.org> 
> Approved by: Robert Mustacchi <rm@joyent.com>
> pick c653bb47 10197 build smatch in parallel Reviewed by: Andy Fiddaman 
> <andy@omniosce.org> Approved by: Robert Mustacchi <rm@joyent.com>
> pick 161294fe 10217 mdb: r9w isn't r8w 10218 CONV_CAP_VAL_HW2_BUFSIZE 
> wasn't updated 10219 AV_386_2_AVX512VPOPCDQ elfcap name should be 
> consistent Reviewed by: Dan McDonald <danmcd@joyent.com> Reviewed by: 
> Rob Johnston <rob.johnston@joyent.com> Reviewed by: Patrick Mooney 
> <patrick.mooney@joyent.com> Reviewed by: Andy Fiddaman 
> <andy@omniosce.org> Approved by: Dan McDonald <danmcd@joyent.com>
> pick f482e26c 10233 dboot: process_module() is missing newline at the 
> end of the string Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net> 
> Reviewed by: Jason King <jbk@joyent.com> Reviewed by: Andy Stormont 
> <astormont@racktopsystems.com> Approved by: Robert Mustacchi 
> <rm@joyent.com>
> 
> # Rebase 08487eea..f482e26c onto 08487eea (8 commands)
> #
> # Commands:
> # p, pick <commit> = use commit
> # r, reword <commit> = use commit, but edit the commit message
> # e, edit <commit> = use commit, but stop for amending
> # s, squash <commit> = use commit, but meld into previous commit
> # f, fixup <commit> = like "squash", but discard this commit's log message
> # x, exec <command> = run command (the rest of the line) using shell
> # b, break = stop here (continue rebase later with 'git rebase --continue')
> # d, drop <commit> = remove commit
> # l, label <label> = label current HEAD with a name
> # t, reset <label> = reset HEAD to a label
> # m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
> # .       create a merge commit using the original merge commit's
> # .       message (or the oneline, if no original merge commit was
> # .       specified). Use -c <commit> to reword the commit message.
> #
> # These lines can be re-ordered; they are executed from top to bottom.
> #
> # If you remove a line here THAT COMMIT WILL BE LOST.
> #
> # However, if you remove everything, the rebase will be aborted.
> #
> # Note that empty commits are commented out
> Successfully rebased and updated refs/heads/master.
> 
> 
> On 15/01/2019 19:29, Michal Nowak wrote:
>> Hello,
>>
>> on OpenIndiana 2018.10 (illumos kernel) line of the interactive rebase 
>> text after a particular name (Gergő Mihály Doma) is broken:
>>
>> pick 1ea94c756c 10202 loader: use screen-#rows to find bottom left 
>> co-ordinates Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: 
>> Gergő Mihály Doma <domag02@gmail.com> Approved by: Robert Mustacchi 
>> <rm@joyent.com>
>> p
>> pick cadd68ea00 10078 smatch fixes for UCB Reviewed by: Andy Fiddaman 
>> <omnios@citrus-it.net> Reviewed by: Toomas Soome <tsoome@me.com> 
>> Reviewed by: Peter Tribble <peter.tribble@gmail.com> Approved by: 
>> Robert Mustacchi <rm@joyent.com>
>>
>>
>> Source: https://github.com/illumos/illumos-gate.git
>>
>> See the second item which only has "p".
>>
>> This is with LC_ALL=en_US.UTF-8 (same with cs_CZ.UTF-8), with LC_ALL=C 
>> the text is fine.
>>
>> I checked various editors (vim, nano, cat), they are all the same.
>>
>> `git log` (PAGER is set to `/usr/bin/less -ins`) shows the particular 
>> commit correctly.
>>
>> I tested following git version: 2.16.5, 2.19.2, 2.20.1, git.next branch.
>>
>> Here's the configure output: 
>> https://paste.ec/paste/oDdydNxZ#r3avan8BL+8lldFMo928cw3eXSServTsSkGPW4jRBOd 
>>
>>
>> Michal
>>
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-01-16 21:36   ` Michal Nowak
@ 2019-01-17 11:04     ` Phillip Wood
  2019-01-31 17:43       ` Alban Gruin
  0 siblings, 1 reply; 16+ messages in thread
From: Phillip Wood @ 2019-01-17 11:04 UTC (permalink / raw)
  To: Michal Nowak, phillip.wood, git

Hi Michal

On 16/01/2019 21:36, Michal Nowak wrote:
> Hello Phillip, thanks for reaching out.
> 
> This is what I see with your suggested command:
> 
> 
> $ GIT_SEQUENCE_EDITOR=cat git rebase -i 
> 08487eea68a2fa501b5042131c6db068089f82e1
> hint: Waiting for your editor to close the file... pick 1ea94c756c 10202 
> loader: use screen-#rows to find bottom left co-ordinates Reviewed by: 
> Toomas Soome <tsoome@me.com> Reviewed by: Gergő Mihály Doma 
> <domag02@gmail.com> Approved by: Robert Mustacchi <rm@joyent.com>
> p

That's really weird, I'm not sure what's going on. One thing that would 
be useful would be to check whether the output of

git log --pretty='%s' --reverse 08487eea68a2fa5..

includes the stray 'p' or not. I had a look through the configure output 
you attached to your previous mail and it seems you're building without 
iconv and gettext support, I've built git on linux with
make NO_ICONV=1 NO_GETTEXT=1
to try and test that and the rebase todo list was fine. Building with 
configure rather than the Makefile that comes with git can be a bit 
flaky sometimes as occasionally when a new compile option gets added to 
the Makefile updating the configure script gets overlooked but I don't 
know if that is the problem here or not.

Best Wishes

Phillip

> pick cadd68ea00 10078 smatch fixes for UCB Reviewed by: Andy Fiddaman 
> <omnios@citrus-it.net> Reviewed by: Toomas Soome <tsoome@me.com> 
> Reviewed by: Peter Tribble <peter.tribble@gmail.com> Approved by: Robert 
> Mustacchi <rm@joyent.com>
> pick 05ede3db5e 10079 smatch Makefile changes for usr/src/lib Reviewed 
> by: Andy Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
> <rm@joyent.com>
> pick 5661bb7641 10080 smatch Makefile changes for usr/src/cmd Reviewed 
> by: Andy Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
> <rm@joyent.com>
> pick 15c07adc1c 10081 smatch indenting fixes for usr/src/uts Reviewed 
> by: Toomas Soome <tsoome@me.com> Reviewed by: Peter Tribble 
> <peter.tribble@gmail.com> Reviewed by: Andy Fiddaman <andy@omniosce.org> 
> Approved by: Robert Mustacchi <rm@joyent.com>
> pick c653bb4713 10197 build smatch in parallel Reviewed by: Andy 
> Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi <rm@joyent.com>
> 
> # Rebase 08487eea68..c653bb4713 onto 08487eea68 (6 commands)
> #
> # Commands:
> # p, pick <commit> = use commit
> # r, reword <commit> = use commit, but edit the commit message
> # e, edit <commit> = use commit, but stop for amending
> # s, squash <commit> = use commit, but meld into previous commit
> # f, fixup <commit> = like "squash", but discard this commit's log message
> # x, exec <command> = run command (the rest of the line) using shell
> # b, break = stop here (continue rebase later with 'git rebase --continue')
> # d, drop <commit> = remove commit
> # l, label <label> = label current HEAD with a name
> # t, reset <label> = reset HEAD to a label
> # m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
> # .       create a merge commit using the original merge commit's
> # .       message (or the oneline, if no original merge commit was
> # .       specified). Use -c <commit> to reword the commit message.
> #
> # These lines can be re-ordered; they are executed from top to bottom.
> #
> # If you remove a line here THAT COMMIT WILL BE LOST.
> #
> # However, if you remove everything, the rebase will be aborted.
> #
> # Note that empty commits are commented out
> error: missing arguments for pick
> error: invalid line 2: p
> You can fix this with 'git rebase --edit-todo' and then run 'git rebase 
> --continue'.
> Or you can abort the rebase with 'git rebase --abort'.
> 
> 
> Let me know, if there's anything I can try myself to gather more 
> information. Should you want to reproduce this on OpenIndiana on your 
> own, Vagrant is the more straightforward way to deploy OpenIndiana.
> 
> Thanks,
> Michal
> 
> On 01/16/19 11:33 AM, Phillip Wood wrote:
>> Dear Michal
>>
>> Thanks for the bug report, unfortunately I'm unable to reproduce it 
>> here using git 2.20.1 (see below). Knowing a little about how the todo 
>> list is created I cannot think how multibyte characters would break 
>> it. What command line were you using to start the rebase?
>>
>> Best Wishes
>>
>> Phillip
>>
>> $ git clone --shallow-since='11-1-2019' 
>> https://github.com/illumos/illumos-gate.git
>> Cloning into 'illumos-gate'...
>> remote: Enumerating objects: 54348, done.
>> remote: Counting objects: 100% (54348/54348), done.
>> remote: Compressing objects: 100% (39008/39008), done.
>> remote: Total 54348 (delta 19671), reused 31187 (delta 11884), 
>> pack-reused 0
>> Receiving objects: 100% (54348/54348), 148.91 MiB | 1.18 MiB/s, done.
>> Resolving deltas: 100% (19671/19671), done.
>> Checking out files: 100% (47352/47352), done.
>>
>> $ cd ~/src/illumos-gate
>> $ git log --oneline -1
>> f482e26c (HEAD -> master, origin/master, origin/HEAD) 10233 dboot: 
>> process_module() is missing newline at the end of the string Reviewed 
>> by: Sebastian Wiedenroth <wiedi@frubar.net> Reviewed by: Jason King 
>> <jbk@joyent.com> Reviewed by: Andy Stormont 
>> <astormont@racktopsystems.com> Approved by: Robert Mustacchi 
>> <rm@joyent.com>
>>
>> $ GIT_SEQUENCE_EDITOR=cat git rebase -i 
>> 08487eea68a2fa501b5042131c6db068089f82e1
>>
>> hint: Waiting for your editor to close the file...
>> pick 1ea94c75 10202 loader: use screen-#rows to find bottom left 
>> co-ordinates Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: 
>> Gergő Mihály Doma <domag02@gmail.com> Approved by: Robert Mustacchi 
>> <rm@joyent.com>
>> pick cadd68ea 10078 smatch fixes for UCB Reviewed by: Andy Fiddaman 
>> <omnios@citrus-it.net> Reviewed by: Toomas Soome <tsoome@me.com> 
>> Reviewed by: Peter Tribble <peter.tribble@gmail.com> Approved by: 
>> Robert Mustacchi <rm@joyent.com>
>> pick 05ede3db 10079 smatch Makefile changes for usr/src/lib Reviewed 
>> by: Andy Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
>> <rm@joyent.com>
>> pick 5661bb76 10080 smatch Makefile changes for usr/src/cmd Reviewed 
>> by: Andy Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
>> <rm@joyent.com>
>> pick 15c07adc 10081 smatch indenting fixes for usr/src/uts Reviewed 
>> by: Toomas Soome <tsoome@me.com> Reviewed by: Peter Tribble 
>> <peter.tribble@gmail.com> Reviewed by: Andy Fiddaman 
>> <andy@omniosce.org> Approved by: Robert Mustacchi <rm@joyent.com>
>> pick c653bb47 10197 build smatch in parallel Reviewed by: Andy 
>> Fiddaman <andy@omniosce.org> Approved by: Robert Mustacchi 
>> <rm@joyent.com>
>> pick 161294fe 10217 mdb: r9w isn't r8w 10218 CONV_CAP_VAL_HW2_BUFSIZE 
>> wasn't updated 10219 AV_386_2_AVX512VPOPCDQ elfcap name should be 
>> consistent Reviewed by: Dan McDonald <danmcd@joyent.com> Reviewed by: 
>> Rob Johnston <rob.johnston@joyent.com> Reviewed by: Patrick Mooney 
>> <patrick.mooney@joyent.com> Reviewed by: Andy Fiddaman 
>> <andy@omniosce.org> Approved by: Dan McDonald <danmcd@joyent.com>
>> pick f482e26c 10233 dboot: process_module() is missing newline at the 
>> end of the string Reviewed by: Sebastian Wiedenroth <wiedi@frubar.net> 
>> Reviewed by: Jason King <jbk@joyent.com> Reviewed by: Andy Stormont 
>> <astormont@racktopsystems.com> Approved by: Robert Mustacchi 
>> <rm@joyent.com>
>>
>> # Rebase 08487eea..f482e26c onto 08487eea (8 commands)
>> #
>> # Commands:
>> # p, pick <commit> = use commit
>> # r, reword <commit> = use commit, but edit the commit message
>> # e, edit <commit> = use commit, but stop for amending
>> # s, squash <commit> = use commit, but meld into previous commit
>> # f, fixup <commit> = like "squash", but discard this commit's log 
>> message
>> # x, exec <command> = run command (the rest of the line) using shell
>> # b, break = stop here (continue rebase later with 'git rebase 
>> --continue')
>> # d, drop <commit> = remove commit
>> # l, label <label> = label current HEAD with a name
>> # t, reset <label> = reset HEAD to a label
>> # m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
>> # .       create a merge commit using the original merge commit's
>> # .       message (or the oneline, if no original merge commit was
>> # .       specified). Use -c <commit> to reword the commit message.
>> #
>> # These lines can be re-ordered; they are executed from top to bottom.
>> #
>> # If you remove a line here THAT COMMIT WILL BE LOST.
>> #
>> # However, if you remove everything, the rebase will be aborted.
>> #
>> # Note that empty commits are commented out
>> Successfully rebased and updated refs/heads/master.
>>
>>
>> On 15/01/2019 19:29, Michal Nowak wrote:
>>> Hello,
>>>
>>> on OpenIndiana 2018.10 (illumos kernel) line of the interactive 
>>> rebase text after a particular name (Gergő Mihály Doma) is broken:
>>>
>>> pick 1ea94c756c 10202 loader: use screen-#rows to find bottom left 
>>> co-ordinates Reviewed by: Toomas Soome <tsoome@me.com> Reviewed by: 
>>> Gergő Mihály Doma <domag02@gmail.com> Approved by: Robert Mustacchi 
>>> <rm@joyent.com>
>>> p
>>> pick cadd68ea00 10078 smatch fixes for UCB Reviewed by: Andy Fiddaman 
>>> <omnios@citrus-it.net> Reviewed by: Toomas Soome <tsoome@me.com> 
>>> Reviewed by: Peter Tribble <peter.tribble@gmail.com> Approved by: 
>>> Robert Mustacchi <rm@joyent.com>
>>>
>>>
>>> Source: https://github.com/illumos/illumos-gate.git
>>>
>>> See the second item which only has "p".
>>>
>>> This is with LC_ALL=en_US.UTF-8 (same with cs_CZ.UTF-8), with 
>>> LC_ALL=C the text is fine.
>>>
>>> I checked various editors (vim, nano, cat), they are all the same.
>>>
>>> `git log` (PAGER is set to `/usr/bin/less -ins`) shows the particular 
>>> commit correctly.
>>>
>>> I tested following git version: 2.16.5, 2.19.2, 2.20.1, git.next branch.
>>>
>>> Here's the configure output: 
>>> https://paste.ec/paste/oDdydNxZ#r3avan8BL+8lldFMo928cw3eXSServTsSkGPW4jRBOd 
>>>
>>>
>>> Michal
>>>
>>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-01-17 11:04     ` Phillip Wood
@ 2019-01-31 17:43       ` Alban Gruin
  2019-01-31 20:40         ` Phillip Wood
  0 siblings, 1 reply; 16+ messages in thread
From: Alban Gruin @ 2019-01-31 17:43 UTC (permalink / raw)
  To: phillip.wood, Michal Nowak, git

Hi Phillip and Michal,

I think I found the bug.

If you look at .git/rebase-merge/git-rebase-todo.backup, which is
created before the editor is opened, you can see that it does not have
this problem.  In between, transform_todos() is called and causes the
problem reported by Michal.

This seems to be caused by a single line, sequencer:4661 (on b5101f9297,
"Fourth batch after 2.20", 2019-01-29)[1].  If you add just before a
something like this:

    fwrite(item->arg, item->arg_len, sizeof(char), stdout);

You will see that the argument is properly written to stdout.  But if
you write this:

    printf("%.*s\n", item->arg_len, item->arg);

You will have the same broken output as in the todo file.

Are we misusing C formats?

[1] https://github.com/git/git/blob/master/sequencer.c#L4661

Cheers,
Alban



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-01-31 17:43       ` Alban Gruin
@ 2019-01-31 20:40         ` Phillip Wood
  2019-01-31 21:00           ` Alban Gruin
  2019-01-31 21:35           ` Junio C Hamano
  0 siblings, 2 replies; 16+ messages in thread
From: Phillip Wood @ 2019-01-31 20:40 UTC (permalink / raw)
  To: Alban Gruin, phillip.wood, Michal Nowak, git; +Cc: Johannes Schindelin

Hi Alban

On 31/01/2019 17:43, Alban Gruin wrote:
> Hi Phillip and Michal,
> 
> I think I found the bug.

Good find! Which os are you on?
> 
> If you look at .git/rebase-merge/git-rebase-todo.backup, which is
> created before the editor is opened, you can see that it does not have
> this problem.  In between, transform_todos() is called and causes the
> problem reported by Michal.
> 
> This seems to be caused by a single line, sequencer:4661 (on b5101f9297,
> "Fourth batch after 2.20", 2019-01-29)[1].  If you add just before a
> something like this:
> 
>      fwrite(item->arg, item->arg_len, sizeof(char), stdout);
> 
> You will see that the argument is properly written to stdout.  But if
> you write this:
> 
>      printf("%.*s\n", item->arg_len, item->arg);
> 
> You will have the same broken output as in the todo file.
> 
> Are we misusing C formats?

The C standard and POSIX both say that the * refers to the maximum 
number of bytes to print but it looks like it is being treated as the 
maximum number of characters on OpenIndiana.

Johannes - Perhaps we should change it to use fwrite() unless printf() 
gets fixed and we're sure no other operating systems are affected?

[1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/fprintf.html
[2] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf page 309

Best Wishes

Phillip

> [1] https://github.com/git/git/blob/master/sequencer.c#L4661
> 
> Cheers,
> Alban
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-01-31 20:40         ` Phillip Wood
@ 2019-01-31 21:00           ` Alban Gruin
  2019-01-31 21:35           ` Junio C Hamano
  1 sibling, 0 replies; 16+ messages in thread
From: Alban Gruin @ 2019-01-31 21:00 UTC (permalink / raw)
  To: phillip.wood, Michal Nowak, git; +Cc: Johannes Schindelin

Hi Phillip,

Le 31/01/2019 à 21:40, Phillip Wood a écrit :
> Hi Alban
> 
> On 31/01/2019 17:43, Alban Gruin wrote:
>> Hi Phillip and Michal,
>>
>> I think I found the bug.
> 
> Good find! Which os are you on?

I’m on Linux, but I found this on OpenIndiana running under QEMU.

-- Alban


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-01-31 20:40         ` Phillip Wood
  2019-01-31 21:00           ` Alban Gruin
@ 2019-01-31 21:35           ` Junio C Hamano
  2019-02-01  7:38             ` Johannes Schindelin
  1 sibling, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2019-01-31 21:35 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Alban Gruin, phillip.wood, Michal Nowak, git, Johannes Schindelin

Phillip Wood <phillip.wood@talktalk.net> writes:

>> Are we misusing C formats?
>
> The C standard and POSIX both say that the * refers to the maximum
> number of bytes to print but it looks like it is being treated as the
> maximum number of characters on OpenIndiana.
>
> Johannes - Perhaps we should change it to use fwrite() unless printf()
> gets fixed and we're sure no other operating systems are affected?

Avoid such a rewrite, as "%*.s" that takes (int, char *) are used in
many other places in our codebase, if you can.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-01-31 21:35           ` Junio C Hamano
@ 2019-02-01  7:38             ` Johannes Schindelin
  2019-02-01  9:06               ` Michal Nowak
  2019-02-01 16:13               ` Alban Gruin
  0 siblings, 2 replies; 16+ messages in thread
From: Johannes Schindelin @ 2019-02-01  7:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Phillip Wood, Alban Gruin, phillip.wood, Michal Nowak, git

Hi,

On Thu, 31 Jan 2019, Junio C Hamano wrote:

> Phillip Wood <phillip.wood@talktalk.net> writes:
> 
> >> Are we misusing C formats?
> >
> > The C standard and POSIX both say that the * refers to the maximum
> > number of bytes to print but it looks like it is being treated as the
> > maximum number of characters on OpenIndiana.
> >
> > Johannes - Perhaps we should change it to use fwrite() unless printf()
> > gets fixed and we're sure no other operating systems are affected?
> 
> Avoid such a rewrite, as "%*.s" that takes (int, char *) are used in
> many other places in our codebase, if you can.

Yes, this would be painful in particular in cases like

	master:advice.c:101:           fprintf(stderr, _("%shint: %.*s%s\n"),

where we want to write more than just a variable-length buffer.

I am curious: is libintl (gettext) used on OpenIndiana? I ask because
AFAIR fprintf() is overridden in that case, and the bug might be a lot
easier to fix if it is in libintl rather than in libc.

Of course, it might *still* be a bug in libc by virtue of handing '%.*s'
through to libc's implementation.

Alban, can you test this with NO_GETTEXT?

Thanks,
Johannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-02-01  7:38             ` Johannes Schindelin
@ 2019-02-01  9:06               ` Michal Nowak
  2019-02-01 14:33                 ` Johannes Schindelin
  2019-02-01 16:15                 ` Alban Gruin
  2019-02-01 16:13               ` Alban Gruin
  1 sibling, 2 replies; 16+ messages in thread
From: Michal Nowak @ 2019-02-01  9:06 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Phillip Wood, Alban Gruin, phillip.wood,
	Michal Nowak, git

Johannes,

On Friday, February 1, 2019 at 8:38 AM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Hi,
> 
> On Thu, 31 Jan 2019, Junio C Hamano wrote:
> 
>> Phillip Wood <phillip.wood@talktalk.net> writes:
>>
>> >> Are we misusing C formats?
>> >
>> > The C standard and POSIX both say that the * refers to the maximum
>> > number of bytes to print but it looks like it is being treated as the
>> > maximum number of characters on OpenIndiana.
>> >
>> > Johannes - Perhaps we should change it to use fwrite() unless
>> printf()
>> > gets fixed and we're sure no other operating systems are affected?
>>
>> Avoid such a rewrite, as "%*.s" that takes (int, char *) are used in
>> many other places in our codebase, if you can.
> 
> Yes, this would be painful in particular in cases like
> 
> 	master:advice.c:101:           fprintf(stderr, _("%shint: %.*s%s\n"),
> 
> where we want to write more than just a variable-length buffer.
> 
> I am curious: is libintl (gettext) used on OpenIndiana? I ask because
> AFAIR fprintf() is overridden in that case, and the bug might be a lot
> easier to fix if it is in libintl rather than in libc.

here you can see the full output of the OpenIndiana git build: https://hipster.openindiana.org/logs/oi-userland/latest/git.publish.log.

From what I see there, libintl was found.

If you believe this is illumos libc bug, it would be cool if someone created an simple testcase, which I can forward to the illumos developers.

Thanks,
Michal

> 
> Of course, it might *still* be a bug in libc by virtue of handing '%.*s'
> through to libc's implementation.
> 
> Alban, can you test this with NO_GETTEXT?
> 
> Thanks,
> Johannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-02-01  9:06               ` Michal Nowak
@ 2019-02-01 14:33                 ` Johannes Schindelin
  2019-02-01 16:24                   ` Michal Nowak
  2019-02-01 16:15                 ` Alban Gruin
  1 sibling, 1 reply; 16+ messages in thread
From: Johannes Schindelin @ 2019-02-01 14:33 UTC (permalink / raw)
  To: Michal Nowak; +Cc: Junio C Hamano, Phillip Wood, Alban Gruin, phillip.wood, git

Hi Michal,

On Fri, 1 Feb 2019, Michal Nowak wrote:

> On Friday, February 1, 2019 at 8:38 AM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > 
> > On Thu, 31 Jan 2019, Junio C Hamano wrote:
> > 
> >> Phillip Wood <phillip.wood@talktalk.net> writes:
> >>
> >> >> Are we misusing C formats?
> >> >
> >> > The C standard and POSIX both say that the * refers to the maximum
> >> > number of bytes to print but it looks like it is being treated as the
> >> > maximum number of characters on OpenIndiana.
> >> >
> >> > Johannes - Perhaps we should change it to use fwrite() unless
> >> printf()
> >> > gets fixed and we're sure no other operating systems are affected?
> >>
> >> Avoid such a rewrite, as "%*.s" that takes (int, char *) are used in
> >> many other places in our codebase, if you can.
> > 
> > Yes, this would be painful in particular in cases like
> > 
> > 	master:advice.c:101:           fprintf(stderr, _("%shint: %.*s%s\n"),
> > 
> > where we want to write more than just a variable-length buffer.
> > 
> > I am curious: is libintl (gettext) used on OpenIndiana? I ask because
> > AFAIR fprintf() is overridden in that case, and the bug might be a lot
> > easier to fix if it is in libintl rather than in libc.
> 
> here you can see the full output of the OpenIndiana git build: https://hipster.openindiana.org/logs/oi-userland/latest/git.publish.log.
> 
> From what I see there, libintl was found.
> 
> If you believe this is illumos libc bug, it would be cool if someone created an simple testcase, which I can forward to the illumos developers.

You already have that example. Just take the UTF-8 text in your original
bug report, put it into something like

	int main(int argc, char **argv)
	{
		char utf8[] = "... your text here...";

		printf("%.*s", (int)(sizeof(utf8) - 1), utf8);

		return 0;
	}

You should first verify, though, that this replicate the problem, and if
it does not, use libintl (I think you have to `#include <gettext.h>` and
`-lintl` or some such) and see whether that reproduces your problem.

Ciao,
Johannes

> 
> Thanks,
> Michal
> 
> > 
> > Of course, it might *still* be a bug in libc by virtue of handing '%.*s'
> > through to libc's implementation.
> > 
> > Alban, can you test this with NO_GETTEXT?
> > 
> > Thanks,
> > Johannes
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-02-01  7:38             ` Johannes Schindelin
  2019-02-01  9:06               ` Michal Nowak
@ 2019-02-01 16:13               ` Alban Gruin
  1 sibling, 0 replies; 16+ messages in thread
From: Alban Gruin @ 2019-02-01 16:13 UTC (permalink / raw)
  To: Johannes Schindelin, Junio C Hamano
  Cc: Phillip Wood, phillip.wood, Michal Nowak, git

Hi Johannes,

Le 01/02/2019 à 08:38, Johannes Schindelin a écrit :
> Hi,
> 
> On Thu, 31 Jan 2019, Junio C Hamano wrote:
> 
>> Phillip Wood <phillip.wood@talktalk.net> writes:
>>
>>>> Are we misusing C formats?
>>>
>>> The C standard and POSIX both say that the * refers to the maximum
>>> number of bytes to print but it looks like it is being treated as the
>>> maximum number of characters on OpenIndiana.
>>>
>>> Johannes - Perhaps we should change it to use fwrite() unless printf()
>>> gets fixed and we're sure no other operating systems are affected?
>>
>> Avoid such a rewrite, as "%*.s" that takes (int, char *) are used in
>> many other places in our codebase, if you can.
> 
> Yes, this would be painful in particular in cases like
> 
> 	master:advice.c:101:           fprintf(stderr, _("%shint: %.*s%s\n"),
> 
> where we want to write more than just a variable-length buffer.
> 
> I am curious: is libintl (gettext) used on OpenIndiana? I ask because
> AFAIR fprintf() is overridden in that case, and the bug might be a lot
> easier to fix if it is in libintl rather than in libc.
> 
> Of course, it might *still* be a bug in libc by virtue of handing '%.*s'
> through to libc's implementation.
> 
> Alban, can you test this with NO_GETTEXT?

Sure. :)

The bug no longer happens when git is built with NO_GETTEXT.  All is
working as expected.

> 
> Thanks,
> Johannes
> 

Cheers,
Alban


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-02-01  9:06               ` Michal Nowak
  2019-02-01 14:33                 ` Johannes Schindelin
@ 2019-02-01 16:15                 ` Alban Gruin
  1 sibling, 0 replies; 16+ messages in thread
From: Alban Gruin @ 2019-02-01 16:15 UTC (permalink / raw)
  To: Michal Nowak, Johannes Schindelin
  Cc: Junio C Hamano, Phillip Wood, phillip.wood, git

[-- Attachment #1: Type: text/plain, Size: 2316 bytes --]

Hi Michal,

Le 01/02/2019 à 10:06, Michal Nowak a écrit :
> Johannes,
> 
> On Friday, February 1, 2019 at 8:38 AM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>> Hi,
>>
>> On Thu, 31 Jan 2019, Junio C Hamano wrote:
>>
>>> Phillip Wood <phillip.wood@talktalk.net> writes:
>>>
>>>>> Are we misusing C formats?
>>>>
>>>> The C standard and POSIX both say that the * refers to the maximum
>>>> number of bytes to print but it looks like it is being treated as the
>>>> maximum number of characters on OpenIndiana.
>>>>
>>>> Johannes - Perhaps we should change it to use fwrite() unless
>>> printf()
>>>> gets fixed and we're sure no other operating systems are affected?
>>>
>>> Avoid such a rewrite, as "%*.s" that takes (int, char *) are used in
>>> many other places in our codebase, if you can.
>>
>> Yes, this would be painful in particular in cases like
>>
>> 	master:advice.c:101:           fprintf(stderr, _("%shint: %.*s%s\n"),
>>
>> where we want to write more than just a variable-length buffer.
>>
>> I am curious: is libintl (gettext) used on OpenIndiana? I ask because
>> AFAIR fprintf() is overridden in that case, and the bug might be a lot
>> easier to fix if it is in libintl rather than in libc.
> 
> here you can see the full output of the OpenIndiana git build: https://hipster.openindiana.org/logs/oi-userland/latest/git.publish.log.
> 
> From what I see there, libintl was found.
> 
> If you believe this is illumos libc bug, it would be cool if someone created an simple testcase, which I can forward to the illumos developers.
> 

I attached a test case to this email.  You can build it with `gcc
test-case.c', and run it with `./a.out'.

Output on my Linux system:

    Before setting locale:
    Expected output:
    áaaa
    Actual output:
    áaaa

    After setting locale:
    Expected output:
    áaaa
    Actual output:
    áaaa

Output on an OpenIndiana system:

    Before setting locale:
    Expected output:
    áaaa
    Actual output:
    áaaa

    After setting locale:
    Expected output:
    áaaa
    Actual output:
    áaaaa

> Thanks,
> Michal
> 
>>
>> Of course, it might *still* be a bug in libc by virtue of handing '%.*s'
>> through to libc's implementation.
>>
>> Alban, can you test this with NO_GETTEXT?
>>
>> Thanks,
>> Johannes

Cheers,
Alban


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: test-case.c --]
[-- Type: text/x-csrc; name="test-case.c", Size: 650 bytes --]

/*
 * Test case for OpenIndiana '%.*s' bug
 * Build with `gcc test-case.c'
 * Run with `./a.out'
 */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#include <locale.h>
#include <libintl.h>

static void compare_output(const char *str, int len) {
	puts("Expected output:");
	fwrite(str, len, sizeof(char), stdout);

	puts("\nActual output:");
	printf("%.*s\n", len, str);
}

int main(int argc, char **argv) {
	char buf[] = "áaaaaaa";

	puts("Before setting locale:");
	compare_output(buf, 5);

	setlocale(LC_ALL, "");

	puts("\nAfter setting locale:");
	compare_output(buf, 5);

	return EXIT_SUCCESS;
}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-02-01 14:33                 ` Johannes Schindelin
@ 2019-02-01 16:24                   ` Michal Nowak
  2019-02-01 17:30                     ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Nowak @ 2019-02-01 16:24 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Phillip Wood, Alban Gruin, phillip.wood, git

On 02/01/19 03:33 PM, Johannes Schindelin wrote:
> Hi Michal,
> 
> On Fri, 1 Feb 2019, Michal Nowak wrote:
> 
>> On Friday, February 1, 2019 at 8:38 AM, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>>>
>>> On Thu, 31 Jan 2019, Junio C Hamano wrote:
>>>
>>>> Phillip Wood <phillip.wood@talktalk.net> writes:
>>>>
>>>>>> Are we misusing C formats?
>>>>>
>>>>> The C standard and POSIX both say that the * refers to the maximum
>>>>> number of bytes to print but it looks like it is being treated as the
>>>>> maximum number of characters on OpenIndiana.
>>>>>
>>>>> Johannes - Perhaps we should change it to use fwrite() unless
>>>> printf()
>>>>> gets fixed and we're sure no other operating systems are affected?
>>>>
>>>> Avoid such a rewrite, as "%*.s" that takes (int, char *) are used in
>>>> many other places in our codebase, if you can.
>>>
>>> Yes, this would be painful in particular in cases like
>>>
>>> 	master:advice.c:101:           fprintf(stderr, _("%shint: %.*s%s\n"),
>>>
>>> where we want to write more than just a variable-length buffer.
>>>
>>> I am curious: is libintl (gettext) used on OpenIndiana? I ask because
>>> AFAIR fprintf() is overridden in that case, and the bug might be a lot
>>> easier to fix if it is in libintl rather than in libc.
>>
>> here you can see the full output of the OpenIndiana git build: https://hipster.openindiana.org/logs/oi-userland/latest/git.publish.log.
>>
>>  From what I see there, libintl was found.
>>
>> If you believe this is illumos libc bug, it would be cool if someone created an simple testcase, which I can forward to the illumos developers.
> 
> You already have that example. Just take the UTF-8 text in your original
> bug report, put it into something like
> 
> 	int main(int argc, char **argv)
> 	{
> 		char utf8[] = "... your text here...";
> 
> 		printf("%.*s", (int)(sizeof(utf8) - 1), utf8);
> 
> 		return 0;
> 	}
> 
> You should first verify, though, that this replicate the problem, and if
> it does not, use libintl (I think you have to `#include <gettext.h>` and
> `-lintl` or some such) and see whether that reproduces your problem.

Thank you, Johannes for the test case.

However, I don't see any problem with the output on OpenIndiana:

{global} newman@lenovo:~ $ cat printf.c
#include <stdio.h>
//#include <gettext.h>
int main(int argc, char **argv) {
   char utf8[] = "Gergő Mihály Doma\n";
   printf("%.*s", (int)(sizeof(utf8) - 1), utf8);
   return 0;
}

{global} newman@lenovo:~ $ gcc printf.c -o printf && ./printf
Gergő Mihály Doma

Enabled the gettext header in the source file.

{global} newman@lenovo:~ $ gcc printf.c -o printf_intl -lintl 
-I/usr/share/gettext/ && ./printf_intl
Gergő Mihály Doma

{global} newman@lenovo:~ $ ldd printf printf_intl
printf:
         libc.so.1 =>     /lib/libc.so.1
         libm.so.2 =>     /lib/libm.so.2
printf_intl:
         libintl.so.1 =>  /lib/libintl.so.1
         libc.so.1 =>     /lib/libc.so.1
         libm.so.2 =>     /lib/libm.so.2

{global} newman@lenovo:~ $ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=


I even tried more arcane characters from 
https://www.w3.org/2001/06/utf-8-test/UTF-8-demo.html but they are 
displayed correctly as well.

Michal

> 
> Ciao,
> Johannes
> 
>>
>> Thanks,
>> Michal
>>
>>>
>>> Of course, it might *still* be a bug in libc by virtue of handing '%.*s'
>>> through to libc's implementation.
>>>
>>> Alban, can you test this with NO_GETTEXT?
>>>
>>> Thanks,
>>> Johannes
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-02-01 16:24                   ` Michal Nowak
@ 2019-02-01 17:30                     ` Junio C Hamano
  2019-02-01 19:00                       ` Michal Nowak
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2019-02-01 17:30 UTC (permalink / raw)
  To: Michal Nowak
  Cc: Johannes Schindelin, Phillip Wood, Alban Gruin, phillip.wood, git

Michal Nowak <mnowak@startmail.com> writes:

>> You already have that example. Just take the UTF-8 text in your original
>> bug report, put it into something like
>>
>> 	int main(int argc, char **argv)
>> 	{
>> 		char utf8[] = "... your text here...";
>>
>> 		printf("%.*s", (int)(sizeof(utf8) - 1), utf8);
>>
>> 		return 0;
>> 	}

When replayed literally, this is not a very good test.

> {global} newman@lenovo:~ $ cat printf.c
> #include <stdio.h>
> //#include <gettext.h>
> int main(int argc, char **argv) {
>   char utf8[] = "Gergő Mihály Doma\n";
>   printf("%.*s", (int)(sizeof(utf8) - 1), utf8);
>   return 0;
> }

And this is replaying it literally.

The current working suspicion in this thread is that the platform
printf("%.*s", num, str) emits up to num "characters" starting at
str, which is an incorrect implementation, as it should emit up to
num "bytes".  

Notice that the num in this case is the byte count of that utf8[]
string.  That number is always larger than the number of "characters"
for a string with multi-byte character(s) in it.  Let's say that the
sample string has N "characters", and it is N+X "bytes" long, where
X > 1.

If the suspicion is correct, i.e. the way the printf implementation
is broken on this platform is that it shows up to num "characters",
then the call is asking to show up to N+X "characters".  The buggy
printf shows all the available N "characters", notices the string
stops there, and finishes.  So you won't _see_ the bug with that
test program.

Instead, use something like this.

        #include <stdio.h>

        int main(int ac, char **av)
        {
                char utf8[] = "ふabc";
                printf("%.*s\n", 4, utf8);
                return 0;
        }

With or without gettext or i18n, the output must end with 'a' followed
by a newline, and you must not see 'b' nor 'c'.  Otherwise your printf
is broken.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Broken interactive rebase text after some UTF-8 characters
  2019-02-01 17:30                     ` Junio C Hamano
@ 2019-02-01 19:00                       ` Michal Nowak
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Nowak @ 2019-02-01 19:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Phillip Wood, Alban Gruin, phillip.wood, git

On 02/01/19 06:30 PM, Junio C Hamano wrote:
> Michal Nowak <mnowak@startmail.com> writes:
> 
>>> You already have that example. Just take the UTF-8 text in your original
>>> bug report, put it into something like
>>>
>>> 	int main(int argc, char **argv)
>>> 	{
>>> 		char utf8[] = "... your text here...";
>>>
>>> 		printf("%.*s", (int)(sizeof(utf8) - 1), utf8);
>>>
>>> 		return 0;
>>> 	}
> 
> When replayed literally, this is not a very good test.
> 
>> {global} newman@lenovo:~ $ cat printf.c
>> #include <stdio.h>
>> //#include <gettext.h>
>> int main(int argc, char **argv) {
>>    char utf8[] = "Gergő Mihály Doma\n";
>>    printf("%.*s", (int)(sizeof(utf8) - 1), utf8);
>>    return 0;
>> }
> 
> And this is replaying it literally.
> 
> The current working suspicion in this thread is that the platform
> printf("%.*s", num, str) emits up to num "characters" starting at
> str, which is an incorrect implementation, as it should emit up to
> num "bytes".
> 
> Notice that the num in this case is the byte count of that utf8[]
> string.  That number is always larger than the number of "characters"
> for a string with multi-byte character(s) in it.  Let's say that the
> sample string has N "characters", and it is N+X "bytes" long, where
> X > 1.
> 
> If the suspicion is correct, i.e. the way the printf implementation
> is broken on this platform is that it shows up to num "characters",
> then the call is asking to show up to N+X "characters".  The buggy
> printf shows all the available N "characters", notices the string
> stops there, and finishes.  So you won't _see_ the bug with that
> test program.
> 
> Instead, use something like this.
> 
>          #include <stdio.h>
> 
>          int main(int ac, char **av)
>          {
>                  char utf8[] = "ふabc";
>                  printf("%.*s\n", 4, utf8);
>                  return 0;
>          }
> 
> With or without gettext or i18n, the output must end with 'a' followed
> by a newline, and you must not see 'b' nor 'c'.  Otherwise your printf
> is broken.
> 

The output ends with 'a' for me on OpenIndiana:

{global} newman@lenovo:~ $ gcc tp.c -o tp && ./tp
ふa

Though, I can reproduce the problem with Alban's test case the exact 
same way he wrote earlier today.

Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-02-01 19:00 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-15 19:29 Broken interactive rebase text after some UTF-8 characters Michal Nowak
2019-01-16 10:33 ` Phillip Wood
2019-01-16 21:36   ` Michal Nowak
2019-01-17 11:04     ` Phillip Wood
2019-01-31 17:43       ` Alban Gruin
2019-01-31 20:40         ` Phillip Wood
2019-01-31 21:00           ` Alban Gruin
2019-01-31 21:35           ` Junio C Hamano
2019-02-01  7:38             ` Johannes Schindelin
2019-02-01  9:06               ` Michal Nowak
2019-02-01 14:33                 ` Johannes Schindelin
2019-02-01 16:24                   ` Michal Nowak
2019-02-01 17:30                     ` Junio C Hamano
2019-02-01 19:00                       ` Michal Nowak
2019-02-01 16:15                 ` Alban Gruin
2019-02-01 16:13               ` Alban Gruin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).