bug-coreutils@gnu.org mirror (unofficial)
From: "Pádraig Brady" <P@draigBrady.com>
To: Thomas Dreibholz <dreibh@simula.no>, 69951@debbugs.gnu.org
Subject: bug#69951: coreutils: printf formatting bug for nb_NO and nn_NO locales
Date: Sat, 23 Mar 2024 14:39:04 +0000
Message-ID: <6033d290-a8c0-9ba5-fc85-e550b326339c@draigBrady.com>
In-Reply-To: <6d1e3259-f473-44b7-aada-34bc77917192@simula.no>

tag 69951 notabug
close 69951
stop

On 22/03/2024 20:22, Thomas Dreibholz wrote:
> Hi,
> 
> I just discovered a printf bug for at least the nb_NO and nn_NO locales
> when printing numbers with a thousands separator. To reproduce:
> 
> #!/bin/bash
> for l in de_DE nb_NO ; do
>      echo "LC_NUMERIC=$l.UTF-8"
>      for n in 1 100 1000 10000 100000 1000000 10000000 ; do
>         LC_NUMERIC=$l.UTF-8 /usr/bin/printf "<%'10d>\n" $n
>      done
> done
> 
> The expected output of "%'10d" is a right-aligned number string
> 10 characters wide.
> 
> The output of the test script is fine for e.g. LC_NUMERIC=de_DE.UTF-8
> and LC_NUMERIC=en_US.UTF-8:
> 
> LC_NUMERIC=de_DE.UTF-8
> <         1>
> <       100>
> <     1.000>
> <    10.000>
> <   100.000>
> < 1.000.000>
> <10.000.000>
>
> However, for LC_NUMERIC=nb_NO.UTF-8 and LC_NUMERIC=nn_NO.UTF-8, the
> formatting is wrong:
> 
> LC_NUMERIC=nb_NO.UTF-8
> <         1>
> <       100>
> <   1 000>
> <  10 000>
> < 100 000>
> <1 000 000>
> <10 000 000>
>
> I reproduced the issue with coreutils-8.32-4.1ubuntu1.1 (Ubuntu 22.04)
> as well as coreutils-9.3-5.fc39.x86_64 (Fedora 39).
> 
> Under FreeBSD 14.0-RELEASE (coreutils-9.4_1), the output looks slightly
> better but is still wrong:
> 
> LC_NUMERIC=nb_NO.UTF-8
> <         1>
> <       100>
> <    1 000>
> <   10 000>
> <  100 000>
> <1 000 000>
> <10 000 000>
> LC_NUMERIC=nn_NO.UTF-8
> <         1>
> <       100>
> <    1 000>
> <   10 000>
> <  100 000>
> <1 000 000>
> <10 000 000>
> 
> Maybe the issue is that the thousands separator for the Norwegian
> locales is a space " ", while it is "."/"," for German/US English locales.

The issue looks to be that the thousands separator for Norwegian locales
is U+202F "NARROW NO-BREAK SPACE", or more problematically the _three_-byte
UTF-8 sequence E2 80 AF. So it looks like an issue with libc routines
counting bytes rather than characters when computing the field width.
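The byte/character mismatch can be seen even without the Norwegian locale installed. The sketch below is a minimal illustration, assuming the separator is U+202F as described above: it builds "1 000" by hand and pads it with a plain %10s, used here only as a stand-in for %'10d because the C printf family measures the %s field width in bytes.

```shell
#!/bin/bash
# Assumption: the nb_NO.UTF-8 thousands separator is U+202F NARROW NO-BREAK
# SPACE, written here as its UTF-8 bytes in octal so any printf accepts it.
nnbsp=$(printf '\342\200\257')

# "1 000" as it would look with that separator: 5 characters.
num="1${nnbsp}000"

# Dump the bytes: 31 e2 80 af 30 30 30, i.e. 7 bytes.
printf '%s' "$num" | od -An -tx1

# %10s stands in for %'10d: the field width of %s is counted in bytes, so
# only 10 - 7 = 3 spaces are added and the result is 8 characters wide
# instead of 10, matching the "<   1 000>" line in the report.
padded=$(env printf '<%10s>' "$num")
printf '%s\n' "$padded"
```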

One suggestion is to do the alignment afterwards, for example with column(1):

$ export LC_NUMERIC=nb_NO.UTF-8
$ printf "%'.f\n" $(seq -f '1E%.f' 7) | column --table-right=1 -t
         10
        100
      1 000
     10 000
    100 000
  1 000 000
10 000 000
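Where column(1) is not available, the same "align afterwards" idea can be sketched in plain bash. The `ralign` function below is hypothetical (not part of coreutils or util-linux); it assumes UTF-8 input and pads each line to a fixed width counted in characters, by counting every byte that is not a UTF-8 continuation byte (0x80-0xBF).

```shell
#!/bin/bash
# Hypothetical helper: right-align each input line to WIDTH characters,
# assuming the input is UTF-8. Lines already wider than WIDTH are unchanged.
ralign() {
  local width=$1 line chars pad
  while IFS= read -r line || [ -n "$line" ]; do
    # Count characters by dropping UTF-8 continuation bytes (0x80-0xBF);
    # LC_ALL=C makes tr treat the input as raw bytes.
    chars=$(( $(printf '%s' "$line" | LC_ALL=C tr -d '\200-\277' | wc -c) ))
    pad=$(( width > chars ? width - chars : 0 ))
    # %*s with an empty argument prints exactly $pad spaces.
    printf '%*s%s\n' "$pad" '' "$line"
  done
}
```

Usage would be along the lines of `printf "%'.f\n" $(seq -f '1E%.f' 7) | ralign 10` with LC_NUMERIC exported as above. Counting non-continuation bytes keeps the helper independent of which locales are installed, at the cost of treating every code point as one column wide.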

Actually, I've just noticed that with the %'10.f format the field width
does count characters and not bytes! So another solution is:

$ export LC_NUMERIC=nb_NO.UTF-8
$ printf "%'10.f\n" $(seq -f '1E%.f' 7)
         10
        100
      1 000
     10 000
    100 000
  1 000 000
10 000 000
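To check the difference between the two formats on a given system, a small comparison script can print each result with its width in characters. This is a sketch under stated assumptions: nb_NO.UTF-8 is only an example locale name, `width_chars` is a hypothetical helper, and output is assumed to be UTF-8.

```shell
#!/bin/bash
# Compare the reported %'10d format with the %'10.f workaround for one
# locale, printing each result and its width in characters.
# If the locale is not installed, printf quietly falls back to the C
# locale and no separators appear.
loc=${1:-nb_NO.UTF-8}

# Hypothetical helper: count characters in UTF-8 text by dropping
# continuation bytes (0x80-0xBF).
width_chars() {
  echo $(( $(printf '%s' "$1" | LC_ALL=C tr -d '\200-\277' | wc -c) ))
}

for n in 1000 1000000; do
  d=$(LC_NUMERIC=$loc env printf "%'10d" "$n")
  f=$(LC_NUMERIC=$loc env printf "%'10.f" "$n")
  printf '%%d: <%s> %s chars   %%.f: <%s> %s chars\n' \
    "$d" "$(width_chars "$d")" "$f" "$(width_chars "$f")"
done
```

If the observation above holds, the %.f column should report 10 characters for both values in nb_NO.UTF-8, while the %d column falls short.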

The issue, if there is one, is in libc at least.
It would be worth checking existing glibc bug reports about this,
and filing one if it is not already covered.

cheers,
Pádraig.

Thread overview: 4+ messages
2024-03-22 20:22 bug#69951: coreutils: printf formatting bug for nb_NO and nn_NO locales Thomas Dreibholz
2024-03-23 11:39 ` Thomas Dreibholz
2024-03-23 14:39 ` Pádraig Brady [this message]
2024-03-23 18:17 ` Thomas Dreibholz