From: Jeff King <peff@peff.net>
To: 孟子易 <mengziyi540841@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: bug report: symbolic-ref --short command echos the wrong text while use Chinese language
Date: Mon, 13 Feb 2023 15:18:28 -0500
Message-ID: <Y+qbFN+PhHVuWT2T@coredump.intra.peff.net>
In-Reply-To: <CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com>

On Mon, Feb 13, 2023 at 02:38:08PM +0800, 孟子易 wrote:

> System: Mac Os (Ventura 13.2)
> Language: Chinese simplified
> Preconditions:
> # git checkout -b 测试-加-增加-加-增加
> # git symbolic-ref --short HEAD
> Wrong Echo (Current Echo):
> 测试-�
> Correct Echo:
> // I Don't know, may be "测试-加" ?

Hmm, I can't reproduce here on Linux:

  $ git init
  $ git commit --allow-empty -m foo
  $ git checkout -b 测试-加-增加-加-增加
  $ git symbolic-ref --short HEAD
  测试-加-增加-加-增加

I wonder if it is related to using macOS. The refs are stored as
individual files in the filesystem, and HFS+ will do some unicode
normalization. So I get:

  $ ls .git/refs/heads/ | xxd
  00000000: 6d61 696e 0ae6 b58b e8af 952d e58a a02d  main.......-...-
  00000010: e5a2 9ee5 8aa0 2de5 8aa0 2de5 a29e e58a  ......-...-.....
  00000020: a00a                                     ..

Are your on-disk bytes different?
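
To compare dumps, it can help to decode the bytes back into code
points. The following is a minimal sketch in C, not anything from Git:
utf8_decode_one() is a hypothetical helper, it assumes the input is
meant to be UTF-8, and it does not reject overlong encodings or
surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one UTF-8 sequence starting at s, with n bytes available.
 * Stores the code point in *cp and returns the number of bytes
 * consumed, or 0 if the sequence is truncated or malformed.
 * Hypothetical helper for illustration; not part of Git.
 */
size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
	size_t len, i;
	uint32_t c;

	assert(s && cp);
	if (!n)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];	/* plain ASCII */
		return 1;
	} else if ((s[0] & 0xe0) == 0xc0) {
		len = 2;
		c = s[0] & 0x1f;
	} else if ((s[0] & 0xf0) == 0xe0) {
		len = 3;
		c = s[0] & 0x0f;
	} else if ((s[0] & 0xf8) == 0xf0) {
		len = 4;
		c = s[0] & 0x07;
	} else {
		return 0;	/* stray continuation or invalid lead byte */
	}
	if (n < len)
		return 0;	/* sequence cut short */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xc0) != 0x80)
			return 0;	/* expected a 10xxxxxx byte */
		c = (c << 6) | (s[i] & 0x3f);
	}
	*cp = c;
	return len;
}
```

Applied to the dump above, e6 b5 8b decodes to U+6D4B, e8 af 95 to
U+8BD5, and e5 8a a0 to U+52A0, each a single three-byte character.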

My instinct was that this might be related to the shortening code
treating the names as bytes, rather than characters. But looking at
shorten_unambiguous_ref(), it is really operating at the level of path
components, and should never split a partial string.
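
To illustrate what "bytes rather than characters" would look like: a
cut at an arbitrary byte offset can land inside a multi-byte sequence
and leave a partial character behind. The sketch below is only an
illustration under that assumption; utf8_complete_prefix() is a
hypothetical helper, not Git code, and it assumes its input is a
prefix of well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given the first n bytes of a well-formed UTF-8 string, return the
 * length of the longest prefix that does not end in the middle of a
 * multi-byte character.
 * Hypothetical helper for illustration; not part of Git.
 */
size_t utf8_complete_prefix(const unsigned char *s, size_t n)
{
	size_t i = n, need, have;
	unsigned char lead;

	assert(s || !n);
	/* step back over up to three trailing continuation bytes */
	while (i > 0 && n - i < 3 && (s[i - 1] & 0xc0) == 0x80)
		i--;
	if (i == 0)
		return 0;	/* empty, or no lead byte found */

	/* s[i - 1] starts the last character; how long should it be? */
	lead = s[i - 1];
	if ((lead & 0xe0) == 0xc0)
		need = 2;
	else if ((lead & 0xf0) == 0xe0)
		need = 3;
	else if ((lead & 0xf8) == 0xf0)
		need = 4;
	else
		need = 1;

	have = n - (i - 1);	/* bytes present for that character */
	return have >= need ? n : i - 1;
}
```

For the ten bytes e6 b5 8b e8 af 95 2d e5 8a a0, a hypothetical cut
after eight or nine bytes leaves only seven bytes of complete
characters, with one or two dangling bytes after them.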

Another possibility: the shortening is done by applying our usual
ref-resolving rules one by one via sscanf(). There's an assumption in the
code that the resulting string can never be longer than the input:

	/* buffer for scanf result, at most refname must fit */
	short_name = xstrdup(refname);

	...
	for (i = nr_rules - 1; i > 0 ; --i) {
		...
		if (1 != sscanf(refname, scanf_fmts[i], short_name))
			continue;

Is it possible that this assumption is violated based on some particular
combination of unicode normalization and locale? That seems unlikely to
me, but it wouldn't be the first time I've been surprised by subtle
unicode implications.
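
The pattern-matching step can be tried in isolation. This is a rough
sketch rather than the actual Git code: shorten_with_pattern() is a
hypothetical name, and the "refs/heads/%s" pattern is an assumed
example, since the contents of scanf_fmts are not shown here. The
buffer is sized on the same assumption as above, that the match cannot
be longer than the input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Try to shorten refname with one scanf-style pattern containing a
 * single %s, e.g. "refs/heads/%s". Returns a newly allocated short
 * name that the caller must free(), or NULL if the pattern does not
 * match or allocation fails.
 * Hypothetical helper for illustration; not part of Git.
 */
char *shorten_with_pattern(const char *refname, const char *fmt)
{
	char *short_name;

	assert(refname && fmt);
	/* assumption: the match is never longer than refname itself */
	short_name = malloc(strlen(refname) + 1);
	if (!short_name)
		return NULL;
	if (sscanf(refname, fmt, short_name) != 1) {
		free(short_name);
		return NULL;
	}
	return short_name;
}
```

With that pattern, "refs/heads/main" yields "main" and
"refs/tags/v1.0" yields no match. Note that %s stops at the first
whitespace character, so an input such as "refs/heads/a b" (not a
valid ref name, used only to show the behavior) yields just "a".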

Is it possible for you to run Git in a debugger and check the
intermediate steps happening in refs_shorten_unambiguous_ref()?

-Peff
