git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
@ 2021-01-06 11:35 Daniel Troger
  2021-01-06 14:21 ` Torsten Bögershausen
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Daniel Troger @ 2021-01-06 11:35 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)
I ran `git restore -p .`

What did you expect to happen? (Expected behavior)
git restore to open in interactive mode and letting me select data to discard

What happened instead? (Actual behavior)
I got an error message:
```
me@iMac:[redacted]/paulbrunngård-springyard/src$ git restore -p .
BUG: pathspec.c:495: error initializing pathspec_item
Cannot close git diff-index --cached --numstat --summary HEAD -- :(,prefix:44)[redacted]/paulbrunngård-springyard/src/ () at /usr/local/Cellar/git/2.29.2/libexec/git-core/git-add--interactive line 183.
```

What's different between what you expected and what actually happened?
The main problem is that the command didn't do anything. It basically printed an error message instead of changing a file in the way I wanted.

Anything else you want to add:
Yeah. So pretty obviously the problem is the "å" in the filename. The interesting thing is that the folder with the "å" in the filename seems to exist twice, with differently encoded "å"s. But their content IS AUTOMATICALLY PERFECTLY SYNCED. And the only tool which recognizes them as two different folders is git. ls in the terminal shows them as one, finder shows them as one, even "glob" in php shows them as one.
This is what git seems to see them at (taken from git status):

`modified:   "paulbrunnga\314\212rd-springyard/`
And further down:
```
Untracked files:
  (use "git add <file>..." to include in what will be committed)
	"paulbrunng\303\245rd-springyard/"
```
Here is a zip containing the folder (emptied): https://arxius.io/f/7ec7e1c7
I tried creating a new git repo, extracting the zip in it, going into the folder and running `git add empty` and could reproduce that git sees it as two different folders.


Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.29.2
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Darwin 17.7.0 Darwin Kernel Version 17.7.0: Sun Jun  2 20:31:42 PDT 2019; root:xnu-4570.71.46~1/RELEASE_X86_64 x86_64
compiler info: clang: 10.0.0 (clang-1000.11.45.5)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-06 11:35 git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) Daniel Troger
@ 2021-01-06 14:21 ` Torsten Bögershausen
  2021-01-06 16:49   ` Daniel Troger
  2021-01-24 15:13 ` [PATCH/RFC v1 1/1] git restore -p . and precomposed unicode tboegi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Torsten Bögershausen @ 2021-01-06 14:21 UTC (permalink / raw)
  To: Daniel Troger; +Cc: git

On Wed, Jan 06, 2021 at 12:35:26PM +0100, Daniel Troger wrote:
> Thank you for filling out a Git bug report!
> Please answer the following questions to help us understand your issue.
>
> What did you do before the bug happened? (Steps to reproduce your issue)
> I ran `git restore -p .`
>
> What did you expect to happen? (Expected behavior)
> git restore to open in interactive mode and letting me select data to discard
>
> What happened instead? (Actual behavior)
> I got an error message:
> ```
> me@iMac:[redacted]/paulbrunngård-springyard/src$ git restore -p .
> BUG: pathspec.c:495: error initializing pathspec_item
> Cannot close git diff-index --cached --numstat --summary HEAD -- :(,prefix:44)[redacted]/paulbrunngård-springyard/src/ () at /usr/local/Cellar/git/2.29.2/libexec/git-core/git-add--interactive line 183.
> ```
>
> What's different between what you expected and what actually happened?
> The main problem is that the command didn't do anything. It basically printed an error message instead of changing a file in the way I wanted.
>
> Anything else you want to add:
> Yeah. So pretty obviously the problem is the "å" in the filename. The interesting thing is that the folder with the "å" in the filename seems to exist twice, with differently encoded "å"s. But their content IS AUTOMATICALLY PERFECTLY SYNCED. And the only tool which recognizes them as two different folders is git. ls in the terminal shows them as one, finder shows them as one, even "glob" in php shows them as one.
> This is what git seems to see them at (taken from git status):
>
> `modified:   "paulbrunnga\314\212rd-springyard/`
> And further down:
> ```
> Untracked files:
>   (use "git add <file>..." to include in what will be committed)
> 	"paulbrunng\303\245rd-springyard/"
> ```
> Here is a zip containing the folder (emptied): https://arxius.io/f/7ec7e1c7
> I tried creating a new git repo, extracting the zip in it, going into the folder and running `git add empty` and could reproduce that git sees it as two different folders.
>
>
> Please review the rest of the bug report below.
> You can delete any lines you don't wish to share.
>
>
> [System Info]
> git version:
> git version 2.29.2
> cpu: x86_64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> uname: Darwin 17.7.0 Darwin Kernel Version 17.7.0: Sun Jun  2 20:31:42 PDT 2019; root:xnu-4570.71.46~1/RELEASE_X86_64 x86_64
> compiler info: clang: 10.0.0 (clang-1000.11.45.5)
> libc info: no libc information available
> $SHELL (typically, interactive shell): /bin/bash
>
>
> [Enabled Hooks]
>

This uses the decomposed form of "å":
> `modified:   "paulbrunnga\314\212rd-springyard/`
...
And here is the precomposed form og the "å"
> Untracked files:
>   (use "git add <file>..." to include in what will be committed)
> 	"paulbrunng\303\245rd-springyard/"
> ```

Both are lowercase, right ?

What does
git config -l | grep unicode
tell you ?

I have
core.precomposeunicode=true

and that what we need on a Mac (to handle precomposed and decomposed better)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-06 14:21 ` Torsten Bögershausen
@ 2021-01-06 16:49   ` Daniel Troger
  2021-01-06 21:47     ` Torsten Bögershausen
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Troger @ 2021-01-06 16:49 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

Hi, thank you for investigating. Here's the output you asked for + proof the issue still persists on the latest version of git:

```

me@iMac:[redacted]/paulbrunngård-springyard/src$ git restore -p .
BUG: pathspec.c:495: error initializing pathspec_item
Cannot close git diff-index --cached --numstat --summary HEAD -- :(,prefix:44)[redacted]/paulbrunngård-springyard/src/ () at /usr/local/Cellar/git/2.30.0/libexec/git-core/git-add--interactive line 183.
me@iMac:[redacted]/paulbrunngård-springyard/src$ git --version
git version 2.30.0
me@iMac:[redacted]/paulbrunngård-springyard/src$ git config -l | grep unicode
core.precomposeunicode=true
```

> 6 jan. 2021 kl. 15:21 skrev Torsten Bögershausen <tboegi@web.de>:
> 
> On Wed, Jan 06, 2021 at 12:35:26PM +0100, Daniel Troger wrote:
>> Thank you for filling out a Git bug report!
>> Please answer the following questions to help us understand your issue.
>> 
>> What did you do before the bug happened? (Steps to reproduce your issue)
>> I ran `git restore -p .`
>> 
>> What did you expect to happen? (Expected behavior)
>> git restore to open in interactive mode and letting me select data to discard
>> 
>> What happened instead? (Actual behavior)
>> I got an error message:
>> ```
>> me@iMac:[redacted]/paulbrunngård-springyard/src$ git restore -p .
>> BUG: pathspec.c:495: error initializing pathspec_item
>> Cannot close git diff-index --cached --numstat --summary HEAD -- :(,prefix:44)[redacted]/paulbrunngård-springyard/src/ () at /usr/local/Cellar/git/2.29.2/libexec/git-core/git-add--interactive line 183.
>> ```
>> 
>> What's different between what you expected and what actually happened?
>> The main problem is that the command didn't do anything. It basically printed an error message instead of changing a file in the way I wanted.
>> 
>> Anything else you want to add:
>> Yeah. So pretty obviously the problem is the "å" in the filename. The interesting thing is that the folder with the "å" in the filename seems to exist twice, with differently encoded "å"s. But their content IS AUTOMATICALLY PERFECTLY SYNCED. And the only tool which recognizes them as two different folders is git. ls in the terminal shows them as one, finder shows them as one, even "glob" in php shows them as one.
>> This is what git seems to see them at (taken from git status):
>> 
>> `modified:   "paulbrunnga\314\212rd-springyard/`
>> And further down:
>> ```
>> Untracked files:
>>  (use "git add <file>..." to include in what will be committed)
>>    "paulbrunng\303\245rd-springyard/"
>> ```
>> Here is a zip containing the folder (emptied): https://arxius.io/f/7ec7e1c7
>> I tried creating a new git repo, extracting the zip in it, going into the folder and running `git add empty` and could reproduce that git sees it as two different folders.
>> 
>> 
>> Please review the rest of the bug report below.
>> You can delete any lines you don't wish to share.
>> 
>> 
>> [System Info]
>> git version:
>> git version 2.29.2
>> cpu: x86_64
>> no commit associated with this build
>> sizeof-long: 8
>> sizeof-size_t: 8
>> shell-path: /bin/sh
>> uname: Darwin 17.7.0 Darwin Kernel Version 17.7.0: Sun Jun  2 20:31:42 PDT 2019; root:xnu-4570.71.46~1/RELEASE_X86_64 x86_64
>> compiler info: clang: 10.0.0 (clang-1000.11.45.5)
>> libc info: no libc information available
>> $SHELL (typically, interactive shell): /bin/bash
>> 
>> 
>> [Enabled Hooks]
>> 
> 
> This uses the decomposed form of "å":
>> `modified:   "paulbrunnga\314\212rd-springyard/`
> ...
> And here is the precomposed form og the "å"
>> Untracked files:
>>  (use "git add <file>..." to include in what will be committed)
>>    "paulbrunng\303\245rd-springyard/"
>> ```
> 
> Both are lowercase, right ?
> 
> What does
> git config -l | grep unicode
> tell you ?
> 
> I have
> core.precomposeunicode=true
> 
> and that what we need on a Mac (to handle precomposed and decomposed better)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-06 16:49   ` Daniel Troger
@ 2021-01-06 21:47     ` Torsten Bögershausen
  2021-01-06 22:21       ` Daniel Troger
  0 siblings, 1 reply; 27+ messages in thread
From: Torsten Bögershausen @ 2021-01-06 21:47 UTC (permalink / raw)
  To: Daniel Troger; +Cc: git

On Wed, Jan 06, 2021 at 05:49:17PM +0100, Daniel Troger wrote:

(Please avoid top-posting in this list)
> Hi, thank you for investigating. Here's the output you asked for + proof the issue still persists on the latest version of git:
>

Thank's for testing the latest version.

Is there any way, that we can reproduce your problem ?

We don't have access to you repo (or do we ?)

Unzipping your zip on a Linux box shows these file:
./__MACOSX
./__MACOSX/paulbrunngård-springyard
./__MACOSX/paulbrunngård-springyard/._.DS_Store
./paulbrunngård-springyard
./paulbrunngård-springyard/.DS_Store
./paulbrunngård-springyard/empty

Is it possible to make a step-by-step receipe to get a dummy repo into
this state ?

> ```
>
> me@iMac:[redacted]/paulbrunngård-springyard/src$ git restore -p .
> BUG: pathspec.c:495: error initializing pathspec_item
> Cannot close git diff-index --cached --numstat --summary HEAD -- :(,prefix:44)[redacted]/paulbrunngård-springyard/src/ () at /usr/local/Cellar/git/2.30.0/libexec/git-core/git-add--interactive line 183.
> me@iMac:[redacted]/paulbrunngård-springyard/src$ git --version
> git version 2.30.0
> me@iMac:[redacted]/paulbrunngård-springyard/src$ git config -l | grep unicode
> core.precomposeunicode=true
> ```
>
> > 6 jan. 2021 kl. 15:21 skrev Torsten Bögershausen <tboegi@web.de>:
> >
> > On Wed, Jan 06, 2021 at 12:35:26PM +0100, Daniel Troger wrote:
> >> Thank you for filling out a Git bug report!
> >> Please answer the following questions to help us understand your issue.
> >>
> >> What did you do before the bug happened? (Steps to reproduce your issue)
> >> I ran `git restore -p .`
> >>
> >> What did you expect to happen? (Expected behavior)
> >> git restore to open in interactive mode and letting me select data to discard
> >>
> >> What happened instead? (Actual behavior)
> >> I got an error message:
> >> ```
> >> me@iMac:[redacted]/paulbrunngård-springyard/src$ git restore -p .
> >> BUG: pathspec.c:495: error initializing pathspec_item
> >> Cannot close git diff-index --cached --numstat --summary HEAD -- :(,prefix:44)[redacted]/paulbrunngård-springyard/src/ () at /usr/local/Cellar/git/2.29.2/libexec/git-core/git-add--interactive line 183.
> >> ```
> >>
> >> What's different between what you expected and what actually happened?
> >> The main problem is that the command didn't do anything. It basically printed an error message instead of changing a file in the way I wanted.
> >>
> >> Anything else you want to add:
> >> Yeah. So pretty obviously the problem is the "å" in the filename. The interesting thing is that the folder with the "å" in the filename seems to exist twice, with differently encoded "å"s. But their content IS AUTOMATICALLY PERFECTLY SYNCED. And the only tool which recognizes them as two different folders is git. ls in the terminal shows them as one, finder shows them as one, even "glob" in php shows them as one.
> >> This is what git seems to see them at (taken from git status):
> >>
> >> `modified:   "paulbrunnga\314\212rd-springyard/`
> >> And further down:
> >> ```
> >> Untracked files:
> >>  (use "git add <file>..." to include in what will be committed)
> >>    "paulbrunng\303\245rd-springyard/"
> >> ```
> >> Here is a zip containing the folder (emptied): https://arxius.io/f/7ec7e1c7
> >> I tried creating a new git repo, extracting the zip in it, going into the folder and running `git add empty` and could reproduce that git sees it as two different folders.
> >>
> >>
> >> Please review the rest of the bug report below.
> >> You can delete any lines you don't wish to share.
> >>
> >>
> >> [System Info]
> >> git version:
> >> git version 2.29.2
> >> cpu: x86_64
> >> no commit associated with this build
> >> sizeof-long: 8
> >> sizeof-size_t: 8
> >> shell-path: /bin/sh
> >> uname: Darwin 17.7.0 Darwin Kernel Version 17.7.0: Sun Jun  2 20:31:42 PDT 2019; root:xnu-4570.71.46~1/RELEASE_X86_64 x86_64
> >> compiler info: clang: 10.0.0 (clang-1000.11.45.5)
> >> libc info: no libc information available
> >> $SHELL (typically, interactive shell): /bin/bash
> >>
> >>
> >> [Enabled Hooks]
> >>
> >
> > This uses the decomposed form of "å":
> >> `modified:   "paulbrunnga\314\212rd-springyard/`
> > ...
> > And here is the precomposed form og the "å"
> >> Untracked files:
> >>  (use "git add <file>..." to include in what will be committed)
> >>    "paulbrunng\303\245rd-springyard/"
> >> ```
> >
> > Both are lowercase, right ?
> >
> > What does
> > git config -l | grep unicode
> > tell you ?
> >
> > I have
> > core.precomposeunicode=true
> >
> > and that what we need on a Mac (to handle precomposed and decomposed better)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-06 21:47     ` Torsten Bögershausen
@ 2021-01-06 22:21       ` Daniel Troger
  2021-01-06 23:07         ` Randall S. Becker
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Troger @ 2021-01-06 22:21 UTC (permalink / raw)
  To: Torsten Bögershausen; +Cc: git

Hi, maybe this helps you reproduce. I think I should have committed before doing the second changes but I still got the error message and the two names for one folder:

me@iMac:/tmp$ mkdir git_bug
me@iMac:/tmp$ cd git_bug
me@iMac:/tmp/git_bug$ git init
hint: Using 'master' as the name for the initial branch. This default branch name
hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint: 
hint: 	git config --global init.defaultBranch <name>
hint: 
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint: 
hint: 	git branch -m <name>
Initialized empty Git repository in /private/tmp/git_bug/.git/
me@iMac:/tmp/git_bug$ ls -la
total 8
drwxr-xr-x   4 daniel  wheel   128 Jan  6 23:13 .
drwxrwxrwt  27 root    wheel   864 Jan  6 23:13 ..
drwxr-xr-x   9 daniel  wheel   288 Jan  6 23:12 .git
-rw-r--r--@  1 daniel  staff  1283 Jan  6 23:13 paulbrunngård-springyard.zip
me@iMac:/tmp/git_bug$ unzip paulbrunngård-springyard.zip 
Archive:  paulbrunngård-springyard.zip
   creating: paulbrunnga??rd-springyard/
  inflating: paulbrunnga??rd-springyard/.DS_Store  
   creating: __MACOSX/
   creating: __MACOSX/paulbrunnga??rd-springyard/
  inflating: __MACOSX/paulbrunnga??rd-springyard/._.DS_Store  
 extracting: paulbrunnga??rd-springyard/empty  
me@iMac:/tmp/git_bug$ rm -rf __MACOSX/ *.zip
me@iMac:/tmp/git_bug$ ls -la
total 0
drwxr-xr-x   4 daniel  wheel  128 Jan  6 23:15 .
drwxrwxrwt  27 root    wheel  864 Jan  6 23:13 ..
drwxr-xr-x   9 daniel  wheel  288 Jan  6 23:15 .git
drwxr-xr-x@  4 daniel  wheel  128 Jan  6 12:20 paulbrunngård-springyard
me@iMac:/tmp/git_bug$ cd paulbrunngård-springyard/
me@iMac:/tmp/git_bug/paulbrunngård-springyard$ nano empty 
me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cat empty 
Initial content
me@iMac:/tmp/git_bug/paulbrunngård-springyard$ git add empty 
me@iMac:/tmp/git_bug/paulbrunngård-springyard$ nano empty 
me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cat empty 
Initial content


Line I want to keep

Line I want gone
me@iMac:/tmp/git_bug/paulbrunngård-springyard$ git restore -p .
BUG: pathspec.c:495: error initializing pathspec_item
Cannot close git diff-index --cached --numstat --summary 4b825dc642cb6eb9a060e54bf8d69288fbee4904 -- :(,prefix:27)paulbrunngård-springyard/ () at /usr/local/Cellar/git/2.30.0/libexec/git-core/git-add--interactive line 183.
me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cd ..
me@iMac:/tmp/git_bug$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   "paulbrunnga\314\212rd-springyard/empty"

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   "paulbrunnga\314\212rd-springyard/empty"

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	.DS_Store
	"paulbrunng\303\245rd-springyard/"

me@iMac:/tmp/git_bug$ 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-06 22:21       ` Daniel Troger
@ 2021-01-06 23:07         ` Randall S. Becker
  2021-01-07 14:34           ` Philippe Blain
  0 siblings, 1 reply; 27+ messages in thread
From: Randall S. Becker @ 2021-01-06 23:07 UTC (permalink / raw)
  To: 'Daniel Troger', 'Torsten Bögershausen'; +Cc: git

On January 6, 2021 5:22 PM, Daniel Troger wrote:
> Hi, maybe this helps you reproduce. I think I should have committed before
> doing the second changes but I still got the error message and the two
> names for one folder:
> 
> me@iMac:/tmp$ mkdir git_bug
> me@iMac:/tmp$ cd git_bug
> me@iMac:/tmp/git_bug$ git init
> hint: Using 'master' as the name for the initial branch. This default branch
> name
> hint: is subject to change. To configure the initial branch name to use in all
> hint: of your new repositories, which will suppress this warning, call:
> hint:
> hint: 	git config --global init.defaultBranch <name>
> hint:
> hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
> hint: 'development'. The just-created branch can be renamed via this
> command:
> hint:
> hint: 	git branch -m <name>
> Initialized empty Git repository in /private/tmp/git_bug/.git/
> me@iMac:/tmp/git_bug$ ls -la total 8
> drwxr-xr-x   4 daniel  wheel   128 Jan  6 23:13 .
> drwxrwxrwt  27 root    wheel   864 Jan  6 23:13 ..
> drwxr-xr-x   9 daniel  wheel   288 Jan  6 23:12 .git
> -rw-r--r--@  1 daniel  staff  1283 Jan  6 23:13 paulbrunngård-springyard.zip
> me@iMac:/tmp/git_bug$ unzip paulbrunngård-springyard.zip
> Archive:  paulbrunngård-springyard.zip
>    creating: paulbrunnga??rd-springyard/
>   inflating: paulbrunnga??rd-springyard/.DS_Store
>    creating: __MACOSX/
>    creating: __MACOSX/paulbrunnga??rd-springyard/
>   inflating: __MACOSX/paulbrunnga??rd-springyard/._.DS_Store
>  extracting: paulbrunnga??rd-springyard/empty me@iMac:/tmp/git_bug$ rm
> -rf __MACOSX/ *.zip me@iMac:/tmp/git_bug$ ls -la total 0
> drwxr-xr-x   4 daniel  wheel  128 Jan  6 23:15 .
> drwxrwxrwt  27 root    wheel  864 Jan  6 23:13 ..
> drwxr-xr-x   9 daniel  wheel  288 Jan  6 23:15 .git
> drwxr-xr-x@  4 daniel  wheel  128 Jan  6 12:20 paulbrunngård-springyard
> me@iMac:/tmp/git_bug$ cd paulbrunngård-springyard/
> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ nano empty
> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cat empty Initial
> content me@iMac:/tmp/git_bug/paulbrunngård-springyard$ git add empty
> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ nano empty
> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cat empty Initial
> content
> 
> 
> Line I want to keep
> 
> Line I want gone
> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ git restore -p .
> BUG: pathspec.c:495: error initializing pathspec_item Cannot close git diff-
> index --cached --numstat --summary
> 4b825dc642cb6eb9a060e54bf8d69288fbee4904 --
> :(,prefix:27)paulbrunngård-springyard/ () at
> /usr/local/Cellar/git/2.30.0/libexec/git-core/git-add--interactive line 183.
> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cd ..
> me@iMac:/tmp/git_bug$ git status
> On branch master
> 
> No commits yet
> 
> Changes to be committed:
>   (use "git rm --cached <file>..." to unstage)
> 	new file:   "paulbrunnga\314\212rd-springyard/empty"
> 
> Changes not staged for commit:
>   (use "git add <file>..." to update what will be committed)
>   (use "git restore <file>..." to discard changes in working directory)
> 	modified:   "paulbrunnga\314\212rd-springyard/empty"
> 
> Untracked files:
>   (use "git add <file>..." to include in what will be committed)
> 	.DS_Store
> 	"paulbrunng\303\245rd-springyard/"
> 
> me@iMac:/tmp/git_bug$

Is it possible that the å character is coming from a UTF-16 encoding and is not representable in UTF-8? I'm wondering whether the name has a double-byte representation where one of the bytes is null, resulting in a truncated file name coming from readdir(). The file name would not be representable on some platforms that do not support UTF-16 path names.

Regards,
Randall

-- Brief whoami:
 NonStop developer since approximately 211288444200000000
 UNIX developer since approximately 421664400
-- In my real life, I talk too much.




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-06 23:07         ` Randall S. Becker
@ 2021-01-07 14:34           ` Philippe Blain
  2021-01-07 15:49             ` Torsten Bögershausen
  0 siblings, 1 reply; 27+ messages in thread
From: Philippe Blain @ 2021-01-07 14:34 UTC (permalink / raw)
  To: Randall S. Becker, 'Daniel Troger',
	'Torsten Bögershausen'
  Cc: git, Johannes Schindelin

Hi everyone,

Le 2021-01-06 à 18:07, Randall S. Becker a écrit :
> On January 6, 2021 5:22 PM, Daniel Troger wrote:
>> Hi, maybe this helps you reproduce. I think I should have committed before
>> doing the second changes but I still got the error message and the two
>> names for one folder:
>>
>> me@iMac:/tmp$ mkdir git_bug
>> me@iMac:/tmp$ cd git_bug
>> me@iMac:/tmp/git_bug$ git init
>> hint: Using 'master' as the name for the initial branch. This default branch
>> name
>> hint: is subject to change. To configure the initial branch name to use in all
>> hint: of your new repositories, which will suppress this warning, call:
>> hint:
>> hint: 	git config --global init.defaultBranch <name>
>> hint:
>> hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
>> hint: 'development'. The just-created branch can be renamed via this
>> command:
>> hint:
>> hint: 	git branch -m <name>
>> Initialized empty Git repository in /private/tmp/git_bug/.git/
>> me@iMac:/tmp/git_bug$ ls -la total 8
>> drwxr-xr-x   4 daniel  wheel   128 Jan  6 23:13 .
>> drwxrwxrwt  27 root    wheel   864 Jan  6 23:13 ..
>> drwxr-xr-x   9 daniel  wheel   288 Jan  6 23:12 .git
>> -rw-r--r--@  1 daniel  staff  1283 Jan  6 23:13 paulbrunngård-springyard.zip
>> me@iMac:/tmp/git_bug$ unzip paulbrunngård-springyard.zip
>> Archive:  paulbrunngård-springyard.zip
>>     creating: paulbrunnga??rd-springyard/
>>    inflating: paulbrunnga??rd-springyard/.DS_Store
>>     creating: __MACOSX/
>>     creating: __MACOSX/paulbrunnga??rd-springyard/
>>    inflating: __MACOSX/paulbrunnga??rd-springyard/._.DS_Store
>>   extracting: paulbrunnga??rd-springyard/empty me@iMac:/tmp/git_bug$ rm
>> -rf __MACOSX/ *.zip me@iMac:/tmp/git_bug$ ls -la total 0
>> drwxr-xr-x   4 daniel  wheel  128 Jan  6 23:15 .
>> drwxrwxrwt  27 root    wheel  864 Jan  6 23:13 ..
>> drwxr-xr-x   9 daniel  wheel  288 Jan  6 23:15 .git
>> drwxr-xr-x@  4 daniel  wheel  128 Jan  6 12:20 paulbrunngård-springyard
>> me@iMac:/tmp/git_bug$ cd paulbrunngård-springyard/
>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ nano empty
>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cat empty Initial
>> content me@iMac:/tmp/git_bug/paulbrunngård-springyard$ git add empty
>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ nano empty
>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cat empty Initial
>> content
>>
>>
>> Line I want to keep
>>
>> Line I want gone
>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ git restore -p .
>> BUG: pathspec.c:495: error initializing pathspec_item Cannot close git diff-
>> index --cached --numstat --summary
>> 4b825dc642cb6eb9a060e54bf8d69288fbee4904 --
>> :(,prefix:27)paulbrunngård-springyard/ () at
>> /usr/local/Cellar/git/2.30.0/libexec/git-core/git-add--interactive line 183.
>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cd ..
>> me@iMac:/tmp/git_bug$ git status
>> On branch master
>>
>> No commits yet
>>
>> Changes to be committed:
>>    (use "git rm --cached <file>..." to unstage)
>> 	new file:   "paulbrunnga\314\212rd-springyard/empty"
>>
>> Changes not staged for commit:
>>    (use "git add <file>..." to update what will be committed)
>>    (use "git restore <file>..." to discard changes in working directory)
>> 	modified:   "paulbrunnga\314\212rd-springyard/empty"
>>
>> Untracked files:
>>    (use "git add <file>..." to include in what will be committed)
>> 	.DS_Store
>> 	"paulbrunng\303\245rd-springyard/"
>>
>> me@iMac:/tmp/git_bug$
> 
> Is it possible that the å character is coming from a UTF-16 encoding and is not representable in UTF-8? I'm wondering whether the name has a double-byte representation where one of the bytes is null, resulting in a truncated file name coming from readdir(). The file name would not be representable on some platforms that do not support UTF-16 path names.

I don't think that's the case (the angstrom is present in UTF-8 [1]).
I think it's another UTF-8 precomposed/decomposed bug. As far as I
was able to test it happens as soon as you have a precomposed character
in the folder name. I observed the same behaviour with a folder named
"folderü", for example. I also tried 'git -c add.interactive.usebuiltin restore -p .'
to see if the new experimental builtin add-interactive has the same problem,
and it does (though the error is less verbose).

Anyway as you show with 'git status', it's not just 'git add -p' that is
faulty, it's deeper than that, I would say.

Cheers,

Philippe.

[1] https://en.wikipedia.org/wiki/%C3%85#On_computers

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-07 14:34           ` Philippe Blain
@ 2021-01-07 15:49             ` Torsten Bögershausen
  2021-01-07 16:21               ` Philippe Blain
  0 siblings, 1 reply; 27+ messages in thread
From: Torsten Bögershausen @ 2021-01-07 15:49 UTC (permalink / raw)
  To: Philippe Blain
  Cc: Randall S. Becker, 'Daniel Troger', git, Johannes Schindelin

On Thu, Jan 07, 2021 at 09:34:35AM -0500, Philippe Blain wrote:
> Hi everyone,
>
> Le 2021-01-06 à 18:07, Randall S. Becker a écrit :
> > On January 6, 2021 5:22 PM, Daniel Troger wrote:
> > > Hi, maybe this helps you reproduce. I think I should have committed before
> > > doing the second changes but I still got the error message and the two
> > > names for one folder:
> > >
> > > me@iMac:/tmp$ mkdir git_bug
> > > me@iMac:/tmp$ cd git_bug
> > > me@iMac:/tmp/git_bug$ git init
> > > hint: Using 'master' as the name for the initial branch. This default branch
> > > name
> > > hint: is subject to change. To configure the initial branch name to use in all
> > > hint: of your new repositories, which will suppress this warning, call:
> > > hint:
> > > hint: 	git config --global init.defaultBranch <name>
> > > hint:
> > > hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
> > > hint: 'development'. The just-created branch can be renamed via this
> > > command:
> > > hint:
> > > hint: 	git branch -m <name>
> > > Initialized empty Git repository in /private/tmp/git_bug/.git/
> > > me@iMac:/tmp/git_bug$ ls -la total 8
> > > drwxr-xr-x   4 daniel  wheel   128 Jan  6 23:13 .
> > > drwxrwxrwt  27 root    wheel   864 Jan  6 23:13 ..
> > > drwxr-xr-x   9 daniel  wheel   288 Jan  6 23:12 .git
> > > -rw-r--r--@  1 daniel  staff  1283 Jan  6 23:13 paulbrunngård-springyard.zip
> > > me@iMac:/tmp/git_bug$ unzip paulbrunngård-springyard.zip
> > > Archive:  paulbrunngård-springyard.zip
> > >     creating: paulbrunnga??rd-springyard/
> > >    inflating: paulbrunnga??rd-springyard/.DS_Store
> > >     creating: __MACOSX/
> > >     creating: __MACOSX/paulbrunnga??rd-springyard/
> > >    inflating: __MACOSX/paulbrunnga??rd-springyard/._.DS_Store
> > >   extracting: paulbrunnga??rd-springyard/empty me@iMac:/tmp/git_bug$ rm
> > > -rf __MACOSX/ *.zip me@iMac:/tmp/git_bug$ ls -la total 0
> > > drwxr-xr-x   4 daniel  wheel  128 Jan  6 23:15 .
> > > drwxrwxrwt  27 root    wheel  864 Jan  6 23:13 ..
> > > drwxr-xr-x   9 daniel  wheel  288 Jan  6 23:15 .git
> > > drwxr-xr-x@  4 daniel  wheel  128 Jan  6 12:20 paulbrunngård-springyard
> > > me@iMac:/tmp/git_bug$ cd paulbrunngård-springyard/
> > > me@iMac:/tmp/git_bug/paulbrunngård-springyard$ nano empty
> > > me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cat empty Initial
> > > content me@iMac:/tmp/git_bug/paulbrunngård-springyard$ git add empty
> > > me@iMac:/tmp/git_bug/paulbrunngård-springyard$ nano empty
> > > me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cat empty Initial
> > > content
> > >
> > >
> > > Line I want to keep
> > >
> > > Line I want gone
> > > me@iMac:/tmp/git_bug/paulbrunngård-springyard$ git restore -p .
> > > BUG: pathspec.c:495: error initializing pathspec_item Cannot close git diff-
> > > index --cached --numstat --summary
> > > 4b825dc642cb6eb9a060e54bf8d69288fbee4904 --
> > > :(,prefix:27)paulbrunngård-springyard/ () at
> > > /usr/local/Cellar/git/2.30.0/libexec/git-core/git-add--interactive line 183.
> > > me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cd ..
> > > me@iMac:/tmp/git_bug$ git status
> > > On branch master
> > >
> > > No commits yet
> > >
> > > Changes to be committed:
> > >    (use "git rm --cached <file>..." to unstage)
> > > 	new file:   "paulbrunnga\314\212rd-springyard/empty"
> > >
> > > Changes not staged for commit:
> > >    (use "git add <file>..." to update what will be committed)
> > >    (use "git restore <file>..." to discard changes in working directory)
> > > 	modified:   "paulbrunnga\314\212rd-springyard/empty"
> > >
> > > Untracked files:
> > >    (use "git add <file>..." to include in what will be committed)
> > > 	.DS_Store
> > > 	"paulbrunng\303\245rd-springyard/"
> > >
> > > me@iMac:/tmp/git_bug$
> >
> > Is it possible that the å character is coming from a UTF-16 encoding and is not representable in UTF-8? I'm wondering whether the name has a double-byte representation where one of the bytes is null, resulting in a truncated file name coming from readdir(). The file name would not be representable on some platforms that do not support UTF-16 path names.
>
> I don't think that's the case (the angstrom is present in UTF-8 [1]).
> I think it's another UTF-8 precomposed/decomposed bug. As far as I
> was able to test it happens as soon as you have a precomposed character
> in the folder name. I observed the same behaviour with a folder named
> "folderü", for example. I also tried 'git -c add.interactive.usebuiltin restore -p .'
> to see if the new experimental builtin§ add-interactive has the same problem,
> and it does (though the error is less verbose).
>
> Anyway as you show with 'git status', it's not just 'git add -p' that is
> faulty, it's deeper than that, I would say.
>
> Cheers,
>
> Philippe.
>
> [1] https://en.wikipedia.org/wiki/%C3%85#On_computers

Folks, I can not reproduce anything here.
- The zip file mentioned earlier does not include a decomposed "å"
  Neither when running unzip under Linux nor under Mac
- Trying to write a test script does not show anything special

My attempt looks like this:
cat test.sh
#!/bin/sh
DIR=git-test-restore-p
GIT=/usr/local/bin/git

Adiarnfc=$(printf '\303\204')
Adiarnfd=$(printf 'A\314\210')
DIRNAME=xx${Adiarnfd}yy
FILENAME=$DIRNAME/file

rm -rf $DIR &&
    mkdir $DIR &&
    cd $DIR &&
    $GIT init &&
    mkdir $DIRNAME &&
    echo "Initial" >$FILENAME &&
    $GIT add $FILENAME &&
    echo  >>$FILENAME &&
    echo  >>$FILENAME &&
    echo "One more line"  >>$FILENAME &&
    echo  >>$FILENAME &&
    echo  >>$FILENAME &&
    echo "Last line"  >$FILENAME &&
    $GIT restore -p . &&
    $GIT restore -p . &&
    echo git=$GIT &&
    $GIT --version &&
    echo "OK"

==================================
It points out Git from brew, I think.
And running it (using a decomposed Ä instead of å or ü)
shows nothing strange, please see below.

Is anybody able to provide a shell-script, that does reproduce ?
Excuse my bad shell-programming style, this is for demo only.

hint: is subject to change. To configure the initial branch name to use in all
hint: of your new repositories, which will suppress this warning, call:
hint:
hint:   git config --global init.defaultBranch <name>
hint:
hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
hint: 'development'. The just-created branch can be renamed via this command:
hint:
hint:   git branch -m <name>
Initialized empty Git repository in ..../2021-01-07-git_decompose_test/git-test-restore-p/.git/
diff --git a/xxÄyy/file b/xxÄyy/file
index a77fa51..8ea8e21 100644
--- a/xxÄyy/file
+++ b/xxÄyy/file
@@ -1 +1 @@
-Initial
+Last line
(1/1) Discard this hunk from worktree [y,n,q,a,d,e,?]? y

No changes.
git=/usr/local/bin/git
git version 2.30.0
OK


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-07 15:49             ` Torsten Bögershausen
@ 2021-01-07 16:21               ` Philippe Blain
  2021-01-08 19:07                 ` Torsten Bögershausen
  0 siblings, 1 reply; 27+ messages in thread
From: Philippe Blain @ 2021-01-07 16:21 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Randall S. Becker, 'Daniel Troger', git, Johannes Schindelin

Hi Torsten,

Le 2021-01-07 à 10:49, Torsten Bögershausen a écrit :
> On Thu, Jan 07, 2021 at 09:34:35AM -0500, Philippe Blain wrote:
>> Hi everyone,
>>
>> Le 2021-01-06 à 18:07, Randall S. Becker a écrit :
>>> On January 6, 2021 5:22 PM, Daniel Troger wrote:
>>>> Hi, maybe this helps you reproduce. I think I should have committed before
>>>> doing the second changes but I still got the error message and the two
>>>> names for one folder:
>>>>
>>>> me@iMac:/tmp$ mkdir git_bug
>>>> me@iMac:/tmp$ cd git_bug
>>>> me@iMac:/tmp/git_bug$ git init
>>>> hint: Using 'master' as the name for the initial branch. This default branch
>>>> name
>>>> hint: is subject to change. To configure the initial branch name to use in all
>>>> hint: of your new repositories, which will suppress this warning, call:
>>>> hint:
>>>> hint: 	git config --global init.defaultBranch <name>
>>>> hint:
>>>> hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
>>>> hint: 'development'. The just-created branch can be renamed via this
>>>> command:
>>>> hint:
>>>> hint: 	git branch -m <name>
>>>> Initialized empty Git repository in /private/tmp/git_bug/.git/
>>>> me@iMac:/tmp/git_bug$ ls -la total 8
>>>> drwxr-xr-x   4 daniel  wheel   128 Jan  6 23:13 .
>>>> drwxrwxrwt  27 root    wheel   864 Jan  6 23:13 ..
>>>> drwxr-xr-x   9 daniel  wheel   288 Jan  6 23:12 .git
>>>> -rw-r--r--@  1 daniel  staff  1283 Jan  6 23:13 paulbrunngård-springyard.zip
>>>> me@iMac:/tmp/git_bug$ unzip paulbrunngård-springyard.zip
>>>> Archive:  paulbrunngård-springyard.zip
>>>>      creating: paulbrunnga??rd-springyard/
>>>>     inflating: paulbrunnga??rd-springyard/.DS_Store
>>>>      creating: __MACOSX/
>>>>      creating: __MACOSX/paulbrunnga??rd-springyard/
>>>>     inflating: __MACOSX/paulbrunnga??rd-springyard/._.DS_Store
>>>>    extracting: paulbrunnga??rd-springyard/empty me@iMac:/tmp/git_bug$ rm
>>>> -rf __MACOSX/ *.zip me@iMac:/tmp/git_bug$ ls -la total 0
>>>> drwxr-xr-x   4 daniel  wheel  128 Jan  6 23:15 .
>>>> drwxrwxrwt  27 root    wheel  864 Jan  6 23:13 ..
>>>> drwxr-xr-x   9 daniel  wheel  288 Jan  6 23:15 .git
>>>> drwxr-xr-x@  4 daniel  wheel  128 Jan  6 12:20 paulbrunngård-springyard
>>>> me@iMac:/tmp/git_bug$ cd paulbrunngård-springyard/
>>>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ nano empty
>>>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cat empty Initial
>>>> content me@iMac:/tmp/git_bug/paulbrunngård-springyard$ git add empty
>>>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ nano empty
>>>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cat empty Initial
>>>> content
>>>>
>>>>
>>>> Line I want to keep
>>>>
>>>> Line I want gone
>>>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ git restore -p .
>>>> BUG: pathspec.c:495: error initializing pathspec_item Cannot close git diff-
>>>> index --cached --numstat --summary
>>>> 4b825dc642cb6eb9a060e54bf8d69288fbee4904 --
>>>> :(,prefix:27)paulbrunngård-springyard/ () at
>>>> /usr/local/Cellar/git/2.30.0/libexec/git-core/git-add--interactive line 183.
>>>> me@iMac:/tmp/git_bug/paulbrunngård-springyard$ cd ..
>>>> me@iMac:/tmp/git_bug$ git status
>>>> On branch master
>>>>
>>>> No commits yet
>>>>
>>>> Changes to be committed:
>>>>     (use "git rm --cached <file>..." to unstage)
>>>> 	new file:   "paulbrunnga\314\212rd-springyard/empty"
>>>>
>>>> Changes not staged for commit:
>>>>     (use "git add <file>..." to update what will be committed)
>>>>     (use "git restore <file>..." to discard changes in working directory)
>>>> 	modified:   "paulbrunnga\314\212rd-springyard/empty"
>>>>
>>>> Untracked files:
>>>>     (use "git add <file>..." to include in what will be committed)
>>>> 	.DS_Store
>>>> 	"paulbrunng\303\245rd-springyard/"
>>>>
>>>> me@iMac:/tmp/git_bug$
>>>
>>> Is it possible that the å character is coming from a UTF-16 encoding and is not representable in UTF-8? I'm wondering whether the name has a double-byte representation where one of the bytes is null, resulting in a truncated file name coming from readdir(). The file name would not be representable on some platforms that do not support UTF-16 path names.
>>
>> I don't think that's the case (the angstrom is present in UTF-8 [1]).
>> I think it's another UTF-8 precomposed/decomposed bug. As far as I
>> was able to test it happens as soon as you have a precomposed character
>> in the folder name. I observed the same behaviour with a folder named
>> "folderü", for example. I also tried 'git -c add.interactive.usebuiltin restore -p .'
>> to see if the new experimental builtin§ add-interactive has the same problem,
>> and it does (though the error is less verbose).
>>
>> Anyway as you show with 'git status', it's not just 'git add -p' that is
>> faulty, it's deeper than that, I would say.
>>
>> Cheers,
>>
>> Philippe.
>>
>> [1] https://en.wikipedia.org/wiki/%C3%85#On_computers
> 
> Folks, I can not reproduce anything here.
> - The zip file mentioned earlier does not include a decomposed "å"
>    Neither when running unzip under Linux nor under Mac
> - Trying to write a test script does not show anything special
> 
> My attempt looks like this:

With a few modifications, you should see things start to get strange:

> cat test.sh
> #!/bin/sh
> DIR=git-test-restore-p
> GIT=/usr/local/bin/git
> 
> Adiarnfc=$(printf '\303\204')
> Adiarnfd=$(printf 'A\314\210')
> DIRNAME=xx${Adiarnfd}yy
> FILENAME=$DIRNAME/file

change to:
FILENAME=file

> 
> rm -rf $DIR &&
>      mkdir $DIR &&
>      cd $DIR &&
>      $GIT init &&
>      mkdir $DIRNAME &&

add:
cd $DIRNAME &&

>      echo "Initial" >$FILENAME &&

add:
$GIT status &&

>      $GIT add $FILENAME &&

add:
$GIT status &&

>      echo  >>$FILENAME &&
>      echo  >>$FILENAME &&
>      echo "One more line"  >>$FILENAME &&
>      echo  >>$FILENAME &&
>      echo  >>$FILENAME &&
>      echo "Last line"  >$FILENAME &&

add:
$GIT status &&

>      $GIT restore -p . &&
>      $GIT restore -p . &&
>      echo git=$GIT &&
>      $GIT --version &&
>      echo "OK"
> 
> ==================================
> It points out Git from brew, I think.

I don't think it's a packaging bug;
I tried with /usr/bin/git on macOS 10.11 (git version 2.10.1 (Apple Git-78))
and I also get a (different) error but the command does not fail
(I needed to change restore to reset):

$ ./test.sh
Initialized empty Git repository in /private/var/folders/lr/r6n2057j0dzd4gdb614fp0740000gp/T/repro/git-test-restore-p/.git/
Assertion failed: (item->nowildcard_len <= item->len && item->prefix <= item->len), function prefix_pathspec, file pathspec.c, line 317.
No changes.
Assertion failed: (item->nowildcard_len <= item->len && item->prefix <= item->len), function prefix_pathspec, file pathspec.c, line 317.
No changes.
git=/usr/bin/git
git version 2.10.1 (Apple Git-78)
OK


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-07 16:21               ` Philippe Blain
@ 2021-01-08 19:07                 ` Torsten Bögershausen
  0 siblings, 0 replies; 27+ messages in thread
From: Torsten Bögershausen @ 2021-01-08 19:07 UTC (permalink / raw)
  To: Philippe Blain
  Cc: Randall S. Becker, 'Daniel Troger', git, Johannes Schindelin

On Thu, Jan 07, 2021 at 11:21:56AM -0500, Philippe Blain wrote:
> Hi Torsten,

[snip]

>
> change to:
> FILENAME=file
>
> >
> > rm -rf $DIR &&
> >      mkdir $DIR &&
> >      cd $DIR &&
> >      $GIT init &&
> >      mkdir $DIRNAME &&
>
> add:
> cd $DIRNAME &&
>
> >      echo "Initial" >$FILENAME &&
[]

That did the trick, the test case is now reproducible here.

I am looking into the stuff, somewhere inside the pathspec machinery,
determinating a "prefix" and using xgetcwd we need a precompose_string.
But only if core.precomposeunicode is true.

And I am still digging.
It seems as if there could be a patch in a couple of days, but no promises.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH/RFC v1 1/1] git restore -p . and precomposed unicode
  2021-01-06 11:35 git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) Daniel Troger
  2021-01-06 14:21 ` Torsten Bögershausen
@ 2021-01-24 15:13 ` tboegi
  2021-01-24 19:51   ` Junio C Hamano
  2021-01-29 17:15 ` [PATCH v2 1/1] MacOS: precompose_argv_prefix() tboegi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: tboegi @ 2021-01-24 15:13 UTC (permalink / raw)
  To: git, random_n0body, evraiphilippeblain
  Cc: Torsten Bögershausen, Philippe Blain

From: Torsten Bögershausen <tboegi@web.de>

The following sequence leads to a "BUG" assertion running under MacOS:

  !/bin/sh
  DIR=git-test-restore-p
  Adiarnfd=$(printf 'A\314\210')
  DIRNAME=xx${Adiarnfd}yy
  mkdir $DIR &&
  cd $DIR &&
  git init &&
  mkdir $DIRNAME &&
  cd $DIRNAME &&
  echo "Initial" >file &&
  git add file &&
  echo "One more line"  >>file &&
  echo y | git restore -p . &&
  echo "OK"

 Initialized empty Git repository in /tmp/git-test-restore-p/.git/
 BUG: pathspec.c:495: error initializing pathspec_item
 Cannot close git diff-index --cached --numstat
 [snip]


The command `git restore` is run from a directory inside a Git repo.
The Git needs to split the $CWD into 2 parts:
The path to the repo and "the rest", if any.
"The rest" becomes a "prefix" later used inside the pathspec code.

As an example, "/path/to/repo/dir-inside-repå" would determine
"/path/to/repo" as the root of the repo, the place where the
configuration file .git/config is found.

The rest becomes the prefix ("dir-inside-repå"), from where the pathspec
machinery expands the ".", more about this later.
If there is a decomposed form, (making the decomposing visible like this),
"dir-inside-rep°a" doesn't match "dir-inside-repå".

The solution is to read the config variable "core.precomposeunicode" early.
Then, if configured, precompose "prefix" (and argv) and handle the prefix
over into pathspec for expanding "." into a list of path names tracked by Git.

[1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
[2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/

Reported-by: Daniel Troger <random_n0body@icloud.com>
Helped-By: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 This may need some refinements, but we need to start somewhere...
 Are there any good ideas how to improve the commit message ?
 Should the code in git.c be "hidden" in a function somewhere else ?
 Other comments are appreciated.


compat/precompose_utf8.c     | 24 ++++++++++++++++++++++++
 compat/precompose_utf8.h     |  2 ++
 git-compat-util.h            |  8 ++++++++
 git.c                        |  9 +++++++++
 t/t3910-mac-os-precompose.sh | 15 +++++++++++++++
 5 files changed, 58 insertions(+)

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 136250fbf6..06e371660f 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -36,6 +36,11 @@ static size_t has_non_ascii(const char *s, size_t maxlen, size_t *strlen_c)
 	return ret;
 }

+int precompose_read_config_gently(void)
+{
+	git_config_get_bool("core.precomposeunicode", &precomposed_unicode);
+	return precomposed_unicode == 1;
+}

 void probe_utf8_pathname_composition(void)
 {
@@ -60,6 +65,25 @@ void probe_utf8_pathname_composition(void)
 	strbuf_release(&path);
 }

+char *precompose_string_if_needed(const char *in)
+{
+	size_t inlen = strlen(in);
+	size_t outlen;
+	char *out = NULL;
+	if ((has_non_ascii(in, inlen, NULL)) && (precomposed_unicode == 1)) {
+		int saved_errno = errno;
+		out = reencode_string_len(in, inlen,
+					  repo_encoding, path_encoding,
+					  &outlen);
+		if (out && outlen == inlen && !memcmp(in, out, outlen)) {
+			/* strings are identical: no need to return a new one */
+			free(out);
+			out = NULL;
+		}
+		errno = saved_errno;
+	}
+	return out;
+}

 void precompose_argv(int argc, const char **argv)
 {
diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index 6f843d3e1a..ce82857d73 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -28,6 +28,8 @@ typedef struct {
 	struct dirent_prec_psx *dirent_nfc;
 } PREC_DIR;

+int precompose_read_config_gently(void);
+char *precompose_string_if_needed(const char *in);
 void precompose_argv(int argc, const char **argv);
 void probe_utf8_pathname_composition(void);

diff --git a/git-compat-util.h b/git-compat-util.h
index 104993b975..f34854b66f 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -252,6 +252,14 @@ typedef unsigned long uintptr_t;
 #ifdef PRECOMPOSE_UNICODE
 #include "compat/precompose_utf8.h"
 #else
+static inline int precompose_read_config_gently(void)
+{
+	return 0;
+}
+static inline char *precompose_string_if_needed(const char *in)
+{
+	return NULL; /* no need to precompose a string */
+}
 static inline void precompose_argv(int argc, const char **argv)
 {
 	; /* nothing */
diff --git a/git.c b/git.c
index a00a0a4d94..f09e14f733 100644
--- a/git.c
+++ b/git.c
@@ -421,6 +421,15 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 			prefix = setup_git_directory_gently(&nongit_ok);
 		}

+		if (precompose_read_config_gently()) {
+			precompose_argv(argc, argv);
+			if (prefix) {
+				const char *prec_pfx;
+					prec_pfx = precompose_string_if_needed(prefix);
+				if (prec_pfx)
+					prefix = prec_pfx; /* memory lost */
+			}
+		}
 		if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
 		    !(p->option & DELAY_PAGER_CONFIG))
 			use_pager = check_pager_config(p->cmd);
diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
index 54ce19e353..bbbc50da93 100755
--- a/t/t3910-mac-os-precompose.sh
+++ b/t/t3910-mac-os-precompose.sh
@@ -191,6 +191,21 @@ test_expect_failure 'handle existing decomposed filenames' '
 	test_must_be_empty untracked
 '

+test_expect_success "unicode decomposed: git restore -p . " '
+	DIRNAMEPWD=dir.Odiarnfc &&
+	DIRNAMEINREPO=dir.$Adiarnfc &&
+	export DIRNAMEPWD DIRNAMEINREPO &&
+	git init $DIRNAMEPWD &&
+	( cd $DIRNAMEPWD &&
+		mkdir $DIRNAMEINREPO &&
+		cd $DIRNAMEINREPO &&
+		echo "Initial" >file &&
+		git add file &&
+		echo "More stuff" >>file &&
+		echo y | git restore -p .
+	)
+'
+
 # Test if the global core.precomposeunicode stops autosensing
 # Must be the last test case
 test_expect_success "respect git config --global core.precomposeunicode" '
--
2.30.0.155.g66e871b664


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC v1 1/1] git restore -p . and precomposed unicode
  2021-01-24 15:13 ` [PATCH/RFC v1 1/1] git restore -p . and precomposed unicode tboegi
@ 2021-01-24 19:51   ` Junio C Hamano
  2021-01-25 16:53     ` Torsten Bögershausen
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2021-01-24 19:51 UTC (permalink / raw)
  To: tboegi; +Cc: git, random_n0body, evraiphilippeblain, Philippe Blain

tboegi@web.de writes:

> The solution is to read the config variable "core.precomposeunicode" early.

For a single command like "restore", we need to fail if it is run
outside a repository anyway, so it is OK, but code in "git.c" in
general can be called outside a repository, we may not know where
our configuration files are, though.  So I'd prefer to avoid
adding too many "do this thing early for every single command".

Where do we normally read that variable?  I see calls to
precompose_argv() on only a few codepaths

builtin/diff-files.c:38:	precompose_argv(argc, argv);
builtin/diff-index.c:28:	precompose_argv(argc, argv);
builtin/diff-tree.c:129:	precompose_argv(argc, argv);
builtin/diff.c:455:	precompose_argv(argc, argv);
builtin/submodule--helper.c:1260:	precompose_argv(diff_args.nr, diff_args.v);
parse-options.c:872:	precompose_argv(argc, argv);

I guess the reason we can get away with it is because most of the
newer commands use the parse_options() API, and the call to
precompose_argv() is used by the codepaths implementing these
commands.  And as a rule, these commands read config first before
calling parse_options(), so by the time the control reaches there,
the value of the variable may be already known.

The question is why "restore -p" is so special?  Or does this patch
mean everybody, even the ones that use parse_options() is broken?

I guess "prefix" needs to be munged for everybody, so "restore -p"
on the title of the patch is a red herring, and the problem applies
to all "git" commands---in which case, inside "git.c" would be the
right place to do so.

I wonder if everybody who calls precompoase_argv() has access to the
prefix, though.  If so, would it make more sense to extend the API
to

	precompose_argv(int argc, char **argv, char **prefix)

so that the callers only need to call a single function, without any
additional code like this patch does?

Also, as the current precompose_argv() begins like this:

        void precompose_argv(int argc, const char **argv)
        {
                int i = 0;
                const char *oldarg;
                char *newarg;
                iconv_t ic_precompose;

                if (precomposed_unicode != 1)
                        return;

and environment.c initializes the global to (-1), I wonder if we can
get away with an approach not to read the "config" anywhere outside
precompose_argv() function.  Instead, can't the above snippet become
more like:

                if (precomposed_unicode < 0)
                        precomposed_unicode = read from config;
		if (precomposed_unicode != 1)
			return;

That way, you do not even have to touch anything outside
compat/precompose_utf8.c, other than adjusting the callers to pass
the pointer to their prefix to be munged.

Namely

> +int precompose_read_config_gently(void)

This can become file-scope static.

> +{
> +	git_config_get_bool("core.precomposeunicode", &precomposed_unicode);
> +	return precomposed_unicode == 1;
> +}
>
>  void probe_utf8_pathname_composition(void)
>  {
> @@ -60,6 +65,25 @@ void probe_utf8_pathname_composition(void)
>  	strbuf_release(&path);
>  }
>
> +char *precompose_string_if_needed(const char *in)
> +{

This too.

> diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
> index 6f843d3e1a..ce82857d73 100644
> --- a/compat/precompose_utf8.h
> +++ b/compat/precompose_utf8.h
> @@ -28,6 +28,8 @@ typedef struct {
>  	struct dirent_prec_psx *dirent_nfc;
>  } PREC_DIR;
>
> +int precompose_read_config_gently(void);
> +char *precompose_string_if_needed(const char *in);
>  void precompose_argv(int argc, const char **argv);
>  void probe_utf8_pathname_composition(void);

And this can go away.

> diff --git a/git-compat-util.h b/git-compat-util.h
> index 104993b975..f34854b66f 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -252,6 +252,14 @@ typedef unsigned long uintptr_t;
>  #ifdef PRECOMPOSE_UNICODE
>  #include "compat/precompose_utf8.h"
>  #else
> +static inline int precompose_read_config_gently(void)
> +{
> +	return 0;
> +}
> +static inline char *precompose_string_if_needed(const char *in)
> +{
> +	return NULL; /* no need to precompose a string */
> +}

So do these.

> diff --git a/git.c b/git.c
> index a00a0a4d94..f09e14f733 100644
> --- a/git.c
> +++ b/git.c
> @@ -421,6 +421,15 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
>  			prefix = setup_git_directory_gently(&nongit_ok);
>  		}
>
> +		if (precompose_read_config_gently()) {
> +			precompose_argv(argc, argv);
> +			if (prefix) {
> +				const char *prec_pfx;
> +					prec_pfx = precompose_string_if_needed(prefix);
> +				if (prec_pfx)
> +					prefix = prec_pfx; /* memory lost */
> +			}
> +		}

And this would move to the beginning of precompose_argv()
implementation.

Also the code we currently use to read the core.precomposeunicode
configuration variable (presumably somewhere in git_default_config()
callchain; I didn't check) can go away.

Which would be much nicer outcome, if doable (again, I didn't check).

> diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
> index 54ce19e353..bbbc50da93 100755
> --- a/t/t3910-mac-os-precompose.sh
> +++ b/t/t3910-mac-os-precompose.sh
> @@ -191,6 +191,21 @@ test_expect_failure 'handle existing decomposed filenames' '
>  	test_must_be_empty untracked
>  '
>
> +test_expect_success "unicode decomposed: git restore -p . " '
> +	DIRNAMEPWD=dir.Odiarnfc &&
> +	DIRNAMEINREPO=dir.$Adiarnfc &&
> +	export DIRNAMEPWD DIRNAMEINREPO &&
> +	git init $DIRNAMEPWD &&
> +	( cd $DIRNAMEPWD &&
> +		mkdir $DIRNAMEINREPO &&

Style:

	(
		cd $DIRNAMEPWD &&
		mkdir $DIRNAMEINREPO &&

Is "restore" the only thing that has this issue?  

I would imagine that anything that takes '.' pathspec to limit its
operation to the current subdirectory (e.g. "cd sub && git diff .")
would be affected (clarification: I am *not* hinting that other
commands need to be tested---I am however hinting to update the
proposed log message to explain either (1) this applies in general
to commands that does X, or (2) this affects only "restore -p" which
does this unusual thing Y that no other commands do).

Thanks.


> +		cd $DIRNAMEINREPO &&
> +		echo "Initial" >file &&
> +		git add file &&
> +		echo "More stuff" >>file &&
> +		echo y | git restore -p .
> +	)
> +'
> +
>  # Test if the global core.precomposeunicode stops autosensing
>  # Must be the last test case
>  test_expect_success "respect git config --global core.precomposeunicode" '
> --
> 2.30.0.155.g66e871b664

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH/RFC v1 1/1] git restore -p . and precomposed unicode
  2021-01-24 19:51   ` Junio C Hamano
@ 2021-01-25 16:53     ` Torsten Bögershausen
  0 siblings, 0 replies; 27+ messages in thread
From: Torsten Bögershausen @ 2021-01-25 16:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, random_n0body, evraiphilippeblain, Philippe Blain

On Sun, Jan 24, 2021 at 11:51:33AM -0800, Junio C Hamano wrote:

Thanks for the fast review, much appreciated.

The short answer: There is more to be done, V2 is coming somewhen.
For the longer story, please see inline.

> tboegi@web.de writes:
>
> > The solution is to read the config variable "core.precomposeunicode" early.
>
> For a single command like "restore", we need to fail if it is run
> outside a repository anyway, so it is OK, but code in "git.c" in
> general can be called outside a repository, we may not know where
> our configuration files are, though.  So I'd prefer to avoid
> adding too many "do this thing early for every single command".
>
> Where do we normally read that variable?  I see calls to
> precompose_argv() on only a few codepaths
>
> builtin/diff-files.c:38:	precompose_argv(argc, argv);
> builtin/diff-index.c:28:	precompose_argv(argc, argv);
> builtin/diff-tree.c:129:	precompose_argv(argc, argv);
> builtin/diff.c:455:	precompose_argv(argc, argv);
> builtin/submodule--helper.c:1260:	precompose_argv(diff_args.nr, diff_args.v);
> parse-options.c:872:	precompose_argv(argc, argv);
>
> I guess the reason we can get away with it is because most of the
> newer commands use the parse_options() API, and the call to
> precompose_argv() is used by the codepaths implementing these
> commands.  And as a rule, these commands read config first before
> calling parse_options(), so by the time the control reaches there,
> the value of the variable may be already known.

Yes.

>
> The question is why "restore -p" is so special?  Or does this patch
> mean everybody, even the ones that use parse_options() is broken?

One special thing is that it uses pathspec, `git restore -p .`
and $CWD is inside a decomposed directory.

There is more work to be done, as it seems.
Running `git status . ` in an ASCII based repo (ignore the debug prints)

  user@mac:/tmp/210125-precomp/A.dir>  git status .
  git.c/handle_builtin:699 cmd="status" len=6
  git.c:428 prefix="A.dir/" (6)
  git.c:434 prec_pfx="(null)" (4294967295)
  builtin/commit.c:1401 prefix="A.dir/" (6)
  On branch master
  Changes not staged for commit:
    (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
              modified:   O.file

-------------------------
But doing the same test within a decomosed directory:
  user@mac:/tmp/210125-decomp/Ä.dir> git status .
  git.c/handle_builtin:699 cmd="status" len=6
  git.c:428 prefix="Ä.dir/" (8)
  git.c:434 prec_pfx="Ä.dir/" (7)
  builtin/commit.c:1401 prefix="Ä.dir/" (7)
  On branch master
  nothing to commit, working tree clean
---------
But using ".." as the pathspec works:

  user@mac:/tmp/210125-decomp/Ä.dir> git status ..
  git.c/handle_builtin:699 cmd="status" len=6
  git.c:428 prefix="Ä.dir/" (8)
  git.c:434 prec_pfx="Ä.dir/" (7)
  builtin/commit.c:1401 prefix="Ä.dir/" (7)
  On branch master
  Changes not staged for commit:
    (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
              modified:   "../A\314\210.dir/\303\226.file"

  no changes added to commit (use "git add" and/or "git commit -a")
--------------

So in that sense, it seems as if `git restore` is special because
the patch helps.

More patches are needed asseen above.

And the rest of the suggestions makes all sense.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 1/1] MacOS: precompose_argv_prefix()
  2021-01-06 11:35 git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) Daniel Troger
  2021-01-06 14:21 ` Torsten Bögershausen
  2021-01-24 15:13 ` [PATCH/RFC v1 1/1] git restore -p . and precomposed unicode tboegi
@ 2021-01-29 17:15 ` tboegi
  2021-01-29 23:19   ` Junio C Hamano
  2021-01-31  0:43   ` Junio C Hamano
  2021-02-02 15:11 ` [PATCH v3 " tboegi
  2021-02-03 16:28 ` [PATCH v4 " tboegi
  4 siblings, 2 replies; 27+ messages in thread
From: tboegi @ 2021-01-29 17:15 UTC (permalink / raw)
  To: git, random_n0body, levraiphilippeblain; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

The following sequence leads to a "BUG" assertion running under MacOS:

  DIR=git-test-restore-p
  Adiarnfd=$(printf 'A\314\210')
  DIRNAME=xx${Adiarnfd}yy
  mkdir $DIR &&
  cd $DIR &&
  git init &&
  mkdir $DIRNAME &&
  cd $DIRNAME &&
  echo "Initial" >file &&
  git add file &&
  echo "One more line" >>file &&
  echo y | git restore -p .

 Initialized empty Git repository in /tmp/git-test-restore-p/.git/
 BUG: pathspec.c:495: error initializing pathspec_item
 Cannot close git diff-index --cached --numstat
 [snip]

The command `git restore` is run from a directory inside a Git repo.
Git needs to split the $CWD into 2 parts:
The path to the repo and "the rest", if any.
"The rest" becomes a "prefix" later used inside the pathspec code.

As an example, "/path/to/repo/dir-inside-repå" would determine
"/path/to/repo" as the root of the repo, the place where the
configuration file .git/config is found.

The rest becomes the prefix ("dir-inside-repå"), from where the pathspec
machinery expands the ".", more about this later.
If there is a decomposed form, (making the decomposing visible like this),
"dir-inside-rep°a" doesn't match "dir-inside-repå".

Git commands need to:

 (a) read the configuration variable "core.precomposeunicode"
 (b) precocompose argv[]
 (c) precompose the prefix, if there was any

The first commit,
76759c7dff53 "git on Mac OS and precomposed unicode"
addressed (a) and (b).

The call to precompose_argv() was added into parse-options.c,
because that seemed to be a good place when the patch was written.

Commands that don't use parse-options need to do (a) and (b) themselfs.

The commands `diff-files`, `diff-index`, `diff-tree` and `diff`
learned (a) and (b) in
commit 90a78b83e0b8 "diff: run arguments through precompose_argv"

Branch names (or refs in general) using decomposed code points
resulting in decomposed file names had been fixed in
commit 8e712ef6fc97 "Honor core.precomposeUnicode in more places"

The bug report from above shows 2 things:
- more commands need to handle precomposed unicode
- (c) should be implemented for all commands using pathspecs

Solution:
precompose_argv() now handles the prefix (if needed), and is renamed into
precompose_argv_prefix().

Inside this function the config variable core.precomposeunicode is read
into the global variable precomposed_unicode, as before.
This reading is skipped if precomposed_unicode had been read before.

The original patch for preocomposed unicode, 76759c7dff53, placed
precompose_argv() into parse-options.c
Now move it into git.c .
As a cleanup, remove it from parse-options.c and diff*.c

There may be room for more cleanups.
This change intends to be a bug fix.
It does cleanup for the Git commands that have test cases in
t/t3910-mac-os-precompose.sh, more cleanup needs more tests and done
in future commits.

[1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
[2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/

Reported-by: Daniel Troger <random_n0body@icloud.com>
Helped-By: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>

Changes since V1:
 - major rework of the commit message
 - Use precompose_argv_prefix().
   I changed the suggestion precompose_argv(int argc, char **argv, char **prefix)
   to pass the string, and not char **, which probably should have been const char **
   Small details.
 - The function precompose_read_config_gently(void) is integrated into
   precompose_argv_prefix(). The changes in git.c look much nicer now.
 - > "I am *not* hinting that other commands need to be tested"
   I am not sure how to read this - I do think that more test could be good.
   And more setups that use decomposed unicode for $CWD and branch names.
   However, this patch should be a bugfix.
   The mild cleanup that is done turns it already into a not-only-bugfix-patch.

---
 builtin/diff-files.c         |  2 --
 builtin/diff-index.c         |  1 -
 builtin/diff-tree.c          |  1 -
 builtin/diff.c               |  2 +-
 builtin/submodule--helper.c  |  2 +-
 compat/precompose_utf8.c     | 36 ++++++++++++++++++++++++------------
 compat/precompose_utf8.h     |  2 +-
 git-compat-util.h            |  4 ++--
 git.c                        |  2 +-
 parse-options.c              |  1 -
 t/t3910-mac-os-precompose.sh | 16 ++++++++++++++++
 11 files changed, 46 insertions(+), 23 deletions(-)

diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 1e352dd8f7..454872e8fb 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -35,8 +35,6 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)
 	 */
 	rev.diffopt.ita_invisible_in_index = 1;

-	precompose_argv(argc, argv);
-
 	argc = setup_revisions(argc, argv, &rev, NULL);
 	while (1 < argc && argv[1][0] == '-') {
 		if (!strcmp(argv[1], "--base"))
diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index 7f5281c461..1c63965123 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -25,7 +25,6 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
 	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
 	repo_init_revisions(the_repository, &rev, prefix);
 	rev.abbrev = 0;
-	precompose_argv(argc, argv);

 	argc = setup_revisions(argc, argv, &rev, NULL);
 	for (i = 1; i < argc; i++) {
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 9fc95e959f..104e9a844d 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -126,7 +126,6 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	memset(&s_r_opt, 0, sizeof(s_r_opt));
 	s_r_opt.tweak = diff_tree_tweak_rev;

-	precompose_argv(argc, argv);
 	argc = setup_revisions(argc, argv, opt, &s_r_opt);

 	memset(&w, 0, sizeof(w));
diff --git a/builtin/diff.c b/builtin/diff.c
index 780c33877f..3c87c95967 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -452,7 +452,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)

 	init_diff_ui_defaults();
 	git_config(git_diff_ui_config, NULL);
-	precompose_argv(argc, argv);
+	prefix = precompose_argv_prefix(argc, argv, prefix);

 	repo_init_revisions(the_repository, &rev, prefix);

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c2bd882d17..9d505a6329 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1257,7 +1257,7 @@ static int compute_summary_module_list(struct object_id *head_oid,
 	git_config(git_diff_basic_config, NULL);
 	init_revisions(&rev, info->prefix);
 	rev.abbrev = 0;
-	precompose_argv(diff_args.nr, diff_args.v);
+	precompose_argv_prefix(diff_args.nr, diff_args.v, NULL);
 	setup_revisions(diff_args.nr, diff_args.v, &rev, NULL);
 	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = submodule_summary_callback;
diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 136250fbf6..780a557148 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -60,32 +60,44 @@ void probe_utf8_pathname_composition(void)
 	strbuf_release(&path);
 }

+static inline const char *precompose_string_if_needed(const char *in,
+						      iconv_t ic_precompose)
+{
+	size_t inlen;
+	size_t outlen;
+	if (has_non_ascii(in, (size_t)-1, &inlen)) {
+		char *out = reencode_string_iconv(in, inlen, ic_precompose,
+						  0, &outlen);
+		if (out && outlen == inlen && !memcmp(in, out, outlen))
+			free(out); /* identical: no need to return a new one */
+		else
+			in = out;
+	}
+	return in;
+}

-void precompose_argv(int argc, const char **argv)
+const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
 {
 	int i = 0;
-	const char *oldarg;
-	char *newarg;
 	iconv_t ic_precompose;

+	git_config_get_bool("core.precomposeunicode", &precomposed_unicode);
 	if (precomposed_unicode != 1)
-		return;
+		return NULL;

 	ic_precompose = iconv_open(repo_encoding, path_encoding);
 	if (ic_precompose == (iconv_t) -1)
-		return;
+		return NULL;

 	while (i < argc) {
-		size_t namelen;
-		oldarg = argv[i];
-		if (has_non_ascii(oldarg, (size_t)-1, &namelen)) {
-			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose, 0, NULL);
-			if (newarg)
-				argv[i] = newarg;
-		}
+		argv[i] = precompose_string_if_needed(argv[i], ic_precompose);
 		i++;
 	}
+	if (prefix) {
+		prefix = precompose_string_if_needed(prefix, ic_precompose);
+	}
 	iconv_close(ic_precompose);
+	return prefix;
 }


diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index 6f843d3e1a..d70b84665c 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -28,7 +28,7 @@ typedef struct {
 	struct dirent_prec_psx *dirent_nfc;
 } PREC_DIR;

-void precompose_argv(int argc, const char **argv);
+const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix);
 void probe_utf8_pathname_composition(void);

 PREC_DIR *precompose_utf8_opendir(const char *dirname);
diff --git a/git-compat-util.h b/git-compat-util.h
index 104993b975..93d9b4b7af 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -252,9 +252,9 @@ typedef unsigned long uintptr_t;
 #ifdef PRECOMPOSE_UNICODE
 #include "compat/precompose_utf8.h"
 #else
-static inline void precompose_argv(int argc, const char **argv)
+static inline const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
 {
-	; /* nothing */
+	return prefix;
 }
 #define probe_utf8_pathname_composition()
 #endif
diff --git a/git.c b/git.c
index a00a0a4d94..16a485fbe7 100644
--- a/git.c
+++ b/git.c
@@ -420,7 +420,7 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 			int nongit_ok;
 			prefix = setup_git_directory_gently(&nongit_ok);
 		}
-
+		prefix = precompose_argv_prefix(argc, argv, prefix);
 		if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
 		    !(p->option & DELAY_PAGER_CONFIG))
 			use_pager = check_pager_config(p->cmd);
diff --git a/parse-options.c b/parse-options.c
index f0507432ee..1de404bdba 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -869,7 +869,6 @@ int parse_options(int argc, const char **argv, const char *prefix,
 		usage_with_options(usagestr, options);
 	}

-	precompose_argv(argc, argv);
 	free(real_options);
 	free(ctx.alias_groups);
 	return parse_options_end(&ctx);
diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
index 54ce19e353..8f7b49221f 100755
--- a/t/t3910-mac-os-precompose.sh
+++ b/t/t3910-mac-os-precompose.sh
@@ -191,6 +191,22 @@ test_expect_failure 'handle existing decomposed filenames' '
 	test_must_be_empty untracked
 '

+test_expect_success "unicode decomposed: git restore -p . " '
+	DIRNAMEPWD=dir.Odiarnfc &&
+	DIRNAMEINREPO=dir.$Adiarnfc &&
+	export DIRNAMEPWD DIRNAMEINREPO &&
+	git init $DIRNAMEPWD &&
+	(
+		cd $DIRNAMEPWD &&
+		mkdir $DIRNAMEINREPO &&
+		cd $DIRNAMEINREPO &&
+		echo "Initial" >file &&
+		git add file &&
+		echo "More stuff" >>file &&
+		echo y | git restore -p .
+	)
+'
+
 # Test if the global core.precomposeunicode stops autosensing
 # Must be the last test case
 test_expect_success "respect git config --global core.precomposeunicode" '
--
2.30.0.155.g66e871b664


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/1] MacOS: precompose_argv_prefix()
  2021-01-29 17:15 ` [PATCH v2 1/1] MacOS: precompose_argv_prefix() tboegi
@ 2021-01-29 23:19   ` Junio C Hamano
  2021-01-31  0:43   ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2021-01-29 23:19 UTC (permalink / raw)
  To: tboegi; +Cc: git, random_n0body, levraiphilippeblain

tboegi@web.de writes:

> Solution:
> precompose_argv() now handles the prefix (if needed), and is renamed into
> precompose_argv_prefix().
>
> Inside this function the config variable core.precomposeunicode is read
> into the global variable precomposed_unicode, as before.
> This reading is skipped if precomposed_unicode had been read before.
>
> The original patch for preocomposed unicode, 76759c7dff53, placed
> precompose_argv() into parse-options.c
> Now move it into git.c .
> As a cleanup, remove it from parse-options.c and diff*.c

So an effect of this change is that now everybody's argv[], not just
those who are using parse-options API, is munged at the same single
place.  That sounds like a good move toward a happy future.

Will queue.  Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/1] MacOS: precompose_argv_prefix()
  2021-01-29 17:15 ` [PATCH v2 1/1] MacOS: precompose_argv_prefix() tboegi
  2021-01-29 23:19   ` Junio C Hamano
@ 2021-01-31  0:43   ` Junio C Hamano
  2021-01-31  9:50     ` Torsten =?unknown-8bit?Q?B=C3=B6gershausen?=
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2021-01-31  0:43 UTC (permalink / raw)
  To: tboegi; +Cc: git, random_n0body, levraiphilippeblain

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
> ...
> There may be room for more cleanups.
> This change intends to be a bug fix.
> It does cleanup for the Git commands that have test cases in
> t/t3910-mac-os-precompose.sh, more cleanup needs more tests and done
> in future commits.

Also it seems to break many tests only on macOS (quite
understandably).

https://github.com/git/git/actions/runs/524162837
https://github.com/git/git/actions/runs/524265437


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 1/1] MacOS: precompose_argv_prefix()
  2021-01-31  0:43   ` Junio C Hamano
@ 2021-01-31  9:50     ` Torsten =?unknown-8bit?Q?B=C3=B6gershausen?=
  0 siblings, 0 replies; 27+ messages in thread
From: Torsten =?unknown-8bit?Q?B=C3=B6gershausen?= @ 2021-01-31  9:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, random_n0body, levraiphilippeblain

On Sat, Jan 30, 2021 at 04:43:30PM -0800, Junio C Hamano wrote:
> tboegi@web.de writes:
>
> > From: Torsten B??gershausen <tboegi@web.de>
> > ...
> > There may be room for more cleanups.
> > This change intends to be a bug fix.
> > It does cleanup for the Git commands that have test cases in
> > t/t3910-mac-os-precompose.sh, more cleanup needs more tests and done
> > in future commits.
>
> Also it seems to break many tests only on macOS (quite
> understandably).
>
> https://github.com/git/git/actions/runs/524162837
> https://github.com/git/git/actions/runs/524265437
>

Oh, that's not good - I'll check.
And, somewhat embarrasing, the commit message includes the
 "Changes since V1:" section

There will be a  v3 the next day, please drop this from seen.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 1/1] MacOS: precompose_argv_prefix()
  2021-01-06 11:35 git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) Daniel Troger
                   ` (2 preceding siblings ...)
  2021-01-29 17:15 ` [PATCH v2 1/1] MacOS: precompose_argv_prefix() tboegi
@ 2021-02-02 15:11 ` tboegi
  2021-02-02 17:43   ` Junio C Hamano
  2021-02-03 16:28 ` [PATCH v4 " tboegi
  4 siblings, 1 reply; 27+ messages in thread
From: tboegi @ 2021-02-02 15:11 UTC (permalink / raw)
  To: git, random_n0body, levraiphilippeblain; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

The following sequence leads to a "BUG" assertion running under MacOS:

  DIR=git-test-restore-p
  Adiarnfd=$(printf 'A\314\210')
  DIRNAME=xx${Adiarnfd}yy
  mkdir $DIR &&
  cd $DIR &&
  git init &&
  mkdir $DIRNAME &&
  cd $DIRNAME &&
  echo "Initial" >file &&
  git add file &&
  echo "One more line" >>file &&
  echo y | git restore -p .

 Initialized empty Git repository in /tmp/git-test-restore-p/.git/
 BUG: pathspec.c:495: error initializing pathspec_item
 Cannot close git diff-index --cached --numstat
 [snip]

The command `git restore` is run from a directory inside a Git repo.
Git needs to split the $CWD into 2 parts:
The path to the repo and "the rest", if any.
"The rest" becomes a "prefix" later used inside the pathspec code.

As an example, "/path/to/repo/dir-inside-repå" would determine
"/path/to/repo" as the root of the repo, the place where the
configuration file .git/config is found.

The rest becomes the prefix ("dir-inside-repå"), from where the
pathspec machinery expands the ".", more about this later.
If there is a decomposed form, (making the decomposing visible like this),
"dir-inside-rep°a" doesn't match "dir-inside-repå".

Git commands need to:

 (a) read the configuration variable "core.precomposeunicode"
 (b) precocompose argv[]
 (c) precompose the prefix, if there was any

The first commit,
76759c7dff53 "git on Mac OS and precomposed unicode"
addressed (a) and (b).

The call to precompose_argv() was added into parse-options.c,
because that seemed to be a good place when the patch was written.

Commands that don't use parse-options need to do (a) and (b) themselfs.

The commands `diff-files`, `diff-index`, `diff-tree` and `diff`
learned (a) and (b) in
commit 90a78b83e0b8 "diff: run arguments through precompose_argv"

Branch names (or refs in general) using decomposed code points
resulting in decomposed file names had been fixed in
commit 8e712ef6fc97 "Honor core.precomposeUnicode in more places"

The bug report from above shows 2 things:
- more commands need to handle precomposed unicode
- (c) should be implemented for all commands using pathspecs

Solution:
precompose_argv() now handles the prefix (if needed), and is renamed into
precompose_argv_prefix().

Inside this function the config variable core.precomposeunicode is read
into the global variable precomposed_unicode, as before.
This reading is skipped if precomposed_unicode had been read before.

The original patch for preocomposed unicode, 76759c7dff53, placed
precompose_argv() into parse-options.c
Now add it into git.c (as well) for commands that don't use parse-options.

There is certainly room for cleanups.
This change intends to be a bug fix.
Cleanups needs more tests in e.g. t/t3910-mac-os-precompose.sh, and should
be done in future commits.

[1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
[2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/

Reported-by: Daniel Troger <random_n0body@icloud.com>
Helped-By: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 Changes since V2:
 - reverted some of the cleanups done in V2, adjusted the commit message
 - found and fixed a major bug in compat/precompose_utf8.c, when non-unicode
   argv was send to Git, keep them "as is". That is the very old handling.
   If the original and precomposed string are identical, free() the new one.

 builtin/diff-files.c         |  2 +-
 builtin/diff-index.c         |  2 +-
 builtin/diff-tree.c          |  2 +-
 builtin/diff.c               |  2 +-
 builtin/submodule--helper.c  |  2 +-
 compat/precompose_utf8.c     | 52 +++++++++++++++++++++++-------------
 compat/precompose_utf8.h     |  2 +-
 git-compat-util.h            |  4 +--
 git.c                        |  2 +-
 parse-options.c              |  2 +-
 t/t3910-mac-os-precompose.sh | 16 +++++++++++
 11 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 1e352dd8f7..e3851dd1c0 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -35,7 +35,7 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)
 	 */
 	rev.diffopt.ita_invisible_in_index = 1;

-	precompose_argv(argc, argv);
+	prefix = precompose_argv_prefix(argc, argv, prefix);

 	argc = setup_revisions(argc, argv, &rev, NULL);
 	while (1 < argc && argv[1][0] == '-') {
diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index 7f5281c461..c33d7af478 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -25,7 +25,7 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
 	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
 	repo_init_revisions(the_repository, &rev, prefix);
 	rev.abbrev = 0;
-	precompose_argv(argc, argv);
+	prefix = precompose_argv_prefix(argc, argv, prefix);

 	argc = setup_revisions(argc, argv, &rev, NULL);
 	for (i = 1; i < argc; i++) {
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 9fc95e959f..178d12f07f 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -126,7 +126,7 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	memset(&s_r_opt, 0, sizeof(s_r_opt));
 	s_r_opt.tweak = diff_tree_tweak_rev;

-	precompose_argv(argc, argv);
+	prefix = precompose_argv_prefix(argc, argv, prefix);
 	argc = setup_revisions(argc, argv, opt, &s_r_opt);

 	memset(&w, 0, sizeof(w));
diff --git a/builtin/diff.c b/builtin/diff.c
index 780c33877f..3c87c95967 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -452,7 +452,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)

 	init_diff_ui_defaults();
 	git_config(git_diff_ui_config, NULL);
-	precompose_argv(argc, argv);
+	prefix = precompose_argv_prefix(argc, argv, prefix);

 	repo_init_revisions(the_repository, &rev, prefix);

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c2bd882d17..9d505a6329 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1257,7 +1257,7 @@ static int compute_summary_module_list(struct object_id *head_oid,
 	git_config(git_diff_basic_config, NULL);
 	init_revisions(&rev, info->prefix);
 	rev.abbrev = 0;
-	precompose_argv(diff_args.nr, diff_args.v);
+	precompose_argv_prefix(diff_args.nr, diff_args.v, NULL);
 	setup_revisions(diff_args.nr, diff_args.v, &rev, NULL);
 	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = submodule_summary_callback;
diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 136250fbf6..ec560565a8 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -60,32 +60,46 @@ void probe_utf8_pathname_composition(void)
 	strbuf_release(&path);
 }

-
-void precompose_argv(int argc, const char **argv)
+static inline const char *precompose_string_if_needed(const char *in)
 {
-	int i = 0;
-	const char *oldarg;
-	char *newarg;
-	iconv_t ic_precompose;
+	size_t inlen;
+	size_t outlen;
+	if (has_non_ascii(in, (size_t)-1, &inlen)) {
+		iconv_t ic_prec;
+		char *out;
+		if (precomposed_unicode < 0)
+			git_config_get_bool("core.precomposeunicode", &precomposed_unicode);
+		if (precomposed_unicode != 1)
+			return in;
+		ic_prec = iconv_open(repo_encoding, path_encoding);
+		if (ic_prec == (iconv_t) -1)
+			return in;
+
+		out = reencode_string_iconv(in, inlen, ic_prec, 0, &outlen);
+		if (out) {
+			if (outlen == inlen && !memcmp(in, out, outlen))
+				free(out); /* no need to return indentical */
+			else
+				in = out;
+		}
+		iconv_close(ic_prec);

-	if (precomposed_unicode != 1)
-		return;
+	}
+	return in;
+}

-	ic_precompose = iconv_open(repo_encoding, path_encoding);
-	if (ic_precompose == (iconv_t) -1)
-		return;
+const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
+{
+	int i = 0;

 	while (i < argc) {
-		size_t namelen;
-		oldarg = argv[i];
-		if (has_non_ascii(oldarg, (size_t)-1, &namelen)) {
-			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose, 0, NULL);
-			if (newarg)
-				argv[i] = newarg;
-		}
+		argv[i] = precompose_string_if_needed(argv[i]);
 		i++;
 	}
-	iconv_close(ic_precompose);
+	if (prefix) {
+		prefix = precompose_string_if_needed(prefix);
+	}
+	return prefix;
 }


diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index 6f843d3e1a..d70b84665c 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -28,7 +28,7 @@ typedef struct {
 	struct dirent_prec_psx *dirent_nfc;
 } PREC_DIR;

-void precompose_argv(int argc, const char **argv);
+const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix);
 void probe_utf8_pathname_composition(void);

 PREC_DIR *precompose_utf8_opendir(const char *dirname);
diff --git a/git-compat-util.h b/git-compat-util.h
index 104993b975..93d9b4b7af 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -252,9 +252,9 @@ typedef unsigned long uintptr_t;
 #ifdef PRECOMPOSE_UNICODE
 #include "compat/precompose_utf8.h"
 #else
-static inline void precompose_argv(int argc, const char **argv)
+static inline const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
 {
-	; /* nothing */
+	return prefix;
 }
 #define probe_utf8_pathname_composition()
 #endif
diff --git a/git.c b/git.c
index a00a0a4d94..16a485fbe7 100644
--- a/git.c
+++ b/git.c
@@ -420,7 +420,7 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 			int nongit_ok;
 			prefix = setup_git_directory_gently(&nongit_ok);
 		}
-
+		prefix = precompose_argv_prefix(argc, argv, prefix);
 		if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
 		    !(p->option & DELAY_PAGER_CONFIG))
 			use_pager = check_pager_config(p->cmd);
diff --git a/parse-options.c b/parse-options.c
index f0507432ee..fbea16eaf5 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -869,7 +869,7 @@ int parse_options(int argc, const char **argv, const char *prefix,
 		usage_with_options(usagestr, options);
 	}

-	precompose_argv(argc, argv);
+	precompose_argv_prefix(argc, argv, NULL);
 	free(real_options);
 	free(ctx.alias_groups);
 	return parse_options_end(&ctx);
diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
index 54ce19e353..8f7b49221f 100755
--- a/t/t3910-mac-os-precompose.sh
+++ b/t/t3910-mac-os-precompose.sh
@@ -191,6 +191,22 @@ test_expect_failure 'handle existing decomposed filenames' '
 	test_must_be_empty untracked
 '

+test_expect_success "unicode decomposed: git restore -p . " '
+	DIRNAMEPWD=dir.Odiarnfc &&
+	DIRNAMEINREPO=dir.$Adiarnfc &&
+	export DIRNAMEPWD DIRNAMEINREPO &&
+	git init $DIRNAMEPWD &&
+	(
+		cd $DIRNAMEPWD &&
+		mkdir $DIRNAMEINREPO &&
+		cd $DIRNAMEINREPO &&
+		echo "Initial" >file &&
+		git add file &&
+		echo "More stuff" >>file &&
+		echo y | git restore -p .
+	)
+'
+
 # Test if the global core.precomposeunicode stops autosensing
 # Must be the last test case
 test_expect_success "respect git config --global core.precomposeunicode" '
--
2.30.0.155.g66e871b664


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 1/1] MacOS: precompose_argv_prefix()
  2021-02-02 15:11 ` [PATCH v3 " tboegi
@ 2021-02-02 17:43   ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2021-02-02 17:43 UTC (permalink / raw)
  To: tboegi; +Cc: git, random_n0body, levraiphilippeblain

tboegi@web.de writes:

> diff --git a/builtin/diff-files.c b/builtin/diff-files.c
> index 1e352dd8f7..e3851dd1c0 100644
> --- a/builtin/diff-files.c
> +++ b/builtin/diff-files.c
> @@ -35,7 +35,7 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)
>  	 */
>  	rev.diffopt.ita_invisible_in_index = 1;
>
> -	precompose_argv(argc, argv);
> +	prefix = precompose_argv_prefix(argc, argv, prefix);

When git.c::cmd_main() calls run_builtin() to call cmd_diff_files(),
precompose would have already been called, and we end up calling the
already processed argv[] and prefix again here.

Is there a codepath where cmd_diff_files() gets called _without_
making the call to precompose() in git.c::run_builtin()?

Previous round removed the precompose call and I thought the logic
was sound, but I must be missing something.

The same question applies to other built-ins.

Standalone commands that go through execv_dashed_external() should
still have a call to precompose() in their own cmd_main() as the
prefix is not affected for them, but I suspect that they are not
expected to be run from a subdirectory to begin with?

> +const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
> +{
> +	int i = 0;
>
>  	while (i < argc) {
> -		size_t namelen;
> -		oldarg = argv[i];
> -		if (has_non_ascii(oldarg, (size_t)-1, &namelen)) {
> -			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose, 0, NULL);
> -			if (newarg)
> -				argv[i] = newarg;
> -		}
> +		argv[i] = precompose_string_if_needed(argv[i]);
>  		i++;
>  	}
> -	iconv_close(ic_precompose);
> +	if (prefix) {
> +		prefix = precompose_string_if_needed(prefix);
> +	}
> +	return prefix;
>  }

OK.  I missed that the previous round did return NULL when the
original should have been returned.  It is clear that this caller,
and the updated precompose_string_if_needed(), returns the original.

Good.

> diff --git a/git.c b/git.c
> index a00a0a4d94..16a485fbe7 100644
> --- a/git.c
> +++ b/git.c
> @@ -420,7 +420,7 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
>  			int nongit_ok;
>  			prefix = setup_git_directory_gently(&nongit_ok);
>  		}
> -
> +		prefix = precompose_argv_prefix(argc, argv, prefix);
>  		if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
>  		    !(p->option & DELAY_PAGER_CONFIG))
>  			use_pager = check_pager_config(p->cmd);
> diff --git a/parse-options.c b/parse-options.c
> index f0507432ee..fbea16eaf5 100644
> --- a/parse-options.c
> +++ b/parse-options.c
> @@ -869,7 +869,7 @@ int parse_options(int argc, const char **argv, const char *prefix,
>  		usage_with_options(usagestr, options);
>  	}
>
> -	precompose_argv(argc, argv);
> +	precompose_argv_prefix(argc, argv, NULL);

The correctness of this call also relies on that precompose() is
expected to be idempotent (not saying it is necessarily bad, but
just making a note), as argv[] must have been already processed
before a built-in calls this function.

> diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
> index 54ce19e353..8f7b49221f 100755
> --- a/t/t3910-mac-os-precompose.sh
> +++ b/t/t3910-mac-os-precompose.sh
> @@ -191,6 +191,22 @@ test_expect_failure 'handle existing decomposed filenames' '
>  	test_must_be_empty untracked
>  '
>
> +test_expect_success "unicode decomposed: git restore -p . " '
> +	DIRNAMEPWD=dir.Odiarnfc &&
> +	DIRNAMEINREPO=dir.$Adiarnfc &&
> +	export DIRNAMEPWD DIRNAMEINREPO &&

The above is fine, but

> +	git init $DIRNAMEPWD &&
> +	(
> +		cd $DIRNAMEPWD &&
> +		mkdir $DIRNAMEINREPO &&
> +		cd $DIRNAMEINREPO &&

Shouldn't these variable references be "quoted" for readers (I know
they happen to be free of $IFS whitespaces etc., but readers and
more importantly those who may casually cut-and-paste would not
know)?

> +		echo "Initial" >file &&
> +		git add file &&
> +		echo "More stuff" >>file &&
> +		echo y | git restore -p .
> +	)
> +'
> +
>  # Test if the global core.precomposeunicode stops autosensing
>  # Must be the last test case
>  test_expect_success "respect git config --global core.precomposeunicode" '
> --
> 2.30.0.155.g66e871b664

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 1/1] MacOS: precompose_argv_prefix()
  2021-01-06 11:35 git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) Daniel Troger
                   ` (3 preceding siblings ...)
  2021-02-02 15:11 ` [PATCH v3 " tboegi
@ 2021-02-03 16:28 ` tboegi
  2021-02-03 19:33   ` Junio C Hamano
  4 siblings, 1 reply; 27+ messages in thread
From: tboegi @ 2021-02-03 16:28 UTC (permalink / raw)
  To: git, random_n0body, levraiphilippeblain; +Cc: Torsten Bögershausen

From: Torsten Bögershausen <tboegi@web.de>

The following sequence leads to a "BUG" assertion running under MacOS:

  DIR=git-test-restore-p
  Adiarnfd=$(printf 'A\314\210')
  DIRNAME=xx${Adiarnfd}yy
  mkdir $DIR &&
  cd $DIR &&
  git init &&
  mkdir $DIRNAME &&
  cd $DIRNAME &&
  echo "Initial" >file &&
  git add file &&
  echo "One more line" >>file &&
  echo y | git restore -p .

 Initialized empty Git repository in /tmp/git-test-restore-p/.git/
 BUG: pathspec.c:495: error initializing pathspec_item
 Cannot close git diff-index --cached --numstat
 [snip]

The command `git restore` is run from a directory inside a Git repo.
Git needs to split the $CWD into 2 parts:
The path to the repo and "the rest", if any.
"The rest" becomes a "prefix" later used inside the pathspec code.

As an example, "/path/to/repo/dir-inside-repå" would determine
"/path/to/repo" as the root of the repo, the place where the
configuration file .git/config is found.

The rest becomes the prefix ("dir-inside-repå"), from where the
pathspec machinery expands the ".", more about this later.
If there is a decomposed form, (making the decomposing visible like this),
"dir-inside-rep°a" doesn't match "dir-inside-repå".

Git commands need to:

 (a) read the configuration variable "core.precomposeunicode"
 (b) precocompose argv[]
 (c) precompose the prefix, if there was any

The first commit,
76759c7dff53 "git on Mac OS and precomposed unicode"
addressed (a) and (b).

The call to precompose_argv() was added into parse-options.c,
because that seemed to be a good place when the patch was written.

Commands that don't use parse-options need to do (a) and (b) themselfs.

The commands `diff-files`, `diff-index`, `diff-tree` and `diff`
learned (a) and (b) in
commit 90a78b83e0b8 "diff: run arguments through precompose_argv"

Branch names (or refs in general) using decomposed code points
resulting in decomposed file names had been fixed in
commit 8e712ef6fc97 "Honor core.precomposeUnicode in more places"

The bug report from above shows 2 things:
- more commands need to handle precomposed unicode
- (c) should be implemented for all commands using pathspecs

Solution:
precompose_argv() now handles the prefix (if needed), and is renamed into
precompose_argv_prefix().

Inside this function the config variable core.precomposeunicode is read
into the global variable precomposed_unicode, as before.
This reading is skipped if precomposed_unicode had been read before.

The original patch for preocomposed unicode, 76759c7dff53, placed
precompose_argv() into parse-options.c
Now add it into git.c (as well) for commands that don't use parse-options.
Note that precompose() may be called twice - it is idempotent.

There is certainly room for cleanups - this change intends to be a bug fix.
Cleanups needs more tests in e.g. t/t3910-mac-os-precompose.sh, and should
be done in future commits.

[1] git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
[2] https://lore.kernel.org/git/A102844A-9501-4A86-854D-E3B387D378AA@icloud.com/

Reported-by: Daniel Troger <random_n0body@icloud.com>
Helped-By: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 Changes since V3:
 - Make clear that precomposing twice doesn't do anything bad (idempotent)
 - Add "" around $DIRNAME in t/t3910-mac-os-precompose.sh, as suggested by Junio
   (I don't have a strong preference here, but "cut and paste" is a good point)
 - And yes, there may be more work to be done. For the moment I would like to
   get this merged into git.git, and address the rest later.

 builtin/diff-files.c         |  2 +-
 builtin/diff-index.c         |  2 +-
 builtin/diff-tree.c          |  2 +-
 builtin/diff.c               |  2 +-
 builtin/submodule--helper.c  |  2 +-
 compat/precompose_utf8.c     | 52 +++++++++++++++++++++++-------------
 compat/precompose_utf8.h     |  2 +-
 git-compat-util.h            |  4 +--
 git.c                        |  2 +-
 parse-options.c              |  2 +-
 t/t3910-mac-os-precompose.sh | 16 +++++++++++
 11 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 1e352dd8f7..e3851dd1c0 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -35,7 +35,7 @@ int cmd_diff_files(int argc, const char **argv, const char *prefix)
 	 */
 	rev.diffopt.ita_invisible_in_index = 1;

-	precompose_argv(argc, argv);
+	prefix = precompose_argv_prefix(argc, argv, prefix);

 	argc = setup_revisions(argc, argv, &rev, NULL);
 	while (1 < argc && argv[1][0] == '-') {
diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index 7f5281c461..c33d7af478 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -25,7 +25,7 @@ int cmd_diff_index(int argc, const char **argv, const char *prefix)
 	git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
 	repo_init_revisions(the_repository, &rev, prefix);
 	rev.abbrev = 0;
-	precompose_argv(argc, argv);
+	prefix = precompose_argv_prefix(argc, argv, prefix);

 	argc = setup_revisions(argc, argv, &rev, NULL);
 	for (i = 1; i < argc; i++) {
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 9fc95e959f..178d12f07f 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -126,7 +126,7 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	memset(&s_r_opt, 0, sizeof(s_r_opt));
 	s_r_opt.tweak = diff_tree_tweak_rev;

-	precompose_argv(argc, argv);
+	prefix = precompose_argv_prefix(argc, argv, prefix);
 	argc = setup_revisions(argc, argv, opt, &s_r_opt);

 	memset(&w, 0, sizeof(w));
diff --git a/builtin/diff.c b/builtin/diff.c
index 780c33877f..3c87c95967 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -452,7 +452,7 @@ int cmd_diff(int argc, const char **argv, const char *prefix)

 	init_diff_ui_defaults();
 	git_config(git_diff_ui_config, NULL);
-	precompose_argv(argc, argv);
+	prefix = precompose_argv_prefix(argc, argv, prefix);

 	repo_init_revisions(the_repository, &rev, prefix);

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index c2bd882d17..9d505a6329 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1257,7 +1257,7 @@ static int compute_summary_module_list(struct object_id *head_oid,
 	git_config(git_diff_basic_config, NULL);
 	init_revisions(&rev, info->prefix);
 	rev.abbrev = 0;
-	precompose_argv(diff_args.nr, diff_args.v);
+	precompose_argv_prefix(diff_args.nr, diff_args.v, NULL);
 	setup_revisions(diff_args.nr, diff_args.v, &rev, NULL);
 	rev.diffopt.output_format = DIFF_FORMAT_NO_OUTPUT | DIFF_FORMAT_CALLBACK;
 	rev.diffopt.format_callback = submodule_summary_callback;
diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 136250fbf6..ec560565a8 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -60,32 +60,46 @@ void probe_utf8_pathname_composition(void)
 	strbuf_release(&path);
 }

-
-void precompose_argv(int argc, const char **argv)
+static inline const char *precompose_string_if_needed(const char *in)
 {
-	int i = 0;
-	const char *oldarg;
-	char *newarg;
-	iconv_t ic_precompose;
+	size_t inlen;
+	size_t outlen;
+	if (has_non_ascii(in, (size_t)-1, &inlen)) {
+		iconv_t ic_prec;
+		char *out;
+		if (precomposed_unicode < 0)
+			git_config_get_bool("core.precomposeunicode", &precomposed_unicode);
+		if (precomposed_unicode != 1)
+			return in;
+		ic_prec = iconv_open(repo_encoding, path_encoding);
+		if (ic_prec == (iconv_t) -1)
+			return in;
+
+		out = reencode_string_iconv(in, inlen, ic_prec, 0, &outlen);
+		if (out) {
+			if (outlen == inlen && !memcmp(in, out, outlen))
+				free(out); /* no need to return indentical */
+			else
+				in = out;
+		}
+		iconv_close(ic_prec);

-	if (precomposed_unicode != 1)
-		return;
+	}
+	return in;
+}

-	ic_precompose = iconv_open(repo_encoding, path_encoding);
-	if (ic_precompose == (iconv_t) -1)
-		return;
+const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
+{
+	int i = 0;

 	while (i < argc) {
-		size_t namelen;
-		oldarg = argv[i];
-		if (has_non_ascii(oldarg, (size_t)-1, &namelen)) {
-			newarg = reencode_string_iconv(oldarg, namelen, ic_precompose, 0, NULL);
-			if (newarg)
-				argv[i] = newarg;
-		}
+		argv[i] = precompose_string_if_needed(argv[i]);
 		i++;
 	}
-	iconv_close(ic_precompose);
+	if (prefix) {
+		prefix = precompose_string_if_needed(prefix);
+	}
+	return prefix;
 }


diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index 6f843d3e1a..d70b84665c 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -28,7 +28,7 @@ typedef struct {
 	struct dirent_prec_psx *dirent_nfc;
 } PREC_DIR;

-void precompose_argv(int argc, const char **argv);
+const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix);
 void probe_utf8_pathname_composition(void);

 PREC_DIR *precompose_utf8_opendir(const char *dirname);
diff --git a/git-compat-util.h b/git-compat-util.h
index 104993b975..93d9b4b7af 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -252,9 +252,9 @@ typedef unsigned long uintptr_t;
 #ifdef PRECOMPOSE_UNICODE
 #include "compat/precompose_utf8.h"
 #else
-static inline void precompose_argv(int argc, const char **argv)
+static inline const char *precompose_argv_prefix(int argc, const char **argv, const char *prefix)
 {
-	; /* nothing */
+	return prefix;
 }
 #define probe_utf8_pathname_composition()
 #endif
diff --git a/git.c b/git.c
index a00a0a4d94..16a485fbe7 100644
--- a/git.c
+++ b/git.c
@@ -420,7 +420,7 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
 			int nongit_ok;
 			prefix = setup_git_directory_gently(&nongit_ok);
 		}
-
+		prefix = precompose_argv_prefix(argc, argv, prefix);
 		if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
 		    !(p->option & DELAY_PAGER_CONFIG))
 			use_pager = check_pager_config(p->cmd);
diff --git a/parse-options.c b/parse-options.c
index f0507432ee..fbea16eaf5 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -869,7 +869,7 @@ int parse_options(int argc, const char **argv, const char *prefix,
 		usage_with_options(usagestr, options);
 	}

-	precompose_argv(argc, argv);
+	precompose_argv_prefix(argc, argv, NULL);
 	free(real_options);
 	free(ctx.alias_groups);
 	return parse_options_end(&ctx);
diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
index 54ce19e353..bc632a03ed 100755
--- a/t/t3910-mac-os-precompose.sh
+++ b/t/t3910-mac-os-precompose.sh
@@ -191,6 +191,22 @@ test_expect_failure 'handle existing decomposed filenames' '
 	test_must_be_empty untracked
 '

+test_expect_success "unicode decomposed: git restore -p . " '
+	DIRNAMEPWD=dir.Odiarnfc &&
+	DIRNAMEINREPO=dir.$Adiarnfc &&
+	export DIRNAMEPWD DIRNAMEINREPO &&
+	git init "$DIRNAMEPWD" &&
+	(
+		cd "$DIRNAMEPWD" &&
+		mkdir "$DIRNAMEINREPO" &&
+		cd "$DIRNAMEINREPO" &&
+		echo "Initial" >file &&
+		git add file &&
+		echo "More stuff" >>file &&
+		echo y | git restore -p .
+	)
+'
+
 # Test if the global core.precomposeunicode stops autosensing
 # Must be the last test case
 test_expect_success "respect git config --global core.precomposeunicode" '
--
2.30.0.155.g66e871b664


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/1] MacOS: precompose_argv_prefix()
  2021-02-03 16:28 ` [PATCH v4 " tboegi
@ 2021-02-03 19:33   ` Junio C Hamano
  2021-02-03 22:13     ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2021-02-03 19:33 UTC (permalink / raw)
  To: tboegi; +Cc: git, random_n0body, levraiphilippeblain

tboegi@web.de writes:

> From: Torsten Bögershausen <tboegi@web.de>
>
> The following sequence leads to a "BUG" assertion running under MacOS:
>
>   DIR=git-test-restore-p
>   Adiarnfd=$(printf 'A\314\210')
>   DIRNAME=xx${Adiarnfd}yy
>   mkdir $DIR &&
>   cd $DIR &&
>   git init &&
>   mkdir $DIRNAME &&
>   cd $DIRNAME &&
>   echo "Initial" >file &&
>   git add file &&
>   echo "One more line" >>file &&
>   echo y | git restore -p .
>
>  Initialized empty Git repository in /tmp/git-test-restore-p/.git/
>  BUG: pathspec.c:495: error initializing pathspec_item
>  Cannot close git diff-index --cached --numstat
>  [snip]
>
> The command `git restore` is run from a directory inside a Git repo.
> Git needs to split the $CWD into 2 parts:
> The path to the repo and "the rest", if any.
> "The rest" becomes a "prefix" later used inside the pathspec code.
>
> As an example, "/path/to/repo/dir-inside-repå" would determine
> "/path/to/repo" as the root of the repo, the place where the
> configuration file .git/config is found.
>
> The rest becomes the prefix ("dir-inside-repå"), from where the
> pathspec machinery expands the ".", more about this later.
> If there is a decomposed form, (making the decomposing visible like this),
> "dir-inside-rep°a" doesn't match "dir-inside-repå".
>
> Git commands need to:
>
>  (a) read the configuration variable "core.precomposeunicode"
>  (b) precocompose argv[]
>  (c) precompose the prefix, if there was any
>
> The first commit,
> 76759c7dff53 "git on Mac OS and precomposed unicode"
> addressed (a) and (b).
>
> The call to precompose_argv() was added into parse-options.c,
> because that seemed to be a good place when the patch was written.
>
> Commands that don't use parse-options need to do (a) and (b) themselfs.
>
> The commands `diff-files`, `diff-index`, `diff-tree` and `diff`
> learned (a) and (b) in
> commit 90a78b83e0b8 "diff: run arguments through precompose_argv"
>
> Branch names (or refs in general) using decomposed code points
> resulting in decomposed file names had been fixed in
> commit 8e712ef6fc97 "Honor core.precomposeUnicode in more places"
>
> The bug report from above shows 2 things:
> - more commands need to handle precomposed unicode
> - (c) should be implemented for all commands using pathspecs
>
> Solution:
> precompose_argv() now handles the prefix (if needed), and is renamed into
> precompose_argv_prefix().
>
> Inside this function the config variable core.precomposeunicode is read
> into the global variable precomposed_unicode, as before.
> This reading is skipped if precomposed_unicode had been read before.
>
> The original patch for preocomposed unicode, 76759c7dff53, placed
> precompose_argv() into parse-options.c
> Now add it into git.c (as well) for commands that don't use parse-options.
> Note that precompose() may be called twice - it is idempotent.

Just as the prefix-less variant was idempotent and that was the
reason why cmd_diff_files() had its own precompose() even if the
incoming argv[] is supposed to be already precomposed because it
was processed in another call to precompose() in run_builtin(),
this patch keeps these seemingly redundant calls, because it is not
meant as a clean-up but as a bugfix for the prefix part.

OK. ... Ah, no, the call in run_builtin() is a new thing.  We didn't
have it, and the redundant call is what this patch introduced, so
we need to be a bit more careful about the analysis here.  It is one
thing to say "we leave the existing iffy code and address only a
single bug" and do so.  It is entirely different to say so and then
do "we introduce an iffy code and address only a single bug".  We
need to admit that what we added _is_ iffy but supposed to be safe.
Just saying "it is supposed to be safe" without saying why it is
iffy is dishonest and does not help future developers who may want
to jump in and clean the code.

Perhaps

	Now add it into git.c::run_builtin() as well.  Existing
	precompose calls in diff-files.c and others would become
	redundant but because we do not want to make sure that there
	is no way for the control to reach them without passing
	run_builtin(), we'll keep them in place just in case.  The
	calls to precompose() are idempotent so it should not hurt.

or something?

>  - And yes, there may be more work to be done. For the moment I would like to
>    get this merged into git.git, and address the rest later.

Sure.

Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/1] MacOS: precompose_argv_prefix()
  2021-02-03 19:33   ` Junio C Hamano
@ 2021-02-03 22:13     ` Junio C Hamano
  2021-02-05 17:31       ` Torsten Bögershausen
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2021-02-03 22:13 UTC (permalink / raw)
  To: tboegi; +Cc: git, random_n0body, levraiphilippeblain

Junio C Hamano <gitster@pobox.com> writes:

> Just as the prefix-less variant was idempotent and that was the
> reason why cmd_diff_files() had its own precompose() even if the
> incoming argv[] is supposed to be already precomposed because it
> was processed in another call to precompose() in run_builtin(),
> this patch keeps these seemingly redundant calls, because it is not
> meant as a clean-up but as a bugfix for the prefix part.
>
> OK. ... Ah, no, the call in run_builtin() is a new thing.  We didn't
> have it, and the redundant call is what this patch introduced, so
> we need to be a bit more careful about the analysis here.  It is one
> thing to say "we leave the existing iffy code and address only a
> single bug" and do so.  It is entirely different to say so and then
> do "we introduce an iffy code and address only a single bug".  We
> need to admit that what we added _is_ iffy but supposed to be safe.
> Just saying "it is supposed to be safe" without saying why it is
> iffy is dishonest and does not help future developers who may want
> to jump in and clean the code.
>
> Perhaps
>
> 	Now add it into git.c::run_builtin() as well.  Existing
> 	precompose calls in diff-files.c and others would become
> 	redundant but because we do not want to make sure that there
> 	is no way for the control to reach them without passing
> 	run_builtin(), we'll keep them in place just in case.  The
> 	calls to precompose() are idempotent so it should not hurt.
>
> or something?

FYI, I've tweaked the proposed log message like the following before
queuing.  I think that would be explicit enough to remind us that we
may be able to improve the situation fairly easily.

Thanks.


1:  071dfe8793 ! 1:  5c327502db MacOS: precompose_argv_prefix()
    @@ Commit message
     
         The original patch for preocomposed unicode, 76759c7dff53, placed
         precompose_argv() into parse-options.c
    -    Now add it into git.c (as well) for commands that don't use parse-options.
    -    Note that precompose() may be called twice - it is idempotent.
    +
    +    Now add it into git.c::run_builtin() as well.  Existing precompose
    +    calls in diff-files.c and others may become redundant, and if we
    +    audit the callflows that reach these places to make sure that they
    +    can never be reached without going through the new call added to
    +    run_builtin(), we might be able to remove these existing ones.
    +
    +    But in this commit, we do not bother to do so and leave these
    +    precompose callsites as they are.  Because precompose() is
    +    idempotent and can be called on an already precomposed string
    +    safely, this is safer than removing existing calls without fully
    +    vetting the callflows.
     
         There is certainly room for cleanups - this change intends to be a bug fix.
         Cleanups needs more tests in e.g. t/t3910-mac-os-precompose.sh, and should



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/1] MacOS: precompose_argv_prefix()
  2021-02-03 22:13     ` Junio C Hamano
@ 2021-02-05 17:31       ` Torsten Bögershausen
  0 siblings, 0 replies; 27+ messages in thread
From: Torsten Bögershausen @ 2021-02-05 17:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, random_n0body, levraiphilippeblain

On Wed, Feb 03, 2021 at 02:13:53PM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Just as the prefix-less variant was idempotent and that was the
> > reason why cmd_diff_files() had its own precompose() even if the
> > incoming argv[] is supposed to be already precomposed because it
> > was processed in another call to precompose() in run_builtin(),
> > this patch keeps these seemingly redundant calls, because it is not
> > meant as a clean-up but as a bugfix for the prefix part.
> >
> > OK. ... Ah, no, the call in run_builtin() is a new thing.  We didn't
> > have it, and the redundant call is what this patch introduced, so
> > we need to be a bit more careful about the analysis here.  It is one
> > thing to say "we leave the existing iffy code and address only a
> > single bug" and do so.  It is entirely different to say so and then
> > do "we introduce an iffy code and address only a single bug".  We
> > need to admit that what we added _is_ iffy but supposed to be safe.
> > Just saying "it is supposed to be safe" without saying why it is
> > iffy is dishonest and does not help future developers who may want
> > to jump in and clean the code.
> >
> > Perhaps
> >
> > 	Now add it into git.c::run_builtin() as well.  Existing
> > 	precompose calls in diff-files.c and others would become
> > 	redundant but because we do not want to make sure that there
> > 	is no way for the control to reach them without passing
> > 	run_builtin(), we'll keep them in place just in case.  The
> > 	calls to precompose() are idempotent so it should not hurt.
> >
> > or something?
>
> FYI, I've tweaked the proposed log message like the following before
> queuing.  I think that would be explicit enough to remind us that we
> may be able to improve the situation fairly easily.
>

Thanks for helping me out.
That gives me some time to look deeper and prepare more test cases
ana a then cleanup on top of this patch (later).


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-13 14:57   ` Daniel Troger
@ 2021-01-16 17:24     ` Torsten Bögershausen
  0 siblings, 0 replies; 27+ messages in thread
From: Torsten Bögershausen @ 2021-01-16 17:24 UTC (permalink / raw)
  To: Daniel Troger; +Cc: Philippe Blain, Randall S. Becker, git, Johannes Schindelin

On Wed, Jan 13, 2021 at 03:57:03PM +0100, Daniel Troger wrote:
> Hej Torsten,
>
> > To clean up the repo, you can do like this, explained in a dummy repo:
>
> your commands to remove one of the versions worked perfectly, thanks!
>
> > And I still womder, how did you mange to create the "decomposed version of å" ?
>
> I created the directory with finder or atom I suppose, I did not find an mkdir command for that directory in my 290MB of bash history.
>
> > I digged some hours into the stuff, add lots of debug traces, patches and stuff
>
> If I understand correctly, you already came up with a patch? That's great news! It's fine if it takes time as long as it gets fixed eventually.

That is a more tricky thing than I thought.
When Git starts, it runs getcwd() to find out where you started.
Inside that whole path there is the Git repo to be found, and the whole
path split into to parts:
The part pointing out the repo and
the part inside the repo.

This is done in setup.c, and it seems as if we need to precompose the whole
path, to work correctly.

However, the configuration saying "core.precomposeunicode" has not been read yet.

And setup.c doesn't know, where to find it, that is my understanding.

When I patch the getcwd() function to try to read the configuration,
it doesn't find it (yet).

It can find the global (or system) config.
Technically speaking, that can be used, but will probably cause other problems,
since sometimes the (repo) local configuration is different, sometimes not.

As far as I understand today, the best thing we can do is to check,
if getcwd() may need a precomposition: precompose and see if
the result is different.

Then error out, telling the user that you can not run Git inside
this directory.

I don't know, if this is worth the effort.

But: If somebody comes up with a patch, I am happy to review it
and/or get it into shape.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-09 17:23 ` Torsten Bögershausen
@ 2021-01-13 14:57   ` Daniel Troger
  2021-01-16 17:24     ` Torsten Bögershausen
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Troger @ 2021-01-13 14:57 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Philippe Blain, Randall S. Becker, git, Johannes Schindelin

Hej Torsten, 

> To clean up the repo, you can do like this, explained in a dummy repo: 

your commands to remove one of the versions worked perfectly, thanks!

> And I still womder, how did you mange to create the "decomposed version of å" ?

I created the directory with finder or atom I suppose, I did not find an mkdir command for that directory in my 290MB of bash history.

> I digged some hours into the stuff, add lots of debug traces, patches and stuff

If I understand correctly, you already came up with a patch? That's great news! It's fine if it takes time as long as it gets fixed eventually.

Best regards

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
  2021-01-08 19:56 git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) Daniel Troger
@ 2021-01-09 17:23 ` Torsten Bögershausen
  2021-01-13 14:57   ` Daniel Troger
  0 siblings, 1 reply; 27+ messages in thread
From: Torsten Bögershausen @ 2021-01-09 17:23 UTC (permalink / raw)
  To: Daniel Troger; +Cc: Philippe Blain, Randall S. Becker, git, Johannes Schindelin

On Fri, Jan 08, 2021 at 08:56:08PM +0100, Daniel Troger wrote:
> > That did the trick, the test case is now reproducible here.
>
> > It seems as if there could be a patch in a couple of days, but no promises.
>
> That's great news, thank you all very much!

Hej Daniel,
Not sure if I can keep my promise, please see below.

> I was already afraid at first it was some weird macOS filesystem bug only happening on high sierra
> and earlier which was never going to be fixed.

I haven't heard about anything special about Mac OS High Sierra.
If there are strange things, please let us know.

(And I still womder, how did you mange to create the "decomposed version of å" ?
 Mac OS itself doesn't decompose the "å", in opposite to "ä" or "ö".
 It creates a name for a file in decomposed form for "ä", "ö" and a lot of other
 characters/code points, but not for "å".
 It does, however, treat the pre- and de-composed form of "å" equivilent)

>
> In the meantime I have managed to add "both directories" to git (with git add paulbr*).
> It's funny, now one directory shows up twice when viewing my repo on gitlab.
>
> And this is the message I get when cloning:
> warning: the following paths have collided (e.g. case-sensitive paths
> on a case-insensitive filesystem) and only one from the same
> colliding group is in the working tree:
>   '[...]/paulbrunngård-springyard/main.ts'
>   '[...]/paulbrunngård-springyard/main.ts'
>
> What will happen with the two folders in the git repo after the patch?
>

Now, that is another question.

First of all, I would recommend to "remove" one of the folders from Git,
because all files are tracked twice in Git, but "shadowed" by the file system.



To clean up the repo, you can do like this, explained in a dummy repo:

#One file (that is all we have) is tracked twice:
user@mac:/tmp/AAA> git ls-files
"paulbrunnga\314\212rd-springyard/horse"
"paulbrunng\303\245rd-springyard/horse"

# Remove all files from Git, but keep them on disk:
user@mac:/tmp/AAA> git rm -rf --cached .
rm 'paulbrunngård-springyard/horse'
rm 'paulbrunngård-springyard/horse'

# Re-add one version, the precomposed:
user@mac:/tmp/AAA> git add .
user@mac:/tmp/AAA> git status
On branch master
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
          deleted:    "paulbrunnga\314\212rd-springyard/horse"

user@mac:/tmp/AAA> git ls-files
"paulbrunng\303\245rd-springyard/horse"

#Not shown:
git commit -m "Remove duplicate directory"

####################

Now back to a potential patch:
That will take some time.
I digged some hours into the stuff, add lots of debug traces,
patches and stuff, and now we need a cleanup and get it into a better
shape to be acceptable.

All the best

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: git-bugreport-2021-01-06-1209.txt (git can't deal with special characters)
@ 2021-01-08 19:56 Daniel Troger
  2021-01-09 17:23 ` Torsten Bögershausen
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Troger @ 2021-01-08 19:56 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Philippe Blain, Randall S. Becker, git, Johannes Schindelin

> That did the trick, the test case is now reproducible here.

> It seems as if there could be a patch in a couple of days, but no promises.

That's great news, thank you all very much!

I was already afraid at first it was some weird macOS filesystem bug only happening on high sierra and earlier which was never going to be fixed.

In the meantime I have managed to add "both directories" to git (with git add paulbr*). It's funny, now one directory shows up twice when viewing my repo on gitlab.

And this is the message I get when cloning:
warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:
  '[...]/paulbrunngård-springyard/main.ts'
  '[...]/paulbrunngård-springyard/main.ts'

What will happen with the two folders in the git repo after the patch?


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2021-02-05 17:37 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-06 11:35 git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) Daniel Troger
2021-01-06 14:21 ` Torsten Bögershausen
2021-01-06 16:49   ` Daniel Troger
2021-01-06 21:47     ` Torsten Bögershausen
2021-01-06 22:21       ` Daniel Troger
2021-01-06 23:07         ` Randall S. Becker
2021-01-07 14:34           ` Philippe Blain
2021-01-07 15:49             ` Torsten Bögershausen
2021-01-07 16:21               ` Philippe Blain
2021-01-08 19:07                 ` Torsten Bögershausen
2021-01-24 15:13 ` [PATCH/RFC v1 1/1] git restore -p . and precomposed unicode tboegi
2021-01-24 19:51   ` Junio C Hamano
2021-01-25 16:53     ` Torsten Bögershausen
2021-01-29 17:15 ` [PATCH v2 1/1] MacOS: precompose_argv_prefix() tboegi
2021-01-29 23:19   ` Junio C Hamano
2021-01-31  0:43   ` Junio C Hamano
2021-01-31  9:50     ` Torsten =?unknown-8bit?Q?B=C3=B6gershausen?=
2021-02-02 15:11 ` [PATCH v3 " tboegi
2021-02-02 17:43   ` Junio C Hamano
2021-02-03 16:28 ` [PATCH v4 " tboegi
2021-02-03 19:33   ` Junio C Hamano
2021-02-03 22:13     ` Junio C Hamano
2021-02-05 17:31       ` Torsten Bögershausen
2021-01-08 19:56 git-bugreport-2021-01-06-1209.txt (git can't deal with special characters) Daniel Troger
2021-01-09 17:23 ` Torsten Bögershausen
2021-01-13 14:57   ` Daniel Troger
2021-01-16 17:24     ` Torsten Bögershausen

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git