git@vger.kernel.org list mirror (unofficial, one of many)
From: Mark Amery <markrobertamery@gmail.com>
To: "Torsten Bögershausen" <tboegi@web.de>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: Bug: Changing folder case with `git mv` crashes on case-insensitive file system
Date: Thu, 6 May 2021 10:12:40 +0100
Message-ID: <CAD8jeghZKDcp=weHtcMZ4z8KaO1jQJqfPqaRtYgtiwrX-1+NNg@mail.gmail.com>
In-Reply-To: <20210506043429.zqgzxjrj643avrns@tb-raspi4>

So, I'm just a dumb Git user who doesn't even write C, so much of this
discussion is over my head, but I have a few thoughts that may be
helpful:

• The mv utility on Mac is capable of doing `mv bär.txt bÄr.txt` just
fine. Maybe `git mv` can learn something from whatever `mv` does?

• On a case-insensitive file system, `git mv somedir sOMEdir` is a
rename. But on a case-sensitive file system, it might NOT be a rename;
it might be the case that `somedir` and `sOMEdir` both exist and that
the command should put `somedir` inside `sOMEdir`. I mention this
because I can imagine some naive attempts at fixing the original bug
by doing a case-insensitive comparison of the two names ending up
breaking this behaviour on case-sensitive file systems by wrongly
treating such a command as a rename. It's probably worth having a test
that this scenario gets handled cleanly on case-sensitive file
systems? (I haven't checked whether Torsten's proposed diff falls into
this trap or not.)

• Above, Torsten mentions that there are filesystem-specific rules
about what names are equal to each other that Git can't easily handle,
because they go beyond just ASCII case changes. In that case, maybe
the right solution is to always defer the question to the filesystem
rather than Git trying to figure out the answer "in its head"?

  That is: first check the inode or file ID of the src and dst passed
to `git mv`. If they are different and the second one is a folder,
move src inside the existing folder. If they are the same, or if the
second one is not a folder, then do a rename.

  It seems to me that this approach automatically handles stuff like
`git mv bär.txt bÄr.txt` plus any other rules about names being equal
(like two different sequences of code points that both express "à"),
all without Git ever needing to explicitly check whether two names are
case-insensitively equal. Am I missing something?
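
  To make the idea concrete, here is a minimal sketch in Python (not
Git's C code, and not taken from any proposed patch). The function name
`plan_move` and its return values are made up for illustration; the
only real APIs used are `os.stat`, `os.path.samestat` and
`stat.S_ISDIR`. It assumes the filesystem reports stable inode / file
ID values, which may not hold on every platform or network filesystem.

```python
import os
import stat


def plan_move(src, dst):
    """Decide what `mv src dst` should do by asking the filesystem.

    Returns a (action, target) tuple, where action is "rename" or
    "move_into". Hypothetical helper, for illustration only.
    """
    src_stat = os.stat(src)
    try:
        dst_stat = os.stat(dst)
    except FileNotFoundError:
        # dst does not exist under any spelling: plain rename.
        return ("rename", dst)

    # samestat() compares device and inode (or file ID) numbers, so the
    # filesystem, not string comparison, decides whether the names match.
    same_object = os.path.samestat(src_stat, dst_stat)

    if not same_object and stat.S_ISDIR(dst_stat.st_mode):
        # dst is a different, existing folder: move src inside it.
        name = os.path.basename(os.path.normpath(src))
        return ("move_into", os.path.join(dst, name))

    # Same object (e.g. a case-only change) or dst is not a folder.
    return ("rename", dst)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "somedir"))
        os.mkdir(os.path.join(root, "other"))
        # Case-only change of a single existing directory.
        print(plan_move(os.path.join(root, "somedir"),
                        os.path.join(root, "sOMEdir")))
        # Distinct existing directory as destination.
        print(plan_move(os.path.join(root, "somedir"),
                        os.path.join(root, "other")))
```

  On a case-sensitive file system where `somedir` and `sOMEdir` both
exist, the two stat results differ, so the sketch picks "move_into";
on a case-insensitive one they refer to the same object, so it picks
"rename".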

Sorry if any of the above is dumb or if I'm reiterating things others
have already said without realising it.

On Thu, May 6, 2021 at 5:34 AM Torsten Bögershausen <tboegi@web.de> wrote:
>
> On Wed, May 05, 2021 at 09:23:05AM +0900, Junio C Hamano wrote:
> > Torsten Bögershausen <tboegi@web.de> writes:
> >
> > > To my understanding we try to rename
> > > foo/ to FOO/.
> > > But because FOO/ already "exists" as a directory,
> > > Git tries to move foo/ into FOO/foo, which fails.
> > >
> > > And no, the problem is probably not restricted to macOS;
> > > Windows and all case-insensitive file systems should show
> > > the same, but I haven't tested yet, so it's more a suspicion.
> > >
> > > The following diff allows renaming foo/ to FOO/.
> > > If someone wants to make a patch out of it, that would be good.
> >
> > Is strcasecmp() sufficient for macOS whose filesystem has not just
> > case insensitivity but UTF-8 normalization issues?
> >
>
> Strictly speaking: no.
>
> The Git code doesn't handle UTF-8 upper/lower case at all:
> git mv bar.txt BAR.TXT works because strcasecmp() is catching it.
>
> git mv bär.txt BÄR.TXT needs the long way:
> git mv bär.txt baer.txt && git mv baer.txt BÄR.TXT
>
> We have been restricting the case-change-is-allowed to ASCII filenames
> all the time.
> Git has no information about which code points map onto each other,
> since this is all file system dependent.
> NTFS has one way, HFS+ and APFS another, VFAT a third one, and if I expose
> ext4 via Samba we probably have another one.
> Not to mention that ext4 can be used case-insensitively on later Linux
> kernels, which sticks to Unicode.
> Or Git repos running on machines using ISO-8859-1; those should be rare
> these days.
>
> That said, people are renaming files in ASCII only and are happy,
> and in that sense renaming directories in ASCII can be supported
> without major hassle.
>
> And the inode approach mentioned as well:
> This could go on top of strcasecmp() to cover non-ASCII filenames
> or other oddities, if someone implements it.
>
>
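
To illustrate the point quoted above about ASCII-only case folding,
here is a small Python sketch. It emulates the behaviour of a
C-locale strcasecmp() on UTF-8 bytes using `bytes.lower()`, which only
touches A-Z; that emulation is an assumption for illustration, not
Git's actual code. The helper names `ascii_case_equal` and `nfc_equal`
are made up. The last helper shows the separate normalization issue:
two code point sequences that both express "à".

```python
import unicodedata


def ascii_case_equal(a, b):
    """Compare two names, folding only ASCII letters (A-Z) to lower case."""
    # bytes.lower() converts only ASCII uppercase bytes; the bytes of
    # multi-byte UTF-8 sequences are left untouched.
    return a.encode("utf-8").lower() == b.encode("utf-8").lower()


def nfc_equal(a, b):
    """Compare two names after Unicode NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# ASCII-only case change: caught by the ASCII fold.
print(ascii_case_equal("bar.txt", "BAR.TXT"))            # True

# "bär.txt" vs "BÄR.TXT": ä and Ä differ in their UTF-8 bytes,
# so an ASCII-only fold sees two different names.
print(ascii_case_equal("b\u00e4r.txt", "B\u00c4R.TXT"))  # False

# Precomposed "à" (U+00E0) vs "a" + combining grave accent (U+0300).
print("\u00e0" == "a\u0300")                             # False
print(nfc_equal("\u00e0", "a\u0300"))                    # True
```

Which of these pairs a given file system treats as the same name is
exactly the file-system-dependent part, so neither helper can stand in
for asking the file system itself.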
