From: "Torsten Bögershausen" <tboegi@web.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: kusmabite@gmail.com, git@vger.kernel.org
Subject: Re: [PATCH v2] Allow git mv FileA fILEa on case ignore file systems
Date: Sun, 10 Apr 2011 07:48:21 +0200
Message-ID: <4DA144A5.2080103@web.de>
In-Reply-To: <7vsjuitk59.fsf@alter.siamese.dyndns.org>

On 20.03.11 06:50, Junio C Hamano wrote:
========================
>Torsten Bögershausen <tboegi@web.de> writes:
>
>> > The typical use case is when a file "FileA" should be renamed into fILEa
>> > and we are on a case insenstive file system (system core.ignorecase = true).
>> > Source and destination are the same file, it can be accessed under both names.
>> > This makes git think that the destination file exists.
>> > Unless used with --forced, git will refuse the "git mv FileA fILEa".
>> > This change will allow "git mv FileA fILEa" under the following condition:
>> > On Linux/Unix/Mac OS X the move is allowed when the inode of the source and
>> > destination are equal (and they are on the same device).
>> > This allows renames of MÄRCHEN into Märchen on Mac OS X.
>> > (As a side effect, a file can be renamed to a name which is already
>> > hard-linked to the same inode).
>> > On Windows, the function win_is_same_file() from compat/win32/same-file.c
>> > is used.
>> > It calls GetFileInformationByHandle() to check if both files are
>> > "the same".
>Yeek; is it just me or the above single block of text too dense to be
>readable? Can you use paragraph breaks?
Yes
>
>> > The typical use case is when a file "FileA" should be renamed into fILEa
>> > and we are on a case insenstive file system (system core.ignorecase = true).
>Huh? I don't think renaming "FileA" to "fILEa" is typical at all. It is
>very rarely done.
That was probably not a good example; I changed it to "mv FILE File".

>
>> > (As a side effect, a file can be renamed to a name which is already
>> > hard-linked to the same inode).
>It is unclear "a side effect" means "an added bonus" or "a regression" in
>this sentence. I think this is latter.
If that is too much of a regression, we can check the link count in struct stat:
"if (1 == st_nlink)".

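For illustration only, the link count check could be sketched in shell roughly
as follows. This is not the code from the patch: the helper name
names_alias_one_file is made up here, and "-ef" is an extension to test(1)
that bash and dash provide. The second field of "ls -ld" output is the link
count.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when "$1" and "$2" name the same file
# AND that file has exactly one link, so that ordinary hard links are not
# mistaken for two spellings of one name.
names_alias_one_file() {
  # -ef is true when both names exist and share device and inode.
  [ "$1" -ef "$2" ] || return 1
  # Field 2 of the long listing is the number of links.
  nlink=$(ls -ld -- "$1" | awk '{print $2}')
  [ "$nlink" -eq 1 ]
}
```

On a case insensitive file system, "names_alias_one_file FILE File" would be
expected to succeed for a file with a single link, while two hard-linked names
are rejected because the link count is 2.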

>
>Allowing filesystem specific logic to detect that two different "names"
>actually refer to a single "file" and help renaming succeed is a sane
>approach, but I think this particular implementation is flawed.
>
>The important thing to notice is that "names" and "file" above refer to
>the entities from the end user's point of view. Two files hardlinked
>together on a filesystem with sane pathname handling are never the same
>"file". I would probably have called it equivalent_filenames() to stress
>the fact that two _different_ names alias to the same file. is_same_file()
>sounds more like you got two different filenames from the filesystem
>(i.e. readdir() gave you both at the same time) and you are trying to see
>if they are the same file, but that is not the case here.
OK, I need to distinguish between file names and files.


>I also find it a bad taste to make this feature depend on win32; doesn't a
>Linux box mounting a vfat filesystem have the same issue that we should be
>able to solve with the same code?  Can't we instead have a configuration
>variable that tells git that the working tree is on a filesystem with
>broken pathname semantics, and what kind of workaround is needed?  Isn't
>core.ignorecase already that configuration variable for case insensitve
>filesystems [*1*]?
Agreed about the bad taste ;-)
The suggested patch works on Linux/Unix/Mac OS X and Windows,
but the description wasn't too clear about it.
The same code is used on all operating systems, except that Windows needs some
specific code, which is located in cygwin.c and mingw.c.

>
>I would imagine that the implementation of equivalent_filenames(a,b) may
>be !strcmp(a,b) for a sane filesystem [*2*] and !strcasecmp(a,b) for a
>case insensitive filesystem.  For a totally wacky filesystem, your
>lstat(2) based one might end up to be the best we could do [*3*].
>
>When two different _names_ "A" and "a" refer to a single file, the only
>thing that should happen for "git mv A a" is for the cache entry for "A"
>to be moved to cache entry for "a", and no rename("A", "a") should be run,
>but I don't see any such change in the code. It may happen to work (or be
>a no-op) on Windows, but because builtin/mv.c is supposed to be generic
>(and that is the reason you introduced the is_same_file() abstraction in
>the first place), I'd still see this as a breakage.
>
Why shouldn't the rename() be done?
"git mv A B" changes both the index and the file system.
Isn't it natural to have the file name "a" both in the index and in the
file system after "git mv A a"?
Note: Windows and Mac OS X allow "mv A a" from the command line,
while Linux on VFAT gives the error "'A' and 'a' are the same file".
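
As a side note, the usual way around an mv that refuses a case-only rename is
to go through a temporary name in two steps. The following is a minimal
sketch under the assumption that the temporary name is free; rename_via_temp
is a hypothetical helper, it only touches the working tree (not the index),
and it is not what the patch does.

```shell
#!/bin/sh
# Hypothetical helper: rename "$1" to "$2" via a temporary name, so that mv
# never sees source and destination as the same existing file.
rename_via_temp() {
  tmp="$1.tmp.$$"
  # Refuse to clobber an unrelated file that happens to use the temp name.
  if [ -e "$tmp" ]; then
    echo >&2 "temporary name $tmp already exists"
    return 1
  fi
  mv -- "$1" "$tmp" || return 1
  if ! mv -- "$tmp" "$2"; then
    # Second step failed: try to restore the original name.
    mv -- "$tmp" "$1"
    return 1
  fi
}
```

Usage would be "rename_via_temp A a". The two steps are not atomic, which is
one reason a single rename() is preferable where the file system accepts it.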

>
>[Footnote]
>
>*1* Off the top of my head, perhaps core.ignorecase may have to grow into
>boolean plus extra to cover "this is not just case insensitive, but isn't
>even case preserving" kind of broken filesystems like HFS+, but I didn't
>think things through.
It is not that bad.
My HFS+ here is both case insensitive and case preserving,
so core.ignorecase can be used as it is.

Note: HFS+ can be formatted to be case sensitive (at least on Mac OS X 10.6),
but the default is case insensitive.

>
>*2* Incidentally, wouldn't "!strcmp(a,b)" solution suddenly start allowing
>"git mv Makefile Makefile" that we currently disallow? Is it a regression
>(less safety against an unexpected input) or a good change?

When running "git mv git.c git.c", the current git says:
"fatal: can not move directory into itself, source=git.c, destination=git.c"
That test is done earlier in the code path,
so this move is still not allowed.
The error message is somewhat confusing, as git.c is a file.
Is it worth sending a separate patch for that?

>
>*3* If we can find a solution that does not involve any calls to the
>filesystem, it would be ideal, as we can reuse it later in codepaths where
>neither file "a" or file "b" exists on the filesystem yet (think: "we are
>about to create 'a' and 'b'---is that sensible, or will they overwrite
>with each other on this filesystem?").
>
That would mean that git needs to know the encoding of the local filesystem,
and we don't have that at the moment.
What do you have in mind?
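
To make the encoding problem concrete: a comparison that never touches the
file system is easy as long as only ASCII letters are folded. The sketch below
is an illustration, not a proposal; equivalent_filenames_ascii is a
hypothetical name borrowed from the suggestion above, and the folding is done
byte-wise in the C locale.

```shell
#!/bin/sh
# Hypothetical helper: compare two names after folding ASCII A-Z to a-z.
# No file system calls are made, so neither name needs to exist.
equivalent_filenames_ascii() {
  a=$(printf '%s' "$1" | LC_ALL=C tr 'A-Z' 'a-z')
  b=$(printf '%s' "$2" | LC_ALL=C tr 'A-Z' 'a-z')
  [ "$a" = "$b" ]
}

# ASCII names that differ only in case compare equal.
equivalent_filenames_ascii FileA fILEa && echo "FileA ~ fILEa"

# The UTF-8 byte sequences for ae and AE differ in a non-ASCII byte, so a
# byte-wise fold cannot tell that they differ only in case.
ae=$(printf '\303\246')
AE=$(printf '\303\206')
equivalent_filenames_ascii "$ae" "$AE" || echo "ae and AE are not folded"
```

Handling the second case would need knowledge of the encoding and of the case
folding rules the file system applies, which is exactly the missing piece.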


I ran a test script (see the end of this mail; save it as test-mv.sh and
run it with /bin/sh test-mv.sh).
I tested Linux, Mac OS X, Cygwin 1.5, Cygwin 1.7 and the current msys(git).
Short summary: file systems differ in which filenames they treat as equivalent.

I will send out a new patch. Thank you for reading this long mail.

/Torsten


=======================
Linux 2.6.34.7-0.7-default
/dev/sda6 on /D type vfat (rw,noexec,nosuid,nodev,gid=100,umask=0002,utf8=true)
enc=UTF-8
a=A    Case ignored
mv: `a' and `A' are the same file

æ=Æ    Case ignored
mv: `æ' and `Æ' are the same file

ø=ø    Case sensitive
mv:  ø  -->  Ø  OK

ä=Ä    Case ignored
mv: `ä' and `Ä' are the same file
==================================================================
Linux 2.6.34.7-0.7-default
/dev/sda6 on /D type vfat (rw)
enc=UTF-8
a=A    Case ignored
mv: `a' and `A' are the same file

æ=æ    Case sensitive
mv:  æ  -->  Æ  OK

ø=ø    Case sensitive
mv:  ø  -->  Ø  OK

ä=ä    Case sensitive
mv:  ä  -->  Ä  OK
==================================================================
Linux 2.6.34.7-0.7-default
//birne/tb on /birne/tb type cifs (rw)
enc=UTF-8
a=A    Case ignored
mv:  a  -->  A  OK

æ=Æ    Case ignored
mv:  æ  -->  Æ  OK

ø=Ø    Case ignored
mv:  ø  -->  Ø  OK

ä=Ä    Case ignored
mv:  ä  -->  Ä  OK
==================================================================
Darwin 10.7.0
/dev/disk0s2 on / (hfs, local, journaled)
enc=UTF-8
a=A    Case ignored
mv:  a  -->  A  OK

æ=Æ    Case ignored
mv:  æ  -->  Æ  OK

ø=Ø    Case ignored
mv:  ø  -->  Ø  OK

ä=Ä    Case ignored
mv:  ä  -->  Ä  OK
==================================================================
Darwin 10.7.0
/dev/disk1s3 on /Volumes/LACIEFAT (msdos, local, nodev, nosuid, noowners)
enc=UTF-8
a=A    Case ignored
mv:  a  -->  A  OK

æ=Æ    Case ignored
mv:  æ  -->  Æ  OK

ø=Ø    Case ignored
mv:  ø  -->  Ø  OK

ä=Ä    Case ignored
mv:  ä  -->  Ä  OK
==================================================================
Darwin 10.7.0
/dev/disk1s2 on /Volumes/LacieMacOS (hfs, local, nodev, nosuid, journaled)
enc=UTF-8
a=a    Case sensitive
mv:  a  -->  A  OK

æ=æ    Case sensitive
mv:  æ  -->  Æ  OK

ø=ø    Case sensitive
mv:  ø  -->  Ø  OK

ä=ä    Case sensitive
mv:  ä  -->  Ä  OK
==================================================================
CYGWIN_NT-5.1 1.5.25(0.156/4/2)
C:\cygwin on / type system (binmode)
enc=ISO-8859-1
a=A    Case ignored
mv:  a  -->  A  OK

æ=Æ    Case ignored
mv: `æ' and `Æ' are the same file

ø=Ø    Case ignored
mv: `ø' and `Ø' are the same file

ä=Ä    Case ignored
mv: `ä' and `Ä' are the same file
==================================================================
CYGWIN_NT-5.1 1.7.8(0.236/5/3)
C:/cygwin on / type ntfs (binary,auto)
enc=UTF-8
a=A    Case ignored
mv:  a  -->  A  OK

æ=Æ    Case ignored
mv: `æ' and `Æ' are the same file

ø=Ø    Case ignored
mv: `ø' and `Ø' are the same file

ä=Ä    Case ignored
mv: `ä' and `Ä' are the same file
==================================================================
MINGW32_NT-5.1 1.0.12(0.46/3/2)
C:\msysgit\msysgit on / type user (binmode,noumount)
enc=ISO-8859-1
a=A    Case ignored
mv:  a  -->  A  OK

æ=Æ    Case ignored
mv:  æ  -->  Æ  OK

ø=Ø    Case ignored
mv:  ø  -->  Ø  OK

ä=Ä    Case ignored
mv:  ä  -->  Ä  OK





==========================================================
#!/bin/sh
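testmv() {
  # Write each name into a file of that name. If the file system cannot
  # represent the names, the assumed encoding is wrong.
  (echo "$1" > "$1" && echo "$2" > "$2") || {
    echo >&2 "Wrong encoding $enc"
    cd ..
    rm -rf $$.trash
    exit 1
  }
  # If reading back $1 yields $2, both names refer to the same file.
  a=$(cat "$1")
  printf '%s=%s' "$1" "$a"
  if test "$a" != "$1"; then
    echo "    Case ignored"
  else
    echo "    Case sensitive"
  fi
  rm -f "$1" "$2"

  # Check whether mv accepts a rename that changes only the case.
  echo "$1" > "$1"
  if mv "$1" "$2"; then
    echo "mv:  $1  -->  $2  OK"
  fi
  rm -f "$1" "$2"
  echo
  return 0
}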

rm -rf $$.trash
mkdir $$.trash || {
  echo >&2 "mkdir $$.trash failed"
  exit 1
}

cd $$.trash || {
  echo >&2 "cd $$.trash failed"
  exit 1
}
uname -sr
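# Find the mount point of the current directory: strip path components
# until one of them shows up in the mount table.
rdir=$PWD
while test "$rdir" != ""
do
  if mount | grep -F -- "$rdir"; then
    break
  fi
  rdir=${rdir%/*}
done
# Nothing matched, so the directory lives on the root file system.
if test -z "$rdir"; then
  mount | grep " / "
fi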

enc=ISO-8859-1
#Find out if utf-8 is used
case "$LANG" in
  *[uU][tT][fF]*8)
  enc=UTF-8
  ;;
esac
case "$LC_CTYPE" in
  *[uU][tT][fF]*8)
  enc=UTF-8
  ;;
esac

case $(uname) in
  Darwin)
  enc=UTF-8
  ;;
esac

echo enc=$enc

case "$enc" in
  UTF-8)
  ae=$(printf '\303\246')
  AE=$(printf '\303\206')
  oe=$(printf '\303\270')
  OE=$(printf '\303\230')
  auml=$(printf '\303\244')
  Auml=$(printf '\303\204')
  ;;
  *)
  ae=$(printf '\346')
  AE=$(printf '\306')
  oe=$(printf '\370')
  OE=$(printf '\330')
  auml=$(printf '\344')
  Auml=$(printf '\304')
  ;;
esac

testmv a A
testmv "$ae" "$AE"
testmv "$oe" "$OE"
testmv "$auml" "$Auml"

cd ..
rm -rf $$.trash

Thread overview: 8+ messages
2011-03-19 14:28 [PATCH v2] Allow git mv FileA fILEa on case ignore file systems Torsten Bögershausen
2011-03-19 18:20 ` Erik Faye-Lund
2011-03-19 19:30   ` Piotr Krukowiecki
2011-03-20  5:50 ` Junio C Hamano
2011-04-10  5:48   ` Torsten Bögershausen [this message]
2011-04-11 16:55     ` Junio C Hamano
2011-04-11 20:05       ` Torsten Bögershausen
2011-04-12  6:16       ` Johannes Sixt
