git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Kevin Ballard <kevin@sb.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Theodore Tso <tytso@MIT.EDU>, Mike Hommey <mh@glandium.org>,
	git@vger.kernel.org
Subject: Re: git on MacOSX and files with decomposed utf-8 file names
Date: Wed, 23 Jan 2008 12:19:02 -0500	[thread overview]
Message-ID: <98F90EB6-1930-4643-8C6C-CA11CB123BAA@sb.org> (raw)
In-Reply-To: <alpine.LFD.1.00.0801230808440.1741@woody.linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 4635 bytes --]

On Jan 23, 2008, at 11:16 AM, Linus Torvalds wrote:

> On Wed, 23 Jan 2008, Theodore Tso wrote:
>>
>> So this demonstrates that on my MacOS 10.4.11 system, on NFS, MacOS  
>> is
>> doing no normalization, as it is creating two files.  On HFS+, MacOS
>> is mapping both filenames to the same decomposed name.
>
> Well, it demonstrates that (a) the OS and (b) _perl_ don't mangle
> filenames on non-HFS+ filesystems.
>
> The problem is that since most native applications *expect* that name
> mangling, they'll probably do name mangling of their own  
> (internally) just
> to compare the names!

Well yes, any context in which a string is treated as Unicode instead  
of an opaque sequence of bytes will probably lead to normalization at  
some point (e.g. when searching text, I'm going to want Märchen and  
Märchen to be treated as the same string). The Mac OS X APIs use NFD,  
and everybody else uses NFC, but either way it's still normalization.

> So I would not be surprised if the globbing libraries, for example,  
> will
> do NFD-mangling in order to glob "correctly", so even programs  
> ported from
> real Unix might end up getting pathnames subtly changed into NFD as  
> part
> of some hot library-on-library action with UTF hackery inside.

Why would the globbing libraries have to do anything special to  
understand NFD? In fact, I prefer that they don't - it's very handy to  
be able to type Ma* and have that match Märchen, as the globbing  
library sees Ma??rchen and is happy to match the ??rchen against *.  
Were the filename in NFC, I couldn't do that. Similarly, Ma<tab>  
autocompletes the name Märchen for me. But the convenience is beside  
the point - what I'm trying to show here is that if the globbing  
library were NFD-aware, it probably would decide Ma* shouldn't match  
Märchen, right?

I assume globbing libraries et al don't do UTF-8 hackery in Linux,  
right? And yet using NFC-encoded filenames is fairly common? So why  
should it be any different on OS X, especially since HFS+ isn't the  
only option here (and thus doing NFD conversion in the library would  
mess up other filesystems)?

In fact, probably the biggest reason the NFD-encoding was done at the  
HFS+ level is because they simply couldn't trust user-level libraries  
to always do the NFD conversion for pathnames. And I quote:

"I would prefer that case sensitivity and unicode normalization were  
not the responsibility of the file system -- but I realize that we  
cannot just ignore the problem and let the other layers sort it all  
out."

> Things like the finder etc, which must be very aware of the fact that
> filenames get corrupted, would presumably internally always convert
> everything they get into NFD in order to compare names from different
> sources. And as part of that, programs may well corrupt the name  
> before
> they then use it to create a pathname.

I don't get why you're still calling it corruption when, on an HFS+  
system, NFD-encoding is correct. It would be corruption for HFS+ to  
write anything else but NFD.

> The fact that your perl program works under NFS, but creates NFD on  
> a VFAT
> volume, does imply that they probably used at least some of the same
> routines they use in HFS+ for VFAT. Not entirely surprising: doing  
> case
> insensitive stuff with Unicode is nasty code, so why not share it  
> (even if
> it's then incorrect for FAT)..
>
> Piece of crap it is, though. Apple has painted themselves into a nasty
> corner there.

There's no reason to assume that OS X is actually storing the NFD on  
the volume. In fact, it's quite explicitly not:

"As far as storing exactly what was passed in,  its not just HFS  
that's involved her.  In Mac OS X,  SMB, MSDOS, UDF, ISO 9660  
(Joliet), NTFS and ZFS file systems all store in one form -- NFC.  We  
store in NFC since that what is expected for these files systems.  If  
we were to allow KFD to pass through, it would cause problems when  
these names were accessed outside of Mac OS X.  So this is not just an  
HFS issue but an interchange issue for Mac OS X.  We have the legacy  
NFD use/expectation in our applications and we chose not to ignore the  
problem but make a conscience effort to have the appropriate form used  
(NFD in Mac OS X APIs, NFC elsewhere).  Its not perfect but neither is  
the agnostic approach where both forms can be used and you can have  
duplicate filenames in your file system."

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 2432 bytes --]

  parent reply	other threads:[~2008-01-23 17:19 UTC|newest]

Thread overview: 260+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-16 15:17 git on MacOSX and files with decomposed utf-8 file names Mark Junker
2008-01-16 15:34 ` Johannes Schindelin
2008-01-16 15:43   ` Kevin Ballard
2008-01-16 16:32     ` Johannes Schindelin
2008-01-16 16:46       ` Jakub Narebski
2008-01-16 20:39         ` Kevin Ballard
2008-01-16 21:51           ` Jakub Narebski
2008-01-16 22:06             ` Kevin Ballard
2008-01-16 22:23               ` Johannes Schindelin
2008-01-16 23:16                 ` Kevin Ballard
2008-01-16 22:32               ` Linus Torvalds
2008-01-16 22:52                 ` Linus Torvalds
2008-01-16 23:11                 ` Kevin Ballard
2008-01-16 23:38                   ` Linus Torvalds
2008-01-16 23:57                     ` Pedro Melo
2008-01-17  0:16                       ` Linus Torvalds
2008-01-17  0:27                         ` Pedro Melo
2008-01-17  0:32                           ` David Kastrup
2008-01-17  0:40                             ` Pedro Melo
2008-01-17  0:54                               ` Wincent Colaiuta
2008-01-17  1:08                                 ` Johannes Schindelin
2008-01-17  1:41                                   ` Linus Torvalds
2008-01-17  4:07                                     ` Kevin Ballard
2008-01-17  0:35                           ` Johannes Schindelin
2008-01-17  0:45                             ` Pedro Melo
2008-01-18  8:29                         ` Peter Karlsson
2008-01-18 11:16                           ` Jakub Narebski
2008-01-16 23:58                     ` David Kastrup
2008-01-17  0:19                       ` Linus Torvalds
2008-01-17  0:09                     ` Kevin Ballard
2008-01-17  0:25                       ` Linus Torvalds
2008-01-17  0:33                         ` Johannes Schindelin
2008-01-17  0:43                           ` Pedro Melo
2008-01-17  0:57                             ` Johannes Schindelin
2008-01-17  1:06                           ` Linus Torvalds
2008-01-17  1:16                       ` Linus Torvalds
2008-01-17  3:52                         ` Kevin Ballard
2008-01-17  4:08                           ` Linus Torvalds
2008-01-17  4:30                             ` Kevin Ballard
2008-01-17  4:51                               ` Martin Langhoff
2008-01-17  5:23                                 ` Kevin Ballard
2008-01-17  6:13                                   ` Geert Bosch
2008-01-17  7:11                                     ` Mitch Tishmack
2008-01-17 10:22                                       ` Wincent Colaiuta
2008-01-17 13:44                                         ` Kevin Ballard
2008-01-17 15:57                                           ` Johannes Schindelin
2008-01-17 16:53                                             ` Kevin Ballard
2008-01-18  0:44                                               ` Robin Rosenberg
2008-01-17 14:02                                     ` Andrew Heybey
2008-01-17 15:04                                       ` Kevin Ballard
2008-01-19 19:29                                         ` Kyle Moffett
2008-01-19 19:57                                           ` Kevin Ballard
2008-01-17 10:08                             ` Wincent Colaiuta
2008-01-17 16:43                               ` Linus Torvalds
2008-01-17 18:09                                 ` Mark Junker
2008-01-17 18:12                                   ` Pedro Melo
2008-01-17 18:18                                     ` Johannes Schindelin
2008-01-17 18:36                                       ` Mark Junker
2008-01-17 18:38                                       ` Pedro Melo
2008-01-17 18:44                                     ` Linus Torvalds
2008-01-17 19:02                                       ` Pedro Melo
2008-01-17 18:42                                   ` Linus Torvalds
2008-01-17 18:50                                     ` Mark Junker
2008-01-17 18:52                                     ` Pedro Melo
     [not found]                                       ` <alpine.LFD.1.00.0801 171100330.14959@woody.linux-foundation.org>
2008-01-17 19:01                                       ` Theodore Tso
2008-01-17 19:11                                       ` Linus Torvalds
2008-01-18  0:18                                         ` Kevin Ballard
2008-01-18  0:35                                           ` Linus Torvalds
2008-01-18  1:05                                         ` Robin Rosenberg
2008-01-18  1:24                                           ` Linus Torvalds
2008-01-18  4:08                                             ` Brian Dessent
2008-01-18  8:49                                             ` Dmitry Potapov
2008-01-18  9:42                                             ` Robin Rosenberg
2008-01-18 10:30                                               ` Dmitry Potapov
2008-01-18 15:37                                                 ` Peter Karlsson
2008-01-18 17:24                                                   ` Jakub Narebski
2008-01-18 10:19                                         ` Peter Karlsson
2008-01-18 10:50                                           ` Dmitry Potapov
2008-01-18 15:30                                             ` Peter Karlsson
2008-01-18 17:11                                           ` Linus Torvalds
2008-01-18 20:24                                             ` Kevin Ballard
2008-01-19  8:48                                               ` Dmitry Potapov
2008-01-19 14:55                                                 ` Kevin Ballard
2008-01-19 21:17                                                   ` Dmitry Potapov
2008-01-19 18:58                                                 ` Linus Torvalds
2008-01-19 20:39                                                   ` Mark Junker
2008-01-19 22:58                                                   ` Johannes Schindelin
2008-01-20  6:14                                                     ` Dmitry Potapov
2008-01-20  6:53                                                       ` Linus Torvalds
2008-01-20 13:15                                                       ` Johannes Schindelin
2008-01-20  0:11                                                   ` Wincent Colaiuta
2008-01-20  1:04                                                     ` Linus Torvalds
2008-01-20  5:27                                                       ` Mike Hommey
2008-01-20  5:45                                                         ` Linus Torvalds
2008-01-20  7:00                                                           ` Mike Hommey
2008-01-20  7:26                                                             ` Linus Torvalds
2008-01-20  8:00                                                             ` Dmitry Potapov
2008-01-20  8:12                                                               ` Dmitry Potapov
2008-01-20  9:34                                                       ` Wincent Colaiuta
2008-01-18 20:28                                             ` Junio C Hamano
2008-01-18 20:50                                               ` Johannes Schindelin
2008-01-23  2:46                                               ` Eric W. Biederman
2008-01-23  2:57                                                 ` Junio C Hamano
2008-01-23 14:26                                                   ` Nicolas Pitre
2008-01-23 21:19                                                     ` Junio C Hamano
2008-01-21 14:14                                             ` Peter Karlsson
2008-01-21 16:43                                               ` Kevin Ballard
2008-01-21 16:48                                                 ` David Kastrup
2008-01-21 16:59                                                   ` Kevin Ballard
2008-01-21 20:43                                                     ` Dmitry Potapov
2008-01-21 20:53                                                       ` Kevin Ballard
2008-01-21 21:05                                                         ` David Kastrup
2008-01-21 23:01                                                         ` Dmitry Potapov
2008-01-21 16:53                                                 ` Jeff King
2008-01-21 17:08                                                 ` Nicolas Pitre
2008-01-21 17:25                                                   ` Kevin Ballard
2008-01-21 20:35                                                     ` David Kastrup
2008-01-21 20:32                                                   ` David Kastrup
2008-01-21 18:12                                                 ` Linus Torvalds
2008-01-21 19:05                                                   ` Kevin Ballard
2008-01-21 19:41                                                     ` Linus Torvalds
2008-01-21 19:58                                                       ` Kevin Ballard
2008-01-21 20:33                                                         ` Linus Torvalds
2008-01-21 20:53                                                           ` Kevin Ballard
     [not found]                                                             ` <alpine.LFD.1.0! 0.0801211323120.2957@woody.linux-foundation.org>
2008-01-21 20:58                                                             ` David Kastrup
2008-01-21 21:17                                                             ` Martin Langhoff
2008-01-21 21:28                                                               ` Kevin Ballard
2008-01-21 21:43                                                                 ` Martin Langhoff
2008-01-21 21:33                                                             ` Linus Torvalds
2008-01-21 21:49                                                               ` Kevin Ballard
2008-01-21 22:34                                                                 ` Linus Torvalds
2008-01-21 22:46                                                                   ` Kevin Ballard
2008-01-21 22:56                                                                     ` Martin Langhoff
     [not found]                                                                       ` <53C76BEA-2232-4940-8776-9DF1880089A4@sb.org>
2008-01-21 23:05                                                                         ` Kevin Ballard
2008-01-21 23:16                                                                         ` Martin Langhoff
2008-01-22  0:30                                                                           ` Kevin Ballard
2008-01-21 23:00                                                                     ` Theodore Tso
2008-01-21 23:09                                                                       ` Kevin Ballard
2008-01-21 23:44                                                                     ` Linus Torvalds
2008-01-22  0:47                                                                       ` Kevin Ballard
2008-01-22  1:01                                                                         ` Linus Torvalds
2008-01-22  1:13                                                                           ` Linus Torvalds
2008-01-22  2:33                                                                             ` Kevin Ballard
2008-01-22  2:50                                                                               ` Linus Torvalds
2008-01-22  3:04                                                                                 ` Kevin Ballard
2008-01-22  3:17                                                                                   ` Linus Torvalds
2008-01-22  3:21                                                                                   ` Martin Langhoff
2008-01-22  4:22                                                                                     ` Kevin Ballard
     [not found]                                                                                   ` <20080122133427.GB17804@mit.edu>
2008-01-23  0:08                                                                                     ` Theodore Tso
2008-01-23  0:38                                                                                       ` Kevin Ballard
2008-01-23  1:47                                                                                         ` Martin Langhoff
2008-01-23  2:06                                                                                         ` Theodore Tso
2008-01-23  8:45                                                                                         ` David Kastrup
2008-01-23  0:38                                                                                       ` Linus Torvalds
2008-01-23  1:14                                                                                         ` Martin Langhoff
2008-01-23  1:16                                                                                         ` Kevin Ballard
2008-01-23  1:27                                                                                           ` Martin Langhoff
2008-01-23  1:33                                                                                         ` Theodore Tso
2008-01-23  1:56                                                                                           ` Linus Torvalds
2008-01-23  2:02                                                                                             ` Kevin Ballard
2008-01-23  6:41                                                                                           ` Mike Hommey
2008-01-23  8:15                                                                                             ` Kevin Ballard
2008-01-23  8:43                                                                                               ` Dmitry Potapov
2008-01-23  9:02                                                                                                 ` Jonathan del Strother
2008-01-23  9:12                                                                                                   ` Dmitry Potapov
2008-01-23  9:19                                                                                                     ` Mike Hommey
2008-01-23  9:32                                                                                                       ` Dmitry Potapov
2008-01-23  9:40                                                                                               ` Mike Hommey
2008-01-23 13:38                                                                                                 ` Theodore Tso
2008-01-23 16:16                                                                                                   ` Linus Torvalds
2008-01-23 17:12                                                                                                     ` Theodore Tso
2008-01-23 17:19                                                                                                     ` Kevin Ballard [this message]
2008-01-23 17:32                                                                                                       ` Linus Torvalds
2008-01-24 21:02                                                                                                         ` On pathnames Junio C Hamano
2008-01-24 22:31                                                                                                           ` Nicolas Pitre
2008-01-25  3:55                                                                                                             ` Martin Langhoff
2008-01-25  4:18                                                                                                               ` Junio C Hamano
2008-01-25  4:12                                                                                                             ` Junio C Hamano
2008-01-25  8:08                                                                                                               ` Pedro Melo
2008-01-25 12:25                                                                                                               ` Johannes Schindelin
2008-01-25 12:50                                                                                                                 ` David Kastrup
2008-01-25 12:53                                                                                                                 ` Wincent Colaiuta
2008-01-24 23:56                                                                                                           ` Sean
2008-01-25  0:36                                                                                                           ` Johannes Schindelin
2008-01-25  4:00                                                                                                           ` Daniel Barkalow
2008-01-25  4:21                                                                                                             ` Junio C Hamano
2008-01-25 11:36                                                                                                               ` Johannes Schindelin
2008-01-25 16:25                                                                                                                 ` Daniel Barkalow
2008-01-25 17:34                                                                                                                   ` Johannes Schindelin
2008-01-25  5:59                                                                                                             ` Jeff King
2008-01-23 20:18                                                                                                       ` git on MacOSX and files with decomposed utf-8 file names Jay Soffian
     [not found]                                                                                                         ` <1DC841ED-634F-412C-9560-F37E4172A4CD@sb.org>
     [not found]                                                                                                           ` <76718490801231421l7b6552f8sec13f570360198b@mail.gmail.com>
     [not found]                                                                                                             ` <4F906435-A186-4E98-8865-F185D75F14D4@sb.org>
     [not found]                                                                                                               ` <76718490801231517h6d57e5bfkc19d394d38ad19db@mail.gmail.com>
2008-01-24  2:05                                                                                                                 ` Kevin Ballard
2008-01-24  3:11                                                                                                                   ` Junio C Hamano
2008-01-24  4:37                                                                                                                     ` Martin Langhoff
2008-01-24  5:30                                                                                                                       ` Kevin Ballard
2008-01-24  6:39                                                                                                                         ` Steffen Prohaska
2008-01-24 18:17                                                                                                                           ` Mitch Tishmack
2008-01-24 18:52                                                                                                                           ` Mitch Tishmack
2008-01-24 19:58                                                                                                                             ` Kevin Ballard
2008-01-23 23:37                                                                                                       ` Martin Langhoff
2008-01-23 16:58                                                                                                 ` Kevin Ballard
2008-01-23 17:39                                                                                                   ` Dmitry Potapov
2008-01-23 17:47                                                                                                     ` Kevin Ballard
2008-01-21 19:57                                                     ` Theodore Tso
2008-01-21 20:01                                                       ` Kevin Ballard
2008-01-21 20:15                                                         ` Theodore Tso
2008-01-21 20:31                                                           ` Kevin Ballard
2008-01-21 20:46                                                             ` Theodore Tso
2008-01-21 20:59                                                               ` Kevin Ballard
     [not found]                                                               ` <6E303071-82A4-4D69-AA0C-EC41168B9AFE@sb.org>
2008-01-21 21:18                                                                 ` Theodore Tso
2008-01-21 21:43                                                                   ` Kevin Ballard
2008-01-21 21:49                                                                     ` Martin Langhoff
2008-01-21 21:57                                                                       ` Kevin Ballard
2008-01-22  0:36                                                                         ` Johannes Schindelin
2008-01-22  0:42                                                                           ` Kevin Ballard
2008-01-22  0:48                                                                             ` David Kastrup
2008-01-22  1:06                                                                             ` Martin Langhoff
2008-01-22  1:34                                                                             ` Johannes Schindelin
2008-01-22  1:53                                                                               ` Martin Langhoff
2008-01-22  2:03                                                                                 ` Johannes Schindelin
2008-01-21 22:38                                                                     ` David Kastrup
2008-01-22  2:34                                                                       ` Kevin Ballard
2008-01-22  7:51                                                                         ` David Kastrup
2008-01-21 20:56                                                     ` Dmitry Potapov
2008-01-21 21:07                                                       ` Kevin Ballard
2008-01-21 22:41                                                         ` Dmitry Potapov
2008-01-21 22:53                                                           ` Kevin Ballard
2008-01-21 23:21                                                             ` Dmitry Potapov
2008-01-21 19:44                                                   ` Mike Hommey
2008-01-21 20:36                                                   ` Dmitry Potapov
2008-01-21 21:06                                                   ` Martin Langhoff
2008-01-21 21:09                                                     ` David Kastrup
2008-01-21 21:42                                                     ` Linus Torvalds
2008-01-21 22:45                                                       ` Martin Langhoff
2008-01-21 20:30                                                 ` Dmitry Potapov
2008-01-21 18:16                                               ` Linus Torvalds
2008-01-17 21:27                                   ` Dmitry Potapov
2008-01-17 22:01                                 ` JM Ibanez
2008-01-17 22:09                                   ` Johannes Schindelin
2008-01-18  1:27                                     ` Robin Rosenberg
2008-01-17 23:05                                   ` Linus Torvalds
2008-01-17 23:10                                   ` Dmitry Potapov
2008-01-16 23:52           ` Dmitry Potapov
2008-01-16 22:37       ` Eyvind Bernhardsen
2008-01-16 23:03     ` Wincent Colaiuta
2008-01-17  7:29     ` Miles Bader
2008-01-17  4:43 ` Jay Soffian
2008-01-17  4:59   ` Jay Soffian
2008-01-17  5:15     ` Junio C Hamano
2008-01-17 10:28       ` Wincent Colaiuta
2008-01-17 11:10         ` Johannes Schindelin
2008-01-17 11:23           ` Pedro Melo
2008-01-17 11:51             ` Wincent Colaiuta
2008-01-17 12:53               ` Johannes Schindelin
2008-01-17 13:40                 ` Wincent Colaiuta
2008-01-17 17:58               ` Junio C Hamano
2008-01-17 18:22                 ` Johan Herland
2008-01-17 13:05             ` Johannes Schindelin
2008-01-17 11:46           ` Wincent Colaiuta
2008-01-17  5:11   ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=98F90EB6-1930-4643-8C6C-CA11CB123BAA@sb.org \
    --to=kevin@sb.org \
    --cc=git@vger.kernel.org \
    --cc=mh@glandium.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@MIT.EDU \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).