git@vger.kernel.org mailing list mirror (one of many)
* BUG report: unicode normalization on APFS (Mac OS High Sierra)
From: Elijah Newren @ 2018-04-26 16:48 UTC (permalink / raw)
  To: Git Mailing List

On HFS+ (which appears to be the default Mac filesystem prior to High
Sierra), Unicode filenames are "normalized" (stored in decomposed form)
before recording.  Thus, with a script like:

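    mkdir tmp
    cd tmp

    # The same character "ä" in two encodings:
    # precomposed U+00E4 (NFC) and "a" + combining diaeresis U+0308 (NFD).
    auml=$(printf "\303\244")
    aumlcdiar=$(printf "\141\314\210")
    # Create the file using the precomposed name only.
    >"$auml"

    # Hex-dump both names and whatever name the directory actually holds.
    # printf '%s' replaces the less portable "echo -n".
    echo "auml:          " $(printf '%s' "$auml" | xxd)
    echo "aumlcdiar:     " $(printf '%s' "$aumlcdiar" | xxd)
    echo "Dir contents:  " $(printf '%s' * | xxd)

    # Look the file up under each name (BSD/macOS stat syntax).
    echo "Stat auml:     " "$(stat -f "%i   %Sm   %Su %N" "$auml")"
    echo "Stat aumlcdiar:" "$(stat -f "%i   %Sm   %Su %N" "$aumlcdiar")"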

We see output like:

    auml:           00000000: c3a4 ..
    aumlcdiar:      00000000: 61cc 88 a..
    Dir contents:   00000000: 61cc 88 a..
    Stat auml:      857473   Apr 26 09:40:40 2018   newren ä
    Stat aumlcdiar: 857473   Apr 26 09:40:40 2018   newren ä

On APFS, which appears to be the new default filesystem in Mac OS High
Sierra, we instead see:

    auml:           00000000: c3a4 ..
    aumlcdiar:      00000000: 61cc 88 a..
    Dir contents:   00000000: c3a4 ..
    Stat auml:      8591766636   Apr 26 09:40:59 2018   newren ä
    Stat aumlcdiar: 8591766636   Apr 26 09:40:59 2018   newren ä

i.e. APFS appears to record the filename exactly as the user specified
it, but still lets the user access the file via any name that
normalizes to the same thing.  This difference causes the final two
tests in t0050-filesystem.sh to fail.  I could change the
"UTF8_NFD_TO_NFC" flag checking in test-lib.sh to test the exit code of
stat instead, which would make these two tests pass, but I have no idea
whether that would just be papering over problems elsewhere.
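
To make the idea concrete, here is a rough sketch of a lookup-based
check.  It is not the code in test-lib.sh and not a proposed patch: the
function name nfc_nfd_alias is made up for illustration, and it uses
"test -f" where the text above mentions the exit code of stat, since
the stat options differ between platforms.  It assumes it is handed a
fresh, empty directory; a leftover file under either name would skew
the answer.

```sh
#!/bin/sh
# Sketch only: does the filesystem holding "$1" treat the NFC and NFD
# spellings of "ä" as the same file?  (nfc_nfd_alias is a hypothetical name.)
nfc_nfd_alias () {
	dir=$1
	auml=$(printf '\303\244')           # U+00E4, precomposed (NFC)
	aumlcdiar=$(printf '\141\314\210')  # "a" + U+0308, decomposed (NFD)
	: >"$dir/$auml" || return 2         # create the file under the NFC name
	# Succeeds only if the NFD spelling resolves to an existing file,
	# regardless of which byte sequence the directory listing reports.
	test -f "$dir/$aumlcdiar"
}

# Usage: run the check in a scratch directory and report the result.
scratch=$(mktemp -d)
if nfc_nfd_alias "$scratch"
then
	echo "NFC and NFD names refer to the same file"
else
	echo "NFC and NFD names are distinct"
fi
rm -rf "$scratch"
```

Unlike comparing the directory listing against the decomposed name, a
lookup like this should give the same answer on both HFS+ and APFS
given the behaviour shown above, which is exactly why it might hide
differences that matter elsewhere.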

I dislike Mac OS and avoid it, so I'd prefer to find someone else
motivated to fix this.  If no one is, I may eventually try to fix this
up...in a year or three from now.  But is someone else interested?
Would this serve as a good microproject for our microprojects list (or
are the internals hairy enough that this is too big of a project for
that list)?


Elijah
