git@vger.kernel.org mailing list mirror (one of many)
* Git, Mac OS X and German special characters
@ 2010-05-20  7:26 Matthias Moeller
  2010-05-20  8:34 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 22+ messages in thread
From: Matthias Moeller @ 2010-05-20  7:26 UTC (permalink / raw
  To: git

Hi,

I have been using git (version 1.7.1) for quite some time to
back up and synchronize folders between my two Linux workstations (at home
and at work) and my Apple laptop. Synchronization works well except for some
strange behavior under Mac OS X (10.6).

I have committed some files with German special characters, say,
"Übersicht.xls", under Linux and pulled them on the MacBook. However, on
the laptop git status says:

# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       "U\314\210bersicht.xls"
nothing added to commit but untracked files present (use "git add" to track)

I have been searching the web for help and found lengthy discussions
which state that this is a common problem of the HFS+ filesystem.
What I did not find was a solution to this problem. Is there a solution
to this problem?

Thanks,
Matthias

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20  7:26 Git, Mac OS X and German special characters Matthias Moeller
@ 2010-05-20  8:34 ` Ævar Arnfjörð Bjarmason
  2010-05-20  8:50   ` Michael J Gruber
  2010-05-20  8:55   ` demerphq
  0 siblings, 2 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-05-20  8:34 UTC (permalink / raw
  To: matthias.moeller; +Cc: git

On Thu, May 20, 2010 at 07:26, Matthias Moeller
<matthias.moeller@math.tu-dortmund.de> wrote:
> I have been searching the web for help and found lengthy discussions
> which state that this is a common problem of the HFS+ filesystem.
> What I did not find was a solution to this problem. Is there a solution
> to this problem?

Is this problem particular to Git, or do you also get it if you
e.g. rsync from the Linux box to the Mac OS X box?

> #       "U\314\210bersicht.xls"

You probably have to configure your shell on OSX to render UTF-8
correctly. It's just showing the raw escaped byte sequence instead of
a character there.
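For reference, the escapes in that output come from git's path quoting. With core.quotepath enabled (the default), bytes outside printable ASCII are shown as three-digit octal escapes, and the path is wrapped in double quotes. The sketch below is a simplified, hypothetical re-implementation: `quote_path_octal` is not a git function, and git's real quoting code also uses C escapes such as \t. It only shows how "U\314\210" arises from the bytes 55 CC 88.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bytes that force the path to be quoted (simplified assumption). */
static int needs_quoting(unsigned char c)
{
    return c < 0x20 || c == '"' || c == '\\' || c >= 0x80;
}

/*
 * Hypothetical helper, not git's quote.c: write `in` to `out` the way
 * `git status` displays it with core.quotepath enabled. Paths without
 * special bytes are copied unchanged. Otherwise the path is wrapped in
 * double quotes, and every non-ASCII or control byte becomes a 3-digit
 * octal escape. Returns 0 on success, -1 if `out` is too small.
 */
static int quote_path_octal(const char *in, char *out, size_t outsz)
{
    const unsigned char *p;
    size_t len, n = 0;
    int quote = 0;

    assert(in != NULL && out != NULL);
    len = strlen(in);
    for (p = (const unsigned char *)in; *p; p++)
        if (needs_quoting(*p))
            quote = 1;

    if (!quote) {
        if (len + 1 > outsz)
            return -1;
        memcpy(out, in, len + 1);
        return 0;
    }

    /* Worst case: two quotes, 4 bytes per input byte, and the NUL. */
    if (2 + 4 * len + 1 > outsz)
        return -1;

    out[n++] = '"';
    for (p = (const unsigned char *)in; *p; p++) {
        if (*p == '"' || *p == '\\') {
            out[n++] = '\\';
            out[n++] = (char)*p;
        } else if (needs_quoting(*p)) {
            /* e.g. byte 0xCC becomes the four characters \314 */
            n += (size_t)sprintf(out + n, "\\%03o", (unsigned int)*p);
        } else {
            out[n++] = (char)*p;
        }
    }
    out[n++] = '"';
    out[n] = '\0';
    return 0;
}
```

Given the decomposed name, this produces `"U\314\210bersicht.xls"`, which matches the text git status printed. Running `git config core.quotepath false` makes git print the raw bytes instead, and a UTF-8 terminal renders those as "Übersicht.xls". That setting changes only the display. It does not affect whether git considers the file untracked.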

There isn't anything wrong with OSX in this case, filename encoding on
any POSIX system is only done by convention. You'll find that you have
similar problems on Linux if you encode filename in Big5 or
UTF-32.

Linux will happily accept it, but your shell / other applications will
render it as unknown goo because they expect UTF-8.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20  8:34 ` Ævar Arnfjörð Bjarmason
@ 2010-05-20  8:50   ` Michael J Gruber
  2010-05-20  8:57     ` demerphq
                       ` (3 more replies)
  2010-05-20  8:55   ` demerphq
  1 sibling, 4 replies; 22+ messages in thread
From: Michael J Gruber @ 2010-05-20  8:50 UTC (permalink / raw
  To: Ævar Arnfjörð Bjarmason; +Cc: matthias.moeller, git

Ævar Arnfjörð Bjarmason venit, vidit, dixit 20.05.2010 10:34:
> On Thu, May 20, 2010 at 07:26, Matthias Moeller
> <matthias.moeller@math.tu-dortmund.de> wrote:
>> I have been searching the web for help and found lengthy discussions
>> which state that this is a common problem of the HFS+ filesystem.
>> What I did not find was a solution to this problem. Is there a solution
>> to this problem?
> 
> Is this problem particular to Git, or do you also get it if you
> e.g. rsync from the Linux box to the Mac OS X box?
> 
>> #       "U\314\210bersicht.xls"
> 
> You probably have to configure your shell on OSX to render UTF-8
> correctly. It's just showing the raw escaped byte sequence instead of
> a character there.
> 
> There isn't anything wrong with OSX in this case, filename encoding on
> any POSIX system is only done by convention. You'll find that you have
> similar problems on Linux if you encode filename in Big5 or
> UTF-32.
> 
> Linux will happily accept it, but your shell / other applications will
> render it as unknown goo because they expect UTF-8.

No, the problem with git status is not the display. Matthias' problem is
that git status reports a tracked file as untracked. The reason is that
on HFS+, you create a file with name A and get a file with name B, where
A and B are different representations of the same name. There seems to
be no way to reliably detect which one HFS+ uses.
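Michael's explanation can be reduced to a toy lookup. This sketch assumes the index holds the precomposed (NFC) name committed on Linux, while readdir() on HFS+ returns the decomposed (NFD) spelling. `tracked` and `is_tracked` are hypothetical stand-ins for the index and its name lookup, which compare path bytes exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the index: paths exactly as committed on Linux. */
static const char *const tracked[] = {
    "\303\234bersicht.xls",  /* precomposed U+00DC (NFC) */
    "README",
};

/* Byte-exact lookup, assumed to behave like the index name comparison. */
static int is_tracked(const char *path)
{
    size_t i;

    assert(path != NULL);
    for (i = 0; i < sizeof tracked / sizeof tracked[0]; i++)
        if (strcmp(tracked[i], path) == 0)
            return 1;
    return 0;
}
```

`is_tracked("\303\234bersicht.xls")` returns 1. The spelling that comes back from the directory scan, `"U\314\210bersicht.xls"`, returns 0, which is why git status lists it as untracked.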

Michael

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20  8:34 ` Ævar Arnfjörð Bjarmason
  2010-05-20  8:50   ` Michael J Gruber
@ 2010-05-20  8:55   ` demerphq
  1 sibling, 0 replies; 22+ messages in thread
From: demerphq @ 2010-05-20  8:55 UTC (permalink / raw
  To: Ævar Arnfjörð Bjarmason; +Cc: matthias.moeller, git

On 20 May 2010 10:34, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> On Thu, May 20, 2010 at 07:26, Matthias Moeller
> <matthias.moeller@math.tu-dortmund.de> wrote:
>> I have been searching the web for help and found lengthy discussions
>> which state that this is a common problem of the HFS+ filesystem.
>> What I did not find was a solution to this problem. Is there a solution
>> to this problem?
>
> Is this problem particular to Git, or do you also get it if you
> e.g. rsync from the Linux box to the Mac OS X box?
>
>> #       "U\314\210bersicht.xls"
>
> You probably have to configure your shell on OSX to render UTF-8
> correctly. It's just showing the raw escaped byte sequence instead of
> a character there.
>
> There isn't anything wrong with OSX in this case, filename encoding on
> any POSIX system is only done by convention. You'll find that you have
> similar problems on Linux if you encode filename in Big5 or
> UTF-32.
>
> Linux will happily accept it, but your shell / other applications will
> render it as unknown goo because they expect UTF-8.

Except that isn't the precomposed utf8 representation of capital U umlaut,
code point U+00DC (utf8 c3 9c). Instead it has presumably been
decomposed into a capital U followed by a combining character that adds
the umlaut, which IMO is pretty weird.

As far as i can tell the filename:

"Übersicht.xls"

Should be stored in utf8 as:

"\303\234bersicht.xls"

Also, a minor nit: UTF-32, as it contains NULs for latin-1 chars, would be
much, much worse than utf8 :-)

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20  8:50   ` Michael J Gruber
@ 2010-05-20  8:57     ` demerphq
  2010-05-20  9:02     ` Torsten Bögershausen
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: demerphq @ 2010-05-20  8:57 UTC (permalink / raw
  To: Michael J Gruber
  Cc: Ævar Arnfjörð Bjarmason, matthias.moeller, git

On 20 May 2010 10:50, Michael J Gruber <git@drmicha.warpmail.net> wrote:
> Ævar Arnfjörð Bjarmason venit, vidit, dixit 20.05.2010 10:34:
>> On Thu, May 20, 2010 at 07:26, Matthias Moeller
>> <matthias.moeller@math.tu-dortmund.de> wrote:
>>> I have been searching the web for help and found lengthy discussions
>>> which state that this is a common problem of the HFS+ filesystem.
>>> What I did not find was a solution to this problem. Is there a solution
>>> to this problem?
>>
>> Is this problem particular to Git, or do you also get it if you
>> e.g. rsync from the Linux box to the Mac OS X box?
>>
>>> #       "U\314\210bersicht.xls"
>>
>> You probably have to configure your shell on OSX to render UTF-8
>> correctly. It's just showing the raw escaped byte sequence instead of
>> a character there.
>>
>> There isn't anything wrong with OSX in this case, filename encoding on
>> any POSIX system is only done by convention. You'll find that you have
>> similar problems on Linux if you encode filename in Big5 or
>> UTF-32.
>>
>> Linux will happily accept it, but your shell / other applications will
>> render it as unknown goo because they expect UTF-8.
>
> No, the problem with git status is not the display. Matthias' problem is
> that git status reports a tracked file as untracked. The reason is that
> on HFS+, you create a file with name A and get a file with name B, where
> A and B are different representations of the same name. There seems to
> be no way to reliably detect which one HFS+ uses.

Judging by the example given, the problem is that HFS+ decomposes
Unicode file names into base characters plus combining characters (NFD)
instead of using the precomposed form (NFC).

This implies that if the utf8 is first normalized using the canonical
Unicode normalization rules (to eliminate the combining character),
the names can then be compared.
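The "normalize, then compare" suggestion can be sketched for the German umlauts alone. The helper names are hypothetical. The hard-coded six-entry table is an assumption that stands in for the full Unicode composition data a real implementation would need, for example from a library such as ICU.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy composition for the six German umlauts only: a base letter followed
 * by U+0308 COMBINING DIAERESIS (UTF-8 CC 88) becomes the precomposed
 * character (UTF-8 C3 xx). This is NOT full Unicode normalization.
 * The output is never longer than the input, so `out` needs
 * strlen(in) + 1 bytes.
 */
static void compose_umlauts(const char *in, char *out)
{
    static const char base[] = "AOUaou";
    /* Second UTF-8 byte of Ä Ö Ü ä ö ü; the first byte is always C3. */
    static const unsigned char second[] = { 0x84, 0x96, 0x9C, 0xA4, 0xB6, 0xBC };

    assert(in != NULL && out != NULL);
    while (*in) {
        const char *b = strchr(base, *in);

        if (b && (unsigned char)in[1] == 0xCC && (unsigned char)in[2] == 0x88) {
            *out++ = (char)0xC3;
            *out++ = (char)second[b - base];
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/* Compare two paths after composing. Paths of 256 bytes or more are
   treated as different to keep this sketch allocation-free. */
static int same_path_composed(const char *a, const char *b)
{
    char na[256], nb[256];

    if (strlen(a) >= sizeof na || strlen(b) >= sizeof nb)
        return 0;
    compose_umlauts(a, na);
    compose_umlauts(b, nb);
    return strcmp(na, nb) == 0;
}
```

After `compose_umlauts("U\314\210bersicht.xls", buf)`, `buf` holds "\303\234bersicht.xls", so the two spellings compare equal. A plain "Ubersicht.xls" still compares as different.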

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20  8:50   ` Michael J Gruber
  2010-05-20  8:57     ` demerphq
@ 2010-05-20  9:02     ` Torsten Bögershausen
  2010-05-20  9:15       ` Michael J Gruber
  2010-05-20 15:50       ` Jay Soffian
  2010-05-20  9:16     ` Matthias Moeller
  2010-05-20 10:38     ` Thomas Singer
  3 siblings, 2 replies; 22+ messages in thread
From: Torsten Bögershausen @ 2010-05-20  9:02 UTC (permalink / raw
  To: Michael J Gruber
  Cc: Ævar Arnfjörð Bjarmason, matthias.moeller, git

Hej,
I have the same problem here.
Below there is a patch, which may solve the problem.
(Yes, whitespaces are broken. I'm still fighting with
git format-patch -s --cover-letter -M --stdout origin/master | git imap-send)
But this patch may be a start point for improvements.
Comments welcome
BR
/Torsten



Improved interworking between Mac OS X and Linux when umlauts are used
When a git repository containing UTF-8 encoded umlaut characters
is cloned onto a Mac OS X machine, the Mac OS system will convert
all filenames returned by readdir() into decomposed (NFD) UTF-8.
As a result of this conversion, git will not find them in the index.
This helps by treating the NFC and NFD versions of filenames as
identical on Mac OS.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
name-hash.c |   40 ++++++++++++++++++++++++++++++++++++++++
utf8.c      |   55 ++++++++++++++++++++++++++++++++++++++++++++++++-------
utf8.h      |   11 +++++++++++
3 files changed, 99 insertions(+), 7 deletions(-)

diff --git a/name-hash.c b/name-hash.c
index 0031d78..e6494e8 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -7,6 +7,7 @@
  */
#define NO_THE_INDEX_COMPATIBILITY_MACROS
#include "cache.h"
+#include "utf8.h"

/*
  * This removes bit 5 if bit 6 is set.
@@ -100,6 +101,25 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen
     return icase && slow_same_name(name, namelen, ce->name, len);
}

+#ifdef __APPLE__
+static struct cache_entry *index_name_exists2(struct index_state *istate, const char *name, int icase)
+{
+    int namelen = (int)strlen(name);
+    unsigned int hash = hash_name(name, namelen);
+    struct cache_entry *ce;
+
+    ce = lookup_hash(hash, &istate->name_hash);
+    while (ce) {
+        if (!(ce->ce_flags & CE_UNHASHED)) {
+            if (same_name(ce, name, namelen, icase))
+                return ce;
+        }
+        ce = ce->next;
+    }
+    return NULL;
+}
+#endif
+
struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)
{
     unsigned int hash = hash_name(name, namelen);
@@ -115,5 +135,25 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
         }
         ce = ce->next;
     }
+#ifdef __APPLE__
+    {
+        char *name_nfc_nfd;
+        name_nfc_nfd = str_nfc2nfd(name);
+        if (name_nfc_nfd) {
+            ce = index_name_exists2(istate, name_nfc_nfd, icase);
+            free(name_nfc_nfd);
+            if (ce)
+                return ce;
+        }
+        name_nfc_nfd = str_nfd2nfc(name);
+        if (name_nfc_nfd) {
+            ce = index_name_exists2(istate, name_nfc_nfd, icase);
+            free(name_nfc_nfd);
+            if (ce)
+                return ce;
+        }
+    }
+#endif
+
     return NULL;
}
diff --git a/utf8.c b/utf8.c
index 84cfc72..8e794dc 100644
--- a/utf8.c
+++ b/utf8.c
@@ -2,6 +2,11 @@
#include "strbuf.h"
#include "utf8.h"

+#ifdef __APPLE__
+static iconv_t my_iconv_nfd2nfc = (iconv_t) -1;
+static iconv_t my_iconv_nfc2nfd = (iconv_t) -1;
+#endif
+
/* This code is originally from http://www.cl.cam.ac.uk/~mgk25/ucs/ */

struct interval {
@@ -424,18 +429,13 @@ int is_encoding_utf8(const char *name)
#else
     typedef char * iconv_ibp;
#endif
-char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding)
+
+char *reencode_string_iconv(const char *in, iconv_t conv)
{
-    iconv_t conv;
     size_t insz, outsz, outalloc;
     char *out, *outpos;
     iconv_ibp cp;

-    if (!in_encoding)
-        return NULL;
-    conv = iconv_open(out_encoding, in_encoding);
-    if (conv == (iconv_t) -1)
-        return NULL;
     insz = strlen(in);
     outsz = insz;
     outalloc = outsz + 1; /* for terminating NUL */
@@ -469,7 +469,48 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
             break;
         }
     }
+    return out;
+}
+
+char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding)
+{
+    iconv_t conv;
+    char *out;
+
+    if (!in_encoding)
+        return NULL;
+    conv = iconv_open(out_encoding, in_encoding);
+    if (conv == (iconv_t) -1)
+        return NULL;
+    out = reencode_string_iconv(in, conv);
     iconv_close(conv);
     return out;
}
+
+#ifdef __APPLE__
+char*
+str_nfc2nfd(const char *in)
+{
+    if (my_iconv_nfc2nfd == (iconv_t) -1) {
+        my_iconv_nfc2nfd = iconv_open("utf-8-mac", "utf-8");
+        if (my_iconv_nfc2nfd == (iconv_t) -1) {
+            return NULL;
+        }
+    }
+    return reencode_string_iconv(in, my_iconv_nfc2nfd);
+}
+
+char*
+str_nfd2nfc(const char *in)
+{
+    if (my_iconv_nfd2nfc == (iconv_t) -1){
+        my_iconv_nfd2nfc = iconv_open("utf-8", "utf-8-mac");
+        if (my_iconv_nfd2nfc == (iconv_t) -1) {
+            return NULL;
+        }
+    }
+    return reencode_string_iconv(in, my_iconv_nfd2nfc);
+}
+#endif /* APPLE */
+
#endif
diff --git a/utf8.h b/utf8.h
index ebc4d2f..db29c8a 100644
--- a/utf8.h
+++ b/utf8.h
@@ -13,8 +13,19 @@ int strbuf_add_wrapped_text(struct strbuf *buf,

#ifndef NO_ICONV
char *reencode_string(const char *in, const char *out_encoding, const char *in_encoding);
+char *reencode_string_iconv(const char *in, iconv_t conv);
+#ifdef __APPLE__
+char *str_nfc2nfd(const char *in);
+char *str_nfd2nfc(const char *in);
+#else
+#define str_nfc2nfd(in) (NULL)
+#define str_nfd2nfc(in) (NULL)
+#endif
#else
#define reencode_string(a,b,c) NULL
+#define reencode_string_iconv(a,b) NULL
+#define str_nfc2nfd(in) (NULL)
+#define str_nfd2nfc(in) (NULL)
#endif

#endif
-- 
1.7.1.dirty
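The lookup order in the patch's index_name_exists() can be exercised without a Mac. The patch tries the exact name first, then the str_nfc2nfd() spelling, then the str_nfd2nfc() spelling. In this sketch, the iconv "utf-8-mac" conversions are replaced by hypothetical umlaut-only helpers (`to_nfd`, `to_nfc`), and the index is a plain array searched with strcmp(). None of these names exist in git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer size for converted names in this sketch. */
#define ALT_NAME_MAX 256

static const char base[] = "AOUaou";
/* Second UTF-8 byte of the precomposed Ä Ö Ü ä ö ü (first byte is C3). */
static const unsigned char nfc_second[] = { 0x84, 0x96, 0x9C, 0xA4, 0xB6, 0xBC };

/* Umlaut-only stand-in for str_nfd2nfc(): base + CC 88 -> C3 xx. */
static void to_nfc(const char *in, char *out)
{
    while (*in) {
        const char *b = strchr(base, *in);

        if (b && (unsigned char)in[1] == 0xCC && (unsigned char)in[2] == 0x88) {
            *out++ = (char)0xC3;
            *out++ = (char)nfc_second[b - base];
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/* Umlaut-only stand-in for str_nfc2nfd(): C3 xx -> base + CC 88.
   Returns -1 if `out` is too small. */
static int to_nfd(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    while (*in) {
        const unsigned char *hit = NULL;

        if ((unsigned char)in[0] == 0xC3)
            hit = memchr(nfc_second, (unsigned char)in[1], sizeof nfc_second);
        if (hit) {
            if (n + 3 >= outsz)
                return -1;
            out[n++] = base[hit - nfc_second];
            out[n++] = (char)0xCC;
            out[n++] = (char)0x88;
            in += 2;
        } else {
            if (n + 1 >= outsz)
                return -1;
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return 0;
}

static const char *find_exact(const char *const *index, size_t n, const char *path)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(index[i], path) == 0)
            return index[i];
    return NULL;
}

/* Same order as the patch: exact name, then the NFD form, then the NFC form. */
static const char *lookup_with_fallback(const char *const *index, size_t n, const char *path)
{
    char alt[ALT_NAME_MAX];
    const char *hit;

    assert(index != NULL && path != NULL);
    if ((hit = find_exact(index, n, path)) != NULL)
        return hit;
    if (to_nfd(path, alt, sizeof alt) == 0 && (hit = find_exact(index, n, alt)) != NULL)
        return hit;
    if (strlen(path) < sizeof alt) {
        to_nfc(path, alt);
        return find_exact(index, n, alt);
    }
    return NULL;
}
```

When the index holds the Linux (NFC) name, the Mac's NFD spelling is found on the third try. When the index holds the NFD name, an NFC query is found on the second try. In this sketch (as in the patch), the conversions run only after an exact miss. Names already in the index therefore cost nothing extra, while genuinely untracked names pay for two additional lookups.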

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20  9:02     ` Torsten Bögershausen
@ 2010-05-20  9:15       ` Michael J Gruber
       [not found]         ` <4BF5294E.7060206@web.de>
  2010-05-20 15:30         ` Jay Soffian
  2010-05-20 15:50       ` Jay Soffian
  1 sibling, 2 replies; 22+ messages in thread
From: Michael J Gruber @ 2010-05-20  9:15 UTC (permalink / raw
  To: Torsten Bögershausen
  Cc: Ævar Arnfjörð Bjarmason, matthias.moeller, git

Torsten Bögershausen venit, vidit, dixit 20.05.2010 11:02:
> Hej,
> I have the same problem here.
> Below there is a patch, which may solve the problem.
> (Yes, whitespaces are broken. I'm still fighting with
> git format-patch -s --cover-letter -M --stdout origin/master | git imap-send)
> But this patch may be a start point for improvements.
> Comments welcome
> BR
> /Torsten
> 
> 
> 
> Improved interworking between Mac OS X and Linux when umlauts are used
> When a git repository containing UTF-8 encoded umlaut characters
> is cloned onto a Mac OS X machine, the Mac OS system will convert
> all filenames returned by readdir() into decomposed (NFD) UTF-8.
> As a result of this conversion, git will not find them in the index.
> This helps by treating the NFC and NFD versions of filenames as
> identical on Mac OS.
> 
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>

You signed off, but is Markus Kuhn's code from UCS GPL2-licensed?
Also, a few tests would be nice.

I remember we had threads on this issue in the past. I haven't checked
yet (Thunderbird pruned my nntp history), but it is worth checking that
you addressed any issues mentioned there.

I have no Mac so I can't test, sorry. Would be happy to run Mac OS in a
vm, but you know...

Thanks for looking into this!

Michael

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20  8:50   ` Michael J Gruber
  2010-05-20  8:57     ` demerphq
  2010-05-20  9:02     ` Torsten Bögershausen
@ 2010-05-20  9:16     ` Matthias Moeller
  2010-05-20 10:38     ` Thomas Singer
  3 siblings, 0 replies; 22+ messages in thread
From: Matthias Moeller @ 2010-05-20  9:16 UTC (permalink / raw
  To: Michael J Gruber; +Cc: Ævar Arnfjörð Bjarmason, git

On 05/20/2010 10:50 AM, Michael J Gruber wrote:
>
>> Is this problem particular to Git, or do you also get it if you
>> e.g. rsync from the Linux box to the Mac OS X box?
>>     
> No, the problem with git status is not the display. Matthias' problem is
> that git status reports a tracked file as untracked. The reason is that
> on HFS+, you create a file with name A and get a file with name B, where
> A and B are different representations of the same name. There seems to
> be no way to reliably detect which one HFS+ uses.
>   

Yes, the problem is not the display but the filesystem. I had similar
problems with unison some time ago.
But there was a special fix for utf-8 and Mac OS X in one of the newer
unison versions.

Matthias

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20  8:50   ` Michael J Gruber
                       ` (2 preceding siblings ...)
  2010-05-20  9:16     ` Matthias Moeller
@ 2010-05-20 10:38     ` Thomas Singer
  3 siblings, 0 replies; 22+ messages in thread
From: Thomas Singer @ 2010-05-20 10:38 UTC (permalink / raw
  To: Michael J Gruber; +Cc: git

On 20.05.2010 10:50, Michael J Gruber wrote:
> There seems to be no way to reliably detect which one HFS+ uses.

IIRC, HFS+ always stores and reports file names in decomposed UTF-8,
even if the file is created using composed UTF-8. IMHO, Git should
standardize the encoding of file names and text (e.g. commit messages) in
the repository, so such problems can't occur. SVN has standardized on UTF-8
in the repository, but had (and still has) similar problems on OS X with
decomposed characters:

 http://subversion.tigris.org/issues/show_bug.cgi?id=2464

Tom

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
       [not found]         ` <4BF5294E.7060206@web.de>
@ 2010-05-20 14:29           ` Michael J Gruber
  0 siblings, 0 replies; 22+ messages in thread
From: Michael J Gruber @ 2010-05-20 14:29 UTC (permalink / raw
  To: Torsten Bögershausen; +Cc: Git Mailing List

Torsten Bögershausen venit, vidit, dixit 20.05.2010 14:21:
> Hej Michael,
> Thanks for the reply.
> 
>> You signed off, but is Markus Kuhn's code from UCS GPL2-licensed?
> Oh, I haven't added any code from Markus here.
> But if my sign off is a problem, we can remove it ;-)
> or move the code to another place. (And utf8.c will still have code from UCS)

Your sign-off is fine if you can place the code under the terms of the
project.

In your patch there is a line

/* This code is originally from http://www.cl.cam.ac.uk/~mgk25/ucs/ */

but I missed the missing '+' in front - that comment was there before
your patch! Sorry for the confusion.

> 
>> Also, a few tests would be nice.
> Yes, fully agreed.
> My feeling is that at least
> "git add", "git mv", "git rm" should be tested.
> I will fix that.
> But as I become more familiar with the git testsuite,
> it becomes more and more clear that testing the new feature will
> repeat tests that already exist.
> From that point of view, it seems easier to re-use the existing test
> cases and run them twice, once with clean ASCII, and a second time with
> an internationalized form.
> As not all platforms support UTF-8, the internationalized tests may be
> either UTF-8, ISO 8859-1, or nothing at all.
>
> I feel that at least 50% of the test cases should be "internationalized",
> like "git merge", "git pull" etc.
> (And rewriting the tests is a big issue, at least for me as a beginner.)
> Anyway, I will make simple tests.

Simple tests are a good start. More importantly (compared to full
internationalisation), we need someone running them on Mac OS ;)

Michael

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20  9:15       ` Michael J Gruber
       [not found]         ` <4BF5294E.7060206@web.de>
@ 2010-05-20 15:30         ` Jay Soffian
  1 sibling, 0 replies; 22+ messages in thread
From: Jay Soffian @ 2010-05-20 15:30 UTC (permalink / raw
  To: Michael J Gruber
  Cc: Torsten Bögershausen, Ævar Arnfjörð Bjarmason,
	matthias.moeller, git

2010/5/20 Michael J Gruber <git@drmicha.warpmail.net>:
> I remember we had threads on this issue in the past. I haven't checked
> yet (Thunderbird pruned my nntp history), but it is worth checking that
> you addressed any issues mentioned there.

This was the monster thread on it:

  http://thread.gmane.org/gmane.comp.version-control.git/70688

Linus added support for the case-insensitivity aliasing issue:

  1102952 (Make git-add behave more sensibly in a case-insensitive environment,
           2008-03-22)
  6835550 (When adding files to the index, add support for case-independent matches,
           2008-03-22)

But he couldn't care less about HFS+ brain-damage, so he left that for
others. And hey, it's only taken 2 years for someone to step up to the
plate. :-)

j.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20  9:02     ` Torsten Bögershausen
  2010-05-20  9:15       ` Michael J Gruber
@ 2010-05-20 15:50       ` Jay Soffian
  2010-05-20 18:22         ` Jay Soffian
  1 sibling, 1 reply; 22+ messages in thread
From: Jay Soffian @ 2010-05-20 15:50 UTC (permalink / raw
  To: Torsten Bögershausen
  Cc: Michael J Gruber, Ævar Arnfjörð Bjarmason,
	matthias.moeller, git

2010/5/20 Torsten Bögershausen <totte.enea@gmail.com>:
> Improved interworking between Mac OS X and Linux when umlauts are used
> When a git repository containing UTF-8 encoded umlaut characters
> is cloned onto a Mac OS X machine, the Mac OS system will convert
> all filenames returned by readdir() into decomposed (NFD) UTF-8.
> As a result of this conversion, git will not find them in the index.
> This helps by treating the NFC and NFD versions of filenames as
> identical on Mac OS.

So this is an edge case, but what happens if a repo has both the NFC
and NFD representation of a given name? It should be handled the same
way as if a repo has both "File" and "file" and you try to check-out
onto a case-insensitive filesystem.

Additionally, note this paragraph from 1102952 (Make git-add behave
more sensibly in a case-insensitive environment, 2008-03-22):

    However, if we actually have *both* a file called "File" and one called
    "file", and they don't have the same lstat() information (ie we're on a
    case-sensitive filesystem but have the "core.ignorecase" flag set), we
    will error out if we try to add them both.

To be consistent, shouldn't we have a core.HFSPlusCompat that can be
set on non-braindamaged filesystems to prevent filenames which would
alias on HFS+ from entering the repo?

   http://developer.apple.com/mac/library/technotes/tn/tn1150.html#UnicodeSubtleties
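Jay's suggestion amounts to refusing a new path that differs from an already tracked one only in its Unicode spelling, analogous to the File/file case. Here is a minimal sketch of such a check, under these assumptions: core.HFSPlusCompat is only a proposal in this thread, `hfs_alias_of` is a hypothetical name, and the six-entry umlaut table stands in for the full HFS+ decomposition rules described in TN1150.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical path-length limit for this sketch. */
#define SKETCH_PATH_MAX 256

/* Umlaut-only composition: base letter + CC 88 -> precomposed C3 xx.
   `out` needs strlen(in) + 1 bytes; the result never grows. */
static void compose_umlauts(const char *in, char *out)
{
    static const char base[] = "AOUaou";
    static const unsigned char second[] = { 0x84, 0x96, 0x9C, 0xA4, 0xB6, 0xBC };

    while (*in) {
        const char *b = strchr(base, *in);

        if (b && (unsigned char)in[1] == 0xCC && (unsigned char)in[2] == 0x88) {
            *out++ = (char)0xC3;
            *out++ = (char)second[b - base];
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/*
 * Hypothetical check: return the index of a tracked path that `candidate`
 * would alias on HFS+ (equal after composition, different bytes), or -1.
 * Case folding is not handled here.
 */
static int hfs_alias_of(const char *const *paths, size_t n, const char *candidate)
{
    char a[SKETCH_PATH_MAX], b[SKETCH_PATH_MAX];
    size_t i;

    assert(paths != NULL && candidate != NULL);
    if (strlen(candidate) >= sizeof a)
        return -1;
    compose_umlauts(candidate, a);
    for (i = 0; i < n; i++) {
        if (strcmp(paths[i], candidate) == 0)
            continue;            /* identical bytes: the same path, not an alias */
        if (strlen(paths[i]) >= sizeof b)
            continue;
        compose_umlauts(paths[i], b);
        if (strcmp(a, b) == 0)
            return (int)i;
    }
    return -1;
}
```

Adding "U\314\210bersicht.xls" next to an existing "\303\234bersicht.xls" reports an alias at index 0. Re-adding the identical bytes does not.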

j.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2010-05-20 15:50       ` Jay Soffian
@ 2010-05-20 18:22         ` Jay Soffian
  0 siblings, 0 replies; 22+ messages in thread
From: Jay Soffian @ 2010-05-20 18:22 UTC (permalink / raw
  To: Torsten Bögershausen
  Cc: Michael J Gruber, Ævar Arnfjörð Bjarmason,
	matthias.moeller, git

On Thu, May 20, 2010 at 11:50 AM, Jay Soffian <jaysoffian@gmail.com> wrote:
> To be consistent, shouldn't we have a core.HFSPlusCompat that can be
> set on non-braindamaged filesystems to prevent filenames which would
> alias on HFS+ from entering the repo?
>
>   http://developer.apple.com/mac/library/technotes/tn/tn1150.html#UnicodeSubtleties

And here's an implementation (BSD-style license):

  http://src.chromium.org/svn/trunk/src/base/file_path.cc

See GetHFSDecomposedForm and HFSFastUnicodeCompare.

C++ so it'll require adapting.

j.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Git, Mac OS X and German special characters
@ 2011-10-01 12:44 Albert Zeyer
  2011-10-01 13:39 ` Andreas Ericsson
  2011-10-03 19:48 ` Torsten Bögershausen
  0 siblings, 2 replies; 22+ messages in thread
From: Albert Zeyer @ 2011-10-01 12:44 UTC (permalink / raw
  To: git

Hi,

There are problems on MacOSX with different UTF8 encodings of
filenames. A Unicode string can be represented in UTF8 in multiple
ways (e.g. with precomposed or decomposed characters), and Git treats
these as different filenames. This is the actual bug: Git should treat
them all as the same filename. In some cases (as on MacOSX), the
underlying operating system normalizes the UTF8 representation in some
way, i.e. it changes the actual UTF8 bytes of the filename.
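To make the "multiple representations" concrete, here is a small Python illustration (Python is chosen only for brevity). `git_style_quote` is a hypothetical helper that roughly approximates how `git status` escapes non-ASCII bytes as octal by default. It omits the surrounding double quotes and the escaping of control characters, backslashes and quotes.

```python
import unicodedata


def git_style_quote(name: str) -> str:
    """Hypothetical helper: escape non-ASCII UTF-8 bytes as octal, like `git status`."""
    return "".join(
        f"\\{b:03o}" if b > 0x7F else chr(b) for b in name.encode("utf-8")
    )


name = "\u00dcbersicht.xls"                 # "Übersicht.xls"
nfc = unicodedata.normalize("NFC", name)    # U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS
nfd = unicodedata.normalize("NFD", name)    # "U" + U+0308 COMBINING DIAERESIS

print(nfc == nfd)                  # False: different code point sequences
print(nfc.encode("utf-8")[:2])     # b'\xc3\x9c'
print(nfd.encode("utf-8")[:3])     # b'U\xcc\x88'
print(git_style_quote(nfd))        # U\314\210bersicht.xls
print(unicodedata.normalize("NFC", nfd) == nfc)  # True: canonically equivalent
```

The NFD bytes are exactly the `U\314\210bersicht.xls` that `git status` showed in the original report. Linux tools typically produce the NFC form, and HFS+ hands back the NFD form.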

Similar problems exist in SVN as well. This was reported
[here](http://subversion.tigris.org/issues/show_bug.cgi?id=2464),
where you can also find lengthy discussions about the topic; see also
[here](http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames).

This was already reported for Git earlier and there is also a patch
for Git [here](http://lists-archives.org/git/719832-git-mac-os-x-and-german-special-characters.html).

I wonder about the state of this. This hasn't been applied yet. Why?

Regards,
Albert

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2011-10-01 12:44 Albert Zeyer
@ 2011-10-01 13:39 ` Andreas Ericsson
       [not found]   ` <CAO1Q+jeLEp2ReNc9eOFoJxdGq6oRE3b+O=JvMNU0Kqx_eAX=7w@mail.gmail.com>
  2011-10-03 19:48 ` Torsten Bögershausen
  1 sibling, 1 reply; 22+ messages in thread
From: Andreas Ericsson @ 2011-10-01 13:39 UTC (permalink / raw
  To: Albert Zeyer; +Cc: git

On 10/01/2011 07:44 AM, Albert Zeyer wrote:
> Hi,
> 
> There are problems on MacOSX with different UTF8 encodings of
> filenames. A unicode string has multiple ways to be represented as
> UTF8 and Git treats them as different filenames. This is the actual
> bug. It should treat them all as the same filename. In some cases (as
> on MacOSX), the underlying operating system may use a normalized UTF8
> representation in some sort, i.e. change the actual UTF8 filename
> representation.
> 
> Similar problems also exist in SVN, for example. This was reported
> [here](http://subversion.tigris.org/issues/show_bug.cgi?id=2464).
> There you can find also lengthy discussions about the topic. And also
> [here](http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames).
> 
> This was already reported for Git earlier and there is also a patch
> for Git [here](http://lists-archives.org/git/719832-git-mac-os-x-and-german-special-characters.html).
> 
> I wonder about the state of this. This hasn't been applied yet. Why?
> 

Because the patch didn't address repositories carrying files with
more than one possible representation of the filename, and that
could have led to silent data loss for unsuspecting users.

The real solution to your problem is, unfortunately, to either use
a different and more competent filesystem, or to avoid triggering
the bugs in the one you're currently using.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
       [not found]   ` <CAO1Q+jeLEp2ReNc9eOFoJxdGq6oRE3b+O=JvMNU0Kqx_eAX=7w@mail.gmail.com>
@ 2011-10-01 14:24     ` Andreas Ericsson
  2011-10-01 19:47       ` Andreas Krey
  0 siblings, 1 reply; 22+ messages in thread
From: Andreas Ericsson @ 2011-10-01 14:24 UTC (permalink / raw
  To: Albert Zeyer, Git Mailing List

Please don't cull the list when replying. Reply-to-all is the
standard on git@vger.

On 10/01/2011 08:57 AM, Albert Zeyer wrote:
> On Sat, Oct 1, 2011 at 3:39 PM, Andreas Ericsson<ae@op5.se>  wrote:
>> On 10/01/2011 07:44 AM, Albert Zeyer wrote:
>>> Hi,
>>>
>>> There are problems on MacOSX with different UTF8 encodings of
>>> filenames. A unicode string has multiple ways to be represented as
>>> UTF8 and Git treats them as different filenames. This is the actual
>>> bug. It should treat them all as the same filename. In some cases (as
>>> on MacOSX), the underlying operating system may use a normalized UTF8
>>> representation in some sort, i.e. change the actual UTF8 filename
>>> representation.
>>>
>>> Similar problems also exist in SVN, for example. This was reported
>>> [here](http://subversion.tigris.org/issues/show_bug.cgi?id=2464).
>>> There you can find also lengthy discussions about the topic. And also
>>> [here](http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames).
>>>
>>> This was already reported for Git earlier and there is also a patch
>>> for Git [here](http://lists-archives.org/git/719832-git-mac-os-x-and-german-special-characters.html).
>>>
>>> I wonder about the state of this. This hasn't been applied yet. Why?
>>>
>>
>> Because the patch didn't address repositories carrying files with
>> more than one possible representation of the filename and that
>> could have led to silent loss of data for unsuspecting users.
>>
>> The real solution to your problem is, unfortunately, to either use
>> a different and more competent filesystem, or to avoid triggering
>> the bugs in the one you're currently using.
> 
> Well, I think it is a bug in Git itself that it treats different UTF8
> representations of the same filename as different filenames. It
> shouldn't have allowed that in the first place.
> 
> But I see your point. I guess I will work on a patch myself, or
> extend that one.


The trouble is that they may represent two different files on a
different filesystem. The Linux kernel repo has plenty of files
that exist with both uppercase and lowercase characters, like so:
SOMEFILE_driver.c
somefile_driver.c

This is perfectly valid on all sensible and case-sensitive
filesystems, but breaks horribly on HFS. There are other, far more
"interesting" cases when you involve special chars such as the
German umlaut, or the Swedish åäö characters.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2011-10-01 14:24     ` Andreas Ericsson
@ 2011-10-01 19:47       ` Andreas Krey
  2011-10-01 22:02         ` Michael Witten
  0 siblings, 1 reply; 22+ messages in thread
From: Andreas Krey @ 2011-10-01 19:47 UTC (permalink / raw
  To: Andreas Ericsson; +Cc: Albert Zeyer, Git Mailing List

On Sat, 01 Oct 2011 09:24:08 +0000, Andreas Ericsson wrote:
...
> The trouble is that they may represent two different files on a
> different filesystem. The Linux kernel repo has plenty of files
> that exist with both uppercase and lowercase characters, like so:
> SOMEFILE_driver.c
> somefile_driver.c
> 
> This is perfectly valid on all sensible and case-sensitive
> filesystems, but breaks horribly on HFS.

It also breaks on Windows, except in at least one country[1].
And that alone is a good reason why no VCS should try to
forbid using different characters that some filesystems
(and only some) consider the same.

> There are other, far more
> "interesting" cases when you involve special chars such as the
> German umlaut, or the Swedish åäö characters.

Care to share some?

The question is, should git forbid two filenames that consist
of the *same* characters, only differently uni-encoded? I don't
think anyone would make two files named 'Büro', with different
unicode encodings. But as far as I know that is a shady area.

Andreas

[1] Its alphabet has 'i with dot' and 'i without dot', each in both
    an uppercase and a lowercase variant, so I and i are not the 'same'.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2011-10-01 19:47       ` Andreas Krey
@ 2011-10-01 22:02         ` Michael Witten
  2011-10-01 23:14           ` Jakub Narebski
  2011-10-01 23:48           ` Albert Zeyer
  0 siblings, 2 replies; 22+ messages in thread
From: Michael Witten @ 2011-10-01 22:02 UTC (permalink / raw
  To: Andreas Krey; +Cc: Andreas Ericsson, Albert Zeyer, Git Mailing List

On Sat, Oct 1, 2011 at 19:47, Andreas Krey <a.krey@gmx.de> wrote:

> The question is, should git forbid two filenames that consist
> of the *same* characters, only differently uni-encoded? I don't
> think anyone would make two files named 'Büro', with different
> unicode encodings. But as far as I know that is a shady area.

So, let's leave git's current behavior as the default and provide
a config variable that when set, tells git to handle file names
in terms of characters rather than bytes.
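A sketch of what the comparison could look like under such a switch, assuming Python for illustration. The `by_characters` flag is hypothetical and stands in for the proposed config variable, since no such option exists in Git. Git stores names as raw bytes, so the sketch takes bytes and falls back to byte equality for names that are not valid UTF-8. It treats "same characters" as Unicode canonical equivalence, checked by comparing NFC forms.

```python
import unicodedata


def same_path(a: bytes, b: bytes, by_characters: bool = False) -> bool:
    """Decide whether two tree entries name the same file.

    by_characters=False: Git's current behaviour, plain byte comparison.
    by_characters=True: hypothetical opt-in mode that treats canonically
    equivalent UTF-8 names as equal.
    """
    if a == b:
        return True
    if not by_characters:
        return False
    try:
        sa, sb = a.decode("utf-8"), b.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: there are no "characters" to compare, only bytes.
        return False
    return unicodedata.normalize("NFC", sa) == unicodedata.normalize("NFC", sb)


buero_nfc = "B\u00fcro".encode("utf-8")   # precomposed ü
buero_nfd = "Bu\u0308ro".encode("utf-8")  # u + combining diaeresis

print(same_path(buero_nfc, buero_nfd))                      # False
print(same_path(buero_nfc, buero_nfd, by_characters=True))  # True
```

With the switch on, two such entries collapse into one path. This is the silent-overwrite risk Andreas Ericsson raised earlier in the thread, which is why the sketch keeps byte comparison as the default.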

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2011-10-01 22:02         ` Michael Witten
@ 2011-10-01 23:14           ` Jakub Narebski
  2011-10-01 23:26             ` Michael Witten
  2011-10-01 23:48           ` Albert Zeyer
  1 sibling, 1 reply; 22+ messages in thread
From: Jakub Narebski @ 2011-10-01 23:14 UTC (permalink / raw
  To: Michael Witten
  Cc: Andreas Krey, Andreas Ericsson, Albert Zeyer, Git Mailing List

Michael Witten <mfwitten@gmail.com> writes:
> On Sat, Oct 1, 2011 at 19:47, Andreas Krey <a.krey@gmx.de> wrote:
> 
> > The question is, should git forbid two filenames that consist
> > of the *same* characters, only differently uni-encoded? I don't
> > think anyone would make two files named 'Büro', with different
> > unicode encodings. But as far as I know that is a shady area.
> 
> So, let's leave git's current behavior as the default and provide
> a config variable that when set, tells git to handle file names
> in terms of characters rather than bytes.

By "characters" you meant _graphemes_ here, not Unicode code points,
didn't you?

IIRC the problem with MacOS X is that the composition it accepts
when creating a file differs from the one it returns when listing the
contents of a directory (NFD if I remember correctly, which is the
less commonly used form).


There are the beginnings of a framework for sanely handling filesystem
encoding in Git, but it is currently used only to handle
case-insensitive and case-preserving filesystems.

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2011-10-01 23:14           ` Jakub Narebski
@ 2011-10-01 23:26             ` Michael Witten
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Witten @ 2011-10-01 23:26 UTC (permalink / raw
  To: Jakub Narebski
  Cc: Andreas Krey, Andreas Ericsson, Albert Zeyer, Git Mailing List

2011/10/1 Jakub Narebski <jnareb@gmail.com>:

> Michael Witten <mfwitten@gmail.com> writes:
>> On Sat, Oct 1, 2011 at 19:47, Andreas Krey <a.krey@gmx.de> wrote:
>>
>> > The question is, should git forbid two filenames that consist
>> > of the *same* characters, only differently uni-encoded? I don't
>> > think anyone would make two files named 'Büro', with different
>> > unicode encodings. But as far as I know that is a shady area.
>>
>> So, let's leave git's current behavior as the default and provide
>> a config variable that when set, tells git to handle file names
>> in terms of characters rather than bytes.
>
> You meant here _graphemes_, not Unicode codepoints when talking about
> characters, didn't you?

Yes.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2011-10-01 22:02         ` Michael Witten
  2011-10-01 23:14           ` Jakub Narebski
@ 2011-10-01 23:48           ` Albert Zeyer
  1 sibling, 0 replies; 22+ messages in thread
From: Albert Zeyer @ 2011-10-01 23:48 UTC (permalink / raw
  To: Michael Witten; +Cc: Andreas Krey, Andreas Ericsson, Git Mailing List

On Sun, Oct 2, 2011 at 12:02 AM, Michael Witten <mfwitten@gmail.com> wrote:
> On Sat, Oct 1, 2011 at 19:47, Andreas Krey <a.krey@gmx.de> wrote:
>
>> The question is, should git forbid two filenames that consist
>> of the *same* characters, only differently uni-encoded? I don't
>> think anyone would make two files named 'Büro', with different
>> unicode encodings. But as far as I know that is a shady area.
>
> So, let's leave git's current behavior as the default and provide
> a config variable that when set, tells git to handle file names
> in terms of characters rather than bytes.

I just read the very lengthy discussion here:
http://thread.gmane.org/gmane.comp.version-control.git/70688

Basically all the arguments have already been discussed.

There are various options. Most of them are not mutually exclusive, so
it would also be an option to implement most of them and let the user
pick what (s)he prefers.

* TreatFilenamesAsText, or whatever you would call it, i.e. treat
filenames as the same when they are equal as Unicode text.

Linus is very much against this because in rare situations, it could
destroy your data, like in this example:

	echo "foo" > Hütte # "Hütte" in NFC
	echo "bar" > Hütte # "Hütte" in NFD

The second write would silently overwrite the file created by the
first write if those filenames were handled as the same. This (and
similar) behavior is to be avoided, claims Linus, because it would
more often lead to unwanted behavior in third-party applications.

* On MacOSX, wrap all filesystem functions (like readdir()) to convert
all filenames to NFC.

MacOSX normalizes the UTF8 representation of the filenames to NFD but
in most common situations (on most other systems), you end up with the
filename being in NFC.

As the filename is normalized on OSX anyway, it doesn't matter whether
it is handled as NFC or NFD and NFC will likely generate less trouble.
And this patch doesn't even really need an option.

This was one suggestion by Linus himself:
http://news.gmane.org/find-root.php?message_id=%3calpine.LFD.1.00.0801211323120.2957%40woody.linux%2dfoundation.org%3e

* Disallow files whose names are not in NFC at all. This makes some
things a bit safer (like on MacOSX, together with the previous
suggestion) and clearer (you always know that your filename is in
NFC).

* A cleverer readdir() which, when it gets a filename that is not in
the Git index but is Unicode-equivalent to a filename in the Git
index, automatically replaces it with the filename from the index.

This is a kind of halfway point towards a TreatFilenamesAsText option,
but should produce less trouble.

This probably also doesn't need an extra option, as it should very
likely cause less trouble (on OSX at least; other systems, which don't
mangle filenames, don't need to use this code at all).
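Options two and four above can be sketched as wrappers around a directory listing. This is Python for illustration only. `readdir_nfc` and `match_index` are hypothetical names, `os.listdir()` stands in for readdir(), and a plain list of paths stands in for the Git index.

```python
import os
import unicodedata


def readdir_nfc(path="."):
    """Option 2: list a directory and convert every entry to NFC."""
    return [unicodedata.normalize("NFC", name) for name in os.listdir(path)]


def match_index(names, index_paths):
    """Option 4: replace each on-disk name by the index entry it is
    canonically equivalent to, if any; otherwise keep it unchanged.

    names: entries as the filesystem returns them (NFD on HFS+).
    index_paths: paths as recorded in the index, in whatever form was committed.
    """
    exact = set(index_paths)
    by_nfc = {}
    for p in index_paths:
        # If the index holds several equivalent forms, the first one wins.
        by_nfc.setdefault(unicodedata.normalize("NFC", p), p)
    result = []
    for name in names:
        if name in exact:
            result.append(name)  # byte-identical match: nothing to do
        else:
            result.append(by_nfc.get(unicodedata.normalize("NFC", name), name))
    return result


index = ["\u00dcbersicht.xls", "README"]                # committed on Linux (NFC)
on_disk = ["U\u0308bersicht.xls", "README", "new.txt"]  # as HFS+ reports it
print(match_index(on_disk, index))
```

Here the NFD entry from disk is mapped back to the NFC path stored in the index, and `new.txt` stays untracked as before. The first-one-wins rule is where the ambiguous case raised earlier in the thread resurfaces: if a repository carries both the NFC and the NFD name, a real implementation would have to detect that and refuse, rather than pick one silently.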

---

I will probably go and try to implement the clever readdir(), and/or
maybe also the NFC conversion in such a readdir() wrapper.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Git, Mac OS X and German special characters
  2011-10-01 12:44 Albert Zeyer
  2011-10-01 13:39 ` Andreas Ericsson
@ 2011-10-03 19:48 ` Torsten Bögershausen
  1 sibling, 0 replies; 22+ messages in thread
From: Torsten Bögershausen @ 2011-10-03 19:48 UTC (permalink / raw
  To: Albert Zeyer; +Cc: git

The patch has probably not been applied because it is not a good one.
I have been working on a better version,
but that is not 100% ready to be released.

I can post it in a couple of days,
(and yes, it does a NFD->NFC conversion in readdir() )
/Torsten
On 10/01/2011 02:44 PM, Albert Zeyer wrote:
> Hi,
>
> There are problems on MacOSX with different UTF8 encodings of
> filenames. A unicode string has multiple ways to be represented as
> UTF8 and Git treats them as different filenames. This is the actual
> bug. It should treat them all as the same filename. In some cases (as
> on MacOSX), the underlying operating system may use a normalized UTF8
> representation in some sort, i.e. change the actual UTF8 filename
> representation.
>
> Similar problems also exist in SVN, for example. This was reported
> [here](http://subversion.tigris.org/issues/show_bug.cgi?id=2464).
> There you can find also lengthy discussions about the topic. And also
> [here](http://svn.apache.org/repos/asf/subversion/trunk/notes/unicode-composition-for-filenames).
>
> This was already reported for Git earlier and there is also a patch
> for Git [here](http://lists-archives.org/git/719832-git-mac-os-x-and-german-special-characters.html).
>
> I wonder about the state of this. This hasn't been applied yet. Why?
>
> Regards,
> Albert

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2011-10-03 19:55 UTC | newest]

Thread overview: 22+ messages
2010-05-20  7:26 Git, Mac OS X and German special characters Matthias Moeller
2010-05-20  8:34 ` Ævar Arnfjörð Bjarmason
2010-05-20  8:50   ` Michael J Gruber
2010-05-20  8:57     ` demerphq
2010-05-20  9:02     ` Torsten Bögershausen
2010-05-20  9:15       ` Michael J Gruber
     [not found]         ` <4BF5294E.7060206@web.de>
2010-05-20 14:29           ` Michael J Gruber
2010-05-20 15:30         ` Jay Soffian
2010-05-20 15:50       ` Jay Soffian
2010-05-20 18:22         ` Jay Soffian
2010-05-20  9:16     ` Matthias Moeller
2010-05-20 10:38     ` Thomas Singer
2010-05-20  8:55   ` demerphq
  -- strict thread matches above, loose matches on Subject: below --
2011-10-01 12:44 Albert Zeyer
2011-10-01 13:39 ` Andreas Ericsson
     [not found]   ` <CAO1Q+jeLEp2ReNc9eOFoJxdGq6oRE3b+O=JvMNU0Kqx_eAX=7w@mail.gmail.com>
2011-10-01 14:24     ` Andreas Ericsson
2011-10-01 19:47       ` Andreas Krey
2011-10-01 22:02         ` Michael Witten
2011-10-01 23:14           ` Jakub Narebski
2011-10-01 23:26             ` Michael Witten
2011-10-01 23:48           ` Albert Zeyer
2011-10-03 19:48 ` Torsten Bögershausen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
