git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/7] Case-insensitive filesystem support, take 1
@ 2008-03-22 17:21 Linus Torvalds
  2008-03-22 17:22 ` [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields Linus Torvalds
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 17:21 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov


Ok, so I said it wasn't a high priority, but I had already done all the 
really core support for this in that we have the name hashes now that I 
wanted to use for looking up names case-insensitively, so I took it as a 
challenge to do this cleanly. I already knew how I wanted to do it, so how 
hard could it be?

First, a few caveats:

 - I've tested this series, both on a case-sensitive one (using hardlinks 
   to test corner cases) and on a vfat filesystem under Linux (which is 
   case-insensitive and *really* odd wrt case preservation - it remembers 
   the name of removed files, so it preserves case even across removal and 
   re-creation!)

 - HOWEVER. The testing has been very targeted, and I only convered a few 
   cases to really care. Things like case-renaming, for example, will be 
   trivial to do, but I didn't do it. So if you want to do a 

	git mv Abc abc

   on a case-insensitive filesystem, you currently still have to do it as 
   the sequence

	git mv Abc xyz
	git mv xyz abc

   because I did *not* make git-mv know about case-insensitivity.

 - The only two operations that care about case-insensitivity after this 
   series of seven patches are

    (a) "git status" and friends (like "git add") that use the directory 
        traversal code will know to ignore files that have a case- 
        insensitive version in the index. So if you have messed up the 
        case in the working tree (or the filesystem isn't even case- 
        preserving), then "git add" and "git status" won't show the 
        case-different file as being "unknown".

    (b) merging trees (git read-tree) knows about unexpectedly found files
        that are due to case-insensitive filesystems, and knows to ignore 
        them. This means that switching branches where the case of a file 
        changes works, and it means that going a merge across that case 
        also works.

 - I made this all conditional on

	[core]
		ignorecase = true

   because obviously I'm not at all interested in penalizing sane 
   filesystems. That said, I also worked at trying to make sure that it's 
   safe and possible to do this on a case-sensitive filesystem in case you 
   are working on a project that doesn't like case-sensitivity, so the 
   "git status" and "git add ." kind of operations won't add aliases

 - Finally: the "case independence" rules could be anything, but right now 
   I *only* do the standard US-ASCII versions. This will *not* help with 
   the insane OS X cases of UTF-8 normalization: to actually get that you 
   need to make sure that the hash-function for names and the comparison 
   functions work correctly with those more complex cases.

   So _conceptually_ this should all work for UTF-8 normalization 'case' 
   insensitivity too or on just generally utf-8 cases, but I didn't 
   actually do that more complex case. That's a separate area, and will 
   not affect the core logic.

And then one final caveat: I think the patch-series is fairly clean, and 
each patch in itself is pretty simple, but my testing has been limited, 
and not only haven't I extended the case-insensitivity to all operations, 
but I would suggest some care from people who test this.

But if you care about the crazy OS X UTF-8 normalization (which apparently 
can happen even on otherwise case-sensitive filesystems), or if you care 
about the more regular case insensitivity of HFS+ or Windows, you need to 
test this - even if you tests right now should be limited to US-ASCII 
only. Because I'll happily fix issues, but I'm not using those crap 
filesystems myself, nor will I be doing any more testing on VFAT unless 
people actually point out issues to me.

So it's up to you users of crap OS's to test the cases and make good 
reports, and I'll care just because I think it's a somewhat interesting 
problem (it's why I did this series), but I'll never do anything about 
this without prodding and good reports. Ok?

Junio: I think this is safe, if only because it's all so very 
straight-forward, and I tried very hard to make each change trivial and 
limited and fairly easy to understand. Some of the patches are pure 
cleanups that I hit when looking at the code and wanted to fix before I 
even made any other changes.

			Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields
  2008-03-22 17:21 [PATCH 0/7] Case-insensitive filesystem support, take 1 Linus Torvalds
@ 2008-03-22 17:22 ` Linus Torvalds
  2008-03-22 17:25   ` [PATCH 2/7] Move name hashing functions into a file of its own Linus Torvalds
  2008-03-22 17:36   ` [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields Johannes Schindelin
  2008-03-22 22:01 ` [PATCH] t0050: Set core.ignorecase case to activate case insensitivity Steffen Prohaska
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 17:22 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov


From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Fri, 21 Mar 2008 13:14:47 -0700

Instead of wasting space with whole integers for a single bit.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This is really unrelated to the rest of the series, except for the fact 
that it irritated me when I was thinking about the required changes to 
unpack-trees.

 unpack-trees.h |   20 ++++++++++----------
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/unpack-trees.h b/unpack-trees.h
index 50453ed..ad8cc65 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -9,16 +9,16 @@ typedef int (*merge_fn_t)(struct cache_entry **src,
 		struct unpack_trees_options *options);
 
 struct unpack_trees_options {
-	int reset;
-	int merge;
-	int update;
-	int index_only;
-	int nontrivial_merge;
-	int trivial_merges_only;
-	int verbose_update;
-	int aggressive;
-	int skip_unmerged;
-	int gently;
+	unsigned int reset:1,
+		     merge:1,
+		     update:1,
+		     index_only:1,
+		     nontrivial_merge:1,
+		     trivial_merges_only:1,
+		     verbose_update:1,
+		     aggressive:1,
+		     skip_unmerged:1,
+		     gently:1;
 	const char *prefix;
 	int pos;
 	struct dir_struct *dir;
-- 
1.5.5.rc0.28.g61a0.dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/7] Move name hashing functions into a file of its own
  2008-03-22 17:22 ` [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields Linus Torvalds
@ 2008-03-22 17:25   ` Linus Torvalds
  2008-03-22 17:28     ` [PATCH 3/7] Make "index_name_exists()" return the cache_entry it found Linus Torvalds
  2008-03-22 17:36   ` [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields Johannes Schindelin
  1 sibling, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 17:25 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov


From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Fri, 21 Mar 2008 13:16:24 -0700

It's really totally separate functionality, and if we want to start
doing case-insensitive hash lookups, I'd rather do it when it's
separated out.

It also renames "remove_index_entry()" to "remove_name_hash()", because 
that really describes the thing better. It doesn't actually remove the 
index entry, that's done by "remove_index_entry_at()", which is something 
very different, despite the similarity in names.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This makes no code changes what-so-ever apart from the movement/renaming, 
just moves things around a bit, and makes a function that used to be 
static (add_name_hash()) be exported because it's now in a different file.

 Makefile            |    1 +
 builtin-read-tree.c |    2 +-
 cache.h             |   31 ++++++++++++----------
 name-hash.c         |   73 +++++++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.c        |   65 ++-------------------------------------------
 5 files changed, 95 insertions(+), 77 deletions(-)
 create mode 100644 name-hash.c

diff --git a/Makefile b/Makefile
index 7c70b00..6d35662 100644
--- a/Makefile
+++ b/Makefile
@@ -421,6 +421,7 @@ LIB_OBJS += log-tree.o
 LIB_OBJS += mailmap.o
 LIB_OBJS += match-trees.o
 LIB_OBJS += merge-file.o
+LIB_OBJS += name-hash.o
 LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-revindex.o
diff --git a/builtin-read-tree.c b/builtin-read-tree.c
index e9cfd2b..7ac3088 100644
--- a/builtin-read-tree.c
+++ b/builtin-read-tree.c
@@ -40,7 +40,7 @@ static int read_cache_unmerged(void)
 	for (i = 0; i < active_nr; i++) {
 		struct cache_entry *ce = active_cache[i];
 		if (ce_stage(ce)) {
-			remove_index_entry(ce);
+			remove_name_hash(ce);
 			if (last && !strcmp(ce->name, last->name))
 				continue;
 			cache_tree_invalidate_path(active_cache_tree, ce->name);
diff --git a/cache.h b/cache.h
index 2a1e7ec..2afc788 100644
--- a/cache.h
+++ b/cache.h
@@ -153,20 +153,6 @@ static inline void copy_cache_entry(struct cache_entry *dst, struct cache_entry
 	dst->ce_flags = (dst->ce_flags & ~CE_STATE_MASK) | state;
 }
 
-/*
- * We don't actually *remove* it, we can just mark it invalid so that
- * we won't find it in lookups.
- *
- * Not only would we have to search the lists (simple enough), but
- * we'd also have to rehash other hash buckets in case this makes the
- * hash bucket empty (common). So it's much better to just mark
- * it.
- */
-static inline void remove_index_entry(struct cache_entry *ce)
-{
-	ce->ce_flags |= CE_UNHASHED;
-}
-
 static inline unsigned create_ce_flags(size_t len, unsigned stage)
 {
 	if (len >= CE_NAMEMASK)
@@ -241,6 +227,23 @@ struct index_state {
 
 extern struct index_state the_index;
 
+/* Name hashing */
+extern void add_name_hash(struct index_state *istate, struct cache_entry *ce);
+/*
+ * We don't actually *remove* it, we can just mark it invalid so that
+ * we won't find it in lookups.
+ *
+ * Not only would we have to search the lists (simple enough), but
+ * we'd also have to rehash other hash buckets in case this makes the
+ * hash bucket empty (common). So it's much better to just mark
+ * it.
+ */
+static inline void remove_name_hash(struct cache_entry *ce)
+{
+	ce->ce_flags |= CE_UNHASHED;
+}
+
+
 #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS
 #define active_cache (the_index.cache)
 #define active_nr (the_index.cache_nr)
diff --git a/name-hash.c b/name-hash.c
new file mode 100644
index 0000000..e56eb16
--- /dev/null
+++ b/name-hash.c
@@ -0,0 +1,73 @@
+/*
+ * name-hash.c
+ *
+ * Hashing names in the index state
+ *
+ * Copyright (C) 2008 Linus Torvalds
+ */
+#define NO_THE_INDEX_COMPATIBILITY_MACROS
+#include "cache.h"
+
+static unsigned int hash_name(const char *name, int namelen)
+{
+	unsigned int hash = 0x123;
+
+	do {
+		unsigned char c = *name++;
+		hash = hash*101 + c;
+	} while (--namelen);
+	return hash;
+}
+
+static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
+{
+	void **pos;
+	unsigned int hash;
+
+	if (ce->ce_flags & CE_HASHED)
+		return;
+	ce->ce_flags |= CE_HASHED;
+	ce->next = NULL;
+	hash = hash_name(ce->name, ce_namelen(ce));
+	pos = insert_hash(hash, ce, &istate->name_hash);
+	if (pos) {
+		ce->next = *pos;
+		*pos = ce;
+	}
+}
+
+static void lazy_init_name_hash(struct index_state *istate)
+{
+	int nr;
+
+	if (istate->name_hash_initialized)
+		return;
+	for (nr = 0; nr < istate->cache_nr; nr++)
+		hash_index_entry(istate, istate->cache[nr]);
+	istate->name_hash_initialized = 1;
+}
+
+void add_name_hash(struct index_state *istate, struct cache_entry *ce)
+{
+	ce->ce_flags &= ~CE_UNHASHED;
+	if (istate->name_hash_initialized)
+		hash_index_entry(istate, ce);
+}
+
+int index_name_exists(struct index_state *istate, const char *name, int namelen)
+{
+	unsigned int hash = hash_name(name, namelen);
+	struct cache_entry *ce;
+
+	lazy_init_name_hash(istate);
+	ce = lookup_hash(hash, &istate->name_hash);
+
+	while (ce) {
+		if (!(ce->ce_flags & CE_UNHASHED)) {
+			if (!cache_name_compare(name, namelen, ce->name, ce->ce_flags))
+				return 1;
+		}
+		ce = ce->next;
+	}
+	return 0;
+}
diff --git a/read-cache.c b/read-cache.c
index a92b25b..5dc998d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -23,80 +23,21 @@
 
 struct index_state the_index;
 
-static unsigned int hash_name(const char *name, int namelen)
-{
-	unsigned int hash = 0x123;
-
-	do {
-		unsigned char c = *name++;
-		hash = hash*101 + c;
-	} while (--namelen);
-	return hash;
-}
-
-static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
-{
-	void **pos;
-	unsigned int hash;
-
-	if (ce->ce_flags & CE_HASHED)
-		return;
-	ce->ce_flags |= CE_HASHED;
-	ce->next = NULL;
-	hash = hash_name(ce->name, ce_namelen(ce));
-	pos = insert_hash(hash, ce, &istate->name_hash);
-	if (pos) {
-		ce->next = *pos;
-		*pos = ce;
-	}
-}
-
-static void lazy_init_name_hash(struct index_state *istate)
-{
-	int nr;
-
-	if (istate->name_hash_initialized)
-		return;
-	for (nr = 0; nr < istate->cache_nr; nr++)
-		hash_index_entry(istate, istate->cache[nr]);
-	istate->name_hash_initialized = 1;
-}
-
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
-	ce->ce_flags &= ~CE_UNHASHED;
 	istate->cache[nr] = ce;
-	if (istate->name_hash_initialized)
-		hash_index_entry(istate, ce);
+	add_name_hash(istate, ce);
 }
 
 static void replace_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	struct cache_entry *old = istate->cache[nr];
 
-	remove_index_entry(old);
+	remove_name_hash(old);
 	set_index_entry(istate, nr, ce);
 	istate->cache_changed = 1;
 }
 
-int index_name_exists(struct index_state *istate, const char *name, int namelen)
-{
-	unsigned int hash = hash_name(name, namelen);
-	struct cache_entry *ce;
-
-	lazy_init_name_hash(istate);
-	ce = lookup_hash(hash, &istate->name_hash);
-
-	while (ce) {
-		if (!(ce->ce_flags & CE_UNHASHED)) {
-			if (!cache_name_compare(name, namelen, ce->name, ce->ce_flags))
-				return 1;
-		}
-		ce = ce->next;
-	}
-	return 0;
-}
-
 /*
  * This only updates the "non-critical" parts of the directory
  * cache, ie the parts that aren't tracked by GIT, and only used
@@ -438,7 +379,7 @@ int remove_index_entry_at(struct index_state *istate, int pos)
 {
 	struct cache_entry *ce = istate->cache[pos];
 
-	remove_index_entry(ce);
+	remove_name_hash(ce);
 	istate->cache_changed = 1;
 	istate->cache_nr--;
 	if (pos >= istate->cache_nr)
-- 
1.5.5.rc0.28.g61a0.dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/7] Make "index_name_exists()" return the cache_entry it found
  2008-03-22 17:25   ` [PATCH 2/7] Move name hashing functions into a file of its own Linus Torvalds
@ 2008-03-22 17:28     ` Linus Torvalds
  2008-03-22 17:30       ` [PATCH 4/7] Make hash_name_lookup able to do case-independent lookups Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 17:28 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov


From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Fri, 21 Mar 2008 15:53:00 -0700

This allows verify_absent() in unpack_trees() to use the hash chains
rather than looking it up using the binary search.

Perhaps more imporantly, it's also going to be useful for the next phase, 
where we actually start looking at the cache entry when we do 
case-insensitive lookups and checking the result.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
No real change, except verify_absent() can now use the name hashing rather 
than the binary search. But it's all still very much case-sensitive.

 cache.h        |    2 +-
 name-hash.c    |    6 +++---
 unpack-trees.c |    8 ++++----
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index 2afc788..76d95d2 100644
--- a/cache.h
+++ b/cache.h
@@ -353,7 +353,7 @@ extern int write_index(const struct index_state *, int newfd);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
-extern int index_name_exists(struct index_state *istate, const char *name, int namelen);
+extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen);
 extern int index_name_pos(const struct index_state *, const char *name, int namelen);
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */
diff --git a/name-hash.c b/name-hash.c
index e56eb16..2678148 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -54,7 +54,7 @@ void add_name_hash(struct index_state *istate, struct cache_entry *ce)
 		hash_index_entry(istate, ce);
 }
 
-int index_name_exists(struct index_state *istate, const char *name, int namelen)
+struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen)
 {
 	unsigned int hash = hash_name(name, namelen);
 	struct cache_entry *ce;
@@ -65,9 +65,9 @@ int index_name_exists(struct index_state *istate, const char *name, int namelen)
 	while (ce) {
 		if (!(ce->ce_flags & CE_UNHASHED)) {
 			if (!cache_name_compare(name, namelen, ce->name, ce->ce_flags))
-				return 1;
+				return ce;
 		}
 		ce = ce->next;
 	}
-	return 0;
+	return NULL;
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index a59f475..ca4c845 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -538,6 +538,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 	if (!lstat(ce->name, &st)) {
 		int cnt;
 		int dtype = ce_to_dtype(ce);
+		struct cache_entry *result;
 
 		if (o->dir && excluded(o->dir, ce->name, &dtype))
 			/*
@@ -581,10 +582,9 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 		 * delete this path, which is in a subdirectory that
 		 * is being replaced with a blob.
 		 */
-		cnt = index_name_pos(&o->result, ce->name, strlen(ce->name));
-		if (0 <= cnt) {
-			struct cache_entry *ce = o->result.cache[cnt];
-			if (ce->ce_flags & CE_REMOVE)
+		result = index_name_exists(&o->result, ce->name, ce_namelen(ce));
+		if (result) {
+			if (result->ce_flags & CE_REMOVE)
 				return 0;
 		}
 
-- 
1.5.5.rc0.28.g61a0.dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/7] Make hash_name_lookup able to do case-independent lookups
  2008-03-22 17:28     ` [PATCH 3/7] Make "index_name_exists()" return the cache_entry it found Linus Torvalds
@ 2008-03-22 17:30       ` Linus Torvalds
  2008-03-22 17:33         ` [PATCH 5/7] Add 'core.ignorecase' option Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 17:30 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov


From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Fri, 21 Mar 2008 15:55:19 -0700

Right now nobody uses it, but "index_name_exists()" gets a flag so
you can enable it on a case-by-case basis.

Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Oooh.. We actually have some (admittedly stupid) case insensitivity code 
starting to appear. So we now hash the names insensitively, and we have 
the _capability_ to do case-insensitive lookups, but nobody actually uses 
that insensitive lookup capability yet.

But things are now starting to get interesting.

 cache.h        |    4 ++--
 dir.c          |    2 +-
 name-hash.c    |   50 ++++++++++++++++++++++++++++++++++++++++++++++++--
 unpack-trees.c |    2 +-
 4 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/cache.h b/cache.h
index 76d95d2..a9ddaa1 100644
--- a/cache.h
+++ b/cache.h
@@ -264,7 +264,7 @@ static inline void remove_name_hash(struct cache_entry *ce)
 #define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL)
 #define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options))
 #define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options))
-#define cache_name_exists(name, namelen) index_name_exists(&the_index, (name), (namelen))
+#define cache_name_exists(name, namelen, igncase) index_name_exists(&the_index, (name), (namelen), (igncase))
 #endif
 
 enum object_type {
@@ -353,7 +353,7 @@ extern int write_index(const struct index_state *, int newfd);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
-extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen);
+extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int igncase);
 extern int index_name_pos(const struct index_state *, const char *name, int namelen);
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */
diff --git a/dir.c b/dir.c
index edc458e..7362e83 100644
--- a/dir.c
+++ b/dir.c
@@ -371,7 +371,7 @@ static struct dir_entry *dir_entry_new(const char *pathname, int len)
 
 struct dir_entry *dir_add_name(struct dir_struct *dir, const char *pathname, int len)
 {
-	if (cache_name_exists(pathname, len))
+	if (cache_name_exists(pathname, len, 0))
 		return NULL;
 
 	ALLOC_GROW(dir->entries, dir->nr+1, dir->alloc);
diff --git a/name-hash.c b/name-hash.c
index 2678148..2253870 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -8,12 +8,25 @@
 #define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
 
+/*
+ * This removes bit 5 if bit 6 is set.
+ *
+ * That will make US-ASCII characters hash to their upper-case
+ * equivalent. We could easily do this one whole word at a time,
+ * but that's for future worries.
+ */
+static inline unsigned char icase_hash(unsigned char c)
+{
+	return c & ~((c & 0x40) >> 1);
+}
+
 static unsigned int hash_name(const char *name, int namelen)
 {
 	unsigned int hash = 0x123;
 
 	do {
 		unsigned char c = *name++;
+		c = icase_hash(c);
 		hash = hash*101 + c;
 	} while (--namelen);
 	return hash;
@@ -54,7 +67,40 @@ void add_name_hash(struct index_state *istate, struct cache_entry *ce)
 		hash_index_entry(istate, ce);
 }
 
-struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen)
+static int slow_same_name(const char *name1, int len1, const char *name2, int len2)
+{
+	if (len1 != len2)
+		return 0;
+
+	while (len1) {
+		unsigned char c1 = *name1++;
+		unsigned char c2 = *name2++;
+		len1--;
+		if (c1 != c2) {
+			c1 = toupper(c1);
+			c2 = toupper(c2);
+			if (c1 != c2)
+				return 0;
+		}
+	}
+	return 1;
+}
+
+static int same_name(const struct cache_entry *ce, const char *name, int namelen, int icase)
+{
+	int len = ce_namelen(ce);
+
+	/*
+	 * Always fo exact compare (even if we want a case-ignoring comparison
+	 * we do the quick exact one first, because it will be the common case).
+	 */
+	if (len == namelen && !cache_name_compare(name, namelen, ce->name, len))
+		return 1;
+
+	return icase && slow_same_name(name, namelen, ce->name, len);
+}
+
+struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)
 {
 	unsigned int hash = hash_name(name, namelen);
 	struct cache_entry *ce;
@@ -64,7 +110,7 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
 
 	while (ce) {
 		if (!(ce->ce_flags & CE_UNHASHED)) {
-			if (!cache_name_compare(name, namelen, ce->name, ce->ce_flags))
+			if (same_name(ce, name, namelen, icase))
 				return ce;
 		}
 		ce = ce->next;
diff --git a/unpack-trees.c b/unpack-trees.c
index ca4c845..bf7d8f6 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -582,7 +582,7 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 		 * delete this path, which is in a subdirectory that
 		 * is being replaced with a blob.
 		 */
-		result = index_name_exists(&o->result, ce->name, ce_namelen(ce));
+		result = index_name_exists(&o->result, ce->name, ce_namelen(ce), 0);
 		if (result) {
 			if (result->ce_flags & CE_REMOVE)
 				return 0;
-- 
1.5.5.rc0.28.g61a0.dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/7] Add 'core.ignorecase' option
  2008-03-22 17:30       ` [PATCH 4/7] Make hash_name_lookup able to do case-independent lookups Linus Torvalds
@ 2008-03-22 17:33         ` Linus Torvalds
  2008-03-22 17:38           ` [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 17:33 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov


From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Fri, 21 Mar 2008 16:52:46 -0700

..and start using it for directory entry traversal (ie "git status" will
not consider entries that match an existing entry case-insensitively to
be a new file)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Basically all of this patch is just setting up the "ignore_case" variable 
(default to false), but this one-liner in "dir_add_name()" is the actual 
meat of it all:

	-	if (cache_name_exists(pathname, len, 0))
	+	if (cache_name_exists(pathname, len, ignore_case))

because it now means that we will ignore unknown directory entries that 
match already-known files if they match case-insensitively and we have 
core.ignorecase being set.

So this actually starts using the case insensitivity logic, but it's for a 
really quite trivial and not very interesting case. 

 cache.h       |    1 +
 config.c      |    5 +++++
 dir.c         |    2 +-
 environment.c |    1 +
 4 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/cache.h b/cache.h
index a9ddaa1..9bce723 100644
--- a/cache.h
+++ b/cache.h
@@ -407,6 +407,7 @@ extern int delete_ref(const char *, const unsigned char *sha1);
 extern int trust_executable_bit;
 extern int quote_path_fully;
 extern int has_symlinks;
+extern int ignore_case;
 extern int assume_unchanged;
 extern int prefer_symlink_refs;
 extern int log_all_ref_updates;
diff --git a/config.c b/config.c
index 0624494..3d51868 100644
--- a/config.c
+++ b/config.c
@@ -342,6 +342,11 @@ int git_default_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.ignorecase")) {
+		ignore_case = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.bare")) {
 		is_bare_repository_cfg = git_config_bool(var, value);
 		return 0;
diff --git a/dir.c b/dir.c
index 7362e83..b5bfbca 100644
--- a/dir.c
+++ b/dir.c
@@ -371,7 +371,7 @@ static struct dir_entry *dir_entry_new(const char *pathname, int len)
 
 struct dir_entry *dir_add_name(struct dir_struct *dir, const char *pathname, int len)
 {
-	if (cache_name_exists(pathname, len, 0))
+	if (cache_name_exists(pathname, len, ignore_case))
 		return NULL;
 
 	ALLOC_GROW(dir->entries, dir->nr+1, dir->alloc);
diff --git a/environment.c b/environment.c
index 6739a3f..5fcd5b2 100644
--- a/environment.c
+++ b/environment.c
@@ -14,6 +14,7 @@ char git_default_name[MAX_GITNAME];
 int trust_executable_bit = 1;
 int quote_path_fully = 1;
 int has_symlinks = 1;
+int ignore_case = 0;
 int assume_unchanged;
 int prefer_symlink_refs;
 int is_bare_repository_cfg = -1; /* unspecified */
-- 
1.5.5.rc0.28.g61a0.dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields
  2008-03-22 17:22 ` [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields Linus Torvalds
  2008-03-22 17:25   ` [PATCH 2/7] Move name hashing functions into a file of its own Linus Torvalds
@ 2008-03-22 17:36   ` Johannes Schindelin
  2008-03-22 17:47     ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Johannes Schindelin @ 2008-03-22 17:36 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List, Frank, Dmitry Potapov

Hi,

On Sat, 22 Mar 2008, Linus Torvalds wrote:

> From: Linus Torvalds <torvalds@woody.linux-foundation.org>
> Date: Fri, 21 Mar 2008 13:14:47 -0700

As this patch is really unrelated to the series, this comment is really 
unrelated to the content of the patch ;-)

Any point in doing the "From:" and "Date:" inside the mail body?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems
  2008-03-22 17:33         ` [PATCH 5/7] Add 'core.ignorecase' option Linus Torvalds
@ 2008-03-22 17:38           ` Linus Torvalds
  2008-03-22 17:45             ` [PATCH 7/7] Make unpack-tree update removed files before any updated files Linus Torvalds
  2008-03-23  6:13             ` [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems Junio C Hamano
  0 siblings, 2 replies; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 17:38 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov


From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Sat, 22 Mar 2008 09:35:59 -0700

If we find an unexpected file, see if that filename perhaps exists in a
case-insensitive way in the index, and whether the file matches that. If
so, ignore it as a known pre-existing file of a different name.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

All right, this is *it*. This is the actual core code that does something 
interesting.

I've tried to explain the behaviour in the comment, and let's face it, the 
patch is really really simple (yeah, 26 new lines but they are all really 
trivial and over half of them of them are actually the comments about 
what is going on).

The core of the code itself is just two lines, really, but it's all 
wrapped in a helper function and I tried to make it be really really 
obvious what is going on!

The reason why "src_index" had to become non-const is stupid: it's not 
because we actually do anything that really writes to the index, but the 
lazy index name hashing code means that even just a name lookup will 
possibly create the name hash in the index.

Oh well.

 unpack-trees.c |   26 ++++++++++++++++++++++++++
 unpack-trees.h |    2 +-
 2 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index bf7d8f6..95d3413 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -521,6 +521,22 @@ static int verify_clean_subdirectory(struct cache_entry *ce, const char *action,
 }
 
 /*
+ * This gets called when there was no index entry for the tree entry 'dst',
+ * but we found a file in the working tree that 'lstat()' said was fine,
+ * and we're on a case-insensitive filesystem.
+ *
+ * See if we can find a case-insensitive match in the index that also
+ * matches the stat information, and assume it's that other file!
+ */
+static int icase_exists(struct unpack_trees_options *o, struct cache_entry *dst, struct stat *st)
+{
+	struct cache_entry *src;
+
+	src = index_name_exists(o->src_index, dst->name, ce_namelen(dst), 1);
+	return src && !ie_match_stat(o->src_index, src, st, CE_MATCH_IGNORE_VALID);
+}
+
+/*
  * We do not want to remove or overwrite a working tree file that
  * is not tracked, unless it is ignored.
  */
@@ -540,6 +556,16 @@ static int verify_absent(struct cache_entry *ce, const char *action,
 		int dtype = ce_to_dtype(ce);
 		struct cache_entry *result;
 
+		/*
+		 * It may be that the 'lstat()' succeeded even though
+		 * target 'ce' was absent, because there is an old
+		 * entry that is different only in case..
+		 *
+		 * Ignore that lstat() if it matches.
+		 */
+		if (ignore_case && icase_exists(o, ce, &st))
+			return 0;
+
 		if (o->dir && excluded(o->dir, ce->name, &dtype))
 			/*
 			 * ce->name is explicitly excluded, so it is Ok to
diff --git a/unpack-trees.h b/unpack-trees.h
index ad8cc65..d436d6c 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -31,7 +31,7 @@ struct unpack_trees_options {
 	void *unpack_data;
 
 	struct index_state *dst_index;
-	const struct index_state *src_index;
+	struct index_state *src_index;
 	struct index_state result;
 };
 
-- 
1.5.5.rc0.28.g61a0.dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 7/7] Make unpack-tree update removed files before any updated files
  2008-03-22 17:38           ` [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems Linus Torvalds
@ 2008-03-22 17:45             ` Linus Torvalds
  2008-03-22 18:06               ` [PATCH 0/7] Final words Linus Torvalds
                                 ` (2 more replies)
  2008-03-23  6:13             ` [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems Junio C Hamano
  1 sibling, 3 replies; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 17:45 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov


From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Sat, 22 Mar 2008 09:48:41 -0700

This is immaterial on sane filesystems, but if you have a broken (aka
case-insensitive) filesystem, and the objective is to remove the file
'abc' and replace it with the file 'Abc', then we must make sure to do
the removal first.

Otherwise, you'd first update the file 'Abc' - which would just
overwrite the file 'abc' due to the broken case-insensitive filesystem -
and then remove file 'abc' - which would now brokenly remove the just
updated file 'Abc' on that broken filesystem.

By doing removals first, this won't happen.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Ok, this one looks - and is, really - trivial, but it's actually the only 
one in the whole series that I'm even remotely nervous about. First off, 
it actually does what it does regardless of that "core.ignorecase" 
variable, but that wouldn't worry me if it wasn't for the fact that I 
don't remember/understand what the heck that "last_symlink" logic was 
there for.

I *think* the logic is only for removal, and that splitting up the single 
loop to be two loops is totally safe and actually cleans things up, but I 
really want somebody to take a look at this. 

This patch is important, because without it you can't reliably switch 
between branches with case-aliases on a case-insensitive filesystem, and 
strictly speaking I should have put it before the previous patch, but I 
put it last because of this worry of mine. Patches 1-6 I think are totally 
obvious and ready to go at any point. This one I really want Junio to 
double-check.

The patch itself is really really trivial. We used to do both removal and 
file updates in one phase, we just split it into two. So I don't worry 
about the code, I only worry about that "last_symlink" thing.

Anyway, this concludes the series. It's not _complete_ - the 
case-independent compare function is a joke and only gets US-ASCII 
correct, and there are other cases we will want to fix up. But in the end, 
I think this series of seven trivial patches really does make git somewhat 
aware of broken filesystems at a very core level.

 unpack-trees.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 95d3413..feae846 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -79,16 +79,21 @@ static int check_updates(struct unpack_trees_options *o)
 	for (i = 0; i < index->cache_nr; i++) {
 		struct cache_entry *ce = index->cache[i];
 
-		if (ce->ce_flags & (CE_UPDATE | CE_REMOVE))
-			display_progress(progress, ++cnt);
 		if (ce->ce_flags & CE_REMOVE) {
+			display_progress(progress, ++cnt);
 			if (o->update)
 				unlink_entry(ce->name, last_symlink);
 			remove_index_entry_at(&o->result, i);
 			i--;
 			continue;
 		}
+	}
+
+	for (i = 0; i < index->cache_nr; i++) {
+		struct cache_entry *ce = index->cache[i];
+
 		if (ce->ce_flags & CE_UPDATE) {
+			display_progress(progress, ++cnt);
 			ce->ce_flags &= ~CE_UPDATE;
 			if (o->update) {
 				errs |= checkout_entry(ce, &state, NULL);
-- 
1.5.5.rc0.28.g61a0.dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields
  2008-03-22 17:36   ` [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields Johannes Schindelin
@ 2008-03-22 17:47     ` Linus Torvalds
  2008-03-22 17:57       ` Johannes Schindelin
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 17:47 UTC (permalink / raw
  To: Johannes Schindelin
  Cc: Junio C Hamano, Git Mailing List, Frank, Dmitry Potapov



On Sat, 22 Mar 2008, Johannes Schindelin wrote:
> 
> Any point in doing the "From:" and "Date:" inside the mail body?

I encourage people to do that for the kernel ("From:" in particular), 
because while git-am will pick them up from the email headers, that is 
only true if the emails don't get passed around and commented upon by 
others.

Now, in git, the chain-of-command is very short (everybody -> Junio), so 
it really doesn't matter, but in the kernel, when people send out patches 
like this, the patches may be forwarded by others (who add their sign-off 
lines etc), and then it's really good to have that "From:" line in 
particular in the body of the email, because it's more likely that it will 
remain there (otherwise we depend on the person who signs off and forwards 
it to add it!)

			Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields
  2008-03-22 17:47     ` Linus Torvalds
@ 2008-03-22 17:57       ` Johannes Schindelin
  0 siblings, 0 replies; 29+ messages in thread
From: Johannes Schindelin @ 2008-03-22 17:57 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List, Frank, Dmitry Potapov

Hi,

On Sat, 22 Mar 2008, Linus Torvalds wrote:

> On Sat, 22 Mar 2008, Johannes Schindelin wrote:
> 
> > Any point in doing the "From:" and "Date:" inside the mail body?
> 
> I encourage people to do that for the kernel ("From:" in particular), 
> because while git-am will pick them up from the email headers, that is 
> only true if the emails don't get passed around and commented upon by 
> others.

Okay.

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 0/7] Final words
  2008-03-22 17:45             ` [PATCH 7/7] Make unpack-tree update removed files before any updated files Linus Torvalds
@ 2008-03-22 18:06               ` Linus Torvalds
  2008-03-22 18:28                 ` Linus Torvalds
  2008-03-22 21:19               ` [PATCH 8/7] When adding files to the index, add support for case-independent matches Linus Torvalds
  2008-03-23  5:49               ` [PATCH 7/7] Make unpack-tree update removed files before any updated files Junio C Hamano
  2 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 18:06 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov



So the whole patch series looks like this:

	 Makefile            |    1 +
	 builtin-read-tree.c |    2 +-
	 cache.h             |   36 +++++++++-------
	 config.c            |    5 ++
	 dir.c               |    2 +-
	 environment.c       |    1 +
	 name-hash.c         |  119 +++++++++++++++++++++++++++++++++++++++++++++++++++
	 read-cache.c        |   65 +--------------------------
	 unpack-trees.c      |   43 ++++++++++++++++---
	 unpack-trees.h      |   22 +++++-----
	 10 files changed, 199 insertions(+), 97 deletions(-)
	 create mode 100644 name-hash.c

and clearly does add more lines than it deletes, but it all really is 
pretty simple, and none of this is rocket science or even very intrusive. 
What took me longest to do was not the actual code itself, but to get 
_just_ the right approach so that the end result would be as simple and 
nonintrusive as possible. That core patch 6/7 was redone at least ten 
times before I was happy with it.

Anyway, perhaps exactly because I tried very hard to make it all make 
sense, I'm actually very very happy with the patch. I suspect it's too 
late for v1.5.5 even if I think all the patches are really simple, but I'm 
hoping it can go into at least "pu" and have people actually *test* it.

Talking about testing, the kind of safety I wanted to get with this patch 
is perhaps best described by the tests I did not on case-insensitive 
filesystems, but on regular *good* filesystems together with setting the 
"core.ignorecase" config variable.

Here's an example of how that patch 6/7 works and tries to be really 
careful even on a case-sensitive filesystem:

	mkdir test-case
	cd test-case
	git init
	git config core.ignorecase true

	echo "File" > File
	git add File
	git commit -m "Create 'File'"

	git checkout -b other
	git rm File
	echo "file" > file
	git add file
	git commit -m "Create 'file'"

	echo "File" > File
	git checkout master

and now it complains about

	error: Untracked working tree file 'File' would be overwritten by merge.

which is correct, because while it is doing its case-insensitivity checks, 
it also noticed that "File" did *not* match the stat information for 
'file', so it really _is_ an untracked working tree file.

So it's actually trying to be a lot more careful than just saying "ok, we 
already know about 'File'". See what happens next:

	rm File
	ln file File
	git checkout master

and now it very happily did the switch to master, even though 'File' got 
overwritten, because now it again found that untracked file 'File', but 
now it could match it up *exactly* against the case-insensitive file 
'file', so git was happy that it wasn't actually throwing away any info, 
and the fact that it overwrite 'File' was ok, because it considered it the 
same file as 'file'.

So the whole thing is not only able to handle these name aliases, it 
actually handles them by checking that it's safe.

Final note: I also did notice that I didn't fix the 'git add" case like I 
thought I did, it currently only fixes "git status". So I still want to 
fix "git add" and "git mv" to do the right thing when there are case- 
insensitive aliases, but that's a separate issue from this particular 
series..

		Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/7] Final words
  2008-03-22 18:06               ` [PATCH 0/7] Final words Linus Torvalds
@ 2008-03-22 18:28                 ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 18:28 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov



On Sat, 22 Mar 2008, Linus Torvalds wrote:
> 
> Final note: I also did notice that I didn't fix the 'git add" case like I 
> thought I did, it currently only fixes "git status". So I still want to 
> fix "git add" and "git mv" to do the right thing when there are case- 
> insensitive aliases, but that's a separate issue from this particular 
> series..

git-add will want more than this, but this is an example of what we should 
do - if 'ignore_case' is set, we probably should disallow adding the same 
case-insensitive name twice to the index.

This does *not* guarantee that the index never would have aliases when 
core.ignorecase is set, since the index might have been populated from a 
tree that was generated on a sane filesystem, and we still allow that, but 
things like this are probably good things to do for projects that want to 
work case-insensitively.

So even if you have a case-sensitive filesystem, the goal (I think) should 
be that you can set core.ignorecase to true, and that should help you work 
with other people who may be stuck on case-insensitive crud.

Anyway, the reason "git add" didn't actually work with the simple change 
to dir_add_name() is that "git add" doesn't load the index until *after* 
it has done the directory traversal (because it actually *wants* to see 
files that are already in the index). 

Something like this at least disallows the dual add if the case has 
changed.

		Linus

----
 read-cache.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 5dc998d..6aee6e0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -476,6 +476,13 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 		return 0;
 	}
 
+	if (ignore_case) {
+		struct cache_entry *alias;
+		alias = index_name_exists(istate, ce->name, ce_namelen(ce), 1);
+		if (alias)
+			die("Will not add file alias '%s' ('%s' already exists in index)", ce->name, alias->name);
+	}
+
 	if (index_path(ce->sha1, path, &st, 1))
 		die("unable to index file %s", path);
 	if (add_index_entry(istate, ce, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE))

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 8/7] When adding files to the index, add support for case-independent matches
  2008-03-22 17:45             ` [PATCH 7/7] Make unpack-tree update removed files before any updated files Linus Torvalds
  2008-03-22 18:06               ` [PATCH 0/7] Final words Linus Torvalds
@ 2008-03-22 21:19               ` Linus Torvalds
  2008-03-22 21:22                 ` [PATCH 9/7] Make git-add behave more sensibly in a case-insensitive environment Linus Torvalds
  2008-03-23  5:49               ` [PATCH 7/7] Make unpack-tree update removed files before any updated files Junio C Hamano
  2 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 21:19 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov


From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Date: Sat, 22 Mar 2008 13:19:49 -0700

This simplifies the matching case of "I already have this file and it is 
up-to-date" and makes it do the right thing in the face of 
case-insensitive aliases.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

This is patch 1/2 to make "git add" act better. Discard the previous 
simplistic and overly die()-eager patch that I hadn't signed off on.

 read-cache.c |   12 +++++-------
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 5dc998d..8c57adf 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -431,9 +431,9 @@ static int index_name_pos_also_unmerged(struct index_state *istate,
 
 int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 {
-	int size, namelen, pos;
+	int size, namelen;
 	struct stat st;
-	struct cache_entry *ce;
+	struct cache_entry *ce, *alias;
 	unsigned ce_option = CE_MATCH_IGNORE_VALID|CE_MATCH_RACY_IS_DIRTY;
 
 	if (lstat(path, &st))
@@ -466,13 +466,11 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 		ce->ce_mode = ce_mode_from_stat(ent, st.st_mode);
 	}
 
-	pos = index_name_pos(istate, ce->name, namelen);
-	if (0 <= pos &&
-	    !ce_stage(istate->cache[pos]) &&
-	    !ie_match_stat(istate, istate->cache[pos], &st, ce_option)) {
+	alias = index_name_exists(istate, ce->name, ce_namelen(ce), ignore_case);
+	if (alias && !ce_stage(alias) && !ie_match_stat(istate, alias, &st, ce_option)) {
 		/* Nothing changed, really */
 		free(ce);
-		ce_mark_uptodate(istate->cache[pos]);
+		ce_mark_uptodate(alias);
 		return 0;
 	}
 
-- 
1.5.5.rc0.31.gdcfd.dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 9/7] Make git-add behave more sensibly in a case-insensitive environment
  2008-03-22 21:19               ` [PATCH 8/7] When adding files to the index, add support for case-independent matches Linus Torvalds
@ 2008-03-22 21:22                 ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2008-03-22 21:22 UTC (permalink / raw
  To: Junio C Hamano, Git Mailing List; +Cc: Frank, Dmitry Potapov



From: Linus Torvalds <torvalds@woody.linux-foundation.org>
Subject: [PATCH 2/2] Make git-add behave more sensibly in a case-insensitive environment

This expands on the previous patch, and allows "git add" to sanely handle 
a filename that has changed case, keeping the case in the index constant, 
and avoiding aliases.

In particular, if you have an index entry called "File", but the
checked-out tree is case-corrupted and has an entry called "file"
instead, doing a

	git add .

(or naming "file" explicitly) will automatically notice that we have an
alias, and will replace the name "file" with the existing index
capitalization (ie "File").

However, if we actually have *both* a file called "File" and one called
"file", and they don't have the same lstat() information (ie we're on a
case-sensitive filesystem but have the "core.ignorecase" flag set), we
will error out if we try to add them both.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

The previous patch handled the "nothing changed" case, this one actually 
handles the case of the data needing to be updated.

The CE_ADDED flag is an in-memory flag that just protects a single "git 
add" invocation from changing the same alias twice. That would not be 
right, but if you do separate

	git add File
	git add file

commands, the second one will happily update the information that the 
first one added even if it was different - but

	git add File file

would be an error if they don't have the same stat() information.

 cache.h      |    1 +
 read-cache.c |   37 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 1 deletions(-)

diff --git a/cache.h b/cache.h
index 9bce723..81727e4 100644
--- a/cache.h
+++ b/cache.h
@@ -133,6 +133,7 @@ struct cache_entry {
 #define CE_UPDATE    (0x10000)
 #define CE_REMOVE    (0x20000)
 #define CE_UPTODATE  (0x40000)
+#define CE_ADDED     (0x80000)
 
 #define CE_HASHED    (0x100000)
 #define CE_UNHASHED  (0x200000)
diff --git a/read-cache.c b/read-cache.c
index 8c57adf..26ed644 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -429,6 +429,38 @@ static int index_name_pos_also_unmerged(struct index_state *istate,
 	return pos;
 }
 
+static int different_name(struct cache_entry *ce, struct cache_entry *alias)
+{
+	int len = ce_namelen(ce);
+	return ce_namelen(alias) != len || memcmp(ce->name, alias->name, len);
+}
+
+/*
+ * If we add a filename that aliases in the cache, we will use the
+ * name that we already have - but we don't want to update the same
+ * alias twice, because that implies that there were actually two
+ * different files with aliasing names!
+ *
+ * So we use the CE_ADDED flag to verify that the alias was an old
+ * one before we accept it as 
+ */
+static struct cache_entry *create_alias_ce(struct cache_entry *ce, struct cache_entry *alias)
+{
+	int len;
+	struct cache_entry *new;
+
+	if (alias->ce_flags & CE_ADDED)
+		die("Will not add file alias '%s' ('%s' already exists in index)", ce->name, alias->name);
+
+	/* Ok, create the new entry using the name of the existing alias */
+	len = ce_namelen(alias);
+	new = xcalloc(1, cache_entry_size(len));
+	memcpy(new->name, alias->name, len);
+	copy_cache_entry(new, ce);
+	free(ce);
+	return new;
+}
+
 int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 {
 	int size, namelen;
@@ -471,11 +503,14 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 		/* Nothing changed, really */
 		free(ce);
 		ce_mark_uptodate(alias);
+		alias->ce_flags |= CE_ADDED;
 		return 0;
 	}
-
 	if (index_path(ce->sha1, path, &st, 1))
 		die("unable to index file %s", path);
+	if (ignore_case && alias && different_name(ce, alias))
+		ce = create_alias_ce(ce, alias);
+	ce->ce_flags |= CE_ADDED;
 	if (add_index_entry(istate, ce, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE))
 		die("unable to add %s to index",path);
 	if (verbose)
-- 
1.5.5.rc0.31.gdcfd.dirty

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH] t0050: Set core.ignorecase case to activate case insensitivity
  2008-03-22 17:21 [PATCH 0/7] Case-insensitive filesystem support, take 1 Linus Torvalds
  2008-03-22 17:22 ` [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields Linus Torvalds
@ 2008-03-22 22:01 ` Steffen Prohaska
  2008-03-25  6:57 ` [PATCH] git-init: autodetect core.ignorecase Dmitry Potapov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 29+ messages in thread
From: Steffen Prohaska @ 2008-03-22 22:01 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List, Frank, Dmitry Potapov

Case insensitive file handling is only activated by
core.ignorecase = true.  Hence, we need to set it to give the tests
in t0050 a chance to succeed.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
---
 t/t0050-filesystem.sh |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

On Sat, 22 Mar 2008, Linus Torvalds wrote:

>  - I made this all conditional on
> 
> 	[core]
> 		ignorecase = true
> 
>    because obviously I'm not at all interested in penalizing sane 
>    filesystems. That said, I also worked at trying to make sure that it's 
>    safe and possible to do this on a case-sensitive filesystem in case you 
>    are working on a project that doesn't like case-sensitivity, so the 
>    "git status" and "git add ." kind of operations won't add aliases

With this commit applied test 2 of t0050 passes.  This is the minimal
change to make t0050 useful.  Eventually test_expect_failure should be
replaced with test_expect_success.

    Steffen

diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 3fbad77..cb109ff 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -34,6 +34,7 @@ test_expect_success 'see if we expect ' '
 
 test_expect_success "setup case tests" '
 
+	git config core.ignorecase true &&
 	touch camelcase &&
 	git add camelcase &&
 	git commit -m "initial" &&
-- 
1.5.4.4.613.gaa46e5

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Make unpack-tree update removed files before any updated files
  2008-03-22 17:45             ` [PATCH 7/7] Make unpack-tree update removed files before any updated files Linus Torvalds
  2008-03-22 18:06               ` [PATCH 0/7] Final words Linus Torvalds
  2008-03-22 21:19               ` [PATCH 8/7] When adding files to the index, add support for case-independent matches Linus Torvalds
@ 2008-03-23  5:49               ` Junio C Hamano
  2 siblings, 0 replies; 29+ messages in thread
From: Junio C Hamano @ 2008-03-23  5:49 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Frank, Dmitry Potapov

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Ok, this one looks - and is, really - trivial, but it's actually the only 
> one in the whole series that I'm even remotely nervous about. First off, 
> it actually does what it does regardless of that "core.ignorecase" 
> variable, but that wouldn't worry me if it wasn't for the fact that I 
> don't remember/understand what the heck that "last_symlink" logic was 
> there for.

last_symlink is just a cached information used by underlying
has_symlink_leading_path() function for optimization.

The motivation behind has_symlink_leading_path() is reasonably well
described in

 - f859c84: Add has_symlink_leading_path() function., 2007-05-11,

 - 64cab59: apply: do not get confused by symlinks in the middle,
   2007-05-11, and

 - 16a4c61: read-tree -m -u: avoid getting confused by intermediate
   symlinks., 2007-05-10

The short version is that:

 - sometimes we want to make sure a path a/b/c/d exists (or does not
   exist) in the work tree;

 - however, !lstat("a/b/c/d") is not quite it.  if a/b is a symlink in the
   work tree that points at somewhere that happens to have c/d underneath,
   !lstat() says "yeah, there is", but that one is _different_ from what
   checking out a cache entry a/b/c/d would produce (because in that case
   we will remove a/b symlink, create a/b/ directory and deposit blob b
   there).

 - So we often need to see if a given path has symlink component in the
   leading part in the work tree (e.g. given "a/b/c/d", we would need to
   check if any of "a", "a/b", "a/b/c" is a symlink).

The function has_symlink_leading_path() answers that question, and its
second argument is a buffer to cache "the last work tree path found to be
a symlink", so if you call it with "a/b/c/d" and then with "a/b/c/e" in
the above example situation, the second call can re-use the information
the first call found out, which is "a/b is a symlink".


I do not think your patch breaks the passing around of last_symlink cached
information.  Although the three commits I quoted above are all backed by
real-world breakage cases that they did fix, the issues they deal with are
indeed tricky cases.  Although your patch (the change in 7/7) should not
make any difference to the issues, thinking about them is already making
me feel nervous.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems
  2008-03-22 17:38           ` [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems Linus Torvalds
  2008-03-22 17:45             ` [PATCH 7/7] Make unpack-tree update removed files before any updated files Linus Torvalds
@ 2008-03-23  6:13             ` Junio C Hamano
  2008-03-23 15:41               ` Linus Torvalds
  1 sibling, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2008-03-23  6:13 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Frank, Dmitry Potapov

Linus Torvalds <torvalds@linux-foundation.org> writes:

> @@ -540,6 +556,16 @@ static int verify_absent(struct cache_entry *ce, const char *action,
>  		int dtype = ce_to_dtype(ce);
>  		struct cache_entry *result;
>  
> +		/*
> +		 * It may be that the 'lstat()' succeeded even though
> +		 * target 'ce' was absent, because there is an old
> +		 * entry that is different only in case..
> +		 *
> +		 * Ignore that lstat() if it matches.
> +		 */
> +		if (ignore_case && icase_exists(o, ce, &st))
> +			return 0;
> +

It may well be the case that this lstat() returning success was caused by
a ghost match with a file with different case, and I think it is the right
thing to say "no, it does not exist" if that is the case.

I wonder what happens when the file with the same case does exist that we
are trying to make sure is missing?

As far as I can tell, icase_exists() does not ask "does a file with this
name in different case exist, and a file with this exact case doesn't?"
but asks "does a file with this name, or another name that is different
only in case, exist?".

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems
  2008-03-23  6:13             ` [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems Junio C Hamano
@ 2008-03-23 15:41               ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2008-03-23 15:41 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Git Mailing List, Frank, Dmitry Potapov



On Sat, 22 Mar 2008, Junio C Hamano wrote:
> 
> I wonder what happens when the file with the same case does exist that we
> are trying to make sure is missing?

Can't happen. This whole code-path only triggers if the entry didn't exist 
in the index when we merge a tree.

So we know a priori that the source index didn't contain the thing.

> As far as I can tell, icase_exists() does not ask "does a file with this
> name in different case exist, and a file with this exact case doesn't?"
> but asks "does a file with this name, or another name that is different
> only in case, exist?".

Correct. But see the call chain - this thing is only called if index is 
NULL, ie "there was no entry in the index".

So in this case, the other comment (above "icase_exists()") talks about 
that:

	This gets called when there was no index entry for the tree entry 
	'dst', but we found a file in the working tree that 'lstat()' said 
	was fine, [...]

and you can verify that "verify_absent()" only gets called by things where 
"index" was NULL (only three callers, and two of them are expressly inside 
a "if (!old)" case, and the third one is right after a "if (index) return"
statement.

[ There's _one_ special case: the "index" thing may have been NULL not 
  because there was no path in the source index, but because we didn't 
  even look at the index in the first place! So strictly speaking, we 
  should have a test for "o->merge" being set, but afaik that must always 
  be true if we have "o->update" set, and again, this logic only triggers 
  for that case.

  So the only case that doesn't set "o->merge" to get the index is 
  "builtin-read-tree.c" when you do a plain tree-only merge, but that one 
  has

	if ((opts.update||opts.index_only) && !opts.merge)
		usage(read_tree_usage);

  to make sure that you cannot update the working tree without taking the 
  index into account ]

Anyway, I think it's all good. 

			Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] git-init: autodetect core.ignorecase
  2008-03-22 17:21 [PATCH 0/7] Case-insensitive filesystem support, take 1 Linus Torvalds
  2008-03-22 17:22 ` [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields Linus Torvalds
  2008-03-22 22:01 ` [PATCH] t0050: Set core.ignorecase case to activate case insensitivity Steffen Prohaska
@ 2008-03-25  6:57 ` Dmitry Potapov
  2008-03-25  9:59   ` Johannes Schindelin
  2008-03-25  8:14 ` [PATCH 0/7] Case-insensitive filesystem support, take 1 Dmitry Potapov
  2008-03-25 11:39 ` Derek Fawcus
  4 siblings, 1 reply; 29+ messages in thread
From: Dmitry Potapov @ 2008-03-25  6:57 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Git Mailing List, Linus Torvalds, Frank, Dmitry Potapov

We already detect if symbolic links are supported by the filesystem.
This patch adds autodetect for case-insensitive filesystems, such
as VFAT and others.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
---

This patch goes on top of lt/case-insensitive

 builtin-init-db.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/builtin-init-db.c b/builtin-init-db.c
index 79eaf8d..62f7c08 100644
--- a/builtin-init-db.c
+++ b/builtin-init-db.c
@@ -254,8 +254,8 @@ static int create_default_files(const char *git_dir, const char *template_path)
 			git_config_set("core.worktree", work_tree);
 	}
 
-	/* Check if symlink is supported in the work tree */
 	if (!reinit) {
+		/* Check if symlink is supported in the work tree */
 		path[len] = 0;
 		strcpy(path + len, "tXXXXXX");
 		if (!close(xmkstemp(path)) &&
@@ -266,6 +266,12 @@ static int create_default_files(const char *git_dir, const char *template_path)
 			unlink(path); /* good */
 		else
 			git_config_set("core.symlinks", "false");
+
+		/* Check if the filesystem is case-insensitive */
+		path[len] = 0;
+		strcpy(path + len, "CoNfIg");
+		if (access(path, F_OK))
+			git_config_set("core.ignorecase", "true");
 	}
 
 	return reinit;
-- 
1.5.5.rc1.10.g3948

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/7] Case-insensitive filesystem support, take 1
  2008-03-22 17:21 [PATCH 0/7] Case-insensitive filesystem support, take 1 Linus Torvalds
                   ` (2 preceding siblings ...)
  2008-03-25  6:57 ` [PATCH] git-init: autodetect core.ignorecase Dmitry Potapov
@ 2008-03-25  8:14 ` Dmitry Potapov
  2008-03-25 21:04   ` Linus Torvalds
  2008-03-25 11:39 ` Derek Fawcus
  4 siblings, 1 reply; 29+ messages in thread
From: Dmitry Potapov @ 2008-03-25  8:14 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List, Frank

On Sat, Mar 22, 2008 at 10:21:05AM -0700, Linus Torvalds wrote:
> 
>  - I've tested this series, both on a case-sensitive one (using hardlinks 
>    to test corner cases) and on a vfat filesystem under Linux (which is 
>    case-insensitive and *really* odd wrt case preservation - it remembers 
>    the name of removed files, so it preserves case even across removal and 
>    re-creation!)

I also have observed this problem with VFAT on Linux, but the effect was not
stable. It looks like old information is preserved somewhere in caches...

Anyway, I have tested this series of patches a bit on Windows and so far
I have found the following:

- merge different branches were two file names are only differ by case
  will cause that the result branch has two file names that differ only
  by case and one of them will be overwritten by the other and shown as
  modified in the worktree by git status.

- git status cares only about case-insensitivity only for files and not
  for directories. Thus, if case of letters in a directory name is changed
  then this directory will be shown as untracked.

- pattern specified in .gitignore are match as case-sensitive despite
  core.ignorecase set to true.

Personally, I don't care about any of the above issues much as I rarely
work on Windows and when I do, I always check that all filenames are in
low case except Makefile (and a few more exceptions). So, I have never
had any problem with using Git on case-insensitive system...

Dmitry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] git-init: autodetect core.ignorecase
  2008-03-25  6:57 ` [PATCH] git-init: autodetect core.ignorecase Dmitry Potapov
@ 2008-03-25  9:59   ` Johannes Schindelin
  2008-03-25 10:49     ` [PATCH v2] " Dmitry Potapov
  2008-03-25 11:03     ` [PATCH] " Dmitry Potapov
  0 siblings, 2 replies; 29+ messages in thread
From: Johannes Schindelin @ 2008-03-25  9:59 UTC (permalink / raw
  To: Dmitry Potapov; +Cc: Junio C Hamano, Git Mailing List, Linus Torvalds, Frank

Hi,

On Tue, 25 Mar 2008, Dmitry Potapov wrote:

> diff --git a/builtin-init-db.c b/builtin-init-db.c
> index 79eaf8d..62f7c08 100644
> --- a/builtin-init-db.c
> +++ b/builtin-init-db.c
> @@ -254,8 +254,8 @@ static int create_default_files(const char *git_dir, const char *template_path)
>  			git_config_set("core.worktree", work_tree);
>  	}
>  
> -	/* Check if symlink is supported in the work tree */
>  	if (!reinit) {
> +		/* Check if symlink is supported in the work tree */
>  		path[len] = 0;
>  		strcpy(path + len, "tXXXXXX");
>  		if (!close(xmkstemp(path)) &&
> @@ -266,6 +266,12 @@ static int create_default_files(const char *git_dir, const char *template_path)
>  			unlink(path); /* good */
>  		else
>  			git_config_set("core.symlinks", "false");
> +
> +		/* Check if the filesystem is case-insensitive */
> +		path[len] = 0;
> +		strcpy(path + len, "CoNfIg");
> +		if (access(path, F_OK))
> +			git_config_set("core.ignorecase", "true");

Clever!

Last time I checked, the "HEAD" file on VFAT was converted to "head" when 
the repository was initialised on Win32 (IIRC) and read on Linux (IIRC).  
Maybe this problem has gone away, but if not, it should definitely be 
fixed (depending on core.ignorecase).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2] git-init: autodetect core.ignorecase
  2008-03-25  9:59   ` Johannes Schindelin
@ 2008-03-25 10:49     ` Dmitry Potapov
  2008-03-25 11:03     ` [PATCH] " Dmitry Potapov
  1 sibling, 0 replies; 29+ messages in thread
From: Dmitry Potapov @ 2008-03-25 10:49 UTC (permalink / raw
  To: Junio C Hamano
  Cc: Johannes Schindelin, Git Mailing List, Linus Torvalds, Frank

We already detect if symbolic links are supported by the filesystem.
This patch adds autodetect for case-insensitive filesystems, such
as VFAT and others.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
---

There was a stupid mistake in a previous patch...  (Inadvertantly,
I sent the old version of my patch, which was incorrect, and did not
realize that until saw Johannes' reply to my email.)

 builtin-init-db.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/builtin-init-db.c b/builtin-init-db.c
index 79eaf8d..62f7c08 100644
--- a/builtin-init-db.c
+++ b/builtin-init-db.c
@@ -254,8 +254,8 @@ static int create_default_files(const char *git_dir, const char *template_path)
 			git_config_set("core.worktree", work_tree);
 	}
 
-	/* Check if symlink is supported in the work tree */
 	if (!reinit) {
+		/* Check if symlink is supported in the work tree */
 		path[len] = 0;
 		strcpy(path + len, "tXXXXXX");
 		if (!close(xmkstemp(path)) &&
@@ -266,6 +266,12 @@ static int create_default_files(const char *git_dir, const char *template_path)
 			unlink(path); /* good */
 		else
 			git_config_set("core.symlinks", "false");
+
+		/* Check if the filesystem is case-insensitive */
+		path[len] = 0;
+		strcpy(path + len, "CoNfIg");
+		if (!access(path, F_OK))
+			git_config_set("core.ignorecase", "true");
 	}
 
 	return reinit;
-- 
1.5.5.rc1.10.g3948

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] git-init: autodetect core.ignorecase
  2008-03-25  9:59   ` Johannes Schindelin
  2008-03-25 10:49     ` [PATCH v2] " Dmitry Potapov
@ 2008-03-25 11:03     ` Dmitry Potapov
  1 sibling, 0 replies; 29+ messages in thread
From: Dmitry Potapov @ 2008-03-25 11:03 UTC (permalink / raw
  To: Johannes Schindelin; +Cc: Git Mailing List

On Tue, Mar 25, 2008 at 10:59:00AM +0100, Johannes Schindelin wrote:
> 
> Last time I checked, the "HEAD" file on VFAT was converted to "head" when 
> the repository was initialised on Win32 (IIRC) and read on Linux (IIRC).  

It is result of mounting the VFAT disk on Linux with shortname=lower,
which is the default :( Perhaps, it was a reasonable default for DOS
days, but now it only creates troubles. I usually mount VFAT disks with
shortname=winnt on Linux, though some people prefer shortname=mixed.

Dmitry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/7] Case-insensitive filesystem support, take 1
  2008-03-22 17:21 [PATCH 0/7] Case-insensitive filesystem support, take 1 Linus Torvalds
                   ` (3 preceding siblings ...)
  2008-03-25  8:14 ` [PATCH 0/7] Case-insensitive filesystem support, take 1 Dmitry Potapov
@ 2008-03-25 11:39 ` Derek Fawcus
  2008-03-25 18:26   ` Jan Hudec
  4 siblings, 1 reply; 29+ messages in thread
From: Derek Fawcus @ 2008-03-25 11:39 UTC (permalink / raw
  To: git

On Sat, Mar 22, 2008 at 10:21:05AM -0700, Linus Torvalds wrote:
>    ... and on a vfat filesystem under Linux (which is 
>    case-insensitive and *really* odd wrt case preservation - it remembers 
>    the name of removed files, so it preserves case even across removal and 
>    re-creation!)

Interesting.
That sounds a bit like the claimed windows 95 properly of 'tunneling' renames.
ISTR that it was to catch a move via 'shortname' which then preserved the longname.

However I'd have expected the Linux version to always use the long name...

DF

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/7] Case-insensitive filesystem support, take 1
  2008-03-25 11:39 ` Derek Fawcus
@ 2008-03-25 18:26   ` Jan Hudec
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Hudec @ 2008-03-25 18:26 UTC (permalink / raw
  To: Derek Fawcus; +Cc: git

On Tue, Mar 25, 2008 at 11:39:56 +0000, Derek Fawcus wrote:
> On Sat, Mar 22, 2008 at 10:21:05AM -0700, Linus Torvalds wrote:
> >    ... and on a vfat filesystem under Linux (which is 
> >    case-insensitive and *really* odd wrt case preservation - it remembers 
> >    the name of removed files, so it preserves case even across removal and 
> >    re-creation!)
> 
> Interesting.
> That sounds a bit like the claimed windows 95 properly of 'tunneling' renames.
> ISTR that it was to catch a move via 'shortname' which then preserved the longname.
> 
> However I'd have expected the Linux version to always use the long name...

... if it's there -- which it might not. Linux may create it always, but
Windows will not if they think they don't need to.

-- 
						 Jan 'Bulb' Hudec <bulb@ucw.cz>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/7] Case-insensitive filesystem support, take 1
  2008-03-25  8:14 ` [PATCH 0/7] Case-insensitive filesystem support, take 1 Dmitry Potapov
@ 2008-03-25 21:04   ` Linus Torvalds
  2008-03-26  2:46     ` Dmitry Potapov
  0 siblings, 1 reply; 29+ messages in thread
From: Linus Torvalds @ 2008-03-25 21:04 UTC (permalink / raw
  To: Dmitry Potapov; +Cc: Junio C Hamano, Git Mailing List, Frank



On Tue, 25 Mar 2008, Dmitry Potapov wrote:
> 
> - merge different branches were two file names are only differ by case
>   will cause that the result branch has two file names that differ only
>   by case and one of them will be overwritten by the other and shown as
>   modified in the worktree by git status.

Ok. So there's two issues here:

 - the git trees themselves had two different names

   This is not something I'm *ever* planning on changing. All my "case 
   insensitive" patches were about the *working*tree*, not about core git 
   data structures themselves.

   In other words: git itself is internally very much a case-sensitive 
   program, and the index and the trees are case-sensitive and will remain 
   so forever as far as I'm concerned. So when you do a tree-level merge 
   of two trees that have two different names that are equivalent in case, 
   git will create a result that has those two names. Because git itself 
   is not case-insensitive.

 - HOWEVER - when checking things out, we should probably notice that 
   we're now writing the two different files out and over-writing one of 
   them, and fail at that stage. I don't know what a good failure 
   behaviour would be, though. I'll have to think about it.

IOW, all my case-insensitivity checking was very much designed to be about 
the working tree, not about git internal representations. Put another way, 
they should really only affect code that does "lstat()" to check whether 
a file exists or code that does "open()" to open/create a file.

> - git status cares only about case-insensitivity only for files and not
>   for directories. Thus, if case of letters in a directory name is changed
>   then this directory will be shown as untracked.

Ahh, yes. This is sad. It comes about because while we can look up whole 
names in the index case-insensitively, we have no equivalent way to look 
up individual path components, so that still uses the "index_name_pos()" 
model and then looking aroung the failure point to see if we hit a 
subdirectory. Remember: the index doesn't actually contain directories at 
all, just lists of files.

This will not be trivial to fix. 

> - pattern specified in .gitignore are match as case-sensitive despite
>   core.ignorecase set to true.

This should probably be fairly straightforward. All the logic here is in 
the function "excluded_1()" in dir.c - and largely it would be about 
changing that "strcmp()" into a "strcasecmp()" and using the FNM_CASEFOLD 
flag to fnmatch().

The only half-way subtle issues here are

 - do we really want to use strcasecmp() (which may match differently than 
   our hashing matches!) or do we want to expand on our icase_cmp() or 
   similar in hash-name.c (I think the latter is the right thing, but it 
   requires a bit more work)

 - FNM_CASEFOLD has the same issue, but also adds the wrinkle of being a 
   GNU-only extension. Which is sad, since most systems that have glibc 
   would never need it in the first place. So then we get back to the 
   whole issue of maybe having to reimplement 'fnmatch()', or at least a 
   subset of it that git uses.

So that last issue is conceptually simple and straightforward to fix, but 
fixing it right would almost certainly be a few hundred lines of code 
(fnmatch() in particular is nasty if you want to do all the cases, but 
perhaps just '*' is ok?).

The first two issues are nontrivial.

			Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/7] Case-insensitive filesystem support, take 1
  2008-03-25 21:04   ` Linus Torvalds
@ 2008-03-26  2:46     ` Dmitry Potapov
  2008-03-26  3:37       ` Linus Torvalds
  0 siblings, 1 reply; 29+ messages in thread
From: Dmitry Potapov @ 2008-03-26  2:46 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List, Frank

On Tue, Mar 25, 2008 at 02:04:58PM -0700, Linus Torvalds wrote:
> 
> IOW, all my case-insensitivity checking was very much designed to be about 
> the working tree, not about git internal representations. Put another way, 
> they should really only affect code that does "lstat()" to check whether 
> a file exists or code that does "open()" to open/create a file.

Of course, case-insensitivity is about the working tree only. But when
I merge another branch to the current one, git normally checks that it
is not going to overwrite existing files in the *work tree* and refuses
to do the merge if some files may be overwritten.

So if I work on a case-insensitive filesystem and have a file in a different
case and core.ignorecase=false, then the merge fails as expected!

But core.ignorecase=true, which is supposed to do a better job for case-
insensitive filesystems, actually causes the problem here.

Here is my test script:

====
mkdir git-test
cd git-test
git init
git config core.ignorecase true
echo foo > foo
git add foo
git commit -m 'initial commit'

git checkout -b other

echo file > file
git add file
git commit -m 'add file'

git checkout master
echo File > File
git add File
git commit -m 'add File'

# I expect merge to fail here... and it does fail if core.ignorecase
# is set to false, but with core.ignorecase = true, git will overwrite
# 'File'.
# git config core.ignorecase false
git merge other
===

Dmitry

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/7] Case-insensitive filesystem support, take 1
  2008-03-26  2:46     ` Dmitry Potapov
@ 2008-03-26  3:37       ` Linus Torvalds
  0 siblings, 0 replies; 29+ messages in thread
From: Linus Torvalds @ 2008-03-26  3:37 UTC (permalink / raw
  To: Dmitry Potapov; +Cc: Junio C Hamano, Git Mailing List, Frank



On Wed, 26 Mar 2008, Dmitry Potapov wrote:
> 
> Of course, case-insensitivity is about the working tree only. But when
> I merge another branch to the current one, git normally checks that it
> is not going to overwrite existing files in the *work tree* and refuses
> to do the merge if some files may be overwritten.

I do agree - but this is not really about the *merge*. It is about the 
checkout.

Imagine that the merge had been done on a sane filesystem, and you're just 
pulling it or cloning it. No merge on your machine, but the problem is the 
same.

		Linus

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2008-03-26  3:40 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-22 17:21 [PATCH 0/7] Case-insensitive filesystem support, take 1 Linus Torvalds
2008-03-22 17:22 ` [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields Linus Torvalds
2008-03-22 17:25   ` [PATCH 2/7] Move name hashing functions into a file of its own Linus Torvalds
2008-03-22 17:28     ` [PATCH 3/7] Make "index_name_exists()" return the cache_entry it found Linus Torvalds
2008-03-22 17:30       ` [PATCH 4/7] Make hash_name_lookup able to do case-independent lookups Linus Torvalds
2008-03-22 17:33         ` [PATCH 5/7] Add 'core.ignorecase' option Linus Torvalds
2008-03-22 17:38           ` [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems Linus Torvalds
2008-03-22 17:45             ` [PATCH 7/7] Make unpack-tree update removed files before any updated files Linus Torvalds
2008-03-22 18:06               ` [PATCH 0/7] Final words Linus Torvalds
2008-03-22 18:28                 ` Linus Torvalds
2008-03-22 21:19               ` [PATCH 8/7] When adding files to the index, add support for case-independent matches Linus Torvalds
2008-03-22 21:22                 ` [PATCH 9/7] Make git-add behave more sensibly in a case-insensitive environment Linus Torvalds
2008-03-23  5:49               ` [PATCH 7/7] Make unpack-tree update removed files before any updated files Junio C Hamano
2008-03-23  6:13             ` [PATCH 6/7] Make branch merging aware of underlying case-insensitive filsystems Junio C Hamano
2008-03-23 15:41               ` Linus Torvalds
2008-03-22 17:36   ` [PATCH 1/7] Make unpack_trees_options bit flags actual bitfields Johannes Schindelin
2008-03-22 17:47     ` Linus Torvalds
2008-03-22 17:57       ` Johannes Schindelin
2008-03-22 22:01 ` [PATCH] t0050: Set core.ignorecase case to activate case insensitivity Steffen Prohaska
2008-03-25  6:57 ` [PATCH] git-init: autodetect core.ignorecase Dmitry Potapov
2008-03-25  9:59   ` Johannes Schindelin
2008-03-25 10:49     ` [PATCH v2] " Dmitry Potapov
2008-03-25 11:03     ` [PATCH] " Dmitry Potapov
2008-03-25  8:14 ` [PATCH 0/7] Case-insensitive filesystem support, take 1 Dmitry Potapov
2008-03-25 21:04   ` Linus Torvalds
2008-03-26  2:46     ` Dmitry Potapov
2008-03-26  3:37       ` Linus Torvalds
2008-03-25 11:39 ` Derek Fawcus
2008-03-25 18:26   ` Jan Hudec

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).