git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 1/5] Add 'df_name_compare()' helper function
  2008-03-06  4:28 [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
@ 2008-03-06  2:25 ` Linus Torvalds
  2008-03-06 13:03   ` David Kastrup
  2008-03-06  2:59 ` [PATCH 2/5] Make 'traverse_tree()' use linked structure rather than 'const char *base' Linus Torvalds
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2008-03-06  2:25 UTC (permalink / raw)


This new helper is identical to base_name_compare(), except it compares
conflicting directory/file entries as equal in order to help handling DF
conflicts (thus the name).

Note that while a directory name compares as equal to a regular file
with the new helper, they then individually compare _differently_ to a
filename that has a dot after the basename (because '\0' < '.' < '/').

So a directory called "foo/" will compare equal to a file "foo", even
though "foo.c" will compare after "foo" and before "foo/"

This will be used by routines that want to traverse the git namespace
but then handle conflicting entries together when possible.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 cache.h      |    1 +
 read-cache.c |   35 +++++++++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/cache.h b/cache.h
index e230302..6eb16cb 100644
--- a/cache.h
+++ b/cache.h
@@ -536,6 +536,7 @@ extern int create_symref(const char *ref, const char *refs_heads_master, const c
 extern int validate_headref(const char *ref);
 
 extern int base_name_compare(const char *name1, int len1, int mode1, const char *name2, int len2, int mode2);
+extern int df_name_compare(const char *name1, int len1, int mode1, const char *name2, int len2, int mode2);
 extern int cache_name_compare(const char *name1, int len1, const char *name2, int len2);
 
 extern void *read_object_with_reference(const unsigned char *sha1,
diff --git a/read-cache.c b/read-cache.c
index 657f0c5..bf649a3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -351,6 +351,41 @@ int base_name_compare(const char *name1, int len1, int mode1,
 	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
 }
 
+/*
+ * df_name_compare() is identical to base_name_compare(), except it
+ * compares conflicting directory/file entries as equal. Note that
+ * while a directory name compares as equal to a regular file, they
+ * then individually compare _differently_ to a filename that has
+ * a dot after the basename (because '\0' < '.' < '/').
+ *
+ * This is used by routines that want to traverse the git namespace
+ * but then handle conflicting entries together when possible.
+ */
+int df_name_compare(const char *name1, int len1, int mode1,
+		    const char *name2, int len2, int mode2)
+{
+	int len = len1 < len2 ? len1 : len2, cmp;
+	unsigned char c1, c2;
+
+	cmp = memcmp(name1, name2, len);
+	if (cmp)
+		return cmp;
+	/* Directories and files compare equal (same length, same name) */
+	if (len1 == len2)
+		return 0;
+	c1 = name1[len];
+	if (!c1 && S_ISDIR(mode1))
+		c1 = '/';
+	c2 = name2[len];
+	if (!c2 && S_ISDIR(mode2))
+		c2 = '/';
+	if (c1 == '/' && !c2)
+		return 0;
+	if (c2 == '/' && !c1)
+		return 0;
+	return c1 - c2;
+}
+
 int cache_name_compare(const char *name1, int flags1, const char *name2, int flags2)
 {
 	int len1 = flags1 & CE_NAMEMASK;
-- 
1.5.4.3.452.g67136



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/5] Make 'traverse_tree()' use linked structure rather than 'const char *base'
  2008-03-06  4:28 [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
  2008-03-06  2:25 ` [PATCH 1/5] Add 'df_name_compare()' helper function Linus Torvalds
@ 2008-03-06  2:59 ` Linus Torvalds
  2008-03-06  3:44 ` [PATCH 3/5] Add return value to 'traverse_tree()' callback Linus Torvalds
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2008-03-06  2:59 UTC (permalink / raw)


This makes the calling convention a bit less obvious, but a lot more
flexible.  Instead of allocating and extending a new 'base' string, we
just link the top-most name into a linked list of the 'info' structure
when traversing a subdirectory, and we can generate the basename by
following the list.

Perhaps even more importantly, the linked list of info structures also
gives us a place to naturally save off other information than just the
directory name.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 merge-tree.c |   51 +++++++++++++++++++++++++++------------------------
 tree-walk.c  |   35 +++++++++++++++++++++++++++++++++--
 tree-walk.h  |   20 ++++++++++++++++++--
 3 files changed, 78 insertions(+), 28 deletions(-)

diff --git a/merge-tree.c b/merge-tree.c
index e083246..a3511b7 100644
--- a/merge-tree.c
+++ b/merge-tree.c
@@ -168,7 +168,13 @@ static struct merge_list *create_entry(unsigned stage, unsigned mode, const unsi
 	return res;
 }
 
-static void resolve(const char *base, struct name_entry *branch1, struct name_entry *result)
+static char *traverse_path(const struct traverse_info *info, const struct name_entry *n)
+{
+	char *path = xmalloc(traverse_path_len(info, n) + 1);
+	return make_traverse_path(path, info, n);
+}
+
+static void resolve(const struct traverse_info *info, struct name_entry *branch1, struct name_entry *result)
 {
 	struct merge_list *orig, *final;
 	const char *path;
@@ -177,7 +183,7 @@ static void resolve(const char *base, struct name_entry *branch1, struct name_en
 	if (!branch1)
 		return;
 
-	path = xstrdup(mkpath("%s%s", base, result->path));
+	path = traverse_path(info, result);
 	orig = create_entry(2, branch1->mode, branch1->sha1, path);
 	final = create_entry(0, result->mode, result->sha1, path);
 
@@ -186,9 +192,8 @@ static void resolve(const char *base, struct name_entry *branch1, struct name_en
 	add_merge_entry(final);
 }
 
-static int unresolved_directory(const char *base, struct name_entry n[3])
+static int unresolved_directory(const struct traverse_info *info, struct name_entry n[3])
 {
-	int baselen, pathlen;
 	char *newbase;
 	struct name_entry *p;
 	struct tree_desc t[3];
@@ -204,13 +209,7 @@ static int unresolved_directory(const char *base, struct name_entry n[3])
 	}
 	if (!S_ISDIR(p->mode))
 		return 0;
-	baselen = strlen(base);
-	pathlen = tree_entry_len(p->path, p->sha1);
-	newbase = xmalloc(baselen + pathlen + 2);
-	memcpy(newbase, base, baselen);
-	memcpy(newbase + baselen, p->path, pathlen);
-	memcpy(newbase + baselen + pathlen, "/", 2);
-
+	newbase = traverse_path(info, p);
 	buf0 = fill_tree_descriptor(t+0, n[0].sha1);
 	buf1 = fill_tree_descriptor(t+1, n[1].sha1);
 	buf2 = fill_tree_descriptor(t+2, n[2].sha1);
@@ -224,7 +223,7 @@ static int unresolved_directory(const char *base, struct name_entry n[3])
 }
 
 
-static struct merge_list *link_entry(unsigned stage, const char *base, struct name_entry *n, struct merge_list *entry)
+static struct merge_list *link_entry(unsigned stage, const struct traverse_info *info, struct name_entry *n, struct merge_list *entry)
 {
 	const char *path;
 	struct merge_list *link;
@@ -234,17 +233,17 @@ static struct merge_list *link_entry(unsigned stage, const char *base, struct na
 	if (entry)
 		path = entry->path;
 	else
-		path = xstrdup(mkpath("%s%s", base, n->path));
+		path = traverse_path(info, n);
 	link = create_entry(stage, n->mode, n->sha1, path);
 	link->link = entry;
 	return link;
 }
 
-static void unresolved(const char *base, struct name_entry n[3])
+static void unresolved(const struct traverse_info *info, struct name_entry n[3])
 {
 	struct merge_list *entry = NULL;
 
-	if (unresolved_directory(base, n))
+	if (unresolved_directory(info, n))
 		return;
 
 	/*
@@ -252,9 +251,9 @@ static void unresolved(const char *base, struct name_entry n[3])
 	 * list has the stages in order - link_entry adds new
 	 * links at the front.
 	 */
-	entry = link_entry(3, base, n + 2, entry);
-	entry = link_entry(2, base, n + 1, entry);
-	entry = link_entry(1, base, n + 0, entry);
+	entry = link_entry(3, info, n + 2, entry);
+	entry = link_entry(2, info, n + 1, entry);
+	entry = link_entry(1, info, n + 0, entry);
 
 	add_merge_entry(entry);
 }
@@ -288,36 +287,40 @@ static void unresolved(const char *base, struct name_entry n[3])
  * The successful merge rules are the same as for the three-way merge
  * in git-read-tree.
  */
-static void threeway_callback(int n, unsigned long mask, struct name_entry *entry, const char *base)
+static void threeway_callback(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *info)
 {
 	/* Same in both? */
 	if (same_entry(entry+1, entry+2)) {
 		if (entry[0].sha1) {
-			resolve(base, NULL, entry+1);
+			resolve(info, NULL, entry+1);
 			return;
 		}
 	}
 
 	if (same_entry(entry+0, entry+1)) {
 		if (entry[2].sha1 && !S_ISDIR(entry[2].mode)) {
-			resolve(base, entry+1, entry+2);
+			resolve(info, entry+1, entry+2);
 			return;
 		}
 	}
 
 	if (same_entry(entry+0, entry+2)) {
 		if (entry[1].sha1 && !S_ISDIR(entry[1].mode)) {
-			resolve(base, NULL, entry+1);
+			resolve(info, NULL, entry+1);
 			return;
 		}
 	}
 
-	unresolved(base, entry);
+	unresolved(info, entry);
 }
 
 static void merge_trees(struct tree_desc t[3], const char *base)
 {
-	traverse_trees(3, t, base, threeway_callback);
+	struct traverse_info info;
+
+	setup_traverse_info(&info, base);
+	info.fn = threeway_callback;
+	traverse_trees(3, t, &info);
 }
 
 static void *get_tree_descriptor(struct tree_desc *desc, const char *rev)
diff --git a/tree-walk.c b/tree-walk.c
index 142205d..f9f7d22 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -104,7 +104,38 @@ int tree_entry(struct tree_desc *desc, struct name_entry *entry)
 	return 1;
 }
 
-void traverse_trees(int n, struct tree_desc *t, const char *base, traverse_callback_t callback)
+void setup_traverse_info(struct traverse_info *info, const char *base)
+{
+	int pathlen = strlen(base);
+
+	memset(info, 0, sizeof(*info));
+	if (pathlen && base[pathlen-1] == '/')
+		pathlen--;
+	info->pathlen = pathlen ? pathlen + 1 : 0;
+	info->name.path = base;
+	info->name.sha1 = (void *)(base + pathlen + 1);
+}
+
+char *make_traverse_path(char *path, const struct traverse_info *info, const struct name_entry *n)
+{
+	int len = tree_entry_len(n->path, n->sha1);
+	int pathlen = info->pathlen;
+
+	path[pathlen + len] = 0;
+	for (;;) {
+		memcpy(path + pathlen, n->path, len);
+		if (!pathlen)
+			break;
+		path[--pathlen] = '/';
+		n = &info->name;
+		len = tree_entry_len(n->path, n->sha1);
+		info = info->prev;
+		pathlen -= len;
+	}
+	return path;
+}
+
+void traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 {
 	struct name_entry *entry = xmalloc(n*sizeof(*entry));
 
@@ -150,7 +181,7 @@ void traverse_trees(int n, struct tree_desc *t, const char *base, traverse_callb
 			}
 			entry_clear(entry + i);
 		}
-		callback(n, mask, entry, base);
+		info->fn(n, mask, entry, info);
 	}
 	free(entry);
 }
diff --git a/tree-walk.h b/tree-walk.h
index db0fbdc..7c4ae64 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -33,10 +33,26 @@ int tree_entry(struct tree_desc *, struct name_entry *);
 
 void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
 
-typedef void (*traverse_callback_t)(int n, unsigned long mask, struct name_entry *entry, const char *base);
+struct traverse_info;
+typedef void (*traverse_callback_t)(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *);
+void traverse_trees(int n, struct tree_desc *t, struct traverse_info *info);
 
-void traverse_trees(int n, struct tree_desc *t, const char *base, traverse_callback_t callback);
+struct traverse_info {
+	struct traverse_info *prev;
+	struct name_entry name;
+	int pathlen;
+
+	traverse_callback_t fn;
+	void *data;
+};
 
 int get_tree_entry(const unsigned char *, const char *, unsigned char *, unsigned *);
+extern char *make_traverse_path(char *path, const struct traverse_info *info, const struct name_entry *n);
+extern void setup_traverse_info(struct traverse_info *info, const char *base);
+
+static inline int traverse_path_len(const struct traverse_info *info, const struct name_entry *n)
+{
+	return info->pathlen + tree_entry_len(n->path, n->sha1);
+}
 
 #endif
-- 
1.5.4.3.452.g67136



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/5] Add return value to 'traverse_tree()' callback
  2008-03-06  4:28 [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
  2008-03-06  2:25 ` [PATCH 1/5] Add 'df_name_compare()' helper function Linus Torvalds
  2008-03-06  2:59 ` [PATCH 2/5] Make 'traverse_tree()' use linked structure rather than 'const char *base' Linus Torvalds
@ 2008-03-06  3:44 ` Linus Torvalds
  2008-03-06  4:06 ` [PATCH 4/5] Make 'traverse_trees()' traverse conflicting DF entries in parallel Linus Torvalds
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2008-03-06  3:44 UTC (permalink / raw)


This allows the callback to return an error value, but it can also
specify which of the tree entries that it actually used up by returning
a positive mask value.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 merge-tree.c |    9 +++++----
 tree-walk.c  |   22 +++++++++++++++-------
 tree-walk.h  |    4 ++--
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/merge-tree.c b/merge-tree.c
index a3511b7..8be0b9f 100644
--- a/merge-tree.c
+++ b/merge-tree.c
@@ -287,31 +287,32 @@ static void unresolved(const struct traverse_info *info, struct name_entry n[3])
  * The successful merge rules are the same as for the three-way merge
  * in git-read-tree.
  */
-static void threeway_callback(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *info)
+static int threeway_callback(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *info)
 {
 	/* Same in both? */
 	if (same_entry(entry+1, entry+2)) {
 		if (entry[0].sha1) {
 			resolve(info, NULL, entry+1);
-			return;
+			return mask;
 		}
 	}
 
 	if (same_entry(entry+0, entry+1)) {
 		if (entry[2].sha1 && !S_ISDIR(entry[2].mode)) {
 			resolve(info, entry+1, entry+2);
-			return;
+			return mask;
 		}
 	}
 
 	if (same_entry(entry+0, entry+2)) {
 		if (entry[1].sha1 && !S_ISDIR(entry[1].mode)) {
 			resolve(info, NULL, entry+1);
-			return;
+			return mask;
 		}
 	}
 
 	unresolved(info, entry);
+	return mask;
 }
 
 static void merge_trees(struct tree_desc t[3], const char *base)
diff --git a/tree-walk.c b/tree-walk.c
index f9f7d22..7170e37 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -135,8 +135,9 @@ char *make_traverse_path(char *path, const struct traverse_info *info, const str
 	return path;
 }
 
-void traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
+int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 {
+	int ret = 0;
 	struct name_entry *entry = xmalloc(n*sizeof(*entry));
 
 	for (;;) {
@@ -171,19 +172,26 @@ void traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 			break;
 
 		/*
-		 * Update the tree entries we've walked, and clear
-		 * all the unused name-entries.
+		 * Clear all the unused name-entries.
 		 */
 		for (i = 0; i < n; i++) {
-			if (mask & (1ul << i)) {
-				update_tree_entry(t+i);
+			if (mask & (1ul << i))
 				continue;
-			}
 			entry_clear(entry + i);
 		}
-		info->fn(n, mask, entry, info);
+		ret = info->fn(n, mask, entry, info);
+		if (ret < 0)
+			break;
+		if (ret)
+			mask &= ret;
+		ret = 0;
+		for (i = 0; i < n; i++) {
+			if (mask & (1ul << i))
+				update_tree_entry(t + i);
+		}
 	}
 	free(entry);
+	return ret;
 }
 
 static int find_tree_entry(struct tree_desc *t, const char *name, unsigned char *result, unsigned *mode)
diff --git a/tree-walk.h b/tree-walk.h
index 7c4ae64..c123cfe 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -34,8 +34,8 @@ int tree_entry(struct tree_desc *, struct name_entry *);
 void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
 
 struct traverse_info;
-typedef void (*traverse_callback_t)(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *);
-void traverse_trees(int n, struct tree_desc *t, struct traverse_info *info);
+typedef int (*traverse_callback_t)(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *);
+int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info);
 
 struct traverse_info {
 	struct traverse_info *prev;
-- 
1.5.4.3.452.g67136



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/5] Make 'traverse_trees()' traverse conflicting DF entries in parallel
  2008-03-06  4:28 [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
                   ` (2 preceding siblings ...)
  2008-03-06  3:44 ` [PATCH 3/5] Add return value to 'traverse_tree()' callback Linus Torvalds
@ 2008-03-06  4:06 ` Linus Torvalds
  2008-03-06  4:15 ` [PATCH 5/5] Move 'unpack_trees()' over to 'traverse_trees()' interface Linus Torvalds
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2008-03-06  4:06 UTC (permalink / raw)


This makes the traverse_trees() entry comparator routine use the more
relaxed form of name comparison that considers files and directories
with the same name identical.

We pass in a separate mask for just the directory entries, so that the
callback routine can decide (if it wants to) to only handle one or the
other type, but generally most (all?) users are expected to really want
to see the case of a name 'foo' showing up in one tree as a file and in
another as a directory at the same time.

In particular, moving 'unpack_trees()' over to use this tree traversal
mechanism requires this.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 merge-tree.c |    2 +-
 tree-walk.c  |    8 ++++++--
 tree-walk.h  |    3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/merge-tree.c b/merge-tree.c
index 8be0b9f..02fc10f 100644
--- a/merge-tree.c
+++ b/merge-tree.c
@@ -287,7 +287,7 @@ static void unresolved(const struct traverse_info *info, struct name_entry n[3])
  * The successful merge rules are the same as for the three-way merge
  * in git-read-tree.
  */
-static int threeway_callback(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *info)
+static int threeway_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *entry, struct traverse_info *info)
 {
 	/* Same in both? */
 	if (same_entry(entry+1, entry+2)) {
diff --git a/tree-walk.c b/tree-walk.c
index 7170e37..842cb6a 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -62,7 +62,7 @@ void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)
 
 static int entry_compare(struct name_entry *a, struct name_entry *b)
 {
-	return base_name_compare(
+	return df_name_compare(
 			a->path, tree_entry_len(a->path, a->sha1), a->mode,
 			b->path, tree_entry_len(b->path, b->sha1), b->mode);
 }
@@ -142,6 +142,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 
 	for (;;) {
 		unsigned long mask = 0;
+		unsigned long dirmask = 0;
 		int i, last;
 
 		last = -1;
@@ -166,10 +167,13 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 					mask = 0;
 			}
 			mask |= 1ul << i;
+			if (S_ISDIR(entry[i].mode))
+				dirmask |= 1ul << i;
 			last = i;
 		}
 		if (!mask)
 			break;
+		dirmask &= mask;
 
 		/*
 		 * Clear all the unused name-entries.
@@ -179,7 +183,7 @@ int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info)
 				continue;
 			entry_clear(entry + i);
 		}
-		ret = info->fn(n, mask, entry, info);
+		ret = info->fn(n, mask, dirmask, entry, info);
 		if (ret < 0)
 			break;
 		if (ret)
diff --git a/tree-walk.h b/tree-walk.h
index c123cfe..42110a4 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -34,7 +34,7 @@ int tree_entry(struct tree_desc *, struct name_entry *);
 void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
 
 struct traverse_info;
-typedef int (*traverse_callback_t)(int n, unsigned long mask, struct name_entry *entry, struct traverse_info *);
+typedef int (*traverse_callback_t)(int n, unsigned long mask, unsigned long dirmask, struct name_entry *entry, struct traverse_info *);
 int traverse_trees(int n, struct tree_desc *t, struct traverse_info *info);
 
 struct traverse_info {
@@ -42,6 +42,7 @@ struct traverse_info {
 	struct name_entry name;
 	int pathlen;
 
+	unsigned long conflicts;
 	traverse_callback_t fn;
 	void *data;
 };
-- 
1.5.4.3.452.g67136



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/5] Move 'unpack_trees()' over to 'traverse_trees()' interface
  2008-03-06  4:28 [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
                   ` (3 preceding siblings ...)
  2008-03-06  4:06 ` [PATCH 4/5] Make 'traverse_trees()' traverse conflicting DF entries in parallel Linus Torvalds
@ 2008-03-06  4:15 ` Linus Torvalds
  2008-03-06  4:51 ` [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
  2008-03-07  0:13 ` Daniel Barkalow
  6 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2008-03-06  4:15 UTC (permalink / raw)


This not only deletes more code than it adds, it gets rid of a
singularly hard-to-understand function (unpack_trees_rec()), and
replaces it with a set of smaller and simpler functions that use the
generic tree traversal mechanism to walk over one or more git trees in
parallel.

It's still not the most wonderful interface, and by no means is the new
code easy to understand either, but it's at least a bit less opaque.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 unpack-trees.c |  530 ++++++++++++++++++++++++++------------------------------
 1 files changed, 249 insertions(+), 281 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3e448d8..ee9be29 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -7,270 +7,12 @@
 #include "progress.h"
 #include "refs.h"
 
-#define DBRT_DEBUG 1
-
-struct tree_entry_list {
-	struct tree_entry_list *next;
-	unsigned int mode;
-	const char *name;
-	const unsigned char *sha1;
-};
-
-static struct tree_entry_list *create_tree_entry_list(struct tree_desc *desc)
-{
-	struct name_entry one;
-	struct tree_entry_list *ret = NULL;
-	struct tree_entry_list **list_p = &ret;
-
-	while (tree_entry(desc, &one)) {
-		struct tree_entry_list *entry;
-
-		entry = xmalloc(sizeof(struct tree_entry_list));
-		entry->name = one.path;
-		entry->sha1 = one.sha1;
-		entry->mode = one.mode;
-		entry->next = NULL;
-
-		*list_p = entry;
-		list_p = &entry->next;
-	}
-	return ret;
-}
-
-static int entcmp(const char *name1, int dir1, const char *name2, int dir2)
-{
-	int len1 = strlen(name1);
-	int len2 = strlen(name2);
-	int len = len1 < len2 ? len1 : len2;
-	int ret = memcmp(name1, name2, len);
-	unsigned char c1, c2;
-	if (ret)
-		return ret;
-	c1 = name1[len];
-	c2 = name2[len];
-	if (!c1 && dir1)
-		c1 = '/';
-	if (!c2 && dir2)
-		c2 = '/';
-	ret = (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
-	if (c1 && c2 && !ret)
-		ret = len1 - len2;
-	return ret;
-}
-
 static inline void remove_entry(int remove)
 {
 	if (remove >= 0)
 		remove_cache_entry_at(remove);
 }
 
-static int unpack_trees_rec(struct tree_entry_list **posns, int len,
-			    const char *base, struct unpack_trees_options *o,
-			    struct tree_entry_list *df_conflict_list)
-{
-	int remove;
-	int baselen = strlen(base);
-	int src_size = len + 1;
-	int retval = 0;
-
-	do {
-		int i;
-		const char *first;
-		int firstdir = 0;
-		int pathlen;
-		unsigned ce_size;
-		struct tree_entry_list **subposns;
-		struct cache_entry **src;
-		int any_files = 0;
-		int any_dirs = 0;
-		char *cache_name;
-		int ce_stage;
-		int skip_entry = 0;
-
-		/* Find the first name in the input. */
-
-		first = NULL;
-		cache_name = NULL;
-
-		/* Check the cache */
-		if (o->merge && o->pos < active_nr) {
-			/* This is a bit tricky: */
-			/* If the index has a subdirectory (with
-			 * contents) as the first name, it'll get a
-			 * filename like "foo/bar". But that's after
-			 * "foo", so the entry in trees will get
-			 * handled first, at which point we'll go into
-			 * "foo", and deal with "bar" from the index,
-			 * because the base will be "foo/". The only
-			 * way we can actually have "foo/bar" first of
-			 * all the things is if the trees don't
-			 * contain "foo" at all, in which case we'll
-			 * handle "foo/bar" without going into the
-			 * directory, but that's fine (and will return
-			 * an error anyway, with the added unknown
-			 * file case.
-			 */
-
-			cache_name = active_cache[o->pos]->name;
-			if (strlen(cache_name) > baselen &&
-			    !memcmp(cache_name, base, baselen)) {
-				cache_name += baselen;
-				first = cache_name;
-			} else {
-				cache_name = NULL;
-			}
-		}
-
-#if DBRT_DEBUG > 1
-		if (first)
-			fprintf(stderr, "index %s\n", first);
-#endif
-		for (i = 0; i < len; i++) {
-			if (!posns[i] || posns[i] == df_conflict_list)
-				continue;
-#if DBRT_DEBUG > 1
-			fprintf(stderr, "%d %s\n", i + 1, posns[i]->name);
-#endif
-			if (!first || entcmp(first, firstdir,
-					     posns[i]->name,
-					     S_ISDIR(posns[i]->mode)) > 0) {
-				first = posns[i]->name;
-				firstdir = S_ISDIR(posns[i]->mode);
-			}
-		}
-		/* No name means we're done */
-		if (!first)
-			goto leave_directory;
-
-		pathlen = strlen(first);
-		ce_size = cache_entry_size(baselen + pathlen);
-
-		src = xcalloc(src_size, sizeof(struct cache_entry *));
-
-		subposns = xcalloc(len, sizeof(struct tree_list_entry *));
-
-		remove = -1;
-		if (cache_name && !strcmp(cache_name, first)) {
-			any_files = 1;
-			src[0] = active_cache[o->pos];
-			remove = o->pos;
-			if (o->skip_unmerged && ce_stage(src[0]))
-				skip_entry = 1;
-		}
-
-		for (i = 0; i < len; i++) {
-			struct cache_entry *ce;
-
-			if (!posns[i] ||
-			    (posns[i] != df_conflict_list &&
-			     strcmp(first, posns[i]->name))) {
-				continue;
-			}
-
-			if (posns[i] == df_conflict_list) {
-				src[i + o->merge] = o->df_conflict_entry;
-				continue;
-			}
-
-			if (S_ISDIR(posns[i]->mode)) {
-				struct tree *tree = lookup_tree(posns[i]->sha1);
-				struct tree_desc t;
-				any_dirs = 1;
-				parse_tree(tree);
-				init_tree_desc(&t, tree->buffer, tree->size);
-				subposns[i] = create_tree_entry_list(&t);
-				posns[i] = posns[i]->next;
-				src[i + o->merge] = o->df_conflict_entry;
-				continue;
-			}
-
-			if (skip_entry) {
-				subposns[i] = df_conflict_list;
-				posns[i] = posns[i]->next;
-				continue;
-			}
-
-			if (!o->merge)
-				ce_stage = 0;
-			else if (i + 1 < o->head_idx)
-				ce_stage = 1;
-			else if (i + 1 > o->head_idx)
-				ce_stage = 3;
-			else
-				ce_stage = 2;
-
-			ce = xcalloc(1, ce_size);
-			ce->ce_mode = create_ce_mode(posns[i]->mode);
-			ce->ce_flags = create_ce_flags(baselen + pathlen,
-						       ce_stage);
-			memcpy(ce->name, base, baselen);
-			memcpy(ce->name + baselen, first, pathlen + 1);
-
-			any_files = 1;
-
-			hashcpy(ce->sha1, posns[i]->sha1);
-			src[i + o->merge] = ce;
-			subposns[i] = df_conflict_list;
-			posns[i] = posns[i]->next;
-		}
-		if (any_files) {
-			if (skip_entry) {
-				o->pos++;
-				while (o->pos < active_nr &&
-				       !strcmp(active_cache[o->pos]->name,
-					       src[0]->name))
-					o->pos++;
-			} else if (o->merge) {
-				int ret;
-
-#if DBRT_DEBUG > 1
-				fprintf(stderr, "%s:\n", first);
-				for (i = 0; i < src_size; i++) {
-					fprintf(stderr, " %d ", i);
-					if (src[i])
-						fprintf(stderr, "%06x %s\n", src[i]->ce_mode, sha1_to_hex(src[i]->sha1));
-					else
-						fprintf(stderr, "\n");
-				}
-#endif
-				ret = o->fn(src, o, remove);
-				if (ret < 0)
-					return ret;
-
-#if DBRT_DEBUG > 1
-				fprintf(stderr, "Added %d entries\n", ret);
-#endif
-				o->pos += ret;
-			} else {
-				remove_entry(remove);
-				for (i = 0; i < src_size; i++) {
-					if (src[i]) {
-						add_cache_entry(src[i], ADD_CACHE_OK_TO_ADD|ADD_CACHE_SKIP_DFCHECK);
-					}
-				}
-			}
-		}
-		if (any_dirs) {
-			char *newbase = xmalloc(baselen + 2 + pathlen);
-			memcpy(newbase, base, baselen);
-			memcpy(newbase + baselen, first, pathlen);
-			newbase[baselen + pathlen] = '/';
-			newbase[baselen + pathlen + 1] = '\0';
-			if (unpack_trees_rec(subposns, len, newbase, o,
-					     df_conflict_list)) {
-				retval = -1;
-				goto leave_directory;
-			}
-			free(newbase);
-		}
-		free(subposns);
-		free(src);
-	} while (1);
-
- leave_directory:
-	return retval;
-}
-
 /* Unlink the last component and attempt to remove leading
  * directories, in case this unlink is the removal of the
  * last entry in the directory -- empty directories are removed.
@@ -346,15 +88,241 @@ static void check_updates(struct unpack_trees_options *o)
 	stop_progress(&progress);
 }
 
-int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
+static inline int call_unpack_fn(struct cache_entry **src, struct unpack_trees_options *o, int remove)
+{
+	int ret = o->fn(src, o, remove);
+	if (ret > 0) {
+		o->pos += ret;
+		ret = 0;
+	}
+	return ret;
+}
+
+static int unpack_index_entry(struct cache_entry *ce, struct unpack_trees_options *o)
+{
+	struct cache_entry *src[5] = { ce, };
+	if (ce_stage(ce)) {
+		if (o->skip_unmerged) {
+			o->pos++;
+		} else {
+			remove_entry(o->pos);
+		}
+		return 0;
+	}
+	return call_unpack_fn(src, o, o->pos);
+}
+
+int traverse_trees_recursive(int n, unsigned long dirmask, unsigned long df_conflicts, struct name_entry *names, struct traverse_info *info)
+{
+	int i;
+	struct tree_desc t[3];
+	struct traverse_info newinfo;
+	struct name_entry *p;
+
+	p = names;
+	while (!p->mode)
+		p++;
+
+	newinfo = *info;
+	newinfo.prev = info;
+	newinfo.name = *p;
+	newinfo.pathlen += tree_entry_len(p->path, p->sha1) + 1;
+	newinfo.conflicts |= df_conflicts;
+
+	for (i = 0; i < n; i++, dirmask >>= 1) {
+		const unsigned char *sha1 = NULL;
+		if (dirmask & 1)
+			sha1 = names[i].sha1;
+		fill_tree_descriptor(t+i, sha1);
+	}
+	traverse_trees(n, t, &newinfo);
+	return 0;
+}
+
+/*
+ * Compare the traverse-path to the cache entry without actually
+ * having to generate the textual representation of the traverse
+ * path.
+ *
+ * NOTE! This *only* compares up to the size of the traverse path
+ * itself - the caller needs to do the final check for the cache
+ * entry having more data at the end!
+ */
+static int do_compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
+{
+	int len, pathlen, ce_len;
+	const char *ce_name;
+
+	if (info->prev) {
+		int cmp = do_compare_entry(ce, info->prev, &info->name);
+		if (cmp)
+			return cmp;
+	}
+	pathlen = info->pathlen;
+	ce_len = ce_namelen(ce);
+
+	/* If ce_len < pathlen then we must have previously hit "name == directory" entry */
+	if (ce_len < pathlen)
+		return -1;
+
+	ce_len -= pathlen;
+	ce_name = ce->name + pathlen;
+
+	len = tree_entry_len(n->path, n->sha1);
+	return df_name_compare(ce_name, ce_len, S_IFREG, n->path, len, n->mode);
+}
+
+static int compare_entry(const struct cache_entry *ce, const struct traverse_info *info, const struct name_entry *n)
+{
+	int cmp = do_compare_entry(ce, info, n);
+	if (cmp)
+		return cmp;
+
+	/*
+	 * Even if the beginning compared identically, the ce should
+	 * compare as bigger than a directory leading up to it!
+	 */
+	return ce_namelen(ce) > traverse_path_len(info, n);
+}
+
+static struct cache_entry *create_ce_entry(const struct traverse_info *info, const struct name_entry *n, int stage)
+{
+	int len = traverse_path_len(info, n);
+	struct cache_entry *ce = xcalloc(1, cache_entry_size(len));
+
+	ce->ce_mode = create_ce_mode(n->mode);
+	ce->ce_flags = create_ce_flags(len, stage);
+	hashcpy(ce->sha1, n->sha1);
+	make_traverse_path(ce->name, info, n);
+
+	return ce;
+}
+
+static int unpack_nondirectories(int n, unsigned long mask, unsigned long dirmask, struct cache_entry *src[5],
+	const struct name_entry *names, const struct traverse_info *info, int remove)
 {
-	struct tree_entry_list **posns;
 	int i;
-	struct tree_entry_list df_conflict_list;
+	struct unpack_trees_options *o = info->data;
+	unsigned long conflicts;
+
+	/* Do we have *only* directories? Nothing to do */
+	if (mask == dirmask && !src[0])
+		return 0;
+
+	conflicts = info->conflicts;
+	if (o->merge)
+		conflicts >>= 1;
+	conflicts |= dirmask;
+
+	/*
+	 * Ok, we've filled in up to any potential index entry in src[0],
+	 * now do the rest.
+	 */
+	for (i = 0; i < n; i++) {
+		int stage;
+		unsigned int bit = 1ul << i;
+		if (conflicts & bit) {
+			src[i + o->merge] = o->df_conflict_entry;
+			continue;
+		}
+		if (!(mask & bit))
+			continue;
+		if (!o->merge)
+			stage = 0;
+		else if (i + 1 < o->head_idx)
+			stage = 1;
+		else if (i + 1 > o->head_idx)
+			stage = 3;
+		else
+			stage = 2;
+		src[i + o->merge] = create_ce_entry(info, names + i, stage);
+	}
+
+	if (o->merge)
+		return call_unpack_fn(src, o, remove);
+
+	n += o->merge;
+	remove_entry(remove);
+	for (i = 0; i < n; i++)
+		add_cache_entry(src[i], ADD_CACHE_OK_TO_ADD|ADD_CACHE_SKIP_DFCHECK);
+	return 0;
+}
+
+static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, struct name_entry *names, struct traverse_info *info)
+{
+	struct cache_entry *src[5] = { NULL, };
+	struct unpack_trees_options *o = info->data;
+	int remove = -1;
+	const struct name_entry *p = names;
+
+	/* Find first entry with a real name (we could use "mask" too) */
+	while (!p->mode)
+		p++;
+
+	/* Are we supposed to look at the index too? */
+	if (o->merge) {
+		while (o->pos < active_nr) {
+			struct cache_entry *ce = active_cache[o->pos];
+			int cmp = compare_entry(ce, info, p);
+			if (cmp < 0) {
+				if (unpack_index_entry(ce, o) < 0)
+					return -1;
+				continue;
+			}
+			if (!cmp) {
+				if (ce_stage(ce)) {
+					/*
+					 * If we skip unmerged index entries, we'll skip this
+					 * entry *and* the tree entries associated with it!
+					 */
+					if (o->skip_unmerged)
+						return mask;
+					remove_entry(o->pos);
+					continue;
+				}
+				src[0] = ce;
+				remove = o->pos;
+			}
+			break;
+		}
+	}
+
+	if (unpack_nondirectories(n, mask, dirmask, src, names, info, remove) < 0)
+		return -1;
+
+	/* Now handle any directories.. */
+	if (dirmask) {
+		unsigned long conflicts = mask & ~dirmask;
+		if (o->merge) {
+			conflicts <<= 1;
+			if (src[0])
+				conflicts |= 1;
+		}
+		traverse_trees_recursive(n, dirmask, conflicts, names, info);
+		return mask;
+	}
+
+	return mask;
+}
+
+static int unpack_failed(struct unpack_trees_options *o, const char *message)
+{
+	if (!o->gently) {
+		if (message)
+			return error(message);
+		return -1;
+	}
+	discard_cache();
+	read_cache();
+	return -1;
+}
+
+int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options *o)
+{
 	static struct cache_entry *dfc;
 
-	memset(&df_conflict_list, 0, sizeof(df_conflict_list));
-	df_conflict_list.next = &df_conflict_list;
+	if (len > 4)
+		die("unpack_trees takes at most four trees");
 	memset(&state, 0, sizeof(state));
 	state.base_dir = "";
 	state.force = 1;
@@ -368,29 +336,29 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	o->df_conflict_entry = dfc;
 
 	if (len) {
-		posns = xmalloc(len * sizeof(struct tree_entry_list *));
-		for (i = 0; i < len; i++)
-			posns[i] = create_tree_entry_list(t+i);
-
-		if (unpack_trees_rec(posns, len, o->prefix ? o->prefix : "",
-				     o, &df_conflict_list)) {
-			if (o->gently) {
-				discard_cache();
-				read_cache();
-			}
-			return -1;
-		}
+		const char *prefix = o->prefix ? o->prefix : "";
+		struct traverse_info info;
+
+		setup_traverse_info(&info, prefix);
+		info.fn = unpack_callback;
+		info.data = o;
+
+		if (traverse_trees(len, t, &info) < 0)
+			return unpack_failed(o, NULL);
 	}
 
-	if (o->trivial_merges_only && o->nontrivial_merge) {
-		if (o->gently) {
-			discard_cache();
-			read_cache();
+	/* Any left-over entries in the index? */
+	if (o->merge) {
+		while (o->pos < active_nr) {
+			struct cache_entry *ce = active_cache[o->pos];
+			if (unpack_index_entry(ce, o) < 0)
+				return unpack_failed(o, NULL);
 		}
-		return o->gently ? -1 :
-			error("Merge requires file-level merging");
 	}
 
+	if (o->trivial_merges_only && o->nontrivial_merge)
+		return unpack_failed(o, "Merge requires file-level merging");
+
 	check_updates(o);
 	return 0;
 }
-- 
1.5.4.3.452.g67136


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 0/5] Split-up "unpack_trees()" cleanup series
@ 2008-03-06  4:28 Linus Torvalds
  2008-03-06  2:25 ` [PATCH 1/5] Add 'df_name_compare()' helper function Linus Torvalds
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: Linus Torvalds @ 2008-03-06  4:28 UTC (permalink / raw)


Ok, here's the patch I sent out earlier, except now it's been split up
into a series fo five patches (and has some trivial one-liner cleanups
here and there, mainly to shrink the patches a bit)

It's a series of five patches:
 -  Add 'df_name_compare()' helper function
 - Make 'traverse_tree()' use linked structure rather than 'const char *base'
 - Add return value to 'traverse_tree()' callback
 - Make 'traverse_trees()' traverse conflicting DF entries in parallel
 - Move 'unpack_trees()' over to 'traverse_trees()' interface

with a more lines added than removed:

 cache.h        |    1 +
 merge-tree.c   |   58 ++++---
 read-cache.c   |   35 ++++
 tree-walk.c    |   59 ++++++-
 tree-walk.h    |   23 ++-
 unpack-trees.c |  530 ++++++++++++++++++++++++++------------------------------
 6 files changed, 387 insertions(+), 319 deletions(-)

but especially in this new split-up format, it's now more possible to
see how the last patch actually simplifies things. 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Split-up "unpack_trees()" cleanup series
  2008-03-06  4:28 [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
                   ` (4 preceding siblings ...)
  2008-03-06  4:15 ` [PATCH 5/5] Move 'unpack_trees()' over to 'traverse_trees()' interface Linus Torvalds
@ 2008-03-06  4:51 ` Linus Torvalds
  2008-03-07  0:13 ` Daniel Barkalow
  6 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2008-03-06  4:51 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List



On Wed, 5 Mar 2008, Linus Torvalds wrote:
>
> Ok, here's the patch I sent out earlier, except now it's been split up
> into a series fo five patches (and has some trivial one-liner cleanups
> here and there, mainly to shrink the patches a bit)

Oh, and sorry about the 

	To: undisclosed-recipients:  ;

thing - I thought I'd just bounce the messages from the mbox I did with 
git-format-patch, but since those things didn't have any "To:" fields, 
that resulted in some rather ugly email headers.

My bad.

		Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/5] Add 'df_name_compare()' helper function
  2008-03-06  2:25 ` [PATCH 1/5] Add 'df_name_compare()' helper function Linus Torvalds
@ 2008-03-06 13:03   ` David Kastrup
  2008-03-06 15:58     ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: David Kastrup @ 2008-03-06 13:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> This new helper is identical to base_name_compare(), except it compares
> conflicting directory/file entries as equal in order to help handling DF
> conflicts (thus the name).
>
> Note that while a directory name compares as equal to a regular file
> with the new helper, they then individually compare _differently_ to a
> filename that has a dot after the basename (because '\0' < '.' < '/').

Uh, that screams just bloody murder at me with regard to working on
sorted material.  You'll never even get into the situation where the
conflicting "equal" entries will be compared if you presorted them with
something in between.  Consider the case of a merge of

A:
blubb
blubb.hi

B:
blubb.hi
blubb/

Any traversal that is done reasonably efficiently will never compare
blubb to blubb/ and I don't see how to get around this.

Basically, I think you need a special traversal routine.  This routine
will push all non-directory entries during a parallel traverse into a
might-conflict-queue, appending a slash in the course.  The front of
this queue then gets compared with the next processed entry.  If it is
larger, it is kept at the front, if it is equal, the conflict gets
treated/resolved, if it is smaller, it gets popped (since it can't
conflict anymore).

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/5] Add 'df_name_compare()' helper function
  2008-03-06 13:03   ` David Kastrup
@ 2008-03-06 15:58     ` Linus Torvalds
  2008-03-06 21:50       ` David Kastrup
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2008-03-06 15:58 UTC (permalink / raw)
  To: David Kastrup; +Cc: git



On Thu, 6 Mar 2008, David Kastrup wrote:
> 
> Uh, that screams just bloody murder at me with regard to working on
> sorted material.  You'll never even get into the situation where the
> conflicting "equal" entries will be compared if you presorted them with
> something in between.

And you really can't.

> Consider the case of a merge of
> 
> A:
> blubb
> blubb.hi
> 
> B:
> blubb.hi
> blubb/
> 
> Any traversal that is done reasonably efficiently will never compare
> blubb to blubb/ and I don't see how to get around this.

Correct. There _is_ no sort order that will find the conflict in a single 
pass, yet get the sorting right for this case, because there is no sort 
that can satisfy both the "get DF conflicts together" _and_ the "keep 
things in the right order" requirements.

Btw, the old code was no better (and was probably worse) - it used 
"strcmp()" to order the things, so it effectively did the same thing, it 
just wasn't as obvious about it (and got the case of "foo/" vs "foo." 
entirely wrong). The new code makes this hack very explicit.

> Basically, I think you need a special traversal routine.

Yes, we need to handle it in two passes. Which is actually hopefully not 
all that hard, but it is totally impossible (at least for me) with the old 
code that was so hard to follow.

So what I did was to rewrite the code so that it can be followed and 
maintained, and now my plan is to simply put the result in a separate 
index. 

We've always really wanted to do that *anyway* (right now we have this 
"discard and re-read index" cruft in the callers and in the error cases to 
get back the index we overwrote when merging!), and I bet we could have 
done that with the old code too, but _I_ couldn't have done it.

The problem with having a single index as both source and destination is 
that you cannot do a two-phase thing, because you are basically destroying 
one of your sources in your first phase (or put another way: in the second 
phase, you don't really know whether the index entry you now find was 
there to begin with, of whether you just generated it yourself earlier).

So what that series of five patches do is to not fix a single bug, but to 
make the thing a bit easier for me to look at (and I think it's good for 
other reasons too - one less arbitrary tree traversal function!)

I'm hoping that people can look at the patches and test them, and say 
"yeah, that looks like it does the same thing we used to", and then when I 
get the energy (hopefully later today) I'll just make it take a separate 
source and destination index.

		Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/5] Add 'df_name_compare()' helper function
  2008-03-06 15:58     ` Linus Torvalds
@ 2008-03-06 21:50       ` David Kastrup
  0 siblings, 0 replies; 12+ messages in thread
From: David Kastrup @ 2008-03-06 21:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, 6 Mar 2008, David Kastrup wrote:
>
>> Consider the case of a merge of
>> 
>> A:
>> blubb
>> blubb.hi
>> 
>> B:
>> blubb.hi
>> blubb/
>> 
>> Any traversal that is done reasonably efficiently will never compare
>> blubb to blubb/ and I don't see how to get around this.
>
> Correct. There _is_ no sort order that will find the conflict in a
> single pass,

[...]

>> Basically, I think you need a special traversal routine.
>
> Yes, we need to handle it in two passes. Which is actually hopefully
> not all that hard, but it is totally impossible (at least for me) with
> the old code that was so hard to follow.

Well, as I said: a single pass is ok when additionally supported by a
FIFO keeping x around until x/ (or its place in the order of things)
passes by.  This will be O(1) with regards to comparisons, and typically
cheap with regard to memory requirements (things get unfriendly if there
are billions of files or even directories obeying the pattern x.*, but
only with regard to memory, not speed).

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Split-up "unpack_trees()" cleanup series
  2008-03-06  4:28 [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
                   ` (5 preceding siblings ...)
  2008-03-06  4:51 ` [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
@ 2008-03-07  0:13 ` Daniel Barkalow
  2008-03-07  2:04   ` Linus Torvalds
  6 siblings, 1 reply; 12+ messages in thread
From: Daniel Barkalow @ 2008-03-07  0:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On Wed, 5 Mar 2008, Linus Torvalds wrote:

> Ok, here's the patch I sent out earlier, except now it's been split up
> into a series fo five patches (and has some trivial one-liner cleanups
> here and there, mainly to shrink the patches a bit)
> 
> It's a series of five patches:
>  -  Add 'df_name_compare()' helper function
>  - Make 'traverse_tree()' use linked structure rather than 'const char *base'
>  - Add return value to 'traverse_tree()' callback
>  - Make 'traverse_trees()' traverse conflicting DF entries in parallel
>  - Move 'unpack_trees()' over to 'traverse_trees()' interface

This all looks good to me.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/5] Split-up "unpack_trees()" cleanup series
  2008-03-07  0:13 ` Daniel Barkalow
@ 2008-03-07  2:04   ` Linus Torvalds
  0 siblings, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2008-03-07  2:04 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git



On Thu, 6 Mar 2008, Daniel Barkalow wrote:
>
> On Wed, 5 Mar 2008, Linus Torvalds wrote:
> > 
> > It's a series of five patches:
> >  -  Add 'df_name_compare()' helper function
> >  - Make 'traverse_tree()' use linked structure rather than 'const char *base'
> >  - Add return value to 'traverse_tree()' callback
> >  - Make 'traverse_trees()' traverse conflicting DF entries in parallel
> >  - Move 'unpack_trees()' over to 'traverse_trees()' interface
> 
> This all looks good to me.

There's a really stupid bug in function that compares filenames using 
the the linked list structure which makes it not compare the first path 
component when using "--prefix".

So it can only hit in the case of us having a "prefix" entry that makes 
the name of the "root" info structure non-empty, and I suspect that 
because of all the other horrid crud we do for --prefix=xyzzy handling in 
builtin-read-tree.c you can't actually trigger this bug, but my other 
cleanups (which I'll send out once I've tested them a bit more) will make 
this bug trigger.

This fairly obvious patch fixes it by just making sure that we always end 
up having a ->prev entry if we have a pathname component.

			Linus

---
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu Mar 6 15:44:48 2008 -0800

Fix tree-walking compare_entry() in the presense of --prefix

When we make the "root" tree-walk info entry have a pathname in it, we
need to have a ->prev pointer so that compare_entry will actually notice
and traverse into the root.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 tree-walk.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 842cb6a..02e2aed 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -107,6 +107,7 @@ int tree_entry(struct tree_desc *desc, struct name_entry *entry)
 void setup_traverse_info(struct traverse_info *info, const char *base)
 {
 	int pathlen = strlen(base);
+	static struct traverse_info dummy;
 
 	memset(info, 0, sizeof(*info));
 	if (pathlen && base[pathlen-1] == '/')
@@ -114,6 +115,8 @@ void setup_traverse_info(struct traverse_info *info, const char *base)
 	info->pathlen = pathlen ? pathlen + 1 : 0;
 	info->name.path = base;
 	info->name.sha1 = (void *)(base + pathlen + 1);
+	if (pathlen)
+		info->prev = &dummy;
 }
 
 char *make_traverse_path(char *path, const struct traverse_info *info, const struct name_entry *n)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2008-03-07  2:05 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-06  4:28 [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
2008-03-06  2:25 ` [PATCH 1/5] Add 'df_name_compare()' helper function Linus Torvalds
2008-03-06 13:03   ` David Kastrup
2008-03-06 15:58     ` Linus Torvalds
2008-03-06 21:50       ` David Kastrup
2008-03-06  2:59 ` [PATCH 2/5] Make 'traverse_tree()' use linked structure rather than 'const char *base' Linus Torvalds
2008-03-06  3:44 ` [PATCH 3/5] Add return value to 'traverse_tree()' callback Linus Torvalds
2008-03-06  4:06 ` [PATCH 4/5] Make 'traverse_trees()' traverse conflicting DF entries in parallel Linus Torvalds
2008-03-06  4:15 ` [PATCH 5/5] Move 'unpack_trees()' over to 'traverse_trees()' interface Linus Torvalds
2008-03-06  4:51 ` [PATCH 0/5] Split-up "unpack_trees()" cleanup series Linus Torvalds
2008-03-07  0:13 ` Daniel Barkalow
2008-03-07  2:04   ` Linus Torvalds

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).