git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/6] Initial subproject support (RFC?)
@ 2007-04-10  4:12 Linus Torvalds
       [not found] ` <Pine.LNX.4.64.0704092115020.6730@woody.linux-foundation.org>
                   ` (7 more replies)
  0 siblings, 8 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10  4:12 UTC
  To: Git Mailing List, Junio C Hamano


Ok, the following is a series of six patches that implement some very 
low-level plumbing for what I consider sane subproject support.

NOTE! I want to make it very clear that this series of patches does not 
make subprojects "usable". They are very core plumbing that allows people 
to think about the issues, and shows how the low-level code could (and in 
my opinion, should) be done.

Some of the early patches are just cleanups and very basic stuff required 
to actually get to the meat of it all. I actually think that they are all 
in a state where they could be applied, if only because they don't 
actually really *do* anything unless you start generating index file 
entries (and trees) that have the "gitlink" entries in them.

I've actually done some testing with a repository that has these kinds of 
subproject pointers in it, and no, it's really not fully fleshed out 
yet, but yes, I can actually do a commit in one of the subprojects, and 
when I do that, the "raw" diff literally looks like this:

	[torvalds@woody superproject]$ git diff --raw
	:160000 160000 5813084832d3c680a3436b0253639c94ed55445d 0000000... M    sub-B

and I can do a "git commit -a" in the superproject to commit the new 
state.
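
As an illustration of the raw line above, here is a minimal sketch (not part of the posted series) of how such a line could be picked apart to spot a gitlink change. The struct, the function names and the fixed buffer sizes are hypothetical, and paths containing whitespace are deliberately not handled.

```c
#include <stdio.h>

/* Hypothetical record for one line of "git diff --raw" output. */
struct raw_entry {
	unsigned int old_mode, new_mode;
	char old_sha1[41], new_sha1[41];
	char status;
	char path[256];
};

#define GITLINK_MODE 0160000	/* the mode shown for sub-B above */

/*
 * Parse ":<old mode> <new mode> <old sha1> <new sha1> <status> <path>".
 * Returns 0 on success, -1 if the line does not have that shape.
 */
static int parse_raw_line(const char *line, struct raw_entry *e)
{
	int n = sscanf(line, ":%o %o %40s %40s %c %255s",
		       &e->old_mode, &e->new_mode,
		       e->old_sha1, e->new_sha1, &e->status, e->path);
	return n == 6 ? 0 : -1;
}

/* A change touches a gitlink if either side carries mode 0160000. */
static int is_gitlink_change(const struct raw_entry *e)
{
	return e->old_mode == GITLINK_MODE || e->new_mode == GITLINK_MODE;
}
```

Feeding it the sub-B line above yields old and new mode 0160000, status 'M' and path "sub-B".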

NOTE! This series of six patches does not actually contain everything you 
need to do that - in particular, this series will not actually connect up 
the magic to make "git add" (and thus "git commit") actually create the 
gitlink entries for subprojects. That's another (quite small) patch, but I 
haven't cleaned it up enough to be submittable yet.

I split my original larger patch up into more manageable pieces, so that 
you should be able to actually just read the patches themselves and get a 
reasonable idea about what it's doing, even *without* actually testing it. 
And obviously, "make test" still completes happily, if only because none 
of the tests actually trigger any of the new code.

The patches are all fairly small, and the first two are really just 
totally independent cleanups/fixes:

 - diff-lib: use ce_mode_from_stat() rather than messing with modes manually:

	 diff-lib.c |   15 +++------------
	 1 files changed, 3 insertions(+), 12 deletions(-)

 - Avoid overflowing name buffer in deep directory structures:

	 dir.c |    3 +++
	 1 files changed, 3 insertions(+), 0 deletions(-)

 - Add 'resolve_gitlink_ref()' helper function:

	 refs.c |   79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
	 refs.h |    3 ++
	 2 files changed, 82 insertions(+), 0 deletions(-)

 - Add "S_IFDIRLNK" file mode infrastructure for git links:

	 cache.h |   20 +++++++++++++++++++-
	 1 files changed, 19 insertions(+), 1 deletions(-)

 - Teach "fsck" not to follow subproject links:

	 builtin-fsck.c |    9 ++++++++-
	 tree.c         |   15 ++++++++++++++-
	 2 files changed, 22 insertions(+), 2 deletions(-)

 - Teach core object handling functions about gitlinks:

	 builtin-ls-tree.c |   20 +++++++++++++++++++-
	 cache-tree.c      |    2 +-
	 read-cache.c      |   35 +++++++++++++++++++++++++++++++----
	 sha1_file.c       |    3 +++
	 4 files changed, 54 insertions(+), 6 deletions(-)

and will follow in the next few emails..

			Linus


* [PATCH 1/6] diff-lib: use ce_mode_from_stat() rather than messing with modes manually
  2007-04-10  4:12 [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
       [not found] ` <Pine.LNX.4.64.0704092115020.6730@woody.linux-foundation.org>
@ 2007-04-10  4:13 ` Linus Torvalds
  2007-04-10  4:13 ` [PATCH 2/6] Avoid overflowing name buffer in deep directory structures Linus Torvalds
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10  4:13 UTC
  To: Git Mailing List, Junio C Hamano


The diff helpers used to do the magic mode canonicalization and all the
other special mode handling by hand ("trust executable bit" and "has
symlink support" handling).

That's bogus. Use "ce_mode_from_stat()" that does this all for us.

This is also going to be required when we add support for links to other
git repositories.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 diff-lib.c |   15 +++------------
 1 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/diff-lib.c b/diff-lib.c
index 5c5b05b..c6d1273 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -357,7 +357,7 @@ int run_diff_files(struct rev_info *revs, int silent_on_removed)
 					continue;
 			}
 			else
-				dpath->mode = canon_mode(st.st_mode);
+				dpath->mode = ntohl(ce_mode_from_stat(ce, st.st_mode));
 
 			while (i < entries) {
 				struct cache_entry *nce = active_cache[i];
@@ -374,8 +374,7 @@ int run_diff_files(struct rev_info *revs, int silent_on_removed)
 					int mode = ntohl(nce->ce_mode);
 					num_compare_stages++;
 					hashcpy(dpath->parent[stage-2].sha1, nce->sha1);
-					dpath->parent[stage-2].mode =
-						canon_mode(mode);
+					dpath->parent[stage-2].mode = ntohl(ce_mode_from_stat(nce, mode));
 					dpath->parent[stage-2].status =
 						DIFF_STATUS_MODIFIED;
 				}
@@ -424,15 +423,7 @@ int run_diff_files(struct rev_info *revs, int silent_on_removed)
 		if (!changed && !revs->diffopt.find_copies_harder)
 			continue;
 		oldmode = ntohl(ce->ce_mode);
-
-		newmode = canon_mode(st.st_mode);
-		if (!trust_executable_bit &&
-		    S_ISREG(newmode) && S_ISREG(oldmode) &&
-		    ((newmode ^ oldmode) == 0111))
-			newmode = oldmode;
-		else if (!has_symlinks &&
-		    S_ISREG(newmode) && S_ISLNK(oldmode))
-			newmode = oldmode;
+		newmode = ntohl(ce_mode_from_stat(ce, st.st_mode));
 		diff_change(&revs->diffopt, oldmode, newmode,
 			    ce->sha1, (changed ? null_sha1 : ce->sha1),
 			    ce->name, NULL);
-- 
1.5.1.110.g1e4c


* [PATCH 2/6] Avoid overflowing name buffer in deep directory structures
  2007-04-10  4:12 [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
       [not found] ` <Pine.LNX.4.64.0704092115020.6730@woody.linux-foundation.org>
  2007-04-10  4:13 ` [PATCH 1/6] diff-lib: use ce_mode_from_stat() rather than messing with modes manually Linus Torvalds
@ 2007-04-10  4:13 ` Linus Torvalds
  2007-04-10  4:14 ` [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function Linus Torvalds
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10  4:13 UTC
  To: Git Mailing List, Junio C Hamano


This just makes sure that when we do a read_directory(), we check
that the filename fits in the buffer we allocated (with a bit of
slop).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 dir.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/dir.c b/dir.c
index 7426fde..4f5a224 100644
--- a/dir.c
+++ b/dir.c
@@ -353,6 +353,9 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 			     !strcmp(de->d_name + 1, "git")))
 				continue;
 			len = strlen(de->d_name);
+			/* Ignore overly long pathnames! */
+			if (len + baselen + 8 > sizeof(fullname))
+				continue;
 			memcpy(fullname + baselen, de->d_name, len+1);
 			if (simplify_away(fullname, baselen + len, simplify))
 				continue;
-- 
1.5.1.110.g1e4c
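
The check added above can be read in isolation as follows. This is a minimal sketch with a hypothetical helper name; the 8 bytes of slop are taken from the patch, everything else is illustrative.

```c
#include <string.h>

/*
 * Append "name" to the path already stored in fullname[0..baselen).
 * Returns the new path length, or -1 if the result would not fit in
 * "size" bytes with 8 bytes of slop left over (in which case the
 * buffer is left untouched and the caller skips the entry).
 */
static int append_name(char *fullname, size_t size, size_t baselen,
		       const char *name)
{
	size_t len = strlen(name);

	if (len + baselen + 8 > size)
		return -1;
	/* len + 1 copies the terminating NUL as well */
	memcpy(fullname + baselen, name, len + 1);
	return (int)(baselen + len);
}
```

The slop leaves room for suffixes appended later to the same buffer without re-checking the length each time.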


* [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function
  2007-04-10  4:12 [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
                   ` (2 preceding siblings ...)
  2007-04-10  4:13 ` [PATCH 2/6] Avoid overflowing name buffer in deep directory structures Linus Torvalds
@ 2007-04-10  4:14 ` Linus Torvalds
  2007-04-10  9:38   ` Alex Riesen
  2007-04-10  4:14 ` [PATCH 4/6] Add "S_IFDIRLNK" file mode infrastructure for git links Linus Torvalds
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10  4:14 UTC
  To: Git Mailing List, Junio C Hamano


This new function resolves a ref in *another* git repository.  It's
named for its intended use: to look up the git link to a subproject.

It's not actually wired up to anything yet, but we're getting closer to
having fundamental plumbing support for "links" from one git directory
to another, which is the basis of subproject support.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 refs.c |   79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 refs.h |    3 ++
 2 files changed, 82 insertions(+), 0 deletions(-)

diff --git a/refs.c b/refs.c
index d2b7b7f..229da74 100644
--- a/refs.c
+++ b/refs.c
@@ -215,6 +215,85 @@ static struct ref_list *get_loose_refs(void)
 
 /* We allow "recursive" symbolic refs. Only within reason, though */
 #define MAXDEPTH 5
+#define MAXREFLEN (1024)
+
+static int resolve_gitlink_packed_ref(char *name, int pathlen, const char *refname, unsigned char *result)
+{
+	FILE *f;
+	struct cached_refs refs;
+	struct ref_list *ref;
+	int retval;
+
+	strcpy(name + pathlen, "packed-refs");
+	f = fopen(name, "r");
+	if (!f)
+		return -1;
+	read_packed_refs(f, &refs);
+	ref = refs.packed;
+	retval = -1;
+	while (ref) {
+		if (!strcmp(ref->name, refname)) {
+			retval = 0;
+			memcpy(result, ref->sha1, 20);
+			break;
+		}
+		ref = ref->next;
+	}
+	free_ref_list(refs.packed);
+	return retval;
+}
+
+static int resolve_gitlink_ref_recursive(char *name, int pathlen, const char *refname, unsigned char *result, int recursion)
+{
+	int fd, len = strlen(refname);
+	char buffer[128], *p;
+
+	if (recursion > MAXDEPTH || len > MAXREFLEN)
+		return -1;
+	memcpy(name + pathlen, refname, len+1);
+	fd = open(name, O_RDONLY);
+	if (fd < 0)
+		return resolve_gitlink_packed_ref(name, pathlen, refname, result);
+
+	len = read(fd, buffer, sizeof(buffer)-1);
+	close(fd);
+	if (len < 0)
+		return -1;
+	while (len && isspace(buffer[len-1]))
+		len--;
+	buffer[len] = 0;
+
+	/* Was it a detached head or an old-fashioned symlink? */
+	if (!get_sha1_hex(buffer, result))
+		return 0;
+
+	/* Symref? */
+	if (strncmp(buffer, "ref:", 4))
+		return -1;
+	p = buffer + 4;
+	while (isspace(*p))
+		p++;
+
+	return resolve_gitlink_ref_recursive(name, pathlen, p, result, recursion+1);
+}
+
+int resolve_gitlink_ref(const char *path, const char *refname, unsigned char *result)
+{
+	int len = strlen(path), retval;
+	char *gitdir;
+
+	while (len && path[len-1] == '/')
+		len--;
+	if (!len)
+		return -1;
+	gitdir = xmalloc(len + MAXREFLEN + 8);
+	memcpy(gitdir, path, len);
+	memcpy(gitdir + len, "/.git/", 7);
+
+	retval = resolve_gitlink_ref_recursive(gitdir, len+6, refname, result, 0);
+	free(gitdir);
+	return retval;
+}
 
 const char *resolve_ref(const char *ref, unsigned char *sha1, int reading, int *flag)
 {
diff --git a/refs.h b/refs.h
index acedffc..f61f6d9 100644
--- a/refs.h
+++ b/refs.h
@@ -60,4 +60,7 @@ extern int check_ref_format(const char *target);
 /** rename ref, return 0 on success **/
 extern int rename_ref(const char *oldref, const char *newref, const char *logmsg);
 
+/** resolve ref in nested "gitlink" repository */
+extern int resolve_gitlink_ref(const char *name, const char *refname, unsigned char *result);
+
 #endif /* REFS_H */
-- 
1.5.1.110.g1e4c
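
The recursive helper above reads a ref file and decides between two cases: a raw SHA1 or a "ref: <name>" symref. Here is a minimal sketch of just that decision, separated from the file I/O. The enum and function name are hypothetical, and unlike get_sha1_hex() this simplified version insists on exactly 40 hex digits.

```c
#include <ctype.h>
#include <string.h>

enum ref_kind { REF_INVALID, REF_SHA1, REF_SYMREF };

/*
 * Classify the contents of a ref file held in "buf" (modified in place:
 * trailing whitespace is stripped). On success, *target points at either
 * the 40-digit hex SHA1 or the name the symref points to.
 */
static enum ref_kind classify_ref(char *buf, const char **target)
{
	size_t len = strlen(buf), i;
	char *p;

	while (len && isspace((unsigned char)buf[len - 1]))
		len--;
	buf[len] = '\0';

	/* Detached HEAD or plain ref: exactly 40 hex digits. */
	if (len == 40) {
		for (i = 0; i < 40; i++)
			if (!isxdigit((unsigned char)buf[i]))
				break;
		if (i == 40) {
			*target = buf;
			return REF_SHA1;
		}
	}

	/* Symref: "ref:" followed by optional whitespace and a ref name. */
	if (strncmp(buf, "ref:", 4))
		return REF_INVALID;
	p = buf + 4;
	while (isspace((unsigned char)*p))
		p++;
	*target = p;
	return REF_SYMREF;
}
```

A caller in the style of the patch would loop (or recurse with a depth limit) on REF_SYMREF, opening the named ref in the same repository each time.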


* [PATCH 4/6] Add "S_IFDIRLNK" file mode infrastructure for git links
  2007-04-10  4:12 [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
                   ` (3 preceding siblings ...)
  2007-04-10  4:14 ` [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function Linus Torvalds
@ 2007-04-10  4:14 ` Linus Torvalds
  2007-04-10  4:15 ` [PATCH 5/6] Teach "fsck" not to follow subproject links Linus Torvalds
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10  4:14 UTC
  To: Git Mailing List, Junio C Hamano


This just adds the basic helper functions to recognize and work with git
tree entries that are links to other git repositories ("subprojects").
They still aren't actually connected up to any of the code-paths, but
now all the infrastructure is in place.

The next commit will start adding actual subproject support.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 cache.h |   20 +++++++++++++++++++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/cache.h b/cache.h
index eb57507..1b3d00e 100644
--- a/cache.h
+++ b/cache.h
@@ -25,6 +25,22 @@
 #endif
 
 /*
+ * A "directory link" is a link to another git directory.
+ *
+ * The value 0160000 is not normally a valid mode, and
+ * also just happens to be S_IFDIR + S_IFLNK
+ *
+ * NOTE! We *really* shouldn't depend on the S_IFxxx macros
+ * always having the same values everywhere. We should use
+ * our internal git values for these things, and then we can
+ * translate that to the OS-specific value. It just so
+ * happens that everybody shares the same bit representation
+ * in the UNIX world (and apparently wider too..)
+ */
+#define S_IFDIRLNK	0160000
+#define S_ISDIRLNK(m)	(((m) & S_IFMT) == S_IFDIRLNK)
+
+/*
  * Intensive research over the course of many years has shown that
  * port 9418 is totally unused by anything else. Or
  *
@@ -104,6 +120,8 @@ static inline unsigned int create_ce_mode(unsigned int mode)
 {
 	if (S_ISLNK(mode))
 		return htonl(S_IFLNK);
+	if (S_ISDIR(mode) || S_ISDIRLNK(mode))
+		return htonl(S_IFDIRLNK);
 	return htonl(S_IFREG | ce_permissions(mode));
 }
 static inline unsigned int ce_mode_from_stat(struct cache_entry *ce, unsigned int mode)
@@ -121,7 +139,7 @@ static inline unsigned int ce_mode_from_stat(struct cache_entry *ce, unsigned in
 }
 #define canon_mode(mode) \
 	(S_ISREG(mode) ? (S_IFREG | ce_permissions(mode)) : \
-	S_ISLNK(mode) ? S_IFLNK : S_IFDIR)
+	S_ISLNK(mode) ? S_IFLNK : S_ISDIR(mode) ? S_IFDIR : S_IFDIRLNK)
 
 #define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
 
-- 
1.5.1.110.g1e4c
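
The comment in the patch suggests using git-internal values instead of the OS's S_IFxxx macros. A minimal sketch of that idea follows; the GIT_* names are hypothetical and not taken from the source, while the octal values are the conventional UNIX ones the comment refers to.

```c
/* Hypothetical git-internal file type bits, independent of <sys/stat.h>. */
#define GIT_IFMT	0170000
#define GIT_IFREG	0100000
#define GIT_IFDIR	0040000
#define GIT_IFLNK	0120000
#define GIT_IFDIRLNK	0160000	/* GIT_IFDIR + GIT_IFLNK */

#define GIT_ISDIRLNK(m)	(((m) & GIT_IFMT) == GIT_IFDIRLNK)

/* Name the kind of tree entry a mode describes. */
static const char *entry_kind(unsigned int mode)
{
	switch (mode & GIT_IFMT) {
	case GIT_IFREG:
		return "blob";
	case GIT_IFLNK:
		return "symlink";
	case GIT_IFDIR:
		return "tree";
	case GIT_IFDIRLNK:
		return "gitlink";
	default:
		return "unknown";
	}
}
```

Because the directory and symlink bits do not overlap, adding them and OR-ing them give the same 0160000, and a plain directory (0040000) never tests true for GIT_ISDIRLNK().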


* [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-10  4:12 [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
                   ` (4 preceding siblings ...)
  2007-04-10  4:14 ` [PATCH 4/6] Add "S_IFDIRLNK" file mode infrastructure for git links Linus Torvalds
@ 2007-04-10  4:15 ` Linus Torvalds
  2007-04-11 22:41   ` Sam Vilain
  2007-04-10  4:20 ` [PATCH 6/6] Teach core object handling functions about gitlinks Linus Torvalds
  2007-04-10  4:46 ` [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
  7 siblings, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10  4:15 UTC
  To: Git Mailing List, Junio C Hamano


Since the subprojects don't necessarily even exist in the current tree,
much less in the current git repository (they are totally independent
repositories), we do not want to try to follow the chain from one git
repository to another through a gitlink.

This involves teaching fsck to ignore references to gitlink objects from
a tree and from the current index.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 builtin-fsck.c |    9 ++++++++-
 tree.c         |   15 ++++++++++++++-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/builtin-fsck.c b/builtin-fsck.c
index 4d8b66c..f22de8d 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -253,6 +253,7 @@ static int fsck_tree(struct tree *item)
 		case S_IFREG | 0644:
 		case S_IFLNK:
 		case S_IFDIR:
+		case S_IFDIRLNK:
 			break;
 		/*
 		 * This is nonstandard, but we had a few of these
@@ -695,8 +696,14 @@ int cmd_fsck(int argc, char **argv, const char *prefix)
 		int i;
 		read_cache();
 		for (i = 0; i < active_nr; i++) {
-			struct blob *blob = lookup_blob(active_cache[i]->sha1);
+			unsigned int mode;
+			struct blob *blob;
 			struct object *obj;
+
+			mode = ntohl(active_cache[i]->ce_mode);
+			if (S_ISDIRLNK(mode))
+				continue;
+			blob = lookup_blob(active_cache[i]->sha1);
 			if (!blob)
 				continue;
 			obj = &blob->object;
diff --git a/tree.c b/tree.c
index d188c0f..dbb63fc 100644
--- a/tree.c
+++ b/tree.c
@@ -143,6 +143,14 @@ struct tree *lookup_tree(const unsigned char *sha1)
 	return (struct tree *) obj;
 }
 
+/*
+ * NOTE! Tree refs to external git repositories
+ * (ie gitlinks) do not count as real references.
+ *
+ * You don't have to have those repositories
+ * available at all, much less have the objects
+ * accessible from the current repository.
+ */
 static void track_tree_refs(struct tree *item)
 {
 	int n_refs = 0, i;
@@ -152,8 +160,11 @@ static void track_tree_refs(struct tree *item)
 
 	/* Count how many entries there are.. */
 	init_tree_desc(&desc, item->buffer, item->size);
-	while (tree_entry(&desc, &entry))
+	while (tree_entry(&desc, &entry)) {
+		if (S_ISDIRLNK(entry.mode))
+			continue;
 		n_refs++;
+	}
 
 	/* Allocate object refs and walk it again.. */
 	i = 0;
@@ -162,6 +173,8 @@ static void track_tree_refs(struct tree *item)
 	while (tree_entry(&desc, &entry)) {
 		struct object *obj;
 
+		if (S_ISDIRLNK(entry.mode))
+			continue;
 		if (S_ISDIR(entry.mode))
 			obj = &lookup_tree(entry.sha1)->object;
 		else
-- 
1.5.1.110.g1e4c
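
The tree.c change relies on both passes (count, then fill) skipping gitlinks in the same way, otherwise the allocated array and the number of filled slots disagree. A minimal sketch of that pattern, using a hypothetical flat entry array in place of git's tree_desc walker:

```c
#include <stdlib.h>

/* Hypothetical flattened tree entry; git walks a packed buffer instead. */
struct entry {
	unsigned int mode;
	const char *name;
};

#define MODE_FMT	0170000
#define MODE_GITLINK	0160000

static int is_gitlink(unsigned int mode)
{
	return (mode & MODE_FMT) == MODE_GITLINK;
}

/*
 * Collect the names of entries that refer to objects in this repository.
 * Returns the number collected (or -1 on allocation failure); the caller
 * frees *out.
 */
static int collect_local_refs(const struct entry *e, int n, const char ***out)
{
	int i, k = 0, n_refs = 0;

	/* Count how many entries there are, ignoring gitlinks.. */
	for (i = 0; i < n; i++)
		if (!is_gitlink(e[i].mode))
			n_refs++;

	/* Allocate and walk again, skipping exactly the same entries.. */
	*out = malloc((n_refs ? n_refs : 1) * sizeof(**out));
	if (!*out)
		return -1;
	for (i = 0; i < n; i++) {
		if (is_gitlink(e[i].mode))
			continue;
		(*out)[k++] = e[i].name;
	}
	return n_refs;
}
```

For a tree holding Makefile, sub-A (a gitlink) and src, only Makefile and src are collected.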


* [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10  4:12 [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
                   ` (5 preceding siblings ...)
  2007-04-10  4:15 ` [PATCH 5/6] Teach "fsck" not to follow subproject links Linus Torvalds
@ 2007-04-10  4:20 ` Linus Torvalds
  2007-04-10  8:40   ` Frank Lichtenheld
                     ` (2 more replies)
  2007-04-10  4:46 ` [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
  7 siblings, 3 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10  4:20 UTC
  To: Git Mailing List, Junio C Hamano


This teaches the really fundamental core SHA1 object handling routines
about gitlinks.  We can compare trees with gitlinks in them (although we
can not actually generate patches for them yet - just raw git diffs),
and they show up as commits in "git ls-tree".

We also know to compare gitlinks as if they were directories (ie the
normal "sort as trees" rules apply).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---

Ok, that's it for now.

NOTE NOTE NOTE! I'd like to note once more that this doesn't actually get 
you working subproject support. Not only do I need to connect up a few 
more low-level helper functions (things like "git diff" don't know how to 
generate even rudimentary "subproject X changed" patches, nor can you 
actually yet *add* subprojects), but quite apart from that low-level 
stuff, anything more high-level (like "git fetch" and friends) will need 
to know about subprojects.

In general, think of this like the early git plumbing: it's the early
"content-addressable filesystem" part. The actual SCM parts going on top 
of it are yet to be done.

I'm hoping/expecting that there are more people who have the ability and 
the interest to work on the higher-level interfaces once the core plumbing 
support is there. There's still some plumbing to be done, but after that, 
maybe more people (and maybe the SoC people) can start filling out the 
higher-level details..

Comments on the patches/approach so far?

 builtin-ls-tree.c |   20 +++++++++++++++++++-
 cache-tree.c      |    2 +-
 read-cache.c      |   35 +++++++++++++++++++++++++++++++----
 sha1_file.c       |    3 +++
 4 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/builtin-ls-tree.c b/builtin-ls-tree.c
index 6472610..1cb4dca 100644
--- a/builtin-ls-tree.c
+++ b/builtin-ls-tree.c
@@ -6,6 +6,7 @@
 #include "cache.h"
 #include "blob.h"
 #include "tree.h"
+#include "commit.h"
 #include "quote.h"
 #include "builtin.h"
 
@@ -59,7 +60,24 @@ static int show_tree(const unsigned char *sha1, const char *base, int baselen,
 	int retval = 0;
 	const char *type = blob_type;
 
-	if (S_ISDIR(mode)) {
+	if (S_ISDIRLNK(mode)) {
+		/*
+		 * Maybe we want to have some recursive version here?
+		 *
+		 * Something like:
+		 *
+		if (show_subprojects(base, baselen, pathname)) {
+			if (fork()) {
+				chdir(base);
+				exec ls-tree;
+			}
+			waitpid();
+		}
+		 *
+		 * ..or similar..
+		 */
+		type = commit_type;
+	} else if (S_ISDIR(mode)) {
 		if (show_recursive(base, baselen, pathname)) {
 			retval = READ_TREE_RECURSIVE;
 			if (!(ls_options & LS_SHOW_TREES))
diff --git a/cache-tree.c b/cache-tree.c
index 9b73c86..6369cc7 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -326,7 +326,7 @@ static int update_one(struct cache_tree *it,
 			mode = ntohl(ce->ce_mode);
 			entlen = pathlen - baselen;
 		}
-		if (!missing_ok && !has_sha1_file(sha1))
+		if (mode != S_IFDIRLNK && !missing_ok && !has_sha1_file(sha1))
 			return error("invalid object %s", sha1_to_hex(sha1));
 
 		if (!ce->ce_mode)
diff --git a/read-cache.c b/read-cache.c
index 54573ce..8fe94cd 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -5,6 +5,7 @@
  */
 #include "cache.h"
 #include "cache-tree.h"
+#include "refs.h"
 
 /* Index extensions.
  *
@@ -91,6 +92,23 @@ static int ce_compare_link(struct cache_entry *ce, size_t expected_size)
 	return match;
 }
 
+static int ce_compare_gitlink(struct cache_entry *ce)
+{
+	unsigned char sha1[20];
+
+	/*
+	 * We don't actually require that the .git directory
+	 * under DIRLNK directory be a valid git directory. It
+	 * might even be missing (in case nobody populated that
+	 * sub-project).
+	 *
+	 * If so, we consider it always to match.
+	 */
+	if (resolve_gitlink_ref(ce->name, "HEAD", sha1) < 0)
+		return 0;
+	return hashcmp(sha1, ce->sha1);
+}
+
 static int ce_modified_check_fs(struct cache_entry *ce, struct stat *st)
 {
 	switch (st->st_mode & S_IFMT) {
@@ -102,6 +120,9 @@ static int ce_modified_check_fs(struct cache_entry *ce, struct stat *st)
 		if (ce_compare_link(ce, xsize_t(st->st_size)))
 			return DATA_CHANGED;
 		break;
+	case S_IFDIRLNK:
+		/* No need to do anything, we did the exact compare in "match_stat_basic" */
+		break;
 	default:
 		return TYPE_CHANGED;
 	}
@@ -127,6 +148,12 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 		    (has_symlinks || !S_ISREG(st->st_mode)))
 			changed |= TYPE_CHANGED;
 		break;
+	case S_IFDIRLNK:
+		if (!S_ISDIR(st->st_mode))
+			changed |= TYPE_CHANGED;
+		else if (ce_compare_gitlink(ce))
+			changed |= DATA_CHANGED;
+		break;
 	default:
 		die("internal error: ce_mode is %o", ntohl(ce->ce_mode));
 	}
@@ -250,9 +277,9 @@ int base_name_compare(const char *name1, int len1, int mode1,
 		return cmp;
 	c1 = name1[len];
 	c2 = name2[len];
-	if (!c1 && S_ISDIR(mode1))
+	if (!c1 && (S_ISDIR(mode1) || S_ISDIRLNK(mode1)))
 		c1 = '/';
-	if (!c2 && S_ISDIR(mode2))
+	if (!c2 && (S_ISDIR(mode2) || S_ISDIRLNK(mode2)))
 		c2 = '/';
 	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
 }
@@ -334,8 +361,8 @@ int add_file_to_cache(const char *path, int verbose)
 	if (lstat(path, &st))
 		die("%s: unable to stat (%s)", path, strerror(errno));
 
-	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode))
-		die("%s: can only add regular files or symbolic links", path);
+	if (!S_ISREG(st.st_mode) && !S_ISLNK(st.st_mode) && !S_ISDIR(st.st_mode))
+		die("%s: can only add regular files, symbolic links or git-directories", path);
 
 	namelen = strlen(path);
 	size = cache_entry_size(namelen);
diff --git a/sha1_file.c b/sha1_file.c
index 4304fe9..ab915fa 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -13,6 +13,7 @@
 #include "commit.h"
 #include "tag.h"
 #include "tree.h"
+#include "refs.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -2332,6 +2333,8 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, int write
 				     path);
 		free(target);
 		break;
+	case S_IFDIR:
+		return resolve_gitlink_ref(path, "HEAD", sha1);
 	default:
 		return error("%s: unsupported file type", path);
 	}
-- 
1.5.1.110.g1e4c
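
The "sort as trees" rule means an entry that is a directory or a gitlink compares as if its name ended in '/'. A minimal standalone sketch of that comparison follows; the names and mode constants are hypothetical, and both names are assumed to be NUL-terminated.

```c
#include <string.h>

#define MODE_FMT	0170000
#define MODE_DIR	0040000
#define MODE_GITLINK	0160000

/* Directories and gitlinks both sort with an implicit trailing '/'. */
static int sorts_as_dir(unsigned int mode)
{
	unsigned int fmt = mode & MODE_FMT;
	return fmt == MODE_DIR || fmt == MODE_GITLINK;
}

static int name_compare(const char *name1, int len1, unsigned int mode1,
			const char *name2, int len2, unsigned int mode2)
{
	int len = len1 < len2 ? len1 : len2;
	int cmp = memcmp(name1, name2, len);
	unsigned char c1, c2;

	if (cmp)
		return cmp;
	/* Common prefix matched: compare the next character of each name. */
	c1 = name1[len];
	c2 = name2[len];
	/* Each side consults its own mode to decide on the implicit '/'. */
	if (!c1 && sorts_as_dir(mode1))
		c1 = '/';
	if (!c2 && sorts_as_dir(mode2))
		c2 = '/';
	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
}
```

So a gitlink named "sub" sorts after a file named "sub.c" (because '/' is greater than '.'), whereas a regular file named "sub" would sort before it.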


* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10  4:12 [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
                   ` (6 preceding siblings ...)
  2007-04-10  4:20 ` [PATCH 6/6] Teach core object handling functions about gitlinks Linus Torvalds
@ 2007-04-10  4:46 ` Linus Torvalds
  2007-04-10 13:04   ` Alex Riesen
  2007-04-10 13:39   ` [PATCH] allow git-update-index work on subprojects Alex Riesen
  7 siblings, 2 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10  4:46 UTC
  To: Git Mailing List, Junio C Hamano



On Mon, 9 Apr 2007, Linus Torvalds wrote:
> 
> NOTE! This series of six patches does not actually contain everything you 
> need to do that - in particular, this series will not actually connect up 
> the magic to make "git add" (and thus "git commit") actually create the 
> gitlink entries for subprojects. That's another (quite small) patch, but I 
> haven't cleaned it up enough to be submittable yet.

Here is, for your enjoyment, the last patch I used to actually test this 
all. I do *not* submit it as a patch for actual inclusion - the other 
patches in the series are, I think, ready to actually be merged. This one 
is not.

It's broken for a few reasons:

 - it allows you to do "git add subproject" to add the subproject to the 
   index (and then use "git commit" to commit it), but even something as 
   simple as "git commit -a" doesn't work right, because the sequence that 
   "git commit -a" uses to update the index doesn't work with the current 
   state of the plumbing (ie the

	git-diff-files --name-only -z |
		git-update-index --remove -z --stdin

   thing) doesn't work right.

 - even for "git add", the logic isn't really right. It should take the 
   old index state into account to decide if it wants to add it as a 
   subproject. 

so this patch really isn't very good, but it allows people who are 
interested to perhaps actually test something. For example, my test repo 
was actually created with this:

	[torvalds@woody superproject]$ git log --raw
	commit 649ad968bdd79cb3b0f50feb819b7e9b134d3a1a
	Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
	Date:   Mon Apr 9 21:36:53 2007 -0700
	
	    This commits the modification to sub-project B
	
	:160000 160000 5813084832d3c680a3436b0253639c94ed55445d 17d246a35f27a46762328281eb6e9d4558f91e9d M      sub-B

	commit f3c55ffcc000a8c0fecc6801e8909d084e3d419e
	Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
	Date:   Mon Apr 9 16:12:29 2007 -0700
	
	    Superproject with two subprojects
	
	:000000 160000 0000000... c0daf4c85d48879ab450a6a887bbb241eb0de00a A    sub-A
	:000000 160000 0000000... 5813084832d3c680a3436b0253639c94ed55445d A    sub-B

	commit 45eb14edb43b10e3d3ac7a495a1ec861e85dc36f
	Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
	Date:   Mon Apr 9 15:36:24 2007 -0700
	
	    Add top-level Makefile for super-project
	
	:000000 100644 0000000... 57e8394... A  Makefile

so you can see how things look at a low level (ie a "gitlink" is just a 
tree entry with mode 0160000, and the SHA1 is just the SHA1 of the HEAD 
commit in the subproject)

		Linus

---
diff --git a/dir.c b/dir.c
index 4f5a224..ef284a2 100644
--- a/dir.c
+++ b/dir.c
@@ -378,6 +378,14 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 					continue;
 				/* fallthrough */
 			case DT_DIR:
+				/* Does it have a git directory? If so, it's a DIRLNK */
+				if (!dir->no_dirlinks) {
+					memcpy(fullname + baselen + len, "/.git/", 7);
+					if (!stat(fullname, &st)) {
+						if (S_ISDIR(st.st_mode))
+							break;
+					}
+				}
 				memcpy(fullname + baselen + len, "/", 2);
 				len++;
 				if (dir->show_other_directories &&
diff --git a/dir.h b/dir.h
index 33c31f2..1931609 100644
--- a/dir.h
+++ b/dir.h
@@ -33,7 +33,8 @@ struct dir_struct {
 	int nr, alloc;
 	unsigned int show_ignored:1,
 		     show_other_directories:1,
-		     hide_empty_directories:1;
+		     hide_empty_directories:1,
+		     no_dirlinks;
 	struct dir_entry **entries;
 
 	/* Exclude info */


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10  4:20 ` [PATCH 6/6] Teach core object handling functions about gitlinks Linus Torvalds
@ 2007-04-10  8:40   ` Frank Lichtenheld
  2007-04-10 11:31     ` Alex Riesen
  2007-04-10 14:55     ` Linus Torvalds
  2007-04-10 16:28   ` Josef Weidendorfer
  2007-04-11  8:06   ` Martin Waitz
  2 siblings, 2 replies; 101+ messages in thread
From: Frank Lichtenheld @ 2007-04-10  8:40 UTC
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

On Mon, Apr 09, 2007 at 09:20:29PM -0700, Linus Torvalds wrote:
> diff --git a/sha1_file.c b/sha1_file.c
> index 4304fe9..ab915fa 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -13,6 +13,7 @@
>  #include "commit.h"
>  #include "tag.h"
>  #include "tree.h"
> +#include "refs.h"
>  
>  #ifndef O_NOATIME
>  #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
> @@ -2332,6 +2333,8 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, int write
>  				     path);
>  		free(target);
>  		break;
> +	case S_IFDIR:
> +		return resolve_gitlink_ref(path, "HEAD", sha1);
>  	default:
>  		return error("%s: unsupported file type", path);
>  	}

Not that I have time right now to look up the exact context (only read
the patch), but I would've expected a "case S_IFDIRLNK:" here?

Gruesse,
-- 
Frank Lichtenheld <frank@lichtenheld.de>
www: http://www.djpig.de/


* Re: [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function
  2007-04-10  4:14 ` [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function Linus Torvalds
@ 2007-04-10  9:38   ` Alex Riesen
  2007-04-10 14:58     ` Linus Torvalds
  0 siblings, 1 reply; 101+ messages in thread
From: Alex Riesen @ 2007-04-10  9:38 UTC
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> +int resolve_gitlink_ref(const char *path, const char *refname, unsigned char *result)
> +{
> +       int len = strlen(path), retval;
> +       char *gitdir;
> +
> +       while (len && path[len-1] == '/')
> +               len--;
> +       if (!len)
> +               return -1;
> +       gitdir = xmalloc(len + MAXREFLEN + 8);
> +       memcpy(gitdir, path, len);
> +       memcpy(gitdir + len, "/.git/", 7);

Can't a subproject be bare?


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10  8:40   ` Frank Lichtenheld
@ 2007-04-10 11:31     ` Alex Riesen
  2007-04-10 14:55     ` Linus Torvalds
  1 sibling, 0 replies; 101+ messages in thread
From: Alex Riesen @ 2007-04-10 11:31 UTC
  To: Linus Torvalds, Git Mailing List, Junio C Hamano

On 4/10/07, Frank Lichtenheld <frank@lichtenheld.de> wrote:
> On Mon, Apr 09, 2007 at 09:20:29PM -0700, Linus Torvalds wrote:
> > +     case S_IFDIR:
> > +             return resolve_gitlink_ref(path, "HEAD", sha1);
> >       default:
> >               return error("%s: unsupported file type", path);
> >       }
>
> Not that I have time right now to look up the exact context (only read
> the patch), but I would've expected a "case S_IFDIRLNK:" here?
>

No, the st_mode comes directly from file system. It knows nothing about
dirlinks.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10  4:46 ` [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
@ 2007-04-10 13:04   ` Alex Riesen
  2007-04-10 15:13     ` Linus Torvalds
  2007-04-11  8:32     ` Martin Waitz
  2007-04-10 13:39   ` [PATCH] allow git-update-index work on subprojects Alex Riesen
  1 sibling, 2 replies; 101+ messages in thread
From: Alex Riesen @ 2007-04-10 13:04 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Here is, for your enjoyment, the last patch I used to actually test this
> all. I do *not* submit it as a patch for actual inclusion - the other
> patches in the series are, I think, ready to actually be merged. This one
> is not.
>
> It's broken for a few reasons:
>
>  - it allows you to do "git add subproject" to add the subproject to the
>    index (and then use "git commit" to commit it), but even something as
>    simple as "git commit -a" doesn't work right, because the sequence that
>    "git commit -a" uses to update the index doesn't work with the current
>    state of the plumbing (ie the
>
>         git-diff-files --name-only -z |
>                 git-update-index --remove -z --stdin
>
>    thing doesn't work right.
>
>  - even for "git add", the logic isn't really right. It should take the
>    old index state into account to decide if it wants to add it as a
>    subproject.
>

The other thing which will be missed a lot (I miss it that much)
is a subproject-recursive git-commit and git-status.
It is very possible that the default should be different for
the git-commit and git-status: git-commit is likely to have it
off whereas git-status will very much depend on how fast
the usual response is (or wished for). An integrator on very fast
machine may like it on for both, a subproject developer can have
it off for both (to avoid accidental commits and generally being
not interested in anything besides his code), an occasional person
can have the status defaulting to on and commit to off - to avoid
accidental commits in subprojects which are just tracked.

A separate config option and a command-line switch, probably.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* [PATCH] allow git-update-index work on subprojects
@ 2007-04-10 13:39   ` Alex Riesen
  2007-04-10 23:19     ` [PATCH] Allow " Alex Riesen
  0 siblings, 1 reply; 101+ messages in thread
From: Alex Riesen @ 2007-04-10 13:39 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2185 bytes --]

On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Mon, 9 Apr 2007, Linus Torvalds wrote:
> >
> > NOTE! This series of six patches does not actually contain everything you
> > need to do that - in particular, this series will not actually connect up
> > the magic to make "git add" (and thus "git commit") actually create the
> > gitlink entries for subprojects. That's another (quite small) patch, but I
> > haven't cleaned it up enough to be submittable yet.
>
> Here is, for your enjoyment, the last patch I used to actually test this
> all. I do *not* submit it as a patch for actual inclusion - the other
> patches in the series are, I think, ready to actually be merged. This one
> is not.
>
> It's broken for a few reasons:
>
>  - it allows you to do "git add subproject" to add the subproject to the
>    index (and then use "git commit" to commit it), but even something as
>    simple as "git commit -a" doesn't work right, because the sequence that
>    "git commit -a" uses to update the index doesn't work with the current
>    state of the plumbing (ie the
>
>         git-diff-files --name-only -z |
>                 git-update-index --remove -z --stdin
>

At least git-update-index should work.

---
 builtin-update-index.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin-update-index.c b/builtin-update-index.c
index 47d42ed..55c9f93 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -94,12 +94,10 @@ static int process_file(const char *path)
 				             path);
 			}
 		}
-		if (0 == status)
-			return error("%s: is a directory - add files inside instead",
-			             path);
-		else
+		if (status)
 			return error("lstat(\"%s\"): %s", path,
 				     strerror(errno));
+		/* could be a subproject */
 	}

 	namelen = strlen(path);
@@ -211,7 +209,7 @@ static void update_one(const char *path, const char *prefix, int prefix_length)
 		goto free_return;
 	}
 	if (process_file(p))
-		die("Unable to process file %s", path);
+		die("Unable to process \"%s\"", path);
 	report("add '%s'", path);
  free_return:
 	if (p < path || p > path + strlen(path))
-- 
1.5.1.147.gbaa5

[-- Attachment #2: 0001-allow-git-update-index-work-on-subprojects.txt --]
[-- Type: text/plain, Size: 1165 bytes --]

From 252adc55f9a8a7ee36be1abf76d8511d6f12d4f3 Mon Sep 17 00:00:00 2001
From: Alex Riesen <ariesen@harmanbecker.com>
Date: Tue, 10 Apr 2007 15:19:30 +0200
Subject: [PATCH] allow git-update-index work on subprojects

---
 builtin-update-index.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin-update-index.c b/builtin-update-index.c
index 47d42ed..55c9f93 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -94,12 +94,10 @@ static int process_file(const char *path)
 				             path);
 			}
 		}
-		if (0 == status)
-			return error("%s: is a directory - add files inside instead",
-			             path);
-		else
+		if (status)
 			return error("lstat(\"%s\"): %s", path,
 				     strerror(errno));
+		/* could be a subproject */
 	}
 
 	namelen = strlen(path);
@@ -211,7 +209,7 @@ static void update_one(const char *path, const char *prefix, int prefix_length)
 		goto free_return;
 	}
 	if (process_file(p))
-		die("Unable to process file %s", path);
+		die("Unable to process \"%s\"", path);
 	report("add '%s'", path);
  free_return:
 	if (p < path || p > path + strlen(path))
-- 
1.5.1.147.gbaa5


^ permalink raw reply related	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10  8:40   ` Frank Lichtenheld
  2007-04-10 11:31     ` Alex Riesen
@ 2007-04-10 14:55     ` Linus Torvalds
  1 sibling, 0 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 14:55 UTC (permalink / raw
  To: Frank Lichtenheld; +Cc: Git Mailing List, Junio C Hamano



On Tue, 10 Apr 2007, Frank Lichtenheld wrote:

> On Mon, Apr 09, 2007 at 09:20:29PM -0700, Linus Torvalds wrote:
> > @@ -2332,6 +2333,8 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, int write
> >  				     path);
> >  		free(target);
> >  		break;
> > +	case S_IFDIR:
> > +		return resolve_gitlink_ref(path, "HEAD", sha1);
> >  	default:
> >  		return error("%s: unsupported file type", path);
> >  	}
> 
> Not that I have time right now to look up the exact context (only read
> the patch), but I would've expected a "case S_IFDIRLNK:" here?

So we have this strange (and worrying) dualism inside git: we use the same 
macros *both* for "stat data" *and* for "git-internal file modes".

So sometimes a mode is the result of a [l]stat() call like above, and then 
a gitlink is just a directory and we use S_IFDIR. And if it comes from the 
index, then it uses the internal git representation, and is S_IFDIRLNK.

I'm not very happy about it, but I'm actually most unhappy about it since 
I could imagine that the constants themselves are different on different 
OS's (eg VMS - a Unix-related OS will use the same constants for 
historical reasons).

In this particular place (index_path()), we obviously not only have a stat() 
result, but more importantly, we never come here for a "normal" directory, 
since a normal directory would have been expanded into its component paths 
by the "read_directory()" logic.

So that interaction with directory expansion is somewhat non-obvious: 
normal directories are expanded recursively into the files they contain, 
while git directories end up being visible to internals as real 
directories, and are turned into gitlinks by code like the above.

		Linus

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function
  2007-04-10  9:38   ` Alex Riesen
@ 2007-04-10 14:58     ` Linus Torvalds
  2007-04-10 15:35       ` Alex Riesen
  2007-04-10 15:54       ` Josef Weidendorfer
  0 siblings, 2 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 14:58 UTC (permalink / raw
  To: Alex Riesen; +Cc: Git Mailing List, Junio C Hamano



On Tue, 10 Apr 2007, Alex Riesen wrote:
>
> On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > +int resolve_gitlink_ref(const char *path, const char *refname, unsigned char *result)
> > +{
> > +       int len = strlen(path), retval;
> > +       char *gitdir;
> > +
> > +       while (len && path[len-1] == '/')
> > +               len--;
> > +       if (!len)
> > +               return -1;
> > +       gitdir = xmalloc(len + MAXREFLEN + 8);
> > +       memcpy(gitdir, path, len);
> > +       memcpy(gitdir + len, "/.git/", 7);
> 
> Can't a subproject be bare?

Not when it is checked out, no. That's what "checked out" means ;)

If a subproject is bare, it never gets resolved, because it's never 
checked out in a superproject.

So a subproject *can* be bare, but when it's bare it is just a totally 
regular independent git project, simply by *definition* of not being 
checked out inside a superproject.

But hey, that was just a design decision of mine, and if people can argue 
for it being wrong, I don't think I'm married to it ;)
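
For reference, the path handling in the quoted hunk can be sketched on its
own as below. This is an illustration under assumptions, not the function
from the patch: subproject_gitdir() is a hypothetical helper name, it uses
plain malloc() rather than xmalloc(), and it leaves out the ref lookup
entirely. It only shows the "strip trailing slashes, then append /.git/"
step, which is why a bare repository would never match.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc()ed "<path>/.git/" string with any
 * trailing slashes on path removed first, or NULL if path is empty (or
 * consists only of slashes) or allocation fails. The caller frees it.
 */
char *subproject_gitdir(const char *path)
{
	static const char suffix[] = "/.git/";
	size_t len;
	char *gitdir;

	assert(path != NULL);
	len = strlen(path);
	while (len && path[len - 1] == '/')
		len--;
	if (!len)
		return NULL;

	/* sizeof(suffix) counts the terminating NUL as well */
	gitdir = malloc(len + sizeof(suffix));
	if (!gitdir)
		return NULL;
	memcpy(gitdir, path, len);
	memcpy(gitdir + len, suffix, sizeof(suffix));
	return gitdir;
}
```

With this, "sub-B///" becomes "sub-B/.git/", and a path of only slashes is
rejected, matching the early "return -1" in the quoted code.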

		Linus

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 13:04   ` Alex Riesen
@ 2007-04-10 15:13     ` Linus Torvalds
  2007-04-10 15:48       ` Alex Riesen
  2007-04-11  8:32     ` Martin Waitz
  1 sibling, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 15:13 UTC (permalink / raw
  To: Alex Riesen; +Cc: Git Mailing List, Junio C Hamano



On Tue, 10 Apr 2007, Alex Riesen wrote:
> 
> The other thing which will be missed a lot (I miss it that much)
> is a subproject-recursive git-commit and git-status.

Note that I was definitely planning on adding them too, but they are at a 
higher level. 

So the long-term plan is/was to add a flag to "git diff" (and "git 
ls-tree" etc) to say "recurse into subprojects".

You could perhaps even make that flag the default with some .git/config 
option, if your superproject is small enough.

But this series of 6 (and the seventh ugly hack) is literally meant for 
just the really core object-handling stuff, and even there it's not really 
complete.

For example, you cannot even clone a superproject yet, simply because 
git-upload-pack doesn't know that it's not supposed to follow the gitlink 
things etc. So there's a lot of details left even for the really *core* 
stuff, but I wanted to post the series of six patches because those six 
patches are actually enough to reach the point where you can start looking 
at individual problems (like "git upload-pack") and fix them 
incrementally.

So I'd like this to be merged somewhere, not because "it works" or "it's 
complete", but because it's in a shape where I think a lot of people can 
start fixing small details. 

For example, with just two smallish updates:
 - teach "git upload-pack" not to try to follow gitlinks
 - teach "git read-tree" to check out a git-link as just an empty 
   subdirectory
you should already be pretty close to being able to clone a superproject. 
You'd still have to clone the subprojects one-by-one manually, and that 
would be more of a porcelain'ish issue to teach git clone to fetch 
submodules too (with some ".gitmodules" file that contains the rules for 
that!)

But no, I didn't do any of that. I literally did just the "tree object 
format change" to support the *notion* of gitlinks - not all the pieces to 
then actually *implement* the notion are done by a long shot.

I think everybody agrees that we need some kind of subproject support, and 
the KDE repository certainly shows that subprojects need to be truly 
independent (because if they aren't, you end up with all the scaling 
issues that we see now - including something as simple as just "fsck" 
taking way way too long unless you have 4GB of RAM or more), and this sets 
the basic rules for that.

But they really are pretty low-level rules. For example, to go back to the 
KDE thing: we'd also need to teach *importers* to import certain 
subdirectories as submodules (or have a git->git translator that turns a 
subdirectory into a separate submodule).

So those are examples of things that obviously need to be done, and that 
my patches do not address in *any* way. They are really low-level plumbing 
support, kind of like the old original days when you had to run 

	git-update-index ...
	tree=$(git-write-tree)
	commit=$(git-commit-tree -p $parent $tree <$msgfile)

by hand. A few months later it was "git commit -a", but it started out 
with just fairly low-level plumbing..

		Linus

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function
  2007-04-10 14:58     ` Linus Torvalds
@ 2007-04-10 15:35       ` Alex Riesen
  2007-04-10 15:52         ` Linus Torvalds
  2007-04-10 15:54       ` Josef Weidendorfer
  1 sibling, 1 reply; 101+ messages in thread
From: Alex Riesen @ 2007-04-10 15:35 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> > Can't a subproject be bare?
>
> Not when it is checked out, no. That's what "checked out" means ;)
>
> If a subproject is bare, it never gets resolved, because it's never
> checked out in a superproject.
>
> So a subproject *can* be bare, but when it's bare it is just a totally
> regular independent git project, simply by *definition* of not being
> checked out inside a superproject.
>
> But hey, that was just a design decision of mine, and if people can argue
> for it being wrong, I don't think I'm married to it ;)

I didn't actually have a use case in mind when I asked.
After a bit of thinking I could imagine a repo which is
used for integration exclusively (no compilation or looking
at the files at all).

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 15:13     ` Linus Torvalds
@ 2007-04-10 15:48       ` Alex Riesen
  2007-04-10 16:07         ` Linus Torvalds
  0 siblings, 1 reply; 101+ messages in thread
From: Alex Riesen @ 2007-04-10 15:48 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> So I'd like this to be merged somewhere, not because "it works" or "it's
> complete", but because it's in a shape where I think a lot of people can
> start fixing small details.

It is already "merged somewhere": as soon as the patches landed
on vger, it is not possible to lose (or even destroy) them.
The feature is just too much sought after.

> For example, with just two smallish updates:
>  - teach "git upload-pack" not to try to follow gitlinks
>  - teach "git read-tree" to check out a git-link as just an empty
>    subdirectory

which also should fix switching between the branches with subprojects.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function
  2007-04-10 15:35       ` Alex Riesen
@ 2007-04-10 15:52         ` Linus Torvalds
  2007-04-10 15:57           ` Alex Riesen
  0 siblings, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 15:52 UTC (permalink / raw
  To: Alex Riesen; +Cc: Git Mailing List, Junio C Hamano



On Tue, 10 Apr 2007, Alex Riesen wrote:
> 
> After a bit of thinking I could imagine a repo which is
> used for integration exclusively (no compilation or looking
> at the files at all).

Well, you also cannot *commit* to a bare repository, so it's a bit 
pointless for integration reasons. You'd still have to commit all changes 
somewhere else.

That said, it's definitely designed so that if you want to automate 
tracking other people's bare repositories, you can do so: you'd just have 
to *really* script it with something like

	git update-index --cacheinfo 0160000 <sha1> <dirname>

(which is how you could create those commits to a bare repo too, so it's 
not like this is really even any different)

		Linus

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function
  2007-04-10 14:58     ` Linus Torvalds
  2007-04-10 15:35       ` Alex Riesen
@ 2007-04-10 15:54       ` Josef Weidendorfer
  1 sibling, 0 replies; 101+ messages in thread
From: Josef Weidendorfer @ 2007-04-10 15:54 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Alex Riesen, Git Mailing List, Junio C Hamano

On Tuesday 10 April 2007, Linus Torvalds wrote:
> 
> On Tue, 10 Apr 2007, Alex Riesen wrote:
> >
> > On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > +int resolve_gitlink_ref(const char *path, const char *refname, unsigned char *result)
> > > +{
> > > +       int len = strlen(path), retval;
> > > +       char *gitdir;
> > > +
> > > +       while (len && path[len-1] == '/')
> > > +               len--;
> > > +       if (!len)
> > > +               return -1;
> > > +       gitdir = xmalloc(len + MAXREFLEN + 8);
> > > +       memcpy(gitdir, path, len);
> > > +       memcpy(gitdir + len, "/.git/", 7);
> > 
> > Can't a subproject be bare?
> 
> Not when it is checked out, no. That's what "checked out" means ;)
> 
> If a subproject is bare, it never gets resolved, because it's never 
> checked out in a superproject.
> 
> So a subproject *can* be bare, but when it's bare it is just a totally 
> regular independent git project, simply by *definition* of not being 
> checked out inside a superproject.
> 
> But hey, that was just a design decision of mine, and if people can argue 
> for it being wrong, I don't think I'm married to it ;)

It would be nice if a redirection via a "gitdir = ..." line
in .git/link of the subproject (when existing) would be possible.
This was part of the light-weight checkout proposal.

In contrast to contrib/workdir/git-new-workdir, this would allow
for (to be implemented) magic symlinks to stay intact when
moving the submodule directory around.

However, this can be added later.

Josef

PS: I wonder how long it takes to move the official KDE repository over to git ;-)

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function
  2007-04-10 15:52         ` Linus Torvalds
@ 2007-04-10 15:57           ` Alex Riesen
  2007-04-10 16:16             ` Linus Torvalds
  0 siblings, 1 reply; 101+ messages in thread
From: Alex Riesen @ 2007-04-10 15:57 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> > After a bit of thinking I could imagine a repo which is
> > used for integration exclusively (no compilation or looking
> > at the files at all).
>
> Well, you also cannot *commit* to a bare repository, so it's a bit
> pointless for integration reasons. You'd still have to commit all changes
> somewhere else.

Yes. Subprojects are push-only for storing and reference purposes.
Superproject can have integrated data checks in Makefiles.

> That said, it's definitely designed so that if you want to automate
> tracking other people's bare repositories, you can do so: you'd just have
> to *really* script it with something like
>
>         git update-index --cacheinfo 0160000 <sha1> <dirname>
>
> (which is how you could create those commits to a bare repo too, so it's
> not like this is really even any different)

Nice :)

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 15:48       ` Alex Riesen
@ 2007-04-10 16:07         ` Linus Torvalds
  2007-04-10 16:43           ` Alex Riesen
  2007-04-10 19:32           ` Junio C Hamano
  0 siblings, 2 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 16:07 UTC (permalink / raw
  To: Alex Riesen; +Cc: Git Mailing List, Junio C Hamano



On Tue, 10 Apr 2007, Alex Riesen wrote:
> 
> It is already "merged somewhere": as soon as the patches landed
> on vger, it is not possible to lose (or even destroy) them.
> The feature is just too much sought after.

Well, unless it hits something like Junio's 'pu' (or 'next') branch, or 
somebody (like you?) ends up maintaining a repo with this, it's just 
unnecessarily hard to have lots of people working together on it..

I'm obviously interested in working on it, but at the same time, I don't 
expect to be a primary *user* of it, so I'm hoping others will come in and 
start looking at it.

It looks promising that you're getting involved, but I suspect you may be 
a bit too optimistic when you say "just too much sought after". We've been 
*talking* about subprojects for a long long time, and we've had other 
patches fail. So...

> > For example, with just two smallish updates:
> >  - teach "git upload-pack" not to try to follow gitlinks
> >  - teach "git read-tree" to check out a git-link as just an empty
> >    subdirectory
> 
> which also should fix switching between the branches with subprojects.

Yes. It would require either git-read-tree or the git-checkout script 
around it knowing to then also check out the subproject branches.

It's actually not *entirely* obvious what you should do when you switch 
branches (or even just do a "git reset --hard") in the superproject. The 
branches in the subprojects are likely to be totally different from the 
superproject, so as far as I can see, you end up having a few choices when 
you reset a subproject:

 - either basically create a "disconnected HEAD" in the subproject(s) when 
   you switch them around as a consequence of resetting/switching the 
   branch in the superproject.

 - or you'd stay on the same branch in the subproject, and just reset that 
   branch..

 - or you describe the branch name in the ".gitmodules" file in the
   superproject, and use whatever branch in the submodule that is 
   described in the supermodule that you reset/check-out.

 - or possibly other policies.

So there is bound to be various "policy" issues like this worth sorting 
out. I don't think they matter that deeply.

I would _personally_ tend to like the notion of using ".gitmodules" in the 
supermodule to describe things like this, exactly because it's a policy 
decision - not something that git itself should really decide about, but 
that the supermodule maintainers can just decide to agree on.

But I haven't really even thought about all the things I'd want to have in 
the .gitmodules. We'd obviously need to list the default URL's for the 
submodules some way etc, but I haven't really sat down and thought about 
what all the higher-level porcelain really would need to know.

I suspect that somebody who has used and set up CVS "modules" setups 
should be thinking about that. I've been a "stupid user" for CVS modules 
setups, but I've never actually needed to really know how they *work*.

		Linus

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function
  2007-04-10 15:57           ` Alex Riesen
@ 2007-04-10 16:16             ` Linus Torvalds
  0 siblings, 0 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 16:16 UTC (permalink / raw
  To: Alex Riesen; +Cc: Git Mailing List, Junio C Hamano



On Tue, 10 Apr 2007, Alex Riesen wrote:
>
> On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > That said, it's definitely designed so that if you want to automate
> > tracking other people's bare repositories, you can do so: you'd just have
> > to *really* script it with something like
> > 
> >         git update-index --cacheinfo 0160000 <sha1> <dirname>
> > 
> > (which is how you could create those commits to a bare repo too, so it's
> > not like this is really even any different)
> 
> Nice :)

Well, the *really* nice thing about doing it like this is that you can 
actually update subprojects without even having them be *local* to 
where you do the superproject.

IOW, you could literally build up the superproject by saying that you want 
to track "all git projects I care about" somewhere else, and do a series 
of automated

	git ls-remote sub-project-xyzzy tracking-branch-xyzzy | ...

and basically create the "superproject" without ever actually downloading 
or populating the subprojects at all.

Then, if everything is set up correctly, you can basically use the 
superproject as an "auto-mirror" - whenever you want to get all the 
projects you care about, you just clone that superproject, and (once 
you've taught "git clone" to fetch the subprojects, of course ;^) you'd 
basically fetch them all from their appropriate locations - without ever 
having the actual superproject have to even *really* care about it.

So basically, a superproject could be used as just a "gathering point", 
without having to actually *contain* any of the subprojects. The actual 
sources for subprojects may be on totally different servers. That's what 
real distribution is all about.
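
As a rough illustration of that "gathering point" idea, the sketch below
takes one line of "git ls-remote" output (a 40-character hex SHA-1, a tab,
then the ref name) and formats the matching update-index command from
earlier in this thread. The function name is hypothetical, and the sketch
assumes dirname needs no shell quoting; a real script would have to handle
that, and would run the command rather than just print it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn one "git ls-remote" line into the
 * "git update-index --cacheinfo 0160000 <sha1> <dirname>" command text.
 * Returns 0 on success, -1 if the line is malformed or out is too small.
 * Assumption: dirname contains nothing that needs shell quoting.
 */
int cacheinfo_from_ls_remote(const char *line, const char *dirname,
			     char *out, size_t outlen)
{
	size_t i;
	int n;

	assert(line != NULL && dirname != NULL && out != NULL);

	/* isxdigit() is false for NUL, so a short line stops the loop safely */
	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)line[i]))
			return -1;
	if (line[40] != '\t')
		return -1;

	n = snprintf(out, outlen,
		     "git update-index --cacheinfo 0160000 %.40s %s",
		     line, dirname);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

Nothing here touches the subproject's objects: the superproject only needs
the commit name, which is what makes tracking a remote repository without
downloading it possible.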

		Linus

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10  4:20 ` [PATCH 6/6] Teach core object handling functions about gitlinks Linus Torvalds
  2007-04-10  8:40   ` Frank Lichtenheld
@ 2007-04-10 16:28   ` Josef Weidendorfer
  2007-04-10 16:50     ` Alex Riesen
                       ` (2 more replies)
  2007-04-11  8:06   ` Martin Waitz
  2 siblings, 3 replies; 101+ messages in thread
From: Josef Weidendorfer @ 2007-04-10 16:28 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

On Tuesday 10 April 2007, Linus Torvalds wrote:
> ...
> +	if (resolve_gitlink_ref(ce->name, "HEAD", sha1) < 0)
> +		return 0;
> +	return hashcmp(sha1, ce->sha1);

So this does mean that the SHA1 of a gitlink entry corresponds
to the commit in the subproject?

I wonder if it is not useful to be able to add some attribute(s)
to a gitlink, i.e. first reference a gitlink object in the superproject,
which then references the submodule commit, and also holds some
further attributes. These attributes can not be put into the subproject,
as it should be independent.

An example for such an attribute would be a subproject name/ID.
An argument for this: The user should be able to specify some policies
for submodules, like "do not clone/checkout this submodule". But the
path where the submodule resides in a given commit is not useful here,
as a submodule can reside at different paths in the history of the
supermodule.

Josef

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 16:07         ` Linus Torvalds
@ 2007-04-10 16:43           ` Alex Riesen
  2007-04-10 19:32           ` Junio C Hamano
  1 sibling, 0 replies; 101+ messages in thread
From: Alex Riesen @ 2007-04-10 16:43 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> > It is already "merged somewhere": as soon as the patches landed
> > on vger, it is not possible to lose (or even destroy) them.
> > The feature is just too much sought after.
>
> Well, unless it hits something like Junio's 'pu' (or 'next') branch, or
> somebody (like you?) ends up maintaining a repo with this, it's just
> unnecessarily hard to have lots of people working together on it..
>
> I'm obviously interested in working on it, but at the same time, I don't
> expect to be a primary *user* of it, so I'm hoping others will come in and
> start looking at it.
>
> It looks promising that you're getting involved, but I suspect you may be
> a bit too optimistic when you say "just too much sought after". We've been
> *talking* about subprojects for a long long time, and we've had other
> patches fail. So...

The people who need the feature are still using other VCS.
Some do not even know about git, the others are more interested
in their own projects than in hacking on git (like KDE or Ubuntu
people). And then there are commercial projects with third-party
libraries, components or data. The other VCSes provide the feature,
even if they do it wrong and badly (I never could go back in time in my
day-work project, always asked myself what was the point of using
Perforce at all).
So, I suspect it is the people who are unable or unwilling
to contribute to git (to anything, really) who need the feature most.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 16:28   ` Josef Weidendorfer
@ 2007-04-10 16:50     ` Alex Riesen
  2007-04-10 17:23       ` Josef Weidendorfer
  2007-04-10 18:45     ` Linus Torvalds
  2007-04-11 23:36     ` Sam Vilain
  2 siblings, 1 reply; 101+ messages in thread
From: Alex Riesen @ 2007-04-10 16:50 UTC (permalink / raw
  To: Josef Weidendorfer; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano

On 4/10/07, Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
> On Tuesday 10 April 2007, Linus Torvalds wrote:
> > ...
> > +     if (resolve_gitlink_ref(ce->name, "HEAD", sha1) < 0)
> > +             return 0;
> > +     return hashcmp(sha1, ce->sha1);
>
> So this does mean that the SHA1 of a gitlink entry corresponds
> to the commit in the subproject?

Right.

> I wonder if it is not useful to be able to add some attribute(s)
> to a gitlink, i.e. first reference a gitlink object in the superproject,
> which then references the submodule commit, and also holds some
> further attributes. These attributes can not be put into the subproject,
> as it should be independent.

These attributes can be put into a file in superproject tree and
checked in at the same time as the gitlink. No real need for introducing
another object type (right now there is no gitlink object type, just
an entry in tree with special mode).

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 16:50     ` Alex Riesen
@ 2007-04-10 17:23       ` Josef Weidendorfer
  0 siblings, 0 replies; 101+ messages in thread
From: Josef Weidendorfer @ 2007-04-10 17:23 UTC (permalink / raw
  To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano

On Tuesday 10 April 2007, Alex Riesen wrote:
> On 4/10/07, Josef Weidendorfer <Josef.Weidendorfer@gmx.de> wrote:
> > On Tuesday 10 April 2007, Linus Torvalds wrote:
> > > ...
> > > +     if (resolve_gitlink_ref(ce->name, "HEAD", sha1) < 0)
> > > +             return 0;
> > > +     return hashcmp(sha1, ce->sha1);
> >
> > So this does mean that the SHA1 of a gitlink entry corresponds
> > to the commit in the subproject?
> 
> Right.
> 
> > I wonder if it is not useful to be able to add some attribute(s)
> > to a gitlink, i.e. first reference a gitlink object in the superproject,
> > which then references the submodule commit, and also holds some
> > further attributes. These attributes can not be put into the subproject,
> > as it should be independent.
> 
> These attributes can be put into a file in superproject tree and
> checked in at the same time as the gitlink. No real need for introducing
> another object type (right now there is no gitlink object type, just
> an entry in tree with special mode).

Like... .gitattributes ? ;-)
Ok, this could work; however, there of course is the possibility of
inconsistencies when e.g. manually moving subprojects around.

How is consistency ensured for .gitattributes ?
I see that for .gitignore consistency, the user is responsible.

Josef


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 16:28   ` Josef Weidendorfer
  2007-04-10 16:50     ` Alex Riesen
@ 2007-04-10 18:45     ` Linus Torvalds
  2007-04-10 19:04       ` Andy Parkins
                         ` (2 more replies)
  2007-04-11 23:36     ` Sam Vilain
  2 siblings, 3 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 18:45 UTC (permalink / raw
  To: Josef Weidendorfer; +Cc: Git Mailing List, Junio C Hamano



On Tue, 10 Apr 2007, Josef Weidendorfer wrote:

> On Tuesday 10 April 2007, Linus Torvalds wrote:
> > ...
> > +	if (resolve_gitlink_ref(ce->name, "HEAD", sha1) < 0)
> > +		return 0;
> > +	return hashcmp(sha1, ce->sha1);
> 
> So this does mean that the SHA1 of a gitlink entry corresponds
> to the commit in the subproject?

Yes.

> I wonder if it is not useful to be able to add some attribute(s)
> to a gitlink, i.e. first reference a gitlink object in the superproject,
> which then references the submodule commit, and also holds some
> further attributes. These attributes can not be put into the subproject,
> as it should be independent.

The special "link" object has come up before, and I actually thought I'd 
do it that way first, but there were a few reasons why I didn't:

 - I tend to like "minimal", and the patches I sent out really are pretty 
   minimal, in the sense that they introduce just _one_ new concept, in 
   one place (it's basically a "tree entry" - so it shows up in tree 
   reading and writing, and nowhere else. The index, of course, is the 
   staging area for trees, so the index was also affected, but that was 
   really a very direct result of that "it's a new tree entry" thing).

 - in a "link" object, the only thing that would normally *change* is 
   really just the commit SHA1. Everything else is really pretty static. 
   As such, I decided that it's just a waste of a perfectly fine object to 
   have several thousands of the "link" objects that really only differ in 
   the pointer to the commit.

 - the "static" part, which you might as well have somewhere else, tends 
   to be stuff that you would need to be able to override locally, and as 
   such it does *not* really have a global meaning that is useful 
   historically.

   For example, the things that you'd want to associate with the gitlink 
   are things like "where would I find the repository that the commit is 
   part of" and "what is a description of that submodule" and "what are 
   the relationships between the submodules". These are things that aren't 
   necessarily even totally independent: in CVS, for example, you have 
   module names that are really not submodules themselves, but are really 
   just aliases for *collections* of submodules.

   So a 1:1 link object simply wouldn't make much sense anyway, and you'd 
   want to override those defaults with site-specific ones (maybe there is 
   a "canonical" address for the submodule repository, but if you have a 
   copy of it locally on-site, when you clone, you'd rather use the 
   *local* copy over the standard site, for example).

So all of this just made me say:
 - the tree entry just contains the commit ID of the subproject, and 
   *nothing* else.
 - any incidental data probably isn't 1:1 with tree entries anyway (both 
   over time: you have tree entries being updated with new commit ID's, 
   but the incidental data does *not* change, and over "space": different 
   repositories might want to use their local preferences for incidental 
   rules)
 - which all implies that the extra information should go in a separate 
   file that actually describes the modules.

In fact, it shouldn't be _one_ separate file: it should be at least two, 
since you'd want to have the *defaults* (which get cloned along with the 
superproject) in a revision-controlled file, and then have local *extra* 
information that is local. 

This is exactly the same as the situation with the ".gitignore" file 
(which is revision-controlled and cloned with the repository) and the 
".git/ignore" file (which is repository-local).

I've been thinking either ".gitmodules" (and ".git/modules") or to just 
extend the ".git/config" file parser to *also* parse a version-controlled 
".gitconfig" file, and just describe the modules there. The config file 
really has pretty nice syntax, and I think module descriptions in many 
ways end up similar to remote branch descriptions, so it would fit in 
there, I think.

(But there's nothing that says that the ".gitmodules" file couldn't just 
use the same parser as the git config file, so I don't really strongly 
care either way. I just think it would be nice to be able to say

	[module "kdelibs"]
		dir = kdelibs
		url = git://git.kde.org/kdelibs
		description = "Basic KDE libraries module"

	[module "base"]
		alias = "kdelibs", "kdebase", "kdenetwork"

or whatever. You get the idea..)

		Linus
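
The two-level lookup described above (version-controlled defaults plus
repository-local overrides) could look something like this. It is a sketch
only, assuming both files have already been parsed into simple tables;
`module_setting`, `find_setting` and `lookup_module_setting` are
hypothetical names and not part of git.

```c
#include <stddef.h>
#include <string.h>

/* One "key = value" line under a [module "<name>"] section. */
struct module_setting {
	const char *module;
	const char *key;
	const char *value;
};

/* Linear search of one table; returns NULL when the key is not present. */
static const char *find_setting(const struct module_setting *tab, size_t n,
				const char *module, const char *key)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(tab[i].module, module) && !strcmp(tab[i].key, key))
			return tab[i].value;
	return NULL;
}

/*
 * Local settings win; the versioned defaults are only the fallback.
 * Returns NULL when neither table describes the key.
 */
const char *lookup_module_setting(const struct module_setting *local, size_t nlocal,
				  const struct module_setting *defaults, size_t ndefaults,
				  const char *module, const char *key)
{
	const char *v = find_setting(local, nlocal, module, key);

	return v ? v : find_setting(defaults, ndefaults, module, key);
}
```

With the "kdelibs" defaults from the example and a local table that only
overrides "url" (say, with the path of an on-site copy), a lookup of "url"
yields the local value while "dir" still comes from the defaults.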


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 18:45     ` Linus Torvalds
@ 2007-04-10 19:04       ` Andy Parkins
  2007-04-10 19:20         ` Linus Torvalds
                           ` (2 more replies)
  2007-04-10 19:29       ` Josef Weidendorfer
  2007-04-12  0:42       ` Torgil Svensson
  2 siblings, 3 replies; 101+ messages in thread
From: Andy Parkins @ 2007-04-10 19:04 UTC (permalink / raw
  To: git; +Cc: Linus Torvalds, Josef Weidendorfer, Junio C Hamano

On Tuesday 2007, April 10, Linus Torvalds wrote:

> (But there's nothing that says that the ".gitmodules" file couldn't
> just use the same parser as the git config file, so I don't really
> strongly care either way. I just think it would be nice to be able to
> say
>
> 	[module "kdelibs"]
> 		dir = kdelibs
> 		url = git://git.kde.org/kdelibs
> 		description = "Basic KDE libraries module"
>
> 	[module "base"]
> 		alias = "kdelibs", "kdebase", "kdenetwork"
>
> or whatever. You get the idea..)

Would it be nicer if .gitmodules were line-based to aid in merging?


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 19:04       ` Andy Parkins
@ 2007-04-10 19:20         ` Linus Torvalds
  2007-04-10 20:19           ` Junio C Hamano
  2007-04-10 19:41         ` David Lang
  2007-04-10 20:06         ` Junio C Hamano
  2 siblings, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 19:20 UTC (permalink / raw
  To: Andy Parkins; +Cc: git, Josef Weidendorfer, Junio C Hamano



On Tue, 10 Apr 2007, Andy Parkins wrote:
> 
> Would it be nicer if .gitmodules were line-based to aid in merging?

I seriously doubt you'll ever be merging or changing this a lot. So I 
don't think it's a huge concern.

		Linus


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 18:45     ` Linus Torvalds
  2007-04-10 19:04       ` Andy Parkins
@ 2007-04-10 19:29       ` Josef Weidendorfer
  2007-04-10 19:45         ` Linus Torvalds
  2007-04-12  0:42       ` Torgil Svensson
  2 siblings, 1 reply; 101+ messages in thread
From: Josef Weidendorfer @ 2007-04-10 19:29 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

On Tuesday 10 April 2007, Linus Torvalds wrote:
> 	[module "kdelibs"]
> 		dir = kdelibs
> 		url = git://git.kde.org/kdelibs
> 		description = "Basic KDE libraries module"
> 
> 	[module "base"]
> 		alias = "kdelibs", "kdebase", "kdenetwork"

So when moving the kdelibs submodule around, you would
have to update the .gitmodules file.

I like it.

Josef


* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 16:07         ` Linus Torvalds
  2007-04-10 16:43           ` Alex Riesen
@ 2007-04-10 19:32           ` Junio C Hamano
  2007-04-10 20:11             ` Linus Torvalds
  1 sibling, 1 reply; 101+ messages in thread
From: Junio C Hamano @ 2007-04-10 19:32 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Alex Riesen, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, 10 Apr 2007, Alex Riesen wrote:
>> 
>> It is already "merged somewhere": as soon as the patches landed
>> on vger, it is not possible to lose (or even destroy) them.
>> The feature is just too much sought after.
>
> Well, unless it hits something like Junio's 'pu' (or 'next') branch, or 
> somebody (like you?) ends up maintaining a repo with this, it's just 
> unnecessarily hard to have lots of people working together on it..

Well, I was planning to apply this directly on 'master' after
giving them another pass.


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 19:04       ` Andy Parkins
  2007-04-10 19:20         ` Linus Torvalds
@ 2007-04-10 19:41         ` David Lang
  2007-04-10 20:06         ` Junio C Hamano
  2 siblings, 0 replies; 101+ messages in thread
From: David Lang @ 2007-04-10 19:41 UTC (permalink / raw
  To: Andy Parkins; +Cc: git, Linus Torvalds, Josef Weidendorfer, Junio C Hamano

On Tue, 10 Apr 2007, Andy Parkins wrote:

> On Tuesday 2007, April 10, Linus Torvalds wrote:
>
>> (But there's nothing that says that the ".gitmodules" file couldn't
>> just use the same parser as the git config file, so I don't really
>> strongly care either way. I just think it would be nice to be able to
>> say
>>
>> 	[module "kdelibs"]
>> 		dir = kdelibs
>> 		url = git://git.kde.org/kdelibs
>> 		description = "Basic KDE libraries module"
>>
>> 	[module "base"]
>> 		alias = "kdelibs", "kdebase", "kdenetwork"
>>
>> or whatever. You get the idea..)
>
> Would it be nicer if .gitmodules were line-based to aid in merging?

This is very similar to the problem I asked about with merging config files a
couple of weeks ago. The answer then was that when we get .gitattributes we
should be able to specify content-specific merge programs that could deal with
this sort of thing on a per-file basis. That sounds like the answer to your
concern as well, rather than making things order-dependent and otherwise harder
to read just so they can be merged with the current tools (which assume
line-based, order-dependent content).

David Lang


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 19:29       ` Josef Weidendorfer
@ 2007-04-10 19:45         ` Linus Torvalds
  2007-04-11 23:47           ` Sam Vilain
  0 siblings, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 19:45 UTC (permalink / raw
  To: Josef Weidendorfer; +Cc: Git Mailing List, Junio C Hamano



On Tue, 10 Apr 2007, Josef Weidendorfer wrote:
> 
> So when moving the kdelibs submodule around, you would
> have to update the .gitmodules file.

Right. The assumption here is:
 - submodules almost never actually change. You might add a new one 
   occasionally, and once a decade you might do some bigger 
   re-organization, but in general it's pretty much static.
 - when you do move submodules around, it's probably a big flag-day anyway 
   (ie I would expect that it's a big reorg, and that you'd quite likely 
   expect developers to have to re-check out their tree if you did major 
   surgery).

That's certainly how it works under CVS. I bet we can make it much nicer 
than CVS, but the point is, people really don't expect submodules to be 
something that you move around very dynamically. You want to be *able* to 
move them around, but it's not a normal operation.

> I like it.

The advantage with splitting things out like this is that it allows you 
much more flexibility than something automatic and deeply integrated does. 

You can still edit the modules setup even if you yourself might not even 
have that particular module checked out! That may sound insane, but it's 
actually *required* for things like "oh, the standard server for that 
module went away, I need to edit the module settings to get it from xyz 
instead".

		Linus


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 19:04       ` Andy Parkins
  2007-04-10 19:20         ` Linus Torvalds
  2007-04-10 19:41         ` David Lang
@ 2007-04-10 20:06         ` Junio C Hamano
  2 siblings, 0 replies; 101+ messages in thread
From: Junio C Hamano @ 2007-04-10 20:06 UTC (permalink / raw
  To: Andy Parkins; +Cc: git, Linus Torvalds, Josef Weidendorfer

Andy Parkins <andyparkins@gmail.com> writes:

> On Tuesday 2007, April 10, Linus Torvalds wrote:
>
>> (But there's nothing that says that the ".gitmodules" file couldn't
>> just use the same parser as the git config file, so I don't really
>> strongly care either way. I just think it would be nice to be able to
>> say
>>
>> 	[module "kdelibs"]
>> 		dir = kdelibs
>> 		url = git://git.kde.org/kdelibs
>> 		description = "Basic KDE libraries module"
>>
>> 	[module "base"]
>> 		alias = "kdelibs", "kdebase", "kdenetwork"
>>
>> or whatever. You get the idea..)
>
> Would it be nicer if .gitmodules were line-based to aid in merging?

I personally feel that if there are cases where a merge conflict is
hard to resolve, there is something wrong in the communication
between project members.  In other words, merging this *should*
be hard.

Really, if somebody wants to have project X at directory sub/X/
and somebody else wants the same at directory X/, merging the
modules file would be the least of your concerns -- the resulting
toplevel would not build correctly until you decide which tree
hierarchy should be picked, and later exchange of results among
project members would not be easily usable by the half of the
people who picked a different hierarchy than you did.


* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 19:32           ` Junio C Hamano
@ 2007-04-10 20:11             ` Linus Torvalds
  2007-04-10 20:52               ` Junio C Hamano
  0 siblings, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 20:11 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Alex Riesen, Git Mailing List



On Tue, 10 Apr 2007, Junio C Hamano wrote:
> 
> Well, I was planning to apply this directly on 'master' after
> giving them another pass.

Goodie. I gave them another pass myself, and noticed a small leak and a 
stupid copy-paste problem, fixed thus..

		Linus

---
diff --git a/read-cache.c b/read-cache.c
index 8fe94cd..f458f50 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -279,7 +279,7 @@ int base_name_compare(const char *name1, int len1, int mode1,
 	c2 = name2[len];
 	if (!c1 && (S_ISDIR(mode1) || S_ISDIRLNK(mode1)))
 		c1 = '/';
-	if (!c2 && (S_ISDIR(mode2) || S_ISDIRLNK(mode1)))
+	if (!c2 && (S_ISDIR(mode2) || S_ISDIRLNK(mode2)))
 		c2 = '/';
 	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
 }
diff --git a/refs.c b/refs.c
index 229da74..11a67a8 100644
--- a/refs.c
+++ b/refs.c
@@ -229,6 +229,7 @@ static int resolve_gitlink_packed_ref(char *name, int pathlen, const char *refna
 	if (!f)
 		return -1;
 	read_packed_refs(f, &refs);
+	fclose(f);
 	ref = refs.packed;
 	retval = -1;
 	while (ref) {
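
The copy-paste slip fixed in the read-cache.c hunk changes sort order, it
is not just cosmetic. Below is a minimal sketch reconstructed from the
hunk's context lines, with the macros and the function renamed so it stands
alone: `IS_DIR`, `IS_GITLINK` and `name_compare` are illustrative names, and
both names are assumed to be NUL-terminated.

```c
#include <string.h>

#define MODE_FMT     0170000
#define MODE_DIR     0040000
#define MODE_GITLINK 0160000
#define IS_DIR(m)     (((m) & MODE_FMT) == MODE_DIR)
#define IS_GITLINK(m) (((m) & MODE_FMT) == MODE_GITLINK)

/* Directories and gitlinks sort as if their name ended in '/'. */
int name_compare(const char *name1, int len1, int mode1,
		 const char *name2, int len2, int mode2)
{
	int len = len1 < len2 ? len1 : len2;
	int cmp = memcmp(name1, name2, len);
	unsigned char c1, c2;

	if (cmp)
		return cmp;
	c1 = name1[len];
	c2 = name2[len];
	if (!c1 && (IS_DIR(mode1) || IS_GITLINK(mode1)))
		c1 = '/';
	/* The fix: this test has to look at mode2, not mode1. */
	if (!c2 && (IS_DIR(mode2) || IS_GITLINK(mode2)))
		c2 = '/';
	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
}
```

With the fixed test, a regular file "sub.c" sorts before a gitlink "sub",
because '.' is less than the implied '/'. With the mode1 typo the gitlink
would not get its implied '/' when it is the second argument, and the same
pair would compare the other way round.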


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 19:20         ` Linus Torvalds
@ 2007-04-10 20:19           ` Junio C Hamano
  2007-04-10 20:33             ` Linus Torvalds
  0 siblings, 1 reply; 101+ messages in thread
From: Junio C Hamano @ 2007-04-10 20:19 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Andy Parkins, git, Josef Weidendorfer

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, 10 Apr 2007, Andy Parkins wrote:
>> 
>> Would it be nicer if .gitmodules were line-based to aid in merging?
>
> I seriously doubt you'll ever be merging or changing this a lot. So I 
> don't think it's a huge concern.

I think Andy's comment comes from our earlier discussion on the
other in-tree configuration, .gitattributes file.

We were talking about using in-tree .gitattributes for deciding
if we apply crlf to each path and other things like which 3-way
file-level merge backend to apply, and need to make the system
gracefully degrade even when in-tree .gitattributes has
conflict markers during a merge.  And for that purpose, it is
certainly easier to arrange "pick each line, while ignoring <<<
or === or >>>, and if there are conflicting duplicates do
something sensible about them", if the file is line oriented.

But I do not think the .gitmodules thing needs that.  If we have
conflicting (or non-conflicting for that matter) submodule
moves, that's a _MAJOR_ project re-organization, and I do not
think we would even want to automatically descend into
submodules for merging or checking-out when we have such a
situation in the higher level project.
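
The "pick each line, while ignoring <<< or === or >>>" idea can be sketched
roughly as follows. This is only an illustration in C, not code from git:
`is_conflict_marker` and `for_each_usable_line` are hypothetical names,
markers are assumed to be the usual seven-character ones at the start of a
line, and deciding what to do about conflicting duplicates is left to the
callback.

```c
#include <stddef.h>
#include <string.h>

/* A line is a marker if it starts with seven '<', '=' or '>' characters. */
static int is_conflict_marker(const char *line, size_t len)
{
	return len >= 7 &&
	       (!strncmp(line, "<<<<<<<", 7) ||
		!strncmp(line, "=======", 7) ||
		!strncmp(line, ">>>>>>>", 7));
}

/*
 * Calls fn(line, len, data) for every line of buf that is not a conflict
 * marker and returns how many lines were passed on. Lines are not
 * NUL-terminated; len excludes the trailing newline.
 */
size_t for_each_usable_line(const char *buf, size_t size,
			    void (*fn)(const char *, size_t, void *), void *data)
{
	size_t count = 0;

	while (size) {
		const char *eol = memchr(buf, '\n', size);
		size_t len = eol ? (size_t)(eol - buf) : size;

		if (!is_conflict_marker(buf, len)) {
			fn(buf, len, data);
			count++;
		}
		len += eol ? 1 : 0; /* step over the newline, if any */
		buf += len;
		size -= len;
	}
	return count;
}
```

A line-oriented file survives this kind of filtering because every
remaining line is still a complete record; a sectioned file such as the
proposed .gitmodules would not, which is the distinction drawn above.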


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 20:19           ` Junio C Hamano
@ 2007-04-10 20:33             ` Linus Torvalds
  2007-04-12  0:12               ` Sam Vilain
  0 siblings, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-10 20:33 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Andy Parkins, git, Josef Weidendorfer



On Tue, 10 Apr 2007, Junio C Hamano wrote:
> 
> But I do not think the .gitmodules thing needs that.  If we have
> conflicting (or non-conflicting for that matter) submodule
> moves, that's a _MAJOR_ project re-organization, and I do not
> think we would even want to automatically descend into
> submodules for merging or checking-out when we have such a
> situation in the higher level project.

100% agreed. 

Also, note that while the ".gitmodules" (or whatever) file will be 
required to do things like "git pull", the basic tree-level logic that I 
sent out obviously doesn't need/use .gitmodules at all.

So there's a very real issue where a repository with submodules still 
"works", even with a .gitmodules file that is totally scrogged and doesn't 
have the right information (yet), it's just that it may simply not be able 
to do all the operations because it cannot figure out where to pull 
missing subproject data from etc..

So there is no reason to believe that we need to magically and 
automatically resolve conflicts - if conflicts happen, functionality is 
reduced, but it's not reduced so much that you cannot use the tree and try 
to resolve them (which is important, btw, since often before you commit 
your fix for the conflicts you'd want to *test* that fix, so we definitely 
don't want these kinds of files to be so central that it gets hard to get 
normal work done without them).

It really boils down to the same design issue: the way I think submodules 
should work is that they are very loosely coupled with the supermodule. 
The fact that the ".gitmodules" file isn't *that* critical comes largely 
from that loose coupling.

		Linus


* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 20:11             ` Linus Torvalds
@ 2007-04-10 20:52               ` Junio C Hamano
  2007-04-10 21:02                 ` Sam Ravnborg
                                   ` (2 more replies)
  0 siblings, 3 replies; 101+ messages in thread
From: Junio C Hamano @ 2007-04-10 20:52 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Alex Riesen, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, 10 Apr 2007, Junio C Hamano wrote:
>> 
>> Well, I was planning to apply this directly on 'master' after
>> giving them another pass.
>
> Goodie. I gave them another pass myself, and noticed a small leak and a 
> stupid copy-paste problem, fixed thus..

Yeah, I noticed the first one but not the second.  Thanks.

> diff --git a/read-cache.c b/read-cache.c
> index 8fe94cd..f458f50 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -279,7 +279,7 @@ int base_name_compare(const char *name1, int len1, int mode1,
>  	c2 = name2[len];
>  	if (!c1 && (S_ISDIR(mode1) || S_ISDIRLNK(mode1)))
>  		c1 = '/';
> -	if (!c2 && (S_ISDIR(mode2) || S_ISDIRLNK(mode1)))
> +	if (!c2 && (S_ISDIR(mode2) || S_ISDIRLNK(mode2)))
>  		c2 = '/';
>  	return (c1 < c2) ? -1 : (c1 > c2) ? 1 : 0;
>  }
> diff --git a/refs.c b/refs.c
> index 229da74..11a67a8 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -229,6 +229,7 @@ static int resolve_gitlink_packed_ref(char *name, int pathlen, const char *refna
>  	if (!f)
>  		return -1;
>  	read_packed_refs(f, &refs);
> +	fclose(f);
>  	ref = refs.packed;
>  	retval = -1;
>  	while (ref) {

By the way,...

People occasionally ask "how would I make a small fix to a
commit that is buried in the history", so let me take a moment
to give them a recipe.

Let's say while reviewing the code after applying all of the
6-series, you noticed the above thinko.  First find out which
commit caused it:

$ git checkout lt/gitlink
$ git blame -L229,+7 master.. -- refs.c
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 229) 	if (!f)
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 230) 		re..
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 231) 	read_packe..
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 232) 	ref = refs..
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 233) 	retval = -..
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 234) 	while (ref..
b60108a1 (Linus Torvalds 2007-04-09 21:14:26 -0700 235) 		if..

The commit to fix is b60108a1 (this is what I have in my private
repo, and I'll be rebuilding the series with this example, so
you will never see this commit object name in the end result
I'll be pushing out).  So I detach the HEAD at that commit and
make a fix:

$ git checkout b60108a1
$ edit refs.c
$ git diff; # just to make sure
$ git commit -a --amend

At this point, the detached HEAD and the original branch look
like this:

$ git show-branch lt/gitlink HEAD
! [lt/gitlink] Teach core object handling functions about gitlinks
 * [HEAD] Add 'resolve_gitlink_ref()' helper function
--
 * [HEAD] Add 'resolve_gitlink_ref()' helper function
+  [lt/gitlink] Teach core object handling functions about gitlinks
+  [lt/gitlink^] Teach "fsck" not to follow subproject links
+  [lt/gitlink~2] Add "S_IFDIRLNK" file mode infrastructure for git links
+  [lt/gitlink~3] Add 'resolve_gitlink_ref()' helper function
+* [HEAD^] Avoid overflowing name buffer in deep directory structures

We fixed lt/gitlink~3 and the fixed-up commit is at HEAD.  We
want to rebase the rest of lt/gitlink on top of HEAD, like this:

$ git rebase HEAD lt/gitlink

This will take us back on lt/gitlink branch, set the tip of the
branch to the commit we just made with the fix-up, and the first
round will try to apply the change lt/gitlink~3 brings in on top
of our HEAD.  This _will_ fail, but that is to be expected, as
we intend to replace that with what we just amended.  Just reset
it away and keep going.

$ git reset --hard
$ git rebase --skip

Dealing with the other one in read-cache.c can be handled
similarly after this. Luckily blame finds out that it is the
last in the series (i.e. at the tip of lt/gitlink branch), so
usual "fix the topmost commit" procedure applies.

$ edit read-cache.c
$ git diff ;# checking...
$ git commit -a --amend


* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 20:52               ` Junio C Hamano
@ 2007-04-10 21:02                 ` Sam Ravnborg
  2007-04-10 21:27                   ` Junio C Hamano
  2007-04-10 21:03                 ` Nicolas Pitre
  2007-04-11  8:08                 ` David Kågedal
  2 siblings, 1 reply; 101+ messages in thread
From: Sam Ravnborg @ 2007-04-10 21:02 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, Alex Riesen, Git Mailing List

On Tue, Apr 10, 2007 at 01:52:46PM -0700, Junio C Hamano wrote:
> 
> People occasionally ask "how would I make a small fix to a
> commit that is buried in the history", so let me take a moment
> to give them a recipe.

That recipe looks ummm complicated...
What I usually do is:

git format-patch HEAD~4..HEAD
git reset --hard HEAD~4
patch -p1 < 0004*
...edit...
delete diff from 0004*
git diff >> 0004*
git reset --hard
git am 000*


Maybe this is as complicated as your example but this
is very simple to deal with.
And I do not destroy history or anything.

But that said I do not use topic branches but simply
clone my local repository as needed.
And I always deal with a linear history.


[I post this mostly to check if this is insane
and I need to understand the way you propose to do stuff]

	Sam


* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 20:52               ` Junio C Hamano
  2007-04-10 21:02                 ` Sam Ravnborg
@ 2007-04-10 21:03                 ` Nicolas Pitre
  2007-04-15 23:21                   ` J. Bruce Fields
  2007-04-11  8:08                 ` David Kågedal
  2 siblings, 1 reply; 101+ messages in thread
From: Nicolas Pitre @ 2007-04-10 21:03 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Linus Torvalds, Alex Riesen, Git Mailing List

On Tue, 10 Apr 2007, Junio C Hamano wrote:

> By the way,...
> 
> People occasionally ask "how would I make a small fix to a
> commit that is buried in the history", so let me take a moment
> to give them a recipe.

This is definitely good Documentation/howto/ material.


Nicolas


* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 21:02                 ` Sam Ravnborg
@ 2007-04-10 21:27                   ` Junio C Hamano
  0 siblings, 0 replies; 101+ messages in thread
From: Junio C Hamano @ 2007-04-10 21:27 UTC (permalink / raw
  To: Sam Ravnborg; +Cc: Linus Torvalds, Alex Riesen, Git Mailing List

Sam Ravnborg <sam@ravnborg.org> writes:

> On Tue, Apr 10, 2007 at 01:52:46PM -0700, Junio C Hamano wrote:
>> 
>> People occasionally ask "how would I make a small fix to a
>> commit that is buried in the history", so let me take a moment
>> to give them a recipe.
>
> That recipe looks ummm complicated...
> What I usually do is:
>
> git format-patch HEAD~4..HEAD
> git reset --hard HEAD~4
> patch -p1 < 0004*
> ...edit...
> delete diff from 0004*
> git diff >> 0004*
> git reset --hard
> git am 000*
>
>
> Maybe this is as complicated as your example but this
> is very simple to deal with.
> And I do not destroy history or anything.
>
> But that said I do not use topic branches but simply
> clone my local repository as needed.
> And I always deal with a linear history.
>
>
> [I post this mostly to check if this is insane
> and I need to understand the way you propose to do stuff]

It's really the same.  You keep 000* file, I keep them in the
original branch and have "git rebase" take care of the details.


* [PATCH] Allow git-update-index work on subprojects
  2007-04-10 13:39   ` [PATCH] allow git-update-index work on subprojects Alex Riesen
@ 2007-04-10 23:19     ` Alex Riesen
  2007-04-11  2:55       ` Junio C Hamano
  0 siblings, 1 reply; 101+ messages in thread
From: Alex Riesen @ 2007-04-10 23:19 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

Also, make "git commit -a" work with modifications of subproject HEADs.

---

This one works with update-index --remove (which is what git-commit -a
uses). It is ugly. I tried to keep the "F -> D/F" behaviour of
update-index. Still have to check if "F -> Subproject" works.

 builtin-update-index.c |   45 +++++++++++++++++++++++++--------------------
 1 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/builtin-update-index.c b/builtin-update-index.c
index eba756d..d075d50 100644
--- a/builtin-update-index.c
+++ b/builtin-update-index.c
@@ -62,7 +62,7 @@ static int mark_valid(const char *path)
 
 static int process_file(const char *path)
 {
-	int size, namelen, option, status;
+	int size, namelen = -1, option, status;
 	struct cache_entry *ce;
 	struct stat st;
 
@@ -73,7 +73,7 @@ static int process_file(const char *path)
 	 */
 	cache_tree_invalidate_path(active_cache_tree, path);
 
-	if (status < 0 || S_ISDIR(st.st_mode)) {
+	if (!status && S_ISDIR(st.st_mode)) {
 		/* When we used to have "path" and now we want to add
 		 * "path/file", we need a way to remove "path" before
 		 * being able to add "path/file".  However,
@@ -82,27 +82,32 @@ static int process_file(const char *path)
 		 * friendly, especially since we can do the opposite
 		 * case just fine without --force-remove.
 		 */
-		if (status == 0 || (errno == ENOENT || errno == ENOTDIR)) {
-			if (allow_remove) {
-				if (remove_file_from_cache(path))
-					return error("%s: cannot remove from the index",
-					             path);
-				else
-					return 0;
-			} else if (status < 0) {
+		namelen = strlen(path);
+		int pos = cache_name_pos(path, namelen);
+		if (0 <= pos && S_ISREG(ntohl(active_cache[pos]->ce_mode)) &&
+		    allow_remove) {
+			if (remove_file_from_cache(path))
+				return error("%s: cannot remove from the index", path);
+			else
+				return 0;
+		}
+	}
+
+	if (status < 0) {
+		if (errno == ENOENT || errno == ENOTDIR) {
+			if (!allow_remove)
 				return error("%s: does not exist and --remove not passed",
-				             path);
-			}
+					     path);
+			if (remove_file_from_cache(path))
+				return error("%s: cannot remove from the index",
+					     path);
+			return 0;
 		}
-		if (0 == status)
-			return error("%s: is a directory - add files inside instead",
-			             path);
-		else
-			return error("lstat(\"%s\"): %s", path,
-				     strerror(errno));
+		return error("lstat(\"%s\"): %s", path, strerror(errno));
 	}
 
-	namelen = strlen(path);
+	if (namelen < 0)
+		namelen = strlen(path);
 	size = cache_entry_size(namelen);
 	ce = xcalloc(1, size);
 	memcpy(ce->name, path, namelen);
@@ -211,7 +216,7 @@ static void update_one(const char *path, const char *prefix, int prefix_length)
 		goto free_return;
 	}
 	if (process_file(p))
-		die("Unable to process file %s", path);
+		die("Unable to process \"%s\"", path);
 	report("add '%s'", path);
  free_return:
 	if (p < path || p > path + strlen(path))
-- 
1.5.1.135.g19a57-dirty


* Re: [PATCH] Allow git-update-index work on subprojects
  2007-04-10 23:19     ` [PATCH] Allow " Alex Riesen
@ 2007-04-11  2:55       ` Junio C Hamano
  0 siblings, 0 replies; 101+ messages in thread
From: Junio C Hamano @ 2007-04-11  2:55 UTC (permalink / raw
  To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List

Alex Riesen <raa.lkml@gmail.com> writes:

> Also, make "git commit -a" work with modifications of subproject HEADs.
>
> ---
>
> This one works with update-index --remove (which is what git-commit -a
> uses). It is ugly. I tried to keep the "F -> D/F" behaviour of
> update-index. Still have to check if "F -> Subproject" works.
>
>  builtin-update-index.c |   45 +++++++++++++++++++++++++--------------------
>  1 files changed, 25 insertions(+), 20 deletions(-)
>
> diff --git a/builtin-update-index.c b/builtin-update-index.c
> index eba756d..d075d50 100644
> --- a/builtin-update-index.c
> +++ b/builtin-update-index.c
> @@ -62,7 +62,7 @@ static int mark_valid(const char *path)
>  
>  static int process_file(const char *path)
>  {
> -	int size, namelen, option, status;
> +	int size, namelen = -1, option, status;
>  	struct cache_entry *ce;
>  	struct stat st;
>  
> @@ -73,7 +73,7 @@ static int process_file(const char *path)
>  	 */
>  	cache_tree_invalidate_path(active_cache_tree, path);
>  
> +	if (!status && S_ISDIR(st.st_mode)) {
>  		/* When we used to have "path" and now we want to add
>  		 * "path/file", we need a way to remove "path" before
>  		 * being able to add "path/file".  However,
> @@ -82,27 +82,32 @@ static int process_file(const char *path)
>  		 * friendly, especially since we can do the opposite
>  		 * case just fine without --force-remove.
>  		 */
> +		namelen = strlen(path);
> +		int pos = cache_name_pos(path, namelen);
> +		if (0 <= pos && S_ISREG(ntohl(active_cache[pos]->ce_mode)) &&
> +		    allow_remove) {
> +			if (remove_file_from_cache(path))
> +				return error("%s: cannot remove from the index", path);
> +			else
> +				return 0;
> +		}
> +	}

If I used to have a symlink S and now the filesystem has a file
S/F which I am running "update-index --add --remove" on, what
happens?

If I have a subproject at path P, and mistakenly try to add path
P/F with "update-index --add --remove P/F", it should be
refused, shouldn't it?
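
To illustrate the second case: before adding P/F, one could look for an
index entry that is a gitlink and also a leading directory of the path,
and refuse if one is found.  A minimal sketch in plain C follows.  The
tiny "struct entry" table and leading_gitlink() are hypothetical
stand-ins for the real index and are not part of the patch; only the
160000 mode is taken from the series.

```c
#include <stddef.h>
#include <string.h>

#define MODE_GITLINK 0160000u	/* mode shown for gitlinks in "git diff --raw" */
#define MODE_FILE    0100644u	/* ordinary file, for contrast */

/* Hypothetical stand-in for an index entry: just a name and a mode. */
struct entry {
	const char *name;
	unsigned int mode;
};

/*
 * Return the entry that is a gitlink and a leading directory of
 * "path" (e.g. "P" for "P/F"), or NULL if there is none.
 */
const struct entry *leading_gitlink(const struct entry *idx, size_t nr,
				    const char *path)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		size_t len = strlen(idx[i].name);

		if (idx[i].mode != MODE_GITLINK)
			continue;
		/* "P" is a leading directory of "P/F", but not of "PF" or "P" */
		if (!strncmp(idx[i].name, path, len) && path[len] == '/')
			return &idx[i];
	}
	return NULL;
}
```

With something like this, process_file() could return an error when
leading_gitlink() finds a match, instead of silently replacing the
subproject entry.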

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10  4:20 ` [PATCH 6/6] Teach core object handling functions about gitlinks Linus Torvalds
  2007-04-10  8:40   ` Frank Lichtenheld
  2007-04-10 16:28   ` Josef Weidendorfer
@ 2007-04-11  8:06   ` Martin Waitz
  2007-04-11  8:29     ` Alex Riesen
                       ` (2 more replies)
  2 siblings, 3 replies; 101+ messages in thread
From: Martin Waitz @ 2007-04-11  8:06 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

hoi :)

thanks Linus for your nice implementation.  Your core code is so much
nicer than my hacked-up prototype :-).

I only had a little time to actually have a look at it, but the core is
very similar to my approach and I'll try to rebase some of my code on
top of yours in the following days.

The only thing I disagree with you on is using HEAD of the submodule:

On Mon, Apr 09, 2007 at 09:20:29PM -0700, Linus Torvalds wrote:
> +static int ce_compare_gitlink(struct cache_entry *ce)
> +{
> +	unsigned char sha1[20];
> +
> +	/*
> +	 * We don't actually require that the .git directory
> +	 * under DIRLNK directory be a valid git directory. It
> +	 * might even be missing (in case nobody populated that
> +	 * sub-project).
> +	 *
> +	 * If so, we consider it always to match.
> +	 */
> +	if (resolve_gitlink_ref(ce->name, "HEAD", sha1) < 0)
> +		return 0;
> +	return hashcmp(sha1, ce->sha1);
> +}


> @@ -2332,6 +2333,8 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, int write
>  				     path);
>  		free(target);
>  		break;
> +	case S_IFDIR:
> +		return resolve_gitlink_ref(path, "HEAD", sha1);
>  	default:
>  		return error("%s: unsupported file type", path);
>  	}

Always using HEAD of the submodule makes branches in the submodule
useless.

Whenever you do a checkout in the supermodule you also have to update
the submodule and this update has to change the same thing which is read
above.
Updating the branch which HEAD points to is dangerous.  You could
overwrite some unrelated branch just because the user forgot to switch
back to his supermodule-tracking-branch.  The user would always have to
make sure that all the submodules are in the correct state for an update
of the supermodule.
Updating HEAD directly is possible now and may make some sense, but you
still get problems when you want to switch to some temporary branch in
the submodule.  You have no chance to get back to the original supermodule
version and now your temporary submodule branch gets shown as the new
submodule version which should be part of the supermodule.
The submodule version which is stored in the supermodules tree is kind
of a hidden/remote reference/branch.  When working on a remote branch
we first create a local working branch and then sync it with the remote
one.  I think that it makes sense to use the same model for submodules:
have one local branch in the submodule which is used for all work that
is done in the supermodule context.

So my advice is:
Always read and write one dedicated branch (hardcoded "master" or
configurable) when the supermodule wants to access a submodule.

Then you have two types of branches:
You can branch the supermodule and have your own branch of the entire
project with all submodules.  Use this if you want to commit your
work on the submodule into the supermodule.
You can also branch the submodule to effectively disconnect the
submodule from the supermodule temporarily.  You can use this to
do some experimental/debugging stuff which should not yet go into
the supermodule.  Once you want this branch to show up in the
supermodule, just merge it to "master" and commit it to the supermodule
(now it's in the supermodule branch, and the submodule branch is not
needed any more).
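
To make the difference concrete, here is a minimal sketch in plain C.
The "struct submodule" model and the function names are invented for
illustration and are not git's real data structures; "refs/heads/master"
is the hardcoded dedicated branch suggested above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of a submodule's refs. */
struct ref {
	const char *name;	/* e.g. "refs/heads/master" */
	const char *commit;	/* commit id, abbreviated here */
};

struct submodule {
	const char *head;	/* the branch HEAD currently points to */
	const struct ref *refs;
	size_t nr;
};

/* Look up a branch by its full ref name; NULL if it does not exist. */
const char *resolve_branch(const struct submodule *sub, const char *name)
{
	size_t i;

	for (i = 0; i < sub->nr; i++)
		if (!strcmp(sub->refs[i].name, name))
			return sub->refs[i].commit;
	return NULL;
}

/* What the posted patches read: whatever HEAD points to right now. */
const char *read_via_head(const struct submodule *sub)
{
	return resolve_branch(sub, sub->head);
}

/* The alternative: always read the same, dedicated branch. */
const char *read_via_dedicated(const struct submodule *sub)
{
	return resolve_branch(sub, "refs/heads/master");
}
```

While the user sits on a temporary branch, read_via_head() reports the
temporary commit as the submodule version, whereas read_via_dedicated()
keeps returning the commit on "master".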


See also the discussion about it in the messages around
http://marc.info/?l=git&m=116636334226668&w=2

-- 
Martin Waitz

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 20:52               ` Junio C Hamano
  2007-04-10 21:02                 ` Sam Ravnborg
  2007-04-10 21:03                 ` Nicolas Pitre
@ 2007-04-11  8:08                 ` David Kågedal
  2007-04-11  9:32                   ` Junio C Hamano
  2 siblings, 1 reply; 101+ messages in thread
From: David Kågedal @ 2007-04-11  8:08 UTC (permalink / raw
  To: git

Junio C Hamano <junkio@cox.net> writes:

> $ git rebase HEAD lt/gitlink
> 
> This will take us back on lt/gitlink branch, set the tip of the
> branch to the commit we just made with the fix-up, and the first
> round will try to apply the change lt/gitlink~3 brings in on top
> of our HEAD.  This _will_ fail, but that is to be expected, as
> we intend to replace that with what we just amended.  Just reset
> it away and keep going.
> 
> $ git reset --hard
> $ git rebase --skip

Wouldn't

$ git rebase --onto HEAD lt/gitlink~3 lt/gitlink

do the trick in one step?

-- 
David Kågedal

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11  8:06   ` Martin Waitz
@ 2007-04-11  8:29     ` Alex Riesen
  2007-04-11  8:36       ` Martin Waitz
  2007-04-11  9:47     ` Andy Parkins
  2007-04-11 15:16     ` Linus Torvalds
  2 siblings, 1 reply; 101+ messages in thread
From: Alex Riesen @ 2007-04-11  8:29 UTC (permalink / raw
  To: Martin Waitz; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano

On 4/11/07, Martin Waitz <tali@admingilde.org> wrote:
> So my advice is:
> Always read and write one dedicated branch (hardcoded "master" or
> configurable) when the supermodule wants to access a submodule.

In this case it does not correspond to the working tree anymore.
HEAD is the "closest" to working tree of submodule.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 13:04   ` Alex Riesen
  2007-04-10 15:13     ` Linus Torvalds
@ 2007-04-11  8:32     ` Martin Waitz
  2007-04-11  8:42       ` Alex Riesen
  1 sibling, 1 reply; 101+ messages in thread
From: Martin Waitz @ 2007-04-11  8:32 UTC (permalink / raw
  To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano

hoi :)

On Tue, Apr 10, 2007 at 03:04:33PM +0200, Alex Riesen wrote:
> The other thing which will be missed a lot (I miss it that much)
> is a subproject-recursive git-commit and git-status.

git-status should really point out if a subproject has any changes,
as it does for files.  The difference is that a submodule has more
kinds of possible changes: it may have new commits which are not yet
in the supermodule index, a dirty index of its own, or a dirty
working directory.

But for commit it really does not make any sense.  The commit in the
submodule is totally independent of the commit in the supermodule.
You'd want the submodule commit message to not refer to any
supermodule stuff (as you likely want to reuse the submodule in other
supermodules), while the supermodule commit is much more high-level and
only records that the submodule got changed.

When viewed from the supermodule, a submodule is just part of its tree,
just like normal files.  So a submodule commit is conceptually similar to
changing a file, and you don't change files while you commit, either ;-).

-- 
Martin Waitz

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11  8:29     ` Alex Riesen
@ 2007-04-11  8:36       ` Martin Waitz
  2007-04-11  8:49         ` Alex Riesen
  2007-04-11  9:15         ` Junio C Hamano
  0 siblings, 2 replies; 101+ messages in thread
From: Martin Waitz @ 2007-04-11  8:36 UTC (permalink / raw
  To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano

hoi :)

On Wed, Apr 11, 2007 at 10:29:24AM +0200, Alex Riesen wrote:
> On 4/11/07, Martin Waitz <tali@admingilde.org> wrote:
> >So my advice is:
> >Always read and write one dedicated branch (hardcoded "master" or
> >configurable) when the supermodule wants to access a submodule.
> 
> In this case it does not correspond to the working tree anymore.
> HEAD is the "closest" to working tree of submodule.

yes.

This has been discussed at length already.
Please have a look at the archives.

Your working tree now contains a complete git repository which has
features which are not available for normal files.  Notably, you
have the possibility to create branches in the submodule.
If you insist on using HEAD you throw away those submodule capabilities.

-- 
Martin Waitz

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-11  8:32     ` Martin Waitz
@ 2007-04-11  8:42       ` Alex Riesen
  2007-04-11  8:57         ` Martin Waitz
  0 siblings, 1 reply; 101+ messages in thread
From: Alex Riesen @ 2007-04-11  8:42 UTC (permalink / raw
  To: Martin Waitz; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano

On 4/11/07, Martin Waitz <tali@admingilde.org> wrote:
> > The other thing which will be missed a lot (I miss it that much)
> > is a subproject-recursive git-commit and git-status.
>
> git-status should really point out if a subproject has any changes,

Only if I want it to. HEAD change check (which is cheap enough
to be done unconditionally) can be done always.

> But for commit it really does not make any sense.  The commit in the
> submodule is totally independent of the commit in the supermodule.

Right. Perhaps not a commit in the submodule, but a recursive check
for working directory changes in submodules, so that you can
make sure you don't make a superproject commit which cannot
be resolved to what you had in all the working directories:

  git commit -a --check-clean-subprojects

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11  8:36       ` Martin Waitz
@ 2007-04-11  8:49         ` Alex Riesen
  2007-04-11  9:20           ` Martin Waitz
  2007-04-11  9:15         ` Junio C Hamano
  1 sibling, 1 reply; 101+ messages in thread
From: Alex Riesen @ 2007-04-11  8:49 UTC (permalink / raw
  To: Martin Waitz; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano

On 4/11/07, Martin Waitz <tali@admingilde.org> wrote:
> > >Always read and write one dedicated branch (hardcoded "master" or
> > >configurable) when the supermodule wants to access a submodule.
> >
> > In this case it does not correspond to the working tree anymore.
> > HEAD is the "closest" to working tree of submodule.
>
> yes.

"Yes" what? It should _not_ correspond to HEAD?

> This has been discussed at length already.
> Please have a look at the archives.

I should. But at least a short summary of the reasons
would be nice.

> Your working tree now contains a complete git repository which has
> features which are not available for normal files.  Notably, you
> have the possibility to create branches in the submodule.
> If you insist on using HEAD you throw away those submodule capabilities.
>

In this (a very special, I believe) case, why not use git update-index
--cacheinfo?

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-11  8:42       ` Alex Riesen
@ 2007-04-11  8:57         ` Martin Waitz
  0 siblings, 0 replies; 101+ messages in thread
From: Martin Waitz @ 2007-04-11  8:57 UTC (permalink / raw
  To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano

hoi :)

On Wed, Apr 11, 2007 at 10:42:57AM +0200, Alex Riesen wrote:
> On 4/11/07, Martin Waitz <tali@admingilde.org> wrote:
> >> The other thing which will be missed a lot (I miss it that much)
> >> is a subproject-recursive git-commit and git-status.
> >
> >git-status should really point out if a subproject has any changes,
> 
> Only if I want it to. HEAD change check (which is cheap enough
> to be done unconditionally) can be done always.

Yes, that's the equivalent of checking normal files.
The recursive check for dirty files/index should be configurable.

> >But for commit it really does not make any sense.  The commit in the
> >submodule is totally independent of the commit in the supermodule.
> 
> Right. Perhaps not a commit in submodule but a recursive check
> for working directory changes in submodules. So that you can
> make sure you don't make a superproject commit which cannot
> be resolved to what you had in all the working directories:
> 
>  git commit -a --check-clean-subprojects

For -a such a check may even make sense unconditionally.
And without -a I don't see any value in such a check.
So we can just add that check to -a if we see that dirty submodules
are a problem for users.

-- 
Martin Waitz

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11  8:36       ` Martin Waitz
  2007-04-11  8:49         ` Alex Riesen
@ 2007-04-11  9:15         ` Junio C Hamano
  2007-04-11 10:03           ` Martin Waitz
  1 sibling, 1 reply; 101+ messages in thread
From: Junio C Hamano @ 2007-04-11  9:15 UTC (permalink / raw
  To: Martin Waitz; +Cc: Alex Riesen, Linus Torvalds, Git Mailing List

Martin Waitz <tali@admingilde.org> writes:

> Your working tree now contains a complete git repository which has
> features which are not available for normal files.  Notably, you
> have the possibility to create branches in the submodule.
> If you insist on using HEAD you throw away those submodule capabilities.

Why?  If you are working in the parent module (e.g integration)
and notice breakage due to a bug in a submodule, it is very
plausible that you would want to cd into the directory you have
the submodule checked out, which has its own .git/ as its
repository, and perform a fix-up there, with the goal of coming
up with a commit usable by the parent project pointed at by the
HEAD of the submodule repository.  And while working toward that
goal, you will use branches, rebase, rewind or use StGIT there
in that submodule repository.  It does not forbid you from using
any of these things -- as long as you end up with a good commit
at HEAD that the supermodule can use.

Once you come up with a suitable commit sitting at HEAD of the
submodule repository, you cd up to the parent module.  Top-level
git-diff would notice that the commit recorded at the submodule
path has been updated (because you now have a good commit at
HEAD of the submodule repository, while earlier the one in your
index was a dud).

So it is not clear to me what your argument about throwing away
capabilities is.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11  8:49         ` Alex Riesen
@ 2007-04-11  9:20           ` Martin Waitz
  0 siblings, 0 replies; 101+ messages in thread
From: Martin Waitz @ 2007-04-11  9:20 UTC (permalink / raw
  To: Alex Riesen; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano

hoi :)

On Wed, Apr 11, 2007 at 10:49:18AM +0200, Alex Riesen wrote:
> On 4/11/07, Martin Waitz <tali@admingilde.org> wrote:
> >> >Always read and write one dedicated branch (hardcoded "master" or
> >> >configurable) when the supermodule wants to access a submodule.
> >>
> >> In this case it does not correspond to the working tree anymore.
> >> HEAD is the "closest" to working tree of submodule.
> >
> >yes.
> 
> "Yes" what? It should _not_ correspond to HEAD?

Not necessarily, yes.

Branches in the submodule make no sense unless they are independent
from supermodule branches.  And then changing to another branch in
the submodule automatically means that your current submodule working
directory should be independent of the supermodule.

git-status in the supermodule should of course warn when a submodule
is on a different branch, so that you don't accidentally lose submodule
commits which did not get committed to the supermodule.

> >Your working tree now contains a complete git repository which has
> >features which are not available for normal files.  Notably, you
> >have the possibility to create branches in the submodule.
> >If you insist on using HEAD you throw away those submodule capabilities.
> >
> 
> In this (a very special, I believe) case, why not use git update-index
> --cacheinfo?

I think we misunderstood each other.
For me branching is not a special case.

-- 
Martin Waitz

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-11  8:08                 ` David Kågedal
@ 2007-04-11  9:32                   ` Junio C Hamano
  2007-04-15 23:25                     ` J. Bruce Fields
  0 siblings, 1 reply; 101+ messages in thread
From: Junio C Hamano @ 2007-04-11  9:32 UTC (permalink / raw
  To: David Kågedal; +Cc: git

David Kågedal <davidk@lysator.liu.se> writes:

> Junio C Hamano <junkio@cox.net> writes:
>
>> ...  This _will_ fail, but that is to be expected, as
>> we intend to replace that with what we just amended.  Just reset
>> it away and keep going.
>> 
>> $ git reset --hard
>> $ git rebase --skip
>
> Wouldn't
>
> $ git rebase --onto HEAD lt/gitlink~3 lt/gitlink
>
> do the trick in one step?

It is probably more Kosher, and I used to always do that, but it
is much longer to type, and I use both perhaps 50%/50% depending
on the mood.

When the fix-up only adds stuff, 3-way merge would say that the
commit before fixing up (lt/gitlink~3 in our example, which you
are explicitly excluding, while I am letting rebase see it)
has already been applied, in which case the procedure would not
even stop.  The case illustrated in my message which only adds a
forgotten line "fclose(f)" falls into that category.

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11  8:06   ` Martin Waitz
  2007-04-11  8:29     ` Alex Riesen
@ 2007-04-11  9:47     ` Andy Parkins
  2007-04-11 11:31       ` Martin Waitz
  2007-04-11 15:16     ` Linus Torvalds
  2 siblings, 1 reply; 101+ messages in thread
From: Andy Parkins @ 2007-04-11  9:47 UTC (permalink / raw
  To: git; +Cc: Martin Waitz, Linus Torvalds, Junio C Hamano

On Wednesday 2007 April 11 09:06, Martin Waitz wrote:

> The only thing I disagree with you on is using HEAD of the submodule:

I know we've had this discussion before, but I'm going to bring it up again - 
mainly because Linus's implementation exactly matches what I envisaged when 
we originally spoke of this.  I think in your "Updating the branch which HEAD 
points to is dangerous" section, the main thing you're not taking into 
account is that git can make detached checkouts.  Updating HEAD is not 
dangerous - updating refs is; and I don't think anyone is proposing that a 
submodule ref should ever be updated by a supermodule.

I think you're also too strongly focussed on the idea that the supermodule 
tracks submodule branches - it cannot: branches are not part of "the"
repository, they point at "a" repository.  References are outside the
repository pointing in, and hence the supermodule cannot refer to them at its 
core.

Now, if you check out a revision in the supermodule, that's going to look up 
the submodule revision stored in the DIRLINK tree entry which will recurse 
into the submodule and checkout that revision - almost certainly as a 
detached HEAD.  There are three possibilities then:
 - The submodule revision is in the past and no submodule branch points at it
 - The submodule revision is current and a submodule branch points at it
 - The submodule revision is current and multiple submodule branches point at 
   it
The supermodule checkout will have to make a decision whether to update the 
submodule HEAD (in one case it's obvious: a revision in the past has to be 
detached HEAD as there is no suitable branch).  It's also possible that the 
single submodule branch case is easy - undetach HEAD; however I don't think 
that is universally correct.
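
The three cases can be sketched as a small decision function.  This is
only an illustration in plain C: the "struct branch" table and
pick_head_mode() are made-up names, and attaching HEAD in the
single-branch case is just one possible policy, which, as said above,
may not be universally correct.

```c
#include <stddef.h>
#include <string.h>

enum head_mode { HEAD_DETACHED, HEAD_ATTACHED, HEAD_AMBIGUOUS };

/* Hypothetical view of one submodule branch: its name and its tip. */
struct branch {
	const char *name;
	const char *commit;
};

/*
 * Decide how the submodule HEAD could look after the supermodule checks
 * out revision "want".  On HEAD_ATTACHED, *attach_to names the single
 * matching branch; otherwise it is set to NULL.
 */
enum head_mode pick_head_mode(const struct branch *br, size_t nr,
			      const char *want, const char **attach_to)
{
	size_t i, hits = 0;

	*attach_to = NULL;
	for (i = 0; i < nr; i++) {
		if (strcmp(br[i].commit, want))
			continue;
		hits++;
		*attach_to = br[i].name;
	}
	if (!hits)
		return HEAD_DETACHED;	/* no branch points at it */
	if (hits == 1)
		return HEAD_ATTACHED;	/* candidate for "undetach" */
	*attach_to = NULL;
	return HEAD_AMBIGUOUS;		/* several branches: no obvious choice */
}
```

The HEAD_AMBIGUOUS result is where configuration or the user would have
to decide; the sketch deliberately does not guess.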

I know you're very much in favour of making branches in the submodule 
correspond to branches in the supermodule, but I just don't see a way of 
making it work - the supermodule cannot know about submodule branches, 
branches are not part of the repository, they just point at the repository.  
My branches could be different from your branches.

It may be that some handy configuration settings and some clever porcelain 
could keep them in sync for your working repository - but it's never going to 
be the case that checking out "master" in the supermodule can be universally 
resolved to mean "checkout master in the submodule".

The way submodules should be treated is that the whole submodule is analogous 
to a single repository-tracked file - that's essentially what a submodule is 
in the end but the content of the "file" is the submodule revision.

There is one difference from ordinary files, a submodule has two "modified" 
states, not one:
 1. HEAD of submodule is different from DIRLINK revision
 2. Submodule is dirty

In state (1) the submodule has to have git-add run on it in the supermodule, 
just as you would with a modified file, to get it into the index (or not if 
you don't want to commit that change).  In state (2) it should be impossible 
to git-add, because the state of the submodule doesn't represent something 
that could be restored - there is nothing reasonable that could be written to 
the DIRLINK tree object.  This is certainly a porcelain issue, because it's 
only really a warning that "git-add" isn't doing what you think it's doing 
when the submodule is dirty.
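
As a rough sketch of that rule in plain C (the enum and both function
names are hypothetical, and how "dirty" gets computed is left out
entirely):

```c
#include <string.h>

enum sub_state {
	SUB_UNCHANGED,	/* HEAD matches the recorded revision, nothing dirty */
	SUB_HEAD_MOVED,	/* state (1): HEAD differs from the DIRLINK revision */
	SUB_DIRTY	/* state (2): uncommitted changes in the submodule */
};

/*
 * Classify a submodule from the revision recorded in the supermodule,
 * the submodule's current HEAD commit, and a precomputed dirty flag.
 * A dirty submodule is reported as SUB_DIRTY even if HEAD also moved.
 */
enum sub_state classify_submodule(const char *recorded, const char *head,
				  int dirty)
{
	if (dirty)
		return SUB_DIRTY;
	return strcmp(recorded, head) ? SUB_HEAD_MOVED : SUB_UNCHANGED;
}

/* Would a git-add of the submodule be accepted under the rule above? */
int may_add_submodule(enum sub_state state)
{
	return state != SUB_DIRTY;
}
```

Giving SUB_DIRTY precedence is an assumption of the sketch; porcelain
could just as well report both conditions at once.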

Now, if you change branch in the submodule, the supermodule will see that as a 
change in the submodule (as it should).  If you changed back, it will be 
restored and the supermodule will again see it as unchanged.  If you commit 
on the submodule, the supermodule will see that as a change and you'll have 
to git-add the submodule and commit in the supermodule.  The submodule is on 
whatever branch it is on - at all times.

The only time I can see this causing difficulties is when you want to checkout 
the tip of a submodule branch - how is the supermodule to know when it is 
correct to change HEAD from being detached to being attached?  I suppose it's 
got to be config-based; and out-of-tree config at that.


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11  9:15         ` Junio C Hamano
@ 2007-04-11 10:03           ` Martin Waitz
  2007-04-11 20:01             ` Junio C Hamano
  0 siblings, 1 reply; 101+ messages in thread
From: Martin Waitz @ 2007-04-11 10:03 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Alex Riesen, Linus Torvalds, Git Mailing List

hoi :)

On Wed, Apr 11, 2007 at 02:15:36AM -0700, Junio C Hamano wrote:
> Martin Waitz <tali@admingilde.org> writes:
> 
> > Your working tree now contains a complete git repository which has
> > features which are not available for normal files.  Notably, you
> > have the possibility to create branches in the submodule.
> > If you insist on using HEAD you throw away those submodule capabilities.
> 
> Why?  If you are working in the parent module (e.g integration)
> and notice breakage due to a bug in a submodule, it is very
> plausible that you would want to cd into the directory you have
> the submodule checked out, which has its own .git/ as its
> repository, and perform a fix-up there, with the goal of coming
> up with a commit usable by the parent project pointed at by the
> HEAD of the submodule repository.  And while working toward that
> goal, you will use branches, rebase, rewind or use StGIT there
> in that submodule repository.  It does not forbid you from using
> any of these things -- as long as you end up with a good commit
> at HEAD that the supermodule can use.

that's perfectly fine.
I only require one more thing: make sure that your commit is on
one dedicated branch (simply by merging your working/rebased/whatever
branch into the dedicated one) and not on some random one.

Again: for your above example this is not necessary and using HEAD
would indeed be perfectly fine.

But you also have to update the submodule when you do a checkout in
the supermodule.  So what do you update?  Updating 'HEAD' is not
very concrete, please have a look at my initial mail to Linus.

What is stored in the supermodule?  It stores a reference to a specific
point in the history of the submodule.  As such I am convinced that
the right counterpart inside the submodule is a refs/heads/whatever,
and not the branch selector HEAD.
You can have other branches next to the one which is tracked by the
supermodule.  If you always update HEAD you don't have a clear
distinction between the branch which is tracked and other branches.

> Once you come up with a suitable commit sitting at HEAD of the
> submodule repository, you cd up to the parent module.  Top-level
> git-diff would notice that the commit recorded at the submodule
> path has been updated (because you now have a good commit at
> HEAD of the submodule repository, while earlier the one in your
> index was a dud).
> 
> So it is not clear to me what your argument about throwing away
> capabilities is.

If the supermodule just updates some random submodule branch I happen to
use at the time of a supermodule pull then submodule branches are
of much lower value.
Suddenly you have to make sure for yourself that the correct branch
gets updated.
For me, different branches should be independent and I want git to
always update the correct one.

-- 
Martin Waitz

^ permalink raw reply	[flat|nested] 101+ messages in thread

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11  9:47     ` Andy Parkins
@ 2007-04-11 11:31       ` Martin Waitz
  0 siblings, 0 replies; 101+ messages in thread
From: Martin Waitz @ 2007-04-11 11:31 UTC (permalink / raw
  To: Andy Parkins; +Cc: git, Linus Torvalds, Junio C Hamano

hoi :)

On Wed, Apr 11, 2007 at 10:47:00AM +0100, Andy Parkins wrote:
> On Wednesday 2007 April 11 09:06, Martin Waitz wrote:
> 
> > The only thing I disagree with you on is using HEAD of the submodule:
> 
> I know we've had this discussion before, but I'm going to bring it up again - 
> mainly because Linus's implementation exactly matches what I envisaged when 
> we originally spoke of this.  I think in your "Updating the branch which HEAD 
> points to is dangerous" section, the main thing you're not taking into 
> account is that git can make detached checkouts.  Updating HEAD is not 
> dangerous - updating refs is; and I don't think anyone is proposing that a 
> submodule ref should ever be updated by a supermodule.

Then we already agree on the most important part.
My argument is mostly against updating the ref which is behind HEAD, not
HEAD per se.  And I haven't thought about using detached HEADs until I
wrote the mail.

> I think you're also too strongly focussed on the idea that the supermodule 
> tracks submodule branches - it cannot: branches are not part of "the"
> repository, they point at "a" repository.  References are outside the
> repository pointing in, and hence the supermodule cannot refer to them at its 
> core.

No, that may be a misunderstanding because my very first prototype
really did track branches.  In the meantime I changed my mind, my
current prototypes all track submodule commits directly.
But in doing so we create a branch of its own: remember, a branch in
git is just a moving reference into the history.  Such a reference
can be stored in .git/refs/heads or it can be stored in the index/tree of
the supermodule.  The difference is not really big.
So we do not track a branch, but we create a branch by tracking.

> Now, if you check out a revision in the supermodule, that's going to look up 
> the submodule revision stored in the DIRLINK tree entry which will recurse 
> into the submodule and checkout that revision - almost certainly as a 
> detached HEAD.  There are three possibilities then:
>  - The submodule revision is in the past and no submodule branch points at it
>  - The submodule revision is current and a submodule branch points at it
>  - The submodule revision is current and multiple submodule branches point at 
>    it
> The supermodule checkout will have to make a decision whether to update the 
> submodule HEAD (in one case it's obvious: a revision in the past has to be 
> detached HEAD as there is no suitable branch).  It's also possible that the 
> single submodule branch case is easy - undetach HEAD; however I don't think 
> that is universally correct.

I don't like to guess which branches to update.
I'd prefer to just unconditionally update one specific one.

> I know you're very much in favour of making branches in the submodule 
> correspond to branches in the supermodule, but I just don't see a way of 
> making it work - the supermodule cannot know about submodule branches, 
> branches are not part of the repository, they just point at the repository.  
> My branches could be different from your branches.

That would not work, you are right.
Please see my above comment about tracking & branches.

> The way submodules should be treated is that the whole submodule is analogous 
> to a single repository-tracked file - that's essentially what a submodule is 
> in the end but the content of the "file" is the submodule revision.

Wholeheartedly agreed.

> Now, if you change branch in the submodule, the supermodule will see
> that as a change in the submodule (as it should).  If you changed
> back, it will be restored and the supermodule will again see it as
> unchanged.  If you commit on the submodule, the supermodule will see
> that as a change and you'll have to git-add the submodule and commit
> in the supermodule.  The submodule is on whatever branch it is on - at
> all times.

> The only time I can see this causing difficulties is when you want to
> checkout the tip of a submodule branch - how is the supermodule to
> know when it is correct to change HEAD from being detached to being
> attached?  I suppose it's got to be config-based; and out-of-tree
> config at that.

Again, doing things conditionally here just adds to confusion.
Just have one dedicated branch and be done with it.

-- 
Martin Waitz



* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11  8:06   ` Martin Waitz
  2007-04-11  8:29     ` Alex Riesen
  2007-04-11  9:47     ` Andy Parkins
@ 2007-04-11 15:16     ` Linus Torvalds
  2007-04-11 22:49       ` Sam Vilain
  2007-04-11 23:54       ` Martin Waitz
  2 siblings, 2 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-11 15:16 UTC
  To: Martin Waitz; +Cc: Git Mailing List, Junio C Hamano



On Wed, 11 Apr 2007, Martin Waitz wrote:
> 
> I only had little time to actually have a look at it but the core is
> very similar to my approach and I'll try to rebase some of my code on
> top of yours in the following days.
> 
> The only thing I disagree with you on is using HEAD of the submodule:

Well, I don't actually see much choice. HEAD is just shorthand for 
"whatever is checked out".

> Always using HEAD of the submodule makes branches in the submodule
> useless.

No. 

Branches in submodules actually in many ways are *more* important than 
branches in supermodules - it's just that with the CVS mentality, you 
would never actually see that, because CVS obviously doesn't really 
support such a notion.

So I'd argue that branches in submodules give you:

 - you can develop the submodule *independently* of the supermodule, but 
   still be able to easily merge back and forth.

   Quite often, the submodule would be developed entirely _outside_ of the 
   supermodule, and the "branch" that gets the most development would thus
   actually be the "vendor branch", entirely outside the supermodule. Call 
   that the "main" branch or whatever, inside the supermodule it would 
   often be something like the remote "remotes/origin/master" branch.

   So inside the supermodule, the HEAD would generally point to something 
   that is *not* necessarily the "main development" branch, because the 
   supermodule maintainer would quite logically and often have his own 
   modifications to the original project on that branch. It might be a 
   detached branch, or just a local branch inside the submodule.

 - branches inside submodules are *also* very useful even inside the 
   supermodule, ie they again allow topic work to be fetched into the
   submodule *without* having to actually be part of the supermodule,
   or as a way to track a certain experimental branch of the supermodule.

   I suspect that most supermodule usage is as an "integrator" branch, 
   which means that the supermodule tends to follow the "main 
   development", and the whole point of the supermodule is largely to have 
   a collection of "stable things that work together". 

   In contrast, branches within submodules are useful for doing all the 
   development that is *not* yet ready to be committed to the supermodule, 
   exactly because it's not yet been tested in the full "make World" kind 
   of situation.

> Whenever you do a checkout in the supermodule you also have to update
> the submodule and this update has to change the same thing which is read
> above.

I suspect (but will not guarantee) that the right approach is that a 
supermodule checkout usually just uses a "detached HEAD" setup. Within the 
context of the supermodule, only the actual commit SHA1 matters, not what 
branch it was developed on (side note: I haven't decided if we should 
allow the SHA1 to be a signed tag object too - the current patches 
obviously don't care since they never follow the SHA1 anyway, and it might 
be a good idea).

So I strongly suspect (and that is what the patch series embodies) that as 
far as the supermodule is concerned, it should *not* matter at all what 
branch the subproject was on. The subproject can use branches for 
development, and the supermodule really doesn't care what the local 
branchname was when a commit was made - because branch-names are *local* 
things, and a branch that is called "experimental" in one environment 
might be called "master" in another.

So once the commit hits the superproject, the branch identities just go 
away (only as far as the superproject is concerned, of course - the 
subproject still stays with whatever branches it has), and the only thing 
that matters is the commit SHA1.

> Updating the branch which HEAD points to is dangerous.

I would strongly suggest that the *superproject* never really change the 
status of the subproject HEAD, except it updates it for "pull/reset", and 
then it just would use whatever the subproject decided to use.

The subproject HEAD policy would be entirely under the control of the 
subproject. If the subproject wants to use a branch to track the 
superproject, go wild: have a real branch that is called "my-integration" 
and make HEAD a symref to that (and thus any work in the superproject will 
update that branch - something that is visible when you pull directly from 
that subproject!)

But quite often, I suspect that a subproject would just use a detached 
HEAD. The subproject may have branches of its own, of course, but you can 
think of HEAD as not being connected to any of its "own" branches, but 
simply being the "superproject branch". That's a fairly accurate picture 
of reality, and using "detached HEAD" sounds like a very natural thing to 
do in that situation.

So I really think you can do both, and I think using HEAD inside the 
superproject gives you exactly that flexibility - you can decide on a 
per-subproject basis whether HEAD should track a real local branch in a 
subproject, or whether it should be detached.

(Side note: if you do *not* use detached HEAD, I suspect the .gitmodules 
file could also contain the branchname to be used for the subproject 
tracking, but I think that's a detail, and quite debatable)

> So my advice is:
> Always read and write one dedicated branch (hardcoded "master" or
> configurable) when the supermodule wants to access a submodule.

So the main reasons I don't think that is a good idea are:

 - it's less flexible: see above on why you might want to use a dedicated 
   branch *or* just detached HEAD, and why you might want to choose your 
   own name for the dedicated branch.

 - it's also going to be quite confusing when the superproject sees 
   something *else* than what is actually checked out. This is an equally 
   strong argument for just using HEAD - when we actually implement a

	 git diff --subproject

   flag that recurses into the subproject, if you don't use HEAD inside 
   the subproject, that suddenly becomes a *very* confusing thing.

In other words, I really think HEAD is absolutely the right thing to use, 
but that said, I obviously wrote "resolve_gitlink_ref()" so that it can 
take any ref-name, and we *can* change that later, or make it a per-module 
config option or whatever.
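
To make the HEAD lookup concrete, here is a minimal sketch in Python. It is an illustration only, not the code from this patch series: it assumes refs are stored as loose files under the subproject's .git directory, it ignores packed refs, and the name resolve_ref_sketch is made up for this example. The same function covers both a detached HEAD and a HEAD that is a symref to a branch such as "my-integration".

```python
import os


def resolve_ref_sketch(git_dir, refname="HEAD", max_depth=5):
    """Return the 40-character SHA1 that `refname` resolves to, or None.

    Simplified model: only loose ref files are read; packed refs and
    other details of real git are deliberately ignored.
    """
    for _ in range(max_depth):
        path = os.path.join(git_dir, *refname.split("/"))
        try:
            with open(path) as f:
                content = f.read().strip()
        except OSError:
            return None
        if content.startswith("ref:"):
            # Symbolic ref: HEAD names a branch, so follow it.
            refname = content[len("ref:"):].strip()
            continue
        # Detached HEAD or a plain branch file: the content is the SHA1.
        return content if len(content) == 40 else None
    return None
```

Because the ref name is a parameter, a caller could pass "refs/heads/whatever" instead of "HEAD", which is the flexibility mentioned above.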

		Linus


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11 10:03           ` Martin Waitz
@ 2007-04-11 20:01             ` Junio C Hamano
  2007-04-11 22:19               ` Martin Waitz
  0 siblings, 1 reply; 101+ messages in thread
From: Junio C Hamano @ 2007-04-11 20:01 UTC
  To: Martin Waitz; +Cc: Alex Riesen, Linus Torvalds, Git Mailing List

Martin Waitz <tali@admingilde.org> writes:

> What is stored in the supermodule?  It stores a reference to a specific
> point in the history of the submodule.  As such I am convinced that
> the right counterpart inside the submodule is a refs/heads/whatever,
> and not the branch selector HEAD.

Because 'submodule' is a project on its own, it can make
progress while the parent project is still using the stable
commit.  Think of this:

 - Your application uses product of another project as a
   library (e.g. you are doing video application and embedding
   ffmpeg).
   
 - Your 'master' commit records a commit in the library
   subproject.  Maybe library subproject declared stable 1.0 and
   that is what you used to integrate.

 - But being an independent project on its own, the library
   project can make progress, outside the context of this
   aggregated work (i.e. your application).  Next time you do:

	$ cd ffmpeg ; git fetch

   there may not be any branch that points at the exact "stable 1.0"
   commit.

When you do a "checkout -f --recurse-into-subprojects" from the
toplevel, I suspect that you would need to detach HEAD in the
subproject repository grafted in your application tree to move
it to the exact commit the toplevel project (i.e. your
application) wants, and match the working tree to that commit.
The toplevel simply should _not_ have to care what branch that
commit comes from.
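
As a rough model of what "detaching HEAD" means at the file level, consider the following Python sketch. The function names are hypothetical, loose ref files are assumed, and a real checkout would also update the index and the working tree, which is left out here.

```python
import os


def detach_head_sketch(sub_git_dir, commit_sha1):
    """Point the subproject's HEAD directly at commit_sha1.

    After this, HEAD holds a raw SHA1 instead of "ref: refs/heads/...",
    so no branch in the subproject is moved.
    """
    with open(os.path.join(sub_git_dir, "HEAD"), "w") as f:
        f.write(commit_sha1 + "\n")


def head_is_detached(sub_git_dir):
    """True when HEAD contains a SHA1 rather than a symbolic ref."""
    with open(os.path.join(sub_git_dir, "HEAD")) as f:
        return not f.read().startswith("ref:")
```

The point of the sketch is that the toplevel only supplies a commit SHA1; whatever branches the subproject has are left alone.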


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11 20:01             ` Junio C Hamano
@ 2007-04-11 22:19               ` Martin Waitz
  2007-04-11 22:36                 ` Linus Torvalds
  0 siblings, 1 reply; 101+ messages in thread
From: Martin Waitz @ 2007-04-11 22:19 UTC
  To: Junio C Hamano; +Cc: Alex Riesen, Linus Torvalds, Git Mailing List


hoi :)

On Wed, Apr 11, 2007 at 01:01:17PM -0700, Junio C Hamano wrote:
> When you do a "checkout -f --recurse-into-subprojects" from the
> toplevel, I suspect that you would need to detach HEAD in the
> subproject repository grafted in your application tree to move
> it to the exact commit the toplevel project (i.e. your
> application) wants, and match the working tree to that commit.
> The toplevel simply should _not_ have to care what branch that
> commit comes from.

yes.

But why does everybody want to detach the submodule HEAD, instead
of creating one 'special' branch which holds the commit which is
used by the supermodule?

If you then want to switch to another submodule branch you lose
the reference that comes from the supermodule.

I want to create the extra branch exactly _because_ there is
independent work going on in the submodule (or the project it is
based on).  As you can switch between detached HEAD and an
independent branch you can also switch between the 'supermodule branch'
and independent branches -- only that you can easily switch back
if you have a branch of your own.

BTW: I also think that your --recurse-into-subprojects should
be implied.
If you check out one index entry, you should be able to read it
back afterwards.  That is a nice property everyone expects from
normal files and we should try to keep that for submodules.
When checkout_entry wants to touch a submodule we can simply rewrite
the 'supermodule branch' in the submodule.  If HEAD happens to point
to it we also read-tree the submodule.
This is easy to understand and implement and I have some good experience
with this model.
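
The model described above could be sketched roughly like this in Python. Everything here is illustrative: the function name and the branch name "supermodule" are assumptions, loose ref files are assumed, and the read-tree step is only reported to the caller, not performed.

```python
import os


def update_supermodule_branch(sub_git_dir, new_sha1, branch="supermodule"):
    """Rewrite refs/heads/<branch> in the submodule to new_sha1.

    Returns True when HEAD is a symref to that branch, i.e. when the
    submodule's working tree would also need to be updated.
    """
    ref = "refs/heads/" + branch
    ref_path = os.path.join(sub_git_dir, *ref.split("/"))
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    with open(ref_path, "w") as f:
        f.write(new_sha1 + "\n")

    with open(os.path.join(sub_git_dir, "HEAD")) as f:
        head = f.read().strip()
    # HEAD on some other branch (or detached): only the ref moves.
    return head.startswith("ref:") and head[len("ref:"):].strip() == ref
```

If the function returns False, the submodule stays on whatever it had checked out, and the dedicated branch still records the commit the supermodule wants.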

-- 
Martin Waitz



* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11 22:19               ` Martin Waitz
@ 2007-04-11 22:36                 ` Linus Torvalds
  0 siblings, 0 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-11 22:36 UTC
  To: Martin Waitz; +Cc: Junio C Hamano, Alex Riesen, Git Mailing List



On Thu, 12 Apr 2007, Martin Waitz wrote:
> 
> But why does everybody want to detach the submodule HEAD, instead
> of creating one 'special' branch which holds the commit which is
> used by the supermodule?

I don't think "everybody" wants it.

But the point is, *regardless* of whether you want a "detached HEAD" or 
you want a "'special' branch", you should always use HEAD to look up the 
commit, and using HEAD *allows* both (ie just make HEAD a symref to the 
'special' branch if you want that behaviour).

And if you *do* use a special branch, HEAD *must* match that special 
branch anyway, since when you commit in the supermodule, the only 
behaviour that makes sense is to commit the currently checked out state!

> I want to create the extra branch exactly _because_ there is
> independent work going on in the submodule (or the project it is
> based on).

And that is entirely appropriate.

But that still means that HEAD must point to that branch (when in the 
submodule), since that branch must be the one that is checked out. If it 
isn't the branch that is checked out, normal operations like "git diff" 
etc wouldn't make sense from the supermodule.

And that is why *regardless* of whether you use a special branch or not, 
HEAD is the right thing to look up.

		Linus


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-10  4:15 ` [PATCH 5/6] Teach "fsck" not to follow subproject links Linus Torvalds
@ 2007-04-11 22:41   ` Sam Vilain
  2007-04-11 22:48     ` Linus Torvalds
  0 siblings, 1 reply; 101+ messages in thread
From: Sam Vilain @ 2007-04-11 22:41 UTC
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

Linus Torvalds wrote:
> Since the subprojects don't necessarily even exist in the current tree,
> much less in the current git repository (they are totally independent
> repositories), we do not want to try to follow the chain from one git
> repository to another through a gitlink.
>   

Does this consider the case where the intent of the subprojects is to
collate multiple, small projects into one bigger project?

In that case, you might want to keep all of the subprojects in the same
git repository.

Sam.


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 22:41   ` Sam Vilain
@ 2007-04-11 22:48     ` Linus Torvalds
  2007-04-11 22:59       ` Sam Vilain
  0 siblings, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-11 22:48 UTC
  To: Sam Vilain; +Cc: Git Mailing List, Junio C Hamano



On Thu, 12 Apr 2007, Sam Vilain wrote:
>
> Linus Torvalds wrote:
> > Since the subprojects don't necessarily even exist in the current tree,
> > much less in the current git repository (they are totally independent
> > repositories), we do not want to try to follow the chain from one git
> > repository to another through a gitlink.
> >   
> 
> Does this consider the case where the intent of the subprojects is to
> collate multiple, small projects into one bigger project?
> 
> In that case, you might want to keep all of the subprojects in the same
> git repository.

I assume you mean "you might want to keep all of the subprojects' objects 
in the same git object directory".

And yes, that's absolutely true, but it's technically no different from 
just using GIT_OBJECT_DIRECTORY to share objects between totally unrelated 
projects, or using git/alternates to share objects between (probably 
*less* unrelated repositories, but still clearly individual repos).

So the main point of superproject/subprojects is to allow independence 
(because independence is what allows it to scale), but there is nothing to 
say that things *have* to be kept totally isolated.

			Linus


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11 15:16     ` Linus Torvalds
@ 2007-04-11 22:49       ` Sam Vilain
  2007-04-11 23:54       ` Martin Waitz
  1 sibling, 0 replies; 101+ messages in thread
From: Sam Vilain @ 2007-04-11 22:49 UTC
  To: Linus Torvalds; +Cc: Martin Waitz, Git Mailing List, Junio C Hamano

Linus Torvalds wrote:
> (Side note: if you do *not* use detached HEAD, I suspect the .gitmodules 
> file could also contain the branchname to be used for the subproject 
> tracking, but I think that's a detail, and quite debatable)
>   

To discuss this detail, what about keeping refs, such as
refs/submodules/branch/path/* (or some other convention) which are
updated on commit? Then you can also easily clone just the submodule.

Sam.


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 22:48     ` Linus Torvalds
@ 2007-04-11 22:59       ` Sam Vilain
  2007-04-11 23:16         ` Linus Torvalds
  2007-04-11 23:30         ` Dana How
  0 siblings, 2 replies; 101+ messages in thread
From: Sam Vilain @ 2007-04-11 22:59 UTC
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

Linus Torvalds wrote:
> > Does this consider the case where the intent of the subprojects is to
> > collate multiple, small projects into one bigger project?
> > 
> > In that case, you might want to keep all of the subprojects in the same
> > git repository.
>
> I assume you mean "you might want to keep all of the subprojects' objects 
> in the same git object directory".
>
> And yes, that's absolutely true, but it's technically no different from 
> just using GIT_OBJECT_DIRECTORY to share objects between totally unrelated 
> projects, or using git/alternates to share objects between (probably 
> *less* unrelated repositories, but still clearly individual repos).
>   

Would that be the only distinction?

Would submodules be descended into for object reachability questions?

> So the main point of superproject/subprojects is to allow independence 
> (because independence is what allows it to scale), but there is nothing to 
> say that things *have* to be kept totally isolated.
>   

I'm particularly interested in repositories with, say, thousands of
submodules but only a few hundred meg. I really want to avoid the
situation where each of those submodules gets checked or descended into
separately for updates etc.

Sam.


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 23:16         ` Linus Torvalds
@ 2007-04-11 23:05           ` David Lang
  2007-04-11 23:53             ` Linus Torvalds
  2007-04-12  0:34           ` Junio C Hamano
  1 sibling, 1 reply; 101+ messages in thread
From: David Lang @ 2007-04-11 23:05 UTC
  To: Linus Torvalds; +Cc: Sam Vilain, Git Mailing List, Junio C Hamano

On Wed, 11 Apr 2007, Linus Torvalds wrote:
> On Thu, 12 Apr 2007, Sam Vilain wrote:
>
> The reason a *full* global fsck is so expensive is that it would have an
> absolutely humungous working set, and effectively keep everything in
> memory through it all. Doing it in stages ("fsck smaller individual trees
> separately") is actually the same amount of absolute work, but the working
> set never grows, so it scales much better.
>
> (fsck'ing projects individually also happens to allow you to do the
> sub-project fsck's in parallel across multiple CPU's or multiple machines,
> so it actually scales much better that way too - but the big problem
> tends to be excessive memory use, so the "SMP parallel version" only
> makes sense if you have tons of memory and can afford to do these things
> at the same time!)

would it make sense to have a --multiple-project option for fsck that would let 
you specify multiple 'projects' that share an object set and have the default 
checking not do the reachability checks that cause problems in this case?

Then people can share the objects if they want to and still do a full check, but 
would get warned that the full check would take a lot of time, which is not a 
big problem for a housekeeping thing that's run infrequently to find unreachable 
objects (which is something that should seldom happen in a well managed project).

David Lang


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 22:59       ` Sam Vilain
@ 2007-04-11 23:16         ` Linus Torvalds
  2007-04-11 23:05           ` David Lang
  2007-04-12  0:34           ` Junio C Hamano
  2007-04-11 23:30         ` Dana How
  1 sibling, 2 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-11 23:16 UTC
  To: Sam Vilain; +Cc: Git Mailing List, Junio C Hamano



On Thu, 12 Apr 2007, Sam Vilain wrote:
> >
> > And yes, that's absolutely true, but it's technically no different from 
> > just using GIT_OBJECT_DIRECTORY to share objects between totally unrelated 
> > projects, or using git/alternates to share objects between (probably 
> > *less* unrelated repositories, but still clearly individual repos).
> 
> Would that be the only distinction?
> 
> Would submodules be descended into for object reachability questions?

I think we'll eventually want that *regardless* of how the object handling 
is done (a kind of "cross-submodule boundary check"), but I think that's 
actually outside of the scope of the current fsck.

The current fsck goes to great lengths to make sure that the internal 
consistency of a repository is good. That's also why it takes so long, and 
why it is such an expensive operation to do (notably when you do a 
"--full" check).

In contrast, the "cross-submodule boundary check" is a much cheaper 
operation, *if* you have already verified that the projects are internally 
consistent. It literally boils down to doing a very simplified commit 
chain walker that only parses tree objects and simply spits out the 
SHA1's of the sub-tree commits (and their location in the tree), and then 
a separate phase that just verifies those against the submodules.

And that separate phase - once you've done the fsck for all the 
*individual* repositories - is truly trivial. It's literally just a matter 
of "is that SHA1 a valid commit object". That's *cheap*.

See?
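
The simplified walker described above might look like the following Python sketch. It works on an in-memory model rather than real object files, so the dictionaries `commits` and `trees` and both function names are assumptions made for illustration. The only detail taken from the series is that a gitlink entry is recognised by its mode, 160000, and is reported rather than followed.

```python
GITLINK_MODE = 0o160000  # tree entry that points at a subproject commit
TREE_MODE = 0o040000     # tree entry that points at a sub-directory


def collect_gitlinks(trees, tree_id, prefix=""):
    """Yield (path, sha1) for every gitlink below tree_id.

    trees maps a tree id to a list of (mode, name, object id) entries.
    Blobs are skipped and gitlinks are never descended into.
    """
    for mode, name, oid in trees[tree_id]:
        path = prefix + name
        if mode == GITLINK_MODE:
            yield path, oid
        elif mode == TREE_MODE:
            yield from collect_gitlinks(trees, oid, path + "/")


def gitlinks_in_history(commits, trees, tip):
    """Walk the commit chain from tip and return all (path, sha1) gitlinks.

    commits maps a commit id to (tree id, list of parent commit ids).
    """
    seen, found, stack = set(), set(), [tip]
    while stack:
        commit_id = stack.pop()
        if commit_id in seen:
            continue
        seen.add(commit_id)
        tree_id, parents = commits[commit_id]
        found.update(collect_gitlinks(trees, tree_id))
        stack.extend(parents)
    return found
```

The result is just a list of subproject commit names and their locations, which is all the later cross-module check needs.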

> I'm particularly interested in repositories with, say, thousands of
> submodules but only a few hundred meg. I really want to avoid the
> situation where each of those submodules gets checked or descended into
> separately for updates etc.

So I think that the way to verify a superproject is:

 - fsck each and every project totally independently. This is something 
   you have to do *anyway*.

 - either as you fsck, or as a separate phase after the fsck, just 
   traverse the trees and spit out "these are the SHA1's of subprojects"

 - finally, just go through the list of SHA1's (after every project has 
   been fsck'd) and verify that they exist (since if they exist, they will 
   have everything that is reachable from them, as that's one of the 
   things that the *local* fsck verifies)

Notice? At no point do you actually need to do a "global fsck". You can do 
totally independent local fsck's, and then a really cheap test of 
connectedness once those fsck's have completed.
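
The final connectedness test can be sketched in a few lines of Python. This is a model under stated assumptions, not an existing fsck feature: `gitlinks` stands for the (path, SHA1) pairs collected from the superproject trees, and `fscked_commits` stands for the commit ids that each subproject's own local fsck has already verified. Both names are hypothetical.

```python
def check_subproject_links(gitlinks, fscked_commits):
    """Return the sorted list of (path, sha1) links that cannot be satisfied.

    gitlinks:        iterable of (path, sha1) taken from superproject trees
    fscked_commits:  dict mapping subproject path -> set of commit ids
                     known to be valid after that subproject's local fsck
    """
    missing = []
    for path, sha1 in gitlinks:
        # Existence is enough: the local fsck already proved that
        # everything reachable from a valid commit is present.
        if sha1 not in fscked_commits.get(path, set()):
            missing.append((path, sha1))
    return sorted(missing)
```

Each lookup is a set membership test, so this phase stays cheap however large the individual repositories are.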

The reason a *full* global fsck is so expensive is that it would have an 
absolutely humungous working set, and effectively keep everything in 
memory through it all. Doing it in stages ("fsck smaller individual trees 
separately") is actually the same amount of absolute work, but the working 
set never grows, so it scales much better.

(fsck'ing projects individually also happens to allow you to do the 
sub-project fsck's in parallel across multiple CPU's or multiple machines, 
so it actually scales much better that way too - but the big problem 
tends to be excessive memory use, so the "SMP parallel version" only 
makes sense if you have tons of memory and can afford to do these things 
at the same time!)

			Linus


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 22:59       ` Sam Vilain
  2007-04-11 23:16         ` Linus Torvalds
@ 2007-04-11 23:30         ` Dana How
  1 sibling, 0 replies; 101+ messages in thread
From: Dana How @ 2007-04-11 23:30 UTC
  To: Sam Vilain; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano, danahow

On 4/11/07, Sam Vilain <sam@vilain.net> wrote:
> Linus Torvalds wrote:
> > > Does this consider the case where the intent of the subprojects is to
> > > collate multiple, small projects into one bigger project?
> > >
> > > In that case, you might want to keep all of the subprojects in the same
> > > git repository.
> >
> > I assume you mean "you might want to keep all of the subprojects' objects
> > in the same git object directory".
> >
> > And yes, that's absolutely true, but it's technically no different from
> > just using GIT_OBJECT_DIRECTORY to share objects between totally unrelated
> > projects, or using git/alternates to share objects between (probably
> > *less* unrelated repositories, but still clearly individual repos).
> >
>
> Would that be the only distinction?
>
> Would submodules be descended into for object reachability questions?
>
> > So the main point of superproject/subprojects is to allow independence
> > (because independence is what allows it to scale), but there is nothing to
> > say that things *have* to be kept totally isolated.
> >
>
> I'm particularly interested in repositories with, say, thousands of
> submodules but only a few hundred meg. I really want to avoid the
> situation where each of those submodules gets checked or descended into
> separately for updates etc.

This seems slightly related to the hazy picture I'm forming of how
I'd like to use git at our site.  Essentially, everyone would have their
own working tree with .git directory, but .git/objects is a symlink
to a shared object repository.  How do you fully run git-fsck on this
shared object repository?  The actual heads (roots) are distributed amongst
many .git/refs directories (I suppose you could do something akin
to git-fsck $(cat /somepaths*/.git/refs/*), but that means you know
where all the repositories are).  So in this setup, maybe I'd want to run
fsck twice: the first time checking everything but not complaining about
dangling commit objects [but listing them?], and maybe a 2nd finding
all these in the users' repos [still need to know where these are].
Please note this is just a thought experiment at this point.
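
As part of the same thought experiment, gathering the heads from several repositories could be sketched in Python as below. The function name is made up, only loose ref files are read (packed refs are ignored), and the caller still has to know where every repository lives, which is the limitation noted above.

```python
import os


def collect_ref_roots(git_dirs):
    """Return the set of SHA1s named by loose refs in all given .git dirs."""
    roots = set()
    for git_dir in git_dirs:
        refs_dir = os.path.join(git_dir, "refs")
        # os.walk yields nothing if refs_dir does not exist.
        for dirpath, _dirnames, filenames in os.walk(refs_dir):
            for filename in filenames:
                with open(os.path.join(dirpath, filename)) as f:
                    value = f.read().strip()
                # Skip symbolic refs such as refs/remotes/origin/HEAD.
                if len(value) == 40 and not value.startswith("ref:"):
                    roots.add(value)
    return roots
```

The resulting set plays the role of the $(cat /somepaths*/.git/refs/*) list: the roots from which reachability in the shared object repository would be checked.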

Anyway,  git started out with a 1:1 relationship between working tree,
index, and object repository. Various things could weaken that --
alternates, subprojects with different relationships to their object
repositories, etc. -- so special commands like git fsck which
focus mostly on the object repository may need a little tweaking eventually.

-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 23:53             ` Linus Torvalds
@ 2007-04-11 23:30               ` David Lang
  2007-04-12  2:14                 ` Linus Torvalds
  2007-04-12  0:00               ` Dana How
  2007-04-12  0:03               ` Sam Vilain
  2 siblings, 1 reply; 101+ messages in thread
From: David Lang @ 2007-04-11 23:30 UTC
  To: Linus Torvalds; +Cc: Sam Vilain, Git Mailing List, Junio C Hamano

On Wed, 11 Apr 2007, Linus Torvalds wrote:

> On Wed, 11 Apr 2007, David Lang wrote:
>>
>> would it make sense to have a --multiple-project option for fsck that would
>> let you specify multiple 'projects' that share an object set and have the
>> default checking not do the reachability checks that cause problems in this
>> case?
>
> Well, the thing is, sharing object directories actually makes things
> *harder* to check, rather than easier.
>
> It can be a nice space optimization, and yes, if there really is a lot of
> shared state, it can make it much cheaper to do some of the checks, but
> right now we have absolutely *no* way for fsck to then do the reachability
> check, because there is no way to tell fsck where all the refs are (since
> now the refs come in from multiple repositories!)

this is why I was suggesting a --multiple-project option to let you tell fsck 
about all of the repositories that it needs to look for refs in.

> So the individual objects get cheaper to fsck (no need to fsck shared
> objects over and over again), but the reachability gets much harder to
> fsck.

agreed.

> It's not an insurmountable problem, or even necessarily a very large one,
> but it boils down to one very basic issue:
>
> - nobody seems to actually *use* the shared object directory model!
>
> The thing is, with pack-files and alternates directories, a lot of the
> original reasons for shared object directories simply don't exist..

I suspect that if it could be checked it would be used more, especially with the 
subproject support.

David Lang


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 16:28   ` Josef Weidendorfer
  2007-04-10 16:50     ` Alex Riesen
  2007-04-10 18:45     ` Linus Torvalds
@ 2007-04-11 23:36     ` Sam Vilain
  2 siblings, 0 replies; 101+ messages in thread
From: Sam Vilain @ 2007-04-11 23:36 UTC
  To: Josef Weidendorfer; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano

Josef Weidendorfer wrote:
> An example for such an attribute would be a subproject name/ID.
> An argument for this: The user should be able to specify some policies
> for submodules, like "do not clone/checkout this submodule". But the
> path where the submodule resides in a given commit is not useful here,
> as a submodule can reside at different paths in the history of the
> supermodule.
>   

I mentioned this briefly on another strand of this thread, but I think
that the simplest way to do this would be to just make refs/subproject/*
populate itself sensibly when you commit in the superproject.

I mentioned refs/subprojects/path/branch before, but I think it would
probably be the sort of thing that should be in the .git/config

Sam.


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 19:45         ` Linus Torvalds
@ 2007-04-11 23:47           ` Sam Vilain
  2007-04-12  0:13             ` Linus Torvalds
  0 siblings, 1 reply; 101+ messages in thread
From: Sam Vilain @ 2007-04-11 23:47 UTC
  To: Linus Torvalds; +Cc: Josef Weidendorfer, Git Mailing List, Junio C Hamano

Linus Torvalds wrote:
> On Tue, 10 Apr 2007, Josef Weidendorfer wrote:
>   
>> So when moving the kdelibs submodule around, you would
>> have to update the .gitmodules file.
>>     
>
> Right. The assumption here is:
>  - submodules almost never actually change. You might add a new one 
>    occasionally, and once a decade you might do some bigger 
>    re-organization, but in general it's pretty much static.
>  - when you do move submodules around, it's probably a big flag-day anyway 
>    (ie I would expect that it's a big reorg, and that you'd quite likely 
>    expect developers to have to re-check out their tree if you did major 
>    surgery).
>   

Also, in the Perl 5 Perforce conversion there are a number of
"submodules" (ie, bundled modules with their own history) that move
around a lot. In some tree representations used during the conversion
process they might even appear twice in a given tree with differing
versions.

Sam.

* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 23:05           ` David Lang
@ 2007-04-11 23:53             ` Linus Torvalds
  2007-04-11 23:30               ` David Lang
                                 ` (2 more replies)
  0 siblings, 3 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-11 23:53 UTC (permalink / raw
  To: David Lang; +Cc: Sam Vilain, Git Mailing List, Junio C Hamano



On Wed, 11 Apr 2007, David Lang wrote:
> 
> would it make sense to have a --multiple-project option for fsck that would
> let you specify multiple 'projects' that share a object set and have the
> default checking not do the reachability checks that cause problems in this
> case?

Well, the thing is, sharing object directories actually makes things 
*harder* to check, rather than easier.

It can be a nice space optimization, and yes, if there really is a lot of 
shared state, it can make it much cheaper to do some of the checks, but 
right now we have absolutely *no* way for fsck to then do the reachability 
check, because there is no way to tell fsck where all the refs are (since 
now the refs come in from multiple repositories!)

So the individual objects get cheaper to fsck (no need to fsck shared 
objects over and over again), but the reachability gets much harder to 
fsck.

It's not an insurmountable problem, or even necessarily a very large one, 
but it boils down to one very basic issue:

 - nobody seems to actually *use* the shared object directory model!

The thing is, with pack-files and alternates directories, a lot of the 
original reasons for shared object directories simply don't exist..
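
To make the reachability point concrete, here is a toy model (plain Python, not git code). Objects are a dict mapping an id to the ids it references, and each repository that shares the object directory contributes its own refs. The names `reachable`, `unreachable`, `shared_objects`, `refs_a` and `refs_b` are made up for illustration.

```python
def reachable(objects, roots):
    """Return the set of object ids reachable from the given root ids."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen or oid not in objects:
            continue
        seen.add(oid)
        stack.extend(objects[oid])
    return seen


def unreachable(objects, *ref_sets):
    """Objects in the store that none of the supplied ref sets reach."""
    roots = [oid for refs in ref_sets for oid in refs.values()]
    return set(objects) - reachable(objects, roots)


# One shared object store used by two repositories (toy ids).
shared_objects = {
    "commit-a": ["tree-a"],
    "tree-a": ["blob-1"],
    "commit-b": ["tree-b"],
    "tree-b": ["blob-1", "blob-2"],
    "blob-1": [],
    "blob-2": [],
}
refs_a = {"refs/heads/master": "commit-a"}  # refs of repository A
refs_b = {"refs/heads/master": "commit-b"}  # refs of repository B
```

Seen from repository A alone, everything that only B references looks unreachable. The answer is only right once both ref sets are supplied, and that is exactly the information fsck has no way to be handed today.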

		Linus

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11 15:16     ` Linus Torvalds
  2007-04-11 22:49       ` Sam Vilain
@ 2007-04-11 23:54       ` Martin Waitz
  2007-04-12  1:57         ` Brian Gernhardt
  2007-04-12 15:12         ` Josef Weidendorfer
  1 sibling, 2 replies; 101+ messages in thread
From: Martin Waitz @ 2007-04-11 23:54 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Git Mailing List, Junio C Hamano

hoi :)

On Wed, Apr 11, 2007 at 08:16:10AM -0700, Linus Torvalds wrote:
> Branches in submodules actually in many ways are *more* important than 
> branches in supermodules - it's just that with the CVS mentality, you 
> would never actually see that, because CVS obviously doesn't really 
> support such a notion.

I fully agree with you about the importance of submodule branches.
In fact, I want to make them even more important and useable!

And by the way, I long forgot about CVS ;-)


> So I'd argue that branches in submodules give you:
> 
>  - you can develop the submodule *independently* of the supermodule, but 
>    still be able to easily merge back and forth.
> 
>    Quite often, the submodule would be developed entirely _outside_ of the 
>    supermodule, and the "branch" that gets the most development would thus
>    actually be the "vendor branch", entirely outside the supermodule. Call 
>    that the "main" branch or whatever, inside the supermodule it would 
>    often be something like the remote "remotes/origin/master" branch.
> 
>    So inside the supermodule, the HEAD would generally point to something 
>    that is *not* necessarily the "main development" branch, because the 
>    supermodule maintainer would quite logically and often have his own 
>    modifications to the original project on that branch. It might be a
>    detached branch, or just a local branch inside the submodule.

I fully agree.

>  - branches inside submodules are *also* very useful even inside the 
>    supermodule, ie they again allow topic work to be fetched into the
>    submodule *without* having to actually be part of the supermodule,
>    or as a way to track a certain experimental branch of the supermodule.
> 
>    I suspect that most supermodule usage is as an "integrator" branch, 
>    which means that the supermodule tends to follow the "main 
>    development", and the whole point of the supermodule is largely to have 
>    a collection of "stable things that work together". 
> 
>    In contrast, branches within submodules are useful for doing all the 
>    development that is *not* yet ready to be committed to the supermodule, 
>    exactly because it's not yet been tested in the full "make World" kind 
>    of situation.

I fully agree.
You are just so much better at describing things than I am...

> > Whenever you do a checkout in the supermodule you also have to update
> > the submodule and this update has to change the same thing which is read
> > above.
> 
> I suspect (but will not guarantee) that the right approach is that a 
> supermodule checkout usually just uses a "detached HEAD" setup. Within the 
> context of the supermodule, only the actual commit SHA1 matters, not what 
> branch it was developed on (side note: I haven't decided if we should 
> allow the SHA1 to be a signed tag object too - the current patches 
> obviously don't care since they never follow the SHA1 anyway, and it might 
> be a good idea).

If you use a detached HEAD then you can no longer switch back to it
once you used some other (independent) branch (for testing or whatever).
This is my main argument: If you just update some 'special'
refs/heads/from-supermodule (or whatever, maybe get it from
.gitmodules/config) you can still switch between branches, making them
more useful IMHO.

If we create some other way to easily get to the commit referenced by
the index of the supermodule then a detached HEAD is ok for me, too.
But why create two things (this not-yet-existing way to get the
supermodule index entry, plus submodules HEAD) for the same thing?
Why not simply create a new refs/heads/whatever?
This is easy and everybody knows how to work with it.

> So I strongly suspect (and that is what the patch series embodies) that as 
> far as the supermodule is concerned, it should *not* matter at all what 
> branch the subproject was on. The subproject can use branches for 
> development, and the supermodule really doesn't care what the local 
> branchname was when a commit was made - because branch-names are *local* 
> things, and a branch that is called "experimental" in one environment 
> might be called "master" in another.

Fully agree.

Please don't confuse my "I always want to use one dedicated branch" with
"I always want to use one special branch from the submodule project".
This refs/heads/whatever I am talking about is _purely_ for ease of
use of the submodule inside the supermodule.  It is in no way linked
to the branchnames that are used by the submodule project.
Well, besides that you can merge back and forth between them, of course.

> So once the commit hits the superproject, the branch identities just go 
> away (only as far as the superproject is concerned, of course - the 
> subproject still stays with whatever branches it has), and the only thing 
> that matters is the commit SHA1.

Fully agree.

> > Updating the branch which HEAD points to is dangerous.
> 
> I would strongly suggest that the *superproject* never really change the 
> status of the subproject HEAD, except it updates it for "pull/reset", and 
> then it just would use whatever the subproject decided to use.
> 
> The subproject HEAD policy would be entirely under the control of the 
> subproject. If the subproject wants to use a branch to track the 
> superproject, go wild: have a real branch that is called "my-integration" 
> and make HEAD a symref to that (and thus any work in the superproject will 
> update that branch - something that is visible when you pull directly from 
> that subproject!)

So you now have this nice "my-integration" branch lying next to other
independent (not-supermodule-related) branches.
If you want to _switch_ to one of these unrelated branches you obviously
have to change HEAD, and suddenly your unrelated branches are
considered to be part of the supermodule (ok, not yet part of its
index of course, but now all supermodule operations would work on
this unrelated branch).

I want to preserve these unrelated branches and see them as a strong
feature.  Branches in submodules should be independent from the
supermodule _because_ the supermodule has no notion of which branch
is used.

> But quite often, I suspect that a subproject would just use a detached 
> HEAD. The subproject may have branches of its own, of course, but you can 
> think of HEAD as not being connected to any of its "own" branches, but
> simply being the "superproject branch". That's a fairly accurate picture 
> of reality, and using "detached HEAD" sounds like a very natural thing to 
> do in that situation.

Only that you lose your nice detached HEAD view once you start using
those nice branches inside your submodule.

> So I really think you can do both, and I think using HEAD inside the 
> superproject gives you exactly that flexibility - you can decide on a 
> per-subproject basis whether HEAD should track a real local branch in a 
> subproject, or whether it should be detached.
> 
> (Side note: if you do *not* use detached HEAD, I suspect the .gitmodules
> file could also contain the branchname to be used for the subproject 
> tracking, but I think that's a detail, and quite debatable)
> 
> > So my advice is:
> > Always read and write one dedicated branch (hardcoded "master" or
> > configurable) when the supermodule wants to access a submodule.
> 
> So the main reasons I don't think that is a good idea are:
> 
>  - it's less flexible: see above on why you might want to use a dedicated 
>    branch *or* just detached HEAD, and why you might want to choose your 
>    own name for the dedicated branch.

In terms of flexibility it is important what you can do with the
submodule.  Being able to use branches just like in a normal
repository ("switch the branch to go to another, unrelated branch")
is a plus for me.

A detached HEAD does not give the same level of flexibility as a real
head.

>  - it's also going to be quite confusing when the superproject sees 
>    something *else* than what is actually checked out.

Well, the user explicitly expressed his intent to switch to another
branch!  In a normal repository you are not confused about the working
directory not being in sync with "master", and we always prominently state
which branch you are on.  Of course this has to be clear for submodules,
too.  So if you do git-status in the supermodule it should print some
"submodule is on different branch"-dirty marker.

At least I had some situations where I wanted to use something like
this: use some experimental branch which should not be directly touched
by the supermodule.  Instead provide a method ("git merge
from-supermodule") to sync your working branch with new stuff from
the supermodule.

>    This is an equally strong argument for just using HEAD - when we
>    actually implement a
> 
> 	 git diff --subproject
> 
>    flag that recurses into the subproject, if you don't use HEAD inside 
>    the subproject, that suddenly becomes a *very* confusing thing.

This is right.  Suddenly we have one more player in the field which
you can diff against.

Before submodules:
tree <-> index <-> working file

submodules always using HEAD:
tree <-> index <-> submodule HEAD <-> submodule working dir

submodules using some dedicated branch:
tree <-> index <-> subm. "from-supermodule" <-> subm. HEAD <-> subm. wd

I haven't thought about which diff really makes sense in which
situation.
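
As a rough sketch of what such a marker could compare in the dedicated-branch model: the function below walks the chain index <-> "from-supermodule" <-> HEAD and reports each place where the two sides differ. All names here (including `submodule_markers` and the `refs/heads/from-supermodule` default) are hypothetical, and nothing like this exists in git.

```python
def submodule_markers(index_sha, tracking_sha, head_ref, head_sha,
                      tracking_ref="refs/heads/from-supermodule"):
    """List the differences along index <-> tracking branch <-> HEAD.

    index_sha    -- commit recorded in the supermodule index (the gitlink)
    tracking_sha -- commit the dedicated branch points at
    head_ref     -- ref the submodule HEAD points at, or None if detached
    head_sha     -- commit currently checked out in the submodule
    """
    markers = []
    if tracking_sha != index_sha:
        markers.append("tracking branch differs from supermodule index")
    if head_ref != tracking_ref:
        markers.append("submodule is on different branch")
    if head_sha != tracking_sha:
        markers.append("checked-out commit differs from tracking branch")
    return markers
```

With submodules that always use HEAD, the middle comparison disappears and only index versus HEAD remains, which is the simpler picture described above.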


-- 
Martin Waitz


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 23:53             ` Linus Torvalds
  2007-04-11 23:30               ` David Lang
@ 2007-04-12  0:00               ` Dana How
  2007-04-12  0:03               ` Sam Vilain
  2 siblings, 0 replies; 101+ messages in thread
From: Dana How @ 2007-04-12  0:00 UTC (permalink / raw
  To: Linus Torvalds
  Cc: David Lang, Sam Vilain, Git Mailing List, Junio C Hamano, danahow

On 4/11/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> It's not an insurmountable problem, or even necessarily a very large one,
> but it boils down to one very basic issue:
>
>  - nobody seems to actually *use* the shared object directory model!

Cool -- my previous email makes me either a git idiot or a git pioneer!

So I'll think through my usage model some more and
look over the fsck source.

Until then,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell

* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 23:53             ` Linus Torvalds
  2007-04-11 23:30               ` David Lang
  2007-04-12  0:00               ` Dana How
@ 2007-04-12  0:03               ` Sam Vilain
  2 siblings, 0 replies; 101+ messages in thread
From: Sam Vilain @ 2007-04-12  0:03 UTC (permalink / raw
  To: Linus Torvalds; +Cc: David Lang, Git Mailing List, Junio C Hamano

Linus Torvalds wrote:
> It can be a nice space optimization, and yes, if there really is a lot of 
> shared state, it can make it much cheaper to do some of the checks, but 
> right now we have absolutely *no* way for fsck to then do the reachability 
> check, because there is no way to tell fsck where all the refs are (since 
> now the refs come in from multiple repositories!)
>   

Well, not if the refs are only gitlinks because there is no checkout.

> So the individual objects get cheaper to fsck (no need to fsck shared 
> objects over and over again), but the reachability gets much harder to 
> fsck.
>
> It's not an insurmountable problem, or even necessarily a very large one, 
> but it boils down to one very basic issue:
>
>  - nobody seems to actually *use* the shared object directory model!
>
> The thing is, with pack-files and alternates directories, a lot of the 
> original reasons for shared object directories simply don't exist..

I think that's just the chicken-and-egg problem. Once this happens I
think we'll see people aggregating all sorts of related repositories
with this feature, and possibly making much richer histories by tracking
portions of their trees as subprojects rather than just a subdirectory.

Sam.

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 20:33             ` Linus Torvalds
@ 2007-04-12  0:12               ` Sam Vilain
  2007-04-12  0:35                 ` Martin Waitz
  2007-04-12  2:01                 ` Linus Torvalds
  0 siblings, 2 replies; 101+ messages in thread
From: Sam Vilain @ 2007-04-12  0:12 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Junio C Hamano, Andy Parkins, git, Josef Weidendorfer

Linus Torvalds wrote:
> So there's a very real issue where a repository with submodules still 
> "works", even with a .gitmodules file that is totally scrogged and doesn't 
> have the right information (yet), it's just that it may simply not be able 
> to do all the operations because it cannot figure out where to pull 
> missing subproject data from etc..
>   

Whoa... "missing" subproject data?

Surely, unless you're doing lightweight/shallow clones, if you have a
gitlink you've also got the dependent repository? Otherwise the
reachability rule will be broken.

Sam.

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11 23:47           ` Sam Vilain
@ 2007-04-12  0:13             ` Linus Torvalds
  0 siblings, 0 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-12  0:13 UTC (permalink / raw
  To: Sam Vilain; +Cc: Josef Weidendorfer, Git Mailing List, Junio C Hamano



On Thu, 12 Apr 2007, Sam Vilain wrote:
> 
> Also, in the Perl 5 Perforce conversion there are a number of
> "submodules" (ie, bundled modules with their own history) that move
> around a lot. In some tree representations used during the conversion
> process they might even appear twice in a given tree with differing
> versions.

That should actually be something that is fairly natural to handle with 
the current git submodule design - there's absolutely no problem with 
having the same subproject showing up in multiple different places in the 
tree (and each place obviously will have its own commit).

However, it causes some questions at two points:

 - What do you do in the ".gitmodules" file, where you describe the 
   submodule setup?

   This is not so much a _problem_ as a "how do you want to handle it" 
   issue.

   Would people want such a module to show up as "one module" that is just 
   visible in the tree in multiple places? Or do people prefer to think
   of it as completely separate modules that just happen to have the same
   base repository?

   I don't think it's clear that one or the other is the "right way" to 
   see things, and I don't think git really should care. I suspect it's 
   more likely to be a detail that some importer script just has to 
   resolve one way or the other.

   The core git infrastructure needs to be able to have one module show up 
   in multiple places over time anyway, so I don't think there is any real 
   reason not to allow the same module to show up in multiple places even 
   within one single commit.. (Ie it's really mostly about the .gitmodules 
   file *syntax* - but if we use the config file syntax, it's actually 
   very natural to allow multiple entries for the module directory name)

   At the same time, there are reasons why you might want to consider them 
   separate modules too - maybe you want to *describe* them separately, and
   maybe one of the copies is used for "legacy support", and you might be 
   in a situation where you want to check out only one of the copies and 
   not the other (and thus describing them as two *different* modules 
   rather than two versions of the *same* module actually makes sense!).

   So I think this is something where we are technically neutral, but 
   where we may have non-technical issues to choose one representation 
   over another (and those issues may have more to do with the *importer* 
   than with any git issues - if importing from Perforce, it probably 
   makes most sense to make the import behave as much as possible the way 
   Perforce did in that case, and I have *no* idea what that is ;)

 - After a conversion is done, and you're no longer talking about a 
   historical archive, but a "going forward" concern, exactly how 
   automatic is subproject movement going to be, and what are downstream 
   developers that pull these things supposed to do when a subproject that 
   they have checked out is moved?

   This is mostly a UI issue. I suspect that the initial answer is: "you 
   may have to un-check-out a subproject, then pull the superproject, and 
   then re-check-it-out to get it in the new location". Simply because 
   it's going to be a lot easier to do than actually having "git pull" 
   notice when subprojects move.

   IOW, that is more of a "just how nice do we want to be to people", and 
   I _think_ the answer is: "as nice as possible, but some things are more 
   important than others, and some things might take longer before they 
   are really pleasant to do" ;)
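
To illustrate the "multiple entries for the module directory name" idea, here is a small sketch that reads a config-style module list and keeps repeated keys as lists. The `[module "..."]` section and the `dir`/`url` keys follow the syntax floated earlier in this thread; the module name `libfoo`, its paths, the example.org URL and the `parse_modules` helper are all made up, and this is not how git itself parses anything.

```python
import re
from collections import defaultdict

SECTION = re.compile(r'^\[module\s+"([^"]+)"\]$')


def parse_modules(text):
    """Parse a config-style module list, keeping repeated keys as lists."""
    modules = defaultdict(lambda: defaultdict(list))
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = SECTION.match(line)
        if m:
            current = m.group(1)
            continue
        if current is None or "=" not in line:
            raise ValueError("unexpected line: %r" % raw)
        key, value = (part.strip() for part in line.split("=", 1))
        modules[current][key].append(value)
    return modules


# Hypothetical file: one module that shows up at two paths in the tree.
EXAMPLE = """
[module "libfoo"]
    dir = lib/foo
    dir = legacy/foo
    url = git://example.org/libfoo
"""
```

The "one module, two places" view falls out of repeating `dir`; the "two separate modules" view would instead use two sections with the same `url`, so the file format does not force the choice either way.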

Hmm?

		Linus

* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 23:16         ` Linus Torvalds
  2007-04-11 23:05           ` David Lang
@ 2007-04-12  0:34           ` Junio C Hamano
  2007-04-12  1:52             ` Linus Torvalds
  1 sibling, 1 reply; 101+ messages in thread
From: Junio C Hamano @ 2007-04-12  0:34 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Sam Vilain, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So I think that the way to verify a superproject is:
>
>  - fsck each and every project totally independently. This is something 
>    you have to do *anyway*.
>
>  - either as you fsck, or as a separate phase after the fsck, just 
>    traverse the trees and spit out "these are the SHA1's of subprojects"
>
>  - finally, just go through the list of SHA1's (after every project has 
>    been fsck'd) and verify that they exist (since if they exist, they will 
>    have everything that is reachable from them, as that's one of the 
>    things that the *local* fsck verifies)

The small detail in the last step is wrong, though.  Even if
they EXIST, they may be isolated commits that are not connected
to refs, and fsck in the repository would not have warned about
unreachable trees from such unconnected commits.  So you would
need to do a reachability from these commits to the refs in the
subproject.

This would be similar to the quick-fetch topic I sent out a
couple of patches for, that implements logic to skip fetching
objects from your alternate.  You would have rev-list --objects
traverse from them with "--not --all" in the subproject
repository and make sure it does not trigger "I could not list
all objects reachable from the commits you wanted because such
and such tree/blob are missing".

    That reminds me of one thing I haven't verified.  I am not
    absolutely sure that rev-list --objects makes sure that
    blobs it lists exist (trees are checked as it needs to read
    them, and if they are missing or corrupt it would notice and
    barf).  When it is used for the purpose of this "subproject
    boundary fsck" and the quick-fetch, it should.  Perhaps a
    specialized option to check deeper than usual is needed.  I
    dunno.

> Notice? At no point do you actually need to do a "global fsck". You can do 
> totally independent local fsck's, and then a really cheap test of 
> connectedness once those fsck's have completed.

This is still true.

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-12  0:12               ` Sam Vilain
@ 2007-04-12  0:35                 ` Martin Waitz
  2007-04-12  2:01                 ` Linus Torvalds
  1 sibling, 0 replies; 101+ messages in thread
From: Martin Waitz @ 2007-04-12  0:35 UTC (permalink / raw
  To: Sam Vilain
  Cc: Linus Torvalds, Junio C Hamano, Andy Parkins, git,
	Josef Weidendorfer

hoi :)

On Thu, Apr 12, 2007 at 12:12:59PM +1200, Sam Vilain wrote:
> Linus Torvalds wrote:
> > So there's a very real issue where a repository with submodules still 
> > "works", even with a .gitmodules file that is totally scrogged and doesn't 
> > have the right information (yet), it's just that it may simply not be able 
> > to do all the operations because it cannot figure out where to pull 
> > missing subproject data from etc..
> >   
> 
> Whoa... "missing" subproject data?
> 
> Surely, unless you're doing lightweight/shallow clones, if you have a
> gitlink you've also got the dependent repository? Otherwise the
> reachability rule will be broken.

With submodules you actually have a natural cutting point where
you can say: no, I don't want to get that.
So for submodules the reachability rule is a little bit more relaxed.

And when you fetch the superproject you now need some way to fetch
the new submodule objects.  They may be in the same upstream repository
but it may make sense to have this configurable.

-- 
Martin Waitz


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-10 18:45     ` Linus Torvalds
  2007-04-10 19:04       ` Andy Parkins
  2007-04-10 19:29       ` Josef Weidendorfer
@ 2007-04-12  0:42       ` Torgil Svensson
  2007-04-12  0:56         ` Martin Waitz
  2 siblings, 1 reply; 101+ messages in thread
From: Torgil Svensson @ 2007-04-12  0:42 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Josef Weidendorfer, Git Mailing List, Junio C Hamano

On 4/10/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:

>  - I tend to like "minimal",

>  - in a "link" object, the only thing that would normally *change* is
>   really just the commit SHA1. Everything else is really pretty static.

I like this concept.


>       [module "kdelibs"]
>               dir = kdelibs
>               url = git://git.kde.org/kdelibs
>               description = "Basic KDE libraries module"
>
>       [module "base"]
>               alias = "kdelibs", "kdebase", "kdenetwork"

I guess this file could also cover the case where the superproject is
only interested in a small subset of the subproject. For example if I
only use some header-files in a library and want
"/lib1/src/interface" in the subproject end up as "/includes/lib1" in
the superproject. Could single files be handled in a similar way?

Although this is just an example, external links shouldn't be
specified in the same configuration file as project internal things
(which should be version-controlled). If the url configuration gets
overwritten with checkouts there will be problems bisecting if the url
changes over time.

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-12  0:42       ` Torgil Svensson
@ 2007-04-12  0:56         ` Martin Waitz
  2007-04-12 21:23           ` Torgil Svensson
  0 siblings, 1 reply; 101+ messages in thread
From: Martin Waitz @ 2007-04-12  0:56 UTC (permalink / raw
  To: Torgil Svensson
  Cc: Linus Torvalds, Josef Weidendorfer, Git Mailing List,
	Junio C Hamano

hoi :)

On Thu, Apr 12, 2007 at 02:42:43AM +0200, Torgil Svensson wrote:
> I guess this file could also cover the case where the superproject is
> only interested in a small subset of the subproject. For example if I
> only use some header-files in a library and want
> "/lib1/src/interface" in the subproject end up as "/includes/lib1" in
> the superproject. Could single files be handled in a similar way?

Conceptually this information would have to be part of the
supermodule tree (after all it changes how your tree is set up).

I think it makes more sense to make users think about which part
of their tree can be reused and make them choose submodule boundaries
wisely so that the above partial-checkout is not needed.

> Although this is just an example, external links shouldn't be
> specified in the same configuration file as project internal things
> (which should be version-controlled). If the url configuration gets
> overwritten with checkouts there will be problems bisecting if the url
> changes over time.

Most of the time we may not need to add any per-submodule URL
information anyway.  If you fetch a new supermodule version, you
can get the new submodule from the same source (or from a per-submodule
source which can be determined by looking at and munching the supermodule URL).

-- 
Martin Waitz


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-12  0:34           ` Junio C Hamano
@ 2007-04-12  1:52             ` Linus Torvalds
  2007-04-12  2:00               ` Junio C Hamano
  0 siblings, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-12  1:52 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Sam Vilain, Git Mailing List



On Wed, 11 Apr 2007, Junio C Hamano wrote:
>
> The small detail in the last step is wrong, though.  Even if
> they EXIST, they may be isolated commits that are not connected
> to refs, and fsck in the repository would not have warned about
> unreachable trees from such unconnected commits.

The superproject *is* a ref.

You cannot prune the subprojects on their own. That's the *only* real 
special rule about subprojects. Exactly because pruning them on their own 
is not a valid op to do.

It's the same way with a source of "alternate" objects (or a shared
object directory) - you'd better not prune them, because other projects
may have refs to them that you don't know about locally. So this isn't
something new to subprojects.

		Linus

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11 23:54       ` Martin Waitz
@ 2007-04-12  1:57         ` Brian Gernhardt
  2007-04-12 15:12         ` Josef Weidendorfer
  1 sibling, 0 replies; 101+ messages in thread
From: Brian Gernhardt @ 2007-04-12  1:57 UTC (permalink / raw
  To: Martin Waitz; +Cc: Linus Torvalds, Git Mailing List, Junio C Hamano


On Apr 11, 2007, at 7:54 PM, Martin Waitz wrote:

> Before submodules:
> tree <-> index <-> working file
>
> submodules always using HEAD:
> tree <-> index <-> submodule HEAD <-> submodule working dir
>
> submodules using some dedicated branch:
> tree <-> index <-> subm. "from-supermodule" <-> subm. HEAD <->  
> subm. wd

Why can't we extend checkout with an option to look for an
enclosing git project, find the gitlink in the index, and check out  
that commit?  That allows you to return to the original state without  
needing to bother with new special branches.

And instead of recording the path in a .gitmodules file, why not a  
list of git directories we search for the commit?  Allows moving of  
subprojects without suddenly breaking configuration files.  When we  
find the appropriate git dir, we can use a .gitlink file or symlinks  
to attach the directory to its repository.

I dislike moving git in the direction of enforcing more policy  
instead of less, and of making it less capable of handling content  
movement instead of more.

~~ Brian

* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-12  1:52             ` Linus Torvalds
@ 2007-04-12  2:00               ` Junio C Hamano
  2007-04-12  2:06                 ` Junio C Hamano
  0 siblings, 1 reply; 101+ messages in thread
From: Junio C Hamano @ 2007-04-12  2:00 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Sam Vilain, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 11 Apr 2007, Junio C Hamano wrote:
>>
>> The small detail in the last step is wrong, though.  Even if
>> they EXIST, they may be isolated commits that are not connected
>> to refs, and fsck in the repository would not have warned about
>> unreachable trees from such unconnected commits.
>
> The superproject *is* a ref.

But when you fsck the subproject repository in isolation in the
earlier step in your procedure, that is not taken into account,
is it?

The situation I had in mind was not about pruning, but an
earlier fetch, either the native one that unpacks the objects
into loose form or a http walker, fetched a commit near the tip
but was interrupted/killed before finishing the fetch or
updating the ref.  The tip of such an incomplete commit chain
would be reported dangling.  They are ahead of your refs but
they may lack commits and trees to complete the chain back to
your refs yet.  When the higher-level project points at such a
commit, the existence of the commit is not a proof that
everything needed to complete the commit is available.

We need to prove that separately, and that was my suggestion to
run a "rev-list --objects $those-commits --not --all" in the
subproject repository, similar to what the quick-fetch topic
does.
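
To spell out why existence alone is not enough, here is a toy model of the check (plain Python, not git code; `complete_from` and the object ids are made up for illustration). Objects map to the ids they reference, and a gitlink target is accepted only if everything reachable from it is present, which is what a rev-list --objects walk that completes without missing-object errors would establish.

```python
def complete_from(objects, start):
    """Return (ok, missing); ok is True only if every object reachable
    from start is present in the object store."""
    missing = set()
    seen = set()
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid not in objects:
            missing.add(oid)
            continue
        stack.extend(objects[oid])
    return (not missing, missing)


# Subproject store after an interrupted fetch: the tip commit and its
# tree arrived, but the parent commit did not.
subproject = {
    "commit-tip": ["tree-tip", "commit-parent"],
    "tree-tip": ["blob-1"],
    "blob-1": [],
}
gitlink = "commit-tip"  # SHA1 recorded in the superproject tree
```

Here the gitlink target exists, yet the walk reports "commit-parent" as missing. The real command adds "--not --all" so the walk stops at anything already reachable from the subproject's own refs, which the local fsck has covered; this sketch walks everything for simplicity.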

* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-12  0:12               ` Sam Vilain
  2007-04-12  0:35                 ` Martin Waitz
@ 2007-04-12  2:01                 ` Linus Torvalds
  2007-04-12  3:56                   ` Sam Vilain
  1 sibling, 1 reply; 101+ messages in thread
From: Linus Torvalds @ 2007-04-12  2:01 UTC (permalink / raw
  To: Sam Vilain; +Cc: Junio C Hamano, Andy Parkins, git, Josef Weidendorfer



[ Dang. Power failure in the middle of writing emails. Can't remember 
  which one was lost. Am rewriting some of this reply in abbreviated form.  ]

On Thu, 12 Apr 2007, Sam Vilain wrote:
>
> Linus Torvalds wrote:
> > So there's a very real issue where a repository with submodules still 
> > "works", even with a .gitmodules file that is totally scrogged and doesn't 
> > have the right information (yet), it's just that it may simply not be able 
> > to do all the operations because it cannot figure out where to pull 
> > missing subproject data from etc..
> >   
> 
> Whoa... "missing" subproject data?

Absolutely. Not just subproject data. The whole subproject is often 
missing.

If I fetch the KDE superproject, I generally do *not* want every single 
subproject. In fact, I'd likely just want one or two subprojects.

The notion that all subprojects are populated is a *bug*. I would 
personally refuse to use such a setup. Even CVS can handle that just fine, 
we certainly don't want to be worse than CVS here.

If you just track a project, it's quite common to only check out the "src" 
module, and *not* fetch things like the "validation" or "test" module if 
you're just following along. 

Or you might fetch the "kdebase" module, but that sure doesn't mean that 
you want all the other ones (kdevelop source code? full kdelibs sources? 
If I'm only interested in kwin and some other random app? No thanks!).

> Surely, unless you're doing lightweight/shallow clones, if you have a
> gitlink you've also got the dependent repository? Otherwise the
> reachability rule will be broken.

The reachability rule *must* be breakable. That's why fsck currently 
doesn't care AT ALL.

It's much better to break that rule than to even check it! I'd rather 
leave fsck like it is now, than to *ever* fix it, if the "fix" involves 
"you have to always fetch all submodules to shut fsck up".

		Linus
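
A rough sketch of what "not following" a subproject link means for
a reachability walk.  This is a toy model, not the fsck code: trees
are assumed to be a dict from tree id to (mode, name, id) entries,
and missing_in_tree() is a hypothetical name.  The only git facts
relied on are the entry modes: 160000 for a gitlink (the mode shown
for sub-B in the cover letter's raw diff), 040000 for a directory.

```python
GITLINK_MODE = 0o160000  # subproject link: points at a commit elsewhere
TREE_MODE = 0o040000     # ordinary subdirectory


def missing_in_tree(trees, blobs, root):
    """Return (path, id) pairs for objects the walk needs but cannot find.

    `trees` maps tree id -> list of (mode, name, id); `blobs` is the set
    of blob ids present.  Gitlink entries are treated as leaves: the
    commit they name lives in the subproject, which may not be fetched.
    """
    missing = []
    stack = [("", root)]
    while stack:
        prefix, tree_id = stack.pop()
        if tree_id not in trees:
            missing.append((prefix or ".", tree_id))
            continue
        for mode, name, oid in trees[tree_id]:
            path = prefix + name
            if mode == GITLINK_MODE:
                continue  # never descend into, or even look up, the subproject
            if mode == TREE_MODE:
                stack.append((path + "/", oid))
            elif oid not in blobs:
                missing.append((path, oid))
    return missing
```

With this rule a superproject whose subprojects were never fetched
still checks out clean, which is the behaviour argued for above.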


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-12  2:00               ` Junio C Hamano
@ 2007-04-12  2:06                 ` Junio C Hamano
  2007-04-12  2:28                   ` Linus Torvalds
  0 siblings, 1 reply; 101+ messages in thread
From: Junio C Hamano @ 2007-04-12  2:06 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Sam Vilain, Git Mailing List

Junio C Hamano <junkio@cox.net> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> On Wed, 11 Apr 2007, Junio C Hamano wrote:
>>>
>>> The small detail in the last step is wrong, though.  Even if
>>> they EXIST, they may be isolated commits that are not connected
>>> to refs, and fsck in the repository would not have warned about
>>> unreachable trees from such unconnected commits.
>>
>> The superproject *is* a ref.
>
> But when you fsck the subproject repository in isolation in the
> earlier step in your procedure, that is not taken into account,
> is it?

Ah, forget about this.  The HEAD, which is in the tree of the
higher-level project, is a ref.  Silly me.


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-11 23:30               ` David Lang
@ 2007-04-12  2:14                 ` Linus Torvalds
  2007-04-12  2:30                   ` Junio C Hamano
                                     ` (2 more replies)
  0 siblings, 3 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-12  2:14 UTC (permalink / raw
  To: David Lang; +Cc: Sam Vilain, Git Mailing List, Junio C Hamano



On Wed, 11 Apr 2007, David Lang wrote:
>
> On Wed, 11 Apr 2007, Linus Torvalds wrote:
> > 
> > It can be a nice space optimization, and yes, if there really is a lot of
> > shared state, it can make it much cheaper to do some of the checks, but
> > right now we have absolutely *no* way for fsck to then do the reachability
> > check, because there is no way to tell fsck where all the refs are (since
> > now the refs come in from multiple repositories!)
> 
> this is why I was suggesting a --multiple-project option to let you tell fsck
> about all of the repositories that it needs to look for refs in.

Well, just from a personal observation:
 - I would *personally* actually refuse to share objects with anybody 
   else.

I just find the idea too scary. Somebody doing something bad to their 
object store by mistake (running "git prune" without realizing that there 
are *my* objects there too, or just deciding that they want to play with 
the object directory by hand, or running a new fancy experimental importer 
that has a subtle bug wrt object handling or anything like that).

I'll endorse using "alternates" files, but partly because I know the main 
project is safe (any alternates usage is in the "satellite" clones anyway, 
and they will never write to the alternate object directory), and partly 
because at least for the kernel, we don't have branches that get reset in 
the main project, so there's no reason to fear that a "git repack -a -d" 
will ever screw up any of the satellite repositories even by mistake.

But for git projects, even alternates isn't safe, in case somebody bases 
their own work on a version of "pu" that eventually goes away (even with 
reflogs, pruning *eventually* takes place).

So I tend to think that alternates and shared object directories are 
really for "temporary" stuff, or for *managed* repositories that are at 
git *hosting* sites (eg repo.or.cz), and where there is some other safety 
involved, ie users don't actually access the object directories directly 
in any way.

So I've at least personally come to the conclusion that for a *developer* 
(as opposed to a hosting site!), shared object directories just never make 
sense. The downsides are just too big. Even alternates is something where 
you just need to be fairly careful!

		Linus
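
To make the risk concrete, here is a small sketch of how a
satellite repository finds a loose object through an alternates
file.  It is an illustration only: it reads
objects/info/alternates (one object directory per line) and looks
for the loose file named by the first two hex digits and the rest
of the id.  Packs and nested alternates are ignored, and the
function names are hypothetical.

```python
import os


def object_dirs(objects_dir):
    """This repo's object directory, then any listed in info/alternates."""
    dirs = [objects_dir]
    alt_file = os.path.join(objects_dir, "info", "alternates")
    if os.path.exists(alt_file):
        with open(alt_file) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#"):
                    # Relative entries are resolved against the objects dir.
                    dirs.append(os.path.normpath(os.path.join(objects_dir, line)))
    return dirs


def find_loose_object(objects_dir, oid):
    """Path of the loose object `oid`, searching alternates too, or None."""
    for d in object_dirs(objects_dir):
        path = os.path.join(d, oid[:2], oid[2:])
        if os.path.exists(path):
            return path
    return None
```

If the main repository later prunes that object, the same lookup
from the satellite returns None: the satellite never had its own
copy, so nothing on its side can notice or prevent the loss.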


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-12  2:06                 ` Junio C Hamano
@ 2007-04-12  2:28                   ` Linus Torvalds
  0 siblings, 0 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-12  2:28 UTC (permalink / raw
  To: Junio C Hamano; +Cc: Sam Vilain, Git Mailing List



On Wed, 11 Apr 2007, Junio C Hamano wrote:
> 
> Ah, forget about this.  The HEAD, which is in the tree of the
> higher-level project, is a ref.  Silly me.

Well, not entirely "silly you".

If you do a "git reset" in the superproject, that will obviously have to 
rewrite the heads in the subproject.

I do suspect that we should always enable reflogs for the subprojects, so 
that pruning is safe even for these kinds of situations, but that 
doesn't resolve all issues.

For example: to manage *cloning* of the extra stuff, you might actually 
want to have externally visible refs, and while I suspect the main 
solution will always be to just do good maintenance (ie "don't do 'git 
bisect' and _never_ rewrite history in the main superproject!!"), I don't 
think it's out of the question to add other safety nets too..

So for example, while I'm not sure it's necessary, I don't think it would 
be *wrong* if we might eventually end up having *other* safety features 
like adding a totally separate "refs/superprojects/xyzzy" ref structure. 

Or something like that.. Just to make the refs more visible both 
externally and internally, and to make it much harder to make stupid 
mistakes without realizing it.

I suspect a lot of this will depend on just how many mistakes people make. 
I don't think we've so far had a single problem with alternates files, 
re-basing, and people then pruning away objects used by other repositories 
by mistake, so maybe people really don't make those kinds of mistakes.

So maybe we don't need any extra safety nets at all. But who knows..

		Linus


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-12  2:14                 ` Linus Torvalds
@ 2007-04-12  2:30                   ` Junio C Hamano
  2007-04-12 17:18                   ` David Lang
  2007-04-12 18:32                   ` Dana How
  2 siblings, 0 replies; 101+ messages in thread
From: Junio C Hamano @ 2007-04-12  2:30 UTC (permalink / raw
  To: Linus Torvalds; +Cc: David Lang, Sam Vilain, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But for git projects, even alternates isn't safe, in case somebody bases 
> their own work on a version of "pu" that eventually goes away (even with 
> reflogs, pruning *eventually* takes place).
>
> So I tend to think that alternates and shared object directories are 
> really for "temporary" stuff, or for *managed* repositories that are at 
> git *hosting* sites (eg repo.or.cz), and where there is some other safety 
> involved, ie users don't actually access the object directories directly 
> in any way.

Actually that is not even true for repo.or.cz -- the site lets
people create *forks* of the main project, and I recall it is
implemented in terms of alternates.

That's one of the reasons I never asked to take over git.git
repository there.  I have alt-git.git instead, which does not
allow forks.


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-12  2:01                 ` Linus Torvalds
@ 2007-04-12  3:56                   ` Sam Vilain
  0 siblings, 0 replies; 101+ messages in thread
From: Sam Vilain @ 2007-04-12  3:56 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Junio C Hamano, Andy Parkins, git, Josef Weidendorfer

Linus Torvalds wrote:
>> Whoa... "missing" subproject data?
>>     
> Absolutely. Not just subproject data. The whole subproject is often 
> missing.
>
> If I fetch the KDE superproject, I generally do *not* want every single 
> subproject. In fact, I'd likely just want one or two subprojects.
>   

Ok, but couldn't this be considered a variation of a lightweight checkout?

The only reason I'm worried about this is the case where the
superproject contains *thousands* of subprojects. Eg, a superproject for
all repo.or.cz projects. Say in a day 200 projects get updated with a
few commits - do you have to do 200 pulls or just one? But maybe that
problem can be solved in another way, or maybe it won't really hurt so
much in practice and still be faster/more efficient than rsync mirroring.

This is especially the case in concert with gittorrent, which will need
modifications to support sharing multiple repositories (not that that's
a huge issue, given there's no implementation yet).

>> Surely, unless you're doing lightweight/shallow clones, if you have a
>> gitlink you've also got the dependent repository? Otherwise the
>> reachability rule will be broken.
>>     
>
> The reachability rule *must* be breakable. That's why fsck currently 
> doesn't care AT ALL.
>
> It's much better to break that rule than to even check it! I'd rather 
> leave fsck like it is now, than to *ever* fix it, if the "fix" involves 
> "you have to always fetch all submodules to shut fsck up".
>   

Well fsck can be fixed easily enough to not descend, like lightweight
checkouts.

What I really want to avoid is the situation where you can't checkout,
even though you didn't indicate a shallow/lightweight clone.

What else might this decision impact? Obviously with a smaller base you
have fewer delta targets, though that's probably not a real issue.

Sam.


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-11 23:54       ` Martin Waitz
  2007-04-12  1:57         ` Brian Gernhardt
@ 2007-04-12 15:12         ` Josef Weidendorfer
  1 sibling, 0 replies; 101+ messages in thread
From: Josef Weidendorfer @ 2007-04-12 15:12 UTC (permalink / raw
  To: Martin Waitz; +Cc: Git Mailing List, Junio C Hamano, Linus Torvalds

On Thursday 12 April 2007, you wrote:
> If you use a detached HEAD then you can no longer switch back to it
> once you used some other (independent) branch (for testing or whatever).
> This is my main argument: If you just update some 'special'
> refs/heads/from-supermodule (or whatever, maybe get it from
> .gitmodules/config) you can still switch between branches, making them
> more useful IMHO.

The supermodule checkout could create a .git/SUPER_HEAD for this.
OK, that is a special kind of reference.

Or introduce "git --super ..." which works with the superproject.
From a submodule directory, a "git --super checkout ." could reset the
submodule checkout. 

Josef


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-12  2:14                 ` Linus Torvalds
  2007-04-12  2:30                   ` Junio C Hamano
@ 2007-04-12 17:18                   ` David Lang
  2007-04-12 18:32                   ` Dana How
  2 siblings, 0 replies; 101+ messages in thread
From: David Lang @ 2007-04-12 17:18 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Sam Vilain, Git Mailing List, Junio C Hamano

On Wed, 11 Apr 2007, Linus Torvalds wrote:

> So I tend to think that alternates and shared object directories are
> really for "temporary" stuff, or for *managed* repositories that are at
> git *hosting* sites (eg repo.or.cz), and where there is some other safety
> involved, ie users don't actually access the object directories directly
> in any way.
>
> So I've at least personally come to the conclusion that for a *developer*
> (as opposed to a hosting site!), shared object directories just never make
> sense. The downsides are just too big. Even alternates is something where
> you just need to be fairly careful!

I was actually thinking that hosting sites (and things like gittorrent) would be 
the ones that would get the most benefit from sharing objects. the amount saved 
for any individual developer is probably fairly minor (and the individual 
developer could run a script to look across their objects and hard-link them 
together if they care about the space)

David Lang


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-12  2:14                 ` Linus Torvalds
  2007-04-12  2:30                   ` Junio C Hamano
  2007-04-12 17:18                   ` David Lang
@ 2007-04-12 18:32                   ` Dana How
  2007-04-12 19:17                     ` Linus Torvalds
  2 siblings, 1 reply; 101+ messages in thread
From: Dana How @ 2007-04-12 18:32 UTC (permalink / raw
  To: Linus Torvalds
  Cc: David Lang, Sam Vilain, Git Mailing List, Junio C Hamano, danahow

On 4/11/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Wed, 11 Apr 2007, David Lang wrote:
> > this is why I was suggesting a --multiple-project option to let you tell fsck
> > about all of the repositories that it needs to look for refs in.
>
> Well, just from a personal observation:
>  - I would *personally* actually refuse to share objects with anybody
>    else.
>
> I just find the idea too scary. Somebody doing something bad to their
> object store by mistake (running "git prune" without realizing that there
> are *my* objects there too, or just deciding that they want to play with
> the object directory by hand, or running a new fancy experimental importer
> that has a subtle bug wrt object handling or anything like that).
>
> I'll endorse using "alternates" files, but partly because I know the main
> project is safe (any alternates usage is in the "satellite" clones anyway,
> and they will never write to the alternate object directory), and partly
> because at least for the kernel, we don't have branches that get reset in
> the main project, so there's no reason to fear that a "git repack -a -d"
> will ever screw up any of the satellite repositories even by mistake.
>
> But for git projects, even alternates isn't safe, in case somebody bases
> their own work on a version of "pu" that eventually goes away (even with
> reflogs, pruning *eventually* takes place).
>
> So I tend to think that alternates and shared object directories are
> really for "temporary" stuff, or for *managed* repositories that are at
> git *hosting* sites (eg repo.or.cz), and where there is some other safety
> involved, ie users don't actually access the object directories directly
> in any way.
>
> So I've at least personally come to the conclusion that for a *developer*
> (as opposed to a hosting site!), shared object directories just never make
> sense. The downsides are just too big. Even alternates is something where
> you just need to be fairly careful!

These arguments all seem pretty convincing to me --
maybe the problem is that I'm not a "*developer*" right now.
Instead I'm part of a multi-developer *site*.
Below I talk about a possible way we could use git
without changing it (since I recognize this would be a minority usage pattern).

We use perforce to manage a mixed hardware/software project
(I'm the 55GB check-out guy, remember?).  We have at least 3 different
kinds of data with different usage patterns, and using perforce for
everything in one centralized server was not the best solution.

Each user ("client") has their own worktree and the perforce
repository is on a shared central server.  You can consider perforce
to have the equivalent of git's index, but it is stored on the server,
in one file ("db.have") covering all clients.  Obviously that becomes a
bottleneck -- and recently db.have got larger than the total cache RAM on
the server, which really slowed things down until we moved to a larger
server.  But repository architecture aside,  the real problem has been
perforce's usability.  Frequently one contributor,  having gotten ahead
of the team,  needs to share this more recent work with only a few
people.  This could be done with p4 branching,  but this is really clunky.
So instead the work is pushed out (submitted) to everyone, causing
instability; this is partially remedied by doing it in smaller chunks.
Another perforce problem is that tagging consumes a lot of server
space (and may slow things down as well).

Some of this data will stay in perforce, some will move into revision
control built-in to some of our other tools, and I'd like to try to move some
of it into git.  The main attraction for the last group is the lightweight
branching that would allow early/tentative work to be easily shared.
I think the subproject work currently being discussed is going to
be very helpful as well -- the perforce equivalent is chaotic.

We could give each user a work tree and an object repository,
and then have a "release" repository.  Unfortunately,  this would be
slower to use than the current perforce "solution": users would check
in to their local repository, at the speed of gzip, anyone checking
it out would do so at the speed of gzip, and all work would need
to be resubmitted (using perforce jargon here) to the central repo,
again at the speed of gzip.  Currently, people either submit or
check out from the central repo, and it's all done at the speed of
a network copy.  This speed issue is important because of
the size of a commit we'd like to share (but not yet release):
about 40 files, half of them control files of several KB each, 1/4 of them
design files of several MB each, and the last 1/4 detailed design
files 100X larger.  These 40 files will reference (include) 50 others
of several KB each sprinkled through-out the hierarchy, a few of which
might have changed.  And yes, almost all of these are generated files,
but the generation time, and the instability of the tool and script environment,
preclude forcing the other users to regenerate them, like you would
with a .o file.

So, there are 2 alternative set-ups. In one, everyone uses a shared
object repository (everyone's .git/objects is a symlink to it). In this
repository, objects/. , objects/?? , objects/pack , and objects/info all
have "sticky" set, and we do the appropriate machinations to make
all files read-only. There would be an additional phantom user "git"
who owns the shared object repository (the only user whose .git/objects
is not a symlink).  Users would commit to their own repositories,
which would write data to the shared object repository and
update their refs (e.g. HEAD). To "release", push to the ~git repository.
This push would be like a current push -- fast-forward only, figure out the list
of objects that need to be transmitted -- but instead of transmitting the
objects, change their ownership to ~git and then update ~git's refs.
Since users can share local commits, maybe the ~git ownership
change should happen at commit time.  This all seems do-able
without change in git; instead I'd add a few bash wrapper scripts
(and see below for fsck and pack/prune).

Another setup is like the previous, but make the central repo have
its own hidden object repository. You would push to it using the
standard git command.

Finally, users could run git-fsck [with misleading output];
they could run git-prune{,-packed}, but these commands wouldn't
be able to delete anything.  If we don't want users to pack,
then ~git/.git/objects/pack would be writable only by ~git.
So basically, normal people wouldn't do the things in this paragraph.

To do meaningful and safe fsck/prune on the shared repository
as ~git,  I'd add some scripting.  If you require all users'
GIT_DIR's to look like /home/USER/*/.git , then you can get all
their refs and do a meaningful fsck.  If not, you could do a fsck
--unreachable as ~git and filter the result by date and/or type.
(This sort of corresponds to abandoned changesets in perforce.)
Once you have an fsck method you like, its filtered output (i.e.,
--unreachable objects you want to keep) can be fed to git-prune.
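
A sketch of the "get all their refs" step, assuming the directory
convention above.  It is not an existing git tool: it only reads
loose ref files under each GIT_DIR's refs/ directory and the
packed-refs file, and the name collect_ref_tips() is hypothetical.
It does not look at index files or reflogs, so on its own it
under-reports what is still in use.

```python
import glob
import os


def collect_ref_tips(pattern):
    """Object ids named by refs in every GIT_DIR matching `pattern`."""
    tips = set()
    for git_dir in glob.glob(pattern):
        # Loose refs: one file per ref, containing an object id.
        for root, _dirs, files in os.walk(os.path.join(git_dir, "refs")):
            for name in files:
                with open(os.path.join(root, name)) as fh:
                    value = fh.read().strip()
                # Skip symbolic refs ("ref: ..."); their target is listed itself.
                if value and not value.startswith("ref:"):
                    tips.add(value)
        # Packed refs: "<id> <refname>" lines, plus "#" headers and
        # "^<id>" peeled-tag lines, which are skipped here.
        packed = os.path.join(git_dir, "packed-refs")
        if os.path.exists(packed):
            with open(packed) as fh:
                for line in fh:
                    line = line.strip()
                    if not line or line[0] in "#^":
                        continue
                    tips.add(line.split()[0])
    return tips
```

Calling collect_ref_tips("/home/*/*/.git") would give the set of
tips to protect before anything is pruned from the shared store.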

Care would also be required with git-repack/git-prune-packed,
but it seems mostly addressable with scheduling.

If I proceed down this path,  I'd like to implement this procedure
without any change in git's .c or .sh files.  It's clear this is a
minority use and should not depend on anything being maintained
for it inside git.  I would write a few bash scripts and a README/HOWTO
for possible inclusion in contrib.

BTW,
has anyone ever thought of writing an "Administrator's Manual" for git?

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-12 18:32                   ` Dana How
@ 2007-04-12 19:17                     ` Linus Torvalds
  2007-04-13  9:00                       ` Rogan Dawes
  2007-04-15  6:50                       ` Dana How
  0 siblings, 2 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-12 19:17 UTC (permalink / raw
  To: Dana How; +Cc: David Lang, Sam Vilain, Git Mailing List, Junio C Hamano



On Thu, 12 Apr 2007, Dana How wrote:
> 
> These arguments all seem pretty convincing to me --
> maybe the problem is that I'm not a "*developer*" right now.
> Instead I'm part of a multi-developer *site*.

Yes.

The issues for hosting sites are very different from the issues of 
individual developers having their own git repositories, and I agree 100% 
that both alternates and shared object directories make tons of sense for 
hosting.

> Below I talk about a possible way we could use git
> without changing it (since I recognize this would be a minority usage
> pattern).

I hope it wouldn't even be a minority usage pattern. I am a firm believer 
that distributed SCM's and git in particular makes a lot more sense for 
source control hosting than CVS or SVN do. I'm really disappointed with 
things like sourceforge, and part of the problem is literally that a 
centralized SCM is really *fundamentally* wrong for a hosting entity. 

Using a distributed SCM just makes _so_ much more sense for hosting 
projects, and I've actually very much wanted to try to make sure that git 
can help people who host things. 

It's not my *own* primary use, but I think it's a very important usage 
pattern, even though it's very different from a "normal developer" private 
sandbox case.

So I think your case is really very interesting. I'd love to help figure 
out how to help you guys with git, but because it's not how I personally 
work, I can really just try to help when you actually hit a problem - 
you'll have to figure out what your usage patterns actually are on your 
own ;)

And btw, I think the shared object model really works very well, but I 
think it has to be paired with some stricter rules than people who use 
their own repos tend to have. For example, end-point developers have 
become very used to rebasing and generally rewriting history (or just 
resetting to an older state), and that's something that works fine in a 
"local repository" setup, but it's also the kinds of patterns that can 
really screw you in a hosted and shared-object environment.

As to your two setups: I would suggest you go with the "hidden" shared 
version (ie people use the remote access pull/push to a server, and the 
*server* uses a shared object repository for multiple repositories), 
rather than having a user-visible globally shared object directory. Even 
with sticky bits and controlled group access etc, I think it's just safer 
to have that extra level of indirection.

(Partly because a globally visible shared object directory also implies 
that you'd use a networked filesystem, and I suspect a lot of developers 
would actually be a lot happier having their own development repositories 
on their own local disks, or at least some "group disk", rather than have 
one big and performance-critical network share. Even if you use some 
competent NetApp box and a modern network filesystem, it's just one less 
critical infrastructure piece that needs to be really beefy).

		Linus


* Re: [PATCH 6/6] Teach core object handling functions about gitlinks
  2007-04-12  0:56         ` Martin Waitz
@ 2007-04-12 21:23           ` Torgil Svensson
  0 siblings, 0 replies; 101+ messages in thread
From: Torgil Svensson @ 2007-04-12 21:23 UTC (permalink / raw
  To: Martin Waitz
  Cc: Linus Torvalds, Josef Weidendorfer, Git Mailing List,
	Junio C Hamano

On 4/12/07, Martin Waitz <tali@admingilde.org> wrote:

> On Thu, Apr 12, 2007 at 02:42:43AM +0200, Torgil Svensson wrote:
> > I guess this file could also cover the case where the superproject is
> > only interested in a small subset of the subproject. For example if I
> > only use some header-files in a library and want
> > "/lib1/src/interface" in the subproject end up as "/includes/lib1" in
> > the superproject. Could single files be handled in a similar way?
>
> Conceptually this information would have to be part of the
> supermodule tree (after all it changes how your tree is set up).

I agree. This could be included in the module config file which in
turn is version-controlled.


> I think it makes more sense to make users think about which part
> of their tree can be reused and make them choose submodule boundaries
> wisely so that the above partial-checkout is not needed.

Sometimes you can't control upstream projects the way you want it.
Also, splitting up projects for the potential need of future
superprojects has several obvious disadvantages (multiple changelogs,
versions etc). I don't see the subfolder checkout thing as a problem
since the core plumbing in Linus's implementation doesn't care what's
beneath the commit link. The subfolder checkout can "easily" be done
in a porcelain.

It's more problematic if you want to cherry-pick individual files in a
subproject. Here, I find the tight connection between links and
directories to be too restrictive. Why does a subproject commit-link
have to be represented as a folder?

//Torgil


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-12 19:17                     ` Linus Torvalds
@ 2007-04-13  9:00                       ` Rogan Dawes
  2007-04-13 15:23                         ` Linus Torvalds
  2007-04-15  6:50                       ` Dana How
  1 sibling, 1 reply; 101+ messages in thread
From: Rogan Dawes @ 2007-04-13  9:00 UTC (permalink / raw
  To: Linus Torvalds
  Cc: Dana How, David Lang, Sam Vilain, Git Mailing List,
	Junio C Hamano

Linus Torvalds wrote:
> 
> Yes.
> 
> The issues for hosting sites are very different from the issues of 
> individual developers having their own git repositories, and I agree 100% 
> that both alternates and shared object directories make tons of sense for 
> hosting.
> 
>> Below I talk about a possible way we could use git
>> without changing it (since I recognize this would be a minority usage
>> pattern).
> 
> I hope it wouldn't even be a minority usage pattern. I am a firm believer 
> that distributed SCM's and git in particular makes a lot more sense for 
> source control hosting than CVS or SVN do. I'm really disappointed with 
> things like sourceforge, and part of the problem is literally that a 
> centralized SCM is really *fundamentally* wrong for a hosting entity. 
> 
> Using a distributed SCM just makes _so_ much more sense for hosting 
> projects, and I've actually very much wanted to try to make sure that git 
> can help people who host things. 


> And btw, I think the shared object model really works very well, but I 
> think it has to be paired with some stricter rules than people who use 
> their own repos tend to have. For example, end-point developers have 
> become very used to rebasing and generally rewriting history (or just 
> resetting to an older state), and that's something that works fine in a 
> "local repository" setup, but it's also the kinds of patterns that can 
> really screw you in a hosted and shared-object environment.
> 

Would it not make sense for a hosting environment to say, if you are 
using alternates, or shared object directories, then you need to include 
*all* the refs in *all* the projects if you ever do an fsck?

I'm not sure how well git will scale in this case, although it just 
should be a matter of how well git scales to dealing with a single 
project with tens of thousands of refs/tags/etc. The only problem might 
be in passing all those refs/tags to fsck in one go. STDIN, I guess?

Rogan


* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-13  9:00                       ` Rogan Dawes
@ 2007-04-13 15:23                         ` Linus Torvalds
  0 siblings, 0 replies; 101+ messages in thread
From: Linus Torvalds @ 2007-04-13 15:23 UTC (permalink / raw
  To: Rogan Dawes
  Cc: Dana How, David Lang, Sam Vilain, Git Mailing List,
	Junio C Hamano



On Fri, 13 Apr 2007, Rogan Dawes wrote:
> 
> Would it not make sense for a hosting environment to say, if you are using
> alternates, or shared object directories, then you need to include *all* the
> refs in *all* the projects if you ever do an fsck?

Yes. And it shouldn't be hard to add support to do it. It's just not been 
done.

A lot of git programs already take refs on stdin, but fsck just doesn't do 
it (it can do it from the command line, but you'd run out of command line 
space very quickly).

More natural would be to just list all the git repos by git repo pathname 
(and there, usually the command line probably *is* long enough), but 
somebody would just have to do it. It's probably not very much code: just 
iterate over each repo both when adding refs and when actually doing the 
fsck itself.

> I'm not sure how well git will scale in this case, although it just should be
> a matter of how well git scales to dealing with a single project with tens of
> thousands of refs/tags/etc. The only problem might be in passing all those
> refs/tags to fsck in one go. STDIN, I guess?

For a real shared object directory, passing the refs to stdin (and 
teaching fsck about a "--stdin" flag) would be consistent with what we do 
for many other commands, so yes, that would work.

However, fsck actually tends to want not just the refs, but actually 
things like the index files and reflog files too, because those add other 
reachability info, which is why it's probably more natural to just give 
fsck the list of related repositories and let it figure them out.

That's also what you'd want to do for "alternates", since now there is no 
longer a single object directory either, but multiple separate (but 
related) ones.

Somebody would just have to write the code.. The basic rules are really 
all in "git/builtin-fsck.c": cmd_fsck(). Hint hint.

			Linus
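
A note on the approach outlined above: handing fsck the list of related repositories means collecting reachability roots (refs, index entries, reflog entries) from every repository first, and only then judging the objects in the shared store. What follows is a toy Python model of that idea, not git code. The `Repo` class, the `shared_fsck` function and the dictionary-based object store are hypothetical names invented for illustration; the real rules are in cmd_fsck() in builtin-fsck.c.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Toy shared object store: object id -> ids it points to
# (a commit points to its parents and tree, a tree to its blobs).
ObjectStore = Dict[str, List[str]]


class Repo:
    """Hypothetical stand-in for one repository borrowing from a shared store."""

    def __init__(self, name: str, refs: Dict[str, str],
                 index: Iterable[str] = (), reflog: Iterable[str] = ()):
        self.name = name
        self.refs = dict(refs)      # ref name -> object id
        self.index = list(index)    # object ids staged in the index
        self.reflog = list(reflog)  # object ids only the reflog remembers

    def roots(self) -> Set[str]:
        # Refs alone are not enough: index and reflog add reachability too.
        return set(self.refs.values()) | set(self.index) | set(self.reflog)


def reachable(store: ObjectStore, roots: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Walk the store from the roots; return (ids found, ids referenced but absent)."""
    seen: Set[str] = set()
    missing: Set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        if oid not in store:
            missing.add(oid)
            continue
        seen.add(oid)
        stack.extend(store[oid])
    return seen, missing


def shared_fsck(store: ObjectStore, repos: Iterable[Repo]) -> Dict[str, List[str]]:
    """Gather roots from every repo first, then check the shared store once."""
    roots: Set[str] = set()
    for repo in repos:
        roots |= repo.roots()
    live, missing = reachable(store, roots)
    return {"unreachable": sorted(set(store) - live), "missing": sorted(missing)}


# Made-up data: c3 is a rebased-away commit that only repo b's reflog remembers.
store = {
    "c1": ["t1"], "t1": ["b1"], "b1": [],
    "c2": ["c1", "t2"], "t2": ["b1", "b2"], "b2": [],
    "c3": ["t1"],
}
a = Repo("a", {"refs/heads/master": "c1"})
b = Repo("b", {"refs/heads/master": "c2"}, reflog=["c3"])

only_a = shared_fsck(store, [a])
both = shared_fsck(store, [a, b])
```

With only repo `a` considered, everything repo `b` still needs looks unreachable; with both repositories listed, nothing does. That is the reason a shared-object setup has to include every repository's roots before anything is treated as garbage.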

* Re: [PATCH 5/6] Teach "fsck" not to follow subproject links
  2007-04-12 19:17                     ` Linus Torvalds
  2007-04-13  9:00                       ` Rogan Dawes
@ 2007-04-15  6:50                       ` Dana How
  1 sibling, 0 replies; 101+ messages in thread
From: Dana How @ 2007-04-15  6:50 UTC
  To: Linus Torvalds
  Cc: David Lang, Sam Vilain, Git Mailing List, Junio C Hamano, danahow

On 4/12/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, 12 Apr 2007, Dana How wrote:
> > These arguments all seem pretty convincing to me --
> > maybe the problem is that I'm not a "*developer*" right now.
> > Instead I'm part of a multi-developer *site*.
> The issues for hosting sites are very different from the issues of
> individual developers having their own git repositories, and I agree 100%
> that both alternates and shared object directories make tons of sense for
> hosting.
For clarity I should have written *office* instead of *site* to
describe my situation,
but the mention of a NetApp below indicates no actual confusion occurred.

> As to your two setups: I would suggest you go with the "hidden" shared
> version (ie people use the remote access pull/push to a server, and the
> *server* uses a shared object repository for multiple repositories),
> rather than having a user-visible globally shared object directory. Even
> with sticky bits and controlled group access etc, I think it's just safer
> to have that extra level of indirection.
>
> (Partly because a globally visible shared object directory also implies
> that you'd use a networked filesystem, and I suspect a lot of developers
> would actually be a lot happier having their own development repositories
> on their own local disks, or at least some "group disk", rather than have
> one big and performance-critical network share. Even if you use some
> competent NetApp box and a modern network filesystem, it's just one less
> critical infrastructure piece that needs to be really beefy).
We did go down the local disk route, but after two significant losses of
individuals' work,  it was decreed that (perforce) work trees must be
on the NetApp.  So we already made the investment in beefiness --
for different reasons -- and I need to conform to these decisions for
the moment.

After reliability, the other big criterion (especially with our
penchant for large files)
will be speed. With perforce,  users now see submit={1 copy to server},
sync={1 copy from server}.  In the short term I can't get away with changing
this to submit={copy working to indiv repo, copy indiv repo to shared repo}
and sync={copy shared repo to indiv repo, copy indiv repo to working},
because at first everyone will be trying to emulate what they did in perforce.

So probably I'll start out with either a very small testgroup,
or one shared object repository with sticky/group tricks on the NetApp.
Once git's collaboration advantages are apparent,
I'll switch to the hidden repository model which I prefer as well.
And hopefully these collaboration advantages will also mean people
will commit more often and local disks can come back into favor --
and then the "extra" local repo file copy operations will be less noticeable.

In any event, I have some scripting to do to learn more about our usage
patterns and pushing our datasets through git.  I also need to finish
the pack-splitting patch (after 64b index goes in). Finally,  before all that,
I'll be out of the country for the next ~10 days...

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-10 21:03                 ` Nicolas Pitre
@ 2007-04-15 23:21                   ` J. Bruce Fields
  0 siblings, 0 replies; 101+ messages in thread
From: J. Bruce Fields @ 2007-04-15 23:21 UTC
  To: Nicolas Pitre
  Cc: Junio C Hamano, Linus Torvalds, Alex Riesen, Git Mailing List

On Tue, Apr 10, 2007 at 05:03:13PM -0400, Nicolas Pitre wrote:
> This is definitely good Documentation/howto/ material.

There's actually something similar already in "modifying a single
commit" in the "user manual":

http://www.kernel.org/pub/software/scm/git/docs/user-manual.html#id276844

But it uses a throw-away branch instead of the detached head, and uses
rebase --onto instead of rebasing and then --skip'ing.

--b.

* Re: [PATCH 0/6] Initial subproject support (RFC?)
  2007-04-11  9:32                   ` Junio C Hamano
@ 2007-04-15 23:25                     ` J. Bruce Fields
  0 siblings, 0 replies; 101+ messages in thread
From: J. Bruce Fields @ 2007-04-15 23:25 UTC
  To: Junio C Hamano; +Cc: David Kågedal, git

On Wed, Apr 11, 2007 at 02:32:48AM -0700, Junio C Hamano wrote:
> David Kågedal <davidk@lysator.liu.se> writes:
> 
> > Junio C Hamano <junkio@cox.net> writes:
> >
> >> ...  This _will_ fail, but that is to be expected, as
> >> we intend to replace that with what we just amended.  Just reset
> >> it away and keep going.
> >> 
> >> $ git reset --hard
> >> $ git rebase --skip
> >
> > Wouldn't
> >
> > $ git rebase --onto HEAD lt/gitlink~3 lt/gitlink
> >
> > do the trick in one step?
> 
> It is probably more Kosher, and I used to always do that, but it
> is much longer to type,

Also remembering which commit you amended is a pain sometimes.  So I
usually do

	git tag base lt/gitlink~3
	git checkout base
	... edit and amend ...
	git rebase --onto HEAD base lt/gitlink
	git tag -d base

But the trick of letting the rebase fail and skipping looks less
cumbersome.

--b.
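
The effect of `git rebase --onto HEAD base lt/gitlink` in the recipe above can be illustrated with a toy model: the commits in base..lt/gitlink are replayed, oldest first, on top of the amended commit. This is a Python sketch over a strictly linear history, using made-up commit names (A to E, with B2 as the amended commit) and hypothetical helpers `commits_between`, `rebase_onto` and `log`. It does not model conflicts, merges, or how git really names the rewritten commits.

```python
from typing import Dict, List, Optional, Tuple

# Toy commit graph: commit id -> (parent id or None, description of the change).
Graph = Dict[str, Tuple[Optional[str], str]]


def commits_between(graph: Graph, upstream: str, branch: str) -> List[str]:
    """Commits reachable from branch but not from upstream, oldest first."""
    excluded = set()
    cur: Optional[str] = upstream
    while cur is not None:
        excluded.add(cur)
        cur = graph[cur][0]
    chain = []
    cur = branch
    while cur is not None and cur not in excluded:
        chain.append(cur)
        cur = graph[cur][0]
    chain.reverse()
    return chain


def rebase_onto(graph: Graph, new_base: str, upstream: str, branch: str) -> str:
    """Replay upstream..branch on top of new_base; return the new tip."""
    tip = new_base
    for old in commits_between(graph, upstream, branch):
        new_id = old + "'"                    # the replayed copy of a commit
        graph[new_id] = (tip, graph[old][1])  # same change, new parent
        tip = new_id
    return tip


def log(graph: Graph, tip: str) -> List[str]:
    """History of tip, newest first."""
    out = []
    cur: Optional[str] = tip
    while cur is not None:
        out.append(cur)
        cur = graph[cur][0]
    return out


# lt/gitlink is E, so lt/gitlink~3 (the "base" tag) is B.
graph: Graph = {
    "A": (None, "initial"),
    "B": ("A", "commit to fix"),
    "C": ("B", "third"),
    "D": ("C", "fourth"),
    "E": ("D", "fifth"),
}
# "git checkout base", then edit and amend: B2 replaces B on top of A.
graph["B2"] = ("A", "commit to fix (amended)")

# "git rebase --onto HEAD base lt/gitlink", with HEAD being B2.
new_tip = rebase_onto(graph, "B2", "B", "E")
history = log(graph, new_tip)
```

Because the upstream argument names the original commit B, B itself is excluded from the replay, which is what makes the one-step form work without a failing pick and a `--skip`.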

end of thread, other threads:[~2007-04-15 23:25 UTC | newest]

Thread overview: 101+ messages
2007-04-10  4:12 [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
     [not found] ` <Pine.LNX.4.64.0704092115020.6730@woody.linux-foundation.org>
2007-04-10  4:13 ` [PATCH 1/6] diff-lib: use ce_mode_from_stat() rather than messing with modes manually Linus Torvalds
2007-04-10  4:13 ` [PATCH 2/6] Avoid overflowing name buffer in deep directory structures Linus Torvalds
2007-04-10  4:14 ` [PATCH 3/6] Add 'resolve_gitlink_ref()' helper function Linus Torvalds
2007-04-10  9:38   ` Alex Riesen
2007-04-10 14:58     ` Linus Torvalds
2007-04-10 15:35       ` Alex Riesen
2007-04-10 15:52         ` Linus Torvalds
2007-04-10 15:57           ` Alex Riesen
2007-04-10 16:16             ` Linus Torvalds
2007-04-10 15:54       ` Josef Weidendorfer
2007-04-10  4:14 ` [PATCH 4/6] Add "S_IFDIRLNK" file mode infrastructure for git links Linus Torvalds
2007-04-10  4:15 ` [PATCH 5/6] Teach "fsck" not to follow subproject links Linus Torvalds
2007-04-11 22:41   ` Sam Vilain
2007-04-11 22:48     ` Linus Torvalds
2007-04-11 22:59       ` Sam Vilain
2007-04-11 23:16         ` Linus Torvalds
2007-04-11 23:05           ` David Lang
2007-04-11 23:53             ` Linus Torvalds
2007-04-11 23:30               ` David Lang
2007-04-12  2:14                 ` Linus Torvalds
2007-04-12  2:30                   ` Junio C Hamano
2007-04-12 17:18                   ` David Lang
2007-04-12 18:32                   ` Dana How
2007-04-12 19:17                     ` Linus Torvalds
2007-04-13  9:00                       ` Rogan Dawes
2007-04-13 15:23                         ` Linus Torvalds
2007-04-15  6:50                       ` Dana How
2007-04-12  0:00               ` Dana How
2007-04-12  0:03               ` Sam Vilain
2007-04-12  0:34           ` Junio C Hamano
2007-04-12  1:52             ` Linus Torvalds
2007-04-12  2:00               ` Junio C Hamano
2007-04-12  2:06                 ` Junio C Hamano
2007-04-12  2:28                   ` Linus Torvalds
2007-04-11 23:30         ` Dana How
2007-04-10  4:20 ` [PATCH 6/6] Teach core object handling functions about gitlinks Linus Torvalds
2007-04-10  8:40   ` Frank Lichtenheld
2007-04-10 11:31     ` Alex Riesen
2007-04-10 14:55     ` Linus Torvalds
2007-04-10 16:28   ` Josef Weidendorfer
2007-04-10 16:50     ` Alex Riesen
2007-04-10 17:23       ` Josef Weidendorfer
2007-04-10 18:45     ` Linus Torvalds
2007-04-10 19:04       ` Andy Parkins
2007-04-10 19:20         ` Linus Torvalds
2007-04-10 20:19           ` Junio C Hamano
2007-04-10 20:33             ` Linus Torvalds
2007-04-12  0:12               ` Sam Vilain
2007-04-12  0:35                 ` Martin Waitz
2007-04-12  2:01                 ` Linus Torvalds
2007-04-12  3:56                   ` Sam Vilain
2007-04-10 19:41         ` David Lang
2007-04-10 20:06         ` Junio C Hamano
2007-04-10 19:29       ` Josef Weidendorfer
2007-04-10 19:45         ` Linus Torvalds
2007-04-11 23:47           ` Sam Vilain
2007-04-12  0:13             ` Linus Torvalds
2007-04-12  0:42       ` Torgil Svensson
2007-04-12  0:56         ` Martin Waitz
2007-04-12 21:23           ` Torgil Svensson
2007-04-11 23:36     ` Sam Vilain
2007-04-11  8:06   ` Martin Waitz
2007-04-11  8:29     ` Alex Riesen
2007-04-11  8:36       ` Martin Waitz
2007-04-11  8:49         ` Alex Riesen
2007-04-11  9:20           ` Martin Waitz
2007-04-11  9:15         ` Junio C Hamano
2007-04-11 10:03           ` Martin Waitz
2007-04-11 20:01             ` Junio C Hamano
2007-04-11 22:19               ` Martin Waitz
2007-04-11 22:36                 ` Linus Torvalds
2007-04-11  9:47     ` Andy Parkins
2007-04-11 11:31       ` Martin Waitz
2007-04-11 15:16     ` Linus Torvalds
2007-04-11 22:49       ` Sam Vilain
2007-04-11 23:54       ` Martin Waitz
2007-04-12  1:57         ` Brian Gernhardt
2007-04-12 15:12         ` Josef Weidendorfer
2007-04-10  4:46 ` [PATCH 0/6] Initial subproject support (RFC?) Linus Torvalds
2007-04-10 13:04   ` Alex Riesen
2007-04-10 15:13     ` Linus Torvalds
2007-04-10 15:48       ` Alex Riesen
2007-04-10 16:07         ` Linus Torvalds
2007-04-10 16:43           ` Alex Riesen
2007-04-10 19:32           ` Junio C Hamano
2007-04-10 20:11             ` Linus Torvalds
2007-04-10 20:52               ` Junio C Hamano
2007-04-10 21:02                 ` Sam Ravnborg
2007-04-10 21:27                   ` Junio C Hamano
2007-04-10 21:03                 ` Nicolas Pitre
2007-04-15 23:21                   ` J. Bruce Fields
2007-04-11  8:08                 ` David Kågedal
2007-04-11  9:32                   ` Junio C Hamano
2007-04-15 23:25                     ` J. Bruce Fields
2007-04-11  8:32     ` Martin Waitz
2007-04-11  8:42       ` Alex Riesen
2007-04-11  8:57         ` Martin Waitz
2007-04-10 13:39   ` [PATCH] allow git-update-index work on subprojects Alex Riesen
2007-04-10 23:19     ` [PATCH] Allow " Alex Riesen
2007-04-11  2:55       ` Junio C Hamano