git@vger.kernel.org mailing list mirror (one of many)
* [0/5] Patch set for various things
       [not found] <20050417144947.GG1487@pasky.ji.cz>
@ 2005-04-17 15:20 ` Daniel Barkalow
  2005-04-17 15:24   ` [1/5] Parsing code in revision.h Daniel Barkalow
                     ` (4 more replies)
  0 siblings, 5 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 15:20 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

Here are a bunch of patches that I first made against Linus's tree and
have since rebased against pasky's, because they're mostly more
version-control-like.

 1: Add a parsing function to revision.h
 2: Add merge-base
 3: Add http-pull
 4: Add option to make a hardlinkable cache of extracted blobs
 5: Add commit id to version info

This also served as a test of cleaning up a patch series with git; I took
my current working directory, diffed it against its common ancestor with
pasky (no longer current), split the patch into logical pieces, and 
applied them in sequence against the current pasky, committing after each
one. This gives a history that looks the way it would have had I known
what I was doing in advance and written the code very quickly this
morning. I think this should work in the future as a way to keep
developers' local mistakes out of the global revision history while
keeping history the way the mainline saw the development.

A thought for future work: it would be nice if I could identify commits
that were used in creating a commit, but which should not be tracked down
unless you were unfortunate enough to have been exposed to them (in which
case you'd like to know how to deal with them).

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [1/5] Parsing code in revision.h
  2005-04-17 15:20 ` [0/5] Patch set for various things Daniel Barkalow
@ 2005-04-17 15:24   ` Daniel Barkalow
  2005-04-17 16:09     ` Petr Baudis
  2005-04-17 18:18     ` [1/5] " Linus Torvalds
  2005-04-17 15:27   ` [2/5] Add merge-base Daniel Barkalow
                     ` (3 subsequent siblings)
  4 siblings, 2 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 15:24 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

This adds support to revision.h for parsing commit records (but not going
any further than parsing a single record). Something like this is needed
by anything that uses revision.h, but older programs open-code it.

Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Index: revision.h
===================================================================
--- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/revision.h  (mode:100644 sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
+++ 37a0b01b85c2999243674d48bfc71cdba0e5518e/revision.h  (mode:100644 sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
@@ -24,6 +24,7 @@
 	unsigned int flags;
 	unsigned char sha1[20];
 	unsigned long date;
+	unsigned char tree[20];
 	struct parent *parent;
 };
 
@@ -111,4 +112,29 @@
 	}
 }
 
+static int parse_commit_object(struct revision *rev)
+{
+	if (!(rev->flags & SEEN)) {
+		void *buffer, *bufptr;
+		unsigned long size;
+		char type[20];
+		unsigned char parent[20];
+
+		rev->flags |= SEEN;
+		buffer = bufptr = read_sha1_file(rev->sha1, type, &size);
+		if (!buffer || strcmp(type, "commit"))
+			return -1;
+		get_sha1_hex(bufptr + 5, rev->tree);
+		bufptr += 46; /* "tree " + "hex sha1" + "\n" */
+		while (!memcmp(bufptr, "parent ", 7) && 
+		       !get_sha1_hex(bufptr+7, parent)) {
+			add_relationship(rev, parent);
+			bufptr += 48;   /* "parent " + "hex sha1" + "\n" */
+		}
+		//rev->date = parse_commit_date(bufptr);
+		free(buffer);
+	}
+	return 0;
+}
+
 #endif /* REVISION_H */
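
The offsets 46 and 48 in the patch above come from the fixed layout of a
commit object: "tree " plus 40 hex digits plus a newline, then zero or
more "parent " lines of the same shape. Below is a minimal standalone
sketch of that layout with bounds checks added. The names
parse_commit_buffer, struct commit_info and MAX_PARENTS are hypothetical
and are not part of revision.h; the sketch keeps the ids as hex strings
rather than calling get_sha1_hex().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEX_LEN 40                       /* a SHA-1 written as hex digits */
#define TREE_LINE (5 + HEX_LEN + 1)      /* "tree " + hex + "\n"   = 46 */
#define PARENT_LINE (7 + HEX_LEN + 1)    /* "parent " + hex + "\n" = 48 */
#define MAX_PARENTS 16                   /* arbitrary cap for this sketch */

/* Hypothetical result type: ids are kept as NUL-terminated hex strings. */
struct commit_info {
	char tree[HEX_LEN + 1];
	char parents[MAX_PARENTS][HEX_LEN + 1];
	int nr_parents;
};

static int is_hex_digit(char c)
{
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
}

/* Return 1 if the first HEX_LEN bytes of s are lowercase hex digits. */
static int is_hex40(const char *s)
{
	int i;

	for (i = 0; i < HEX_LEN; i++)
		if (!is_hex_digit(s[i]))
			return 0;
	return 1;
}

/*
 * Parse the "tree" line and any "parent" lines at the start of a commit
 * buffer of the given size. Returns 0 on success, -1 on malformed input.
 */
static int parse_commit_buffer(const char *buf, size_t size,
			       struct commit_info *out)
{
	size_t pos;

	assert(buf && out);
	out->nr_parents = 0;
	if (size < TREE_LINE || memcmp(buf, "tree ", 5) ||
	    !is_hex40(buf + 5) || buf[TREE_LINE - 1] != '\n')
		return -1;
	memcpy(out->tree, buf + 5, HEX_LEN);
	out->tree[HEX_LEN] = '\0';
	pos = TREE_LINE;

	/* Each parent line has the same fixed width, so step by 48. */
	while (size - pos >= PARENT_LINE &&
	       !memcmp(buf + pos, "parent ", 7)) {
		if (!is_hex40(buf + pos + 7) ||
		    buf[pos + PARENT_LINE - 1] != '\n' ||
		    out->nr_parents == MAX_PARENTS)
			return -1;
		memcpy(out->parents[out->nr_parents], buf + pos + 7, HEX_LEN);
		out->parents[out->nr_parents][HEX_LEN] = '\0';
		out->nr_parents++;
		pos += PARENT_LINE;
	}
	return 0;
}
```

A caller would pass the buffer and size returned by something like
read_sha1_file() and then inspect out->tree and out->parents[].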



* [2/5] Add merge-base
  2005-04-17 15:20 ` [0/5] Patch set for various things Daniel Barkalow
  2005-04-17 15:24   ` [1/5] Parsing code in revision.h Daniel Barkalow
@ 2005-04-17 15:27   ` Daniel Barkalow
  2005-04-17 16:01     ` Petr Baudis
  2005-04-17 16:51     ` [2.1/5] " Daniel Barkalow
  2005-04-17 15:31   ` [3/5] Add http-pull Daniel Barkalow
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 15:27 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

merge-base finds one of the best common ancestors of a pair of commits. In
particular, it finds one of the ones which is fewest commits away from the
further of the heads.

Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Index: Makefile
===================================================================
--- 37a0b01b85c2999243674d48bfc71cdba0e5518e/Makefile  (mode:100644 sha1:346e3850de026485802e41e16a1180be2df85e4a)
+++ d662b707e11391f6cfe597fd4d0bf9c41d34d01a/Makefile  (mode:100644 sha1:b2ce7c5b63fffca59653b980d98379909f893d44)
@@ -14,7 +14,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-	check-files ls-tree
+	check-files ls-tree merge-base
 
 SCRIPT=	parent-id tree-id git gitXnormid.sh gitadd.sh gitaddremote.sh \
 	gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
Index: merge-base.c
===================================================================
--- /dev/null  (tree:37a0b01b85c2999243674d48bfc71cdba0e5518e)
+++ d662b707e11391f6cfe597fd4d0bf9c41d34d01a/merge-base.c  (mode:100644 sha1:0f85e7d9e9a896d1142a54170ddf1159f11f9cdd)
@@ -0,0 +1,108 @@
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+
+struct revision *common_ancestor(struct revision *rev1, struct revision *rev2)
+{
+	struct parent *parent;
+
+	struct parent *rev1list = malloc(sizeof(struct parent));
+	struct parent *rev2list = malloc(sizeof(struct parent));
+        
+	struct parent *posn, *temp;
+
+	rev1list->parent = rev1;
+	rev1list->next = NULL;
+
+	rev2list->parent = rev2;
+	rev2list->next = NULL;
+
+	while (rev1list || rev2list) {
+		posn = rev1list;
+		rev1list = NULL;
+		while (posn) {
+			parse_commit_object(posn->parent);
+			if (posn->parent->flags & 0x0001) {
+				/*
+				printf("1 already seen %s %x\n",
+				       sha1_to_hex(posn->parent->sha1),
+				       posn->parent->flags);
+				*/
+                                // do nothing
+			} else if (posn->parent->flags & 0x0002) {
+                                // XXXX free lists
+				return posn->parent;
+			} else {
+				/*
+				printf("1 based on %s\n",
+				       sha1_to_hex(posn->parent->sha1));
+				*/
+				posn->parent->flags |= 0x0001;
+
+				parent = posn->parent->parent;
+				while (parent) {
+					temp = malloc(sizeof(struct parent));
+					temp->next = rev1list;
+					temp->parent = parent->parent;
+					rev1list = temp;
+					parent = parent->next;
+				}
+			}
+			posn = posn->next;
+		}
+		posn = rev2list;
+		rev2list = NULL;
+		while (posn) {
+			parse_commit_object(posn->parent);
+			if (posn->parent->flags & 0x0002) {
+				/*
+				printf("2 already seen %s\n",
+				       sha1_to_hex(posn->parent->sha1));
+				*/
+                                // do nothing
+			} else if (posn->parent->flags & 0x0001) {
+                                // XXXX free lists
+				return posn->parent;
+			} else {
+				/*
+				printf("2 based on %s\n",
+				       sha1_to_hex(posn->parent->sha1));
+				*/
+				posn->parent->flags |= 0x0002;
+
+				parent = posn->parent->parent;
+				while (parent) {
+					temp = malloc(sizeof(struct parent));
+					temp->next = rev2list;
+					temp->parent = parent->parent;
+					rev2list = temp;
+					parent = parent->next;
+				}
+			}
+			posn = posn->next;
+		}
+	}
+	return NULL;
+}
+
+int main(int argc, char **argv)
+{
+	struct revision *rev1, *rev2, *ret;
+	unsigned char rev1key[20], rev2key[20];
+	if (argc != 3 ||
+	    get_sha1_hex(argv[1], rev1key) ||
+	    get_sha1_hex(argv[2], rev2key)) {
+		usage("mergebase <commit-id> <commit-id>");
+	}
+	rev1 = lookup_rev(rev1key);
+	rev2 = lookup_rev(rev2key);
+	ret = common_ancestor(rev1, rev2);
+	if (ret) {
+		printf("%s\n", sha1_to_hex(ret->sha1));
+		return 0;
+	} else {
+		printf("Sorry.\n");
+		return 1;
+	}
+	
+}
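
The search in the patch above alternates between the two heads, expanding
one generation at a time and stopping at the first commit that the other
side has already marked. Below is a minimal standalone sketch of the same
idea on a toy graph, assuming commits are array indices with at most two
parents. struct node, step(), SIDE1 and SIDE2 are hypothetical names, and
the fixed-size arrays stand in for the malloc'ed parent lists.

```c
#include <assert.h>

#define MAX_NODES 32
#define MAX_PARENTS 2
#define SIDE1 0x1    /* reached from the first head */
#define SIDE2 0x2    /* reached from the second head */

struct node {
	int parents[MAX_PARENTS];    /* indices into the graph, -1 if unused */
	unsigned flags;
};

/*
 * Expand one generation of a frontier. Returns the index of a commit
 * already marked by the other side, or -1 if there is none yet. On
 * return the frontier holds the next generation.
 */
static int step(struct node *g, int *frontier, int *n,
		unsigned mine, unsigned theirs)
{
	int next[MAX_NODES * MAX_PARENTS];
	int next_n = 0, i, p;

	for (i = 0; i < *n; i++) {
		struct node *cur = &g[frontier[i]];

		if (cur->flags & mine)
			continue;            /* reached earlier by another path */
		if (cur->flags & theirs)
			return frontier[i];  /* both sides meet here */
		cur->flags |= mine;
		for (p = 0; p < MAX_PARENTS; p++)
			if (cur->parents[p] >= 0)
				next[next_n++] = cur->parents[p];
	}
	for (i = 0; i < next_n; i++)
		frontier[i] = next[i];
	*n = next_n;
	return -1;
}

/* Return the index of one nearest common ancestor, or -1 if none. */
static int common_ancestor(struct node *g, int nr, int head1, int head2)
{
	int f1[MAX_NODES * MAX_PARENTS], f2[MAX_NODES * MAX_PARENTS];
	int n1 = 1, n2 = 1, hit, i;

	assert(nr <= MAX_NODES);
	for (i = 0; i < nr; i++)
		g[i].flags = 0;
	f1[0] = head1;
	f2[0] = head2;
	while (n1 || n2) {
		hit = step(g, f1, &n1, SIDE1, SIDE2);
		if (hit >= 0)
			return hit;
		hit = step(g, f2, &n2, SIDE2, SIDE1);
		if (hit >= 0)
			return hit;
	}
	return -1;
}
```

Because each commit is expanded at most once per side, the frontier
arrays cannot overflow for graphs of up to MAX_NODES commits.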



* [3/5] Add http-pull
  2005-04-17 15:20 ` [0/5] Patch set for various things Daniel Barkalow
  2005-04-17 15:24   ` [1/5] Parsing code in revision.h Daniel Barkalow
  2005-04-17 15:27   ` [2/5] Add merge-base Daniel Barkalow
@ 2005-04-17 15:31   ` Daniel Barkalow
  2005-04-17 18:10     ` Petr Baudis
  2005-04-17 18:58     ` [3.1/5] " Daniel Barkalow
  2005-04-17 15:35   ` [4/5] Add option for hardlinkable cache of extracted blobs Daniel Barkalow
  2005-04-17 15:37   ` [5/5] Add commit-id to version Daniel Barkalow
  4 siblings, 2 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 15:31 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

http-pull is a program that downloads from a (normal) HTTP server a commit
and all of the tree and blob objects it refers to (but not other commits,
etc.). Options could be used to make it download a larger or different
selection of objects. It depends on libcurl, which I forgot to mention in
the README again.

Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Index: Makefile
===================================================================
--- d662b707e11391f6cfe597fd4d0bf9c41d34d01a/Makefile  (mode:100644 sha1:b2ce7c5b63fffca59653b980d98379909f893d44)
+++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/Makefile  (mode:100644 sha1:940ef8578cf469354002cd8feaec25d907015267)
@@ -14,7 +14,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-	check-files ls-tree merge-base
+	check-files ls-tree http-pull merge-base
 
 SCRIPT=	parent-id tree-id git gitXnormid.sh gitadd.sh gitaddremote.sh \
 	gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
@@ -35,6 +35,7 @@
 
 LIBS= -lssl -lz
 
+http-pull: LIBS += -lcurl
 
 $(PROG):%: %.o $(COMMON)
 	$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
Index: http-pull.c
===================================================================
--- /dev/null  (tree:d662b707e11391f6cfe597fd4d0bf9c41d34d01a)
+++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/http-pull.c  (mode:100644 sha1:106ca31239e6afe6784e7c592234406f5c149e44)
@@ -0,0 +1,126 @@
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+#include <errno.h>
+#include <stdio.h>
+
+#include <curl/curl.h>
+#include <curl/easy.h>
+
+static CURL *curl;
+
+static char *base;
+
+static int fetch(unsigned char *sha1)
+{
+	char *hex = sha1_to_hex(sha1);
+	char *filename = sha1_file_name(sha1);
+
+	char *url;
+	char *posn;
+	FILE *local;
+	struct stat st;
+
+	if (!stat(filename, &st)) {
+		return 0;
+	}
+
+	local = fopen(filename, "w");
+
+	if (!local) {
+		fprintf(stderr, "Couldn't open %s\n", filename);
+		return -1;
+	}
+
+	curl_easy_setopt(curl, CURLOPT_FILE, local);
+
+	url = malloc(strlen(base) + 50);
+	strcpy(url, base);
+	posn = url + strlen(base);
+	strcpy(posn, "objects/");
+	posn += 8;
+	memcpy(posn, hex, 2);
+	posn += 2;
+	*(posn++) = '/';
+	strcpy(posn, hex + 2);
+
+	curl_easy_setopt(curl, CURLOPT_URL, url);
+
+	curl_easy_perform(curl);
+
+	fclose(local);
+	
+	return 0;
+}
+
+static int process_tree(unsigned char *sha1)
+{
+	void *buffer;
+        unsigned long size;
+        char type[20];
+
+        buffer = read_sha1_file(sha1, type, &size);
+	if (!buffer)
+		return -1;
+	if (strcmp(type, "tree"))
+		return -1;
+	while (size) {
+		int len = strlen(buffer) + 1;
+		unsigned char *sha1 = buffer + len;
+		unsigned int mode;
+		int retval;
+
+		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+			return -1;
+
+		buffer = sha1 + 20;
+		size -= len + 20;
+
+		retval = fetch(sha1);
+		if (retval)
+			return -1;
+
+		if (S_ISDIR(mode)) {
+			retval = process_tree(sha1);
+			if (retval)
+				return -1;
+		}
+	}
+	return 0;
+}
+
+static int process_commit(unsigned char *sha1)
+{
+	struct revision *rev = lookup_rev(sha1);
+	if (parse_commit_object(rev))
+		return -1;
+	
+	fetch(rev->tree);
+	process_tree(rev->tree);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	char *commit_id = argv[1];
+	char *url = argv[2];
+
+	unsigned char sha1[20];
+
+	get_sha1_hex(commit_id, sha1);
+
+	curl_global_init(CURL_GLOBAL_ALL);
+
+	curl = curl_easy_init();
+
+	base = url;
+
+	fetch(sha1);
+	process_commit(sha1);
+
+	curl_global_cleanup();
+	return 0;
+}
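
fetch() in the patch above builds the object URL by hand as base,
"objects/", the first two hex digits, a slash, and the remaining 38
digits, which is why it allocates strlen(base) + 50 bytes. Below is a
minimal sketch of the same layout using snprintf with an explicit length
check. object_url() is a hypothetical helper, not something in the patch,
and like the patch it assumes that base already ends in a slash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<base>objects/<xx>/<remaining 38 hex digits>" into url.
 * Returns 0 on success, -1 if hex is not 40 characters long or the
 * result does not fit in len bytes.
 */
static int object_url(char *url, size_t len, const char *base,
		      const char *hex)
{
	int n;

	assert(url && base && hex);
	if (strlen(hex) != 40)
		return -1;
	/* "%.2s" prints only the first two characters of hex. */
	n = snprintf(url, len, "%sobjects/%.2s/%s", base, hex, hex + 2);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

The resulting string could then be handed to curl_easy_setopt() with
CURLOPT_URL, as the patch does with its own buffer.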



* [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 15:20 ` [0/5] Patch set for various things Daniel Barkalow
                     ` (2 preceding siblings ...)
  2005-04-17 15:31   ` [3/5] Add http-pull Daniel Barkalow
@ 2005-04-17 15:35   ` Daniel Barkalow
  2005-04-17 17:47     ` Petr Baudis
  2005-04-17 15:37   ` [5/5] Add commit-id to version Daniel Barkalow
  4 siblings, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 15:35 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

This adds an option (compile time, defined in the Makefile) to have a
cache of extracted blobs so that different working directories can
hardlink against them instead of creating new files for every
checkout. You should only use this if you're sure the programs you use
break links on modification and you care about storing many large working
directories with few changes at the same time.

Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Index: Makefile
===================================================================
--- 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/Makefile  (mode:100644 sha1:940ef8578cf469354002cd8feaec25d907015267)
+++ 08f7700831e056ad710af69f91e3a8a705b6b2b1/Makefile  (mode:100644 sha1:a60fa46404c0487158d232bd021e4798bc8df8de)
@@ -2,6 +2,9 @@
 # 1461501637330902918203684832716283019655932542976 hashes do not give you
 # enough guarantees about no collisions between objects ever hapenning.
 #
+# -DUSE_HARDLINK_CACHE if you want a cache of files to be hardlinked
+# to for unmodified checked out files.
+#
 # -DNSEC if you want git to care about sub-second file mtimes and ctimes.
 # Note that you need some new glibc (at least >2.2.4) for this, and it will
 # BREAK YOUR LOCAL DIFFS! show-diff and anything using it will likely randomly
Index: checkout-cache.c
===================================================================
--- 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/checkout-cache.c  (mode:100644 sha1:5d3028df0a45329e45fff2006719c9267adeb946)
+++ 08f7700831e056ad710af69f91e3a8a705b6b2b1/checkout-cache.c  (mode:100644 sha1:338588259e17dd235fdc7db759d770004a760e15)
@@ -34,6 +34,10 @@
  */
 #include "cache.h"
 
+#ifdef USE_HARDLINK_CACHE
+#define HARDLINK_CACHE ".git/blobs"
+#endif /* USE_HARDLINK_CACHE */
+
 static int force = 0, quiet = 0;
 
 static void create_directories(const char *path)
@@ -67,6 +71,80 @@
 	return fd;
 }
 
+#ifdef HARDLINK_CACHE
+
+/*
+ * NOTE! This returns a statically allocated buffer, so you have to be
+ * careful about using it. Do a "strdup()" if you need to save the
+ * filename.
+ */
+char *sha1_blob_cache_file_name(const unsigned char *sha1)
+{
+	int i;
+	static char *name, *base;
+
+	if (!base) {
+		char *sha1_file_directory = HARDLINK_CACHE;
+		int len = strlen(sha1_file_directory);
+		base = malloc(len + 60);
+		memcpy(base, sha1_file_directory, len);
+		memset(base+len, 0, 60);
+		base[len] = '/';
+		base[len+3] = '/';
+		name = base + len + 1;
+	}
+	for (i = 0; i < 20; i++) {
+		static char hex[] = "0123456789abcdef";
+		unsigned int val = sha1[i];
+		char *pos = name + i*2 + (i > 0);
+		*pos++ = hex[val >> 4];
+		*pos = hex[val & 0xf];
+	}
+	return base;
+}
+
+static int write_entry(struct cache_entry *ce)
+{
+	int fd;
+	void *new;
+	unsigned long size;
+	long wrote;
+	char type[20];
+	char *cache_name;
+	struct stat st;
+
+	cache_name = sha1_blob_cache_file_name(ce->sha1);
+
+	if (stat(cache_name, &st)) {
+		new = read_sha1_file(ce->sha1, type, &size);
+		if (!new || strcmp(type, "blob")) {
+			return error("checkout-cache: unable to read sha1 file of %s (%s)",
+				     ce->name, sha1_to_hex(ce->sha1));
+		}
+		fd = create_file(cache_name, ntohl(ce->ce_mode));
+		if (fd < 0) {
+			free(new);
+			return error("checkout-cache: unable to create %s (%s)",
+				     ce->name, strerror(errno));
+		}
+		wrote = write(fd, new, size);
+		close(fd);
+		free(new);
+		if (wrote != size)
+			return error("checkout-cache: unable to write %s", 
+				     ce->name);
+	}
+	if (link(cache_name, ce->name)) {
+		if (errno == ENOENT) {
+			create_directories(ce->name);
+			link(cache_name, ce->name);
+		}
+	}
+	return 0;
+}
+
+#else
+
 static int write_entry(struct cache_entry *ce)
 {
 	int fd;
@@ -94,6 +172,8 @@
 	return 0;
 }
 
+#endif
+
 static int checkout_entry(struct cache_entry *ce)
 {
 	struct stat st;
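
The warning in the description about programs that "break links on
modification" can be made concrete. Below is a minimal POSIX sketch,
separate from the patch, in which a working file is hard-linked to a
cached copy and then edited by writing a new file and renaming it over
the old name, so the cached copy keeps its content. All function names
here are hypothetical. An editor that instead writes in place through
the existing name would change the cached copy too.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) path and fill it with text. Returns 0 on success. */
static int write_file(const char *path, const char *text)
{
	size_t len = strlen(text);
	int fd;

	assert(path && text);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;
	if (write(fd, text, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/* "Check out" work as a hard link to the cached copy. */
static int checkout_link(const char *cached, const char *work)
{
	return link(cached, work);
}

/* Link-breaking edit: write a new file, then rename it over work. */
static int edit_breaking_link(const char *work, const char *text)
{
	char tmp[4096];
	int n = snprintf(tmp, sizeof(tmp), "%s.tmp", work);

	if (n < 0 || (size_t)n >= sizeof(tmp))
		return -1;
	if (write_file(tmp, text))
		return -1;
	return rename(tmp, work);
}

/* Return 1 if both paths name the same inode, 0 if not, -1 on error. */
static int same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) || stat(b, &sb))
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

After checkout_link() the two names share one inode; after
edit_breaking_link() the working file is a new inode and the cached blob
is untouched.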



* [5/5] Add commit-id to version
  2005-04-17 15:20 ` [0/5] Patch set for various things Daniel Barkalow
                     ` (3 preceding siblings ...)
  2005-04-17 15:35   ` [4/5] Add option for hardlinkable cache of extracted blobs Daniel Barkalow
@ 2005-04-17 15:37   ` Daniel Barkalow
  4 siblings, 0 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 15:37 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

For people who run intermediate versions of git, it is useful to know
exactly which post-release version you've installed. This adds the
commit-id to the version info, so you can tell exactly, provided you make
sure to commit before installing.

Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Index: Makefile
===================================================================
--- 08f7700831e056ad710af69f91e3a8a705b6b2b1/Makefile  (mode:100644 sha1:a60fa46404c0487158d232bd021e4798bc8df8de)
+++ 6467ed39f19b48563ff25782ebe2c6f951b0af3c/Makefile  (mode:100644 sha1:0e84e3cd12f836602b420c197e08fabefe975493)
@@ -50,7 +50,7 @@
 	@echo Generating gitversion.sh...
 	@rm -f $@
 	@echo "#!/bin/sh" > $@
-	@echo "echo \"$(shell cat $(VERSION))\"" >> $@
+	@echo "echo \"$(shell cat $(VERSION)) $(shell commit-id)\"" >> $@
 	@chmod +x $@
 
 clean:



* Re: Add merge-base
  2005-04-17 15:27   ` [2/5] Add merge-base Daniel Barkalow
@ 2005-04-17 16:01     ` Petr Baudis
  2005-04-17 16:36       ` Daniel Barkalow
  2005-04-17 16:51     ` [2.1/5] " Daniel Barkalow
  1 sibling, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-17 16:01 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: git

Dear diary, on Sun, Apr 17, 2005 at 05:27:13PM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> merge-base finds one of the best common ancestors of a pair of commits. In
> particular, it finds one of the ones which is fewest commits away from the
> further of the heads.

What does it return when I have

  A -- C
    \/   \
    /\   /
  B -- D

? >:)

I assume just either A or B, randomly?

I think it would be best if it could list all the "first-class" matches
(both A and B in this case), each on a separate line; this way the
overlay tools could choose an algorithm to evaluate those further as
they see fit - e.g. sort them by time (you might aid that by listing the
commit time in front of them), then take the first n and try to diff
them all and take the one with least changes (as suggested by Linus).

And if someone doesn't care, he just does | head -n 1 | cut -f 2.
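
One possible reading of "first-class" matches, offered here only as an
assumption, is: every common ancestor that is not itself an ancestor of
another common ancestor. Below is a minimal sketch of that definition on
a toy graph of at most 32 commits, using bitmasks for ancestor sets. All
names are hypothetical, and the unmemoized recursion is only suitable
for tiny acyclic graphs.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 32
#define MAX_PARENTS 2

struct node {
	int parents[MAX_PARENTS];    /* indices into the graph, -1 if unused */
};

/* Bitmask of n and everything reachable from it through parent links. */
static uint32_t ancestors(const struct node *g, int n)
{
	uint32_t set = UINT32_C(1) << n;
	int p;

	for (p = 0; p < MAX_PARENTS; p++)
		if (g[n].parents[p] >= 0)
			set |= ancestors(g, g[n].parents[p]);
	return set;
}

/*
 * Bitmask of the common ancestors of a and b that are not ancestors of
 * any other common ancestor.
 */
static uint32_t best_common_ancestors(const struct node *g, int nr,
				      int a, int b)
{
	uint32_t common, best;
	int i, p;

	assert(nr <= MAX_NODES);
	common = ancestors(g, a) & ancestors(g, b);
	best = common;
	for (i = 0; i < nr; i++) {
		if (!(common & (UINT32_C(1) << i)))
			continue;
		/* Everything strictly above a common ancestor is redundant. */
		for (p = 0; p < MAX_PARENTS; p++)
			if (g[i].parents[p] >= 0)
				best &= ~ancestors(g, g[i].parents[p]);
	}
	return best;
}
```

With A and B as commits 0 and 1, and C and D as commits 2 and 3 that each
have both A and B as parents, the result for (C, D) has bits 0 and 1 set,
that is, both A and B.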

> Index: merge-base.c
> ===================================================================
> --- /dev/null  (tree:37a0b01b85c2999243674d48bfc71cdba0e5518e)
> +++ d662b707e11391f6cfe597fd4d0bf9c41d34d01a/merge-base.c  (mode:100644 sha1:0f85e7d9e9a896d1142a54170ddf1159f11f9cdd)
> @@ -0,0 +1,108 @@
> +#include <stdlib.h>
> +#include "cache.h"
> +#include "revision.h"
> +
> +struct revision *common_ancestor(struct revision *rev1, struct revision *rev2)
> +{
> +	struct parent *parent;
> +
> +	struct parent *rev1list = malloc(sizeof(struct parent));
> +	struct parent *rev2list = malloc(sizeof(struct parent));

Did I overlook anything, or could you have just a single revlist?

> +        

I smell trailing whitespace!

> +	struct parent *posn, *temp;
> +
> +	rev1list->parent = rev1;
> +	rev1list->next = NULL;
> +
> +	rev2list->parent = rev2;
> +	rev2list->next = NULL;
> +
> +	while (rev1list || rev2list) {
> +		posn = rev1list;
> +		rev1list = NULL;
> +		while (posn) {
> +			parse_commit_object(posn->parent);
> +			if (posn->parent->flags & 0x0001) {
> +				/*
> +				printf("1 already seen %s %x\n",
> +				       sha1_to_hex(posn->parent->sha1),
> +				       posn->parent->flags);
> +				*/
> +                                // do nothing

Mostly for consistency, I'd prefer you to use /* */ comments in general.

I think a terrified squeak at stderr in this situation (possibly
suggesting fsck-cache) might be appropriate.

> +			} else if (posn->parent->flags & 0x0002) {
> +                                // XXXX free lists

Hmm, so, why not free the lists?

> +				return posn->parent;
> +			} else {
> +				/*
> +				printf("1 based on %s\n",
> +				       sha1_to_hex(posn->parent->sha1));
> +				*/
> +				posn->parent->flags |= 0x0001;
> +
> +				parent = posn->parent->parent;
> +				while (parent) {
> +					temp = malloc(sizeof(struct parent));
> +					temp->next = rev1list;
> +					temp->parent = parent->parent;
> +					rev1list = temp;
> +					parent = parent->next;
> +				}
> +			}
> +			posn = posn->next;
> +		}
> +		posn = rev2list;
> +		rev2list = NULL;
> +		while (posn) {
> +			parse_commit_object(posn->parent);
> +			if (posn->parent->flags & 0x0002) {
> +				/*
> +				printf("2 already seen %s\n",
> +				       sha1_to_hex(posn->parent->sha1));
> +				*/
> +                                // do nothing
> +			} else if (posn->parent->flags & 0x0001) {
> +                                // XXXX free lists
> +				return posn->parent;
> +			} else {
> +				/*
> +				printf("2 based on %s\n",
> +				       sha1_to_hex(posn->parent->sha1));
> +				*/
> +				posn->parent->flags |= 0x0002;
> +
> +				parent = posn->parent->parent;
> +				while (parent) {
> +					temp = malloc(sizeof(struct parent));
> +					temp->next = rev2list;
> +					temp->parent = parent->parent;
> +					rev2list = temp;
> +					parent = parent->next;
> +				}
> +			}
> +			posn = posn->next;
> +		}

Symmetrical notes apply to this half. Actually, they are too similar.
What about factoring them into a common function?

> +	}
> +	return NULL;
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	struct revision *rev1, *rev2, *ret;
> +	unsigned char rev1key[20], rev2key[20];

A newline here please.

> +	if (argc != 3 ||
> +	    get_sha1_hex(argv[1], rev1key) ||
> +	    get_sha1_hex(argv[2], rev2key)) {
> +		usage("mergebase <commit-id> <commit-id>");
> +	}
> +	rev1 = lookup_rev(rev1key);
> +	rev2 = lookup_rev(rev2key);
> +	ret = common_ancestor(rev1, rev2);
> +	if (ret) {
> +		printf("%s\n", sha1_to_hex(ret->sha1));
> +		return 0;
> +	} else {
> +		printf("Sorry.\n");
> +		return 1;

Please stay silent if you don't have anything useful to say.

> +	}
> +	
> +}

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Parsing code in revision.h
  2005-04-17 15:24   ` [1/5] Parsing code in revision.h Daniel Barkalow
@ 2005-04-17 16:09     ` Petr Baudis
  2005-04-17 16:44       ` Daniel Barkalow
  2005-04-17 18:18     ` [1/5] " Linus Torvalds
  1 sibling, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-17 16:09 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: git

Dear diary, on Sun, Apr 17, 2005 at 05:24:20PM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> This adds support to revision.h for parsing commit records (but not going
> any further than parsing a single record). Something like this is needed
> by anything that uses revision.h, but older programs open-code it.
> 
> Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>

Could you please convert the current users (rev-tree.c and fsck-cache.c)
to use this in the same patch?

> Index: revision.h
> ===================================================================
> --- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/revision.h  (mode:100644 sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
> +++ 37a0b01b85c2999243674d48bfc71cdba0e5518e/revision.h  (mode:100644 sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
> @@ -24,6 +24,7 @@
>  	unsigned int flags;
>  	unsigned char sha1[20];
>  	unsigned long date;
> +	unsigned char tree[20];
>  	struct parent *parent;
>  };
>  
> @@ -111,4 +112,29 @@
>  	}
>  }
>  
> +static int parse_commit_object(struct revision *rev)
> +{
> +	if (!(rev->flags & SEEN)) {
> +		void *buffer, *bufptr;
> +		unsigned long size;
> +		char type[20];
> +		unsigned char parent[20];
> +
> +		rev->flags |= SEEN;
> +		buffer = bufptr = read_sha1_file(rev->sha1, type, &size);
> +		if (!buffer || strcmp(type, "commit"))
> +			return -1;
> +		get_sha1_hex(bufptr + 5, rev->tree);
> +		bufptr += 46; /* "tree " + "hex sha1" + "\n" */
> +		while (!memcmp(bufptr, "parent ", 7) && 
> +		       !get_sha1_hex(bufptr+7, parent)) {
> +			add_relationship(rev, parent);
> +			bufptr += 48;   /* "parent " + "hex sha1" + "\n" */
> +		}
> +		//rev->date = parse_commit_date(bufptr);

I don't like this.

> +		free(buffer);
> +	}
> +	return 0;
> +}
> +
>  #endif /* REVISION_H */

BTW, I think that in the longer term having this stuffed in revision.h is
a bad idea; we should have revision.c. I will accept patches putting the
stuff in revision.h for now, though (unless it gets outrageous).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: Add merge-base
  2005-04-17 16:01     ` Petr Baudis
@ 2005-04-17 16:36       ` Daniel Barkalow
  0 siblings, 0 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 16:36 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

On Sun, 17 Apr 2005, Petr Baudis wrote:

> Dear diary, on Sun, Apr 17, 2005 at 05:27:13PM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > merge-base finds one of the best common ancestors of a pair of commits. In
> > particular, it finds one of the ones which is fewest commits away from the
> > further of the heads.
> 
> What does it return when I have
> 
>   A -- C
>     \/   \
>     /\   /
>   B -- D
> 
> ? >:)
> 
> I assume just either A or B, randomly?

Essentially, yes.

> I think it would be best if it could list all the "first-class" matches
> (both A and B in this case), each on a separate line; this way the
> overlay tools could choose an algorithm to evaluate those further as
> they see fit - e.g. sort them by time (you might aid that by listing the
> commit time in front of them), then take the first n and try to diff
> them all and take the one with least changes (as suggested by Linus).

It's actually kind of tricky to get all of the "best" ancestors without
getting any useless ancestors; the "best" criterion is maintained in the
current version by stopping as soon as possible.

I think that the real solution would be to have a merge program that
interacts back and forth with the revision history processor, since I
think that merges for which the choice of ancestor matters (for whether it
gives a conflict) would benefit most directly and clearly from figuring
out the histories of the conflicting changes, not choosing different
ancestors.

If someone comes up with an algorithm that wants an alternative ancestor
rather than more interactive stuff, I can work on getting a complete list.

> > Index: merge-base.c
> > ===================================================================
> > --- /dev/null  (tree:37a0b01b85c2999243674d48bfc71cdba0e5518e)
> > +++ d662b707e11391f6cfe597fd4d0bf9c41d34d01a/merge-base.c  (mode:100644 sha1:0f85e7d9e9a896d1142a54170ddf1159f11f9cdd)
> > @@ -0,0 +1,108 @@
> > +#include <stdlib.h>
> > +#include "cache.h"
> > +#include "revision.h"
> > +
> > +struct revision *common_ancestor(struct revision *rev1, struct revision *rev2)
> > +{
> > +	struct parent *parent;
> > +
> > +	struct parent *rev1list = malloc(sizeof(struct parent));
> > +	struct parent *rev2list = malloc(sizeof(struct parent));
> 
> Did I overlook anything, or could you have just a single revlist?

I tried with just one, but I couldn't keep it straight in my
head. rev1list holds the unmarked ancestors of rev1; rev2list holds the
unmarked ancestors of rev2.

> > +	struct parent *posn, *temp;
> > +
> > +	rev1list->parent = rev1;
> > +	rev1list->next = NULL;
> > +
> > +	rev2list->parent = rev2;
> > +	rev2list->next = NULL;
> > +
> > +	while (rev1list || rev2list) {
> > +		posn = rev1list;
> > +		rev1list = NULL;
> > +		while (posn) {
> > +			parse_commit_object(posn->parent);
> > +			if (posn->parent->flags & 0x0001) {
> > +				/*
> > +				printf("1 already seen %s %x\n",
> > +				       sha1_to_hex(posn->parent->sha1),
> > +				       posn->parent->flags);
> > +				*/
> > +                                // do nothing
> 
> Mostly for consistency, I'd prefer you to use /* */ comments in general.

Sure.

> I think a terrified squeak at stderr in this situation (possibly
> suggesting fsck-cache) might be appropriate.

No, this is normal; it indicates that tree 1 has a recent little merge:

orig --------------- tree 2
 \
  --- X -- Y -- Z -- tree 1
       \       /
        -- A --

When we see X for A, we've already seen it for Y, but that's fine. I get
this case when I merge with you after you merge twice with Linus since I
last merged.
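
The diagram above can be checked mechanically. Below is a minimal sketch,
with hypothetical names, that walks one head's history breadth-first and
counts how often it arrives at a commit it has already marked. For tree 1
in the diagram this happens exactly once, at X, which is reached through
both Y and A.

```c
#include <assert.h>

#define MAX_NODES 32
#define MAX_PARENTS 2

struct node {
	int parents[MAX_PARENTS];    /* indices into the graph, -1 if unused */
	int seen;
};

/*
 * Walk the history of head breadth-first. Returns how many times an
 * already-seen commit was reached again, and stores the number of
 * distinct commits visited in *distinct.
 */
static int count_revisits(struct node *g, int nr, int head, int *distinct)
{
	/* Each commit is expanded once, adding at most MAX_PARENTS entries. */
	int queue[MAX_NODES * MAX_PARENTS + 1];
	int qhead = 0, qtail = 0, revisits = 0, i, p;

	assert(nr <= MAX_NODES && head >= 0 && head < nr && distinct);
	for (i = 0; i < nr; i++)
		g[i].seen = 0;
	*distinct = 0;
	queue[qtail++] = head;
	while (qhead < qtail) {
		struct node *cur = &g[queue[qhead++]];

		if (cur->seen) {
			revisits++;      /* normal after a merge, not an error */
			continue;
		}
		cur->seen = 1;
		(*distinct)++;
		for (p = 0; p < MAX_PARENTS; p++)
			if (cur->parents[p] >= 0)
				queue[qtail++] = cur->parents[p];
	}
	return revisits;
}
```

A revisit count above zero only says that the history contains a merge,
so it would not be a reason to suggest fsck-cache.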

> > +			} else if (posn->parent->flags & 0x0002) {
> > +                                // XXXX free lists
> 
> Hmm, so, why not free the lists?

Ah, details; mainly, I want to wait until revision.h is cleaner before
fixing this sort of thing.

> Symmetrical notes apply to this half. Actually, they are too similar.
> What about factoring them into a common function?

Sure.

Fixed version to follow.

	-Daniel
*This .sig left intentionally blank*



* Re: Parsing code in revision.h
  2005-04-17 16:09     ` Petr Baudis
@ 2005-04-17 16:44       ` Daniel Barkalow
  0 siblings, 0 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 16:44 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

On Sun, 17 Apr 2005, Petr Baudis wrote:

> Dear diary, on Sun, Apr 17, 2005 at 05:24:20PM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > This adds support to revision.h for parsing commit records (but not going
> > any further than parsing a single record). Something like this is needed
> > by anything that uses revision.h, but older programs open-code it.
> > 
> > Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
> 
> Could you please convert the current users (rev-tree.c and fsck-cache.c)
> to use this in the same patch?

They do things somewhat differently, so it would be more intrusive. Could
I send an extra patch to convert them instead of doing them here?

> > Index: revision.h
> > ===================================================================
> > --- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/revision.h  (mode:100644 sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
> > +++ 37a0b01b85c2999243674d48bfc71cdba0e5518e/revision.h  (mode:100644 sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
> > @@ -24,6 +24,7 @@
> >  	unsigned int flags;
> >  	unsigned char sha1[20];
> >  	unsigned long date;
> > +	unsigned char tree[20];
> >  	struct parent *parent;
> >  };
> >  
> > @@ -111,4 +112,29 @@
> >  	}
> >  }
> >  
> > +static int parse_commit_object(struct revision *rev)
> > +{
> > +	if (!(rev->flags & SEEN)) {
> > +		void *buffer, *bufptr;
> > +		unsigned long size;
> > +		char type[20];
> > +		unsigned char parent[20];
> > +
> > +		rev->flags |= SEEN;
> > +		buffer = bufptr = read_sha1_file(rev->sha1, type, &size);
> > +		if (!buffer || strcmp(type, "commit"))
> > +			return -1;
> > +		get_sha1_hex(bufptr + 5, rev->tree);
> > +		bufptr += 46; /* "tree " + "hex sha1" + "\n" */
> > +		while (!memcmp(bufptr, "parent ", 7) && 
> > +		       !get_sha1_hex(bufptr+7, parent)) {
> > +			add_relationship(rev, parent);
> > +			bufptr += 48;   /* "parent " + "hex sha1" + "\n" */
> > +		}
> > +		//rev->date = parse_commit_date(bufptr);
> 
> I don't like this.

Yeah, that's left over from the not-quite-identical parsing code in the
other programs.

> > +		free(buffer);
> > +	}
> > +	return 0;
> > +}
> > +
> >  #endif /* REVISION_H */
> 
> BTW, I think that in longer term having this stuffed in revision.h is a
> bad idea, we should have revision.c. I will accept patches putting the
> stuff to revision.h for now, though (unless it gets outrageous).

I'd actually like to make them commit.{c,h}, since the system calls the
things they actually deal in commits, not revisions. But this is getting
into stuff that's likely to cause painful divergence from Linus's repo,
which is why I'm a bit leery of actually doing it now.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [2.1/5] Add merge-base
  2005-04-17 15:27   ` [2/5] Add merge-base Daniel Barkalow
  2005-04-17 16:01     ` Petr Baudis
@ 2005-04-17 16:51     ` Daniel Barkalow
  2005-04-17 21:21       ` Petr Baudis
  1 sibling, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 16:51 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

merge-base finds one of the best common ancestors of a pair of commits. In
particular, it finds one that is the fewest commits away from the farther
of the two heads.

Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Index: Makefile
===================================================================
--- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/Makefile  (mode:100644 sha1:346e3850de026485802e41e16a1180be2df85e4a)
+++ 7d806c2d3be8f87d3d4d87e5254500d7fc24476b/Makefile  (mode:100644 sha1:0e84e3cd12f836602b420c197e08fabefe975493)
@@ -14,7 +17,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-	check-files ls-tree
+	check-files ls-tree merge-base
 
 SCRIPT=	parent-id tree-id git gitXnormid.sh gitadd.sh gitaddremote.sh \
 	gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
Index: merge-base.c
===================================================================
--- /dev/null  (tree:45f926575d2c44072bfcf2317dbf3f0fbb513a4e)
+++ 7d806c2d3be8f87d3d4d87e5254500d7fc24476b/merge-base.c  (mode:100644 sha1:ee979c7532cbdf823e9930993b0dd8f97aadb21f)
@@ -0,0 +1,95 @@
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+
+static struct revision *process_list(struct parent **list_p, int this_mark,
+				     int other_mark)
+{
+	struct parent *parent, *temp;
+	struct parent *posn = *list_p;
+	*list_p = NULL;
+	while (posn) {
+		parse_commit_object(posn->parent);
+		if (posn->parent->flags & this_mark) {
+			/*
+			  printf("%d already seen %s %x\n",
+			  this_mark,
+			  sha1_to_hex(posn->parent->sha1),
+			  posn->parent->flags);
+			*/
+			/* do nothing; this indicates that this side
+			 * split and reformed, and we only need to
+			 * mark it once.
+			 */
+		} else if (posn->parent->flags & other_mark) {
+			return posn->parent;
+		} else {
+			/*
+			  printf("%d based on %s\n",
+			  this_mark,
+			  sha1_to_hex(posn->parent->sha1));
+			*/
+			posn->parent->flags |= this_mark;
+			
+			parent = posn->parent->parent;
+			while (parent) {
+				temp = malloc(sizeof(struct parent));
+				temp->next = *list_p;
+				temp->parent = parent->parent;
+				*list_p = temp;
+				parent = parent->next;
+			}
+		}
+		posn = posn->next;
+	}
+	return NULL;
+}
+
+struct revision *common_ancestor(struct revision *rev1, struct revision *rev2)
+{
+	struct parent *rev1list = malloc(sizeof(struct parent));
+	struct parent *rev2list = malloc(sizeof(struct parent));
+
+	rev1list->parent = rev1;
+	rev1list->next = NULL;
+
+	rev2list->parent = rev2;
+	rev2list->next = NULL;
+
+	while (rev1list || rev2list) {
+		struct revision *ret;
+		ret = process_list(&rev1list, 0x1, 0x2);
+		if (ret) {
+			/* XXXX free lists */
+			return ret;
+		}
+		ret = process_list(&rev2list, 0x2, 0x1);
+		if (ret) {
+			/* XXXX free lists */
+			return ret;
+		}
+	}
+	return NULL;
+}
+
+int main(int argc, char **argv)
+{
+	struct revision *rev1, *rev2, *ret;
+	unsigned char rev1key[20], rev2key[20];
+
+	if (argc != 3 ||
+	    get_sha1_hex(argv[1], rev1key) ||
+	    get_sha1_hex(argv[2], rev2key)) {
+		usage("merge-base <commit-id> <commit-id>");
+	}
+	rev1 = lookup_rev(rev1key);
+	rev2 = lookup_rev(rev2key);
+	ret = common_ancestor(rev1, rev2);
+	if (ret) {
+		printf("%s\n", sha1_to_hex(ret->sha1));
+		return 0;
+	} else {
+		return 1;
+	}
+	
+}
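
The alternating search in the patch above can be illustrated on a toy
graph. This is only a minimal sketch under stated assumptions: it uses a
hypothetical in-memory "struct node" with at most two parents and a fixed
frontier size in place of the real struct revision / struct parent lists,
and it never touches the object database.

```c
#include <stddef.h>

#define MARK_A 0x1
#define MARK_B 0x2
#define MAX_FRONTIER 64	/* hypothetical bound, plenty for a toy graph */

/* Hypothetical stand-in for struct revision: at most two parents. */
struct node {
	const char *name;
	unsigned int flags;
	struct node *parent[2];
};

/*
 * Advance one side's frontier by one generation.  Returns a node already
 * marked by the other side, or NULL if none was found in this step.
 * Parents beyond MAX_FRONTIER are silently dropped (toy limitation).
 */
static struct node *step(struct node **frontier, int *count,
			 unsigned int this_mark, unsigned int other_mark)
{
	struct node *next[MAX_FRONTIER];
	int n = 0, i, p;

	for (i = 0; i < *count; i++) {
		struct node *cur = frontier[i];

		if (cur->flags & this_mark)
			continue;	/* this side split and rejoined here */
		if (cur->flags & other_mark)
			return cur;	/* the other side got here first */
		cur->flags |= this_mark;
		for (p = 0; p < 2; p++)
			if (cur->parent[p] && n < MAX_FRONTIER)
				next[n++] = cur->parent[p];
	}
	for (i = 0; i < n; i++)
		frontier[i] = next[i];
	*count = n;
	return NULL;
}

/* Alternate between the two heads until one side meets the other. */
static struct node *toy_common_ancestor(struct node *a, struct node *b)
{
	struct node *fa[MAX_FRONTIER], *fb[MAX_FRONTIER];
	int na = 1, nb = 1;

	fa[0] = a;
	fb[0] = b;
	while (na || nb) {
		struct node *hit;

		if ((hit = step(fa, &na, MARK_A, MARK_B)))
			return hit;
		if ((hit = step(fb, &nb, MARK_B, MARK_A)))
			return hit;
	}
	return NULL;
}
```

Because each side only advances one generation per turn, the search
stops without reading history older than the meeting point, at the cost
of not guaranteeing any particular choice among several candidates.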


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 15:35   ` [4/5] Add option for hardlinkable cache of extracted blobs Daniel Barkalow
@ 2005-04-17 17:47     ` Petr Baudis
  2005-04-17 18:54       ` Daniel Barkalow
  2005-04-17 19:25       ` Paul Jackson
  0 siblings, 2 replies; 49+ messages in thread
From: Petr Baudis @ 2005-04-17 17:47 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: git

Dear diary, on Sun, Apr 17, 2005 at 05:35:19PM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> Index: checkout-cache.c
> ===================================================================
> --- 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/checkout-cache.c  (mode:100644 sha1:5d3028df0a45329e45fff2006719c9267adeb946)
> +++ 08f7700831e056ad710af69f91e3a8a705b6b2b1/checkout-cache.c  (mode:100644 sha1:338588259e17dd235fdc7db759d770004a760e15)
> @@ -67,6 +71,80 @@
>  	return fd;
>  }
>  
> +#ifdef HARDLINK_CACHE
> +
> +/*
> + * NOTE! This returns a statically allocated buffer, so you have to be
> + * careful about using it. Do a "strdup()" if you need to save the
> + * filename.
> + */
> +char *sha1_blob_cache_file_name(const unsigned char *sha1)
> +{
..code basically identical with sha1_file_name()..
> +}

You can guess what I would like you to do. ;-)

> +
> +static int write_entry(struct cache_entry *ce)
> +{
> +	int fd;
> +	void *new;
> +	unsigned long size;
> +	long wrote;
> +	char type[20];
> +	char *cache_name;
> +	struct stat st;
> +
> +	cache_name = sha1_blob_cache_file_name(ce->sha1);
> +
> +	if (stat(cache_name, &st)) {
..basically cut'n'paste of non-hardlinking write_entry()..

BTW, I'd just use access(F_OK) instead of stat() if I don't care about
the file's stat at all anyway.

> +	}
> +	if (link(cache_name, ce->name)) {
> +		if (errno == ENOENT) {
> +			create_directories(ce->name);
> +			link(cache_name, ce->name);
> +		}
> +	}
> +	return 0;
> +}

I think it would be better to have this as hardlink_entry(), and to have
write_entry() take the name of the file to write the entry to. Then
checkout_cache() should explicitly choose between the two.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [3/5] Add http-pull
  2005-04-17 15:31   ` [3/5] Add http-pull Daniel Barkalow
@ 2005-04-17 18:10     ` Petr Baudis
  2005-04-17 18:49       ` Daniel Barkalow
  2005-04-17 18:58     ` [3.1/5] " Daniel Barkalow
  1 sibling, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-17 18:10 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: git

Dear diary, on Sun, Apr 17, 2005 at 05:31:16PM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> http-pull is a program that downloads from a (normal) HTTP server a commit
> and all of the tree and blob objects it refers to (but not other commits,
> etc.). Options could be used to make it download a larger or different
> selection of objects. It depends on libcurl, which I forgot to mention in
> the README again.
> 
> Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>

So, while you will be resending the patch, please update the README.

> Index: Makefile
> ===================================================================
> --- d662b707e11391f6cfe597fd4d0bf9c41d34d01a/Makefile  (mode:100644 sha1:b2ce7c5b63fffca59653b980d98379909f893d44)
> +++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/Makefile  (mode:100644 sha1:940ef8578cf469354002cd8feaec25d907015267)
> @@ -35,6 +35,7 @@
>  
>  LIBS= -lssl -lz
>  
> +http-pull: LIBS += -lcurl
>  
>  $(PROG):%: %.o $(COMMON)
>  	$(CC) $(CFLAGS) -o $@ $^ $(LIBS)

Whew. Looks like an awful trick, you say this works?! :-)

At times, I wouldn't want to be a GNU make parser.

> Index: http-pull.c
> ===================================================================
> --- /dev/null  (tree:d662b707e11391f6cfe597fd4d0bf9c41d34d01a)
> +++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/http-pull.c  (mode:100644 sha1:106ca31239e6afe6784e7c592234406f5c149e44)
> @@ -0,0 +1,126 @@
> +	if (!stat(filename, &st)) {
> +		return 0;
> +	}

access()

> +	url = malloc(strlen(base) + 50);

Off-by-one. What about the trailing NUL?

> +	strcpy(url, base);
> +	posn = url + strlen(base);
> +	strcpy(posn, "objects/");
> +	posn += 8;
> +	memcpy(posn, hex, 2);
> +	posn += 2;
> +	*(posn++) = '/';
> +	strcpy(posn, hex + 2);


> +static int process_tree(unsigned char *sha1)
> +{
> +	void *buffer;
> +        unsigned long size;
> +        char type[20];
> +
> +        buffer = read_sha1_file(sha1, type, &size);

Something with your whitespaces is wrong here. ;-)

> +	fetch(rev->tree);
> +	process_tree(rev->tree);

> +	fetch(sha1);
> +	process_commit(sha1);

You are ignoring the return codes of your own routines everywhere.
You should use error() instead of a plain -1, BTW.


I think you should have at least two disjunct modes - either you are
downloading everything related to the given commit, or you are
downloading all commit records for commit predecessors.

Even if you might not want all the intermediate trees, you definitely
want the intermediate commits, to keep the history graph contiguous.

So in git pull, I'd imagine to do

	http-pull -c $new_head
	http-pull -t $(tree-id $new_head)

So, -c would fetch a given commit and all its predecessors until it hits
what you already have on your side. -t would fetch a given tree with all
files and subtrees and everything. http-pull shouldn't default on
either, since they are mutually exclusive.

What do you think?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [1/5] Parsing code in revision.h
  2005-04-17 15:24   ` [1/5] Parsing code in revision.h Daniel Barkalow
  2005-04-17 16:09     ` Petr Baudis
@ 2005-04-17 18:18     ` Linus Torvalds
  2005-04-17 18:30       ` Petr Baudis
  2005-04-17 19:09       ` Daniel Barkalow
  1 sibling, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2005-04-17 18:18 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: Petr Baudis, git



On Sun, 17 Apr 2005, Daniel Barkalow wrote:
>
> --- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/revision.h  (mode:100644 sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
> +++ 37a0b01b85c2999243674d48bfc71cdba0e5518e/revision.h  (mode:100644 sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
> @@ -24,6 +24,7 @@
>  	unsigned int flags;
>  	unsigned char sha1[20];
>  	unsigned long date;
> +	unsigned char tree[20];
>  	struct parent *parent;
>  };
>  

I think this is really wrong.

The whole point of "revision.h" is that it's a generic framework for 
keeping track of relationships between different objects. And those 
objects are in no way just "commit" objects.

For example, fsck uses this "struct revision" to create a full tree of
_all_ the object dependencies, which means that a "struct revision" can be 
any object at all - it's not in any way limited to commit objects, and 
there is no "tree" object that is associated with these things at all.

Besides, why do you want the tree? There's really nothing you can do with 
the tree to a first approximation - you need to _first_ do the 
reachability analysis entirely on the commit dependencies, and then when 
you've selected a set of commits, you can just output those.

Later phases will indeed look up what the tree is, but that's only after
you've decided on the commit object. There's no point in looking up (or
even trying to just remember) _all_ the tree objects.

Hmm?

		Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [1/5] Parsing code in revision.h
  2005-04-17 18:18     ` [1/5] " Linus Torvalds
@ 2005-04-17 18:30       ` Petr Baudis
  2005-04-17 19:25         ` Linus Torvalds
  2005-04-17 19:09       ` Daniel Barkalow
  1 sibling, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-17 18:30 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Daniel Barkalow, git

Dear diary, on Sun, Apr 17, 2005 at 08:18:47PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> 
> 
> On Sun, 17 Apr 2005, Daniel Barkalow wrote:
> >
> > --- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/revision.h  (mode:100644 sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
> > +++ 37a0b01b85c2999243674d48bfc71cdba0e5518e/revision.h  (mode:100644 sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
> > @@ -24,6 +24,7 @@
> >  	unsigned int flags;
> >  	unsigned char sha1[20];
> >  	unsigned long date;
> > +	unsigned char tree[20];
> >  	struct parent *parent;
> >  };
> >  
> 
> I think this is really wrong.
> 
> The whole point of "revision.h" is that it's a generic framework for 
> keeping track of relationships between different objects. And those 
> objects are in no way just "commit" objects.

Someone started the avalanche by adding date to the structure. Of
course, date is smaller, but it leads people (including me) out of the
way.

Perhaps struct commit which will have struct revision (ugh - what about
rather struct object?) as a member?

> For example, fsck uses this "struct revision" to create a full tree of
> _all_ the object dependencies, which means that a "struct revision" can be 
> any object at all - it's not in any way limited to commit objects, and 
> there is no "tree" object that is associated with these things at all.

That's some really bad naming then.

> Besides, why do you want the tree? There's really nothing you can do with 
> the tree to a first approximation - you need to _first_ do the 
> reachability analysis entirely on the commit dependencies, and then when 
> you've selected a set of commits, you can just output those.
> 
> Later phases will indeed look up what the tree is, but that's only after
> you've decided on the commit object. There's no point in looking up (or
> even trying to just remember) _all_ the tree objects.

The goal was to have a commit record parser which would spit out this
structure containing all the relevant info, but I can agree that wasting
memory with it makes no sense. Perhaps it could take a possibly-NULL
buffer pointer where it would drop the tree ID, Daniel?
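
The possibly-NULL output pointer suggested here could look roughly like
the following. It is a sketch only: parse_commit_header(), hex_to_sha1()
and hexval() are hypothetical names, the record is passed in as a plain
buffer instead of being read with read_sha1_file(), and parents are only
counted. The 46- and 48-byte strides are the ones used in the quoted
patch.

```c
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not hex. */
static int hexval(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Decode 40 hex characters into 20 bytes; returns 0 on success. */
static int hex_to_sha1(const char *hex, unsigned char *sha1)
{
	int i;

	for (i = 0; i < 20; i++) {
		int hi = hexval(hex[2 * i]);
		int lo = hi < 0 ? -1 : hexval(hex[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		sha1[i] = (unsigned char)((hi << 4) | lo);
	}
	return 0;
}

/*
 * Parse the header of a commit record held in buf[0..size).
 * tree_out may be NULL if the caller does not want the tree ID.
 * Returns the number of parent lines, or -1 on a malformed record.
 */
static int parse_commit_header(const char *buf, size_t size,
			       unsigned char *tree_out)
{
	unsigned char scratch[20];
	int parents = 0;

	/* "tree " + 40 hex digits + "\n" = 46 bytes */
	if (size < 46 || memcmp(buf, "tree ", 5) || buf[45] != '\n')
		return -1;
	if (hex_to_sha1(buf + 5, tree_out ? tree_out : scratch))
		return -1;
	buf += 46;
	size -= 46;

	/* each "parent " + 40 hex digits + "\n" = 48 bytes */
	while (size >= 48 && !memcmp(buf, "parent ", 7) &&
	       buf[47] == '\n' && !hex_to_sha1(buf + 7, scratch)) {
		parents++;
		buf += 48;
		size -= 48;
	}
	return parents;
}
```

A caller that only walks the commit graph would pass NULL for tree_out
and so never keeps a tree ID around; something like http-pull would pass
a 20-byte buffer.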

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [3/5] Add http-pull
  2005-04-17 18:10     ` Petr Baudis
@ 2005-04-17 18:49       ` Daniel Barkalow
  2005-04-17 19:08         ` Petr Baudis
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 18:49 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

On Sun, 17 Apr 2005, Petr Baudis wrote:

> > Index: Makefile
> > ===================================================================
> > --- d662b707e11391f6cfe597fd4d0bf9c41d34d01a/Makefile  (mode:100644 sha1:b2ce7c5b63fffca59653b980d98379909f893d44)
> > +++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/Makefile  (mode:100644 sha1:940ef8578cf469354002cd8feaec25d907015267)
> > @@ -35,6 +35,7 @@
> >  
> >  LIBS= -lssl -lz
> >  
> > +http-pull: LIBS += -lcurl
> >  
> >  $(PROG):%: %.o $(COMMON)
> >  	$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
> 
> Whew. Looks like an awful trick, you say this works?! :-)
> 
> At times, I wouldn't want to be a GNU make parser.

Yup. GNU make is big on the features which do the obvious thing, even when
you can't believe they work. This is probably why nobody's managed to
replace it.

> > Index: http-pull.c
> > ===================================================================
> > --- /dev/null  (tree:d662b707e11391f6cfe597fd4d0bf9c41d34d01a)
> > +++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/http-pull.c  (mode:100644 sha1:106ca31239e6afe6784e7c592234406f5c149e44)
> > +	url = malloc(strlen(base) + 50);
> 
> Off-by-one. What about the trailing NUL?

I get length(base) + "objects/"=8 + 40 SHA1 + 1 for '/' and 1 for NUL = 50.
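
The same arithmetic can be written down so that the count is visible in
the code. A minimal sketch, assuming base already ends in '/' and that
hex is a 40-character object name; object_url() is a hypothetical helper,
not a function from the patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<base>objects/xx/yyyy..." for a 40-character hex object name.
 * base is assumed to end in '/'.  Returns a malloc()ed string the caller
 * must free(), or NULL on bad input or allocation failure.
 */
static char *object_url(const char *base, const char *hex)
{
	/* "objects/" (8) + 40 hex digits + '/' (1) + NUL (1) = 50 */
	size_t len = strlen(base) + 8 + 40 + 1 + 1;
	char *url;

	if (strlen(hex) != 40)
		return NULL;
	url = malloc(len);
	if (!url)
		return NULL;
	snprintf(url, len, "%sobjects/%.2s/%s", base, hex, hex + 2);
	return url;
}
```

Using snprintf() with the computed length means a miscount would
truncate the URL rather than overrun the buffer.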

> I think you should have at least two disjunct modes - either you are
> downloading everything related to the given commit, or you are
> downloading all commit records for commit predecessors.
> 
> Even if you might not want all the intermediate trees, you definitely
> want the intermediate commits, to keep the history graph contiguous.
> 
> So in git pull, I'd imagine to do
> 
> 	http-pull -c $new_head
> 	http-pull -t $(tree-id $new_head)
> 
> So, -c would fetch a given commit and all its predecessors until it hits
> what you already have on your side. -t would fetch a given tree with all
> files and subtrees and everything. http-pull shouldn't default on
> either, since they are mutually exclusive.
> 
> What do you think?

I think I'd rather keep the current behavior and add a -c for getting the
history of commits, and maybe a -a for getting the history of commits and
their trees.

There's some trickiness for the history of commits thing for stopping at
the point where you have everything, but also behaving appropriately if
you try once, fail partway through, and then try again. It's on my queue
of things to think about.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 17:47     ` Petr Baudis
@ 2005-04-17 18:54       ` Daniel Barkalow
  2005-04-17 19:25       ` Paul Jackson
  1 sibling, 0 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 18:54 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

Drop this one for now; I'll revisit it once more important stuff is
settled down.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [3.1/5] Add http-pull
  2005-04-17 15:31   ` [3/5] Add http-pull Daniel Barkalow
  2005-04-17 18:10     ` Petr Baudis
@ 2005-04-17 18:58     ` Daniel Barkalow
  1 sibling, 0 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 18:58 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

http-pull is a program that downloads from a (normal) HTTP server a commit
and all of the tree and blob objects it refers to (but not other commits,
etc.). Options could be used to make it download a larger or different
selection of objects.

Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Index: Makefile
===================================================================
--- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/Makefile  (mode:100644 sha1:346e3850de026485802e41e16a1180be2df85e4a)
+++ 3eae85f66143160a26f5545d197862c89e2a8fb8/Makefile  (mode:100644 sha1:0e84e3cd12f836602b420c197e08fabefe975493)
@@ -14,7 +17,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-	check-files ls-tree merge-base
+	check-files ls-tree http-pull merge-base
 
 SCRIPT=	parent-id tree-id git gitXnormid.sh gitadd.sh gitaddremote.sh \
 	gitcommit.sh gitdiff-do gitdiff.sh gitlog.sh gitls.sh gitlsobj.sh \
@@ -35,6 +38,7 @@
 
 LIBS= -lssl -lz
 
+http-pull: LIBS += -lcurl
 
 $(PROG):%: %.o $(COMMON)
 	$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
Index: README
===================================================================
--- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/README  (mode:100664 sha1:0170eafb60ad9009ca41c6536cecd6d1fdee5b86)
+++ 3eae85f66143160a26f5545d197862c89e2a8fb8/README  (mode:100664 sha1:921d552d810394e665323ec82b4826914918689c)
@@ -120,7 +120,7 @@
 	diff, patch
 	libssl
 	rsync
-
+	curl (later than 7.7, according to the docs)
 
 
 	The "core GIT"
Index: http-pull.c
===================================================================
--- /dev/null  (tree:45f926575d2c44072bfcf2317dbf3f0fbb513a4e)
+++ 3eae85f66143160a26f5545d197862c89e2a8fb8/http-pull.c  (mode:100644 sha1:7ba4ad67f6dac34addb537ee147ae3de0550a484)
@@ -0,0 +1,139 @@
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+#include <errno.h>
+#include <stdio.h>
+
+#include <curl/curl.h>
+#include <curl/easy.h>
+
+static CURL *curl;
+
+static char *base;
+
+static int fetch(unsigned char *sha1)
+{
+	char *hex = sha1_to_hex(sha1);
+	char *filename = sha1_file_name(sha1);
+
+	char *url;
+	char *posn;
+	FILE *local;
+
+	if (!access(filename, R_OK)) {
+		return 0;
+	}
+
+	local = fopen(filename, "w");
+
+	if (!local) {
+		return error("Couldn't open %s", filename);
+	}
+
+	curl_easy_setopt(curl, CURLOPT_FILE, local);
+
+	url = malloc(strlen(base) + 50);
+	strcpy(url, base);
+	posn = url + strlen(base);
+	strcpy(posn, "objects/");
+	posn += 8;
+	memcpy(posn, hex, 2);
+	posn += 2;
+	*(posn++) = '/';
+	strcpy(posn, hex + 2);
+
+	curl_easy_setopt(curl, CURLOPT_URL, url);
+
+	if (curl_easy_perform(curl)) {
+		fclose(local);
+		unlink(filename);
+		return error("Error downloading %s from %s",
+			     sha1_to_hex(sha1), url);
+	}
+
+	fclose(local);
+	
+	return 0;
+}
+
+static int process_tree(unsigned char *sha1)
+{
+	void *buffer;
+	unsigned long size;
+	char type[20];
+
+	buffer = read_sha1_file(sha1, type, &size);
+	if (!buffer)
+	 	return error("Couldn't read %s.",
+			     sha1_to_hex(sha1));
+	if (strcmp(type, "tree"))
+		return error("Expected %s to be a tree, but was a %s.",
+			     sha1_to_hex(sha1), type);
+	while (size) {
+		int len = strlen(buffer) + 1;
+		unsigned char *sha1 = buffer + len;
+		unsigned int mode;
+		int retval;
+
+		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+			return error("Invalid tree object");
+
+		buffer = sha1 + 20;
+		size -= len + 20;
+
+		retval = fetch(sha1);
+		if (retval)
+			return retval;
+
+		if (S_ISDIR(mode)) {
+			retval = process_tree(sha1);
+			if (retval)
+				return retval;
+		}
+	}
+	return 0;
+}
+
+static int process_commit(unsigned char *sha1)
+{
+	int retval;
+	struct revision *rev = lookup_rev(sha1);
+	if (parse_commit_object(rev))
+		return error("Couldn't parse commit %s\n", sha1_to_hex(sha1));
+
+	retval = fetch(rev->tree);
+	if (retval)
+		return retval;
+	retval = process_tree(rev->tree);
+	return retval;
+}
+
+int main(int argc, char **argv)
+{
+	char *commit_id = argv[1];
+	char *url = argv[2];
+	int retval;
+
+	unsigned char sha1[20];
+
+	get_sha1_hex(commit_id, sha1);
+
+	curl_global_init(CURL_GLOBAL_ALL);
+
+	curl = curl_easy_init();
+
+	base = url;
+
+	retval = fetch(sha1);
+	if (retval)
+		return 1;
+	retval = process_commit(sha1);
+	if (retval)
+		return 1;
+
+	curl_global_cleanup();
+	return 0;
+}


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [3/5] Add http-pull
  2005-04-17 18:49       ` Daniel Barkalow
@ 2005-04-17 19:08         ` Petr Baudis
  2005-04-17 19:24           ` Daniel Barkalow
  0 siblings, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-17 19:08 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: git

Dear diary, on Sun, Apr 17, 2005 at 08:49:11PM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> On Sun, 17 Apr 2005, Petr Baudis wrote:
> > > Index: http-pull.c
> > > ===================================================================
> > > --- /dev/null  (tree:d662b707e11391f6cfe597fd4d0bf9c41d34d01a)
> > > +++ 157b46ce1d82b3579e2e1258927b0d9bdbc033ab/http-pull.c  (mode:100644 sha1:106ca31239e6afe6784e7c592234406f5c149e44)
> > > +	url = malloc(strlen(base) + 50);
> > 
> > Off-by-one. What about the trailing NUL?
> 
> I get length(base) + "objects/"=8 + 40 SHA1 + 1 for '/' and 1 for NUL = 50.

Sorry, counted one '/' more. :-)

> > I think you should have at least two disjunct modes - either you are
> > downloading everything related to the given commit, or you are
> > downloading all commit records for commit predecessors.
> > 
> > Even if you might not want all the intermediate trees, you definitely
> > want the intermediate commits, to keep the history graph contiguous.
> > 
> > So in git pull, I'd imagine to do
> > 
> > 	http-pull -c $new_head
> > 	http-pull -t $(tree-id $new_head)
> > 
> > So, -c would fetch a given commit and all its predecessors until it hits
> > what you already have on your side. -t would fetch a given tree with all
> > files and subtrees and everything. http-pull shouldn't default on
> > either, since they are mutually exclusive.
> > 
> > What do you think?
> 
> I think I'd rather keep the current behavior and add a -c for getting the
> history of commits, and maybe a -a for getting the history of commits and
> their trees.

I'm not too keen on this. Either make it totally separate commands, or
make a required switch specifying what to do. Otherwise it implies the
switches would just modify what it does, but they make it do something
completely different.

-a would be fine too - basically a combination of -c and -t. I'd imagine
that is what Linus would want to use, e.g.

> There's some trickiness for the history of commits thing for stopping at
> the point where you have everything, but also behaving appropriately if
> you try once, fail partway through, and then try again. It's on my queue
> of things to think about.

Can't you just stop the recursion when you hit a commit you already
have?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [1/5] Parsing code in revision.h
  2005-04-17 18:18     ` [1/5] " Linus Torvalds
  2005-04-17 18:30       ` Petr Baudis
@ 2005-04-17 19:09       ` Daniel Barkalow
  1 sibling, 0 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 19:09 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Petr Baudis, git

On Sun, 17 Apr 2005, Linus Torvalds wrote:

> On Sun, 17 Apr 2005, Daniel Barkalow wrote:
> >
> > --- 45f926575d2c44072bfcf2317dbf3f0fbb513a4e/revision.h  (mode:100644 sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
> > +++ 37a0b01b85c2999243674d48bfc71cdba0e5518e/revision.h  (mode:100644 sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
> > @@ -24,6 +24,7 @@
> >  	unsigned int flags;
> >  	unsigned char sha1[20];
> >  	unsigned long date;
> > +	unsigned char tree[20];
> >  	struct parent *parent;
> >  };
> >  
> 
> I think this is really wrong.
> 
> The whole point of "revision.h" is that it's a generic framework for 
> keeping track of relationships between different objects. And those 
> objects are in no way just "commit" objects.
>
> For example, fsck uses this "struct revision" to create a full tree of
> _all_ the object dependencies, which means that a "struct revision" can be 
> any object at all - it's not in any way limited to commit objects, and 
> there is no "tree" object that is associated with these things at all.

I entirely missed this. No wonder my fsck-cache conversion wasn't going
so well...

> Besides, why do you want the tree? There's really nothing you can do with 
> the tree to a first approximation - you need to _first_ do the 
> reachability analysis entirely on the commit dependencies, and then when 
> you've selected a set of commits, you can just output those.

I actually want the tree for http-pull, not merging stuff. I was trying to
get a commit parser, not reachability at that point.

I think the right thing is to make a separate struct commit that has the
stuff I want in it, and probably do a struct tree at the same time.
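
One possible shape for that split, purely as a sketch: the generic part
that reachability code needs goes in one structure, and the commit-only
fields wrap around it. Every name below (struct object, struct commit,
struct commit_list, object_as_commit) is hypothetical here and nothing
in the tree at this point defines them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generic node: what fsck-style reachability needs. */
struct object {
	unsigned int flags;
	unsigned char sha1[20];
	const char *type;	/* "commit", "tree" or "blob" */
};

struct commit_list;

/* Hypothetical commit: the generic part plus commit-only fields. */
struct commit {
	struct object object;	/* must stay the first member */
	unsigned long date;
	unsigned char tree[20];
	struct commit_list *parents;
};

struct commit_list {
	struct commit *item;
	struct commit_list *next;
};

/* Recover the commit from its generic part, or NULL for other types. */
static struct commit *object_as_commit(struct object *obj)
{
	if (!obj || !obj->type || strcmp(obj->type, "commit"))
		return NULL;
	return (struct commit *)obj;	/* valid: object is the first member */
}
```

With this layout fsck could keep working on plain struct object nodes,
while only code that has actually parsed a commit pays for the date and
tree fields.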

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [3/5] Add http-pull
  2005-04-17 19:08         ` Petr Baudis
@ 2005-04-17 19:24           ` Daniel Barkalow
  2005-04-17 19:59             ` Petr Baudis
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 19:24 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

On Sun, 17 Apr 2005, Petr Baudis wrote:

> Dear diary, on Sun, Apr 17, 2005 at 08:49:11PM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> 
> I'm not too keen on this. Either make it totally separate commands, or
> make a required switch specifying what to do. Otherwise it implies the
> switches would just modify what it does, but they make it do something
> completely different.

That's a good point. I'll require a -t for now, and add more later.

> -a would be fine too - basically a combination of -c and -t. I'd imagine
> that is what Linus would want to use, e.g.

Well, -c -t would give you the current tree and the whole commit log, but
not old trees. -a would additionally give you old trees.
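
As a rough sketch of how those switches might combine, assuming they are
kept as independent bits and at least one is required (the FETCH_* names
and parse_fetch_mode() are hypothetical, and no such option parsing
exists in the posted http-pull.c):

```c
#include <string.h>

#define FETCH_TREE     0x1	/* -t: tree of the named commit */
#define FETCH_COMMITS  0x2	/* -c: the commit history */
#define FETCH_ALL      0x4	/* -a: history plus every old tree */

/*
 * Scan leading switches in argv[1..argc).  Returns the selected mode
 * bits, or -1 if no switch was given or one was not recognized.
 * *first_arg is set to the index of the first non-switch argument.
 */
static int parse_fetch_mode(int argc, char **argv, int *first_arg)
{
	int mode = 0;
	int i;

	for (i = 1; i < argc && argv[i][0] == '-'; i++) {
		if (!strcmp(argv[i], "-t"))
			mode |= FETCH_TREE;
		else if (!strcmp(argv[i], "-c"))
			mode |= FETCH_COMMITS;
		else if (!strcmp(argv[i], "-a"))
			mode |= FETCH_ALL | FETCH_COMMITS | FETCH_TREE;
		else
			return -1;
	}
	*first_arg = i;
	return mode ? mode : -1;	/* a switch is required */
}
```

Here "-c -t" yields the commit log plus the current tree, and "-a"
implies both and additionally asks for the old trees.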

> > There's some trickiness for the history of commits thing for stopping at
> > the point where you have everything, but also behaving appropriately if
> > you try once, fail partway through, and then try again. It's on my queue
> > of things to think about.
> 
> Can't you just stop the recursion when you hit a commit you already
> have?

The problem is that, if you've fetched the final commit already, and then
the server dies, and you try again later, you already have the last one,
and so you think you've got everything.
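
The failure mode is easy to reproduce on a toy history. This sketch
assumes a hypothetical single-parent chain stored in arrays, with a
"have" flag standing in for an object being present on disk; neither
function is from http-pull.c.

```c
#define NO_PARENT (-1)

/* Toy single-parent history: parent_of[i] is the parent of commit i. */
struct toy_repo {
	const int *parent_of;
	int *have;	/* have[i] != 0 if commit i is stored locally */
};

/* Stop at the first commit we already have.  Returns commits fetched. */
static int pull_stop_when_present(struct toy_repo *r, int head)
{
	int fetched = 0;

	while (head != NO_PARENT && !r->have[head]) {
		r->have[head] = 1;	/* stands in for a download */
		fetched++;
		head = r->parent_of[head];
	}
	return fetched;
}

/* Walk the whole chain, fetching whatever is missing along the way. */
static int pull_walk_everything(struct toy_repo *r, int head)
{
	int fetched = 0;

	for (; head != NO_PARENT; head = r->parent_of[head]) {
		if (!r->have[head]) {
			r->have[head] = 1;
			fetched++;
		}
	}
	return fetched;
}
```

If the head arrived before the connection died, the first function
fetches nothing on the retry, while the second repairs the gap at the
cost of walking history that is already local.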

At this point, I also want to put off doing much further with recursion
and commits until revision.h and such are sorted out.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 17:47     ` Petr Baudis
  2005-04-17 18:54       ` Daniel Barkalow
@ 2005-04-17 19:25       ` Paul Jackson
  2005-04-17 19:59         ` Petr Baudis
  1 sibling, 1 reply; 49+ messages in thread
From: Paul Jackson @ 2005-04-17 19:25 UTC (permalink / raw
  To: Petr Baudis; +Cc: barkalow, git

Petr wrote:
> BTW, I'd just use access(F_OK) instead of stat() if I don't care about

That's a bad habit to get into.

access(2) checks with the process's real uid and gid, rather than with
the effective ids as is done when actually attempting an operation. 
This is to allow set-UID programs to easily determine the invoking
user's authority.

Using access(2) when it shouldn't be used is a common source of bugs.

I recommend _only_ using it when you require exactly the above real vs.
effective id behaviour.
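
For a plain existence check, a stat() call whose result is discarded
avoids the real-ID lookup. A minimal sketch, assuming a POSIX system;
path_exists() is a hypothetical helper name.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return 1 if path names an existing file system object, 0 otherwise.
 * Unlike access(2), stat(2) is checked like any ordinary file operation,
 * so a set-UID program gets the answer that matches what a later open()
 * would see.  The stat data itself is simply thrown away.
 */
static int path_exists(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0;
}
```

Note that any check of this kind is still only a hint: the file can
appear or vanish between the check and the operation that follows it.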

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [1/5] Parsing code in revision.h
  2005-04-17 18:30       ` Petr Baudis
@ 2005-04-17 19:25         ` Linus Torvalds
  2005-04-17 19:45           ` Daniel Barkalow
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2005-04-17 19:25 UTC (permalink / raw
  To: Petr Baudis; +Cc: Daniel Barkalow, git



On Sun, 17 Apr 2005, Petr Baudis wrote:
> 
> Someone started the avalanche by adding date to the structure. Of
> course, date is smaller, but it leads people (including me) out of the
> way.

Yeah, the naming and the structure comes from "rev-tree.c", so there's a 
bit of historical baggage already. 

Anyway, I don't think you should need it. I cleaned up things a bit, and 
wrote a really simple "merge-base" thing that does base the "best" hit on 
date, which ends up probably doing the right thing in practice.

It might be interesting to extend that to do the "five best" common
parents according to date (making sure to remove the trivial cases: a
parent of a common parent is always itself a common parent, but such a
common grandparent is obviously always uninteresting).
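
A rough Python sketch of that idea, assuming the history is available as two hypothetical dicts, `parents` (commit id to parent ids) and `dates` (commit id to timestamp). It reads the whole history reachable from both heads and is not the merge-base code itself.

```python
def ancestors(commit, parents):
    """Return the set containing `commit` and every commit reachable from it."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def best_common_parents(a, b, parents, dates, limit=5):
    """Return up to `limit` interesting common ancestors, newest first."""
    common = ancestors(a, parents) & ancestors(b, parents)
    # A parent of a common parent is always common too, so drop anything
    # that is reachable from another common commit.
    redundant = set()
    for c in common:
        redundant |= ancestors(c, parents) - {c}
    candidates = common - redundant
    return sorted(candidates, key=lambda c: dates[c], reverse=True)[:limit]


# Toy history: x and y both descend from root; a and b each merged x and y.
parents = {"root": [], "x": ["root"], "y": ["root"], "a": ["x", "y"], "b": ["x", "y"]}
dates = {"root": 1, "x": 2, "y": 3, "a": 4, "b": 5}
best = best_common_parents("a", "b", parents, dates)
```

Here `root` is dropped as a common grandparent, and `y` is listed before `x` because it is newer.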

Then, for that small set of parents, doing something much more involved
(generation counting is fairly simple, but possibly not as good a
"goodness" match as tree-diff or something).

		Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [1/5] Parsing code in revision.h
  2005-04-17 19:25         ` Linus Torvalds
@ 2005-04-17 19:45           ` Daniel Barkalow
  2005-04-17 19:54             ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 19:45 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Petr Baudis, git

On Sun, 17 Apr 2005, Linus Torvalds wrote:

> On Sun, 17 Apr 2005, Petr Baudis wrote:
> > 
> > Someone started the avalanche by adding date to the structure. Of
> > course, date is smaller, but it leads people (including me) out of the
> > way.
> 
> Yeah, the naming and the structure comes from "rev-tree.c", so there's a 
> bit of historical baggage already. 
> 
> Anyway, I don't think you should need it. I cleaned up things a bit, and 
> wrote a really simple "merge-base" thing that does base the "best" hit on 
> date, which ends up probably doing the right thing in practice.

Yours reads the whole commit history; I intentionally wrote mine to
only read as far back as turns out to be necessary. I think that looking
at the whole history is going to be impractical when you're trying to
merge in a bunch of patches against the latest release, even if you pull
the history out of a cache. When it's one step on one side and a dozen on
the other, it matters a whole lot if there's a year of history behind the
common ancestor(s).

So I still think it's best to have a non-recursive commit parser, and do
the recursion only as needed for the operation under consideration.
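
As a sketch of what "only as far back as necessary" can look like (an assumption-laden illustration, not the posted patch): walk back from both heads in lockstep and parse a commit only when the walk reaches it. `parse_parents` is a hypothetical callback that parses one commit and returns its parent ids. The first commit reached from both sides is a common ancestor, though not necessarily the best one.

```python
from collections import deque


def lazy_common_ancestor(a, b, parse_parents):
    """Breadth-first walk from both heads; stop at the first shared commit."""
    if a == b:
        return a
    reached = {a: "a", b: "b"}  # commit -> side that first reached it
    queue = deque([a, b])
    while queue:
        commit = queue.popleft()
        side = reached[commit]
        for parent in parse_parents(commit):
            other = reached.get(parent)
            if other is None:
                reached[parent] = side
                queue.append(parent)
            elif other != side:
                return parent  # reached from both heads
    return None  # the two histories are unrelated


# Toy history: 100 commits of old history, then one step on one side
# and a dozen on the other.
history = {"c0": []}
for i in range(1, 100):
    history["c%d" % i] = ["c%d" % (i - 1)]
history["a1"] = ["c99"]
history["b1"] = ["c99"]
for i in range(2, 13):
    history["b%d" % i] = ["b%d" % (i - 1)]

parsed = []


def parse_parents(commit):
    parsed.append(commit)  # record how many commits actually get parsed
    return history[commit]


base = lazy_common_ancestor("a1", "b12", parse_parents)
```

In this toy case the walk finds `c99` after parsing only a small fraction of the 113 commits, regardless of how long the history behind the common ancestor is.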

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [1/5] Parsing code in revision.h
  2005-04-17 19:45           ` Daniel Barkalow
@ 2005-04-17 19:54             ` Linus Torvalds
  2005-04-17 20:06               ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2005-04-17 19:54 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: Petr Baudis, git



On Sun, 17 Apr 2005, Daniel Barkalow wrote:
> 
> Yours reads the whole commit history; I intentionally wrote mine to
> only read as far back as turns out to be necessary.

Yes. I'm not opposed to yours, I was just opposed to some of the things 
around it you did, so I wrote mine as a kind of place-holder. I'll happily 
take patches to turn it from a really simple and stupid one into a more 
polished version.

		Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [3/5] Add http-pull
  2005-04-17 19:24           ` Daniel Barkalow
@ 2005-04-17 19:59             ` Petr Baudis
  2005-04-21  3:27               ` Brad Roberts
  0 siblings, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-17 19:59 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: git

Dear diary, on Sun, Apr 17, 2005 at 09:24:27PM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> On Sun, 17 Apr 2005, Petr Baudis wrote:
> 
> > Dear diary, on Sun, Apr 17, 2005 at 08:49:11PM CEST, I got a letter
> > where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > > There's some trickiness for the history of commits thing for stopping at
> > > the point where you have everything, but also behaving appropriately if
> > > you try once, fail partway through, and then try again. It's on my queue
> > > of things to think about.
> > 
> > Can't you just stop the recursion when you hit a commit you already
> > have?
> 
> The problem is that, if you've fetched the final commit already, and then
> the server dies, and you try again later, you already have the last one,
> and so you think you've got everything.

Hmm, some kind of journaling? ;-)

> At this point, I also want to put off doing much further with recursion
> and commits until revision.h and such are sorted out.

Agreed.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 19:25       ` Paul Jackson
@ 2005-04-17 19:59         ` Petr Baudis
  2005-04-17 20:03           ` Daniel Barkalow
  2005-04-18  1:20           ` Paul Jackson
  0 siblings, 2 replies; 49+ messages in thread
From: Petr Baudis @ 2005-04-17 19:59 UTC (permalink / raw
  To: Paul Jackson; +Cc: barkalow, git

Dear diary, on Sun, Apr 17, 2005 at 09:25:17PM CEST, I got a letter
where Paul Jackson <pj@sgi.com> told me that...
> Petr wrote:
> > BTW, I'd just use access(F_OK) instead of stat() if I don't care about
> 
> I recommend _only_ using it when you require exactly the above real vs.
> effective id behaviour.

Does this distinction have any effect when doing F_OK?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 19:59         ` Petr Baudis
@ 2005-04-17 20:03           ` Daniel Barkalow
  2005-04-17 20:18             ` Petr Baudis
                               ` (2 more replies)
  2005-04-18  1:20           ` Paul Jackson
  1 sibling, 3 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 20:03 UTC (permalink / raw
  To: Petr Baudis; +Cc: Paul Jackson, git

On Sun, 17 Apr 2005, Petr Baudis wrote:

> Dear diary, on Sun, Apr 17, 2005 at 09:25:17PM CEST, I got a letter
> where Paul Jackson <pj@sgi.com> told me that...
> > Petr wrote:
> > > BTW, I'd just use access(F_OK) instead of stat() if I don't care about
> > 
> > I recommend _only_ using it when you require exactly the above real vs.
> > effective id behaviour.
> 
> Does this distinction have any effect when doing F_OK?

Actually, the documentation I've got says:

"F_OK requests checking whether merely testing for the existence of the
 file would be allowed (this depends on the permissions of the directories
 in the path to the file, as given in path-name.)"

So it shouldn't complain about a filename which you're allowed to try to
stat, even if there's nothing there. And it would depend on the privs of
the wrong user in looking at the path.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [1/5] Parsing code in revision.h
  2005-04-17 19:54             ` Linus Torvalds
@ 2005-04-17 20:06               ` Linus Torvalds
  2005-04-17 20:22                 ` Daniel Barkalow
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2005-04-17 20:06 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: Petr Baudis, git



On Sun, 17 Apr 2005, Linus Torvalds wrote:
> 
> Yes. I'm not opposed to yours, I was just opposed to some of the things 
> around it you did, so I wrote mine as a kind of place-holder. I'll happily 
> take patches to turn it from a really simple and stupid one into a more 
> polished version.

Btw, before I forget - I did have another reason. I actually think that
the date is potentially a lot more important than "how many parents deep".

In particular, it's entirely possible that the top of my head might be a
very recent merge that merges with a small fix relative to a very old
parent (making that old parent be just two hops away from the head), while
the thing I want to merge might also have that old parent (for similar
reasons) as a relatively "close" parent from a pure link-counting
standpoint.

The reason I bring this up is that quite often people end up basing their
work on a specific release version, so a merge (especially in specialized
areas) may thus bring such an old parent pretty close to the head, and it
can actually be quite possible (indeed probable) that such a parent ends
up being a common parent.

However, it can easily be a very _bad_ parent.

In ascii barfic:

	        ----------------------- patch ---------
	       /                                        \
	      /                                          \
	- old release -- ... lots of development .. -----HEAD
	     \  \
	      \  \
	       \  ---------------------patch-- MERGE-HEAD      
	        \                             /
		  .. lots of development ..  /

it looks like "old release" is pretty close to both HEAD and MERGE-HEAD, 
right?

But that's just an artifact of the fact that they both had a trivial merge
against some older code, and if the two "lots of development" things have
ever done an earlier merge, there's quite possibly a _much_ better common
parent there somewhere.
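
The picture can be turned into a toy graph to compare the two heuristics. This Python sketch is only an illustration: the commit names, the dates, and the extra `a2` merge (standing in for "an earlier merge" between the two development lines) are all made up.

```python
from collections import deque


def hops_from(head, parents):
    """Breadth-first hop count from `head` to each of its ancestors."""
    dist = {head: 0}
    queue = deque([head])
    while queue:
        c = queue.popleft()
        for p in parents[c]:
            if p not in dist:
                dist[p] = dist[c] + 1
                queue.append(p)
    return dist


parents = {
    "old_release": [],
    "a1": ["old_release"],
    "b1": ["old_release"],
    "a2": ["a1", "b1"],  # hypothetical earlier merge between the two lines
    "a3": ["a2"],
    "b2": ["b1"],
    "b3": ["b2"],
    "patch_a": ["old_release"],
    "patch_b": ["old_release"],
    "HEAD": ["a3", "patch_a"],
    "MERGE_HEAD": ["b3", "patch_b"],
}
dates = {
    "old_release": 0, "a1": 10, "b1": 11, "a2": 20, "b2": 21, "a3": 30,
    "b3": 31, "patch_a": 40, "patch_b": 41, "HEAD": 50, "MERGE_HEAD": 51,
}

from_head = hops_from("HEAD", parents)
from_merge = hops_from("MERGE_HEAD", parents)
common = set(from_head) & set(from_merge)

# Pure link counting: fewest hops from the further of the two heads.
closest = min(common, key=lambda c: max(from_head[c], from_merge[c]))
# Date based: the most recent common commit.
newest = max(common, key=lambda c: dates[c])
```

Link counting picks `old_release` (two hops from each head), while the date picks `b1`, the commit shared through the earlier merge.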

I dunno.

		Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 20:03           ` Daniel Barkalow
@ 2005-04-17 20:18             ` Petr Baudis
  2005-04-18  1:35               ` Paul Jackson
  2005-04-17 20:58             ` Russell King
  2005-04-18  1:24             ` [4/5] Add option for hardlinkable cache of extracted blobs Paul Jackson
  2 siblings, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-17 20:18 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: Paul Jackson, git

Dear diary, on Sun, Apr 17, 2005 at 10:03:46PM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> On Sun, 17 Apr 2005, Petr Baudis wrote:
> 
> > Dear diary, on Sun, Apr 17, 2005 at 09:25:17PM CEST, I got a letter
> > where Paul Jackson <pj@sgi.com> told me that...
> > > Petr wrote:
> > > > BTW, I'd just use access(F_OK) instead of stat() if I don't care about
> > > 
> > > I recommend _only_ using it when you require exactly the above real vs.
> > > effective id behaviour.
> > 
> > Does this distinction have any effect when doing F_OK?
> 
> Actually, the documentation I've got says:
> 
> "F_OK requests checking whether merely testing for the existence of the
>  file would be allowed (this depends on the permissions of the directories
>  in the path to the file, as given in path-name.)"
> 
> So it shouldn't complain about a filename which you're allowed to try to
> stat, even if there's nothing there. And it would depend on the privs of
> the wrong user in looking at the path.

The documentation I've got says:

"R_OK,  W_OK  and  X_OK request checking whether the file exists and has
 read, write and execute permissions, respectively.  F_OK just requests
 checking for the existence of the file."

And IEEE1003.1 agrees:
http://www.opengroup.org/onlinepubs/009695399/functions/access.html

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [1/5] Parsing code in revision.h
  2005-04-17 20:06               ` Linus Torvalds
@ 2005-04-17 20:22                 ` Daniel Barkalow
  0 siblings, 0 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 20:22 UTC (permalink / raw
  To: Linus Torvalds; +Cc: Petr Baudis, git

On Sun, 17 Apr 2005, Linus Torvalds wrote:

> On Sun, 17 Apr 2005, Linus Torvalds wrote:
> > 
> > Yes. I'm not opposed to yours, I was just opposed to some of the things 
> > around it you did, so I wrote mine as a kind of place-holder. I'll happily 
> > take patches to turn it from a really simple and stupid one into a more 
> > polished version.
> 
> Btw, before I forget - I did have another reason. I actually think that
> the date is potentially a lot more important than "how many parents deep".
> 
> In particular, it's entirely possible that the top of my head might be a
> very recent merge that merges with a small fix relative to a very old
> parent (making that old parent be just two hops away from the head), while
> the thing I want to merge might also have that old parent (for similar
> reasons) as a relatively "close" parent from a pure link-counting
> standpoint.
> 
> The reason I bring this up is that quite often people end up basing their
> work on a specific release version, so a merge (especially in specialized
> areas) may thus bring such an old parent pretty close to the head, and it
> can actually be quite possible (indeed probable) that such a parent ends
> up being a common parent.
> 
> However, it can easily be a very _bad_ parent.
> 
> In ascii barfic:
> 
> 	        ----------------------- patch ---------
> 	       /                                        \
> 	      /                                          \
> 	- old release -- ... lots of development .. -----HEAD
> 	     \  \                 /
> 	      \  \               /
> 	       \  --------------/------patch-- MERGE-HEAD      
> 	        \              /              /
> 		  .. lots of development ..  /

(I added a merge line so there's another commit to discuss in the picture)

> it looks like "old release" is pretty close to both HEAD and MERGE-HEAD, 
> right?
> 
> But that's just an artifact of the fact that they both had a trivial merge
> against some older code, and if the two "lots of development" things have
> ever done an earlier merge, there's quite possibly a _much_ better common
> parent there somewhere.
> 
> I dunno.

I think you're going to want multiple common ancestors being used in the
merge when this thing starts to happen; otherwise, you're going to have to
fix up conflicts for "patch" for all the trees you pull. (Also, you're
only going to see a parent for patch (that is, the old revision as an
ancestor of the application of patch) if the trees are merging it in via
git, in which case you'll also see a commit for patch, which will have a
recent date. So you'd see "just yesterday, we both merged a tiny 
change; so that tree, which is very much like an ancient one, is more
recent than last week's major merge".)

I dunno either, but I hope we'll have 2+n-way-merge before there's a lot
of complex history to deal with.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 20:03           ` Daniel Barkalow
  2005-04-17 20:18             ` Petr Baudis
@ 2005-04-17 20:58             ` Russell King
  2005-04-17 22:10               ` First ever real kernel git merge! Linus Torvalds
  2005-04-18  1:24             ` [4/5] Add option for hardlinkable cache of extracted blobs Paul Jackson
  2 siblings, 1 reply; 49+ messages in thread
From: Russell King @ 2005-04-17 20:58 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: Petr Baudis, Paul Jackson, git

On Sun, Apr 17, 2005 at 04:03:46PM -0400, Daniel Barkalow wrote:
> Actually, the documentation I've got says:
> 
> "F_OK requests checking whether merely testing for the existence of the
>  file would be allowed (this depends on the permissions of the directories
>  in the path to the file, as given in path-name.)"
> 
> So it shouldn't complain about a filename which you're allowed to try to
> stat, even if there's nothing there. And it would depend on the privs of
> the wrong user in looking at the path.

Isn't it the case that with selinux, various objects may be hidden
depending on their accessibility?  I wonder if this has an effect
here.

(or what about any other security model?)

-- 
Russell King


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.1/5] Add merge-base
  2005-04-17 16:51     ` [2.1/5] " Daniel Barkalow
@ 2005-04-17 21:21       ` Petr Baudis
  2005-04-17 21:25         ` Daniel Barkalow
  0 siblings, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-17 21:21 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: git

Dear diary, on Sun, Apr 17, 2005 at 06:51:59PM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> merge-base finds one of the best common ancestors of a pair of commits. In
> particular, it finds one of the ones which is fewest commits away from the
> further of the heads.
> 
> Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>

Note that during merge with Linus (probably the most complicated I've
got so far, but still thankfully not too painful thanks to the rej
tool) I've decided to revert your merge-base in favour of Linus'
version. I did this mainly to make my merge with Linus less awful; we
should probably clean it up first and decide which solution to go for in
the first place before possibly replacing it again, I think.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.1/5] Add merge-base
  2005-04-17 21:21       ` Petr Baudis
@ 2005-04-17 21:25         ` Daniel Barkalow
  0 siblings, 0 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-17 21:25 UTC (permalink / raw
  To: Petr Baudis; +Cc: git

On Sun, 17 Apr 2005, Petr Baudis wrote:

> Dear diary, on Sun, Apr 17, 2005 at 06:51:59PM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > merge-base finds one of the best common ancestors of a pair of commits. In
> > particular, it finds one of the ones which is fewest commits away from the
> > further of the heads.
> > 
> > Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
> 
> Note that during merge with Linus (probably the most complicated I've
> got so far, but still thankfully not too painful thanks to the rej
> tool) I've decided to revert your merge-base in favour of Linus'
> version. I did this mainly to make my merge with Linus less awful; we
> should probably clean it up first and decide which solution to go for in
> the first place before possibly replacing it again, I think.

Sure. I'm working on the rearrangement now.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* First ever real kernel git merge!
  2005-04-17 20:58             ` Russell King
@ 2005-04-17 22:10               ` Linus Torvalds
  0 siblings, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2005-04-17 22:10 UTC (permalink / raw
  To: Russell King; +Cc: Git Mailing List


It may not be pretty, but it seems to have worked fine!

Here's my history log (with intermediate checking removed - I was being
pretty anal ;):

	rsync -avz --ignore-existing master.kernel.org:/home/rmk/linux-2.6-rmk.git/ .git/
	rsync -avz --ignore-existing master.kernel.org:/home/rmk/linux-2.6-rmk.git/HEAD .git/MERGE-HEAD
	merge-base $(cat .git/HEAD) $(cat .git/MERGE-HEAD)
	for i in e7905b2f22eb5d5308c9122b9c06c2d02473dd4f $(cat .git/HEAD) $(cat .git/MERGE-HEAD); do cat-file commit $i | head -1; done
	read-tree -m cf9fd295d3048cd84c65d5e1a5a6b606bf4fddc6 9c78e08d12ae8189f3bd5e03accc39e3f08e45c9 a43c4447b2edc9fb01a6369f10c1165de4494c88
	write-tree 
	commit-tree 7792a93eddb3f9b8e3115daab8adb3030f258ce6 -p $(cat .git/HEAD) -p $(cat .git/MERGE-HEAD)
	echo 5fa17ec1c56589476c7c6a2712b10c81b3d5f85a > .git/HEAD 
	fsck-cache --unreachable 5fa17ec1c56589476c7c6a2712b10c81b3d5f85a

which looks really messy, because I really wanted to do each step slowly 
by hand, so those magic revision numbers are just cut-and-pasted from the 
results that all the previous stages had printed out.

NOTE! As expected, this merge had absolutely zero file-level clashes,
which is why I could just do the "read-tree -m" followed by a write-tree. 
But it's a real merge: I had some extra commits in my tree that were not
in Russell's tree, and obviously vice versa.
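
For readers who want the "zero file-level clashes" case spelled out, here is a minimal Python sketch of a tree-level three-way merge. It assumes each tree is a flat dict from path to blob id (a simplification) and makes no claim to match what "read-tree -m" does internally.

```python
def merge_trees(base, ours, theirs):
    """Three-way merge of {path: blob_id} maps; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            pick = o  # both sides agree (including both deleting the path)
        elif o == b:
            pick = t  # only their side changed it
        elif t == b:
            pick = o  # only our side changed it
        else:
            conflicts.append(path)  # both changed it differently
            continue
        if pick is not None:  # None means the path was deleted
            merged[path] = pick
    return merged, conflicts


base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "2", "b": "1", "c": "1"}   # we changed a
theirs = {"a": "1", "b": "3"}           # they changed b and deleted c
merged, conflicts = merge_trees(base, ours, theirs)
```

With no path changed differently on both sides, `conflicts` is empty and the merged map can be written out directly as the new tree.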

Also note! The end result is not actually written back to the current 
working directory, so to see what the merge result actually is, there's 
another final phase:

	read-tree 7792a93eddb3f9b8e3115daab8adb3030f258ce6
	update-cache --refresh
	checkout-cache -f -a

which just updates the current working directory to the results. I'm _not_
caring about old dirty state for now - the theory was to get this thing
working first, and worry about making it nice to use later.

A second note: a real "merge" thing should notice that if the "merge-base"  
output ends up being one of the inputs (if one side is a strict subset of
the other side), then the merge itself should never be done, and the
script should just update directly to whichever is the non-common HEAD.
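
That check is small enough to sketch in Python. The function name and the returned labels are hypothetical; the three arguments are the two head commit ids and whatever "merge-base" printed for them.

```python
def plan_merge(head, merge_head, base):
    """Decide what a merge script should do given the merge-base output."""
    if base == merge_head:
        # The other side is a subset of ours: we already have everything.
        return ("nothing-to-do", head)
    if base == head:
        # Our side is a subset of theirs: just move to their head.
        return ("update-to", merge_head)
    return ("three-way-merge", base)


plans = [
    plan_merge("mine", "theirs", "theirs"),
    plan_merge("mine", "theirs", "mine"),
    plan_merge("mine", "theirs", "older"),
]
```

Only the last case needs the read-tree/write-tree/commit-tree sequence shown earlier.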

But as far as I can tell, this really did work out correctly and 100% 
according to plan. As a result, if you update to my current tree, the 
top-of-tree commit should be:

	cat-file commit $(cat .git/HEAD)

	tree 7792a93eddb3f9b8e3115daab8adb3030f258ce6
	parent 8173055926cdb8534fbaed517a792bd45aed8377
	parent df4449813c900973841d0fa5a9e9bc7186956e1e
	author Linus Torvalds <torvalds@ppc970.osdl.org> 1113774444 -0700
	committer Linus Torvalds <torvalds@ppc970.osdl.org> 1113774444 -0700

	Merge with master.kernel.org:/home/rmk/linux-2.6-rmk.git - ARM changes

	First ever true git merge. Let's see if it actually works.

Yehaa! It did take basically zero time, btw. Except for my bumbling about,
and the first "rsync the objects from rmk's directory" part (which wasn't
horrible, it just wasn't instantaneous like the other phases).

Btw, to see the output, you really want to have a "git log" that sorts by 
date. I had an old "gitlog.sh" that did the old recursive thing, and while 
it shows the right thing, the ordering ended up making it be very 
non-obvious that rmk's changes had been added recently, since they ended 
up being at the very bottom.
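
One way to get a date-sorted log is to replace the recursion with a priority queue keyed on commit date. The Python sketch below assumes hypothetical `parents` and `dates` dicts and is not the "gitlog.sh" script mentioned above.

```python
import heapq


def log_by_date(head, parents, dates):
    """Yield commits reachable from `head`, newest first by commit date."""
    seen = {head}
    heap = [(-dates[head], head)]  # negate dates so the newest pops first
    while heap:
        _, commit = heapq.heappop(heap)
        yield commit
        for p in parents[commit]:
            if p not in seen:
                seen.add(p)
                heapq.heappush(heap, (-dates[p], p))


# Toy history: a merge of "mine" and "rmk", which share an old base.
parents = {
    "merge": ["mine", "rmk"],
    "mine": ["m1"], "m1": ["base"],
    "rmk": ["r1"], "r1": ["base"],
    "base": [],
}
dates = {"merge": 100, "rmk": 95, "mine": 90, "r1": 85, "m1": 80, "base": 10}
order = list(log_by_date("merge", parents, dates))
```

The merged-in commits now interleave with the local ones by date instead of all landing at the bottom.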

			Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 19:59         ` Petr Baudis
  2005-04-17 20:03           ` Daniel Barkalow
@ 2005-04-18  1:20           ` Paul Jackson
  1 sibling, 0 replies; 49+ messages in thread
From: Paul Jackson @ 2005-04-18  1:20 UTC (permalink / raw
  To: Petr Baudis; +Cc: barkalow, git

Petr wrote:
> Does this distinction have any effect when doing F_OK?

Well, yeah.  If only one of real or effective id's could traverse the
path (execute perm on directories), then you'd get the wrong answer.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 20:03           ` Daniel Barkalow
  2005-04-17 20:18             ` Petr Baudis
  2005-04-17 20:58             ` Russell King
@ 2005-04-18  1:24             ` Paul Jackson
  2 siblings, 0 replies; 49+ messages in thread
From: Paul Jackson @ 2005-04-18  1:24 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: pasky, git

> So it shouldn't complain about a filename which you're allowed to try to
> stat, even if there's nothing there.

I'm not sure what 'nothing there' means to you.

To me, it means 'no file there', so no, you would not be allowed
to stat it - and should fail ENOENT.

> And it would depend on the privs of
> the wrong user in looking at the path.

Yup.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-17 20:18             ` Petr Baudis
@ 2005-04-18  1:35               ` Paul Jackson
  2005-04-18  1:48                 ` Petr Baudis
  0 siblings, 1 reply; 49+ messages in thread
From: Paul Jackson @ 2005-04-18  1:35 UTC (permalink / raw
  To: Petr Baudis; +Cc: barkalow, git

Petr wrote:
> The documentation I've got says:
> 
> "R_OK,  W_OK  and  X_OK request checking whether the file exists and has
>  read, write and execute permissions, respectively.  F_OK just requests
>  checking for the existence of the file."

You don't exactly say it, but I'm guessing that you think that this
documentation is stating that F_OK checks for the existence of the file
_regardless_ of path access permissions.

Not so.  Write your own little test program, and/or read the kernel source.

Even if the file exists, if its directory entry is not accessible to the
_real_ uid/gid, access F_OK will fail.  If the problem is a lack of
search permissions on some directory in the path, the errno will be
EACCES.
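
A little test program along those lines might start from the following Python sketch. It uses `os.stat()` rather than `os.access()` because the latter only returns a boolean and hides the errno; the helper name and the three labels are made up for illustration.

```python
import errno
import os
import tempfile


def probe(path):
    """Classify a path as 'present', 'absent', or 'denied' using stat(2)."""
    try:
        os.stat(path)
    except OSError as e:
        if e.errno in (errno.ENOENT, errno.ENOTDIR):
            return "absent"
        if e.errno == errno.EACCES:
            return "denied"  # some directory on the path is not searchable
        raise
    return "present"


with tempfile.TemporaryDirectory() as tmp:
    there = os.path.join(tmp, "there")
    open(there, "w").close()
    outcome = (probe(there), probe(os.path.join(tmp, "missing")))
```

To see "denied", run it as a non-root user against a file inside a directory whose search (execute) bit has been removed.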

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-18  1:35               ` Paul Jackson
@ 2005-04-18  1:48                 ` Petr Baudis
  2005-04-18  4:49                   ` Paul Jackson
  0 siblings, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-18  1:48 UTC (permalink / raw
  To: Paul Jackson; +Cc: barkalow, git

Dear diary, on Mon, Apr 18, 2005 at 03:35:08AM CEST, I got a letter
where Paul Jackson <pj@sgi.com> told me that...
> Petr wrote:
> > The documentation I've got says:
> > 
> > "R_OK,  W_OK  and  X_OK request checking whether the file exists and has
> >  read, write and execute permissions, respectively.  F_OK just requests
> >  checking for the existence of the file."
> 
> You don't exactly say it, but I'm guessing that you think that this
> documentation is stating that F_OK checks for the existence of the file
> _regardless_ of path access permissions.
> 
> Not so.  Write your own little test program, and/or read the kernel source.
> 
> Even if the file exists, if its directory entry is not accessible to the
> _real_ uid/gid, access F_OK will fail.  If the problem is a lack of
> search permissions on some directory in the path, the errno will be
> EACCES.

Ok, I stand corrected; and when giving the access(2) manual page a
second look, it could imply that too. It has some room for more
crystal-clearness, though. ;-)

Thanks,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [4/5] Add option for hardlinkable cache of extracted blobs
  2005-04-18  1:48                 ` Petr Baudis
@ 2005-04-18  4:49                   ` Paul Jackson
  0 siblings, 0 replies; 49+ messages in thread
From: Paul Jackson @ 2005-04-18  4:49 UTC (permalink / raw
  To: Petr Baudis; +Cc: barkalow, git

Pasky wrote:
> It has some room for more
> crystal-clearness, though. ;-)

True indeed ;).

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [3/5] Add http-pull
  2005-04-17 19:59             ` Petr Baudis
@ 2005-04-21  3:27               ` Brad Roberts
  2005-04-21  4:28                 ` Daniel Barkalow
  0 siblings, 1 reply; 49+ messages in thread
From: Brad Roberts @ 2005-04-21  3:27 UTC (permalink / raw
  To: Petr Baudis; +Cc: Daniel Barkalow, git

On Sun, 17 Apr 2005, Petr Baudis wrote:

> Date: Sun, 17 Apr 2005 21:59:00 +0200
> From: Petr Baudis <pasky@ucw.cz>
> To: Daniel Barkalow <barkalow@iabervon.org>
> Cc: git@vger.kernel.org
> Subject: Re: [3/5] Add http-pull
>
> Dear diary, on Sun, Apr 17, 2005 at 09:24:27PM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > On Sun, 17 Apr 2005, Petr Baudis wrote:
> >
> > > Dear diary, on Sun, Apr 17, 2005 at 08:49:11PM CEST, I got a letter
> > > where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > > > There's some trickiness for the history of commits thing for stopping at
> > > > the point where you have everything, but also behaving appropriately if
> > > > you try once, fail partway through, and then try again. It's on my queue
> > > > of things to think about.
> > >
> > > Can't you just stop the recursion when you hit a commit you already
> > > have?
> >
> > The problem is that, if you've fetched the final commit already, and then
> > the server dies, and you try again later, you already have the last one,
> > and so you think you've got everything.
>
> Hmm, some kind of journaling? ;-)

How about fetching in the inverse order.  Ie, deepest parents up towards
current.  With that method the repository is always self consistent, even
if not yet current.

Later,
Brad


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [3/5] Add http-pull
  2005-04-21  3:27               ` Brad Roberts
@ 2005-04-21  4:28                 ` Daniel Barkalow
  2005-04-21 22:05                   ` tony.luck
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-21  4:28 UTC (permalink / raw
  To: Brad Roberts; +Cc: Petr Baudis, git

On Wed, 20 Apr 2005, Brad Roberts wrote:

> How about fetching in the inverse order.  Ie, deepest parents up towards
> current.  With that method the repository is always self consistent, even
> if not yet current.

You don't know the deepest parents to fetch until you've read everything
more recent, since the history you'd have to walk is the history you're
downloading.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [3/5] Add http-pull
  2005-04-21  4:28                 ` Daniel Barkalow
@ 2005-04-21 22:05                   ` tony.luck
  2005-04-22 19:46                     ` Daniel Barkalow
  0 siblings, 1 reply; 49+ messages in thread
From: tony.luck @ 2005-04-21 22:05 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: Brad Roberts, Petr Baudis, git

On Wed, 20 Apr 2005, Brad Roberts wrote:
> How about fetching in the inverse order.  Ie, deepest parents up towards
> current.  With that method the repository is always self consistent, even
> if not yet current.

Daniel Barkalow replied:
> You don't know the deepest parents to fetch until you've read everything
> more recent, since the history you'd have to walk is the history you're
> downloading.

You "just" need to defer adding tree/commit objects to the repository until
after you have inserted all objects on which they depend.  That's what my
"wget" based version does ... it's very crude, in that it loads all tree
& commit objects into a temporary repository (.gittmp) ... since you can
only use "cat-file" and "ls-tree" on things if they live in objects/xx/xxx..xxx
The blobs can go directly into the real repo (but to be really safe you'd
have to ensure that the whole blob had been pulled from the network before
inserting it ... it's probably a good move to validate everything that you
pull from the outside world too).
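
The deferral idea can be sketched in Python as follows. This is not the wget-based script: `fetch` is a hypothetical callback returning `(deps, data)` for an object id, a dict stands in for the temporary repository, and for simplicity every object (blobs included) is staged before anything is published.

```python
def pull_with_staging(root, fetch, repo):
    """Download everything reachable from `root`, publishing dependencies first.

    `fetch(obj_id)` returns (deps, data) and may raise if the server dies;
    `repo` is a dict standing in for the real object store.
    """
    staged = {}  # stand-in for the temporary repository
    order = []   # object ids in dependencies-first order
    stack = [(root, False)]
    while stack:
        obj_id, expanded = stack.pop()
        if expanded:
            order.append(obj_id)  # all of its dependencies are already listed
            continue
        if obj_id in repo or obj_id in staged:
            continue
        deps, data = fetch(obj_id)
        staged[obj_id] = data
        stack.append((obj_id, True))
        for dep in deps:
            stack.append((dep, False))
    # Only reached when every download succeeded.
    for obj_id in order:
        repo[obj_id] = staged[obj_id]
    return order


objects = {
    "commit": (["tree"], b"c"),
    "tree": (["blob1", "blob2"], b"t"),
    "blob1": ([], b"1"),
    "blob2": ([], b"2"),
}


def flaky_fetch(limit):
    """Build a fetch callback that fails after `limit` successful downloads."""
    calls = [0]

    def fetch(obj_id):
        if calls[0] == limit:
            raise ConnectionError(obj_id)
        calls[0] += 1
        return objects[obj_id]

    return fetch


repo = {}
try:
    pull_with_staging("commit", flaky_fetch(2), repo)
except ConnectionError:
    pass
after_failure = dict(repo)  # nothing was published by the failed attempt
order = pull_with_staging("commit", flaky_fetch(10), repo)
```

The real store never holds a commit or tree without its dependencies, at the cost of throwing away the staged objects when a pull fails.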

-Tony

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [3/5] Add http-pull
  2005-04-21 22:05                   ` tony.luck
@ 2005-04-22 19:46                     ` Daniel Barkalow
  2005-04-22 22:40                       ` Petr Baudis
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-22 19:46 UTC (permalink / raw
  To: tony.luck; +Cc: Brad Roberts, Petr Baudis, git

On Thu, 21 Apr 2005 tony.luck@intel.com wrote:

> On Wed, 20 Apr 2005, Brad Roberts wrote:
> > How about fetching in the inverse order.  Ie, deepest parents up towards
> > current.  With that method the repository is always self consistent, even
> > if not yet current.
> 
> Daniel Barkalow replied:
> > You don't know the deepest parents to fetch until you've read everything
> > more recent, since the history you'd have to walk is the history you're
> > downloading.
> 
> You "just" need to defer adding tree/commit objects to the repository until
> after you have inserted all objects on which they depend.  That's what my
> "wget" based version does ... it's very crude, in that it loads all tree
> & commit objects into a temporary repository (.gittmp) ... since you can
> only use "cat-file" and "ls-tree" on things if they live in objects/xx/xxx..xxx
> The blobs can go directly into the real repo (but to be really safe you'd
> have to ensure that the whole blob had been pulled from the network before
> inserting it ... it's probably a good move to validate everything that you
> pull from the outside world too).

The problem with this general scheme is that it means that you have to
start over if something goes wrong, rather than resuming from where you
left off (and being able to use what you got until then). I think a better
solution is to track what things you mean to have and what things you
expect you could get from where.

As for validation, I now have my programs (which I haven't gotten a chance
to send out recently) checking everything as it is downloaded to make sure
it is complete (zlib likes it) and has the correct hash.
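
The check described above could look roughly like the following sketch. It
assumes the object name is the SHA-1 of the decompressed bytes; the message
does not say which bytes are hashed, so that choice is an assumption, and
object_is_valid is a hypothetical name.

```python
import hashlib
import zlib


def object_is_valid(expected_sha1: str, compressed: bytes) -> bool:
    """Return True if `compressed` inflates cleanly and hashes as expected."""
    try:
        # A truncated or corrupted download makes zlib raise here.
        data = zlib.decompress(compressed)
    except zlib.error:
        return False
    # Assumption: the name is the SHA-1 of the decompressed content.
    return hashlib.sha1(data).hexdigest() == expected_sha1
```

A downloader would call this before moving a file into the object
directory, and discard or re-fetch anything that fails.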

	-Daniel
*This .sig left intentionally blank*


* Re: [3/5] Add http-pull
  2005-04-22 19:46                     ` Daniel Barkalow
@ 2005-04-22 22:40                       ` Petr Baudis
  2005-04-22 23:00                         ` Daniel Barkalow
  0 siblings, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-22 22:40 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: tony.luck, Brad Roberts, git

Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> On Thu, 21 Apr 2005 tony.luck@intel.com wrote:
> 
> > On Wed, 20 Apr 2005, Brad Roberts wrote:
> > > How about fetching in the inverse order.  Ie, deepest parents up towards
> > > current.  With that method the repository is always self consistent, even
> > > if not yet current.
> > 
> > Daniel Barkalow replied:
> > > You don't know the deepest parents to fetch until you've read everything
> > > more recent, since the history you'd have to walk is the history you're
> > > downloading.
> > 
> > You "just" need to defer adding tree/commit objects to the repository until
> > after you have inserted all objects on which they depend.  That's what my
> > "wget" based version does ... it's very crude, in that it loads all tree
> > & commit objects into a temporary repository (.gittmp) ... since you can
> > only use "cat-file" and "ls-tree" on things if they live in objects/xx/xxx..xxx
> > The blobs can go directly into the real repo (but to be really safe you'd
> > have to ensure that the whole blob had been pulled from the network before
> > inserting it ... it's probably a good move to validate everything that you
> > pull from the outside world too).
> 
> The problem with this general scheme is that it means that you have to
> start over if something goes wrong, rather than resuming from where you
> left off (and being able to use what you got until then).

Huh. Why? You just go back through history until you find a commit you
already have. If you did it the way Tony described, then having that
commit means you can be sure you have everything it depends on too.
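
As a rough sketch of that walk: the function below follows parent links
from the new head and stops at any commit already held. It is only correct
under the invariant stated above (a commit you have implies you have all
its dependencies). The names commits_to_fetch, parents_of and have are
hypothetical.

```python
from typing import Callable, List, Set


def commits_to_fetch(head: str,
                     parents_of: Callable[[str], List[str]],
                     have: Set[str]) -> List[str]:
    """List commits reachable from `head` that are not covered by `have`."""
    needed: List[str] = []
    seen: Set[str] = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit in have or commit in seen:
            # Everything behind a commit we hold is assumed to be present.
            continue
        seen.add(commit)
        needed.append(commit)
        stack.extend(parents_of(commit))
    return needed
```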

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

* Re: [3/5] Add http-pull
  2005-04-22 22:40                       ` Petr Baudis
@ 2005-04-22 23:00                         ` Daniel Barkalow
  2005-04-22 23:08                           ` Petr Baudis
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-22 23:00 UTC (permalink / raw
  To: Petr Baudis; +Cc: tony.luck, Brad Roberts, git

On Sat, 23 Apr 2005, Petr Baudis wrote:

> Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> 
> Huh. Why? You just go back to history until you find a commit you
> already have. If you did it the way as Tony described, if you have that
> commit, you can be sure that you have everything it depends on too.

But if you download 1000 files of the 1010 you need, and then your network
goes down, you will need to download those 1000 again when it comes back,
because you can't save them unless you have the full history. 

There's also no way to say, give me just the head and the tree associated
with it, let me check it out, next download the commit history so I can do
my merge most correctly, let me do that, finally download the intermediate
blobs and trees so that I can track down where something broke.

Ideally, you'd be able to put the latest head and tree into your database,
and it would know that you just hadn't gotten the ancestors yet. It would
be able to determine from your personal metadata (rather than from what
you had or lacked) that you believe you have all ancestors of the head
from the previous time you pulled, that you don't want the trees that
Linus merged in midway through, and that everything else you simply don't
have yet.
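
One way to picture such metadata, purely as an illustration: keep explicit
records of what is present, what is believed complete, and what is
deliberately unwanted, and plan the next download from those records
instead of inferring it from missing files. PullState, plan and deps_of are
hypothetical names, and in practice deps_of would have to download an
absent object before it could list that object's dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class PullState:
    present: Set[str] = field(default_factory=set)   # objects on disk
    complete: Set[str] = field(default_factory=set)  # full closure believed held
    unwanted: Set[str] = field(default_factory=set)  # deliberately skipped


def plan(head: str,
         deps_of: Callable[[str], List[str]],
         state: PullState) -> List[str]:
    """Return the objects still to download in order to complete `head`."""
    todo: List[str] = []
    seen: Set[str] = set()
    stack = [head]
    while stack:
        obj = stack.pop()
        if obj in seen or obj in state.complete or obj in state.unwanted:
            continue
        seen.add(obj)
        if obj not in state.present:
            todo.append(obj)
        # Keep walking even through objects we hold: holding an object
        # says nothing here about holding what it depends on.
        stack.extend(deps_of(obj))
    return todo
```

Because presence and completeness are recorded separately, a partial
download can be kept and resumed without being mistaken for a finished one.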

	-Daniel
*This .sig left intentionally blank*


* Re: [3/5] Add http-pull
  2005-04-22 23:00                         ` Daniel Barkalow
@ 2005-04-22 23:08                           ` Petr Baudis
  2005-04-22 23:12                             ` Daniel Barkalow
  0 siblings, 1 reply; 49+ messages in thread
From: Petr Baudis @ 2005-04-22 23:08 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: tony.luck, Brad Roberts, git

Dear diary, on Sat, Apr 23, 2005 at 01:00:33AM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> On Sat, 23 Apr 2005, Petr Baudis wrote:
> 
> > Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter
> > where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > 
> > Huh. Why? You just go back to history until you find a commit you
> > already have. If you did it the way as Tony described, if you have that
> > commit, you can be sure that you have everything it depends on too.
> 
> But if you download 1000 files of the 1010 you need, and then your network
> goes down, you will need to download those 1000 again when it comes back,
> because you can't save them unless you have the full history. 

Why can't I? I think I can do that perfectly fine. The worst thing that
can happen is that fsck-cache will complain a bit.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

* Re: [3/5] Add http-pull
  2005-04-22 23:08                           ` Petr Baudis
@ 2005-04-22 23:12                             ` Daniel Barkalow
  2005-04-22 23:24                               ` Martin Schlemmer
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-04-22 23:12 UTC (permalink / raw
  To: Petr Baudis; +Cc: tony.luck, Brad Roberts, git

On Sat, 23 Apr 2005, Petr Baudis wrote:

> Dear diary, on Sat, Apr 23, 2005 at 01:00:33AM CEST, I got a letter
> where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > On Sat, 23 Apr 2005, Petr Baudis wrote:
> > 
> > > Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter
> > > where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > > 
> > > Huh. Why? You just go back to history until you find a commit you
> > > already have. If you did it the way as Tony described, if you have that
> > > commit, you can be sure that you have everything it depends on too.
> > 
> > But if you download 1000 files of the 1010 you need, and then your network
> > goes down, you will need to download those 1000 again when it comes back,
> > because you can't save them unless you have the full history. 
> 
> Why can't I? I think I can do that perfectly fine. The worst thing that
> can happen is that fsck-cache will complain a bit.

Not if you're using the fact that you don't have them to tell you that you
still need the other 10, which is what tony's scheme would do.

	-Daniel
*This .sig left intentionally blank*


* Re: [3/5] Add http-pull
  2005-04-22 23:12                             ` Daniel Barkalow
@ 2005-04-22 23:24                               ` Martin Schlemmer
  0 siblings, 0 replies; 49+ messages in thread
From: Martin Schlemmer @ 2005-04-22 23:24 UTC (permalink / raw
  To: Daniel Barkalow; +Cc: Petr Baudis, tony.luck, Brad Roberts, git

On Fri, 2005-04-22 at 19:12 -0400, Daniel Barkalow wrote:
> On Sat, 23 Apr 2005, Petr Baudis wrote:
> 
> > Dear diary, on Sat, Apr 23, 2005 at 01:00:33AM CEST, I got a letter
> > where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > > On Sat, 23 Apr 2005, Petr Baudis wrote:
> > > 
> > > > Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter
> > > > where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > > > 
> > > > Huh. Why? You just go back to history until you find a commit you
> > > > already have. If you did it the way as Tony described, if you have that
> > > > commit, you can be sure that you have everything it depends on too.
> > > 
> > > But if you download 1000 files of the 1010 you need, and then your network
> > > goes down, you will need to download those 1000 again when it comes back,
> > > because you can't save them unless you have the full history. 
> > 
> > Why can't I? I think I can do that perfectly fine. The worst thing that
> > can happen is that fsck-cache will complain a bit.
> 
> Not if you're using the fact that you don't have them to tell you that you
> still need the other 10, which is what tony's scheme would do.
> 

Is there any way (like maybe extending one of the web interfaces already
around) to first get a list of all the sha1's you need, and then start
from the bottom like Tony/Petr want you to do?
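
If such a list were available, the fetch loop could be as simple as the
sketch below. It assumes a hypothetical manifest that names every object
before the objects it depends on (newest first), so that walking it in
reverse fetches the deepest objects first; no existing web interface is
implied.

```python
from typing import Callable, List, Set


def fetch_bottom_up(manifest: List[str],
                    have: Set[str],
                    fetch: Callable[[str], None]) -> None:
    """Fetch every listed object we lack, deepest dependencies first."""
    for sha1 in reversed(manifest):
        if sha1 in have:
            continue  # kept from an earlier, interrupted run
        fetch(sha1)
        have.add(sha1)
```

After an interruption the same call can simply be repeated, since objects
already recorded in have are skipped.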


-- 
Martin Schlemmer


Thread overview: 49+ messages
     [not found] <20050417144947.GG1487@pasky.ji.cz>
2005-04-17 15:20 ` [0/5] Patch set for various things Daniel Barkalow
2005-04-17 15:24   ` [1/5] Parsing code in revision.h Daniel Barkalow
2005-04-17 16:09     ` Petr Baudis
2005-04-17 16:44       ` Daniel Barkalow
2005-04-17 18:18     ` [1/5] " Linus Torvalds
2005-04-17 18:30       ` Petr Baudis
2005-04-17 19:25         ` Linus Torvalds
2005-04-17 19:45           ` Daniel Barkalow
2005-04-17 19:54             ` Linus Torvalds
2005-04-17 20:06               ` Linus Torvalds
2005-04-17 20:22                 ` Daniel Barkalow
2005-04-17 19:09       ` Daniel Barkalow
2005-04-17 15:27   ` [2/5] Add merge-base Daniel Barkalow
2005-04-17 16:01     ` Petr Baudis
2005-04-17 16:36       ` Daniel Barkalow
2005-04-17 16:51     ` [2.1/5] " Daniel Barkalow
2005-04-17 21:21       ` Petr Baudis
2005-04-17 21:25         ` Daniel Barkalow
2005-04-17 15:31   ` [3/5] Add http-pull Daniel Barkalow
2005-04-17 18:10     ` Petr Baudis
2005-04-17 18:49       ` Daniel Barkalow
2005-04-17 19:08         ` Petr Baudis
2005-04-17 19:24           ` Daniel Barkalow
2005-04-17 19:59             ` Petr Baudis
2005-04-21  3:27               ` Brad Roberts
2005-04-21  4:28                 ` Daniel Barkalow
2005-04-21 22:05                   ` tony.luck
2005-04-22 19:46                     ` Daniel Barkalow
2005-04-22 22:40                       ` Petr Baudis
2005-04-22 23:00                         ` Daniel Barkalow
2005-04-22 23:08                           ` Petr Baudis
2005-04-22 23:12                             ` Daniel Barkalow
2005-04-22 23:24                               ` Martin Schlemmer
2005-04-17 18:58     ` [3.1/5] " Daniel Barkalow
2005-04-17 15:35   ` [4/5] Add option for hardlinkable cache of extracted blobs Daniel Barkalow
2005-04-17 17:47     ` Petr Baudis
2005-04-17 18:54       ` Daniel Barkalow
2005-04-17 19:25       ` Paul Jackson
2005-04-17 19:59         ` Petr Baudis
2005-04-17 20:03           ` Daniel Barkalow
2005-04-17 20:18             ` Petr Baudis
2005-04-18  1:35               ` Paul Jackson
2005-04-18  1:48                 ` Petr Baudis
2005-04-18  4:49                   ` Paul Jackson
2005-04-17 20:58             ` Russell King
2005-04-17 22:10               ` First ever real kernel git merge! Linus Torvalds
2005-04-18  1:24             ` [4/5] Add option for hardlinkable cache of extracted blobs Paul Jackson
2005-04-18  1:20           ` Paul Jackson
2005-04-17 15:37   ` [5/5] Add commit-id to version Daniel Barkalow

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git
