git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 0/2] Controversial blob munging series
@ 2007-04-22  6:08 Junio C Hamano
  2007-04-22  6:08 ` [PATCH 1/2] Add 'filter' attribute and external filter driver definition Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Junio C Hamano @ 2007-04-22  6:08 UTC
  To: git

This is on top of the 'next' that I will push out once I am done
with v1.5.1.2, which I am preparing today.

[1/2] Add 'filter' attribute and external filter driver definition.
[2/2] Add 'ident' conversion.

I'll park them near the tip of 'pu', but consider them
primarily for interested people to experiment with.

I suspect this might have helped me (and other Asians) a year
ago.  I did not manage to configure my Emacs to work well with
utf-8 encoded Japanese text, and had some difficulties in
maintaining documentation for git-lost-found (it has my name
spelled in Japanese).

I could have had:

	(in .git/info/attributes)
	Documentation/git-lost-found.txt filter=eucjp-n-utf8

	(in config)
	[filter "eucjp-n-utf8"]
		clean  = nkf -E -w
		smudge = nkf -W -e

so that the checked-out copy is the output of "nkf -W -e" (convert
to EUC-JP, treating the input as UTF-8), which lets my Emacs work
with EUC-JP.  Check-in stores the output of "nkf -E -w" (convert
to UTF-8, treating the input as EUC-JP), which keeps the "official"
version in the repository as UTF-8.  The best part is that both
configurations above are private to me, and other people do not
even have to know that I am suffering from the inability to use
UTF-8 in my editor.

These days my Emacs is configured to deal with UTF-8 much better
than it was when I added the git-lost-found manual page, so I no
longer need the above hack.

I also suspect a "fun but probably not very useful in practice"
application would be to have "indent" as the clean filter while
leaving the "smudge" filter empty.
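
A minimal sketch of the clean/smudge model described above, in case it
helps to see it as code.  It is not taken from the patches: the
filter_driver struct, check_in(), check_out() and the two in-process
transforms are hypothetical stand-ins for external commands such as nkf
or indent, and they rewrite the buffer in place only to keep the
example short (a real filter may change the size).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A filter step rewrites buf[0..len) in place; stands in for an external command. */
typedef void (*filter_fn)(char *buf, size_t len);

/* Hypothetical driver: a NULL member means "pass the bytes through unchanged". */
struct filter_driver {
	filter_fn clean;	/* run on check-in (worktree -> repository) */
	filter_fn smudge;	/* run on checkout (repository -> worktree) */
};

/* rot13 is its own inverse, so it can serve as both clean and smudge. */
void rot13(char *buf, size_t len)
{
	size_t i;
	for (i = 0; i < len; i++) {
		char c = buf[i];
		if (c >= 'a' && c <= 'z')
			buf[i] = (char)('a' + (c - 'a' + 13) % 26);
		else if (c >= 'A' && c <= 'Z')
			buf[i] = (char)('A' + (c - 'A' + 13) % 26);
	}
}

/* One-way normaliser (a toy stand-in for "indent"): tabs become spaces. */
void tabs_to_spaces(char *buf, size_t len)
{
	size_t i;
	for (i = 0; i < len; i++)
		if (buf[i] == '\t')
			buf[i] = ' ';
}

/* What gets stored in the repository is the output of the clean step. */
void check_in(const struct filter_driver *drv, char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	if (drv && drv->clean)
		drv->clean(buf, len);
}

/* What lands in the work tree is the output of the smudge step. */
void check_out(const struct filter_driver *drv, char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	if (drv && drv->smudge)
		drv->smudge(buf, len);
}
```

With a driver of { rot13, rot13 } a check-in followed by a checkout
gives back the original bytes, which is the property the nkf pair
relies on.  With { tabs_to_spaces, NULL } the checkout leaves the
normalised form in place, which is the "indent as clean, empty smudge"
case.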


* [PATCH 1/2] Add 'filter' attribute and external filter driver definition.
  2007-04-22  6:08 [PATCH 0/2] Controversial blob munging series Junio C Hamano
@ 2007-04-22  6:08 ` Junio C Hamano
  2007-04-22  6:08 ` [PATCH 2/2] Add 'ident' conversion Junio C Hamano
  2007-04-23 13:50 ` [PATCH 0/2] Controversial blob munging series Johannes Schindelin
  2 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2007-04-22  6:08 UTC
  To: git

The interface is similar to the custom low-level merge drivers.

First you configure your filter driver by defining 'filter.<name>.*'
variables in the configuration.

	filter.<name>.clean	filter command to run upon checkin
	filter.<name>.smudge	filter command to run upon checkout

Then you assign the 'filter' attribute to each path, with a value
that matches the custom filter driver's name.

Example:

	(in .gitattributes)
	*.c	filter=indent

	(in config)
	[filter "indent"]
		clean = indent
		smudge = cat

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
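For readers who want to experiment outside git, here is a rough,
self-contained sketch of running a buffer through an external command
in the way a clean or smudge command would be used.  It is not the code
in this patch: pipe_through() and reap_ok() are hypothetical names, the
command is run without arguments and without a shell, and error
handling is kept to the minimum (EINTR is not retried, and running out
of memory aborts).  A separate child feeds the input so that the parent
can keep reading the output without the two pipes deadlocking.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap pid; return 0 only if it exited normally with status 0. */
static int reap_ok(pid_t pid)
{
	int st;
	if (waitpid(pid, &st, 0) < 0)
		return -1;
	return (WIFEXITED(st) && WEXITSTATUS(st) == 0) ? 0 : -1;
}

/*
 * Hypothetical helper: run cmd (found via PATH), feed it src[0..len) on
 * stdin and return its stdout as a malloc'ed, NUL-terminated buffer of
 * *outlen bytes.  Returns NULL on failure.
 */
char *pipe_through(const char *cmd, const char *src, size_t len, size_t *outlen)
{
	int feed[2], out[2], failed;
	pid_t feeder, filter;
	char *dst = NULL;
	size_t used = 0, alloc = 0;

	assert(cmd && outlen);
	if (pipe(feed) < 0)
		return NULL;
	if (pipe(out) < 0) {
		close(feed[0]);
		close(feed[1]);
		return NULL;
	}

	feeder = fork();
	if (feeder == 0) {		/* child 1: push src into the filter's stdin */
		close(feed[0]);
		close(out[0]);
		close(out[1]);
		while (len) {
			ssize_t n = write(feed[1], src, len);
			if (n < 0)
				_exit(1);
			src += n;
			len -= (size_t)n;
		}
		_exit(0);
	}
	filter = (feeder < 0) ? -1 : fork();
	if (filter == 0) {		/* child 2: becomes the filter command */
		dup2(feed[0], 0);
		dup2(out[1], 1);
		close(feed[0]);
		close(feed[1]);
		close(out[0]);
		close(out[1]);
		execlp(cmd, cmd, (char *)NULL);
		_exit(127);		/* only reached if exec failed */
	}

	/* parent: keep only the read end of the output pipe */
	close(feed[0]);
	close(feed[1]);
	close(out[1]);
	failed = (feeder < 0 || filter < 0);

	while (!failed) {
		ssize_t n;
		if (alloc - used < 2) {	/* keep room for data plus a NUL */
			alloc = alloc ? alloc * 2 : 4096;
			dst = realloc(dst, alloc);
			if (!dst)
				abort();	/* out of memory: give up */
		}
		n = read(out[0], dst + used, alloc - used - 1);
		if (n <= 0) {		/* 0 is EOF, negative is an error */
			failed = (n < 0);
			break;
		}
		used += (size_t)n;
	}
	close(out[0]);

	if (feeder > 0 && reap_ok(feeder))
		failed = 1;
	if (filter > 0 && reap_ok(filter))
		failed = 1;
	if (failed) {
		free(dst);
		return NULL;
	}
	dst[used] = '\0';
	*outlen = used;
	return dst;
}
```

Calling pipe_through("cat", buf, len, &n) should hand back a copy of
buf; a command that cannot be executed makes it return NULL.  Note that
a filter which exits before reading all of its input makes the feeding
child fail, so this sketch treats that case as an error.
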
 convert.c         |  259 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 t/t0021-filter.sh |   36 ++++++++
 2 files changed, 286 insertions(+), 9 deletions(-)
 create mode 100755 t/t0021-filter.sh

diff --git a/convert.c b/convert.c
index 37239ac..1fa8f2a 100644
--- a/convert.c
+++ b/convert.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "attr.h"
+#include "run-command.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -200,18 +201,214 @@ static char *crlf_to_worktree(const char *path, const char *src, unsigned long *
 	return buffer;
 }
 
+static int filter_buffer(const char *path, const char *src,
+			 unsigned long size, const char *cmd)
+{
+	/*
+	 * Spawn cmd and feed the buffer contents through its stdin.
+	 */
+	struct child_process child_process;
+	int pipe_feed[2];
+	int write_err, status;
+
+	memset(&child_process, 0, sizeof(child_process));
+
+	if (pipe(pipe_feed) < 0) {
+		error("cannot create pipe to run external filter %s", cmd);
+		return 1;
+	}
+
+	child_process.pid = fork();
+	if (child_process.pid < 0) {
+		error("cannot fork to run external filter %s", cmd);
+		close(pipe_feed[0]);
+		close(pipe_feed[1]);
+		return 1;
+	}
+	if (!child_process.pid) {
+		dup2(pipe_feed[0], 0);
+		close(pipe_feed[0]);
+		close(pipe_feed[1]);
+		execlp(cmd, cmd, NULL);
+		return 1;
+	}
+	close(pipe_feed[0]);
+	close(1);
+
+	write_err = (write_in_full(pipe_feed[1], src, size) < 0);
+	if (close(pipe_feed[1]))
+		write_err = 1;
+	if (write_err)
+		error("cannot feed the input to external filter %s", cmd);
+
+	status = finish_command(&child_process);
+	if (status)
+		error("external filter %s failed %d", cmd, -status);
+	return (write_err || status);
+}
+
+static char *apply_filter(const char *path, const char *src,
+			  unsigned long *sizep, const char *cmd)
+{
+	/*
+	 * Create a pipeline to have the command filter the buffer's
+	 * contents.
+	 *
+	 * (child --> cmd) --> us
+	 */
+	const int SLOP = 4096;
+	int pipe_feed[2];
+	int status;
+	char *dst;
+	unsigned long dstsize, dstalloc;
+	struct child_process child_process;
+
+	if (!cmd)
+		return NULL;
+
+	memset(&child_process, 0, sizeof(child_process));
+
+	if (pipe(pipe_feed) < 0) {
+		error("cannot create pipe to run external filter %s", cmd);
+		return NULL;
+	}
+
+	child_process.pid = fork();
+	if (child_process.pid < 0) {
+		error("cannot fork to run external filter %s", cmd);
+		close(pipe_feed[0]);
+		close(pipe_feed[1]);
+		return NULL;
+	}
+	if (!child_process.pid) {
+		dup2(pipe_feed[1], 1);
+		close(pipe_feed[0]);
+		close(pipe_feed[1]);
+		exit(filter_buffer(path, src, *sizep, cmd));
+	}
+	close(pipe_feed[1]);
+
+	dstalloc = *sizep;
+	dst = xmalloc(dstalloc);
+	dstsize = 0;
+
+	while (1) {
+		ssize_t numread = xread(pipe_feed[0], dst + dstsize,
+					dstalloc - dstsize);
+
+		if (numread <= 0) {
+			if (!numread)
+				break;
+			error("read from external filter %s failed", cmd);
+			free(dst);
+			dst = NULL;
+			break;
+		}
+		dstsize += numread;
+		if (dstalloc <= dstsize + SLOP) {
+			dstalloc = dstsize + SLOP;
+			dst = xrealloc(dst, dstalloc);
+		}
+	}
+	if (close(pipe_feed[0])) {
+		error("read from external filter %s failed", cmd);
+		free(dst);
+		dst = NULL;
+	}
+
+	status = finish_command(&child_process);
+	if (status) {
+		error("external filter %s failed %d", cmd, -status);
+		free(dst);
+		dst = NULL;
+	}
+
+	if (dst)
+		*sizep = dstsize;
+	return dst;
+}
+
+static struct convert_driver {
+	const char *name;
+	struct convert_driver *next;
+	char *smudge;
+	char *clean;
+} *user_convert, **user_convert_tail;
+
+static int read_convert_config(const char *var, const char *value)
+{
+	const char *ep, *name;
+	int namelen;
+	struct convert_driver *drv;
+
+	/*
+	 * External conversion drivers are configured using
+	 * "filter.<name>.variable".
+	 */
+	if (prefixcmp(var, "filter.") || (ep = strrchr(var, '.')) == var + 6)
+		return 0;
+	name = var + 7;
+	namelen = ep - name;
+	for (drv = user_convert; drv; drv = drv->next)
+		if (!strncmp(drv->name, name, namelen) && !drv->name[namelen])
+			break;
+	if (!drv) {
+		char *namebuf;
+		drv = xcalloc(1, sizeof(struct convert_driver));
+		namebuf = xmalloc(namelen + 1);
+		memcpy(namebuf, name, namelen);
+		namebuf[namelen] = 0;
+		drv->name = namebuf;
+		drv->next = NULL;
+		*user_convert_tail = drv;
+		user_convert_tail = &(drv->next);
+	}
+
+	ep++;
+
+	/*
+	 * filter.<name>.smudge and filter.<name>.clean specifies
+	 * the command line:
+	 *
+	 *	command-line
+	 *
+	 * The command-line will not be interpolated in any way.
+	 */
+
+	if (!strcmp("smudge", ep)) {
+		if (!value)
+			return error("%s: lacks value", var);
+		drv->smudge = strdup(value);
+		return 0;
+	}
+
+	if (!strcmp("clean", ep)) {
+		if (!value)
+			return error("%s: lacks value", var);
+		drv->clean = strdup(value);
+		return 0;
+	}
+	return 0;
+}
+
 static void setup_convert_check(struct git_attr_check *check)
 {
 	static struct git_attr *attr_crlf;
+	static struct git_attr *attr_filter;
 
-	if (!attr_crlf)
+	if (!attr_crlf) {
 		attr_crlf = git_attr("crlf", 4);
-	check->attr = attr_crlf;
+		attr_filter = git_attr("filter", 6);
+		user_convert_tail = &user_convert;
+		git_config(read_convert_config);
+	}
+	check[0].attr = attr_crlf;
+	check[1].attr = attr_filter;
 }
 
 static int git_path_check_crlf(const char *path, struct git_attr_check *check)
 {
-	const char *value = check->value;
+	const char *value = check[0].value;
 
 	if (ATTR_TRUE(value))
 		return CRLF_TEXT;
@@ -224,26 +421,70 @@ static int git_path_check_crlf(const char *path, struct git_attr_check *check)
 	return CRLF_GUESS;
 }
 
+static struct convert_driver *git_path_check_convert(const char *path,
+					     struct git_attr_check *check)
+{
+	const char *value = check[1].value;
+	struct convert_driver *drv;
+
+	if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value))
+		return NULL;
+	for (drv = user_convert; drv; drv = drv->next)
+		if (!strcmp(value, drv->name))
+			return drv;
+	return NULL;
+}
+
 char *convert_to_git(const char *path, const char *src, unsigned long *sizep)
 {
-	struct git_attr_check check[1];
+	struct git_attr_check check[2];
 	int crlf = CRLF_GUESS;
+	char *filter = NULL;
+	char *buf, *buf2;
 
 	setup_convert_check(check);
-	if (!git_checkattr(path, 1, check)) {
+	if (!git_checkattr(path, 2, check)) {
+		struct convert_driver *drv;
 		crlf = git_path_check_crlf(path, check);
+		drv = git_path_check_convert(path, check);
+		if (drv && drv->clean)
+			filter = drv->clean;
 	}
-	return crlf_to_git(path, src, sizep, crlf);
+
+	buf = apply_filter(path, src, sizep, filter);
+
+	buf2 = crlf_to_git(path, buf ? buf : src, sizep, crlf);
+	if (buf2) {
+		free(buf);
+		buf = buf2;
+	}
+
+	return buf;
 }
 
 char *convert_to_working_tree(const char *path, const char *src, unsigned long *sizep)
 {
-	struct git_attr_check check[1];
+	struct git_attr_check check[2];
 	int crlf = CRLF_GUESS;
+	char *filter = NULL;
+	char *buf, *buf2;
 
 	setup_convert_check(check);
-	if (!git_checkattr(path, 1, check)) {
+	if (!git_checkattr(path, 2, check)) {
+		struct convert_driver *drv;
 		crlf = git_path_check_crlf(path, check);
+		drv = git_path_check_convert(path, check);
+		if (drv && drv->smudge)
+			filter = drv->smudge;
 	}
-	return crlf_to_worktree(path, src, sizep, crlf);
+
+	buf = crlf_to_worktree(path, src, sizep, crlf);
+
+	buf2 = apply_filter(path, buf ? buf : src, sizep, filter);
+	if (buf2) {
+		free(buf);
+		buf = buf2;
+	}
+
+	return buf;
 }
diff --git a/t/t0021-filter.sh b/t/t0021-filter.sh
new file mode 100755
index 0000000..d6bb1c6
--- /dev/null
+++ b/t/t0021-filter.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+test_description='external filter conversion'
+
+. ./test-lib.sh
+
+cat <<\EOF >rot13.sh
+tr '[a-zA-Z]' '[n-za-mN-ZA-M]'
+EOF
+chmod +x rot13.sh
+
+test_expect_success setup '
+	git config filter.rot13.smudge ./rot13.sh &&
+	git config filter.rot13.clean ./rot13.sh &&
+
+	echo "*.t filter=rot13" >.gitattributes &&
+
+	{
+	    echo a b c d e f g h i j k l m
+	    echo n o p q r s t u v w x y z
+	} >test &&
+	cat test >test.t &&
+	cat test >test.o &&
+	git add test test.t &&
+	rm -f test test.t &&
+	git checkout -- test test.t
+'
+
+test_expect_success check '
+
+	cmp test.o test &&
+	cmp test.o test.t
+
+'
+
+test_done
-- 
1.5.1.2.919.g280f4


* [PATCH 2/2] Add 'ident' conversion.
  2007-04-22  6:08 [PATCH 0/2] Controversial blob munging series Junio C Hamano
  2007-04-22  6:08 ` [PATCH 1/2] Add 'filter' attribute and external filter driver definition Junio C Hamano
@ 2007-04-22  6:08 ` Junio C Hamano
  2007-04-23 13:50 ` [PATCH 0/2] Controversial blob munging series Johannes Schindelin
  2 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2007-04-22  6:08 UTC
  To: git

The 'ident' attribute, when set on a path, squashes "$ident:<any
bytes except dollar sign>$" to "$ident$" upon checkin, and expands
it to "$ident: <blob SHA-1> $" upon checkout.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
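As an illustration of the squash/expand rule in the message above, here
is a small stand-alone sketch.  It is not the code in this patch:
rewrite_ident() is a hypothetical helper that works on NUL-terminated
strings rather than sized buffers, and it takes the identifier from the
caller instead of hashing the blob itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not the patch's code: copy the NUL-terminated
 * string src, rewriting each "$ident$" or "$ident:<no dollar signs>$".
 *   id == NULL: collapse the keyword to "$ident$"       (check-in)
 *   id != NULL: expand it to "$ident: <id> $"           (checkout)
 * Returns a malloc'ed string, or NULL if allocation fails.
 */
char *rewrite_ident(const char *src, const char *id)
{
	size_t idlen = id ? strlen(id) : 0;
	size_t dollars = 0, cap;
	const char *p;
	char *out, *q;

	assert(src != NULL);
	for (p = src; *p; p++)
		if (*p == '$')
			dollars++;
	/* Each keyword starts at a '$' and grows by less than idlen + 10 bytes. */
	cap = strlen(src) + dollars * (idlen + 10) + 1;
	out = q = malloc(cap);
	if (!out)
		return NULL;

	for (p = src; *p; ) {
		const char *end = NULL;	/* closing '$' of a keyword, if any */

		if (!strncmp(p, "$ident$", 7))
			end = p + 6;
		else if (!strncmp(p, "$ident:", 7))
			end = strchr(p + 7, '$');	/* NULL if unterminated */
		if (!end) {
			*q++ = *p++;	/* ordinary byte: copy as-is */
			continue;
		}
		memcpy(q, "$ident", 6);
		q += 6;
		if (id) {
			memcpy(q, ": ", 2);
			q += 2;
			memcpy(q, id, idlen);
			q += idlen;
			*q++ = ' ';
		}
		*q++ = '$';
		p = end + 1;	/* skip the whole original keyword */
	}
	*q = '\0';
	return out;
}
```

Passing NULL as id gives the check-in direction and passing a hex
object name gives the checkout direction.  An "$ident:" with no closing
dollar sign is copied through unchanged.
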
 convert.c         |  186 +++++++++++++++++++++++++++++++++++++++++++++++++---
 t/t0021-filter.sh |   22 +++++--
 2 files changed, 192 insertions(+), 16 deletions(-)

diff --git a/convert.c b/convert.c
index 1fa8f2a..0b85dd7 100644
--- a/convert.c
+++ b/convert.c
@@ -395,20 +395,161 @@ static void setup_convert_check(struct git_attr_check *check)
 {
 	static struct git_attr *attr_crlf;
 	static struct git_attr *attr_filter;
+	static struct git_attr *attr_ident;
 
 	if (!attr_crlf) {
 		attr_crlf = git_attr("crlf", 4);
 		attr_filter = git_attr("filter", 6);
+		attr_ident = git_attr("ident", 5);
 		user_convert_tail = &user_convert;
 		git_config(read_convert_config);
 	}
 	check[0].attr = attr_crlf;
 	check[1].attr = attr_filter;
+	check[2].attr = attr_ident;
+}
+
+static int count_ident(const char *cp, unsigned long size)
+{
+	/*
+	 * "$ident: 0000000000000000000000000000000000000000 $" <=> "$ident$"
+	 */
+	int cnt = 0;
+	char ch;
+
+	while (size) {
+		ch = *cp++;
+		size--;
+		if (ch != '$')
+			continue;
+		if (size < 6)
+			break;
+		if (memcmp("ident", cp, 5))
+			continue;
+		ch = cp[5];
+		cp += 6;
+		size -= 6;
+		if (ch == '$')
+			cnt++; /* $ident$ */
+		if (ch != ':')
+			continue;
+
+		/*
+		 * "$ident: ... "; scan up to the closing dollar sign and discard.
+		 */
+		while (size) {
+			ch = *cp++;
+			size--;
+			if (ch == '$') {
+				cnt++;
+				break;
+			}
+		}
+	}
+	return cnt;
+}
+
+static char *ident_to_git(const char *path, const char *src, unsigned long *sizep, int ident)
+{
+	int cnt;
+	unsigned long size;
+	char *dst, *buf;
+
+	if (!ident)
+		return NULL;
+	size = *sizep;
+	cnt = count_ident(src, size);
+	if (!cnt)
+		return NULL;
+	buf = xmalloc(size);
+
+	for (dst = buf; size; size--) {
+		char ch = *src++;
+		*dst++ = ch;
+		if ((ch == '$') && (6 <= size) &&
+		    !memcmp("ident:", src, 6)) {
+			unsigned long rem = size - 6;
+			const char *cp = src + 6;
+			do {
+				ch = *cp++;
+				if (ch == '$')
+					break;
+				rem--;
+			} while (rem);
+			if (!rem)
+				continue;
+			memcpy(dst, "ident$", 6);
+			dst += 6;
+			size -= (cp - src);
+			src = cp;
+		}
+	}
+
+	*sizep = dst - buf;
+	return buf;
+}
+
+static char *ident_to_worktree(const char *path, const char *src, unsigned long *sizep, int ident)
+{
+	int cnt;
+	unsigned long size;
+	char *dst, *buf;
+	unsigned char sha1[20];
+
+	if (!ident)
+		return NULL;
+
+	size = *sizep;
+	cnt = count_ident(src, size);
+	if (!cnt)
+		return NULL;
+
+	hash_sha1_file(src, size, "blob", sha1);
+	buf = xmalloc(size + cnt * 43);
+
+	for (dst = buf; size; size--) {
+		const char *cp;
+		char ch = *src++;
+		*dst++ = ch;
+		if ((ch != '$') || (size < 6) || memcmp("ident", src, 5))
+			continue;
+
+		if (src[5] == ':') {
+			/* discard up to but not including the closing $ */
+			unsigned long rem = size - 6;
+			cp = src + 6;
+			do {
+				ch = *cp++;
+				if (ch == '$')
+					break;
+				rem--;
+			} while (rem);
+			if (!rem)
+				continue;
+			size -= (cp - src);
+		} else if (src[5] == '$')
+			cp = src + 5;
+		else
+			continue;
+
+		memcpy(dst, "ident: ", 7);
+		dst += 7;
+		memcpy(dst, sha1_to_hex(sha1), 40);
+		dst += 40;
+		*dst++ = ' ';
+		size -= (cp - src);
+		src = cp;
+		*dst++ = *src++;
+		size--;
+	}
+
+	*sizep = dst - buf;
+	return buf;
 }
 
 static int git_path_check_crlf(const char *path, struct git_attr_check *check)
 {
-	const char *value = check[0].value;
+	const char *value = check->value;
 
 	if (ATTR_TRUE(value))
 		return CRLF_TEXT;
@@ -424,7 +565,7 @@ static int git_path_check_crlf(const char *path, struct git_attr_check *check)
 static struct convert_driver *git_path_check_convert(const char *path,
 					     struct git_attr_check *check)
 {
-	const char *value = check[1].value;
+	const char *value = check->value;
 	struct convert_driver *drv;
 
 	if (ATTR_TRUE(value) || ATTR_FALSE(value) || ATTR_UNSET(value))
@@ -435,20 +576,29 @@ static struct convert_driver *git_path_check_convert(const char *path,
 	return NULL;
 }
 
+static int git_path_check_ident(const char *path, struct git_attr_check *check)
+{
+	const char *value = check->value;
+
+	return !!ATTR_TRUE(value);
+}
+
 char *convert_to_git(const char *path, const char *src, unsigned long *sizep)
 {
-	struct git_attr_check check[2];
+	struct git_attr_check check[3];
 	int crlf = CRLF_GUESS;
+	int ident = 0;
 	char *filter = NULL;
 	char *buf, *buf2;
 
 	setup_convert_check(check);
-	if (!git_checkattr(path, 2, check)) {
+	if (!git_checkattr(path, 3, check)) {
 		struct convert_driver *drv;
-		crlf = git_path_check_crlf(path, check);
-		drv = git_path_check_convert(path, check);
+		crlf = git_path_check_crlf(path, check + 0);
+		drv = git_path_check_convert(path, check + 1);
 		if (drv && drv->clean)
 			filter = drv->clean;
+		ident = git_path_check_ident(path, check + 2);
 	}
 
 	buf = apply_filter(path, src, sizep, filter);
@@ -459,26 +609,40 @@ char *convert_to_git(const char *path, const char *src, unsigned long *sizep)
 		buf = buf2;
 	}
 
+	buf2 = ident_to_git(path, buf ? buf : src, sizep, ident);
+	if (buf2) {
+		free(buf);
+		buf = buf2;
+	}
+
 	return buf;
 }
 
 char *convert_to_working_tree(const char *path, const char *src, unsigned long *sizep)
 {
-	struct git_attr_check check[2];
+	struct git_attr_check check[3];
 	int crlf = CRLF_GUESS;
+	int ident = 0;
 	char *filter = NULL;
 	char *buf, *buf2;
 
 	setup_convert_check(check);
-	if (!git_checkattr(path, 2, check)) {
+	if (!git_checkattr(path, 3, check)) {
 		struct convert_driver *drv;
-		crlf = git_path_check_crlf(path, check);
-		drv = git_path_check_convert(path, check);
+		crlf = git_path_check_crlf(path, check + 0);
+		drv = git_path_check_convert(path, check + 1);
 		if (drv && drv->smudge)
 			filter = drv->smudge;
+		ident = git_path_check_ident(path, check + 2);
 	}
 
-	buf = crlf_to_worktree(path, src, sizep, crlf);
+	buf = ident_to_worktree(path, src, sizep, ident);
+
+	buf2 = crlf_to_worktree(path, buf ? buf : src, sizep, crlf);
+	if (buf2) {
+		free(buf);
+		buf = buf2;
+	}
 
 	buf2 = apply_filter(path, buf ? buf : src, sizep, filter);
 	if (buf2) {
diff --git a/t/t0021-filter.sh b/t/t0021-filter.sh
index d6bb1c6..725a425 100755
--- a/t/t0021-filter.sh
+++ b/t/t0021-filter.sh
@@ -13,24 +13,36 @@ test_expect_success setup '
 	git config filter.rot13.smudge ./rot13.sh &&
 	git config filter.rot13.clean ./rot13.sh &&
 
-	echo "*.t filter=rot13" >.gitattributes &&
+	{
+	    echo "*.t filter=rot13"
+	    echo "*.i ident"
+	} >.gitattributes &&
 
 	{
 	    echo a b c d e f g h i j k l m
 	    echo n o p q r s t u v w x y z
+	    echo '\''$ident$'\''
 	} >test &&
 	cat test >test.t &&
 	cat test >test.o &&
-	git add test test.t &&
-	rm -f test test.t &&
-	git checkout -- test test.t
+	cat test >test.i &&
+	git add test test.t test.i &&
+	rm -f test test.t test.i &&
+	git checkout -- test test.t test.i
 '
 
+script='s/^\$ident: \([0-9a-f]*\) \$/\1/p'
+
 test_expect_success check '
 
 	cmp test.o test &&
-	cmp test.o test.t
+	cmp test.o test.t &&
 
+	# ident should be stripped in the repository
+	git diff --raw --exit-code :test :test.i &&
+	id=$(git rev-parse --verify :test) &&
+	embedded=$(sed -ne "$script" test.i) &&
+	test "z$id" = "z$embedded"
 '
 
 test_done
-- 
1.5.1.2.919.g280f4


* Re: [PATCH 0/2] Controversial blob munging series
  2007-04-22  6:08 [PATCH 0/2] Controversial blob munging series Junio C Hamano
  2007-04-22  6:08 ` [PATCH 1/2] Add 'filter' attribute and external filter driver definition Junio C Hamano
  2007-04-22  6:08 ` [PATCH 2/2] Add 'ident' conversion Junio C Hamano
@ 2007-04-23 13:50 ` Johannes Schindelin
  2007-04-23 16:29   ` Julian Phillips
  2007-04-23 17:13   ` Junio C Hamano
  2 siblings, 2 replies; 10+ messages in thread
From: Johannes Schindelin @ 2007-04-23 13:50 UTC
  To: Junio C Hamano; +Cc: git

Hi,

On Sat, 21 Apr 2007, Junio C Hamano wrote:

> This is on top of 'next' I'll push out after I am done with
> v1.5.1.2 I am preparing today.
> 
> [1/2] Add 'filter' attribute and external filter driver definition.
> [2/2] Add 'ident' conversion.

I think this is great work! And it is useful, too. Let me describe a usage 
scenario I have in mind.

Being stuck with Pine, which still does not do Maildir, and wanting 
to be able to read my mails as distributed as I am working on documents 
and software projects, I always dreamt of having all my mail in Git.

With filters, it should be relatively easy to do that. Before checking in, 
the individual mailbox files are split, the contents are put into the 
object database, and the mailbox file is replaced by a text file 
consisting of the SHA1s of the mails.

Ideally, I would eventually not only teach Pine to understand Maildir 
format, but read and store the mails in a Git backend. Alas, I am way too 
lazy for that.

So, with filters I'd do the cheap and easy thing.

You may not be able to appreciate the advantages of my scenario, but this 
kind of flexibility is what makes Git so useful.

Ciao,
Dscho


* Re: [PATCH 0/2] Controversial blob munging series
  2007-04-23 13:50 ` [PATCH 0/2] Controversial blob munging series Johannes Schindelin
@ 2007-04-23 16:29   ` Julian Phillips
  2007-04-23 16:54     ` Johannes Schindelin
  2007-04-23 17:13   ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Julian Phillips @ 2007-04-23 16:29 UTC
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

On Mon, 23 Apr 2007, Johannes Schindelin wrote:

> Being stuck with Pine, which still does not do Maildir, and wanting
> to be able to read my mails as distributed as I am working on documents
> and software projects, I always dreamt of having all my mail in Git.
>
> With filters, it should be relatively easy to do that. Before checking in,
> the individual mailbox files are split, the contents are put into the
> object database, and the mailbox file is replaced by a text file
> consisting of the SHA1s of the mails.
>
> Ideally, I would eventually not only teach Pine to understand Maildir
> format, but read and store the mails in a Git backend. Alas, I am way too
> lazy for that.

You do know about Eduardo Chappa's patches for pine?  In particular 
http://staff.washington.edu/chappa/pine/info/maildir.html.

-- 
Julian

  ---
Windows contains FAT.
Use Linux -- you won't ever have to worry about your weight.

    -- Ewout Stam


* Re: [PATCH 0/2] Controversial blob munging series
  2007-04-23 16:29   ` Julian Phillips
@ 2007-04-23 16:54     ` Johannes Schindelin
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2007-04-23 16:54 UTC
  To: Julian Phillips; +Cc: Junio C Hamano, git

Hi,

On Mon, 23 Apr 2007, Julian Phillips wrote:

> On Mon, 23 Apr 2007, Johannes Schindelin wrote:
> 
> > Being stuck with Pine, which still does not do Maildir, and wanting
> > to be able to read my mails as distributed as I am working on documents
> > and software projects, I always dreamt of having all my mail in Git.
> >
> > With filters, it should be relatively easy to do that. Before checking in,
> > the individual mailbox files are split, the contents are put into the
> > object database, and the mailbox file is replaced by a text file
> > consisting of the SHA1s of the mails.
> >
> > Ideally, I would eventually not only teach Pine to understand Maildir
> > format, but read and store the mails in a Git backend. Alas, I am way too
> > lazy for that.
> 
> You do know about Eduardo Chappa's patches for pine?  In particular
> http://staff.washington.edu/chappa/pine/info/maildir.html.

I did not.

Thanks,
Dscho


* Re: [PATCH 0/2] Controversial blob munging series
  2007-04-23 13:50 ` [PATCH 0/2] Controversial blob munging series Johannes Schindelin
  2007-04-23 16:29   ` Julian Phillips
@ 2007-04-23 17:13   ` Junio C Hamano
  2007-04-23 17:35     ` Johannes Schindelin
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2007-04-23 17:13 UTC
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Sat, 21 Apr 2007, Junio C Hamano wrote:
>
>> This is on top of 'next' I'll push out after I am done with
>> v1.5.1.2 I am preparing today.
>> 
>> [1/2] Add 'filter' attribute and external filter driver definition.
>> [2/2] Add 'ident' conversion.
>
> I think this is great work! And it is useful, too. Let me describe a usage 
> scenario I have in mind.
>
> Being stuck with Pine, which still does not do Maildir, and wanting 
> to be able to read my mails as distributed as I am working on documents 
> and software projects, I always dreamt of having all my mail in Git.
>
> With filters, it should be relatively easy to do that. Before checking in, 
> the individual mailbox files are split, the contents are put into the 
> object database, and the mailbox file is replaced by a text file 
> consisting of the SHA1s of the mails.
>
> Ideally, I would eventually not only teach Pine to understand Maildir 
> format, but read and store the mails in a Git backend. Alas, I am way too 
> lazy for that.
>
> So, with filters I'd do the cheap and easy thing.
>
> You may not be able to appreciate the advantages of my scenario, but this 
> kind of flexibility is what makes Git so useful.

An earlier message $gmane/44896 from Linus comes to my mind.  An excerpt:

   The thing is, it's easy enough (although potentially _very_ expensive) to 
   run some per-file script at each commit and at each checkout. But there 
   are some fundamental operations that are even more common:

    - checking for "file changed", aka the "git status" kind of thing

      Anything we do would have to follow the same "stat" rules, at a 
      minimum. You can *not* afford to have to check the file manually.

      So especially if you combine several pieces into one, or split one file 
      into several pieces, your index would have to contain the entry 
      that matches the _filesystem_ (because that's what the index is all 
      about), but then the *tree* would contain the pieces (or the single 
      entry that matches several filesystem entries).

and I am inclined to think that this is quite fundamental.  I
think you just fell into the category of people who want the
"extended semantics" Linus talked about in $gmane/45214:

  I suspect that this gets some complaining off our back, but I *also* 
  suspect that people will actually end up really screwing themselves with 
  something like this and then blaming us and causing a huge pain down the 
  line when we've supported this and people want "extended semantics" that 
  are no longer clean.

which is kind of disappointing.

Even if you somehow solved the issue of the "stat" rule, I do not
know what your plans are to manage the blobs that you drop in
the object store.  The list of object names in the mail-index
file you are generating does not count as connectivity for the
purpose of fetch/push/fsck/prune.


* Re: [PATCH 0/2] Controversial blob munging series
  2007-04-23 17:13   ` Junio C Hamano
@ 2007-04-23 17:35     ` Johannes Schindelin
  2007-04-23 18:42       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Schindelin @ 2007-04-23 17:35 UTC
  To: Junio C Hamano; +Cc: git

Hi,

On Mon, 23 Apr 2007, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > On Sat, 21 Apr 2007, Junio C Hamano wrote:
> >
> >> This is on top of 'next' I'll push out after I am done with
> >> v1.5.1.2 I am preparing today.
> >> 
> >> [1/2] Add 'filter' attribute and external filter driver definition.
> >> [2/2] Add 'ident' conversion.
> >
> > I think this is great work! And it is useful, too. Let me describe a usage 
> > scenario I have in mind.
> >
> > Being stuck with Pine, which still does not do Maildir, and wanting 
> > to be able to read my mails as distributed as I am working on documents 
> > and software projects, I always dreamt of having all my mail in Git.
> >
> > With filters, it should be relatively easy to do that. Before checking in, 
> > the individual mailbox files are split, the contents are put into the 
> > object database, and the mailbox file is replaced by a text file 
> > consisting of the SHA1s of the mails.
> >
> > Ideally, I would eventually not only teach Pine to understand Maildir 
> > format, but read and store the mails in a Git backend. Alas, I am way too 
> > lazy for that.
> >
> > So, with filters I'd do the cheap and easy thing.
> >
> > You may not be able to appreciate the advantages of my scenario, but this 
> > kind of flexibility is what makes Git so useful.
> 
> An earlier message $gmane/44896 from Linus comes to my mind.  An excerpt:
> 
>    The thing is, it's easy enough (although potentially _very_ expensive) to 
>    run some per-file script at each commit and at each checkout. But there 
>    are some fundamental operations that are even more common:
> 
>     - checking for "file changed", aka the "git status" kind of thing
> 
>       Anything we do would have to follow the same "stat" rules, at a 
>       minimum. You can *not* afford to have to check the file manually.
> 
>       So especially if you combine several pieces into one, or split one file 
>       into several pieces, your index would have to contain the entry 
>       that matches the _filesystem_ (because that's what the index is all 
>       about), but then the *tree* would contain the pieces (or the single 
>       entry that matches several filesystem entries).
> 
> and I am inclined to think that this is quite fundamental.  I
> think you just fell into the category of people who want the
> "extended semantics" Linus talked about in $gmane/45214:
> 
>   I suspect that this gets some complaining off our back, but I *also* 
>   suspect that people will actually end up really screwing themselves with 
>   something like this and then blaming us and causing a huge pain down the 
>   line when we've supported this and people want "extended semantics" that 
>   are no longer clean.
> 
> which is kind of disappointing.
> 
> Even if you somehow solved the issue of the "stat" rule, I do not
> know what your plans are to manage the blobs that you drop in
> the object store.  The list of object names in the mail-index
> file you are generating does not count as connectivity for the
> purpose of fetch/push/fsck/prune.

I had the idea of maintaining a ref that holds "trees" of message-id ->
blob pairs and gets updated at the same time.

If Git were libified already, I might have tried to go for direct storage 
in .git/objects/ instead.

Ciao,
Dscho


* Re: [PATCH 0/2] Controversial blob munging series
  2007-04-23 17:35     ` Johannes Schindelin
@ 2007-04-23 18:42       ` Junio C Hamano
  2007-04-23 18:49         ` Johannes Schindelin
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2007-04-23 18:42 UTC
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Mon, 23 Apr 2007, Junio C Hamano wrote:
> ...
>> ... I am inclined to think that this is quite fundamental.  I
>> think you just fell into the category of people who want the
>> "extended semantics" Linus talked about in $gmane/45214:
>> 
>>   I suspect that this gets some complaining off our back, but I *also* 
>>   suspect that people will actually end up really screwing themselves with 
>>   something like this and then blaming us and causing a huge pain down the 
>>   line when we've supported this and people want "extended semantics" that 
>>   are no longer clean.
>> 
>> which is kind of disappointing.

I think this was the biggest worry.  If even Dscho, who is among
a dozen people with the most intimate knowledge of git on the
planet, gets it wrong, I can almost guarantee that we will get
into the mess Linus predicted above.

>> Even if you somehow solved the issue of "stat" rule, I do not
>> know what your plans are to manage the blobs that you drop in
>> the object store.  The list of object names in the mail-index
>> file you are generating does not count as connectivity for the
>> purpose of fetch/push/fsck/prune.
>
> I had the idea to update a ref, which holds "trees" of message-id -> blob 
> pairs, and gets updated at the same time.

I somehow thought this mailbox thing was because you wanted to
transfer mailboxes across repositories.  How would you prevent
that ref from getting out of sync with the mail-index file, when
git knows nothing about that file's involvement in connectivity?
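
The connectivity point above can be sketched with a toy model.  This
is only an illustration under stated assumptions, not git's code: the
ToyStore class, its method names, the ref name "refs/mail/index" and
the tree hashing are all made up for the example, and a real ref would
normally point at a commit rather than directly at a tree.  Only
blob_id() follows git's actual blob naming (SHA-1 over a
"blob <size>\0" header plus the content).

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Object name git gives a blob: SHA-1 of 'blob <size>\\0' + content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


class ToyStore:
    """Hypothetical stand-in for an object store plus refs (not git's API)."""

    def __init__(self):
        self.objects = {}  # object name -> ("blob", bytes) or ("tree", dict)
        self.refs = {}     # ref name -> object name

    def add_blob(self, data: bytes) -> str:
        oid = blob_id(data)
        self.objects[oid] = ("blob", data)
        return oid

    def add_tree(self, entries: dict) -> str:
        # Toy encoding only; git's real tree format is different.
        payload = "\n".join(
            f"{name} {oid}" for name, oid in sorted(entries.items())
        ).encode()
        oid = hashlib.sha1(b"toytree\x00" + payload).hexdigest()
        self.objects[oid] = ("tree", dict(entries))
        return oid

    def reachable(self) -> set:
        """Walk from the refs; nothing outside the store is consulted."""
        seen = set()
        stack = list(self.refs.values())
        while stack:
            oid = stack.pop()
            if oid in seen or oid not in self.objects:
                continue
            seen.add(oid)
            kind, body = self.objects[oid]
            if kind == "tree":
                stack.extend(body.values())
        return seen

    def prune(self) -> list:
        """Drop every object that no ref leads to; return the dropped names."""
        keep = self.reachable()
        dropped = sorted(set(self.objects) - keep)
        for oid in dropped:
            del self.objects[oid]
        return dropped


if __name__ == "__main__":
    store = ToyStore()
    mail = store.add_blob(b"Subject: hello\n\nbody\n")

    # An external index file may list the blob, but the store never reads it.
    mail_index = {"<msg-1@example.org>": mail}
    print("pruned with only the external index:", store.prune())

    # Recording the same pair in a tree that a ref points at keeps it alive.
    mail = store.add_blob(b"Subject: hello\n\nbody\n")
    tree = store.add_tree({"<msg-1@example.org>": mail})
    store.refs["refs/mail/index"] = tree
    print("pruned with the ref in place:", store.prune())
```

In this model the blob survives only while the ref-held tree names it,
so the ref and the external index have to be updated together for the
two to stay in sync.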


* Re: [PATCH 0/2] Controversial blob munging series
  2007-04-23 18:42       ` Junio C Hamano
@ 2007-04-23 18:49         ` Johannes Schindelin
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Schindelin @ 2007-04-23 18:49 UTC
  To: Junio C Hamano; +Cc: git

Hi,

On Mon, 23 Apr 2007, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > On Mon, 23 Apr 2007, Junio C Hamano wrote:
> > ...
> >> ... I am inclined to think that this is quite fundamental.  I
> >> think you just fell into the category of people who want "extended semantics"
> >> Linus talked about in $gmane/45214:
> >> 
> >>   I suspect that this gets some complaining off our back, but I *also* 
> >>   suspect that people will actually end up really screwing themselves with 
> >>   something like this and then blaming us and causing a huge pain down the 
> >>   line when we've supported this and people want "extended semantics" that 
> >>   are no longer clean.
> >> 
> >> which is kind of disappointing.
> 
> I think this was the biggest worry.  If even Dscho, who is among
> a dozen people with the most intimate knowledge of git on the
> planet, gets it wrong, I can almost guarantee that we will get
> into the mess Linus predicted above.

Flattery always works :-)

> >> Even if you somehow solved the issue of "stat" rule, I do not
> >> know what your plans are to manage the blobs that you drop in
> >> the object store.  The list of object names in the mail-index
> >> file you are generating does not count as connectivity for the
> >> purpose of fetch/push/fsck/prune.
> >
> > I had the idea to update a ref, which holds "trees" of message-id -> blob 
> > pairs, and gets updated at the same time.
> 
> I somehow thought this mailbox thing was because you wanted to
> transfer mailboxes across repositories.  How would you prevent
> that ref from getting out of sync with the mail-index file, when
> git knows nothing about that file's involvement in connectivity?

If your suspicion was that I did not really think it through, then you're 
correct. Of course, I would have transferred _all_ refs anyway, since the 
whole point of the exercise is to lose nothing.

However, I see where your argument is going.

Since Julian pointed out that there is a maildir patch for pine, I'll 
probably go for that one, since it is hanging lower.

Ciao,
Dscho


end of thread, other threads:[~2007-04-23 18:50 UTC | newest]

Thread overview: 10+ messages
2007-04-22  6:08 [PATCH 0/2] Controversial blob munging series Junio C Hamano
2007-04-22  6:08 ` [PATCH 1/2] Add 'filter' attribute and external filter driver definition Junio C Hamano
2007-04-22  6:08 ` [PATCH 2/2] Add 'ident' conversion Junio C Hamano
2007-04-23 13:50 ` [PATCH 0/2] Controversial blob munging series Johannes Schindelin
2007-04-23 16:29   ` Julian Phillips
2007-04-23 16:54     ` Johannes Schindelin
2007-04-23 17:13   ` Junio C Hamano
2007-04-23 17:35     ` Johannes Schindelin
2007-04-23 18:42       ` Junio C Hamano
2007-04-23 18:49         ` Johannes Schindelin
