git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v4 0/5] Patches to avoid reporting conversion changes.
@ 2010-06-01 14:41 Henrik Grubbström (Grubba)
  2010-06-01 14:41 ` [PATCH v4 1/5] sha1_file: Add index_blob() Henrik Grubbström (Grubba)
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Henrik Grubbström (Grubba) @ 2010-06-01 14:41 UTC (permalink / raw
  To: git; +Cc: gitster, Henrik Grubbström  

This is the fourth go at having the git index keep track of
the conversion mode for blobs.

This is useful for repositorys not containing fully normalized files
(eg containing CRLF's or expanded $Id$ strings), where a later attribute
change implies a conversion mode change. Without this set of patches
the user would need to recommit semantically unchanged files to get
a clean index.

Changes since last time:

  o The patch set has been rebased upon 0ed6711 (aka eb/core-eol),
    to be able to take advantage of convert.c:get_output_conversion()
    and convert.c:determine_action().

  o As a consequence of using those functions, the on-disk format for
    the CONV extension has changed slightly. The change should have
    minimal impact, since the index will in most cases self-repair.

  o The t0025-crlf-auto.sh tests have been updated to still test
    the same behaviour.

Junio: This should be close to what you envisioned in
       <7vsk6qio1f.fsf@alter.siamese.dyndns.org>.

Henrik Grubbström (Grubba) (5):
  sha1_file: Add index_blob().
  strbuf: Add strbuf_add_uint32().
  cache: Keep track of conversion mode changes.
  cache: Add index extension "CONV".
  t/t0021: Test that conversion changes are detected.

 cache.h               |   12 ++++++
 convert.c             |   46 ++++++++++++++++++++++
 read-cache.c          |  102 ++++++++++++++++++++++++++++++++++++++++++++----
 sha1_file.c           |   19 +++++++++
 strbuf.h              |    4 ++
 t/t0021-conversion.sh |   54 ++++++++++++++++++++++++++
 t/t0025-crlf-auto.sh  |   20 ++++++----
 7 files changed, 240 insertions(+), 17 deletions(-)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4 1/5] sha1_file: Add index_blob().
  2010-06-01 14:41 [PATCH v4 0/5] Patches to avoid reporting conversion changes Henrik Grubbström (Grubba)
@ 2010-06-01 14:41 ` Henrik Grubbström (Grubba)
  2010-06-01 14:41 ` [PATCH v4 2/5] strbuf: Add strbuf_add_uint32() Henrik Grubbström (Grubba)
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Henrik Grubbström (Grubba) @ 2010-06-01 14:41 UTC (permalink / raw
  To: git; +Cc: gitster, Henrik Grubbström  

When conversion attributes have changed, it
is useful to be able to easily reconvert an
existing blob.

Signed-off-by: Henrik Grubbström <grubba@grubba.org>
---
No changes since v1.

 cache.h     |    1 +
 sha1_file.c |   19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/cache.h b/cache.h
index ebe71c7..004296d 100644
--- a/cache.h
+++ b/cache.h
@@ -482,6 +482,7 @@ extern int ie_match_stat(const struct index_state *, struct cache_entry *, struc
 extern int ie_modified(const struct index_state *, struct cache_entry *, struct stat *, unsigned int);
 
 extern int ce_path_match(const struct cache_entry *ce, const char **pathspec);
+extern int index_blob(unsigned char *dst_sha1, const unsigned char *src_sha1, int write_object, const char *path);
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, enum object_type type, const char *path);
 extern int index_path(unsigned char *sha1, const char *path, struct stat *st, int write_object);
 extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st);
diff --git a/sha1_file.c b/sha1_file.c
index 657825e..6e7b999 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2434,6 +2434,25 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
 	return ret;
 }
 
+int index_blob(unsigned char *dst_sha1, const unsigned char *src_sha1,
+	       int write_object, const char *path)
+{
+	void *buf;
+	unsigned long buflen = 0;
+	int ret;
+
+	memcpy(dst_sha1, src_sha1, 20);
+	buf = read_object_with_reference(src_sha1, typename(OBJ_BLOB),
+					 &buflen, dst_sha1);
+	if (!buf)
+		return 0;
+
+	ret = index_mem(dst_sha1, buf, buflen, write_object, OBJ_BLOB, path);
+	free(buf);
+
+	return ret;
+}
+
 int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
 	     enum object_type type, const char *path)
 {
-- 
1.7.0.4.369.g81e89

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 2/5] strbuf: Add strbuf_add_uint32().
  2010-06-01 14:41 [PATCH v4 0/5] Patches to avoid reporting conversion changes Henrik Grubbström (Grubba)
  2010-06-01 14:41 ` [PATCH v4 1/5] sha1_file: Add index_blob() Henrik Grubbström (Grubba)
@ 2010-06-01 14:41 ` Henrik Grubbström (Grubba)
  2010-06-01 14:41 ` [PATCH v4 3/5] cache: Keep track of conversion mode changes Henrik Grubbström (Grubba)
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Henrik Grubbström (Grubba) @ 2010-06-01 14:41 UTC (permalink / raw
  To: git; +Cc: gitster, Henrik Grubbström  

Adds a convenience function for adding an unsigned 32bit
integer in network byte-order to a strbuf.

Signed-off-by: Henrik Grubbström <grubba@grubba.org>
---
 strbuf.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/strbuf.h b/strbuf.h
index fac2dbc..52cd71e 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -107,6 +107,10 @@ static inline void strbuf_addbuf(struct strbuf *sb, const struct strbuf *sb2) {
 	strbuf_grow(sb, sb2->len);
 	strbuf_add(sb, sb2->buf, sb2->len);
 }
+static inline void strbuf_add_uint32(struct strbuf *sb, uint32_t val) {
+	val = htonl(val);
+	strbuf_add(sb, &val, sizeof(val));
+}
 extern void strbuf_adddup(struct strbuf *sb, size_t pos, size_t len);
 
 typedef size_t (*expand_fn_t) (struct strbuf *sb, const char *placeholder, void *context);
-- 
1.7.0.4.369.g81e89

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 3/5] cache: Keep track of conversion mode changes.
  2010-06-01 14:41 [PATCH v4 0/5] Patches to avoid reporting conversion changes Henrik Grubbström (Grubba)
  2010-06-01 14:41 ` [PATCH v4 1/5] sha1_file: Add index_blob() Henrik Grubbström (Grubba)
  2010-06-01 14:41 ` [PATCH v4 2/5] strbuf: Add strbuf_add_uint32() Henrik Grubbström (Grubba)
@ 2010-06-01 14:41 ` Henrik Grubbström (Grubba)
  2010-06-01 14:41 ` [PATCH v4 4/5] cache: Add index extension "CONV" Henrik Grubbström (Grubba)
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Henrik Grubbström (Grubba) @ 2010-06-01 14:41 UTC (permalink / raw
  To: git; +Cc: gitster, Henrik Grubbström  

The index now keeps track of the conversion mode that was active
when the entry was created. This can be used to detect the most
common cases of when the conversion mode has changed.

Signed-off-by: Henrik Grubbström <grubba@grubba.org>
---
Rebased on 0ed6711 (aka eb/core-eol).

The conversion mode flags are now based on the EOL_* set of flags,
and to avoid code duplication convert.c:get_output_conversion() and
convert.c:determine_action() are used to determine the eol conversion
mode.

The denormalized eol tests in t0025-crlf-auto.sh have been altered to
still test the intended eol conversion properties.

 cache.h              |   11 +++++++++++
 convert.c            |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.c         |   37 ++++++++++++++++++++++++++++++++++---
 t/t0025-crlf-auto.sh |   20 ++++++++++++--------
 4 files changed, 103 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 004296d..263f4f3 100644
--- a/cache.h
+++ b/cache.h
@@ -151,10 +151,20 @@ struct cache_entry {
 	unsigned int ce_size;
 	unsigned int ce_flags;
 	unsigned char sha1[20];
+	unsigned int ce_conv_flags;
 	struct cache_entry *next;
 	char name[FLEX_ARRAY]; /* more */
 };
 
+#define CONV_EOL_CRLF		EOL_CRLF
+#define CONV_EOL_LF		EOL_LF
+#define CONV_GIT_LF		0x0004
+#define CONV_GIT_AUTO		0x0008
+#define CONV_IDENT		0x0010
+#define CONV_FILT		0x0020
+#define CONV_MASK		0x003f
+#define CONV_NORM_NEEDED	0x010000
+
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
 #define CE_EXTENDED  (0x4000)
@@ -1016,6 +1026,7 @@ extern void trace_argv_printf(const char **argv, const char *format, ...);
 
 /* convert.c */
 /* returns 1 if *dst was used */
+extern unsigned int git_conv_flags(const char *path);
 extern int convert_to_git(const char *path, const char *src, size_t len,
                           struct strbuf *dst, enum safe_crlf checksafe);
 extern int convert_to_working_tree(const char *path, const char *src, size_t len, struct strbuf *dst);
diff --git a/convert.c b/convert.c
index 80d80b1..387a7c7 100644
--- a/convert.c
+++ b/convert.c
@@ -681,6 +681,52 @@ enum action determine_action(enum action text_attr, enum eol eol_attr) {
 	return text_attr;
 }
 
+unsigned int git_conv_flags(const char *path)
+{
+	struct git_attr_check check[5];
+	enum action action = CRLF_GUESS;
+	enum eol eol_attr = EOL_UNSET;
+	int ident = 0;
+	unsigned ret = 0;
+	struct convert_driver *drv = NULL;
+
+	setup_convert_check(check);
+	if (!git_checkattr(path, ARRAY_SIZE(check), check)) {
+		action = git_path_check_crlf(path, check + 4);
+		if (action == CRLF_GUESS)
+			action = git_path_check_crlf(path, check + 0);
+		ident = git_path_check_ident(path, check + 1);
+		drv = git_path_check_convert(path, check + 2);
+		eol_attr = git_path_check_eol(path, check + 3);
+	}
+
+	ret = get_output_conversion(action);
+
+	action = determine_action(action, eol_attr);
+
+	switch(action) {
+	case CRLF_BINARY:
+		break;
+	case CRLF_GUESS:
+		if (auto_crlf == AUTO_CRLF_FALSE)
+			break;
+		ret |= CONV_GIT_AUTO;
+		break;
+	case CRLF_AUTO:
+		ret |= CONV_GIT_AUTO|CONV_GIT_LF;
+		break;
+	default:
+		ret |= CONV_GIT_LF;
+		break;
+	}
+
+	if (ident)
+		ret |= CONV_IDENT;
+	if (drv)
+		ret |= CONV_FILT;
+	return ret;
+}
+
 int convert_to_git(const char *path, const char *src, size_t len,
                    struct strbuf *dst, enum safe_crlf checksafe)
 {
diff --git a/read-cache.c b/read-cache.c
index f1f789b..eeda928 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -88,12 +88,38 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
 static int ce_compare_data(struct cache_entry *ce, struct stat *st)
 {
 	int match = -1;
-	int fd = open(ce->name, O_RDONLY);
-
+	int fd;
+	unsigned char norm_sha1[20];
+	unsigned int conv_flags = git_conv_flags(ce->name);
+	const unsigned char *cmp_sha1 = ce->sha1;
+
+	if ((conv_flags ^ ce->ce_conv_flags) & CONV_MASK) {
+		if (ce->ce_conv_flags & CONV_NORM_NEEDED) {
+			/* Smudge the entry since it was only correct
+			 * for the old conversion mode. */
+			ce->ce_size = 0;
+		}
+		ce->ce_conv_flags = conv_flags;
+	} else
+		conv_flags = ce->ce_conv_flags & CONV_NORM_NEEDED;
+
+	if (conv_flags) {
+		index_blob(norm_sha1, ce->sha1, 0, ce->name);
+		if (!(conv_flags & CONV_NORM_NEEDED) &&
+		    hashcmp(norm_sha1, ce->sha1)) {
+			ce->ce_conv_flags = conv_flags | CONV_NORM_NEEDED;
+			/* Smudge the entry since we don't know
+			 * the correct value. */
+			ce->ce_size = 0;
+		}
+		cmp_sha1 = norm_sha1;
+	}
+	
+	fd = open(ce->name, O_RDONLY);
 	if (fd >= 0) {
 		unsigned char sha1[20];
 		if (!index_fd(sha1, fd, st, 0, OBJ_BLOB, ce->name))
-			match = hashcmp(sha1, ce->sha1);
+			match = hashcmp(sha1, cmp_sha1);
 		/* index_fd() closed the file descriptor already */
 	}
 	return match;
@@ -227,6 +253,11 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 		changed |= INODE_CHANGED;
 #endif
 
+	/* ce_size can not be trusted if the conversion mode has changed. */
+	if ((ce->ce_mode & S_IFMT) == S_IFREG &&
+	    ((ce->ce_conv_flags ^ git_conv_flags(ce->name)) & CONV_MASK))
+		return changed;
+
 	if (ce->ce_size != (unsigned int) st->st_size)
 		changed |= DATA_CHANGED;
 
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index f5f67a6..1bcf3dd 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -47,9 +47,10 @@ test_expect_success 'crlf=true causes a CRLF file to be normalized' '
 	git read-tree --reset -u HEAD &&
 
 	# Note, "normalized" means that git will normalize it if added
+	echo >>two &&
 	has_cr two &&
-	twodiff=`git diff two` &&
-	test -n "$twodiff"
+	twodiff=`git diff --numstat two | cut -f1` &&
+	test -n "$twodiff" -a "$twodiff" -gt 1
 '
 
 test_expect_success 'text=true causes a CRLF file to be normalized' '
@@ -59,9 +60,10 @@ test_expect_success 'text=true causes a CRLF file to be normalized' '
 	git read-tree --reset -u HEAD &&
 
 	# Note, "normalized" means that git will normalize it if added
+	echo >>two &&
 	has_cr two &&
-	twodiff=`git diff two` &&
-	test -n "$twodiff"
+	twodiff=`git diff --numstat two | cut -f1` &&
+	test -n "$twodiff" -a "$twodiff" -gt 1
 '
 
 test_expect_success 'eol=crlf gives a normalized file CRLFs with autocrlf=false' '
@@ -107,11 +109,12 @@ test_expect_success 'autocrlf=true does not normalize CRLF files' '
 	git read-tree --reset -u HEAD &&
 
 	has_cr one &&
+	echo >>two &&
 	has_cr two &&
 	onediff=`git diff one` &&
-	twodiff=`git diff two` &&
+	twodiff=`git diff --numstat two | cut -f1` &&
 	threediff=`git diff three` &&
-	test -z "$onediff" -a -z "$twodiff" -a -z "$threediff"
+	test -z "$onediff" -a -n "$twodiff" -a -z "$threediff" -a "$twodiff" -eq 1
 '
 
 test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
@@ -122,11 +125,12 @@ test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
 	git read-tree --reset -u HEAD &&
 
 	has_cr one &&
+	echo >>two &&
 	has_cr two &&
 	onediff=`git diff one` &&
-	twodiff=`git diff two` &&
+	twodiff=`git diff --numstat two | cut -f1` &&
 	threediff=`git diff three` &&
-	test -z "$onediff" -a -n "$twodiff" -a -z "$threediff"
+	test -z "$onediff" -a -n "$twodiff" -a -z "$threediff" -a "$twodiff" -gt 1
 '
 
 test_expect_success 'text=auto, autocrlf=true does not normalize binary files' '
-- 
1.7.0.4.369.g81e89

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 4/5] cache: Add index extension "CONV".
  2010-06-01 14:41 [PATCH v4 0/5] Patches to avoid reporting conversion changes Henrik Grubbström (Grubba)
                   ` (2 preceding siblings ...)
  2010-06-01 14:41 ` [PATCH v4 3/5] cache: Keep track of conversion mode changes Henrik Grubbström (Grubba)
@ 2010-06-01 14:41 ` Henrik Grubbström (Grubba)
  2010-06-01 14:41 ` [PATCH v4 5/5] t/t0021: Test that conversion changes are detected Henrik Grubbström (Grubba)
  2010-06-02  4:40 ` [PATCH v4 0/5] Patches to avoid reporting conversion changes Junio C Hamano
  5 siblings, 0 replies; 19+ messages in thread
From: Henrik Grubbström (Grubba) @ 2010-06-01 14:41 UTC (permalink / raw
  To: git; +Cc: gitster, Henrik Grubbström  

The index can now store and retrieve the ce_conv_flags data.

Signed-off-by: Henrik Grubbström <grubba@grubba.org>
---
The on disk format has changed slightly due to the changes in the
previous patch, but the change should have minimal impact, since
it typically only would force a one-time reindex.

 read-cache.c |   65 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index eeda928..630e001 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -27,6 +27,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 #define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
 #define CACHE_EXT_TREE 0x54524545	/* "TREE" */
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
+#define CACHE_EXT_CONV 0x434f4e56	/* "CONV" */
 
 struct index_state the_index;
 
@@ -1209,6 +1210,34 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	return 0;
 }
 
+/* The on disk format is the default conversion flags followed
+ * by alternating cache entry numbers and corresponding flags.
+ */
+static int conv_read(struct cache_entry **cache, unsigned int entries,
+		     const unsigned int *data, unsigned long sz)
+{
+	unsigned int entry_no;
+	unsigned int default_conv_flags;
+	if (sz < sizeof(*data))
+		return 0;
+	default_conv_flags = ntohl(*data);
+	data++;
+	sz -= sizeof(*data);
+	if (default_conv_flags) {
+		for (entry_no = 0; entry_no < entries; entry_no++)
+			cache[entry_no]->ce_conv_flags = default_conv_flags;
+	}
+	while (sz >= 2*sizeof(*data)) {
+		entry_no = ntohl(*data);
+		data++;
+		if (entry_no >= entries) break;
+		cache[entry_no]->ce_conv_flags = ntohl(*data);
+		data++;
+		sz -= 2*sizeof(*data);
+	}
+	return 0;
+}
+
 static int read_index_extension(struct index_state *istate,
 				const char *ext, void *data, unsigned long sz)
 {
@@ -1219,6 +1248,9 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_RESOLVE_UNDO:
 		istate->resolve_undo = resolve_undo_read(data, sz);
 		break;
+	case CACHE_EXT_CONV:
+		return conv_read(istate->cache, istate->cache_nr, data, sz);
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1542,6 +1574,15 @@ static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
 	}
 }
 
+static void conv_write(struct strbuf *sb, const struct cache_entry *ce,
+		       int entry_no)
+{
+	unsigned int entry[2];
+	entry[0] = htonl(entry_no);
+	entry[1] = htonl(ce->ce_conv_flags);
+	strbuf_add(sb, &entry, sizeof(entry));
+}
+
 static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce)
 {
 	int size = ondisk_ce_size(ce);
@@ -1577,10 +1618,12 @@ int write_index(struct index_state *istate, int newfd)
 {
 	git_SHA_CTX c;
 	struct cache_header hdr;
-	int i, err, removed, extended;
+	int i, j, err, removed, extended;
 	struct cache_entry **cache = istate->cache;
 	int entries = istate->cache_nr;
 	struct stat st;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int default_conv_flags;
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
@@ -1603,7 +1646,10 @@ int write_index(struct index_state *istate, int newfd)
 	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
 		return -1;
 
-	for (i = 0; i < entries; i++) {
+	default_conv_flags = git_conv_flags("");
+	strbuf_add_uint32(&sb, default_conv_flags);
+
+	for (i = j = 0; i < entries; i++) {
 		struct cache_entry *ce = cache[i];
 		if (ce->ce_flags & CE_REMOVE)
 			continue;
@@ -1611,12 +1657,21 @@ int write_index(struct index_state *istate, int newfd)
 			ce_smudge_racily_clean_entry(ce);
 		if (ce_write_entry(&c, newfd, ce) < 0)
 			return -1;
+		if (ce->ce_conv_flags != default_conv_flags)
+			conv_write(&sb, ce, j);
+		j++;
 	}
 
 	/* Write extension data here */
+	if (default_conv_flags || sb.len > sizeof(default_conv_flags)) {
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_CONV, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	} else
+		strbuf_release(&sb);
 	if (istate->cache_tree) {
-		struct strbuf sb = STRBUF_INIT;
-
 		cache_tree_write(&sb, istate->cache_tree);
 		err = write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sb.len) < 0
 			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
@@ -1625,8 +1680,6 @@ int write_index(struct index_state *istate, int newfd)
 			return -1;
 	}
 	if (istate->resolve_undo) {
-		struct strbuf sb = STRBUF_INIT;
-
 		resolve_undo_write(&sb, istate->resolve_undo);
 		err = write_index_ext_header(&c, newfd, CACHE_EXT_RESOLVE_UNDO,
 					     sb.len) < 0
-- 
1.7.0.4.369.g81e89

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v4 5/5] t/t0021: Test that conversion changes are detected.
  2010-06-01 14:41 [PATCH v4 0/5] Patches to avoid reporting conversion changes Henrik Grubbström (Grubba)
                   ` (3 preceding siblings ...)
  2010-06-01 14:41 ` [PATCH v4 4/5] cache: Add index extension "CONV" Henrik Grubbström (Grubba)
@ 2010-06-01 14:41 ` Henrik Grubbström (Grubba)
  2010-06-02  4:40 ` [PATCH v4 0/5] Patches to avoid reporting conversion changes Junio C Hamano
  5 siblings, 0 replies; 19+ messages in thread
From: Henrik Grubbström (Grubba) @ 2010-06-01 14:41 UTC (permalink / raw
  To: git; +Cc: gitster, Henrik Grubbström  

Signed-off-by Henrik Grubbström <grubba@grubba.org>
---
 t/t0021-conversion.sh |   54 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 54 insertions(+), 0 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 6cb8d60..6d41ee7 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -89,4 +89,58 @@ test_expect_success expanded_in_repo '
 	cmp expanded-keywords expected-output
 '
 
+# Check that files that have had their canonical representation
+# changed since being checked in aren't reported as modified
+# directly after being checked out.
+test_expect_success keywords_not_modified '
+	{
+		echo "File with foreign keywords"
+		echo "\$Id\$"
+		echo "\$Id: NoTerminatingSymbol"
+		echo "\$Id: Foreign Commit With Spaces \$"
+		echo "\$Id: GitCommitId \$"
+		echo "\$Id: NoTerminatingSymbolAtEOF"
+	} > expanded-keywords2 &&
+
+	git add expanded-keywords2 &&
+	git commit -m "File with keywords expanded" &&
+
+	echo "expanded-keywords2 ident" >> .gitattributes &&
+
+	rm -f expanded-keywords2 &&
+	git checkout -- expanded-keywords2 &&
+
+	test "x`git status --porcelain -- expanded-keywords2`" = x
+'
+
+# Test detection of CRLF conversion changes CRLF ==> LF.
+test_expect_success crlf_conversion_change_crlf_to_lf '
+(
+	# step 0. a blob with CRLF
+	git init one && cd one &&
+	echo -e "a quick brown fox\015" >kuzu &&
+	git add kuzu && git commit -m kuzu &&
+	# step 1. you want CRLF in work area, LF in repository
+	git config core.autocrlf true &&
+	# step 2. user edit and revert.
+	touch kuzu &&
+	git update-index --refresh
+)
+'
+
+# Test detection of CRLF conversion changes LF ==> CRLF.
+test_expect_success crlf_conversion_change_lf_to_crlf '
+(
+	# step 0 & 1. a project with LF ending
+	git init two && cd two &&
+	echo a quick brown fox >kuzu &&
+	git add kuzu && git commit -m kuzu &&
+	# step 2. you want CRLF in your work area
+	echo -e "a quick brown fox\015" >kuzu &&
+	git config core.autocrlf true &&
+	# step 3. oops, refresh
+	git update-index --refresh
+)
+'
+
 test_done
-- 
1.7.0.4.369.g81e89

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-01 14:41 [PATCH v4 0/5] Patches to avoid reporting conversion changes Henrik Grubbström (Grubba)
                   ` (4 preceding siblings ...)
  2010-06-01 14:41 ` [PATCH v4 5/5] t/t0021: Test that conversion changes are detected Henrik Grubbström (Grubba)
@ 2010-06-02  4:40 ` Junio C Hamano
  2010-06-03 16:00   ` Henrik Grubbström
  5 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2010-06-02  4:40 UTC (permalink / raw
  To: Henrik Grubbström (Grubba); +Cc: git

"Henrik Grubbström (Grubba)" <grubba@grubba.org> writes:

> This is useful for repositorys not containing fully normalized files
> (eg containing CRLF's or expanded $Id$ strings), where a later attribute
> change implies a conversion mode change. Without this set of patches
> the user would need to recommit semantically unchanged files to get
> a clean index.

A more fundamental (or perhaps "silly") question is if that "user would
need to" is necessarily a bad thing.  If the user wants to cleanse such
abnormality in the recorded blobs, shouldn't there be a conscious act,
iow, a commit that records that "I am fixing that mistake, and from now
on, the recorded data are normalized"?

Perhaps I am missing something very trivial that you have already
explained to the list but I forgot amid my moving and other confusion, and
if that is the case I apologize in advance ;-).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-02  4:40 ` [PATCH v4 0/5] Patches to avoid reporting conversion changes Junio C Hamano
@ 2010-06-03 16:00   ` Henrik Grubbström
  2010-06-04  0:56     ` Jonathan Nieder
  0 siblings, 1 reply; 19+ messages in thread
From: Henrik Grubbström @ 2010-06-03 16:00 UTC (permalink / raw
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2396 bytes --]

On Tue, 1 Jun 2010, Junio C Hamano wrote:

> "Henrik Grubbström (Grubba)" <grubba@grubba.org> writes:
>
>> This is useful for repositorys not containing fully normalized files
>> (eg containing CRLF's or expanded $Id$ strings), where a later attribute
>> change implies a conversion mode change. Without this set of patches
>> the user would need to recommit semantically unchanged files to get
>> a clean index.
>
> A more fundamental (or perhaps "silly") question is if that "user would
> need to" is necessarily a bad thing.  If the user wants to cleanse such
> abnormality in the recorded blobs, shouldn't there be a conscious act,
> iow, a commit that records that "I am fixing that mistake, and from now
> on, the recorded data are normalized"?

I believe that users typically aren't interested in if data in the 
repository is on normalized form or not (witness the autocrlf=true 
discussion a few weeks ago, where one of the main complaints was
that it required a renormalization (which fg/autocrlf attempts to
solve for that specific case by not normalizing)), as long as they
get the expected content on checkout.

This set of patches allows for an incremental, on-demand normalization.
Eg the user could switch the attributes for a group of files from

   *.bat -crlf

(let's assume *.bat files use crlf linebreaks) to

   *.bat -crlf text eol=crlf

and then have git normalize the individual files when there's actually a 
semantic reason for a change. With the current eb/core-eol patches this
change would cause a dirty index on checkout. The user who committed the 
change however has a clean index until any of the files affected is 
touched.

In my case, I have repositories containing files both requiring crlf and 
lf line endings, and additionally have expanded $Id$-strings that I want 
changed on first semantic change (but not before). To be able to use a
git binary without this patchset I'd have to do a

   git commit -a -m 'Normalized'

as the first thing after a checkout.

I could of course add a config option to control the behaviour (hmm, or 
maybe an attribute?).

> Perhaps I am missing something very trivial that you have already
> explained to the list but I forgot amid my moving and other confusion, and
> if that is the case I apologize in advance ;-).

No problem.

--
Henrik Grubbström					grubba@grubba.org
Roxen Internet Software AB				grubba@roxen.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-03 16:00   ` Henrik Grubbström
@ 2010-06-04  0:56     ` Jonathan Nieder
  2010-06-04 11:59       ` Henrik Grubbström
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Nieder @ 2010-06-04  0:56 UTC (permalink / raw
  To: Henrik Grubbström; +Cc: Junio C Hamano, git

Hi Henrik,

Henrik Grubbström wrote:

> I believe that users typically aren't interested in if data in the
> repository is on normalized form or not (witness the autocrlf=true
> discussion a few weeks ago, where one of the main complaints was
> that it required a renormalization (which fg/autocrlf attempts to
> solve for that specific case by not normalizing)), as long as they
> get the expected content on checkout.

I agree.  (In the case of autocrlf, it is also not very easy to
renormalize.  The usual recommendation I have seen is "git rm -r \
--cached . && git add .", which is not exactly simple.)

> This set of patches allows for an incremental, on-demand normalization.
> Eg the user could switch the attributes for a group of files from
>
>   *.bat -crlf
>
> (let's assume *.bat files use crlf linebreaks) to
>
>   *.bat -crlf text eol=crlf
>
> and then have git normalize the individual files when there's
> actually a semantic reason for a change.

... but if I understand correctly, I don’t agree with this at all.

Imagine someone with an old copy of git that does not do
normalization.  If you convert everything at once, she sees a single
enormous, semantically uninteresting cleanup patch (and she can check
the result with ‘diff -w’ or sed if suspicious).  If you wait for some
real change to piggy-back onto, on the other hand, then the per-file
normalization patches will make it hard to find what changed.

Of course, very few people use such old copies of git.  The real
problem is that git itself sees what this person would see; you are
asking to slow down everyone who tries to use diff or blame on your
repository by implicitly requiring the -w option.

> In my case, I have repositories containing files both requiring crlf
> and lf line endings, and additionally have expanded $Id$-strings
> that I want changed on first semantic change (but not before). To be
> able to use a
> git binary without this patchset I'd have to do a
> 
>   git commit -a -m 'Normalized'
> 
> as the first thing after a checkout.

The Right Thing would be to not set the relevant attributes until it
is time for the file to be normalized.  I can understand that that
might be hard and could require tool support.

This is not an argument against your patches, since I haven’t read
them (for all I know, they make everything better :)).

Regards,
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-04  0:56     ` Jonathan Nieder
@ 2010-06-04 11:59       ` Henrik Grubbström
  2010-06-04 19:42         ` Jonathan Nieder
  0 siblings, 1 reply; 19+ messages in thread
From: Henrik Grubbström @ 2010-06-04 11:59 UTC (permalink / raw
  To: Jonathan Nieder; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3683 bytes --]

On Thu, 3 Jun 2010, Jonathan Nieder wrote:

> Hi Henrik,

Hi.

> Henrik Grubbström wrote:
>
>> I believe that users typically aren't interested in if data in the
>> repository is on normalized form or not (witness the autocrlf=true
>> discussion a few weeks ago, where one of the main complaints was
>> that it required a renormalization (which fg/autocrlf attempts to
>> solve for that specific case by not normalizing)), as long as they
>> get the expected content on checkout.
>
> I agree.  (In the case of autocrlf, it is also not very easy to
> renormalize.  The usual recommendation I have seen is "git rm -r \
> --cached . && git add .", which is not exactly simple.)
>
>> This set of patches allows for an incremental, on-demand normalization.
[...]
> ... but if I understand correctly, I don't agree with this at all.
>
> Imagine someone with an old copy of git that does not do
> normalization.  If you convert everything at once, she sees a single
> enormous, semantically uninteresting cleanup patch (and she can check
> the result with 'diff -w' or sed if suspicious).  If you wait for some
> real change to piggy-back onto, on the other hand, then the per-file
> normalization patches will make it hard to find what changed.

This seems more like an argument against repositories where 
renormalizations have occurred, than against the feature as such.

> Of course, very few people use such old copies of git.  The real
> problem is that git itself sees what this person would see; you are
> asking to slow down everyone who tries to use diff or blame on your
> repository by implicitly requiring the -w option.

Well, diff and blame would be confused by a crlf renormalization 
regardless of whether the renormalization was piggy-backed or not. 
I haven't looked at the implementation of blame, but it was possible 
to reduce the confusion in the diff case by letting it normalize the 
old blob according to the current set of attributes (this was part of 
my original patch set, but Junio didn't like the feature).

> The Right Thing would be to not set the relevant attributes until it
> is time for the file to be normalized.  I can understand that that
> might be hard and could require tool support.

True, but then the .gitattributes file would start to resemble
a manifest file for the entire repository, which would be a
pain to maintain.

I did do an experiment with a .gitattributes file like:

   *.c crlf ident
   [attr]foreign_ident -ident block_commit=Remove-foreign_ident-attribute-before-commit.
   # A list of files that haven't been changed since import follows.
   /foo.c foreign_ident
   /bar.c foreign_ident
   # etc

and a suitable pre-commit hook that looked at the block_commit
attribute, but there were two problems in addition to the long
list of files in the .gitattributes file:

   * The attributes file parsing was broken (recently fixed in the
     master branch), and the above actually caused foo.c and bar.c
     to have the ident attribute.

   * Hooks are not copied by git clone. Support for copying of hooks
     to non-POSIX-like systems is not something I'd like to attempt.

The latter problem could in this case be solved by adding support
for the block_commit attribute to the core of git, but it doesn't
seem like something most users would understand how to use, and I
doubt that Junio would accept such a patch.

> This is not an argument against your patches, since I haven't read
> them (for all I know, they make everything better :)).

:-)

I don't believe that they make everything better, but that they're
a step on the way.

--
Henrik Grubbström					grubba@grubba.org
Roxen Internet Software AB				grubba@roxen.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-04 11:59       ` Henrik Grubbström
@ 2010-06-04 19:42         ` Jonathan Nieder
  2010-06-06 10:50           ` Henrik Grubbström
  0 siblings, 1 reply; 19+ messages in thread
From: Jonathan Nieder @ 2010-06-04 19:42 UTC (permalink / raw
  To: Henrik Grubbström; +Cc: Junio C Hamano, git

Henrik Grubbström wrote:
> On Thu, 3 Jun 2010, Jonathan Nieder wrote:

>> If you wait for some
>> real change to piggy-back onto, on the other hand, then the per-file
>> normalization patches will make it hard to find what changed.
>
> This seems more like an argument against repositories where
> renormalizations have occurred, than against the feature as such.

No, it is an argument against making the process of renormalization
more painful than it has to be (and against piggy-backing in general).
It is kindest to have a flag day and yank the carriage returns off all
at once like a bandage.

> Well, diff and blame would be confused by a crlf renormalization
> regardless of whether the renormalization was piggy-backed or not.

Only if they cross the revision where renormalization occurred.

> I did do an experiment with a .gitattributes file like:
> 
>   *.c crlf ident
>   [attr]foreign_ident -ident block_commit=Remove-foreign_ident-attribute-before-commit.
>   # A list of files that haven't been changed since import follows.
>   /foo.c foreign_ident
>   /bar.c foreign_ident
>   # etc

This looks more sane.  Ident strings usually touch only a few lines.

> there were two problems in addition to the long
> list of files in the .gitattributes file:
> 
>   * The attributes file parsing was broken (recently fixed in the
>     master branch), and the above actually caused foo.c and bar.c
>     to have the ident attribute.

Wouldn’t something like

 /foo.c -ident has_foreign_ident

work?  (Thanks for fixing that attributes macro processing bug, btw.)

>   * Hooks are not copied by git clone. Support for copying of hooks
>     to non-POSIX-like systems is not something I'd like to attempt.

Can’t you include a hooks/pre-commit file and a HACKING file: "copy
this file to .git/hooks if you want your patches to be accepted"?

Thanks for your hard work,
Jonathan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-04 19:42         ` Jonathan Nieder
@ 2010-06-06 10:50           ` Henrik Grubbström
  2010-06-07  8:59             ` Finn Arne Gangstad
  0 siblings, 1 reply; 19+ messages in thread
From: Henrik Grubbström @ 2010-06-06 10:50 UTC (permalink / raw
  To: Jonathan Nieder; +Cc: Junio C Hamano, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3520 bytes --]

On Fri, 4 Jun 2010, Jonathan Nieder wrote:

> Henrik Grubbström wrote:
>> On Thu, 3 Jun 2010, Jonathan Nieder wrote:
>
>>> If you wait for some
>>> real change to piggy-back onto, on the other hand, then the per-file
>>> normalization patches will make it hard to find what changed.
>>
>> This seems more like an argument against repositories where
>> renormalizations have occurred, than against the feature as such.
>
> No, it is an argument against making the process of renormalization
> more painful than it has to be (and against piggy-backing in general).
> It is kindest to have a flag day and yank the carriage returns off all
> at once like a bandage.

In the crlf case, yes, probably. In the ident case there might be good 
reasons to want to have the ident strings stay unmodified as long as 
possible (since otherwise there'll be two ident strings that identify
the same code).

Currently (as I believe you know), git has no detection of when the 
conversion mode for a file has changed, and it might even take a while 
before the users notice that the repository is not normalized. eg:

   0) There's a repository with some files containing crlf line endings.

   1) User A notices that git now has native support for crlf
      line endings, and adds the attribute eol=crlf for the
      affected files.

   2) User A does a git status, sees that .gitattributes is
      modified, and commits it.

   3) User A does a new git status, and has a clean index.

   4) User B who has been developing on the project for a while
      does a git pull from user A, and gets the new .gitattributes.

   5) User B does a git status, and has a clean index.

   6) User C is new to the project and does an initial git clone,
      and ends up with a dirty index.

I believe we both agree that the above is undesireable behaviour. The 
above behaviour also means that it's likely that there are repositorys
out there which contain unnormalized files.

What my patch set achieves is that user C above also gets a clean index.

What it seems you want is that user A above should have all files that got 
denormalized by the attribute change marked dirty at 2 (and 3).

With a minor change of the read-cache.c:ce_compare_data() patch (returning 
1 if conv_flags has CONV_NORM_NEEDED), you should get the behaviour you 
want (all files which are unnormalized in the repository will be dirty).

As I believe both behaviours may be desireable a config option and/or 
attribute is needed. Any suggestions for a name (and default value)?

>> Well, diff and blame would be confused by a crlf renormalization
>> regardless of whether the renormalization was piggy-backed or not.
>
> Only if they cross the revision where renormalization occurred.

Which is true in both cases.

> Wouldn't something like
>
> /foo.c -ident has_foreign_ident
>
> work?

Yes, but it sort of reduces the usefulness of macros if you need to expand 
them by hand... :-)

Better to fix the bug and wait for the fix to enter a released version.

>>   * Hooks are not copied by git clone. Support for copying of hooks
>>     to non-POSIX-like systems is not something I'd like to attempt.
>
> Can't you include a hooks/pre-commit file and a HACKING file: "copy
> this file to .git/hooks if you want your patches to be accepted"?

Yes, but you still have to remember to do it everytime you clone the 
repository.

Thanks for taking the time to discuss the problem.

--
Henrik Grubbström					grubba@grubba.org
Roxen Internet Software AB				grubba@roxen.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-06 10:50           ` Henrik Grubbström
@ 2010-06-07  8:59             ` Finn Arne Gangstad
  2010-06-07 16:37               ` Henrik Grubbström
  0 siblings, 1 reply; 19+ messages in thread
From: Finn Arne Gangstad @ 2010-06-07  8:59 UTC (permalink / raw
  To: Henrik Grubbström; +Cc: Jonathan Nieder, Junio C Hamano, git

On Sun, Jun 06, 2010 at 12:50:08PM +0200, Henrik Grubbström wrote:
[...]
>
> Currently (as I believe you know), git has no detection of when the  
> conversion mode for a file has changed, and it might even take a while  
> before the users notice that the repository is not normalized. eg:
>
>   0) There's a repository with some files containing crlf line endings.
>
>   1) User A notices that git now has native support for crlf
>      line endings, and adds the attribute eol=crlf for the
>      affected files.
>
>   2) User A does a git status, sees that .gitattributes is
>      modified, and commits it.

I think it would be best if git at this time could decide that the
affected files also become dirty. The ideal commit is one that
both alters the .gitattributes _and_ the affected files at the same
time, and git should make it easy to create that commit.

> [...]
>   6) User C is new to the project and does an initial git clone,
>      and ends up with a dirty index.

And the reason for this is mostly that unless you perform some special
actions, you will commit attributes and contents that are mismatched.

In your suggested mode, whay would happen if you did this:

$ git clone ......  (which has files that are "wrong" wrt line endings and
attributes for some .c files)
$ touch *.c

Would it still believe all *.c files were clean? Does it require an
actual other change at the same time to allow you to normalize the
file? That would be detrimental I think.  Changing newlines is best
done as a separate commit, intermingling newline changes and real
changes in the same commmit is not where you want to go.

However, for your ID string you obviously want this behaviour. I'm
guessing that hook is alreasy set up so that if you just touch the
file, it will still be treated as unmodified?

>
> What my patch set achieves is that user C above also gets a clean index.
>
> What it seems you want is that user A above should have all files that 
> got denormalized by the attribute change marked dirty at 2 (and 3).

That would indeed be a very welcome change.

> As I believe both behaviours may be desireable a config option and/or  
> attribute is needed. Any suggestions for a name (and default value)?

I think the default behaviour should be to mark files dirty if there
are ANY attribute changes that could cause content changes done to
them at all. I'm not sure that is exactly what your patch series is
allowing us to track though?

Just to be clear:

If you add this to your .gitattributes

*.c eol=lf

I think it would be very helpful if git then would treat all .c files
as "stat-dirty" the next time it updates its index.

A for config variables, what about:

core.rereadOnAttributeChanges = [true]/false    (default = true)

Which makes some sense for detecting it in 2, but not so much for
ignoring it in 6.

- Finn Arne

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-07  8:59             ` Finn Arne Gangstad
@ 2010-06-07 16:37               ` Henrik Grubbström
  2010-06-07 19:50                 ` Finn Arne Gangstad
  0 siblings, 1 reply; 19+ messages in thread
From: Henrik Grubbström @ 2010-06-07 16:37 UTC (permalink / raw
  To: Finn Arne Gangstad; +Cc: Jonathan Nieder, Junio C Hamano, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4341 bytes --]


Hi Finn,

On Mon, 7 Jun 2010, Finn Arne Gangstad wrote:

> On Sun, Jun 06, 2010 at 12:50:08PM +0200, Henrik Grubbström wrote:
> [...]
>>
>> Currently (as I believe you know), git has no detection of when the
>> conversion mode for a file has changed, and it might even take a while
>> before the users notice that the repository is not normalized. eg:
>>
>>   0) There's a repository with some files containing crlf line endings.
>>
>>   1) User A notices that git now has native support for crlf
>>      line endings, and adds the attribute eol=crlf for the
>>      affected files.
>>
>>   2) User A does a git status, sees that .gitattributes is
>>      modified, and commits it.
>
> I think it would be best if git at this time could decide that the
> affected files also become dirty. The ideal commit is one that
> both alters the .gitattributes _and_ the affected files at the same
> time, and git should make it easy to create that commit.

I agree in the case of newly added attributes. In the case of repositories 
already containing unnormalized files this however leads to problems.
eg

   Consider the case above, but a while later when the repository has been
   fixed at HEAD. If an old version from before the normalization is
   checked out, the index will once again become dirty, which means that
   git will refuse the user to check out some other version unless the
   --force flag is given. Excessive use of --force is not a good thing.
   If the user is aware of the problem, and checking out old versions is
   a common operation, toggling the suggested option might be a good
   solution.

>> [...]
>>   6) User C is new to the project and does an initial git clone,
>>      and ends up with a dirty index.
>
> And the reason for this is mostly that unless you perform some special
> actions, you will commit attributes and contents that are mismatched.
>
> In your suggested mode, whay would happen if you did this:
>
> $ git clone ......  (which has files that are "wrong" wrt line endings and
> attributes for some .c files)
> $ touch *.c
>
> Would it still believe all *.c files were clean? Does it require an
> actual other change at the same time to allow you to normalize the
> file? That would be detrimental I think.  Changing newlines is best
> done as a separate commit, intermingling newline changes and real
> changes in the same commmit is not where you want to go.

With the patch set as submitted, it would still claim that the files 
are clean. With the additional two-liner I suggested to Jonathan, the 
files would be dirty.

> However, for your ID string you obviously want this behaviour. I'm
> guessing that hook is alreasy set up so that if you just touch the
> file, it will still be treated as unmodified?

Yes.

>> What my patch set achieves is that user C above also gets a clean index.
>>
>> What it seems you want is that user A above should have all files that
>> got denormalized by the attribute change marked dirty at 2 (and 3).
>
> That would indeed be a very welcome change.
>
>> As I believe both behaviours may be desireable a config option and/or
>> attribute is needed. Any suggestions for a name (and default value)?
>
> I think the default behaviour should be to mark files dirty if there
> are ANY attribute changes that could cause content changes done to
> them at all. I'm not sure that is exactly what your patch series is
> allowing us to track though?

What it does, is to track the attributes affecting conversion that 
were active when the entries in the index were last updated, and 
also whether the entry is on normalized form in the repository. 
This makes it possible to invalidate entries when attributes change.

> Just to be clear:
>
> If you add this to your .gitattributes
>
> *.c eol=lf
>
> I think it would be very helpful if git then would treat all .c files
> as "stat-dirty" the next time it updates its index.

Ok.

> A for config variables, what about:
>
> core.rereadOnAttributeChanges = [true]/false    (default = true)
>
> Which makes some sense for detecting it in 2, but not so much for
> ignoring it in 6.

I was thinking more along the lines of

   core.allowUnnormalizedIndex = true/[false]	(default = false)

which unfortunately contains a negation.

--
Henrik Grubbström					grubba@grubba.org
Roxen Internet Software AB				grubba@roxen.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-07 16:37               ` Henrik Grubbström
@ 2010-06-07 19:50                 ` Finn Arne Gangstad
  2010-06-08 15:52                   ` Henrik Grubbström
  0 siblings, 1 reply; 19+ messages in thread
From: Finn Arne Gangstad @ 2010-06-07 19:50 UTC (permalink / raw
  To: Henrik Grubbström; +Cc: Jonathan Nieder, Junio C Hamano, git

On Mon, Jun 07, 2010 at 06:37:56PM +0200, Henrik Grubbström wrote:
>
> On Mon, 7 Jun 2010, Finn Arne Gangstad wrote:
>
>> I think it would be best if git at this time could decide that the
>> affected files also become dirty. The ideal commit is one that
>> both alters the .gitattributes _and_ the affected files at the same
>> time, and git should make it easy to create that commit.
>
> I agree in the case of newly added attributes. In the case of 
> repositories already containing unnormalized files this however leads to 
> problems.
> eg
>
>   Consider the case above, but a while later when the repository has been
>   fixed at HEAD. If an old version from before the normalization is
>   checked out, the index will once again become dirty, which means that
>   git will refuse the user to check out some other version unless the
>   --force flag is given. Excessive use of --force is not a good thing.
>   If the user is aware of the problem, and checking out old versions is
>   a common operation, toggling the suggested option might be a good
>   solution.

Maybe I misunderstand something, but if you check out an older
version, the .gitattributes file will change to match the old version.
The old version should not have the conversion attributes set, and
should therefore result in a clean checkout?

- Finn Arne

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-07 19:50                 ` Finn Arne Gangstad
@ 2010-06-08 15:52                   ` Henrik Grubbström
  2010-06-09 14:03                     ` Finn Arne Gangstad
  0 siblings, 1 reply; 19+ messages in thread
From: Henrik Grubbström @ 2010-06-08 15:52 UTC (permalink / raw
  To: Finn Arne Gangstad; +Cc: Jonathan Nieder, Junio C Hamano, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3880 bytes --]

On Mon, 7 Jun 2010, Finn Arne Gangstad wrote:

> On Mon, Jun 07, 2010 at 06:37:56PM +0200, Henrik Grubbström wrote:
>>
>> On Mon, 7 Jun 2010, Finn Arne Gangstad wrote:
>>
>>> I think it would be best if git at this time could decide that the
>>> affected files also become dirty. The ideal commit is one that
>>> both alters the .gitattributes _and_ the affected files at the same
>>> time, and git should make it easy to create that commit.
>>
>> I agree in the case of newly added attributes. In the case of
>> repositories already containing unnormalized files this however leads to
>> problems.
>> eg
>>
>>   Consider the case above, but a while later when the repository has been
>>   fixed at HEAD. If an old version from before the normalization is
>>   checked out, the index will once again become dirty, which means that
>>   git will refuse the user to check out some other version unless the
>>   --force flag is given. Excessive use of --force is not a good thing.
>>   If the user is aware of the problem, and checking out old versions is
>>   a common operation, toggling the suggested option might be a good
>>   solution.
>
> Maybe I misunderstand something, but if you check out an older
> version, the .gitattributes file will change to match the old version.
> The old version should not have the conversion attributes set, and
> should therefore result in a clean checkout?

True, there's no problem before the attribute change, but there is 
for commits between the attribute change and when the repository got 
normalized (which can be a while with the current git).

Re: configuration option naming:

   I've settled for core.normalizationPolicy, with the values
   'strict' (default) for the behaviour requested by you and Jonathan,
   and 'relaxed' for my initial behaviour.

Teaser:

   $ git init foo
   warning: templates not found /home/grubba/share/git-core/templates
   Initialized empty Git repository in /tmp/grubba/foo/.git/
   $ cd foo
   $ cat >expanded-keywords
   $Id: some id string $
   $ git add expanded-keywords
   $ git commit -m 'Initial commit.'
   [master (root-commit) 755d1f6] Initial commit.
    1 files changed, 1 insertions(+), 0 deletions(-)
    create mode 100644 expanded-keywords
   $ git status
   # On branch master
   nothing to commit (working directory clean)
   $ cat >.gitattributes
   * ident
   $ git status
   # On branch master
   # Changed but not updated:
   #   (use "git add <file>..." to update what will be committed)
   #   (use "git checkout -- <file>..." to discard changes in working directory)
   #
   #       modified:   expanded-keywords
   #
   # Untracked files:
   #   (use "git add <file>..." to include in what will be committed)
   #
   #       .gitattributes
   no changes added to commit (use "git add" and/or "git commit -a")
   $ git config core.normalizationPolicy relaxed
   $ git status
   # On branch master
   # Untracked files:
   #   (use "git add <file>..." to include in what will be committed)
   #
   #       .gitattributes
   nothing added to commit but untracked files present (use "git add" to track)
   $ git config core.normalizationPolicy strict
   $ git status
   # On branch master
   # Changed but not updated:
   #   (use "git add <file>..." to update what will be committed)
   #   (use "git checkout -- <file>..." to discard changes in working directory)
   #
   #       modified:   expanded-keywords
   #
   # Untracked files:
   #   (use "git add <file>..." to include in what will be committed)
   #
   #       .gitattributes
   no changes added to commit (use "git add" and/or "git commit -a")
   $ rm .gitattributes
   $ git status
   # On branch master
   nothing to commit (working directory clean)

Which I believe matches all the behaviours that have been requested.

--
Henrik Grubbström					grubba@grubba.org
Roxen Internet Software AB				grubba@roxen.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-08 15:52                   ` Henrik Grubbström
@ 2010-06-09 14:03                     ` Finn Arne Gangstad
  2010-06-09 18:04                       ` Henrik Grubbström
  0 siblings, 1 reply; 19+ messages in thread
From: Finn Arne Gangstad @ 2010-06-09 14:03 UTC (permalink / raw
  To: Henrik Grubbström; +Cc: Jonathan Nieder, Junio C Hamano, git

On Tue, Jun 08, 2010 at 05:52:37PM +0200, Henrik Grubbström wrote:

> [...]
> True, there's no problem before the attribute change, but there is for 
> commits between the attribute change and when the repository got  
> normalized (which can be a while with the current git).

As you say, the current git makes it easy to commit something where
the attributes and the contents do not match. I think this needs to be
fixed, and that your proposed patch in relaxed mode makes the problem
_worse_, since it will then take even longer before these commits are
fixed. But see below.

>
> Re: configuration option naming:
>
>   I've settled for core.normalizationPolicy, with the values
>   'strict' (default) for the behaviour requested by you and Jonathan,
>   and 'relaxed' for my initial behaviour.

The name might be a bit vague, maybe there are other things that could
be normalized? Maybe adding the word "index" is an improvement -
e.g. core.indexNormalizationPolicy or just core.indexNormalization. 

>
> Teaser:
> [...]
>   $ git status
>   # On branch master
>   nothing to commit (working directory clean)
>   $ cat >.gitattributes
>   * ident
>   $ git status
> [...]
>   #       modified:   expanded-keywords
> [...]
>
>   $ git config core.normalizationPolicy relaxed
>   $ git status
>   # On branch master
> [No longer modified]

THIS behaviour is what I find scary. In this case, "ident" is clearly
a newly added attribute, and git should not hide this from you. If you
add a mode where git will hide this permanently, chances are the
repositories will never be fixed.

The ident attribute may be a bit special since in your case it is only
supposed to change if some other contents in the file change as well,
but please also think how this will work with the text/eol
attributes. Setting the text attribute and then having to CHANGE a
file before getting it normalized is not good.

Still, I think your original problem description of cloning something
and ending up with a dirty tree is indeed an annoying problem.  So
what about having the relaxed mode behave as follows:

If both of these are true:
 - the current attributes for a file are the same as it is registered as
   in the index with your new patch
 - a checkout of the file would result in identical contents to what is 
   currently in the working directory
Then behave as if the file is not modified.

Or, in other words: If attributes are unchanged, a file is unmodified
not only if it would result in the same contents after being added,
but also if it would result in the current working directory
contents after being checked out again.

This should work for both text and ident on clone at least.

- Finn Arne

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-09 14:03                     ` Finn Arne Gangstad
@ 2010-06-09 18:04                       ` Henrik Grubbström
  2010-06-10 19:55                         ` Finn Arne Gangstad
  0 siblings, 1 reply; 19+ messages in thread
From: Henrik Grubbström @ 2010-06-09 18:04 UTC (permalink / raw
  To: Finn Arne Gangstad; +Cc: Jonathan Nieder, Junio C Hamano, git

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2932 bytes --]

On Wed, 9 Jun 2010, Finn Arne Gangstad wrote:

> On Tue, Jun 08, 2010 at 05:52:37PM +0200, Henrik Grubbström wrote:
>
>> [...]
>> True, there's no problem before the attribute change, but there is for
>> commits between the attribute change and when the repository got
>> normalized (which can be a while with the current git).
>
> As you say, the current git makes it easy to commit something where
> the attributes and the contents do not match. I think this needs to be
> fixed, and that your proposed patch in relaxed mode makes the problem
> _worse_, since it will then take even longer before these commits are
> fixed. But see below.
>
>>
>> Re: configuration option naming:
>>
>>   I've settled for core.normalizationPolicy, with the values
>>   'strict' (default) for the behaviour requested by you and Jonathan,
>>   and 'relaxed' for my initial behaviour.
>
> The name might be a bit vague, maybe there are other things that could
> be normalized? Maybe adding the word "index" is an improvement -
> e.g. core.indexNormalizationPolicy or just core.indexNormalization.

Hmm... Maybe.

>> Teaser:
[...]
>
> THIS behaviour is what I find scary. In this case, "ident" is clearly
> a newly added attribute, and git should not hide this from you. If you
> add a mode where git will hide this permanently, chances are the
> repositories will never be fixed.

True.

> The ident attribute may be a bit special since in your case it is only
> supposed to change if some other contents in the file change as well,
> but please also think how this will work with the text/eol
> attributes. Setting the text attribute and then having to CHANGE a
> file before getting it normalized is not good.
>
> Still, I think your original problem description of cloning something
> and ending up with a dirty tree is indeed an annoying problem.  So
> what about having the relaxed mode behave as follows:
>
> If both of these are true:
> - the current attributes for a file are the same as it is registered as
>   in the index with your new patch
> - a checkout of the file would result in identical contents to what is
>   currently in the working directory
> Then behave as if the file is not modified.
>
> Or, in other words: If attributes are unchanged, a file is unmodified
> not only if it would result in the same contents after being added,
> but also if it would result in the current working directory
> contents after being checked out again.

Ok, so the expanded-keywords file in the example should show up as 
modified in relaxed mode as well, but be cleaned if the modified 
attributes file is added to the index? Or only after being committed?

> This should work for both text and ident on clone at least.

I'll see what I can do...

Currently, I'm trying to understand the semantic difference between 
ce_modified() and ce_match_stat().

--
Henrik Grubbström					grubba@grubba.org
Roxen Internet Software AB				grubba@roxen.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4 0/5] Patches to avoid reporting conversion changes.
  2010-06-09 18:04                       ` Henrik Grubbström
@ 2010-06-10 19:55                         ` Finn Arne Gangstad
  0 siblings, 0 replies; 19+ messages in thread
From: Finn Arne Gangstad @ 2010-06-10 19:55 UTC (permalink / raw
  To: Henrik Grubbström; +Cc: Jonathan Nieder, Junio C Hamano, git

On Wed, Jun 09, 2010 at 08:04:34PM +0200, Henrik Grubbström wrote:

> Ok, so the expanded-keywords file in the example should show up as  
> modified in relaxed mode as well, but be cleaned if the modified  
> attributes file is added to the index? Or only after being committed?

Not before being committed, since you would otherwise have to add all
other files before adding .gitattributes, but I am not sure even that
is sufficient reason to claim the files are unmodified (see case 3 below).

I think we agree on the following:

If there is a discrepancy between .gitattributes and the contents in
the repository, the following should be true:

git checkout -f (or git reset --hard)
git status -> ALWAYS report modified files in strict mode
sleep 1
touch *
git status -> NEVER report modified files in relaxed mode


The case I think you are asking about above is the following in
"relaxed" mode:

echo "something that causes a discrepancy" >> .gitattributes
git status -> MODIFIED (1)
git add .gitattributes
git status -> MODIFIED (2)
git commit -m "bad commit"
git status -> ???????  (3)    <<-- Do you want this to be CLEAN?
git reset --hard (or git checkout -f)
sleep 1
touch *
git status -> CLEAN    (4)

1 and 4 should be uncontroversial and 2 I think is necessary because
you should be able to git add in several steps. Whether 3 should be
clean or modified I'm not so sure about, I think that it would make it
more likely to get the repo normalized properly if it was still seen
as modified there.

- Finn Arne

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2010-06-10 19:56 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-01 14:41 [PATCH v4 0/5] Patches to avoid reporting conversion changes Henrik Grubbström (Grubba)
2010-06-01 14:41 ` [PATCH v4 1/5] sha1_file: Add index_blob() Henrik Grubbström (Grubba)
2010-06-01 14:41 ` [PATCH v4 2/5] strbuf: Add strbuf_add_uint32() Henrik Grubbström (Grubba)
2010-06-01 14:41 ` [PATCH v4 3/5] cache: Keep track of conversion mode changes Henrik Grubbström (Grubba)
2010-06-01 14:41 ` [PATCH v4 4/5] cache: Add index extension "CONV" Henrik Grubbström (Grubba)
2010-06-01 14:41 ` [PATCH v4 5/5] t/t0021: Test that conversion changes are detected Henrik Grubbström (Grubba)
2010-06-02  4:40 ` [PATCH v4 0/5] Patches to avoid reporting conversion changes Junio C Hamano
2010-06-03 16:00   ` Henrik Grubbström
2010-06-04  0:56     ` Jonathan Nieder
2010-06-04 11:59       ` Henrik Grubbström
2010-06-04 19:42         ` Jonathan Nieder
2010-06-06 10:50           ` Henrik Grubbström
2010-06-07  8:59             ` Finn Arne Gangstad
2010-06-07 16:37               ` Henrik Grubbström
2010-06-07 19:50                 ` Finn Arne Gangstad
2010-06-08 15:52                   ` Henrik Grubbström
2010-06-09 14:03                     ` Finn Arne Gangstad
2010-06-09 18:04                       ` Henrik Grubbström
2010-06-10 19:55                         ` Finn Arne Gangstad

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).