git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH/RFC] add: support saving the last <n> versions of the index
@ 2013-08-04  6:28 Nguyễn Thái Ngọc Duy
  2013-08-08 12:51 ` [PATCH/RFC v2] read-cache: save index entry updates in ILOG index extension Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 4+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-08-04  6:28 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

When you do "git add foo", change "foo" and "git add foo" again, the
previous foo version is still in the repository, but its SHA-1 may be
lost. "git fsck --lost-found" may help, but it may take more time to
find out which blob is the old "foo".

This patch adds support for saving old index files, so that people
could look back more easily. In the case of "old foo", the user could
use ls-files with the previous of the index to determine SHA-1 of "old
foo".

When core.indexlogsize is defined to <n>, every time the index
changes(**), the old version is stored as $GIT_DIR/index-<i>, where
<i> is from 0..<n>-1. <i> increased after every change and wrapped
around at <n>-1. The current <i> is stored in $GIT_DIR/index as new
ILOG extension. ILOG extension also contains HEAD's tree SHA-1 and a
line describing the change.

(**) The index can change in many ways, but only changes by "git add"
(and maybe "git update-index" as well) are backed up. Non-SHA1 updates
are not worth watching. SHA-1 changes that end up in a commit
are tracked by reflog already. "git mv" and "git rm" may be worth
watching too.
---
 http://thread.gmane.org/gmane.comp.version-control.git/231594 reminds
 me that I have had something like this for a long time but never
 finished it. So here we go again.

 I think the basic idea is ok (new index versions like v5 may have
 problems though). The UI is harder. We need something like "git
 reflog" and perhaps something like @{N} syntax to access old indexes.
 Being able to diff between two indexes are even better but that might
 be a lot of work, diff-ing "git ls-files --stage" output should be
 ok. Hmm?

 builtin/add.c |  9 ++++++++-
 cache.h       |  5 +++++
 config.c      |  5 +++++
 environment.c |  1 +
 lockfile.c    | 12 +++++++++++-
 read-cache.c  | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/builtin/add.c b/builtin/add.c
index 8266a9c..90e580c 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -15,6 +15,7 @@
 #include "diffcore.h"
 #include "revision.h"
 #include "bulk-checkin.h"
+#include "quote.h"
 
 static const char * const builtin_add_usage[] = {
 	N_("git add [options] [--] <pathspec>..."),
@@ -454,9 +455,15 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	char *seen = NULL;
 	int implicit_dot = 0;
 	struct update_callback_data update_data;
+	struct strbuf log = STRBUF_INIT;
 
 	git_config(add_config, NULL);
 
+	if (index_log_size) {
+		strbuf_addf(&log, "[%s] add ", prefix ? prefix : "");
+		sq_quote_argv(&log, argv, 0);
+	}
+
 	argc = parse_options(argc, argv, prefix, builtin_add_options,
 			  builtin_add_usage, PARSE_OPT_KEEP_ARGV0);
 	if (patch_interactive)
@@ -600,7 +607,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 
  finish:
 	if (active_cache_changed) {
-		if (write_cache(newfd, active_cache, active_nr) ||
+		if (write_index_with_log(&the_index, newfd, log.buf) ||
 		    commit_locked_index(&lock_file))
 			die(_("Unable to write new index file"));
 	}
diff --git a/cache.h b/cache.h
index 85b544f..7655c4b 100644
--- a/cache.h
+++ b/cache.h
@@ -274,9 +274,12 @@ struct index_state {
 	struct cache_tree *cache_tree;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
+		 log_updated : 1,
 		 initialized : 1;
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
+	unsigned int log_index;
+	struct strbuf *log;
 };
 
 extern struct index_state the_index;
@@ -454,6 +457,7 @@ extern int read_index_from(struct index_state *, const char *path);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
 extern int write_index(struct index_state *, int newfd);
+extern int write_index_with_log(struct index_state *, int newfd, const char *log);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
@@ -595,6 +599,7 @@ extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
+extern unsigned long index_log_size;
 
 /*
  * The character that begins a commented line in user-editable file
diff --git a/config.c b/config.c
index e13a7b6..f5e40fb 100644
--- a/config.c
+++ b/config.c
@@ -831,6 +831,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.indexlogsize")) {
+		index_log_size = git_config_ulong(var, value);
+		return 0;
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/environment.c b/environment.c
index 5398c36..12f01f7 100644
--- a/environment.c
+++ b/environment.c
@@ -62,6 +62,7 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 struct startup_info *startup_info;
 unsigned long pack_size_limit_cfg;
+unsigned long index_log_size;
 
 /*
  * The character that begins a commented line in user-editable file
diff --git a/lockfile.c b/lockfile.c
index 8fbcb6a..880a4e4 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -267,8 +267,18 @@ int commit_locked_index(struct lock_file *lk)
 		lk->filename[0] = 0;
 		return 0;
 	}
-	else
+	else {
+		if (the_index.log_updated) {
+			char src[PATH_MAX], backup[PATH_MAX];
+			int len = strlen(lk->filename) - 5; /* .lock */
+			memcpy(src, lk->filename, len);
+			src[len] = 0;
+			memcpy(backup, lk->filename, len);
+			sprintf(backup+len, "-%u", the_index.log_index);
+			rename(src, backup); /* does it matter if it fails? */
+		}
 		return commit_lock_file(lk);
+	}
 }
 
 void rollback_lock_file(struct lock_file *lk)
diff --git a/read-cache.c b/read-cache.c
index c3d5e35..9d8300a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -33,6 +33,7 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 #define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
 #define CACHE_EXT_TREE 0x54524545	/* "TREE" */
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
+#define CACHE_EXT_INDEX_LOG 0x494C4F47	  /* "ILOG" */
 
 struct index_state the_index;
 
@@ -1297,6 +1298,18 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_RESOLVE_UNDO:
 		istate->resolve_undo = resolve_undo_read(data, sz);
 		break;
+	case CACHE_EXT_INDEX_LOG:
+		if (!istate->log) {
+			istate->log = xmalloc(sizeof(struct strbuf *));
+			strbuf_init(istate->log, 128);
+		}
+		strbuf_add(istate->log, data, sz);
+		if (sscanf(istate->log->buf, "%u", &istate->log_index) != 1) {
+			strbuf_release(istate->log);
+			free(istate->log);
+			istate->log = NULL;
+		}
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1538,6 +1551,15 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	istate->log_updated = 0;
+	istate->log_index = 0;
+	if (istate->log) {
+		strbuf_release(istate->log);
+		free(istate->log);
+		istate->log = NULL;
+	}
+
+	/* no need to throw away allocated active_cache */
 	return 0;
 }
 
@@ -1771,7 +1793,8 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 		rollback_lock_file(lockfile);
 }
 
-int write_index(struct index_state *istate, int newfd)
+int write_index_with_log(struct index_state *istate, int newfd,
+			 const char *log)
 {
 	git_SHA_CTX c;
 	struct cache_header hdr;
@@ -1847,6 +1870,32 @@ int write_index(struct index_state *istate, int newfd)
 			return -1;
 	}
 
+	if (index_log_size) {
+		unsigned char sha1[20];
+		if (log) {
+			istate->log_index = (istate->log_index + 1) % index_log_size;
+			istate->log_updated = 1;
+		} else
+			log = "";
+		if (get_sha1("HEAD^{tree}", sha1))
+			hashclr(sha1);
+		if (!istate->log) {
+			istate->log = xmalloc(sizeof(struct strbuf *));
+			strbuf_init(istate->log, 128);
+		} else
+			strbuf_reset(istate->log);
+		strbuf_addf(istate->log, "%u %s %s %s",
+			    istate->log_index, sha1_to_hex(sha1),
+			    git_committer_info(0), log);
+
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_INDEX_LOG,
+					     istate->log->len) < 0
+			|| ce_write(&c, newfd, istate->log->buf,
+				    istate->log->len) < 0;
+		if (err)
+			return -1;
+	}
+
 	if (ce_flush(&c, newfd) || fstat(newfd, &st))
 		return -1;
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
@@ -1854,6 +1903,11 @@ int write_index(struct index_state *istate, int newfd)
 	return 0;
 }
 
+int write_index(struct index_state *istate, int newfd)
+{
+	return write_index_with_log(istate, newfd, NULL);
+}
+
 /*
  * Read the index file that is potentially unmerged into given
  * index_state, dropping any unmerged entries.  Returns true if
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH/RFC v2] read-cache: save index entry updates in ILOG index extension
  2013-08-04  6:28 [PATCH/RFC] add: support saving the last <n> versions of the index Nguyễn Thái Ngọc Duy
@ 2013-08-08 12:51 ` Nguyễn Thái Ngọc Duy
  2013-08-08 18:46   ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2013-08-08 12:51 UTC (permalink / raw)
  To: git; +Cc: Nguyễn Thái Ngọc Duy

If you have something different from both worktree and HEAD in index,
then accidentally do "git add foo", you may find it hard to recover
the previous version of foo in index. This is especially true when you
do "git add -p" with manual patch editing.

This patch makes sure that every operation that modifies the index
from worktree or stdin is recorded as list of (path, SHA-1) in index
with command+arguments of the operation.

When you make such a mistake, you can look at ILOG extension with
(unimplemented) "git ls-files --generation=X [ -- <paths>]" where X is
from 1 (the most recent operation) to N (the least recent) . "X" could
even be "all" to list all generations.

SHA-1 syntax is also going to be extended to support :-N:path syntax
to get an entry from generation N, for convenience.

Old operation's updates are removed as new ones are added to keep the
size under 1 MB. ILOG keeps minimum 10 operations regardless of its
size. These contansts should be configurable later one. ILOG content
will be compressed later on so that it leaves minimum
footprint. Because it's only needed at index writing time, read-only
operations won't pay the cost for decompressing and compressing ILOG.

ILOG may also be used in a different way to implement "git add
--undo". Before updating entries, git-add could save _old_ entries to
ILOG (and mark to-be-added entries as "deleted/something" for
example). It then can use this information to revert the
operation. Similar candidates are "git commit -a" (destroying current
index) or "git merge" and of course "git mv" and "git rm"

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/add.c          |   1 +
 builtin/apply.c        |   1 +
 builtin/update-index.c |   1 +
 cache.h                |   3 ++
 read-cache.c           | 121 +++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 127 insertions(+)

diff --git a/builtin/add.c b/builtin/add.c
index 8266a9c..d000f8a 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -456,6 +456,7 @@ int cmd_add(int argc, const char **argv, const char *prefix)
 	struct update_callback_data update_data;
 
 	git_config(add_config, NULL);
+	log_index_changes(prefix, argv);
 
 	argc = parse_options(argc, argv, prefix, builtin_add_options,
 			  builtin_add_usage, PARSE_OPT_KEEP_ARGV0);
diff --git a/builtin/apply.c b/builtin/apply.c
index 50912c9..fc43ea8 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -4423,6 +4423,7 @@ int cmd_apply(int argc, const char **argv, const char *prefix_)
 	prefix = prefix_;
 	prefix_length = prefix ? strlen(prefix) : 0;
 	git_config(git_apply_config, NULL);
+	log_index_changes(prefix, argv);
 	if (apply_default_whitespace)
 		parse_whitespace_option(apply_default_whitespace);
 	if (apply_default_ignorewhitespace)
diff --git a/builtin/update-index.c b/builtin/update-index.c
index c317981..aa757cb 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -799,6 +799,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		usage_with_options(update_index_usage, options);
 
 	git_config(git_default_config, NULL);
+	log_index_changes(prefix, argv);
 
 	/* We can't free this memory, it becomes part of a linked list parsed atexit() */
 	lock_file = xcalloc(1, sizeof(struct lock_file));
diff --git a/cache.h b/cache.h
index 85b544f..a2156bf 100644
--- a/cache.h
+++ b/cache.h
@@ -168,6 +168,7 @@ struct cache_entry {
 
 /* used to temporarily mark paths matched by pathspecs */
 #define CE_MATCHED           (1 << 26)
+#define CE_BASE              (1 << 27)
 
 /*
  * Extended on-disk flags
@@ -277,6 +278,7 @@ struct index_state {
 		 initialized : 1;
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
+	struct strbuf *index_log;
 };
 
 extern struct index_state the_index;
@@ -481,6 +483,7 @@ extern struct cache_entry *make_cache_entry(unsigned int mode, const unsigned ch
 extern int ce_same_name(const struct cache_entry *a, const struct cache_entry *b);
 extern int index_name_is_other(const struct index_state *, const char *, int);
 extern void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *);
+extern void log_index_changes(const char *prefix, const char **argv);
 
 /* do stat comparison even if CE_VALID is true */
 #define CE_MATCH_IGNORE_VALID		01
diff --git a/read-cache.c b/read-cache.c
index c3d5e35..4021667 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -14,6 +14,7 @@
 #include "resolve-undo.h"
 #include "strbuf.h"
 #include "varint.h"
+#include "quote.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int really);
 
@@ -33,8 +34,10 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 #define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
 #define CACHE_EXT_TREE 0x54524545	/* "TREE" */
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
+#define CACHE_EXT_INDEX_LOG 0x494C4F47 /* "ILOG" */
 
 struct index_state the_index;
+static struct strbuf log_message = STRBUF_INIT;
 
 static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
@@ -1297,6 +1300,14 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_RESOLVE_UNDO:
 		istate->resolve_undo = resolve_undo_read(data, sz);
 		break;
+	case CACHE_EXT_INDEX_LOG:
+		if (!istate->index_log) {
+			istate->index_log = xmalloc(sizeof(*istate->index_log));
+			strbuf_init(istate->index_log, sz);
+		}
+		strbuf_reset(istate->index_log);
+		strbuf_add(istate->index_log, data, sz);
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1509,6 +1520,14 @@ int read_index_from(struct index_state *istate, const char *path)
 		src_offset += extsize;
 	}
 	munmap(mmap, mmap_size);
+	if (istate == &the_index) {
+		for (i = 0; i < istate->cache_nr; i++) {
+			struct cache_entry *ce = istate->cache[i];
+			if (ce_stage(ce))
+				continue;
+			ce->ce_flags |= CE_BASE;
+		}
+	}
 	return istate->cache_nr;
 
 unmap:
@@ -1538,6 +1557,11 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	if (istate->index_log) {
+		strbuf_release(istate->index_log);
+		free(istate->index_log);
+		istate->index_log = NULL;
+	}
 	return 0;
 }
 
@@ -1771,6 +1795,81 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 		rollback_lock_file(lockfile);
 }
 
+void log_index_changes(const char *prefix, const char **argv)
+{
+	if (prefix || argv) {
+		if (prefix)
+			strbuf_addf(&log_message, "[%s]", prefix);
+		sq_quote_argv(&log_message, argv, 0);
+	} else
+		strbuf_setlen(&log_message, 0);
+}
+
+static void get_updated_entries(struct index_state *istate,
+				struct cache_entry ***cache_out,
+				unsigned int *cache_nr_out)
+{
+	struct cache_entry **cache;
+	unsigned int i, nr, cache_nr = 0;
+
+	*cache_nr_out = 0;
+	*cache_out = NULL;
+	for (i = 0; i < istate->cache_nr; i++) {
+		if (istate->cache[i]->ce_flags & CE_BASE)
+			continue;
+		cache_nr++;
+	}
+	if (!cache_nr)
+		return;
+
+	cache = xmalloc(cache_nr * sizeof(*istate->cache));
+	for (i = nr = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		if (ce->ce_flags & CE_BASE)
+			continue;
+		cache[nr++] = ce;
+	}
+	*cache_out = cache;
+	*cache_nr_out = cache_nr;
+}
+
+static void write_index_log(struct strbuf *sb,
+			    const struct strbuf *old_log,
+			    const struct strbuf *msg,
+			    struct cache_entry **cache,
+			    unsigned int cache_nr)
+{
+	struct strbuf body = STRBUF_INIT;
+	unsigned int i, size, nr, body_len, hdr_len;
+	const char *end, *p;
+	strbuf_addf(&body, "%s%c", msg->buf, '\0');
+	for (i = 0; i < cache_nr; i++)
+		strbuf_addf(&body, "%s %s%c", sha1_to_hex(cache[i]->sha1),
+			    cache[i]->name, '\0');
+	strbuf_addf(sb, "%u %u%c", (unsigned int)cache_nr, (unsigned int)body.len, '\0');
+	strbuf_addbuf(sb, &body);
+	strbuf_release(&body);
+
+	if (!old_log)
+		return;
+
+	size = sb->len;
+	nr = cache_nr;
+	end = old_log->buf + old_log->len;
+	p = old_log->buf;
+	while (p < end && (size < 1024 * 1024 || nr < 10)) {
+		if (sscanf(p, "%u %u", &cache_nr, &body_len) != 2) {
+			error("fail to parse old index log at %u", (unsigned int)(p - old_log->buf));
+			break;
+		}
+		hdr_len = strlen(p) + 1;
+		strbuf_add(sb, p, hdr_len + body_len);
+		size += body_len;
+		nr += cache_nr;
+		p += hdr_len + body_len;
+	}
+}
+
 int write_index(struct index_state *istate, int newfd)
 {
 	git_SHA_CTX c;
@@ -1780,6 +1879,11 @@ int write_index(struct index_state *istate, int newfd)
 	int entries = istate->cache_nr;
 	struct stat st;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+	unsigned int index_log_nr = 0;
+	struct cache_entry **index_log_entries = NULL;
+
+	if (istate == &the_index && log_message.len)
+		get_updated_entries(istate, &index_log_entries, &index_log_nr);
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
@@ -1846,6 +1950,23 @@ int write_index(struct index_state *istate, int newfd)
 		if (err)
 			return -1;
 	}
+	if (index_log_entries && log_message.len) {
+		struct strbuf sb = STRBUF_INIT;
+		write_index_log(&sb, istate->index_log, &log_message,
+				index_log_entries, index_log_nr);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_INDEX_LOG,
+					     sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf,
+				    sb.len) < 0;
+		if (istate->index_log)
+			strbuf_release(istate->index_log);
+		else
+			istate->index_log = xmalloc(sizeof(*istate->index_log));
+		*istate->index_log = sb;
+		if (err)
+			return -1;
+	}
+	free(index_log_entries);
 
 	if (ce_flush(&c, newfd) || fstat(newfd, &st))
 		return -1;
-- 
1.8.2.83.gc99314b

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH/RFC v2] read-cache: save index entry updates in ILOG index extension
  2013-08-08 12:51 ` [PATCH/RFC v2] read-cache: save index entry updates in ILOG index extension Nguyễn Thái Ngọc Duy
@ 2013-08-08 18:46   ` Junio C Hamano
  2013-08-09 13:32     ` Duy Nguyen
  0 siblings, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2013-08-08 18:46 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git

Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:

> Old operation's updates are removed as new ones are added to keep the
> size under 1 MB. ILOG keeps minimum 10 operations regardless of its
> size. These contansts should be configurable later one. ILOG content
> will be compressed later on so that it leaves minimum
> footprint.

List of <sha-1, pathname> tuples would not compress well, I suspect.

> Because it's only needed at index writing time, read-only
> operations won't pay the cost for decompressing and compressing ILOG.

Another idea is to lazily read existing ILOG by (1) keeping just an
offset in the originating index file at read_index() time and (2)
reading it on demand when ":-4:Makefile" was asked for.

> diff --git a/cache.h b/cache.h
> index 85b544f..a2156bf 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -168,6 +168,7 @@ struct cache_entry {
>  
>  /* used to temporarily mark paths matched by pathspecs */
>  #define CE_MATCHED           (1 << 26)
> +#define CE_BASE              (1 << 27)

As this is not about pathspec match, please have its own comment
line (or a blank line, if this goes without comment) above this new
line.

> @@ -277,6 +278,7 @@ struct index_state {
>  		 initialized : 1;
>  	struct hash_table name_hash;
>  	struct hash_table dir_hash;
> +	struct strbuf *index_log;
>  };

Sane to have this as a per-index_state variable.

> +extern void log_index_changes(const char *prefix, const char **argv);

Not sane to name this function _index_anything and not to have index_state
as its first parameter, breaking the naming convention.

> diff --git a/read-cache.c b/read-cache.c
> index c3d5e35..4021667 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -14,6 +14,7 @@
>  #include "resolve-undo.h"
>  #include "strbuf.h"
>  #include "varint.h"
> +#include "quote.h"
>  
>  static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int really);
>  
> @@ -33,8 +34,10 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
>  #define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
>  #define CACHE_EXT_TREE 0x54524545	/* "TREE" */
>  #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
> +#define CACHE_EXT_INDEX_LOG 0x494C4F47 /* "ILOG" */
>  
>  struct index_state the_index;
> +static struct strbuf log_message = STRBUF_INIT;

Not sane to have a single global here.  Give the callers a helper
function to prepare a ilog message into a strbuf they prepare before
they muck with argc/argv with parse_options(), and have them pass
that strbuf to

	log_index_changes(struct index_state *, struct strbuf *);

and supply the usual

	#define log_cache_changes(logmsg) log_index_changes(&the_index, (logmsg))

macro, inside "#ifndef NO_THE_INDEX_COMPATIBILITY_MACROS" section.

> @@ -1509,6 +1520,14 @@ int read_index_from(struct index_state *istate, const char *path)
>  		src_offset += extsize;
>  	}
>  	munmap(mmap, mmap_size);
> +	if (istate == &the_index) {

Yuck.  Why can't the callers operate on their own copies of the
in-core index and get the same benefit?

> +		for (i = 0; i < istate->cache_nr; i++) {
> +			struct cache_entry *ce = istate->cache[i];
> +			if (ce_stage(ce))
> +				continue;
> +			ce->ce_flags |= CE_BASE;
> +		}
> +	}
>  	return istate->cache_nr;

> +static void get_updated_entries(struct index_state *istate,
> +				struct cache_entry ***cache_out,
> +				unsigned int *cache_nr_out)
> +{
> +	struct cache_entry **cache;
> +	unsigned int i, nr, cache_nr = 0;
> +
> +	*cache_nr_out = 0;
> +	*cache_out = NULL;
> +	for (i = 0; i < istate->cache_nr; i++) {
> +		if (istate->cache[i]->ce_flags & CE_BASE)
> +			continue;
> +		cache_nr++;
> +	}
> +	if (!cache_nr)
> +		return;
> +	cache = xmalloc(cache_nr * sizeof(*istate->cache));
> +	for (i = nr = 0; i < istate->cache_nr; i++) {
> +		struct cache_entry *ce = istate->cache[i];
> +		if (ce->ce_flags & CE_BASE)
> +			continue;
> +		cache[nr++] = ce;
> +	}
> +	*cache_out = cache;
> +	*cache_nr_out = cache_nr;
> +}

I can read what the function does, but I do not understand the
assumption this code makes.

Is this assuming that any newly created cache_entry that goes to
the_index via add_index_entry() will not have CE_BASE bit set?  Some
codepaths try to preserve the flags bit they do not care and/or
understand (e.g. rename_index_entry_at() creates a new ce with a new
name, and keeps most of the flags bit except for the name-hash state
bits), so CE_BASE will be propagated from the original to the new
one, I think.

You seem to be recording the state _after_ the change---that can be
read without the extension, can't it?  I thought we care more about
the state that was _lost_ by the change.

Recording the state after the change misses deleted entries, doesn't
it?

> +static void write_index_log(struct strbuf *sb,
> +			    const struct strbuf *old_log,
> +			    const struct strbuf *msg,
> +			    struct cache_entry **cache,
> +			    unsigned int cache_nr)
> +{
> +	struct strbuf body = STRBUF_INIT;
> +	unsigned int i, size, nr, body_len, hdr_len;
> +	const char *end, *p;
> +	strbuf_addf(&body, "%s%c", msg->buf, '\0');
> +	for (i = 0; i < cache_nr; i++)
> +		strbuf_addf(&body, "%s %s%c", sha1_to_hex(cache[i]->sha1),
> +			    cache[i]->name, '\0');

We do not care about file modes (e.g. "update-index --chmod")?

> +	strbuf_addf(sb, "%u %u%c", (unsigned int)cache_nr, (unsigned int)body.len, '\0');

OK, so the header records how many entries there are and how big the
record is, followed by a list of tuples that describe what got
changed.

> +	strbuf_addbuf(sb, &body);
> +	strbuf_release(&body);
> +
> +	if (!old_log)
> +		return;
> +
> +	size = sb->len;
> +	nr = cache_nr;
> +	end = old_log->buf + old_log->len;
> +	p = old_log->buf;
> +	while (p < end && (size < 1024 * 1024 || nr < 10)) {
> +		if (sscanf(p, "%u %u", &cache_nr, &body_len) != 2) {
> +			error("fail to parse old index log at %u", (unsigned int)(p - old_log->buf));
> +			break;
> +		}

And that header is used to determine how much to bulk copy from the
stack of old records.

>  int write_index(struct index_state *istate, int newfd)
>  {
>  	git_SHA_CTX c;
> @@ -1780,6 +1879,11 @@ int write_index(struct index_state *istate, int newfd)
>  	int entries = istate->cache_nr;
>  	struct stat st;
>  	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
> +	unsigned int index_log_nr = 0;
> +	struct cache_entry **index_log_entries = NULL;
> +
> +	if (istate == &the_index && log_message.len)
> +		get_updated_entries(istate, &index_log_entries, &index_log_nr);

Yuck about "istate == &the_index" again.

>  	for (i = removed = extended = 0; i < entries; i++) {
>  		if (cache[i]->ce_flags & CE_REMOVE)
> @@ -1846,6 +1950,23 @@ int write_index(struct index_state *istate, int newfd)
>  		if (err)
>  			return -1;
>  	}
> +	if (index_log_entries && log_message.len) {
> +		struct strbuf sb = STRBUF_INIT;
> +		write_index_log(&sb, istate->index_log, &log_message,
> +				index_log_entries, index_log_nr);
> +		err = write_index_ext_header(&c, newfd, CACHE_EXT_INDEX_LOG,
> +					     sb.len) < 0
> +			|| ce_write(&c, newfd, sb.buf,
> +				    sb.len) < 0;
> +		if (istate->index_log)
> +			strbuf_release(istate->index_log);
> +		else
> +			istate->index_log = xmalloc(sizeof(*istate->index_log));
> +		*istate->index_log = sb;
> +		if (err)
> +			return -1;
> +	}
> +	free(index_log_entries);
>  
>  	if (ce_flush(&c, newfd) || fstat(newfd, &st))
>  		return -1;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH/RFC v2] read-cache: save index entry updates in ILOG index extension
  2013-08-08 18:46   ` Junio C Hamano
@ 2013-08-09 13:32     ` Duy Nguyen
  0 siblings, 0 replies; 4+ messages in thread
From: Duy Nguyen @ 2013-08-09 13:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Fri, Aug 9, 2013 at 1:46 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Nguyễn Thái Ngọc Duy  <pclouds@gmail.com> writes:
>
>> Old operation's updates are removed as new ones are added to keep the
>> size under 1 MB. ILOG keeps minimum 10 operations regardless of its
>> size. These contansts should be configurable later one. ILOG content
>> will be compressed later on so that it leaves minimum
>> footprint.
>
> List of <sha-1, pathname> tuples would not compress well, I suspect.

I was hoping that it still compresses well the discrete segments of
pathname. In the worst case we can group sha-1 together, separate from
pathnames.

>> Because it's only needed at index writing time, read-only
>> operations won't pay the cost for decompressing and compressing ILOG.
>
> Another idea is to lazily read existing ILOG by (1) keeping just an
> offset in the originating index file at read_index() time and (2)
> reading it on demand when ":-4:Makefile" was asked for.

We need to go through ILOG extension anyway for index sha-1
verification, so it's already read in kernel buffer. What we save is a
just malloc. But I'll try.

>
>> diff --git a/cache.h b/cache.h
>> index 85b544f..a2156bf 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -168,6 +168,7 @@ struct cache_entry {
>>
>>  /* used to temporarily mark paths matched by pathspecs */
>>  #define CE_MATCHED           (1 << 26)
>> +#define CE_BASE              (1 << 27)
>
> As this is not about pathspec match, please have its own comment
> line (or a blank line, if this goes without comment) above this new
> line.

This patch is more about the idea, whether it makes sense. You would
(and did) find the patch somewhat disgusting later on.

>> @@ -277,6 +278,7 @@ struct index_state {
>>                initialized : 1;
>>       struct hash_table name_hash;
>>       struct hash_table dir_hash;
>> +     struct strbuf *index_log;
>>  };
>
> Sane to have this as a per-index_state variable.
>
>> +extern void log_index_changes(const char *prefix, const char **argv);
>
> Not sane to name this function _index_anything and not to have index_state
> as its first parameter, breaking the naming convention.

The reason I can't put index_state there is because this function is
called early, often before read_cache is is called. And I can't caller
it later because argv would be ruined by parse_options(). An option is
to convert argv to a string unconditionally in git.c, then
log_index_changes can be called much later, and with index_state
pointer.

>> +static void get_updated_entries(struct index_state *istate,
>> +                             struct cache_entry ***cache_out,
>> +                             unsigned int *cache_nr_out)
>> +{
>> +     struct cache_entry **cache;
>> +     unsigned int i, nr, cache_nr = 0;
>> +
>> +     *cache_nr_out = 0;
>> +     *cache_out = NULL;
>> +     for (i = 0; i < istate->cache_nr; i++) {
>> +             if (istate->cache[i]->ce_flags & CE_BASE)
>> +                     continue;
>> +             cache_nr++;
>> +     }
>> +     if (!cache_nr)
>> +             return;
>> +     cache = xmalloc(cache_nr * sizeof(*istate->cache));
>> +     for (i = nr = 0; i < istate->cache_nr; i++) {
>> +             struct cache_entry *ce = istate->cache[i];
>> +             if (ce->ce_flags & CE_BASE)
>> +                     continue;
>> +             cache[nr++] = ce;
>> +     }
>> +     *cache_out = cache;
>> +     *cache_nr_out = cache_nr;
>> +}
>
> I can read what the function does, but I do not understand the
> assumption this code makes.
>
> Is this assuming that any newly created cache_entry that goes to
> the_index via add_index_entry() will not have CE_BASE bit set?  Some
> codepaths try to preserve the flags bit they do not care and/or
> understand (e.g. rename_index_entry_at() creates a new ce with a new
> name, and keeps most of the flags bit except for the name-hash state
> bits), so CE_BASE will be propagated from the original to the new
> one, I think.
>
> You seem to be recording the state _after_ the change---that can be
> read without the extension, can't it?  I thought we care more about
> the state that was _lost_ by the change.
>
> Recording the state after the change misses deleted entries, doesn't
> it?

Right. At the end of the commit message I mentioned about "git add
--undo". After I wrote it, I became more convinced it's the way to go.
That should be the UI, instead of letting the user hunt the right
entry through the index log. And then caller has the responsibility to
track changes and feed it to read-cache (CE_BASE trick is gone). And
it would record something like raw diff: a pair of old/new sha-1 and a
path name. This helps differentiate modified, deleted and added
entries that "git add --undo" may need to undo.

>> +static void write_index_log(struct strbuf *sb,
>> +                         const struct strbuf *old_log,
>> +                         const struct strbuf *msg,
>> +                         struct cache_entry **cache,
>> +                         unsigned int cache_nr)
>> +{
>> +     struct strbuf body = STRBUF_INIT;
>> +     unsigned int i, size, nr, body_len, hdr_len;
>> +     const char *end, *p;
>> +     strbuf_addf(&body, "%s%c", msg->buf, '\0');
>> +     for (i = 0; i < cache_nr; i++)
>> +             strbuf_addf(&body, "%s %s%c", sha1_to_hex(cache[i]->sha1),
>> +                         cache[i]->name, '\0');
>
> We do not care about file modes (e.g. "update-index --chmod")?

Not as valuable in my opinion. But if I implement "git add --undo", I
probably need to pay attention to file modes or some users may get
upset as it's not a "real undo" otherwise.
-- 
Duy

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-08-09 13:33 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-04  6:28 [PATCH/RFC] add: support saving the last <n> versions of the index Nguyễn Thái Ngọc Duy
2013-08-08 12:51 ` [PATCH/RFC v2] read-cache: save index entry updates in ILOG index extension Nguyễn Thái Ngọc Duy
2013-08-08 18:46   ` Junio C Hamano
2013-08-09 13:32     ` Duy Nguyen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).