git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v3 00/24] Index-v5
@ 2013-08-18 19:41 Thomas Gummerer
  2013-08-18 19:41 ` [PATCH v3 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
                   ` (24 more replies)
  0 siblings, 25 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Hi,

previous rounds (without api) are at $gmane/202752, $gmane/202923,
$gmane/203088 and $gmane/203517, the previous rounds with api were at
$gmane/229732 and $gmane/230210.  Thanks to Duy for reviewing the the
last round and Junio and Ramsay for additional comments.

Changes since the previous round:

read-cache: move index v2 specific functions to their own file
  - set istate->ops to NULL in discard_index

read-cache: add index reading api
  - style fixes
  - instead of using internal_ops struct, do for_each_index_entry in
    read-cache.c

grep.c: use index api
  - remove duplicate call to match_pathspec_depth

ls-files.c: use index api
  - load the whole index if there is a trai

documentation: add documentation of the index-v5 file format
  - fix typo
  - change the position of nfile and ndir in the index file
  - document that the conflicts are also stored in the fileentries
    block
  - document invalid flag

read-cache: read index-v5
  - restrict partial loading a bit more, by being more careful when
    adjusting the pathspec
  - move the ondisk structs from cache.h to read-cache-v5.c
  - merge for and while loop in read_entries
  - keep a directory tree instead of a flat list when reading the
    directories
  - ce_queue_push moved to read-cache: write index-v5 using a next_ce
    pointer instead of the next pointer that's already used by
    name-hash.
  - fix reading if there are extensions that are not yet supported
  - ignore entries that have the invalid flag set

read-cache: read cache-tree in index-v5
  - use the tree structure which is now used in read index-v5

read-cache: write index-v5
  - simplify compile_directory_data

changes to the index file format:
  - store the number of files before the number of directories in the
    header, so that the file command still can recognize the number of
    files in the repository correctly.
  - store all staged entries in the fileentries block. Doesn't hurt
    the performance a lot but simplifies the code.
  - add an invalid flag for entries that should be ignored.  currently
    unused but respected when reading.  will be used once the conflict
    resolution is done by flipping a bit in the conflict entries at the
    end of the index.
  
added commits:
  - read-cache: use fixed width integer types
  - read-cache: clear version in discard_index()
  - read-cache: Don't compare uid, gid and ino on cygwin
  - introduce GIT_INDEX_VERSION environment variable
  - test-lib: allow setting the index format version

Thomas Gummerer (23):
  t2104: Don't fail for index versions other than [23]
  read-cache: use fixed width integer types
  read-cache: split index file version specific functionality
  read-cache: clear version in discard_index()
  read-cache: move index v2 specific functions to their own file
  read-cache: Don't compare uid, gid and ino on cygwin
  read-cache: Re-read index if index file changed
  add documentation for the index api
  read-cache: add index reading api
  make sure partially read index is not changed
  grep.c: use index api
  ls-files.c: use index api
  documentation: add documentation of the index-v5 file format
  read-cache: make in-memory format aware of stat_crc
  read-cache: read index-v5
  read-cache: read resolve-undo data
  read-cache: read cache-tree in index-v5
  read-cache: write index-v5
  read-cache: write index-v5 cache-tree data
  read-cache: write resolve-undo data for index-v5
  update-index.c: rewrite index when index-version is given
  introduce GIT_INDEX_VERSION environment variable
  test-lib: allow setting the index format version

Thomas Rast (1):
  p0003-index.sh: add perf test for the index formats

 Documentation/technical/api-in-core-index.txt    |   54 +-
 Documentation/technical/index-file-format-v5.txt |  301 +++++
 Makefile                                         |   10 +
 builtin/apply.c                                  |    2 +
 builtin/grep.c                                   |   69 +-
 builtin/ls-files.c                               |   36 +-
 builtin/update-index.c                           |    6 +-
 cache-tree.c                                     |    2 +-
 cache-tree.h                                     |    1 +
 cache.h                                          |   93 +-
 read-cache-v2.c                                  |  550 +++++++++
 read-cache-v5.c                                  | 1417 ++++++++++++++++++++++
 read-cache.c                                     |  685 +++--------
 read-cache.h                                     |   61 +
 t/perf/p0003-index.sh                            |   63 +
 t/t2104-update-index-skip-worktree.sh            |    1 +
 t/test-lib-functions.sh                          |    5 +
 t/test-lib.sh                                    |    3 +
 test-index-version.c                             |    6 +
 unpack-trees.c                                   |    3 +-
 20 files changed, 2786 insertions(+), 582 deletions(-)
 create mode 100644 Documentation/technical/index-file-format-v5.txt
 create mode 100644 read-cache-v2.c
 create mode 100644 read-cache-v5.c
 create mode 100644 read-cache.h
 create mode 100755 t/perf/p0003-index.sh

-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v3 01/24] t2104: Don't fail for index versions other than [23]
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
@ 2013-08-18 19:41 ` Thomas Gummerer
  2013-08-18 19:41 ` [PATCH v3 02/24] read-cache: use fixed width integer types Thomas Gummerer
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

t2104 currently checks for the exact index version 2 or 3,
depending if there is a skip-worktree flag or not. Other
index versions do not use extended flags and thus cannot
be tested for version changes.

Make this test update the index to version 2 at the beginning
of the test. Testing the skip-worktree flags for the default
index format is still covered by t7011 and t7012.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 t/t2104-update-index-skip-worktree.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t2104-update-index-skip-worktree.sh b/t/t2104-update-index-skip-worktree.sh
index 1d0879b..bd9644f 100755
--- a/t/t2104-update-index-skip-worktree.sh
+++ b/t/t2104-update-index-skip-worktree.sh
@@ -22,6 +22,7 @@ H sub/2
 EOF
 
 test_expect_success 'setup' '
+	git update-index --index-version=2 &&
 	mkdir sub &&
 	touch ./1 ./2 sub/1 sub/2 &&
 	git add 1 2 sub/1 sub/2 &&
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 02/24] read-cache: use fixed width integer types
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
  2013-08-18 19:41 ` [PATCH v3 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
@ 2013-08-18 19:41 ` Thomas Gummerer
  2013-08-18 20:21   ` Eric Sunshine
  2013-08-20 19:30   ` Junio C Hamano
  2013-08-18 19:41 ` [PATCH v3 03/24] read-cache: split index file version specific functionality Thomas Gummerer
                   ` (22 subsequent siblings)
  24 siblings, 2 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Use the fixed width integer types uint16_t and uint32_t for ondisk
structures, because unsigned short and unsigned int do not hae a
guaranteed size.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache.h      | 10 +++++-----
 read-cache.c | 30 +++++++++++++++---------------
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/cache.h b/cache.h
index bd6fb9f..9ef778a 100644
--- a/cache.h
+++ b/cache.h
@@ -101,9 +101,9 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 
 #define CACHE_SIGNATURE 0x44495243	/* "DIRC" */
 struct cache_header {
-	unsigned int hdr_signature;
-	unsigned int hdr_version;
-	unsigned int hdr_entries;
+	uint32_t hdr_signature;
+	uint32_t hdr_version;
+	uint32_t hdr_entries;
 };
 
 #define INDEX_FORMAT_LB 2
@@ -115,8 +115,8 @@ struct cache_header {
  * check it for equality in the 32 bits we save.
  */
 struct cache_time {
-	unsigned int sec;
-	unsigned int nsec;
+	uint32_t sec;
+	uint32_t nsec;
 };
 
 struct stat_data {
diff --git a/read-cache.c b/read-cache.c
index ceaf207..0df5b31 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1230,14 +1230,14 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 struct ondisk_cache_entry {
 	struct cache_time ctime;
 	struct cache_time mtime;
-	unsigned int dev;
-	unsigned int ino;
-	unsigned int mode;
-	unsigned int uid;
-	unsigned int gid;
-	unsigned int size;
+	uint32_t dev;
+	uint32_t ino;
+	uint32_t mode;
+	uint32_t uid;
+	uint32_t gid;
+	uint32_t size;
 	unsigned char sha1[20];
-	unsigned short flags;
+	uint16_t flags;
 	char name[FLEX_ARRAY]; /* more */
 };
 
@@ -1249,15 +1249,15 @@ struct ondisk_cache_entry {
 struct ondisk_cache_entry_extended {
 	struct cache_time ctime;
 	struct cache_time mtime;
-	unsigned int dev;
-	unsigned int ino;
-	unsigned int mode;
-	unsigned int uid;
-	unsigned int gid;
-	unsigned int size;
+	uint32_t dev;
+	uint32_t ino;
+	uint32_t mode;
+	uint32_t uid;
+	uint32_t gid;
+	uint32_t size;
 	unsigned char sha1[20];
-	unsigned short flags;
-	unsigned short flags2;
+	uint16_t flags;
+	uint16_t flags2;
 	char name[FLEX_ARRAY]; /* more */
 };
 
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 03/24] read-cache: split index file version specific functionality
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
  2013-08-18 19:41 ` [PATCH v3 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
  2013-08-18 19:41 ` [PATCH v3 02/24] read-cache: use fixed width integer types Thomas Gummerer
@ 2013-08-18 19:41 ` Thomas Gummerer
  2013-08-18 19:41 ` [PATCH v3 04/24] read-cache: clear version in discard_index() Thomas Gummerer
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Split index file version specific functionality to their own functions,
to prepare for moving the index file version specific parts to their own
file.  This makes it easier to add a new index file format later.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache.c | 114 ++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 74 insertions(+), 40 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 0df5b31..de0bbcd 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1269,10 +1269,8 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr_version(struct cache_header *hdr, unsigned long size)
 {
-	git_SHA_CTX c;
-	unsigned char sha1[20];
 	int hdr_version;
 
 	if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
@@ -1280,10 +1278,21 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	hdr_version = ntohl(hdr->hdr_version);
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
+	return 0;
+}
+
+static int verify_hdr(void *mmap, unsigned long size)
+{
+	git_SHA_CTX c;
+	unsigned char sha1[20];
+
+	if (size < sizeof(struct cache_header) + 20)
+		die("index file smaller than expected");
+
 	git_SHA1_Init(&c);
-	git_SHA1_Update(&c, hdr, size - 20);
+	git_SHA1_Update(&c, mmap, size - 20);
 	git_SHA1_Final(sha1, &c);
-	if (hashcmp(sha1, (unsigned char *)hdr + size - 20))
+	if (hashcmp(sha1, (unsigned char *)mmap + size - 20))
 		return error("bad index file sha1 signature");
 	return 0;
 }
@@ -1425,44 +1434,14 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
 	return ce;
 }
 
-/* remember to discard_cache() before reading a different cache! */
-int read_index_from(struct index_state *istate, const char *path)
+static int read_index_v2(struct index_state *istate, void *mmap, unsigned long mmap_size)
 {
-	int fd, i;
-	struct stat st;
+	int i;
 	unsigned long src_offset;
 	struct cache_header *hdr;
-	void *mmap;
-	size_t mmap_size;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 
-	if (istate->initialized)
-		return istate->cache_nr;
-
-	istate->timestamp.sec = 0;
-	istate->timestamp.nsec = 0;
-	fd = open(path, O_RDONLY);
-	if (fd < 0) {
-		if (errno == ENOENT)
-			return 0;
-		die_errno("index file open failed");
-	}
-
-	if (fstat(fd, &st))
-		die_errno("cannot stat the open index");
-
-	mmap_size = xsize_t(st.st_size);
-	if (mmap_size < sizeof(struct cache_header) + 20)
-		die("index file smaller than expected");
-
-	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
-	if (mmap == MAP_FAILED)
-		die_errno("unable to map index file");
-	close(fd);
-
 	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
-		goto unmap;
 
 	istate->version = ntohl(hdr->hdr_version);
 	istate->cache_nr = ntohl(hdr->hdr_entries);
@@ -1488,8 +1467,6 @@ int read_index_from(struct index_state *istate, const char *path)
 		src_offset += consumed;
 	}
 	strbuf_release(&previous_name_buf);
-	istate->timestamp.sec = st.st_mtime;
-	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 
 	while (src_offset <= mmap_size - 20 - 8) {
 		/* After an array of active_nr index entries,
@@ -1509,6 +1486,58 @@ int read_index_from(struct index_state *istate, const char *path)
 		src_offset += 8;
 		src_offset += extsize;
 	}
+	return 0;
+unmap:
+	munmap(mmap, mmap_size);
+	die("index file corrupt");
+}
+
+/* remember to discard_cache() before reading a different cache! */
+int read_index_from(struct index_state *istate, const char *path)
+{
+	int fd;
+	struct stat st;
+	struct cache_header *hdr;
+	void *mmap;
+	size_t mmap_size;
+
+	errno = EBUSY;
+	if (istate->initialized)
+		return istate->cache_nr;
+
+	errno = ENOENT;
+	istate->timestamp.sec = 0;
+	istate->timestamp.nsec = 0;
+	fd = open(path, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT)
+			return 0;
+		die_errno("index file open failed");
+	}
+
+	if (fstat(fd, &st))
+		die_errno("cannot stat the open index");
+
+	errno = EINVAL;
+	mmap_size = xsize_t(st.st_size);
+	if (mmap_size < sizeof(struct cache_header) + 20)
+		die("index file smaller than expected");
+
+	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+	close(fd);
+	if (mmap == MAP_FAILED)
+		die_errno("unable to map index file");
+
+	hdr = mmap;
+	if (verify_hdr_version(hdr, mmap_size) < 0)
+		goto unmap;
+
+	if (verify_hdr(mmap, mmap_size) < 0)
+		goto unmap;
+
+	read_index_v2(istate, mmap, mmap_size);
+	istate->timestamp.sec = st.st_mtime;
+	istate->timestamp.nsec = ST_MTIME_NSEC(st);
 	munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
@@ -1772,7 +1801,7 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 		rollback_lock_file(lockfile);
 }
 
-int write_index(struct index_state *istate, int newfd)
+static int write_index_v2(struct index_state *istate, int newfd)
 {
 	git_SHA_CTX c;
 	struct cache_header hdr;
@@ -1855,6 +1884,11 @@ int write_index(struct index_state *istate, int newfd)
 	return 0;
 }
 
+int write_index(struct index_state *istate, int newfd)
+{
+	return write_index_v2(istate, newfd);
+}
+
 /*
  * Read the index file that is potentially unmerged into given
  * index_state, dropping any unmerged entries.  Returns true if
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 04/24] read-cache: clear version in discard_index()
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (2 preceding siblings ...)
  2013-08-18 19:41 ` [PATCH v3 03/24] read-cache: split index file version specific functionality Thomas Gummerer
@ 2013-08-18 19:41 ` Thomas Gummerer
  2013-08-20 19:34   ` Junio C Hamano
  2013-08-18 19:41 ` [PATCH v3 05/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

All fields except index_state->version are reset in discard_index.
Reset the version too.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/read-cache.c b/read-cache.c
index de0bbcd..1e22f6f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1558,6 +1558,7 @@ int discard_index(struct index_state *istate)
 	for (i = 0; i < istate->cache_nr; i++)
 		free(istate->cache[i]);
 	resolve_undo_clear_index(istate);
+	istate->version = 0;
 	istate->cache_nr = 0;
 	istate->cache_changed = 0;
 	istate->timestamp.sec = 0;
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 05/24] read-cache: move index v2 specific functions to their own file
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (3 preceding siblings ...)
  2013-08-18 19:41 ` [PATCH v3 04/24] read-cache: clear version in discard_index() Thomas Gummerer
@ 2013-08-18 19:41 ` Thomas Gummerer
  2013-08-18 19:41 ` [PATCH v3 06/24] read-cache: Don't compare uid, gid and ino on cygwin Thomas Gummerer
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Move index version 2 specific functions to their own file. The non-index
specific functions will be in read-cache.c, while the index version 2
specific functions will be in read-cache-v2.c.

Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Makefile               |   2 +
 builtin/apply.c        |   2 +
 builtin/update-index.c |   2 +-
 cache.h                |  13 +-
 read-cache-v2.c        | 544 ++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.c           | 576 +++++--------------------------------------------
 read-cache.h           |  58 +++++
 test-index-version.c   |   6 +
 unpack-trees.c         |   3 +-
 9 files changed, 669 insertions(+), 537 deletions(-)
 create mode 100644 read-cache-v2.c
 create mode 100644 read-cache.h

diff --git a/Makefile b/Makefile
index 6b446e7..afae23e 100644
--- a/Makefile
+++ b/Makefile
@@ -712,6 +712,7 @@ LIB_H += progress.h
 LIB_H += prompt.h
 LIB_H += quote.h
 LIB_H += reachable.h
+LIB_H += read-cache.h
 LIB_H += reflog-walk.h
 LIB_H += refs.h
 LIB_H += remote.h
@@ -855,6 +856,7 @@ LIB_OBJS += prompt.o
 LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
+LIB_OBJS += read-cache-v2.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += remote.o
diff --git a/builtin/apply.c b/builtin/apply.c
index 50912c9..3d5a5dc 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -3682,6 +3682,8 @@ static void build_fake_ancestor(struct patch *list, const char *filename)
 			die ("Could not add %s to temporary index", name);
 	}
 
+	if (!result.initialized)
+		initialize_index(&result, 0);
 	fd = open(filename, O_WRONLY | O_CREAT, 0666);
 	if (fd < 0 || write_index(&result, fd) || close(fd))
 		die ("Could not write temporary index to %s", filename);
diff --git a/builtin/update-index.c b/builtin/update-index.c
index e3a10d7..c5bb889 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -863,7 +863,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 
 		if (the_index.version != preferred_index_format)
 			active_cache_changed = 1;
-		the_index.version = preferred_index_format;
+		change_cache_version(preferred_index_format);
 	}
 
 	if (read_from_stdin) {
diff --git a/cache.h b/cache.h
index 9ef778a..d4dae21 100644
--- a/cache.h
+++ b/cache.h
@@ -95,16 +95,8 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
  */
 #define DEFAULT_GIT_PORT 9418
 
-/*
- * Basic data structures for the directory cache
- */
 
 #define CACHE_SIGNATURE 0x44495243	/* "DIRC" */
-struct cache_header {
-	uint32_t hdr_signature;
-	uint32_t hdr_version;
-	uint32_t hdr_entries;
-};
 
 #define INDEX_FORMAT_LB 2
 #define INDEX_FORMAT_UB 4
@@ -279,6 +271,7 @@ struct index_state {
 		 initialized : 1;
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
+	struct index_ops *ops;
 };
 
 extern struct index_state the_index;
@@ -296,6 +289,8 @@ extern void free_name_hash(struct index_state *istate);
 #define active_cache_changed (the_index.cache_changed)
 #define active_cache_tree (the_index.cache_tree)
 
+#define initialize_cache() initialize_index(&the_index, 0)
+#define change_cache_version(version) change_index_version(&the_index, (version))
 #define read_cache() read_index(&the_index)
 #define read_cache_from(path) read_index_from(&the_index, (path))
 #define read_cache_preload(pathspec) read_index_preload(&the_index, (pathspec))
@@ -454,6 +449,8 @@ extern void sanitize_stdfds(void);
 	} while (0)
 
 /* Initialize and use the cache information */
+extern void initialize_index(struct index_state *istate, int version);
+extern void change_index_version(struct index_state *istate, int version);
 extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec *pathspec);
 extern int read_index_from(struct index_state *, const char *path);
diff --git a/read-cache-v2.c b/read-cache-v2.c
new file mode 100644
index 0000000..070d468
--- /dev/null
+++ b/read-cache-v2.c
@@ -0,0 +1,544 @@
+#include "cache.h"
+#include "read-cache.h"
+#include "resolve-undo.h"
+#include "cache-tree.h"
+#include "varint.h"
+
+/* Mask for the name length in ce_flags in the on-disk index */
+#define CE_NAMEMASK  (0x0fff)
+
+/*****************************************************************
+ * Index File I/O
+ *****************************************************************/
+
+/*
+ * dev/ino/uid/gid/size are also just tracked to the low 32 bits
+ * Again - this is just a (very strong in practice) heuristic that
+ * the inode hasn't changed.
+ *
+ * We save the fields in big-endian order to allow using the
+ * index file over NFS transparently.
+ */
+struct ondisk_cache_entry {
+	struct cache_time ctime;
+	struct cache_time mtime;
+	uint32_t dev;
+	uint32_t ino;
+	uint32_t mode;
+	uint32_t uid;
+	uint32_t gid;
+	uint32_t size;
+	unsigned char sha1[20];
+	uint16_t flags;
+	char name[FLEX_ARRAY]; /* more */
+};
+
+/*
+ * This struct is used when CE_EXTENDED bit is 1
+ * The struct must match ondisk_cache_entry exactly from
+ * ctime till flags
+ */
+struct ondisk_cache_entry_extended {
+	struct cache_time ctime;
+	struct cache_time mtime;
+	uint32_t dev;
+	uint32_t ino;
+	uint32_t mode;
+	uint32_t uid;
+	uint32_t gid;
+	uint32_t size;
+	unsigned char sha1[20];
+	uint16_t flags;
+	uint16_t flags2;
+	char name[FLEX_ARRAY]; /* more */
+};
+
+/* These are only used for v3 or lower */
+#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,name) + (len) + 8) & ~7)
+#define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len)
+#define ondisk_cache_entry_extended_size(len) align_flex_name(ondisk_cache_entry_extended,len)
+#define ondisk_ce_size(ce) (((ce)->ce_flags & CE_EXTENDED) ? \
+			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
+			    ondisk_cache_entry_size(ce_namelen(ce)))
+
+static int verify_hdr(void *mmap, unsigned long size)
+{
+	git_SHA_CTX c;
+	unsigned char sha1[20];
+
+	if (size < + sizeof(struct cache_header) + 20)
+		die("index file smaller than expected");
+
+	git_SHA1_Init(&c);
+	git_SHA1_Update(&c, mmap, size - 20);
+	git_SHA1_Final(sha1, &c);
+	if (hashcmp(sha1, (unsigned char *)mmap + size - 20))
+		return error("bad index file sha1 signature");
+	return 0;
+}
+
+static int match_stat_basic(const struct cache_entry *ce,
+			    struct stat *st, int changed)
+{
+	changed |= match_stat_data(&ce->ce_stat_data, st);
+
+	/* Racily smudged entry? */
+	if (!ce->ce_stat_data.sd_size) {
+		if (!is_empty_blob_sha1(ce->sha1))
+			changed |= DATA_CHANGED;
+	}
+	return changed;
+}
+
+static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
+						   unsigned int flags,
+						   const char *name,
+						   size_t len)
+{
+	struct cache_entry *ce = xmalloc(cache_entry_size(len));
+
+	ce->ce_stat_data.sd_ctime.sec = ntoh_l(ondisk->ctime.sec);
+	ce->ce_stat_data.sd_mtime.sec = ntoh_l(ondisk->mtime.sec);
+	ce->ce_stat_data.sd_ctime.nsec = ntoh_l(ondisk->ctime.nsec);
+	ce->ce_stat_data.sd_mtime.nsec = ntoh_l(ondisk->mtime.nsec);
+	ce->ce_stat_data.sd_dev   = ntoh_l(ondisk->dev);
+	ce->ce_stat_data.sd_ino   = ntoh_l(ondisk->ino);
+	ce->ce_mode  = ntoh_l(ondisk->mode);
+	ce->ce_stat_data.sd_uid   = ntoh_l(ondisk->uid);
+	ce->ce_stat_data.sd_gid   = ntoh_l(ondisk->gid);
+	ce->ce_stat_data.sd_size  = ntoh_l(ondisk->size);
+	ce->ce_flags = flags & ~CE_NAMEMASK;
+	ce->ce_namelen = len;
+	hashcpy(ce->sha1, ondisk->sha1);
+	memcpy(ce->name, name, len);
+	ce->name[len] = '\0';
+	return ce;
+}
+
+/*
+ * Adjacent cache entries tend to share the leading paths, so it makes
+ * sense to only store the differences in later entries.  In the v4
+ * on-disk format of the index, each on-disk cache entry stores the
+ * number of bytes to be stripped from the end of the previous name,
+ * and the bytes to append to the result, to come up with its name.
+ */
+static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
+{
+	const unsigned char *ep, *cp = (const unsigned char *)cp_;
+	size_t len = decode_varint(&cp);
+
+	if (name->len < len)
+		die("malformed name field in the index");
+	strbuf_remove(name, name->len - len, len);
+	for (ep = cp; *ep; ep++)
+		; /* find the end */
+	strbuf_add(name, cp, ep - cp);
+	return (const char *)ep + 1 - cp_;
+}
+
+static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
+					    unsigned long *ent_size,
+					    struct strbuf *previous_name)
+{
+	struct cache_entry *ce;
+	size_t len;
+	const char *name;
+	unsigned int flags;
+
+	/* On-disk flags are just 16 bits */
+	flags = ntoh_s(ondisk->flags);
+	len = flags & CE_NAMEMASK;
+
+	if (flags & CE_EXTENDED) {
+		struct ondisk_cache_entry_extended *ondisk2;
+		int extended_flags;
+		ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
+		extended_flags = ntoh_s(ondisk2->flags2) << 16;
+		/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
+		if (extended_flags & ~CE_EXTENDED_FLAGS)
+			die("Unknown index entry format %08x", extended_flags);
+		flags |= extended_flags;
+		name = ondisk2->name;
+	}
+	else
+		name = ondisk->name;
+
+	if (!previous_name) {
+		/* v3 and earlier */
+		if (len == CE_NAMEMASK)
+			len = strlen(name);
+		ce = cache_entry_from_ondisk(ondisk, flags, name, len);
+
+		*ent_size = ondisk_ce_size(ce);
+	} else {
+		unsigned long consumed;
+		consumed = expand_name_field(previous_name, name);
+		ce = cache_entry_from_ondisk(ondisk, flags,
+					     previous_name->buf,
+					     previous_name->len);
+
+		*ent_size = (name - ((char *)ondisk)) + consumed;
+	}
+	return ce;
+}
+
+static int read_index_extension(struct index_state *istate,
+				const char *ext, void *data, unsigned long sz)
+{
+	switch (CACHE_EXT(ext)) {
+	case CACHE_EXT_TREE:
+		istate->cache_tree = cache_tree_read(data, sz);
+		break;
+	case CACHE_EXT_RESOLVE_UNDO:
+		istate->resolve_undo = resolve_undo_read(data, sz);
+		break;
+	default:
+		if (*ext < 'A' || 'Z' < *ext)
+			return error("index uses %.4s extension, which we do not understand",
+				     ext);
+		fprintf(stderr, "ignoring %.4s extension\n", ext);
+		break;
+	}
+	return 0;
+}
+
+static int read_index_v2(struct index_state *istate, void *mmap,
+			 unsigned long mmap_size)
+{
+	int i;
+	unsigned long src_offset;
+	struct cache_header *hdr;
+	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+
+	hdr = mmap;
+	istate->cache_nr = ntohl(hdr->hdr_entries);
+	istate->cache_alloc = alloc_nr(istate->cache_nr);
+	istate->cache = xcalloc(istate->cache_alloc, sizeof(struct cache_entry *));
+
+	if (istate->version == 4)
+		previous_name = &previous_name_buf;
+	else
+		previous_name = NULL;
+
+	src_offset = sizeof(*hdr);
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct ondisk_cache_entry *disk_ce;
+		struct cache_entry *ce;
+		unsigned long consumed;
+
+		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
+		ce = create_from_disk(disk_ce, &consumed, previous_name);
+		set_index_entry(istate, i, ce);
+
+		src_offset += consumed;
+	}
+	strbuf_release(&previous_name_buf);
+
+	while (src_offset <= mmap_size - 20 - 8) {
+		/* After an array of active_nr index entries,
+		 * there can be arbitrary number of extended
+		 * sections, each of which is prefixed with
+		 * extension name (4-byte) and section length
+		 * in 4-byte network byte order.
+		 */
+		uint32_t extsize;
+		memcpy(&extsize, (char *)mmap + src_offset + 4, 4);
+		extsize = ntohl(extsize);
+		if (read_index_extension(istate,
+					(const char *) mmap + src_offset,
+					(char *) mmap + src_offset + 8,
+					extsize) < 0)
+			goto unmap;
+		src_offset += 8;
+		src_offset += extsize;
+	}
+	return 0;
+unmap:
+	munmap(mmap, mmap_size);
+	die("index file corrupt");
+}
+
+#define WRITE_BUFFER_SIZE 8192
+static unsigned char write_buffer[WRITE_BUFFER_SIZE];
+static unsigned long write_buffer_len;
+
+static int ce_write_flush(git_SHA_CTX *context, int fd)
+{
+	unsigned int buffered = write_buffer_len;
+	if (buffered) {
+		git_SHA1_Update(context, write_buffer, buffered);
+		if (write_in_full(fd, write_buffer, buffered) != buffered)
+			return -1;
+		write_buffer_len = 0;
+	}
+	return 0;
+}
+
+static int ce_write(git_SHA_CTX *context, int fd, void *data, unsigned int len)
+{
+	while (len) {
+		unsigned int buffered = write_buffer_len;
+		unsigned int partial = WRITE_BUFFER_SIZE - buffered;
+		if (partial > len)
+			partial = len;
+		memcpy(write_buffer + buffered, data, partial);
+		buffered += partial;
+		if (buffered == WRITE_BUFFER_SIZE) {
+			write_buffer_len = buffered;
+			if (ce_write_flush(context, fd))
+				return -1;
+			buffered = 0;
+		}
+		write_buffer_len = buffered;
+		len -= partial;
+		data = (char *) data + partial;
+	}
+	return 0;
+}
+
+static int write_index_ext_header(git_SHA_CTX *context, int fd,
+				  unsigned int ext, unsigned int sz)
+{
+	ext = htonl(ext);
+	sz = htonl(sz);
+	return ((ce_write(context, fd, &ext, 4) < 0) ||
+		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
+}
+
+static int ce_flush(git_SHA_CTX *context, int fd)
+{
+	unsigned int left = write_buffer_len;
+
+	if (left) {
+		write_buffer_len = 0;
+		git_SHA1_Update(context, write_buffer, left);
+	}
+
+	/* Flush first if not enough space for SHA1 signature */
+	if (left + 20 > WRITE_BUFFER_SIZE) {
+		if (write_in_full(fd, write_buffer, left) != left)
+			return -1;
+		left = 0;
+	}
+
+	/* Append the SHA1 signature at the end */
+	git_SHA1_Final(write_buffer + left, context);
+	left += 20;
+	return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;
+}
+
+static void ce_smudge_racily_clean_entry(struct index_state *istate, struct cache_entry *ce)
+{
+	/*
+	 * The only thing we care about in this function is to smudge the
+	 * falsely clean entry due to touch-update-touch race, so we leave
+	 * everything else as they are.  We are called for entries whose
+	 * ce_stat_data.sd_mtime match the index file mtime.
+	 *
+	 * Note that this actually does not do much for gitlinks, for
+	 * which ce_match_stat_basic() always goes to the actual
+	 * contents.  The caller checks with is_racy_timestamp() which
+	 * always says "no" for gitlinks, so we are not called for them ;-)
+	 */
+	struct stat st;
+
+	if (lstat(ce->name, &st) < 0)
+		return;
+	if (ce_match_stat_basic(istate, ce, &st))
+		return;
+	if (ce_modified_check_fs(ce, &st)) {
+		/* This is "racily clean"; smudge it.  Note that this
+		 * is a tricky code.  At first glance, it may appear
+		 * that it can break with this sequence:
+		 *
+		 * $ echo xyzzy >frotz
+		 * $ git-update-index --add frotz
+		 * $ : >frotz
+		 * $ sleep 3
+		 * $ echo filfre >nitfol
+		 * $ git-update-index --add nitfol
+		 *
+		 * but it does not.  When the second update-index runs,
+		 * it notices that the entry "frotz" has the same timestamp
+		 * as index, and if we were to smudge it by resetting its
+		 * size to zero here, then the object name recorded
+		 * in index is the 6-byte file but the cached stat information
+		 * becomes zero --- which would then match what we would
+		 * obtain from the filesystem next time we stat("frotz").
+		 *
+		 * However, the second update-index, before calling
+		 * this function, notices that the cached size is 6
+		 * bytes and what is on the filesystem is an empty
+		 * file, and never calls us, so the cached size information
+		 * for "frotz" stays 6 which does not match the filesystem.
+		 */
+		ce->ce_stat_data.sd_size = 0;
+	}
+}
+
+/* Copy miscellaneous fields but not the name */
+static char *copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk,
+				       struct cache_entry *ce)
+{
+	short flags;
+
+	ondisk->ctime.sec = htonl(ce->ce_stat_data.sd_ctime.sec);
+	ondisk->mtime.sec = htonl(ce->ce_stat_data.sd_mtime.sec);
+	ondisk->ctime.nsec = htonl(ce->ce_stat_data.sd_ctime.nsec);
+	ondisk->mtime.nsec = htonl(ce->ce_stat_data.sd_mtime.nsec);
+	ondisk->dev  = htonl(ce->ce_stat_data.sd_dev);
+	ondisk->ino  = htonl(ce->ce_stat_data.sd_ino);
+	ondisk->mode = htonl(ce->ce_mode);
+	ondisk->uid  = htonl(ce->ce_stat_data.sd_uid);
+	ondisk->gid  = htonl(ce->ce_stat_data.sd_gid);
+	ondisk->size = htonl(ce->ce_stat_data.sd_size);
+	hashcpy(ondisk->sha1, ce->sha1);
+
+	flags = ce->ce_flags;
+	flags |= (ce_namelen(ce) >= CE_NAMEMASK ? CE_NAMEMASK : ce_namelen(ce));
+	ondisk->flags = htons(flags);
+	if (ce->ce_flags & CE_EXTENDED) {
+		struct ondisk_cache_entry_extended *ondisk2;
+		ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
+		ondisk2->flags2 = htons((ce->ce_flags & CE_EXTENDED_FLAGS) >> 16);
+		return ondisk2->name;
+	}
+	else {
+		return ondisk->name;
+	}
+}
+
+static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce,
+			  struct strbuf *previous_name)
+{
+	int size;
+	struct ondisk_cache_entry *ondisk;
+	char *name;
+	int result;
+
+	if (!previous_name) {
+		size = ondisk_ce_size(ce);
+		ondisk = xcalloc(1, size);
+		name = copy_cache_entry_to_ondisk(ondisk, ce);
+		memcpy(name, ce->name, ce_namelen(ce));
+	} else {
+		int common, to_remove, prefix_size;
+		unsigned char to_remove_vi[16];
+		for (common = 0;
+		     (ce->name[common] &&
+		      common < previous_name->len &&
+		      ce->name[common] == previous_name->buf[common]);
+		     common++)
+			; /* still matching */
+		to_remove = previous_name->len - common;
+		prefix_size = encode_varint(to_remove, to_remove_vi);
+
+		if (ce->ce_flags & CE_EXTENDED)
+			size = offsetof(struct ondisk_cache_entry_extended, name);
+		else
+			size = offsetof(struct ondisk_cache_entry, name);
+		size += prefix_size + (ce_namelen(ce) - common + 1);
+
+		ondisk = xcalloc(1, size);
+		name = copy_cache_entry_to_ondisk(ondisk, ce);
+		memcpy(name, to_remove_vi, prefix_size);
+		memcpy(name + prefix_size, ce->name + common, ce_namelen(ce) - common);
+
+		strbuf_splice(previous_name, common, to_remove,
+			      ce->name + common, ce_namelen(ce) - common);
+	}
+
+	result = ce_write(c, fd, ondisk, size);
+	free(ondisk);
+	return result;
+}
+
+static int write_index_v2(struct index_state *istate, int newfd)
+{
+	git_SHA_CTX c;
+	struct cache_header hdr;
+	int i, err, removed, extended, hdr_version;
+	struct cache_entry **cache = istate->cache;
+	int entries = istate->cache_nr;
+	struct stat st;
+	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+
+	for (i = removed = extended = 0; i < entries; i++) {
+		if (cache[i]->ce_flags & CE_REMOVE)
+			removed++;
+
+		/* reduce extended entries if possible */
+		cache[i]->ce_flags &= ~CE_EXTENDED;
+		if (cache[i]->ce_flags & CE_EXTENDED_FLAGS) {
+			extended++;
+			cache[i]->ce_flags |= CE_EXTENDED;
+		}
+	}
+
+	if (!istate->version)
+		istate->version = INDEX_FORMAT_DEFAULT;
+
+	/* demote version 3 to version 2 when the latter suffices */
+	if (istate->version == 3 || istate->version == 2)
+		istate->version = extended ? 3 : 2;
+
+	hdr_version = istate->version;
+
+	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
+	hdr.hdr_version = htonl(hdr_version);
+	hdr.hdr_entries = htonl(entries - removed);
+
+	git_SHA1_Init(&c);
+	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
+		return -1;
+
+	previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;
+	for (i = 0; i < entries; i++) {
+		struct cache_entry *ce = cache[i];
+		if (ce->ce_flags & CE_REMOVE)
+			continue;
+		if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
+			ce_smudge_racily_clean_entry(istate, ce);
+		if (is_null_sha1(ce->sha1))
+			return error("cache entry has null sha1: %s", ce->name);
+		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
+			return -1;
+	}
+	strbuf_release(&previous_name_buf);
+
+	/* Write extension data here */
+	if (istate->cache_tree) {
+		struct strbuf sb = STRBUF_INIT;
+
+		cache_tree_write(&sb, istate->cache_tree);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
+	if (istate->resolve_undo) {
+		struct strbuf sb = STRBUF_INIT;
+
+		resolve_undo_write(&sb, istate->resolve_undo);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_RESOLVE_UNDO,
+					     sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
+
+	if (ce_flush(&c, newfd) || fstat(newfd, &st))
+		return -1;
+	istate->timestamp.sec = (unsigned int)st.st_mtime;
+	istate->timestamp.nsec = ST_MTIME_NSEC(st);
+	return 0;
+}
+
+struct index_ops v2_ops = {
+	match_stat_basic,
+	verify_hdr,
+	read_index_v2,
+	write_index_v2
+};
diff --git a/read-cache.c b/read-cache.c
index 1e22f6f..1f827de 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -5,6 +5,7 @@
  */
 #define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "read-cache.h"
 #include "cache-tree.h"
 #include "refs.h"
 #include "dir.h"
@@ -17,26 +18,9 @@
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int really);
 
-/* Mask for the name length in ce_flags in the on-disk index */
-
-#define CE_NAMEMASK  (0x0fff)
-
-/* Index extensions.
- *
- * The first letter should be 'A'..'Z' for extensions that are not
- * necessary for a correct operation (i.e. optimization data).
- * When new extensions are added that _needs_ to be understood in
- * order to correctly interpret the index file, pick character that
- * is outside the range, to cause the reader to abort.
- */
-
-#define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
-#define CACHE_EXT_TREE 0x54524545	/* "TREE" */
-#define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
-
 struct index_state the_index;
 
-static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
+void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
 {
 	istate->cache[nr] = ce;
 	add_name_hash(istate, ce);
@@ -190,7 +174,7 @@ static int ce_compare_gitlink(const struct cache_entry *ce)
 	return hashcmp(sha1, ce->sha1);
 }
 
-static int ce_modified_check_fs(const struct cache_entry *ce, struct stat *st)
+int ce_modified_check_fs(const struct cache_entry *ce, struct stat *st)
 {
 	switch (st->st_mode & S_IFMT) {
 	case S_IFREG:
@@ -210,7 +194,18 @@ static int ce_modified_check_fs(const struct cache_entry *ce, struct stat *st)
 	return 0;
 }
 
-static int ce_match_stat_basic(const struct cache_entry *ce, struct stat *st)
+/*
+ * Check if the reading/writing operations are set and set them
+ * to the correct version
+ */
+static void set_istate_ops(struct index_state *istate)
+{
+	if (istate->version >= 2 && istate->version <= 4)
+		istate->ops = &v2_ops;
+}
+
+int ce_match_stat_basic(const struct index_state *istate,
+			const struct cache_entry *ce, struct stat *st)
 {
 	unsigned int changed = 0;
 
@@ -243,19 +238,13 @@ static int ce_match_stat_basic(const struct cache_entry *ce, struct stat *st)
 		die("internal error: ce_mode is %o", ce->ce_mode);
 	}
 
-	changed |= match_stat_data(&ce->ce_stat_data, st);
-
-	/* Racily smudged entry? */
-	if (!ce->ce_stat_data.sd_size) {
-		if (!is_empty_blob_sha1(ce->sha1))
-			changed |= DATA_CHANGED;
-	}
-
+	changed = istate->ops->match_stat_basic(ce, st, changed);
 	return changed;
 }
 
-static int is_racy_timestamp(const struct index_state *istate,
-			     const struct cache_entry *ce)
+
+int is_racy_timestamp(const struct index_state *istate,
+		      const struct cache_entry *ce)
 {
 	return (!S_ISGITLINK(ce->ce_mode) &&
 		istate->timestamp.sec &&
@@ -298,7 +287,7 @@ int ie_match_stat(const struct index_state *istate,
 	if (ce->ce_flags & CE_INTENT_TO_ADD)
 		return DATA_CHANGED | TYPE_CHANGED | MODE_CHANGED;
 
-	changed = ce_match_stat_basic(ce, st);
+	changed = ce_match_stat_basic(istate, ce, st);
 
 	/*
 	 * Within 1 second of this sequence:
@@ -982,6 +971,8 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
 {
 	int pos;
 
+	if (!istate->initialized)
+		initialize_index(istate, INDEX_FORMAT_DEFAULT);
 	if (option & ADD_CACHE_JUST_APPEND)
 		pos = istate->cache_nr;
 	else {
@@ -1212,13 +1203,25 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 	return refresh_cache_ent(&the_index, ce, really, NULL, NULL);
 }
 
+void initialize_index(struct index_state *istate, int version)
+{
+	istate->initialized = 1;
+	if (!version)
+		version = INDEX_FORMAT_DEFAULT;
+	istate->version = version;
+	set_istate_ops(istate);
+}
+
+void change_index_version(struct index_state *istate, int version)
+{
+	istate->version = version;
+	set_istate_ops(istate);
+}
 
 /*****************************************************************
  * Index File I/O
  *****************************************************************/
 
-#define INDEX_FORMAT_DEFAULT 3
-
 /*
  * dev/ino/uid/gid/size are also just tracked to the low 32 bits
  * Again - this is just a (very strong in practice) heuristic that
@@ -1269,7 +1272,8 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr_version(struct cache_header *hdr, unsigned long size)
+static int verify_hdr_version(struct index_state *istate,
+			      struct cache_header *hdr, unsigned long size)
 {
 	int hdr_version;
 
@@ -1278,42 +1282,7 @@ static int verify_hdr_version(struct cache_header *hdr, unsigned long size)
 	hdr_version = ntohl(hdr->hdr_version);
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
-	return 0;
-}
-
-static int verify_hdr(void *mmap, unsigned long size)
-{
-	git_SHA_CTX c;
-	unsigned char sha1[20];
-
-	if (size < sizeof(struct cache_header) + 20)
-		die("index file smaller than expected");
-
-	git_SHA1_Init(&c);
-	git_SHA1_Update(&c, mmap, size - 20);
-	git_SHA1_Final(sha1, &c);
-	if (hashcmp(sha1, (unsigned char *)mmap + size - 20))
-		return error("bad index file sha1 signature");
-	return 0;
-}
-
-static int read_index_extension(struct index_state *istate,
-				const char *ext, void *data, unsigned long sz)
-{
-	switch (CACHE_EXT(ext)) {
-	case CACHE_EXT_TREE:
-		istate->cache_tree = cache_tree_read(data, sz);
-		break;
-	case CACHE_EXT_RESOLVE_UNDO:
-		istate->resolve_undo = resolve_undo_read(data, sz);
-		break;
-	default:
-		if (*ext < 'A' || 'Z' < *ext)
-			return error("index uses %.4s extension, which we do not understand",
-				     ext);
-		fprintf(stderr, "ignoring %.4s extension\n", ext);
-		break;
-	}
+	initialize_index(istate, hdr_version);
 	return 0;
 }
 
@@ -1322,176 +1291,6 @@ int read_index(struct index_state *istate)
 	return read_index_from(istate, get_index_file());
 }
 
-#ifndef NEEDS_ALIGNED_ACCESS
-#define ntoh_s(var) ntohs(var)
-#define ntoh_l(var) ntohl(var)
-#else
-static inline uint16_t ntoh_s_force_align(void *p)
-{
-	uint16_t x;
-	memcpy(&x, p, sizeof(x));
-	return ntohs(x);
-}
-static inline uint32_t ntoh_l_force_align(void *p)
-{
-	uint32_t x;
-	memcpy(&x, p, sizeof(x));
-	return ntohl(x);
-}
-#define ntoh_s(var) ntoh_s_force_align(&(var))
-#define ntoh_l(var) ntoh_l_force_align(&(var))
-#endif
-
-static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
-						   unsigned int flags,
-						   const char *name,
-						   size_t len)
-{
-	struct cache_entry *ce = xmalloc(cache_entry_size(len));
-
-	ce->ce_stat_data.sd_ctime.sec = ntoh_l(ondisk->ctime.sec);
-	ce->ce_stat_data.sd_mtime.sec = ntoh_l(ondisk->mtime.sec);
-	ce->ce_stat_data.sd_ctime.nsec = ntoh_l(ondisk->ctime.nsec);
-	ce->ce_stat_data.sd_mtime.nsec = ntoh_l(ondisk->mtime.nsec);
-	ce->ce_stat_data.sd_dev   = ntoh_l(ondisk->dev);
-	ce->ce_stat_data.sd_ino   = ntoh_l(ondisk->ino);
-	ce->ce_mode  = ntoh_l(ondisk->mode);
-	ce->ce_stat_data.sd_uid   = ntoh_l(ondisk->uid);
-	ce->ce_stat_data.sd_gid   = ntoh_l(ondisk->gid);
-	ce->ce_stat_data.sd_size  = ntoh_l(ondisk->size);
-	ce->ce_flags = flags & ~CE_NAMEMASK;
-	ce->ce_namelen = len;
-	hashcpy(ce->sha1, ondisk->sha1);
-	memcpy(ce->name, name, len);
-	ce->name[len] = '\0';
-	return ce;
-}
-
-/*
- * Adjacent cache entries tend to share the leading paths, so it makes
- * sense to only store the differences in later entries.  In the v4
- * on-disk format of the index, each on-disk cache entry stores the
- * number of bytes to be stripped from the end of the previous name,
- * and the bytes to append to the result, to come up with its name.
- */
-static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
-{
-	const unsigned char *ep, *cp = (const unsigned char *)cp_;
-	size_t len = decode_varint(&cp);
-
-	if (name->len < len)
-		die("malformed name field in the index");
-	strbuf_remove(name, name->len - len, len);
-	for (ep = cp; *ep; ep++)
-		; /* find the end */
-	strbuf_add(name, cp, ep - cp);
-	return (const char *)ep + 1 - cp_;
-}
-
-static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
-					    unsigned long *ent_size,
-					    struct strbuf *previous_name)
-{
-	struct cache_entry *ce;
-	size_t len;
-	const char *name;
-	unsigned int flags;
-
-	/* On-disk flags are just 16 bits */
-	flags = ntoh_s(ondisk->flags);
-	len = flags & CE_NAMEMASK;
-
-	if (flags & CE_EXTENDED) {
-		struct ondisk_cache_entry_extended *ondisk2;
-		int extended_flags;
-		ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
-		extended_flags = ntoh_s(ondisk2->flags2) << 16;
-		/* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
-		if (extended_flags & ~CE_EXTENDED_FLAGS)
-			die("Unknown index entry format %08x", extended_flags);
-		flags |= extended_flags;
-		name = ondisk2->name;
-	}
-	else
-		name = ondisk->name;
-
-	if (!previous_name) {
-		/* v3 and earlier */
-		if (len == CE_NAMEMASK)
-			len = strlen(name);
-		ce = cache_entry_from_ondisk(ondisk, flags, name, len);
-
-		*ent_size = ondisk_ce_size(ce);
-	} else {
-		unsigned long consumed;
-		consumed = expand_name_field(previous_name, name);
-		ce = cache_entry_from_ondisk(ondisk, flags,
-					     previous_name->buf,
-					     previous_name->len);
-
-		*ent_size = (name - ((char *)ondisk)) + consumed;
-	}
-	return ce;
-}
-
-static int read_index_v2(struct index_state *istate, void *mmap, unsigned long mmap_size)
-{
-	int i;
-	unsigned long src_offset;
-	struct cache_header *hdr;
-	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
-
-	hdr = mmap;
-
-	istate->version = ntohl(hdr->hdr_version);
-	istate->cache_nr = ntohl(hdr->hdr_entries);
-	istate->cache_alloc = alloc_nr(istate->cache_nr);
-	istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
-	istate->initialized = 1;
-
-	if (istate->version == 4)
-		previous_name = &previous_name_buf;
-	else
-		previous_name = NULL;
-
-	src_offset = sizeof(*hdr);
-	for (i = 0; i < istate->cache_nr; i++) {
-		struct ondisk_cache_entry *disk_ce;
-		struct cache_entry *ce;
-		unsigned long consumed;
-
-		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
-		ce = create_from_disk(disk_ce, &consumed, previous_name);
-		set_index_entry(istate, i, ce);
-
-		src_offset += consumed;
-	}
-	strbuf_release(&previous_name_buf);
-
-	while (src_offset <= mmap_size - 20 - 8) {
-		/* After an array of active_nr index entries,
-		 * there can be arbitrary number of extended
-		 * sections, each of which is prefixed with
-		 * extension name (4-byte) and section length
-		 * in 4-byte network byte order.
-		 */
-		uint32_t extsize;
-		memcpy(&extsize, (char *)mmap + src_offset + 4, 4);
-		extsize = ntohl(extsize);
-		if (read_index_extension(istate,
-					 (const char *) mmap + src_offset,
-					 (char *) mmap + src_offset + 8,
-					 extsize) < 0)
-			goto unmap;
-		src_offset += 8;
-		src_offset += extsize;
-	}
-	return 0;
-unmap:
-	munmap(mmap, mmap_size);
-	die("index file corrupt");
-}
-
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
@@ -1508,10 +1307,13 @@ int read_index_from(struct index_state *istate, const char *path)
 	errno = ENOENT;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
+
 	fd = open(path, O_RDONLY);
 	if (fd < 0) {
-		if (errno == ENOENT)
+		if (errno == ENOENT) {
+			initialize_index(istate, 0);
 			return 0;
+		}
 		die_errno("index file open failed");
 	}
 
@@ -1520,24 +1322,23 @@ int read_index_from(struct index_state *istate, const char *path)
 
 	errno = EINVAL;
 	mmap_size = xsize_t(st.st_size);
-	if (mmap_size < sizeof(struct cache_header) + 20)
-		die("index file smaller than expected");
-
 	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
 	close(fd);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
 
 	hdr = mmap;
-	if (verify_hdr_version(hdr, mmap_size) < 0)
+	if (verify_hdr_version(istate, hdr, mmap_size) < 0)
 		goto unmap;
 
-	if (verify_hdr(mmap, mmap_size) < 0)
+	if (istate->ops->verify_hdr(mmap, mmap_size) < 0)
 		goto unmap;
 
-	read_index_v2(istate, mmap, mmap_size);
+	if (istate->ops->read_index(istate, mmap, mmap_size) < 0)
+		goto unmap;
 	istate->timestamp.sec = st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
+
 	munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
@@ -1569,6 +1370,7 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	istate->ops = NULL;
 	return 0;
 }
 
@@ -1582,201 +1384,6 @@ int unmerged_index(const struct index_state *istate)
 	return 0;
 }
 
-#define WRITE_BUFFER_SIZE 8192
-static unsigned char write_buffer[WRITE_BUFFER_SIZE];
-static unsigned long write_buffer_len;
-
-static int ce_write_flush(git_SHA_CTX *context, int fd)
-{
-	unsigned int buffered = write_buffer_len;
-	if (buffered) {
-		git_SHA1_Update(context, write_buffer, buffered);
-		if (write_in_full(fd, write_buffer, buffered) != buffered)
-			return -1;
-		write_buffer_len = 0;
-	}
-	return 0;
-}
-
-static int ce_write(git_SHA_CTX *context, int fd, void *data, unsigned int len)
-{
-	while (len) {
-		unsigned int buffered = write_buffer_len;
-		unsigned int partial = WRITE_BUFFER_SIZE - buffered;
-		if (partial > len)
-			partial = len;
-		memcpy(write_buffer + buffered, data, partial);
-		buffered += partial;
-		if (buffered == WRITE_BUFFER_SIZE) {
-			write_buffer_len = buffered;
-			if (ce_write_flush(context, fd))
-				return -1;
-			buffered = 0;
-		}
-		write_buffer_len = buffered;
-		len -= partial;
-		data = (char *) data + partial;
-	}
-	return 0;
-}
-
-static int write_index_ext_header(git_SHA_CTX *context, int fd,
-				  unsigned int ext, unsigned int sz)
-{
-	ext = htonl(ext);
-	sz = htonl(sz);
-	return ((ce_write(context, fd, &ext, 4) < 0) ||
-		(ce_write(context, fd, &sz, 4) < 0)) ? -1 : 0;
-}
-
-static int ce_flush(git_SHA_CTX *context, int fd)
-{
-	unsigned int left = write_buffer_len;
-
-	if (left) {
-		write_buffer_len = 0;
-		git_SHA1_Update(context, write_buffer, left);
-	}
-
-	/* Flush first if not enough space for SHA1 signature */
-	if (left + 20 > WRITE_BUFFER_SIZE) {
-		if (write_in_full(fd, write_buffer, left) != left)
-			return -1;
-		left = 0;
-	}
-
-	/* Append the SHA1 signature at the end */
-	git_SHA1_Final(write_buffer + left, context);
-	left += 20;
-	return (write_in_full(fd, write_buffer, left) != left) ? -1 : 0;
-}
-
-static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
-{
-	/*
-	 * The only thing we care about in this function is to smudge the
-	 * falsely clean entry due to touch-update-touch race, so we leave
-	 * everything else as they are.  We are called for entries whose
-	 * ce_stat_data.sd_mtime match the index file mtime.
-	 *
-	 * Note that this actually does not do much for gitlinks, for
-	 * which ce_match_stat_basic() always goes to the actual
-	 * contents.  The caller checks with is_racy_timestamp() which
-	 * always says "no" for gitlinks, so we are not called for them ;-)
-	 */
-	struct stat st;
-
-	if (lstat(ce->name, &st) < 0)
-		return;
-	if (ce_match_stat_basic(ce, &st))
-		return;
-	if (ce_modified_check_fs(ce, &st)) {
-		/* This is "racily clean"; smudge it.  Note that this
-		 * is a tricky code.  At first glance, it may appear
-		 * that it can break with this sequence:
-		 *
-		 * $ echo xyzzy >frotz
-		 * $ git-update-index --add frotz
-		 * $ : >frotz
-		 * $ sleep 3
-		 * $ echo filfre >nitfol
-		 * $ git-update-index --add nitfol
-		 *
-		 * but it does not.  When the second update-index runs,
-		 * it notices that the entry "frotz" has the same timestamp
-		 * as index, and if we were to smudge it by resetting its
-		 * size to zero here, then the object name recorded
-		 * in index is the 6-byte file but the cached stat information
-		 * becomes zero --- which would then match what we would
-		 * obtain from the filesystem next time we stat("frotz").
-		 *
-		 * However, the second update-index, before calling
-		 * this function, notices that the cached size is 6
-		 * bytes and what is on the filesystem is an empty
-		 * file, and never calls us, so the cached size information
-		 * for "frotz" stays 6 which does not match the filesystem.
-		 */
-		ce->ce_stat_data.sd_size = 0;
-	}
-}
-
-/* Copy miscellaneous fields but not the name */
-static char *copy_cache_entry_to_ondisk(struct ondisk_cache_entry *ondisk,
-				       struct cache_entry *ce)
-{
-	short flags;
-
-	ondisk->ctime.sec = htonl(ce->ce_stat_data.sd_ctime.sec);
-	ondisk->mtime.sec = htonl(ce->ce_stat_data.sd_mtime.sec);
-	ondisk->ctime.nsec = htonl(ce->ce_stat_data.sd_ctime.nsec);
-	ondisk->mtime.nsec = htonl(ce->ce_stat_data.sd_mtime.nsec);
-	ondisk->dev  = htonl(ce->ce_stat_data.sd_dev);
-	ondisk->ino  = htonl(ce->ce_stat_data.sd_ino);
-	ondisk->mode = htonl(ce->ce_mode);
-	ondisk->uid  = htonl(ce->ce_stat_data.sd_uid);
-	ondisk->gid  = htonl(ce->ce_stat_data.sd_gid);
-	ondisk->size = htonl(ce->ce_stat_data.sd_size);
-	hashcpy(ondisk->sha1, ce->sha1);
-
-	flags = ce->ce_flags;
-	flags |= (ce_namelen(ce) >= CE_NAMEMASK ? CE_NAMEMASK : ce_namelen(ce));
-	ondisk->flags = htons(flags);
-	if (ce->ce_flags & CE_EXTENDED) {
-		struct ondisk_cache_entry_extended *ondisk2;
-		ondisk2 = (struct ondisk_cache_entry_extended *)ondisk;
-		ondisk2->flags2 = htons((ce->ce_flags & CE_EXTENDED_FLAGS) >> 16);
-		return ondisk2->name;
-	}
-	else {
-		return ondisk->name;
-	}
-}
-
-static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce,
-			  struct strbuf *previous_name)
-{
-	int size;
-	struct ondisk_cache_entry *ondisk;
-	char *name;
-	int result;
-
-	if (!previous_name) {
-		size = ondisk_ce_size(ce);
-		ondisk = xcalloc(1, size);
-		name = copy_cache_entry_to_ondisk(ondisk, ce);
-		memcpy(name, ce->name, ce_namelen(ce));
-	} else {
-		int common, to_remove, prefix_size;
-		unsigned char to_remove_vi[16];
-		for (common = 0;
-		     (ce->name[common] &&
-		      common < previous_name->len &&
-		      ce->name[common] == previous_name->buf[common]);
-		     common++)
-			; /* still matching */
-		to_remove = previous_name->len - common;
-		prefix_size = encode_varint(to_remove, to_remove_vi);
-
-		if (ce->ce_flags & CE_EXTENDED)
-			size = offsetof(struct ondisk_cache_entry_extended, name);
-		else
-			size = offsetof(struct ondisk_cache_entry, name);
-		size += prefix_size + (ce_namelen(ce) - common + 1);
-
-		ondisk = xcalloc(1, size);
-		name = copy_cache_entry_to_ondisk(ondisk, ce);
-		memcpy(name, to_remove_vi, prefix_size);
-		memcpy(name + prefix_size, ce->name + common, ce_namelen(ce) - common);
-
-		strbuf_splice(previous_name, common, to_remove,
-			      ce->name + common, ce_namelen(ce) - common);
-	}
-
-	result = ce_write(c, fd, ondisk, size);
-	free(ondisk);
-	return result;
-}
-
 static int has_racy_timestamp(struct index_state *istate)
 {
 	int entries = istate->cache_nr;
@@ -1802,92 +1409,9 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 		rollback_lock_file(lockfile);
 }
 
-static int write_index_v2(struct index_state *istate, int newfd)
-{
-	git_SHA_CTX c;
-	struct cache_header hdr;
-	int i, err, removed, extended, hdr_version;
-	struct cache_entry **cache = istate->cache;
-	int entries = istate->cache_nr;
-	struct stat st;
-	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
-
-	for (i = removed = extended = 0; i < entries; i++) {
-		if (cache[i]->ce_flags & CE_REMOVE)
-			removed++;
-
-		/* reduce extended entries if possible */
-		cache[i]->ce_flags &= ~CE_EXTENDED;
-		if (cache[i]->ce_flags & CE_EXTENDED_FLAGS) {
-			extended++;
-			cache[i]->ce_flags |= CE_EXTENDED;
-		}
-	}
-
-	if (!istate->version)
-		istate->version = INDEX_FORMAT_DEFAULT;
-
-	/* demote version 3 to version 2 when the latter suffices */
-	if (istate->version == 3 || istate->version == 2)
-		istate->version = extended ? 3 : 2;
-
-	hdr_version = istate->version;
-
-	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
-	hdr.hdr_version = htonl(hdr_version);
-	hdr.hdr_entries = htonl(entries - removed);
-
-	git_SHA1_Init(&c);
-	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
-		return -1;
-
-	previous_name = (hdr_version == 4) ? &previous_name_buf : NULL;
-	for (i = 0; i < entries; i++) {
-		struct cache_entry *ce = cache[i];
-		if (ce->ce_flags & CE_REMOVE)
-			continue;
-		if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
-			ce_smudge_racily_clean_entry(ce);
-		if (is_null_sha1(ce->sha1))
-			return error("cache entry has null sha1: %s", ce->name);
-		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
-			return -1;
-	}
-	strbuf_release(&previous_name_buf);
-
-	/* Write extension data here */
-	if (istate->cache_tree) {
-		struct strbuf sb = STRBUF_INIT;
-
-		cache_tree_write(&sb, istate->cache_tree);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_TREE, sb.len) < 0
-			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
-		strbuf_release(&sb);
-		if (err)
-			return -1;
-	}
-	if (istate->resolve_undo) {
-		struct strbuf sb = STRBUF_INIT;
-
-		resolve_undo_write(&sb, istate->resolve_undo);
-		err = write_index_ext_header(&c, newfd, CACHE_EXT_RESOLVE_UNDO,
-					     sb.len) < 0
-			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
-		strbuf_release(&sb);
-		if (err)
-			return -1;
-	}
-
-	if (ce_flush(&c, newfd) || fstat(newfd, &st))
-		return -1;
-	istate->timestamp.sec = (unsigned int)st.st_mtime;
-	istate->timestamp.nsec = ST_MTIME_NSEC(st);
-	return 0;
-}
-
 int write_index(struct index_state *istate, int newfd)
 {
-	return write_index_v2(istate, newfd);
+	return istate->ops->write_index(istate, newfd);
 }
 
 /*
diff --git a/read-cache.h b/read-cache.h
new file mode 100644
index 0000000..d8debb8
--- /dev/null
+++ b/read-cache.h
@@ -0,0 +1,58 @@
+/* Index extensions.
+ *
+ * The first letter should be 'A'..'Z' for extensions that are not
+ * necessary for a correct operation (i.e. optimization data).
+ * When new extensions are added that _needs_ to be understood in
+ * order to correctly interpret the index file, pick character that
+ * is outside the range, to cause the reader to abort.
+ */
+
+#define CACHE_EXT(s) ( (s[0]<<24)|(s[1]<<16)|(s[2]<<8)|(s[3]) )
+#define CACHE_EXT_TREE 0x54524545	/* "TREE" */
+#define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
+
+#define INDEX_FORMAT_DEFAULT 3
+
+/*
+ * Basic data structures for the directory cache
+ */
+struct cache_header {
+	uint32_t hdr_signature;
+	uint32_t hdr_version;
+	uint32_t hdr_entries;
+};
+
+struct index_ops {
+	int (*match_stat_basic)(const struct cache_entry *ce, struct stat *st, int changed);
+	int (*verify_hdr)(void *mmap, unsigned long size);
+	int (*read_index)(struct index_state *istate, void *mmap, unsigned long mmap_size);
+	int (*write_index)(struct index_state *istate, int newfd);
+};
+
+extern struct index_ops v2_ops;
+
+#ifndef NEEDS_ALIGNED_ACCESS
+#define ntoh_s(var) ntohs(var)
+#define ntoh_l(var) ntohl(var)
+#else
+static inline uint16_t ntoh_s_force_align(void *p)
+{
+	uint16_t x;
+	memcpy(&x, p, sizeof(x));
+	return ntohs(x);
+}
+static inline uint32_t ntoh_l_force_align(void *p)
+{
+	uint32_t x;
+	memcpy(&x, p, sizeof(x));
+	return ntohl(x);
+}
+#define ntoh_s(var) ntoh_s_force_align(&(var))
+#define ntoh_l(var) ntoh_l_force_align(&(var))
+#endif
+
+extern int ce_modified_check_fs(const struct cache_entry *ce, struct stat *st);
+extern int ce_match_stat_basic(const struct index_state *istate,
+			       const struct cache_entry *ce, struct stat *st);
+extern int is_racy_timestamp(const struct index_state *istate, const struct cache_entry *ce);
+extern void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce);
diff --git a/test-index-version.c b/test-index-version.c
index 05d4699..d3c0ebd 100644
--- a/test-index-version.c
+++ b/test-index-version.c
@@ -1,5 +1,11 @@
 #include "cache.h"
 
+struct cache_header {
+	uint32_t hdr_signature;
+	uint32_t hdr_version;
+	uint32_t hdr_entries;
+};
+
 int main(int argc, char **argv)
 {
 	struct cache_header hdr;
diff --git a/unpack-trees.c b/unpack-trees.c
index bf01717..71df3ad 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1035,10 +1035,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	}
 
 	memset(&o->result, 0, sizeof(o->result));
-	o->result.initialized = 1;
+	initialize_index(&o->result, o->src_index->version);
 	o->result.timestamp.sec = o->src_index->timestamp.sec;
 	o->result.timestamp.nsec = o->src_index->timestamp.nsec;
-	o->result.version = o->src_index->version;
 	o->merge_size = len;
 	mark_all_ce_unused(o->src_index);
 
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 06/24] read-cache: Don't compare uid, gid and ino on cygwin
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (4 preceding siblings ...)
  2013-08-18 19:41 ` [PATCH v3 05/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
@ 2013-08-18 19:41 ` Thomas Gummerer
  2013-08-18 22:34   ` Ramsay Jones
  2013-08-18 19:41 ` [PATCH v3 07/24] read-cache: Re-read index if index file changed Thomas Gummerer
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Cygwin doesn't have uid, gid and ino stats fields.  Therefore we should
never check them in the match_stat_data when working on the CYGWIN
platform.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---

This patch was not tested on Cygwin yet.  I think it's needed though,
because the re-reading of the index if it changed will no longer use
it's own index_changed function, but use the stat_validity_check
function instead.  Would be great if someone running Cygwin could test
this.

 read-cache.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 1f827de..aa17ce7 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -82,6 +82,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 		changed |= CTIME_CHANGED;
 #endif
 
+#if !defined (__CYGWIN__)
 	if (check_stat) {
 		if (sd->sd_uid != (unsigned int) st->st_uid ||
 			sd->sd_gid != (unsigned int) st->st_gid)
@@ -89,6 +90,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 		if (sd->sd_ino != (unsigned int) st->st_ino)
 			changed |= INODE_CHANGED;
 	}
+#endif
 
 #ifdef USE_STDEV
 	/*
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 07/24] read-cache: Re-read index if index file changed
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (5 preceding siblings ...)
  2013-08-18 19:41 ` [PATCH v3 06/24] read-cache: Don't compare uid, gid and ino on cygwin Thomas Gummerer
@ 2013-08-18 19:41 ` Thomas Gummerer
  2013-08-18 19:41 ` [PATCH v3 08/24] add documentation for the index api Thomas Gummerer
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Add the possibility of re-reading the index file, if it changed
while reading.

The index file might change during the read, causing outdated
information to be displayed. We check if the index file changed
by using its stat data as heuristic.

Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache.c | 65 +++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 34 insertions(+), 31 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index aa17ce7..2d12601 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1296,8 +1296,8 @@ int read_index(struct index_state *istate)
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
-	int fd;
-	struct stat st;
+	int fd, err, i;
+	struct stat_validity sv;
 	struct cache_header *hdr;
 	void *mmap;
 	size_t mmap_size;
@@ -1309,43 +1309,46 @@ int read_index_from(struct index_state *istate, const char *path)
 	errno = ENOENT;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
-
-	fd = open(path, O_RDONLY);
-	if (fd < 0) {
-		if (errno == ENOENT) {
-			initialize_index(istate, 0);
-			return 0;
+	sv.sd = NULL;
+	for (i = 0; i < 50; i++) {
+		err = 0;
+		fd = open(path, O_RDONLY);
+		if (fd < 0) {
+			if (errno == ENOENT) {
+				initialize_index(istate, 0);
+				return 0;
+			}
+			die_errno("index file open failed");
 		}
-		die_errno("index file open failed");
-	}
 
-	if (fstat(fd, &st))
-		die_errno("cannot stat the open index");
+		stat_validity_update(&sv, fd);
+		if (!sv.sd)
+			die_errno("cannot stat the open index");
 
-	errno = EINVAL;
-	mmap_size = xsize_t(st.st_size);
-	mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
-	close(fd);
-	if (mmap == MAP_FAILED)
-		die_errno("unable to map index file");
+		errno = EINVAL;
+		mmap_size = xsize_t(sv.sd->sd_size);
+		mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+		close(fd);
+		if (mmap == MAP_FAILED)
+			die_errno("unable to map index file");
 
-	hdr = mmap;
-	if (verify_hdr_version(istate, hdr, mmap_size) < 0)
-		goto unmap;
+		hdr = mmap;
+		if (verify_hdr_version(istate, hdr, mmap_size) < 0)
+			err = 1;
 
-	if (istate->ops->verify_hdr(mmap, mmap_size) < 0)
-		goto unmap;
+		if (istate->ops->verify_hdr(mmap, mmap_size) < 0)
+			err = 1;
 
-	if (istate->ops->read_index(istate, mmap, mmap_size) < 0)
-		goto unmap;
-	istate->timestamp.sec = st.st_mtime;
-	istate->timestamp.nsec = ST_MTIME_NSEC(st);
+		if (istate->ops->read_index(istate, mmap, mmap_size) < 0)
+			err = 1;
+		istate->timestamp.sec = sv.sd->sd_mtime.sec;
+		istate->timestamp.nsec = sv.sd->sd_mtime.nsec;
 
-	munmap(mmap, mmap_size);
-	return istate->cache_nr;
+		munmap(mmap, mmap_size);
+		if (stat_validity_check(&sv, path) && !err)
+			return istate->cache_nr;
+	}
 
-unmap:
-	munmap(mmap, mmap_size);
 	die("index file corrupt");
 }
 
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 08/24] add documentation for the index api
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (6 preceding siblings ...)
  2013-08-18 19:41 ` [PATCH v3 07/24] read-cache: Re-read index if index file changed Thomas Gummerer
@ 2013-08-18 19:41 ` Thomas Gummerer
  2013-08-18 20:50   ` Eric Sunshine
  2013-08-18 19:41 ` [PATCH v3 09/24] read-cache: add index reading api Thomas Gummerer
                   ` (16 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Add documentation for the index reading api.  This also includes
documentation for the new api functions introduced in the next patch.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/technical/api-in-core-index.txt | 54 +++++++++++++++++++++++++--
 1 file changed, 50 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/api-in-core-index.txt b/Documentation/technical/api-in-core-index.txt
index adbdbf5..9b8c37c 100644
--- a/Documentation/technical/api-in-core-index.txt
+++ b/Documentation/technical/api-in-core-index.txt
@@ -1,14 +1,60 @@
 in-core index API
 =================
 
+Reading API
+-----------
+
+`cache`::
+
+	An array of cache entries.  This is used to access the cache
+	entries directly.  Use `index_name_pos` to search for the
+	index of a specific cache entry.
+
+`read_index_filtered`::
+
+	Read a part of the index, filtered by the pathspec given in
+	the opts.  The function may load more than necessary, so the
+	caller still responsible to apply filters appropriately.  The
+	filtering is only done for performance reasons, as it's
+	possible to only read part of the index when the on-disk
+	format is index-v5.
+
+	To iterate only over the entries that match the pathspec, use
+	the for_each_index_entry function.
+
+`read_index`::
+
+	Read the whole index file from disk.
+
+`index_name_pos`::
+
+	Find a cache_entry with name in the index.  Returns pos if an
+	entry is matched exactly and -1-pos if an entry is matched
+	partially.
+	e.g.
+	index:
+	file1
+	file2
+	path/file1
+	zzz
+
+	index_name_pos("path/file1", 10) returns 2, while
+	index_name_pos("path", 4) returns -3
+
+`for_each_index_entry`::
+
+	Iterates over all cache_entries in the index filtered by
+	filter_opts in the index_state.  For each cache entry fn is
+	executed with cb_data as callback data.  From within the loop
+	do `return 0` to continue, or `return 1` to break the loop.
+
+TODO
+----
 Talk about <read-cache.c> and <cache-tree.c>, things like:
 
-* cache -> the_index macros
-* read_index()
 * write_index()
 * ie_match_stat() and ie_modified(); how they are different and when to
   use which.
-* index_name_pos()
 * remove_index_entry_at()
 * remove_file_from_index()
 * add_file_to_index()
@@ -18,4 +64,4 @@ Talk about <read-cache.c> and <cache-tree.c>, things like:
 * cache_tree_invalidate_path()
 * cache_tree_update()
 
-(JC, Linus)
+(JC, Linus, Thomas Gummerer)
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 09/24] read-cache: add index reading api
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (7 preceding siblings ...)
  2013-08-18 19:41 ` [PATCH v3 08/24] add documentation for the index api Thomas Gummerer
@ 2013-08-18 19:41 ` Thomas Gummerer
  2013-08-18 19:41 ` [PATCH v3 10/24] make sure partially read index is not changed Thomas Gummerer
                   ` (15 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Add an api for access to the index file.  Currently there is only a very
basic api for accessing the index file, which only allows a full read of
the index, and lets the users of the data filter it.  The new index api
gives the users the possibility to use only part of the index and
provides functions for iterating over and accessing cache entries.

This simplifies future improvements to the in-memory format, as changes
will be concentrated on one file, instead of the whole git source code.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache.h         | 41 ++++++++++++++++++++++++++++++++++++++++-
 read-cache-v2.c | 10 ++++++++--
 read-cache.c    | 47 +++++++++++++++++++++++++++++++++++++++++++----
 read-cache.h    |  3 ++-
 4 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index d4dae21..da224c9 100644
--- a/cache.h
+++ b/cache.h
@@ -127,7 +127,7 @@ struct cache_entry {
 	unsigned int ce_flags;
 	unsigned int ce_namelen;
 	unsigned char sha1[20];
-	struct cache_entry *next;
+	struct cache_entry *next; /* used by name_hash */
 	char name[FLEX_ARRAY]; /* more */
 };
 
@@ -260,6 +260,29 @@ static inline unsigned int canon_mode(unsigned int mode)
 
 #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
 
+/*
+ * Options by which the index should be filtered when read partially.
+ *
+ * pathspec: The pathspec which the index entries have to match
+ * seen: Used to return the seen parameter from match_pathspec()
+ * max_prefix_len: The common prefix length of the pathspecs
+ *
+ * read_staged: used to indicate if the conflicted entries (entries
+ *     with a stage) should be included
+ * read_cache_tree: used to indicate if the cache-tree should be read
+ * read_resolve_undo: used to indicate if the resolve undo data should
+ *     be read
+ */
+struct filter_opts {
+	const struct pathspec *pathspec;
+	char *seen;
+	int max_prefix_len;
+
+	int read_staged;
+	int read_cache_tree;
+	int read_resolve_undo;
+};
+
 struct index_state {
 	struct cache_entry **cache;
 	unsigned int version;
@@ -272,6 +295,7 @@ struct index_state {
 	struct hash_table name_hash;
 	struct hash_table dir_hash;
 	struct index_ops *ops;
+	struct filter_opts *filter_opts;
 };
 
 extern struct index_state the_index;
@@ -315,6 +339,12 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+
+/* index api */
+#define read_cache_filtered(opts) read_index_filtered(&the_index, (opts))
+#define read_cache_filtered_from(path, opts) read_index_filtered_from(&the_index, (path), (opts))
+#define for_each_cache_entry(fn, cb_data) \
+	for_each_index_entry(&the_index, (fn), (cb_data))
 #endif
 
 enum object_type {
@@ -448,6 +478,15 @@ extern void sanitize_stdfds(void);
 		} \
 	} while (0)
 
+/* index api */
+extern int read_index_filtered(struct index_state *, struct filter_opts *opts);
+extern int read_index_filtered_from(struct index_state *, const char *path, struct filter_opts *opts);
+
+typedef int each_cache_entry_fn(struct cache_entry *ce, void *);
+extern int for_each_index_entry(struct index_state *istate,
+				each_cache_entry_fn, void *);
+
+
 /* Initialize and use the cache information */
 extern void initialize_index(struct index_state *istate, int version);
 extern void change_index_version(struct index_state *istate, int version);
diff --git a/read-cache-v2.c b/read-cache-v2.c
index 070d468..63be074 100644
--- a/read-cache-v2.c
+++ b/read-cache-v2.c
@@ -3,6 +3,7 @@
 #include "resolve-undo.h"
 #include "cache-tree.h"
 #include "varint.h"
+#include "dir.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 #define CE_NAMEMASK  (0x0fff)
@@ -202,8 +203,14 @@ static int read_index_extension(struct index_state *istate,
 	return 0;
 }
 
+/*
+ * The performance is the same if we read the whole index or only
+ * part of it, therefore we always read the whole index to avoid
+ * having to re-read it later.  The filter_opts will determine
+ * what part of the index is used when retrieving the cache-entries.
+ */
 static int read_index_v2(struct index_state *istate, void *mmap,
-			 unsigned long mmap_size)
+			 unsigned long mmap_size, struct filter_opts *opts)
 {
 	int i;
 	unsigned long src_offset;
@@ -229,7 +236,6 @@ static int read_index_v2(struct index_state *istate, void *mmap,
 		disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset);
 		ce = create_from_disk(disk_ce, &consumed, previous_name);
 		set_index_entry(istate, i, ce);
-
 		src_offset += consumed;
 	}
 	strbuf_release(&previous_name_buf);
diff --git a/read-cache.c b/read-cache.c
index 2d12601..38b9a04 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1290,11 +1290,41 @@ static int verify_hdr_version(struct index_state *istate,
 
 int read_index(struct index_state *istate)
 {
-	return read_index_from(istate, get_index_file());
+	return read_index_filtered_from(istate, get_index_file(), NULL);
 }
 
-/* remember to discard_cache() before reading a different cache! */
-int read_index_from(struct index_state *istate, const char *path)
+int read_index_filtered(struct index_state *istate, struct filter_opts *opts)
+{
+	return read_index_filtered_from(istate, get_index_file(), opts);
+}
+
+/*
+ * Execute fn for each index entry which is currently in istate.  Data
+ * can be given to the function using the cb_data parameter.
+ */
+int for_each_index_entry(struct index_state *istate, each_cache_entry_fn fn, void *cb_data)
+{
+	int i, ret = 0;
+	struct filter_opts *opts = istate->filter_opts;
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+
+		if (opts && !opts->read_staged && ce_stage(ce))
+			continue;
+
+		if (opts && !match_pathspec_depth(opts->pathspec, ce->name, ce_namelen(ce),
+						  opts->max_prefix_len, opts->seen))
+			continue;
+
+		if ((ret = fn(istate->cache[i], cb_data)))
+			break;
+	}
+	return ret;
+}
+
+int read_index_filtered_from(struct index_state *istate, const char *path,
+			     struct filter_opts *opts)
 {
 	int fd, err, i;
 	struct stat_validity sv;
@@ -1309,6 +1339,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	errno = ENOENT;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
+	istate->filter_opts = opts;
 	sv.sd = NULL;
 	for (i = 0; i < 50; i++) {
 		err = 0;
@@ -1339,7 +1370,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		if (istate->ops->verify_hdr(mmap, mmap_size) < 0)
 			err = 1;
 
-		if (istate->ops->read_index(istate, mmap, mmap_size) < 0)
+		if (istate->ops->read_index(istate, mmap, mmap_size, opts) < 0)
 			err = 1;
 		istate->timestamp.sec = sv.sd->sd_mtime.sec;
 		istate->timestamp.nsec = sv.sd->sd_mtime.nsec;
@@ -1352,6 +1383,13 @@ int read_index_from(struct index_state *istate, const char *path)
 	die("index file corrupt");
 }
 
+
+/* remember to discard_cache() before reading a different cache! */
+int read_index_from(struct index_state *istate, const char *path)
+{
+	return read_index_filtered_from(istate, path, NULL);
+}
+
 int is_index_unborn(struct index_state *istate)
 {
 	return (!istate->cache_nr && !istate->timestamp.sec);
@@ -1376,6 +1414,7 @@ int discard_index(struct index_state *istate)
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
 	istate->ops = NULL;
+	istate->filter_opts = NULL;
 	return 0;
 }
 
diff --git a/read-cache.h b/read-cache.h
index d8debb8..644b199 100644
--- a/read-cache.h
+++ b/read-cache.h
@@ -25,7 +25,8 @@ struct cache_header {
 struct index_ops {
 	int (*match_stat_basic)(const struct cache_entry *ce, struct stat *st, int changed);
 	int (*verify_hdr)(void *mmap, unsigned long size);
-	int (*read_index)(struct index_state *istate, void *mmap, unsigned long mmap_size);
+	int (*read_index)(struct index_state *istate, void *mmap, unsigned long mmap_size,
+			  struct filter_opts *opts);
 	int (*write_index)(struct index_state *istate, int newfd);
 };
 
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 10/24] make sure partially read index is not changed
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (8 preceding siblings ...)
  2013-08-18 19:41 ` [PATCH v3 09/24] read-cache: add index reading api Thomas Gummerer
@ 2013-08-18 19:41 ` Thomas Gummerer
  2013-08-18 21:06   ` Eric Sunshine
  2013-08-18 19:42 ` [PATCH v3 11/24] grep.c: use index api Thomas Gummerer
                   ` (14 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:41 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

A partially read index file currently cannot be written to disk.  Make
sure that never happens, by erroring out when a caller tries to write a
partially read index.  Do the same when trying to re-read a partially
read index without having discarded it first to avoid loosing any
information.

Forcing the caller to load the right part of the index file instead of
re-reading it when changing it, gives a bit of a performance advantage,
by avoiding to read parts of the index twice.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 38b9a04..7a27f9b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1332,6 +1332,8 @@ int read_index_filtered_from(struct index_state *istate, const char *path,
 	void *mmap;
 	size_t mmap_size;
 
+	if (istate->filter_opts)
+		die("BUG: Can't re-read partially read index");
 	errno = EBUSY;
 	if (istate->initialized)
 		return istate->cache_nr;
@@ -1455,6 +1457,8 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
 
 int write_index(struct index_state *istate, int newfd)
 {
+	if (istate->filter_opts)
+		die("BUG: index: cannot write a partially read index");
 	return istate->ops->write_index(istate, newfd);
 }
 
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 11/24] grep.c: use index api
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (9 preceding siblings ...)
  2013-08-18 19:41 ` [PATCH v3 10/24] make sure partially read index is not changed Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-18 19:42 ` [PATCH v3 12/24] ls-files.c: " Thomas Gummerer
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/grep.c | 69 +++++++++++++++++++++++++++++-----------------------------
 1 file changed, 35 insertions(+), 34 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 7dc0389..1114fe8 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -369,41 +369,31 @@ static void run_pager(struct grep_opt *opt, const char *prefix)
 	free(argv);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached)
+struct grep_opts {
+	struct grep_opt *opt;
+	const struct pathspec *pathspec;
+	int cached;
+	int hit;
+};
+
+static int grep_cache(struct cache_entry *ce, void *cb_data)
 {
-	int hit = 0;
-	int nr;
-	read_cache();
+	struct grep_opts *opts = cb_data;
 
-	for (nr = 0; nr < active_nr; nr++) {
-		const struct cache_entry *ce = active_cache[nr];
-		if (!S_ISREG(ce->ce_mode))
-			continue;
-		if (!match_pathspec_depth(pathspec, ce->name, ce_namelen(ce), 0, NULL))
-			continue;
-		/*
-		 * If CE_VALID is on, we assume worktree file and its cache entry
-		 * are identical, even if worktree file has been modified, so use
-		 * cache version instead
-		 */
-		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
-			if (ce_stage(ce))
-				continue;
-			hit |= grep_sha1(opt, ce->sha1, ce->name, 0, ce->name);
-		}
-		else
-			hit |= grep_file(opt, ce->name);
-		if (ce_stage(ce)) {
-			do {
-				nr++;
-			} while (nr < active_nr &&
-				 !strcmp(ce->name, active_cache[nr]->name));
-			nr--; /* compensate for loop control */
-		}
-		if (hit && opt->status_only)
-			break;
-	}
-	return hit;
+	if (!S_ISREG(ce->ce_mode))
+		return 0;
+	/*
+	 * If CE_VALID is on, we assume worktree file and its cache entry
+	 * are identical, even if worktree file has been modified, so use
+	 * cache version instead
+	 */
+	if (opts->cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce))
+		opts->hit |= grep_sha1(opts->opt, ce->sha1, ce->name, 0, ce->name);
+	else
+		opts->hit |= grep_file(opts->opt, ce->name);
+	if (opts->hit && opts->opt->status_only)
+		return 1;
+	return 0;
 }
 
 static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
@@ -897,10 +887,21 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	} else if (0 <= opt_exclude) {
 		die(_("--[no-]exclude-standard cannot be used for tracked contents."));
 	} else if (!list.nr) {
+		struct grep_opts opts;
+		struct filter_opts *filter_opts = xmalloc(sizeof(*filter_opts));
+
 		if (!cached)
 			setup_work_tree();
 
-		hit = grep_cache(&opt, &pathspec, cached);
+		memset(filter_opts, 0, sizeof(*filter_opts));
+		filter_opts->pathspec = &pathspec;
+		opts.opt = &opt;
+		opts.pathspec = &pathspec;
+		opts.cached = cached;
+		opts.hit = 0;
+		read_cache_filtered(filter_opts);
+		for_each_cache_entry(grep_cache, &opts);
+		hit = opts.hit;
 	} else {
 		if (cached)
 			die(_("both --cached and trees are given."));
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 12/24] ls-files.c: use index api
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (10 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 11/24] grep.c: use index api Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-18 19:42 ` [PATCH v3 13/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/ls-files.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index bebc9c2..fbf9c47 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -288,6 +288,22 @@ static void prune_cache(const char *prefix)
 	active_nr = last;
 }
 
+static int needs_trailing_slash_stripped(void)
+{
+	int i;
+
+	if (!pathspec.nr)
+		return 0;
+
+	for (i = 0; i < pathspec.nr; i++) {
+		int len = strlen(pathspec.items[i].original);
+
+		if (len > 1 && (pathspec.items[i].original)[len - 1] == '/')
+			return 1;
+	}
+	return 0;
+}
+
 /*
  * Read the tree specified with --with-tree option
  * (typically, HEAD) into stage #1 and then
@@ -445,6 +461,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 	struct dir_struct dir;
 	struct exclude_list *el;
 	struct string_list exclude_list = STRING_LIST_INIT_NODUP;
+	struct filter_opts *opts = xmalloc(sizeof(*opts));
 	struct option builtin_ls_files_options[] = {
 		{ OPTION_CALLBACK, 'z', NULL, NULL, NULL,
 			N_("paths are separated with NUL character"),
@@ -510,9 +527,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		prefix_len = strlen(prefix);
 	git_config(git_default_config, NULL);
 
-	if (read_cache() < 0)
-		die("index file corrupt");
-
 	argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
 			ls_files_usage, 0);
 	el = add_exclude_list(&dir, EXC_CMDL, "--exclude option");
@@ -548,6 +562,22 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
 		       PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
 		       prefix, argv);
 
+	if (!with_tree && !needs_trailing_slash_stripped()) {
+		memset(opts, 0, sizeof(*opts));
+		opts->pathspec = &pathspec;
+		opts->read_staged = 1;
+		if (show_resolve_undo)
+			opts->read_resolve_undo = 1;
+		read_cache_filtered(opts);
+	} else {
+		read_cache();
+		parse_pathspec(&pathspec, 0,
+			       PATHSPEC_PREFER_CWD |
+			       PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
+			       prefix, argv);
+
+	}
+
 	/* Find common prefix for all pathspec's */
 	max_prefix = common_prefix(&pathspec);
 	max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 13/24] documentation: add documentation of the index-v5 file format
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (11 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 12/24] ls-files.c: " Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-18 19:42 ` [PATCH v3 14/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Add a documentation of the index file format version 5 to
Documentation/technical.

Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Thomas Rast <trast@student.ethz.ch>
Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Helped-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Documentation/technical/index-file-format-v5.txt | 301 +++++++++++++++++++++++
 1 file changed, 301 insertions(+)
 create mode 100644 Documentation/technical/index-file-format-v5.txt

diff --git a/Documentation/technical/index-file-format-v5.txt b/Documentation/technical/index-file-format-v5.txt
new file mode 100644
index 0000000..5209c02
--- /dev/null
+++ b/Documentation/technical/index-file-format-v5.txt
@@ -0,0 +1,301 @@
+GIT index format
+================
+
+== The git index
+
+   The git index file (.git/index) documents the status of the files
+     in the git staging area.
+
+   The staging area is used for preparing commits, merging, etc.
+
+== The git index file format
+
+   All binary numbers are in network byte order. Version 5 is described
+     here. The index file consists of various sections. They appear in
+     the following order in the file.
+
+   - header: the description of the index format, including it's signature,
+     version and various other fields that are used internally.
+
+   - diroffsets (ndir entries of "direcotry offset"): A 4-byte offset
+       relative to the beginning of the "direntries block" (see below)
+       for each of the ndir directories in the index, sorted by pathname
+       (of the directory it's pointing to). [1]
+
+   - direntries (ndir entries of "directory offset"): A directory entry
+       for each of the ndir directories in the index, sorted by pathname
+       (see below). [2]
+
+   - fileoffsets (nfile entries of "file offset"): A 4-byte offset
+       relative to the beginning of the fileentries block (see below)
+       for each of the nfile files in the index. [1]
+
+   - fileentries (nfile entries of "file entry"): A file entry for
+       each of the nfile files in the index (see below).
+
+   - crdata: A number of entries for conflicted data/resolved conflicts
+       (see below).
+
+   - Extensions (Currently none, see below in the future)
+
+     Extensions are identified by signature. Optional extensions can
+     be ignored if GIT does not understand them.
+
+     GIT supports an arbitrary number of extension, but currently none
+     is implemented. [3]
+
+     extsig (32-bits): extension signature. If the first byte is 'A'..'Z'
+     the extension is optional and can be ignored.
+
+     extsize (32-bits): size of the extension, excluding the header
+       (extsig, extsize, extchecksum).
+
+     extchecksum (32-bits): crc32 checksum of the extension signature
+       and size.
+
+    - Extension data.
+
+== Header
+   sig (32-bits): Signature:
+     The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
+
+   vnr (32-bits): Version number:
+     The current supported versions are 2, 3, 4 and 5.
+
+   nfile (32-bits): number of file entries in the index.
+
+   ndir (32-bits): number of directories in the index.
+
+   fblockoffset (32-bits): offset to the file block, relative to the
+     beginning of the file.
+
+   - Offset to the extensions.
+
+     nextensions (32-bits): number of extensions.
+
+     extoffset (32-bits): offset to the extension. (Possibly none, as
+       many as indicated in the 4-byte number of extensions)
+
+   headercrc (32-bits): crc checksum including the header and the
+     offsets to the extensions.
+
+
+== Directory offsets (diroffsets)
+
+  diroffset (32-bits): offset to the directory relative to the
+    beginning of the index file. There are ndir + 1 offsets in the
+    diroffset table, the last is pointing to the end of the last
+    direntry. With this last entry, we are able to replace the strlen
+    of the directory name when reading the directory name, by
+    calculating it from diroffset[n+1]-diroffset[n]-61.  61 is the
+    size of the directory data, which follows each each directory +
+    the crc sum + the NUL byte.
+
+  This part is needed for making the directory entries bisectable and
+    thus allowing a binary search.
+
+== Directory entry (direntries)
+
+  Directory entries are sorted in lexicographic order by the name
+    of their path starting with the root.
+
+  pathname (variable length, nul terminated): relative to top level
+    directory (without the leading slash). '/' is used as path
+    separator. A string of length 0 ('') indicates the root directory.
+    The special path components ".", and ".." (without quotes) are
+    disallowed. The path also includes a trailing slash. [9]
+
+  foffset (32-bits): offset to the lexicographically first file in
+    the file offsets (fileoffsets), relative to the beginning of
+    the fileoffset block.
+
+  cr (32-bits): offset to conflicted/resolved data at the end of the
+    index. 0 if there is no such data. [4]
+
+  ncr (32-bits): number of conflicted/resolved data entries at the
+    end of the index if the offset is non 0. If cr is 0, ncr is
+    also 0.
+
+  nsubtrees (32-bits): number of subtrees this tree has in the index.
+
+  nfiles (32-bits): number of files in the directory, that are in
+    the index.
+
+  nentries (32-bits): number of entries in the index that is covered
+    by the tree this entry represents. (-1 if the entry is invalid).
+    This number includes all the files in this tree, recursively.
+
+  objname (160-bits): object name for the object that would result
+    from writing this span of index as a tree. This is only valid
+    if nentries is valid, meaning the cache-tree is valid.
+
+  flags (16-bits): 'flags' field split into (high to low bits) (For
+    D/F conflicts)
+
+    stage (2-bits): stage of the directory during merge
+
+    14-bit unused
+
+  dircrc (32-bits): crc32 checksum for each directory entry.
+
+  The last 24 bytes (4-byte number of entries + 160-bit object name) are
+    for the cache tree. An entry can be in an invalidated state which is
+    represented by having -1 in the entry_count field.
+
+  The entries are written out in the top-down, depth-first order. The
+    first entry represents the root level of the repository, followed by
+    the first subtree - let's call it A - of the root level, followed by
+    the first subtree of A, ... There is no prefix compression for
+    directories.
+
+== File offsets (fileoffsets)
+
+  fileoffset (32-bits): offset to the file relative to the beginning of
+    the fileentries block.
+
+  This part is needed for making the file entries bisectable and
+    thus allowing a binary search. There are nfile + 1 offsets in the
+    fileoffset table, the last is pointing to the end of the last
+    fileentry. With this last entry, we can replace the strlen when
+    reading each filename, by calculating its length with the offsets.
+
+== File entry (fileentries)
+
+  File entries are sorted in ascending order on the name field, after the
+  respective offset given by the directory entries. All file names are
+  prefix compressed, meaning the file name is relative to the directory.
+
+  filename (variable length, nul terminated). The exact encoding is
+    undefined, but the filename cannot contain a NUL byte (iow, the same
+    encoding as a UNIX pathname).
+
+  flags (16-bits): 'flags' field split into (high to low bits)
+
+    assumevalid (1-bit): assume-valid flag
+
+    intenttoadd (1-bit): intent-to-add flag, used by "git add -N".
+      Extended flag in index v3.
+
+    stage (2-bit): stage of the file during merge
+
+    skipworktree (1-bit): skip-worktree flag, used by sparse checkout.
+      Extended flag in index v3.
+
+    smudged (1-bit): indicates if the file is racily smudged.
+
+    invalid (1-bit): This bit can be set to indicate that a file was
+      deleted, but not yet removed from the index, because the index
+      was only partially rewritten.  Entries with this flags should be
+      ignored when reading the index file.
+
+    9-bit unused, must be zero [6]
+
+  mode (16-bits): file mode, split into (high to low bits)
+
+    objtype (4-bits): object type
+      valid values in binary are 1000 (regular file), 1010 (symbolic
+      link) and 1110 (gitlink)
+
+    3-bit unused
+
+    permission (9-bits): unix permission. Only 0755 and 0644 are valid
+      for regular files. Symbolic links and gitlinks have value 0 in
+      this field.
+
+  mtimes (32-bits): mtime seconds, the last time a file's data changed
+    this is stat(2) data
+
+  mtimens (32-bits): mtime nanosecond fractions
+    this is stat(2) data
+
+  file size (32-bits): The on-disk size, trucated to 32-bit.
+    this is stat(2) data
+
+  statcrc (32-bits): crc32 checksum over ctime seconds, ctime
+    nanoseconds, ino, dev, uid, gid (All stat(2) data
+    except mtime and file size). If the statcrc is 0 it will
+    be ignored. [7]
+
+  objhash (160-bits): SHA-1 for the represented object
+
+  entrycrc (32-bits): crc32 checksum for the file entry. The crc code
+    includes the offset to the offset to the file, relative to the
+    beginning of the file.
+
+== Conflict data
+
+  A conflict is represented in the index as a set of higher stage entries.
+  These entries are stored at the end of the index. When a conflict is
+  resolved (e.g. with "git add path"). A bit is flipped, to indicate that
+  the conflict is resolved, but the entries will be kept, so that
+  conflicts can be recreated (e.g. with "git checkout -m", in case users
+  want to redo a conflict resolution from scratch.
+
+  The conflicts will also be stored in the fileentries part of the index,
+  to simplify reading and writing of the index.
+
+  filename (variable length, nul terminated): filename of the entry,
+    relative to its containing directory).
+
+  nfileconflicts (32-bits): number of conflicts for the file [8]
+
+  flags (nfileconflicts entries of "flags") (16-bits): 'flags' field
+    split into:
+
+    conflicted (1-bit): conflicted state (conflicted/resolved) (1 if
+      conflicted)
+
+    stage (2-bits): stage during merge.
+
+    13-bit unused
+
+  entry_mode (nfileconflicts entries of "entry mode") (16-bits):
+    octal numbers, entry mode of eache entry in the different stages.
+    (How many is defined by the 4-byte number before)
+
+  objectnames (nfileconflicts entries of "object name") (160-bits):
+    object names  of the different stages.
+
+  conflictcrc (32-bits): crc32 checksum over conflict data.
+
+== Design explanations
+
+[1] The directory and file offsets are included in the index format
+    to enable bisectability of the index, for binary searches.Updating
+    a single entry and partial reading will benefit from this.
+
+[2] The directories are saved in their own block, to be able to
+    quickly search for a directory in the index. They include a
+    offset to the (lexically) first file in the directory.
+
+[3] The data of the cache-tree extension and the resolve undo
+    extension is now part of the index itself, but if other extensions
+    come up in the future, there is no need to change the index, they
+    can simply be added at the end.
+
+[4] To avoid rewrites of the whole index when there are conflicts or
+    conflicts are being resolved, conflicted data will be stored at
+    the end of the index. To mark the conflict resolved, just a bit
+    has to be flipped. The data will still be there, if a user wants
+    to redo the conflict resolution.
+
+[5] Since only 4 modes are effectively allowed in git but 32-bit are
+    used to store them, having a two bit flag for the mode is enough
+    and saves 4 byte per entry.
+
+[6] The length of the file name was dropped, since each file name is
+    nul terminated anyway.
+
+[7] Since all stat data (except mtime and ctime) is just used for
+    checking if a file has changed a checksum of the data is enough.
+    In addition to that Thomas Rast suggested ctime could be ditched
+    completely (core.trustctime=false) and thus included in the
+    checksum. This would save 24 bytes per index entry, which would
+    be about 4 MB on the Webkit index.
+    (Thanks for the suggestion to Michael Haggerty)
+
+[8] Since there can be more stage #1 entries, it is necessary to know
+    the number of conflict data entries there are.
+
+[9] As Michael Haggerty pointed out on the mailing list, storing the
+    trailing slash will simplify a few operations.
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 14/24] read-cache: make in-memory format aware of stat_crc
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (12 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 13/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-18 19:42 ` [PATCH v3 15/24] read-cache: read index-v5 Thomas Gummerer
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Make the in-memory format aware of the stat_crc used by index-v5.
It is simply ignored by index version prior to v5.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache.h      |  1 +
 read-cache.c | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/cache.h b/cache.h
index da224c9..714a334 100644
--- a/cache.h
+++ b/cache.h
@@ -127,6 +127,7 @@ struct cache_entry {
 	unsigned int ce_flags;
 	unsigned int ce_namelen;
 	unsigned char sha1[20];
+	uint32_t ce_stat_crc;
 	struct cache_entry *next; /* used by name_hash */
 	char name[FLEX_ARRAY]; /* more */
 };
diff --git a/read-cache.c b/read-cache.c
index 7a27f9b..a232372 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -108,6 +108,29 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 	return changed;
 }
 
+static uint32_t calculate_stat_crc(struct cache_entry *ce)
+{
+	unsigned int ctimens = 0;
+	uint32_t stat, stat_crc;
+
+	stat = htonl(ce->ce_stat_data.sd_ctime.sec);
+	stat_crc = crc32(0, (Bytef*)&stat, 4);
+#ifdef USE_NSEC
+	ctimens = ce->ce_stat_data.sd_ctime.nsec;
+#endif
+	stat = htonl(ctimens);
+	stat_crc = crc32(stat_crc, (Bytef*)&stat, 4);
+	stat = htonl(ce->ce_stat_data.sd_ino);
+	stat_crc = crc32(stat_crc, (Bytef*)&stat, 4);
+	stat = htonl(ce->ce_stat_data.sd_dev);
+	stat_crc = crc32(stat_crc, (Bytef*)&stat, 4);
+	stat = htonl(ce->ce_stat_data.sd_uid);
+	stat_crc = crc32(stat_crc, (Bytef*)&stat, 4);
+	stat = htonl(ce->ce_stat_data.sd_gid);
+	stat_crc = crc32(stat_crc, (Bytef*)&stat, 4);
+	return stat_crc;
+}
+
 /*
  * This only updates the "non-critical" parts of the directory
  * cache, ie the parts that aren't tracked by GIT, and only used
@@ -122,6 +145,8 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
 
 	if (S_ISREG(st->st_mode))
 		ce_mark_uptodate(ce);
+
+	ce->ce_stat_crc = calculate_stat_crc(ce);
 }
 
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 15/24] read-cache: read index-v5
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (13 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 14/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-19  1:57   ` Eric Sunshine
                     ` (3 more replies)
  2013-08-18 19:42 ` [PATCH v3 16/24] read-cache: read resolve-undo data Thomas Gummerer
                   ` (9 subsequent siblings)
  24 siblings, 4 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Make git read the index file version 5 without complaining.

This version of the reader doesn't read neither the cache-tree
nor the resolve undo data, but doesn't choke on an index that
includes such data.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Helped-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Makefile        |   1 +
 cache.h         |  32 +++-
 read-cache-v5.c | 473 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 read-cache.h    |   1 +
 4 files changed, 506 insertions(+), 1 deletion(-)
 create mode 100644 read-cache-v5.c

diff --git a/Makefile b/Makefile
index afae23e..a55206d 100644
--- a/Makefile
+++ b/Makefile
@@ -857,6 +857,7 @@ LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
 LIB_OBJS += read-cache-v2.o
+LIB_OBJS += read-cache-v5.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += remote.o
diff --git a/cache.h b/cache.h
index 714a334..89f556b 100644
--- a/cache.h
+++ b/cache.h
@@ -99,7 +99,7 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 #define CACHE_SIGNATURE 0x44495243	/* "DIRC" */
 
 #define INDEX_FORMAT_LB 2
-#define INDEX_FORMAT_UB 4
+#define INDEX_FORMAT_UB 5
 
 /*
  * The "cache_time" is just the low 32 bits of the
@@ -121,6 +121,15 @@ struct stat_data {
 	unsigned int sd_size;
 };
 
+/*
+ * The *next pointer is used in read_entries_v5 for holding
+ * all the elements of a directory, and points to the next
+ * cache_entry in a directory.
+ *
+ * It is reset by the add_name_hash call in set_index_entry
+ * to set it to point to the next cache_entry in the
+ * correct in-memory format ordering.
+ */
 struct cache_entry {
 	struct stat_data ce_stat_data;
 	unsigned int ce_mode;
@@ -132,11 +141,17 @@ struct cache_entry {
 	char name[FLEX_ARRAY]; /* more */
 };
 
+#define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
 #define CE_EXTENDED  (0x4000)
 #define CE_VALID     (0x8000)
+#define CE_SMUDGED   (0x0400) /* index v5 only flag */
 #define CE_STAGESHIFT 12
 
+#define CONFLICT_CONFLICTED (0x8000)
+#define CONFLICT_STAGESHIFT 13
+#define CONFLICT_STAGEMASK (0x6000)
+
 /*
  * Range 0xFFFF0000 in ce_flags is divided into
  * two parts: in-memory flags and on-disk ones.
@@ -173,6 +188,19 @@ struct cache_entry {
 #define CE_EXTENDED_FLAGS (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE)
 
 /*
+ * Representation of the extended on-disk flags in the v5 format.
+ * They must not collide with the ordinary on-disk flags, and need to
+ * fit in 16 bits.  Note however that v5 does not save the name
+ * length.
+ */
+#define CE_INTENT_TO_ADD_V5  (0x4000)
+#define CE_SKIP_WORKTREE_V5  (0x0800)
+#define CE_INVALID_V5        (0x0200)
+#if (CE_VALID|CE_STAGEMASK) & (CE_INTENTTOADD_V5|CE_SKIPWORKTREE_V5|CE_INVALID_V5)
+#error "v5 on-disk flags collide with ordinary on-disk flags"
+#endif
+
+/*
  * Safeguard to avoid saving wrong flags:
  *  - CE_EXTENDED2 won't get saved until its semantic is known
  *  - Bits in 0x0000FFFF have been saved in ce_flags already
@@ -213,6 +241,8 @@ static inline unsigned create_ce_flags(unsigned stage)
 #define ce_skip_worktree(ce) ((ce)->ce_flags & CE_SKIP_WORKTREE)
 #define ce_mark_uptodate(ce) ((ce)->ce_flags |= CE_UPTODATE)
 
+#define conflict_stage(c) ((CONFLICT_STAGEMASK & (c)->flags) >> CONFLICT_STAGESHIFT)
+
 #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
 static inline unsigned int create_ce_mode(unsigned int mode)
 {
diff --git a/read-cache-v5.c b/read-cache-v5.c
new file mode 100644
index 0000000..799b8e7
--- /dev/null
+++ b/read-cache-v5.c
@@ -0,0 +1,473 @@
+#include "cache.h"
+#include "read-cache.h"
+#include "resolve-undo.h"
+#include "cache-tree.h"
+#include "dir.h"
+#include "pathspec.h"
+
+#define ptr_add(x,y) ((void *)(((char *)(x)) + (y)))
+
+struct cache_header_v5 {
+	uint32_t hdr_ndir;
+	uint32_t hdr_fblockoffset;
+	uint32_t hdr_nextension;
+};
+
+struct directory_entry {
+	struct directory_entry **sub;
+	struct directory_entry *next;
+	struct directory_entry *next_hash;
+	struct cache_entry *ce;
+	struct cache_entry *ce_last;
+	struct conflict_entry *conflict;
+	struct conflict_entry *conflict_last;
+	uint32_t conflict_size;
+	uint32_t de_foffset;
+	uint32_t de_cr;
+	uint32_t de_ncr;
+	uint32_t de_nsubtrees;
+	uint32_t de_nfiles;
+	uint32_t de_nentries;
+	unsigned char sha1[20];
+	uint16_t de_flags;
+	uint32_t de_pathlen;
+	char pathname[FLEX_ARRAY];
+};
+
+struct conflict_part {
+	struct conflict_part *next;
+	uint16_t flags;
+	uint16_t entry_mode;
+	unsigned char sha1[20];
+};
+
+struct conflict_entry {
+	struct conflict_entry *next;
+	uint32_t nfileconflicts;
+	struct conflict_part *entries;
+	uint32_t namelen;
+	uint32_t pathlen;
+	char name[FLEX_ARRAY];
+};
+
+#define directory_entry_size(len) (offsetof(struct directory_entry,pathname) + (len) + 1)
+#define conflict_entry_size(len) (offsetof(struct conflict_entry,name) + (len) + 1)
+
+/*****************************************************************
+ * Index File I/O
+ *****************************************************************/
+
+struct ondisk_conflict_part {
+	uint16_t flags;
+	uint16_t entry_mode;
+	unsigned char sha1[20];
+};
+
+struct ondisk_cache_entry {
+	uint16_t flags;
+	uint16_t mode;
+	struct cache_time mtime;
+	uint32_t size;
+	int stat_crc;
+	unsigned char sha1[20];
+};
+
+struct ondisk_directory_entry {
+	uint32_t foffset;
+	uint32_t cr;
+	uint32_t ncr;
+	uint32_t nsubtrees;
+	uint32_t nfiles;
+	uint32_t nentries;
+	unsigned char sha1[20];
+	uint16_t flags;
+};
+
+static int check_crc32(int initialcrc,
+			void *data,
+			size_t len,
+			unsigned int expected_crc)
+{
+	int crc;
+
+	crc = crc32(initialcrc, (Bytef*)data, len);
+	return crc == expected_crc;
+}
+
+static int match_stat_crc(struct stat *st, uint32_t expected_crc)
+{
+	uint32_t data, stat_crc = 0;
+	unsigned int ctimens = 0;
+
+	data = htonl(st->st_ctime);
+	stat_crc = crc32(0, (Bytef*)&data, 4);
+#ifdef USE_NSEC
+	ctimens = ST_CTIME_NSEC(*st);
+#endif
+	data = htonl(ctimens);
+	stat_crc = crc32(stat_crc, (Bytef*)&data, 4);
+	data = htonl(st->st_ino);
+	stat_crc = crc32(stat_crc, (Bytef*)&data, 4);
+	data = htonl(st->st_dev);
+	stat_crc = crc32(stat_crc, (Bytef*)&data, 4);
+	data = htonl(st->st_uid);
+	stat_crc = crc32(stat_crc, (Bytef*)&data, 4);
+	data = htonl(st->st_gid);
+	stat_crc = crc32(stat_crc, (Bytef*)&data, 4);
+
+	return stat_crc == expected_crc;
+}
+
+static int match_stat_basic(const struct cache_entry *ce,
+			    struct stat *st,
+			    int changed)
+{
+
+	if (ce->ce_stat_data.sd_mtime.sec != (unsigned int)st->st_mtime)
+		changed |= MTIME_CHANGED;
+#ifdef USE_NSEC
+	if (ce->ce_stat_data.sd_mtime.nsec != ST_MTIME_NSEC(*st))
+		changed |= MTIME_CHANGED;
+#endif
+	if (ce->ce_stat_data.sd_size != (unsigned int)st->st_size)
+		changed |= DATA_CHANGED;
+
+	if (trust_ctime && ce->ce_stat_crc != 0 && !match_stat_crc(st, ce->ce_stat_crc)) {
+		changed |= OWNER_CHANGED;
+		changed |= INODE_CHANGED;
+	}
+	/* Racily smudged entry? */
+	if (ce->ce_flags & CE_SMUDGED) {
+		if (!changed && !is_empty_blob_sha1(ce->sha1) && ce_modified_check_fs(ce, st))
+			changed |= DATA_CHANGED;
+	}
+	return changed;
+}
+
+static int verify_hdr(void *mmap, unsigned long size)
+{
+	uint32_t *filecrc;
+	unsigned int header_size;
+	struct cache_header *hdr;
+	struct cache_header_v5 *hdr_v5;
+
+	if (size < sizeof(struct cache_header)
+	    + sizeof (struct cache_header_v5) + 4)
+		die("index file smaller than expected");
+
+	hdr = mmap;
+	hdr_v5 = ptr_add(mmap, sizeof(*hdr));
+	/* Size of the header + the size of the extensionoffsets */
+	header_size = sizeof(*hdr) + sizeof(*hdr_v5) + hdr_v5->hdr_nextension * 4;
+	/* Initialize crc */
+	filecrc = ptr_add(mmap, header_size);
+	if (!check_crc32(0, hdr, header_size, ntohl(*filecrc)))
+		return error("bad index file header crc signature");
+	return 0;
+}
+
+static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
+						   char *pathname,
+						   char *name,
+						   size_t len,
+						   size_t pathlen)
+{
+	struct cache_entry *ce = xmalloc(cache_entry_size(len + pathlen));
+	int flags;
+
+	flags = ntoh_s(ondisk->flags);
+	/*
+	 * This entry was invalidated in the index file,
+	 * we don't need any data from it
+	 */
+	if (flags & CE_INVALID_V5)
+		return NULL;
+	ce->ce_stat_data.sd_ctime.sec  = 0;
+	ce->ce_stat_data.sd_mtime.sec  = ntoh_l(ondisk->mtime.sec);
+	ce->ce_stat_data.sd_ctime.nsec = 0;
+	ce->ce_stat_data.sd_mtime.nsec = ntoh_l(ondisk->mtime.nsec);
+	ce->ce_stat_data.sd_dev        = 0;
+	ce->ce_stat_data.sd_ino        = 0;
+	ce->ce_stat_data.sd_uid        = 0;
+	ce->ce_stat_data.sd_gid        = 0;
+	ce->ce_stat_data.sd_size       = ntoh_l(ondisk->size);
+	ce->ce_mode       = ntoh_s(ondisk->mode);
+	ce->ce_flags      = flags & CE_STAGEMASK;
+	ce->ce_flags     |= flags & CE_VALID;
+	ce->ce_flags     |= flags & CE_SMUDGED;
+	if (flags & CE_INTENT_TO_ADD_V5)
+		ce->ce_flags |= CE_INTENT_TO_ADD;
+	if (flags & CE_SKIP_WORKTREE_V5)
+		ce->ce_flags |= CE_SKIP_WORKTREE;
+	ce->ce_stat_crc   = ntoh_l(ondisk->stat_crc);
+	ce->ce_namelen    = len + pathlen;
+	hashcpy(ce->sha1, ondisk->sha1);
+	memcpy(ce->name, pathname, pathlen);
+	memcpy(ce->name + pathlen, name, len);
+	ce->name[len + pathlen] = '\0';
+	return ce;
+}
+
+static struct directory_entry *directory_entry_from_ondisk(struct ondisk_directory_entry *ondisk,
+						   const char *name,
+						   size_t len)
+{
+	struct directory_entry *de = xmalloc(directory_entry_size(len));
+
+	memcpy(de->pathname, name, len);
+	de->pathname[len] = '\0';
+	de->de_flags      = ntoh_s(ondisk->flags);
+	de->de_foffset    = ntoh_l(ondisk->foffset);
+	de->de_cr         = ntoh_l(ondisk->cr);
+	de->de_ncr        = ntoh_l(ondisk->ncr);
+	de->de_nsubtrees  = ntoh_l(ondisk->nsubtrees);
+	de->de_nfiles     = ntoh_l(ondisk->nfiles);
+	de->de_nentries   = ntoh_l(ondisk->nentries);
+	de->de_pathlen    = len;
+	hashcpy(de->sha1, ondisk->sha1);
+	return de;
+}
+
+static struct directory_entry *read_directories(unsigned int *dir_offset,
+				unsigned int *dir_table_offset,
+				void *mmap,
+				int mmap_size)
+{
+	int i, ondisk_directory_size;
+	uint32_t *filecrc, *beginning, *end;
+	struct ondisk_directory_entry *disk_de;
+	struct directory_entry *de;
+	unsigned int data_len, len;
+	char *name;
+
+	/*
+	 * Length of pathname + nul byte for termination + size of
+	 * members of ondisk_directory_entry. (Just using the size
+	 * of the struct doesn't work, because there may be padding
+	 * bytes for the struct)
+	 */
+	ondisk_directory_size = sizeof(disk_de->flags)
+		+ sizeof(disk_de->foffset)
+		+ sizeof(disk_de->cr)
+		+ sizeof(disk_de->ncr)
+		+ sizeof(disk_de->nsubtrees)
+		+ sizeof(disk_de->nfiles)
+		+ sizeof(disk_de->nentries)
+		+ sizeof(disk_de->sha1);
+	name = ptr_add(mmap, *dir_offset);
+	beginning = ptr_add(mmap, *dir_table_offset);
+	end = ptr_add(mmap, *dir_table_offset + 4);
+	len = ntoh_l(*end) - ntoh_l(*beginning) - ondisk_directory_size - 5;
+	disk_de = ptr_add(mmap, *dir_offset + len + 1);
+	de = directory_entry_from_ondisk(disk_de, name, len);
+	de->next = NULL;
+	de->sub = NULL;
+
+	data_len = len + 1 + ondisk_directory_size;
+	filecrc = ptr_add(mmap, *dir_offset + data_len);
+	if (!check_crc32(0, ptr_add(mmap, *dir_offset), data_len, ntoh_l(*filecrc)))
+		die("directory crc doesn't match for '%s'", de->pathname);
+
+	*dir_table_offset += 4;
+	*dir_offset += data_len + 4; /* crc code */
+
+	de->sub = xcalloc(de->de_nsubtrees, sizeof(struct directory_entry *));
+	for (i = 0; i < de->de_nsubtrees; i++) {
+		de->sub[i] = read_directories(dir_offset, dir_table_offset,
+						   mmap, mmap_size);
+	}
+
+	return de;
+}
+
+static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
+		      void *mmap, unsigned long mmap_size,
+		      unsigned int first_entry_offset,
+		      unsigned int foffsetblock)
+{
+	int len, offset_to_offset;
+	char *name;
+	uint32_t foffsetblockcrc, *filecrc, *beginning, *end, entry_offset;
+	struct ondisk_cache_entry *disk_ce;
+
+	beginning = ptr_add(mmap, foffsetblock);
+	end = ptr_add(mmap, foffsetblock + 4);
+	len = ntoh_l(*end) - ntoh_l(*beginning) - sizeof(struct ondisk_cache_entry) - 5;
+	entry_offset = first_entry_offset + ntoh_l(*beginning);
+	name = ptr_add(mmap, entry_offset);
+	disk_ce = ptr_add(mmap, entry_offset + len + 1);
+	*ce = cache_entry_from_ondisk(disk_ce, pathname, name, len, pathlen);
+	filecrc = ptr_add(mmap, entry_offset + len + 1 + sizeof(*disk_ce));
+	offset_to_offset = htonl(foffsetblock);
+	foffsetblockcrc = crc32(0, (Bytef*)&offset_to_offset, 4);
+	if (!check_crc32(foffsetblockcrc,
+		ptr_add(mmap, entry_offset), len + 1 + sizeof(*disk_ce),
+		ntoh_l(*filecrc)))
+		return -1;
+
+	return 0;
+}
+
+struct conflict_entry *create_new_conflict(char *name, int len, int pathlen)
+{
+	struct conflict_entry *conflict_entry;
+
+	if (pathlen)
+		pathlen++;
+	conflict_entry = xmalloc(conflict_entry_size(len));
+	conflict_entry->entries = NULL;
+	conflict_entry->nfileconflicts = 0;
+	conflict_entry->namelen = len;
+	memcpy(conflict_entry->name, name, len);
+	conflict_entry->name[len] = '\0';
+	conflict_entry->pathlen = pathlen;
+	conflict_entry->next = NULL;
+
+	return conflict_entry;
+}
+
+void add_part_to_conflict_entry(struct directory_entry *de,
+					struct conflict_entry *entry,
+					struct conflict_part *conflict_part)
+{
+
+	struct conflict_part *conflict_search;
+
+	entry->nfileconflicts++;
+	de->conflict_size += sizeof(struct ondisk_conflict_part);
+	if (!entry->entries)
+		entry->entries = conflict_part;
+	else {
+		conflict_search = entry->entries;
+		while (conflict_search->next)
+			conflict_search = conflict_search->next;
+		conflict_search->next = conflict_part;
+	}
+}
+
+static int read_entries(struct index_state *istate, struct directory_entry *de,
+			unsigned int first_entry_offset, void *mmap,
+			unsigned long mmap_size, unsigned int *nr,
+			unsigned int foffsetblock)
+{
+	struct cache_entry *ce;
+	int i, subdir = 0;
+
+	for (i = 0; i < de->de_nfiles; i++) {
+		unsigned int subdir_foffsetblock = de->de_foffset + foffsetblock + (i * 4);
+		if (read_entry(&ce, de->pathname, de->de_pathlen, mmap, mmap_size,
+			       first_entry_offset, subdir_foffsetblock) < 0)
+			return -1;
+		while (subdir < de->de_nsubtrees &&
+		       cache_name_compare(ce->name + de->de_pathlen,
+					  ce_namelen(ce) - de->de_pathlen,
+					  de->sub[subdir]->pathname + de->de_pathlen,
+					  de->sub[subdir]->de_pathlen - de->de_pathlen) > 0) {
+			read_entries(istate, de->sub[subdir], first_entry_offset, mmap,
+				     mmap_size, nr, foffsetblock);
+			subdir++;
+		}
+		if (!ce)
+			continue;
+		set_index_entry(istate, (*nr)++, ce);
+	}
+	for (i = subdir; i < de->de_nsubtrees; i++) {
+		read_entries(istate, de->sub[i], first_entry_offset, mmap,
+			     mmap_size, nr, foffsetblock);
+	}
+	return 0;
+}
+
+static struct directory_entry *read_all_directories(struct index_state *istate,
+						    unsigned int *entry_offset,
+						    unsigned int *foffsetblock,
+						    unsigned int *ndirs,
+						    void *mmap, unsigned long mmap_size)
+{
+	unsigned int dir_offset, dir_table_offset;
+	struct cache_header *hdr;
+	struct cache_header_v5 *hdr_v5;
+	struct directory_entry *root_directory;
+
+	hdr = mmap;
+	hdr_v5 = ptr_add(mmap, sizeof(*hdr));
+	istate->cache_alloc = alloc_nr(ntohl(hdr->hdr_entries));
+	istate->cache = xcalloc(istate->cache_alloc, sizeof(struct cache_entry *));
+
+	/* Skip size of the header + crc sum + size of offsets to extensions + size of offsets */
+	dir_offset = sizeof(*hdr) + sizeof(*hdr_v5) + ntohl(hdr_v5->hdr_nextension) * 4 + 4
+		+ (ntohl(hdr_v5->hdr_ndir) + 1) * 4;
+	dir_table_offset = sizeof(*hdr) + sizeof(*hdr_v5) + ntohl(hdr_v5->hdr_nextension) * 4 + 4;
+	root_directory = read_directories(&dir_offset, &dir_table_offset,
+					  mmap, mmap_size);
+
+	*entry_offset = ntohl(hdr_v5->hdr_fblockoffset);
+	*foffsetblock = dir_offset;
+	*ndirs = ntohl(hdr_v5->hdr_ndir);
+	return root_directory;
+}
+
+static int read_index_v5(struct index_state *istate, void *mmap,
+			 unsigned long mmap_size, struct filter_opts *opts)
+{
+	unsigned int entry_offset, ndirs, foffsetblock, nr = 0;
+	struct directory_entry *root_directory, *de, *last_de;
+	const char **paths = NULL;
+	struct pathspec adjusted_pathspec;
+	int need_root = 0, i;
+
+	root_directory = read_all_directories(istate, &entry_offset,
+					      &foffsetblock, &ndirs,
+					      mmap, mmap_size);
+
+	if (opts && opts->pathspec && opts->pathspec->nr) {
+		need_root = 0;
+		paths = xmalloc((opts->pathspec->nr + 1)*sizeof(char *));
+		paths[opts->pathspec->nr] = NULL;
+		for (i = 0; i < opts->pathspec->nr; i++) {
+			char *super = strdup(opts->pathspec->items[i].match);
+			int len = strlen(super);
+			while (len && super[len - 1] == '/' && super[len - 2] == '/')
+				super[--len] = '\0'; /* strip all but one trailing slash */
+			while (len && super[--len] != '/')
+				; /* scan backwards to next / */
+			if (len >= 0)
+				super[len--] = '\0';
+			if (len <= 0) {
+				need_root = 1;
+				break;
+			}
+			paths[i] = super;
+		}
+	}
+
+	if (!need_root)
+		parse_pathspec(&adjusted_pathspec, PATHSPEC_ALL_MAGIC, PATHSPEC_PREFER_CWD, NULL, paths);
+
+	de = root_directory;
+	last_de = de;
+	while (de) {
+		if (need_root ||
+		    match_pathspec_depth(&adjusted_pathspec, de->pathname, de->de_pathlen, 0, NULL)) {
+			if (read_entries(istate, de, entry_offset,
+					 mmap, mmap_size, &nr,
+					 foffsetblock) < 0)
+				return -1;
+		} else {
+			for (i = 0; i < de->de_nsubtrees; i++) {
+				last_de->next = de->sub[i];
+				last_de = last_de->next;
+			}
+		}
+		de = de->next;
+	}
+	istate->cache_nr = nr;
+	return 0;
+}
+
+struct index_ops v5_ops = {
+	match_stat_basic,
+	verify_hdr,
+	read_index_v5,
+	NULL
+};
diff --git a/read-cache.h b/read-cache.h
index 644b199..01c76de 100644
--- a/read-cache.h
+++ b/read-cache.h
@@ -31,6 +31,7 @@ struct index_ops {
 };
 
 extern struct index_ops v2_ops;
+extern struct index_ops v5_ops;
 
 #ifndef NEEDS_ALIGNED_ACCESS
 #define ntoh_s(var) ntohs(var)
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 16/24] read-cache: read resolve-undo data
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (14 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 15/24] read-cache: read index-v5 Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-19  1:59   ` Eric Sunshine
  2013-08-18 19:42 ` [PATCH v3 17/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
                   ` (8 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Make git read the resolve-undo data from the index.

Since the resolve-undo data is joined with the conflicts in
the ondisk format of the index file version 5, conflicts and
resolved data is read at the same time, and the resolve-undo
data is then converted to the in-memory format.

Helped-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache-v5.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)

diff --git a/read-cache-v5.c b/read-cache-v5.c
index 799b8e7..85a2069 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "read-cache.h"
+#include "string-list.h"
 #include "resolve-undo.h"
 #include "cache-tree.h"
 #include "dir.h"
@@ -308,6 +309,31 @@ static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
 	return 0;
 }
 
+static struct conflict_part *conflict_part_from_ondisk(struct ondisk_conflict_part *ondisk)
+{
+	struct conflict_part *cp = xmalloc(sizeof(struct conflict_part));
+
+	cp->flags      = ntoh_s(ondisk->flags);
+	cp->entry_mode = ntoh_s(ondisk->entry_mode);
+	hashcpy(cp->sha1, ondisk->sha1);
+	return cp;
+}
+
+static void conflict_entry_push(struct conflict_entry **head,
+				struct conflict_entry **tail,
+				struct conflict_entry *conflict_entry)
+{
+	if (!*head) {
+		*head = *tail = conflict_entry;
+		(*tail)->next = NULL;
+		return;
+	}
+
+	(*tail)->next = conflict_entry;
+	conflict_entry->next = NULL;
+	*tail = (*tail)->next;
+}
+
 struct conflict_entry *create_new_conflict(char *name, int len, int pathlen)
 {
 	struct conflict_entry *conflict_entry;
@@ -345,6 +371,106 @@ void add_part_to_conflict_entry(struct directory_entry *de,
 	}
 }
 
+static int read_conflicts(struct conflict_entry **head,
+			  struct directory_entry *de,
+			  void *mmap, unsigned long mmap_size)
+{
+	struct conflict_entry *tail;
+	unsigned int croffset, i;
+	char *full_name;
+
+	croffset = de->de_cr;
+	tail = NULL;
+	for (i = 0; i < de->de_ncr; i++) {
+		struct conflict_entry *conflict_new;
+		unsigned int len, *nfileconflicts;
+		char *name;
+		void *crc_start;
+		int k, offset;
+		uint32_t *filecrc;
+
+		offset = croffset;
+		crc_start = ptr_add(mmap, offset);
+		name = ptr_add(mmap, offset);
+		len = strlen(name);
+		offset += len + 1;
+		nfileconflicts = ptr_add(mmap, offset);
+		offset += 4;
+
+		full_name = xmalloc(sizeof(char) * (len + de->de_pathlen));
+		memcpy(full_name, de->pathname, de->de_pathlen);
+		memcpy(full_name + de->de_pathlen, name, len);
+		conflict_new = create_new_conflict(full_name,
+				len + de->de_pathlen, de->de_pathlen);
+		for (k = 0; k < ntoh_l(*nfileconflicts); k++) {
+			struct ondisk_conflict_part *ondisk;
+			struct conflict_part *cp;
+
+			ondisk = ptr_add(mmap, offset);
+			cp = conflict_part_from_ondisk(ondisk);
+			cp->next = NULL;
+			add_part_to_conflict_entry(de, conflict_new, cp);
+			offset += sizeof(struct ondisk_conflict_part);
+		}
+		filecrc = ptr_add(mmap, offset);
+		free(full_name);
+		if (!check_crc32(0, crc_start,
+			len + 1 + 4 + conflict_new->nfileconflicts
+			* sizeof(struct ondisk_conflict_part),
+			ntoh_l(*filecrc)))
+			return -1;
+		croffset = offset + 4;
+		conflict_entry_push(head, &tail, conflict_new);
+	}
+	return 0;
+}
+
+static int convert_resolve_undo(struct index_state *istate,
+				struct directory_entry *de,
+				void *mmap, unsigned long mmap_size)
+{
+	int i;
+	struct conflict_entry *conflicts = NULL;
+
+	if (read_conflicts(&conflicts, de, mmap, mmap_size) < 0)
+		return -1;
+
+	while (conflicts) {
+		struct string_list_item *lost;
+		struct resolve_undo_info *ui;
+		struct conflict_part *cp;
+
+		if (conflicts->entries &&
+		    (conflicts->entries->flags & CONFLICT_CONFLICTED)) {
+			conflicts = conflicts->next;
+			continue;
+		}
+		if (!istate->resolve_undo) {
+			istate->resolve_undo = xcalloc(1, sizeof(struct string_list));
+			istate->resolve_undo->strdup_strings = 1;
+		}
+
+		lost = string_list_insert(istate->resolve_undo, conflicts->name);
+		if (!lost->util)
+			lost->util = xcalloc(1, sizeof(*ui));
+		ui = lost->util;
+
+		cp = conflicts->entries;
+		for (i = 0; i < 3; i++)
+			ui->mode[i] = 0;
+		while (cp) {
+			ui->mode[conflict_stage(cp) - 1] = cp->entry_mode;
+			hashcpy(ui->sha1[conflict_stage(cp) - 1], cp->sha1);
+			cp = cp->next;
+		}
+		conflicts = conflicts->next;
+	}
+	for (i = 0; i < de->de_nsubtrees; i++)
+		if (convert_resolve_undo(istate, de->sub[i], mmap, mmap_size) < 0)
+			return -1;
+	return 0;
+}
+
 static int read_entries(struct index_state *istate, struct directory_entry *de,
 			unsigned int first_entry_offset, void *mmap,
 			unsigned long mmap_size, unsigned int *nr,
@@ -444,6 +570,10 @@ static int read_index_v5(struct index_state *istate, void *mmap,
 	if (!need_root)
 		parse_pathspec(&adjusted_pathspec, PATHSPEC_ALL_MAGIC, PATHSPEC_PREFER_CWD, NULL, paths);
 
+	if (!opts || opts->read_resolve_undo)
+		if (convert_resolve_undo(istate, root_directory, mmap, mmap_size) < 0)
+			return -1;
+
 	de = root_directory;
 	last_de = de;
 	while (de) {
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 17/24] read-cache: read cache-tree in index-v5
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (15 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 16/24] read-cache: read resolve-undo data Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-24  0:09   ` Duy Nguyen
  2013-08-18 19:42 ` [PATCH v3 18/24] read-cache: write index-v5 Thomas Gummerer
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Since the cache-tree data is saved as part of the directory data,
we already read it at the beginning of the index. The cache-tree
is only converted from this directory data.

The cache-tree data is arranged in a tree, with the children sorted by
pathlen at each node, while the ondisk format is sorted lexically.
So we have to rebuild this format from the on-disk directory list.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache-tree.c    |  2 +-
 cache-tree.h    |  1 +
 read-cache-v5.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 0bbec43..1209732 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -31,7 +31,7 @@ void cache_tree_free(struct cache_tree **it_p)
 	*it_p = NULL;
 }
 
-static int subtree_name_cmp(const char *one, int onelen,
+int subtree_name_cmp(const char *one, int onelen,
 			    const char *two, int twolen)
 {
 	if (onelen < twolen)
diff --git a/cache-tree.h b/cache-tree.h
index f1923ad..9818926 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -25,6 +25,7 @@ struct cache_tree *cache_tree(void);
 void cache_tree_free(struct cache_tree **);
 void cache_tree_invalidate_path(struct cache_tree *, const char *);
 struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
+int subtree_name_cmp(const char *, int, const char *, int);
 
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
diff --git a/read-cache-v5.c b/read-cache-v5.c
index 85a2069..b14505a 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -471,6 +471,83 @@ static int convert_resolve_undo(struct index_state *istate,
 	return 0;
 }
 
+static struct cache_tree *convert_one(struct directory_entry *de)
+{
+	int i;
+	struct cache_tree *it;
+
+	it = cache_tree();
+	it->entry_count = de->de_nentries;
+	if (0 <= it->entry_count)
+		hashcpy(it->sha1, de->sha1);
+
+	/*
+	 * Just a heuristic -- we do not add directories that often but
+	 * we do not want to have to extend it immediately when we do,
+	 * hence +2.
+	 */
+	it->subtree_alloc = de->de_nsubtrees + 2;
+	it->down = xcalloc(it->subtree_alloc, sizeof(struct cache_tree_sub *));
+	for (i = 0; i < de->de_nsubtrees; i++) {
+		struct cache_tree *sub;
+		struct cache_tree_sub *subtree;
+		char *buf, *name;
+
+		sub = convert_one(de->sub[i]);
+		if(!sub)
+			goto free_return;
+
+		name = "";
+		buf = strtok(de->sub[i]->pathname, "/");
+		while (buf) {
+			name = buf;
+			buf = strtok(NULL, "/");
+		}
+		subtree = cache_tree_sub(it, name);
+		subtree->cache_tree = sub;
+	}
+	if (de->de_nsubtrees != it->subtree_nr)
+		die("cache-tree: internal error");
+	return it;
+ free_return:
+	cache_tree_free(&it);
+	return NULL;
+}
+
+static int compare_cache_tree_elements(const void *a, const void *b)
+{
+	const struct directory_entry *de1, *de2;
+
+	de1 = (const struct directory_entry *) a;
+	de2 = (const struct directory_entry *) b;
+	return subtree_name_cmp(de1->pathname, de1->de_pathlen,
+				de2->pathname, de2->de_pathlen);
+}
+
+static void sort_directories(struct directory_entry *de)
+{
+	int i;
+
+	for (i = 0; i < de->de_nsubtrees; i++) {
+		if (de->sub[i]->de_nsubtrees)
+			sort_directories(de->sub[i]);
+	}
+	qsort(de->sub, de->de_nsubtrees, sizeof(struct directory_entry *),
+	      compare_cache_tree_elements);
+}
+
+/*
+ * This function modifies the directory argument that is given to it.
+ * Don't use it if the directory entries are still needed after.
+ */
+static struct cache_tree *cache_tree_convert_v5(struct directory_entry *de)
+{
+	if (!de->de_nentries)
+		return NULL;
+	sort_directories(de);
+	return convert_one(de);
+}
+
 static int read_entries(struct index_state *istate, struct directory_entry *de,
 			unsigned int first_entry_offset, void *mmap,
 			unsigned long mmap_size, unsigned int *nr,
@@ -591,6 +668,7 @@ static int read_index_v5(struct index_state *istate, void *mmap,
 		}
 		de = de->next;
 	}
+	istate->cache_tree = cache_tree_convert_v5(root_directory);
 	istate->cache_nr = nr;
 	return 0;
 }
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 18/24] read-cache: write index-v5
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (16 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 17/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-24  3:58   ` Duy Nguyen
  2013-08-24  4:07   ` Duy Nguyen
  2013-08-18 19:42 ` [PATCH v3 19/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
                   ` (6 subsequent siblings)
  24 siblings, 2 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Write the index version 5 file format to disk. This version doesn't
write the cache-tree data and resolve-undo data to the file.

The main work is done when filtering out the directories from the
current in-memory format, where in the same turn also the conflicts
and the file data is calculated.

Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Helped-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 cache.h         |   2 +
 read-cache-v5.c | 596 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 read-cache.c    |   4 +-
 read-cache.h    |   1 +
 4 files changed, 601 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 89f556b..a109f35 100644
--- a/cache.h
+++ b/cache.h
@@ -138,6 +138,7 @@ struct cache_entry {
 	unsigned char sha1[20];
 	uint32_t ce_stat_crc;
 	struct cache_entry *next; /* used by name_hash */
+	struct cache_entry *next_ce;
 	char name[FLEX_ARRAY]; /* more */
 };
 
@@ -532,6 +533,7 @@ extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
 extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int igncase);
 extern int index_name_pos(const struct index_state *, const char *name, int namelen);
+extern struct directory_entry *init_directory_entry(char *pathname, int len);
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */
 #define ADD_CACHE_SKIP_DFCHECK 4	/* Ok to skip DF conflict checks */
diff --git a/read-cache-v5.c b/read-cache-v5.c
index b14505a..85b912b 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -673,9 +673,603 @@ static int read_index_v5(struct index_state *istate, void *mmap,
 	return 0;
 }
 
+#define WRITE_BUFFER_SIZE 8192
+static unsigned char write_buffer[WRITE_BUFFER_SIZE];
+static unsigned long write_buffer_len;
+
+static int ce_write_flush(int fd)
+{
+	unsigned int buffered = write_buffer_len;
+	if (buffered) {
+		if (write_in_full(fd, write_buffer, buffered) != buffered)
+			return -1;
+		write_buffer_len = 0;
+	}
+	return 0;
+}
+
+static int ce_write(uint32_t *crc, int fd, void *data, unsigned int len)
+{
+	if (crc)
+		*crc = crc32(*crc, (Bytef*)data, len);
+	while (len) {
+		unsigned int buffered = write_buffer_len;
+		unsigned int partial = WRITE_BUFFER_SIZE - buffered;
+		if (partial > len)
+			partial = len;
+		memcpy(write_buffer + buffered, data, partial);
+		buffered += partial;
+		if (buffered == WRITE_BUFFER_SIZE) {
+			write_buffer_len = buffered;
+			if (ce_write_flush(fd))
+				return -1;
+			buffered = 0;
+		}
+		write_buffer_len = buffered;
+		len -= partial;
+		data = (char *) data + partial;
+	}
+	return 0;
+}
+
+static int ce_flush(int fd)
+{
+	unsigned int left = write_buffer_len;
+
+	if (left)
+		write_buffer_len = 0;
+
+	if (write_in_full(fd, write_buffer, left) != left)
+		return -1;
+
+	return 0;
+}
+
+static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
+{
+	/*
+	 * This method shall only be called if the timestamp of ce
+	 * is racy (check with is_racy_timestamp). If the timestamp
+	 * is racy, the writer will set the CE_SMUDGED flag.
+	 *
+	 * The reader (match_stat_basic) will then take care
+	 * of checking if the entry is really changed or not, by
+	 * taking into account the size and the stat_crc and if
+	 * that hasn't changed checking the sha1.
+	 */
+	ce->ce_flags |= CE_SMUDGED;
+}
+
+char *super_directory(const char *filename)
+{
+	char *slash;
+
+	slash = strrchr(filename, '/');
+	if (slash)
+		return xmemdupz(filename, slash-filename);
+	return NULL;
+}
+
+struct directory_entry *init_directory_entry(char *pathname, int len)
+{
+	struct directory_entry *de = xmalloc(directory_entry_size(len));
+
+	memcpy(de->pathname, pathname, len);
+	de->pathname[len] = '\0';
+	de->de_flags      = 0;
+	de->de_foffset    = 0;
+	de->de_cr         = 0;
+	de->de_ncr        = 0;
+	de->de_nsubtrees  = 0;
+	de->de_nfiles     = 0;
+	de->de_nentries   = 0;
+	memset(de->sha1, 0, 20);
+	de->de_pathlen    = len;
+	de->next          = NULL;
+	de->next_hash     = NULL;
+	de->ce            = NULL;
+	de->ce_last       = NULL;
+	de->conflict      = NULL;
+	de->conflict_last = NULL;
+	de->conflict_size = 0;
+	return de;
+}
+
+static void ondisk_from_directory_entry(struct directory_entry *de,
+					struct ondisk_directory_entry *ondisk)
+{
+	ondisk->foffset   = htonl(de->de_foffset);
+	ondisk->cr        = htonl(de->de_cr);
+	ondisk->ncr       = htonl(de->de_ncr);
+	ondisk->nsubtrees = htonl(de->de_nsubtrees);
+	ondisk->nfiles    = htonl(de->de_nfiles);
+	ondisk->nentries  = htonl(de->de_nentries);
+	hashcpy(ondisk->sha1, de->sha1);
+	ondisk->flags     = htons(de->de_flags);
+}
+
+static struct conflict_part *conflict_part_from_inmemory(struct cache_entry *ce)
+{
+	struct conflict_part *conflict;
+	int flags;
+
+	conflict = xmalloc(sizeof(struct conflict_part));
+	flags                = CONFLICT_CONFLICTED;
+	flags               |= ce_stage(ce) << CONFLICT_STAGESHIFT;
+	conflict->flags      = flags;
+	conflict->entry_mode = ce->ce_mode;
+	conflict->next       = NULL;
+	hashcpy(conflict->sha1, ce->sha1);
+	return conflict;
+}
+
+static void conflict_to_ondisk(struct conflict_part *cp,
+				struct ondisk_conflict_part *ondisk)
+{
+	ondisk->flags      = htons(cp->flags);
+	ondisk->entry_mode = htons(cp->entry_mode);
+	hashcpy(ondisk->sha1, cp->sha1);
+}
+
+void add_conflict_to_directory_entry(struct directory_entry *de,
+					struct conflict_entry *conflict_entry)
+{
+	de->de_ncr++;
+	de->conflict_size += conflict_entry->namelen + 1 + 8 - conflict_entry->pathlen;
+	conflict_entry_push(&de->conflict, &de->conflict_last, conflict_entry);
+}
+
+void insert_directory_entry(struct directory_entry *de,
+			struct hash_table *table,
+			unsigned int *total_dir_len,
+			unsigned int *ndir,
+			uint32_t crc)
+{
+	struct directory_entry *insert;
+
+	insert = (struct directory_entry *)insert_hash(crc, de, table);
+	if (insert) {
+		de->next_hash = insert->next_hash;
+		insert->next_hash = de;
+	}
+	(*ndir)++;
+	if (de->de_pathlen == 0)
+		(*total_dir_len)++;
+	else
+		*total_dir_len += de->de_pathlen + 2;
+}
+
+static struct directory_entry *find_directory(char *dir, int dir_len, uint32_t *crc,
+					      struct hash_table *table)
+{
+	struct directory_entry *search;
+
+	*crc = crc32(0, (Bytef*)dir, dir_len);
+	search = lookup_hash(*crc, table);
+	while (search && search->next_hash &&
+	       cache_name_compare(dir, dir_len, search->pathname, search->de_pathlen))
+		search = search->next_hash;
+	return search;
+}
+
+static struct directory_entry *get_directory(char *dir, unsigned int dir_len,
+					     struct hash_table *table,
+					     unsigned int *total_dir_len,
+					     unsigned int *ndir,
+					     struct directory_entry **current)
+{
+	struct directory_entry *tmp = NULL, *search, *new, *ret;
+	uint32_t crc;
+
+	search = find_directory(dir, dir_len, &crc, table);
+	if (search)
+		return search;
+	while (!search) {
+		new = init_directory_entry(dir, dir_len);
+		insert_directory_entry(new, table, total_dir_len, ndir, crc);
+		if (!tmp)
+			ret = new;
+		else
+			new->de_nsubtrees = 1;
+		new->next = tmp;
+		tmp = new;
+		dir = super_directory(dir);
+		dir_len = dir ? strlen(dir) : 0;
+		search = find_directory(dir, dir_len, &crc, table);
+	}
+	search->de_nsubtrees++;
+	(*current)->next = tmp;
+	while ((*current)->next)
+		*current = (*current)->next;
+
+	return ret;
+}
+
+static struct conflict_entry *create_conflict_entry_from_ce(struct cache_entry *ce,
+								int pathlen)
+{
+	return create_new_conflict(ce->name, ce_namelen(ce), pathlen);
+}
+
+static void ce_queue_push(struct cache_entry **head,
+			  struct cache_entry **tail,
+			  struct cache_entry *ce)
+{
+	if (!*head) {
+		*head = *tail = ce;
+		(*tail)->next_ce = NULL;
+		return;
+	}
+
+	(*tail)->next_ce = ce;
+	ce->next_ce = NULL;
+	*tail = (*tail)->next_ce;
+}
+
+static struct directory_entry *compile_directory_data(struct index_state *istate,
+						      int nfile,
+						      unsigned int *ndir,
+						      unsigned int *total_dir_len,
+						      unsigned int *total_file_len)
+{
+	int i, dir_len = -1;
+	char *dir;
+	struct directory_entry *de, *current, *search;
+	struct cache_entry **cache = istate->cache;
+	struct conflict_entry *conflict_entry;
+	struct hash_table table;
+	uint32_t crc;
+
+	init_hash(&table);
+	de = init_directory_entry("", 0);
+	current = de;
+	*ndir = 1;
+	*total_dir_len = 1;
+	crc = crc32(0, (Bytef*)de->pathname, de->de_pathlen);
+	insert_hash(crc, de, &table);
+	conflict_entry = NULL;
+	for (i = 0; i < nfile; i++) {
+		if (cache[i]->ce_flags & CE_REMOVE)
+			continue;
+
+		if (dir_len < 0
+		    || cache[i]->name[dir_len] != '/'
+		    || strchr(cache[i]->name + dir_len + 1, '/')
+		    || cache_name_compare(cache[i]->name, ce_namelen(cache[i]),
+					  dir, dir_len)) {
+			dir = super_directory(cache[i]->name);
+			dir_len = dir ? strlen(dir) : 0;
+			search = get_directory(dir, dir_len, &table,
+					       total_dir_len, ndir,
+					       &current);
+		}
+		search->de_nfiles++;
+		*total_file_len += ce_namelen(cache[i]) + 1;
+		if (search->de_pathlen)
+			*total_file_len -= search->de_pathlen + 1;
+		ce_queue_push(&(search->ce), &(search->ce_last), cache[i]);
+
+		if (ce_stage(cache[i]) > 0) {
+			struct conflict_part *conflict_part;
+			if (!conflict_entry ||
+			    cache_name_compare(conflict_entry->name, conflict_entry->namelen,
+					       cache[i]->name, ce_namelen(cache[i]))) {
+				conflict_entry = create_conflict_entry_from_ce(cache[i], search->de_pathlen);
+				add_conflict_to_directory_entry(search, conflict_entry);
+			}
+			conflict_part = conflict_part_from_inmemory(cache[i]);
+			add_part_to_conflict_entry(search, conflict_entry, conflict_part);
+		}
+	}
+	return de;
+}
+
+static void ondisk_from_cache_entry(struct cache_entry *ce,
+				    struct ondisk_cache_entry *ondisk)
+{
+	unsigned int flags;
+
+	flags  = ce->ce_flags & CE_STAGEMASK;
+	flags |= ce->ce_flags & CE_VALID;
+	flags |= ce->ce_flags & CE_SMUDGED;
+	if (ce->ce_flags & CE_INTENT_TO_ADD)
+		flags |= CE_INTENT_TO_ADD_V5;
+	if (ce->ce_flags & CE_SKIP_WORKTREE)
+		flags |= CE_SKIP_WORKTREE_V5;
+	ondisk->flags      = htons(flags);
+	ondisk->mode       = htons(ce->ce_mode);
+	ondisk->mtime.sec  = htonl(ce->ce_stat_data.sd_mtime.sec);
+#ifdef USE_NSEC
+	ondisk->mtime.nsec = htonl(ce->ce_stat_data.sd_mtime.nsec);
+#else
+	ondisk->mtime.nsec = 0;
+#endif
+	ondisk->size       = htonl(ce->ce_stat_data.sd_size);
+	if (!ce->ce_stat_crc)
+		ce->ce_stat_crc = calculate_stat_crc(ce);
+	ondisk->stat_crc   = htonl(ce->ce_stat_crc);
+	hashcpy(ondisk->sha1, ce->sha1);
+}
+
+static int write_directories(struct directory_entry *de, int fd, int conflict_offset)
+{
+	struct directory_entry *current;
+	struct ondisk_directory_entry ondisk;
+	int current_offset, offset_write, ondisk_size, foffset;
+	uint32_t crc;
+
+	/*
+	 * This is needed because the compiler aligns structs to sizes multiple
+	 * of 4
+	 */
+	ondisk_size = sizeof(ondisk.flags)
+		+ sizeof(ondisk.foffset)
+		+ sizeof(ondisk.cr)
+		+ sizeof(ondisk.ncr)
+		+ sizeof(ondisk.nsubtrees)
+		+ sizeof(ondisk.nfiles)
+		+ sizeof(ondisk.nentries)
+		+ sizeof(ondisk.sha1);
+	current = de;
+	current_offset = 0;
+	foffset = 0;
+	while (current) {
+		int pathlen;
+
+		offset_write = htonl(current_offset);
+		if (ce_write(NULL, fd, &offset_write, 4) < 0)
+			return -1;
+		if (current->de_pathlen == 0)
+			pathlen = 0;
+		else
+			pathlen = current->de_pathlen + 1;
+		current_offset += pathlen + 1 + ondisk_size + 4;
+		current = current->next;
+	}
+	/*
+	 * Write one more offset, which points to the end of the entries,
+	 * because we use it for calculating the dir length, instead of
+	 * using strlen.
+	 */
+	offset_write = htonl(current_offset);
+	if (ce_write(NULL, fd, &offset_write, 4) < 0)
+		return -1;
+	current = de;
+	while (current) {
+		crc = 0;
+		if (current->de_pathlen == 0) {
+			if (ce_write(&crc, fd, current->pathname, 1) < 0)
+				return -1;
+		} else {
+			char *path;
+			path = xmalloc(sizeof(char) * (current->de_pathlen + 2));
+			memcpy(path, current->pathname, current->de_pathlen);
+			memcpy(path + current->de_pathlen, "/\0", 2);
+			if (ce_write(&crc, fd, path, current->de_pathlen + 2) < 0)
+				return -1;
+		}
+		current->de_foffset = foffset;
+		current->de_cr = conflict_offset;
+		ondisk_from_directory_entry(current, &ondisk);
+		if (ce_write(&crc, fd, &ondisk, ondisk_size) < 0)
+			return -1;
+		crc = htonl(crc);
+		if (ce_write(NULL, fd, &crc, 4) < 0)
+			return -1;
+		conflict_offset += current->conflict_size;
+		foffset += current->de_nfiles * 4;
+		current = current->next;
+	}
+	return 0;
+}
+
+static int write_entries(struct index_state *istate,
+			    struct directory_entry *de,
+			    int entries,
+			    int fd,
+			    int offset_to_offset)
+{
+	int offset, offset_write, ondisk_size;
+	struct directory_entry *current;
+
+	offset = 0;
+	ondisk_size = sizeof(struct ondisk_cache_entry);
+	current = de;
+	while (current) {
+		int pathlen;
+		struct cache_entry *ce = current->ce;
+
+		if (current->de_pathlen == 0)
+			pathlen = 0;
+		else
+			pathlen = current->de_pathlen + 1;
+		while (ce) {
+			if (ce->ce_flags & CE_REMOVE)
+				continue;
+			if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
+				ce_smudge_racily_clean_entry(ce);
+			if (is_null_sha1(ce->sha1))
+				return error("cache entry has null sha1: %s", ce->name);
+
+			offset_write = htonl(offset);
+			if (ce_write(NULL, fd, &offset_write, 4) < 0)
+				return -1;
+			offset += ce_namelen(ce) - pathlen + 1 + ondisk_size + 4;
+			ce = ce->next_ce;
+		}
+		current = current->next;
+	}
+	/*
+	 * Write one more offset, which points to the end of the entries,
+	 * because we use it for calculating the file length, instead of
+	 * using strlen.
+	 */
+	offset_write = htonl(offset);
+	if (ce_write(NULL, fd, &offset_write, 4) < 0)
+		return -1;
+
+	offset = offset_to_offset;
+	current = de;
+	while (current) {
+		int pathlen;
+		struct cache_entry *ce = current->ce;
+
+		if (current->de_pathlen == 0)
+			pathlen = 0;
+		else
+			pathlen = current->de_pathlen + 1;
+		while (ce) {
+			struct ondisk_cache_entry ondisk;
+			uint32_t crc, calc_crc;
+
+			if (ce->ce_flags & CE_REMOVE)
+				continue;
+			calc_crc = htonl(offset);
+			crc = crc32(0, (Bytef*)&calc_crc, 4);
+			if (ce_write(&crc, fd, ce->name + pathlen,
+					ce_namelen(ce) - pathlen + 1) < 0)
+				return -1;
+			ondisk_from_cache_entry(ce, &ondisk);
+			if (ce_write(&crc, fd, &ondisk, ondisk_size) < 0)
+				return -1;
+			crc = htonl(crc);
+			if (ce_write(NULL, fd, &crc, 4) < 0)
+				return -1;
+			offset += 4;
+			ce = ce->next_ce;
+		}
+		current = current->next;
+	}
+	return 0;
+}
+
+static int write_conflict(struct conflict_entry *conflict, int fd)
+{
+	struct conflict_entry *current;
+	struct conflict_part *current_part;
+	uint32_t crc;
+
+	current = conflict;
+	while (current) {
+		unsigned int to_write;
+
+		crc = 0;
+		if (ce_write(&crc, fd,
+		     (Bytef*)(current->name + current->pathlen),
+		     current->namelen - current->pathlen) < 0)
+			return -1;
+		if (ce_write(&crc, fd, (Bytef*)"\0", 1) < 0)
+			return -1;
+		to_write = htonl(current->nfileconflicts);
+		if (ce_write(&crc, fd, (Bytef*)&to_write, 4) < 0)
+			return -1;
+		current_part = current->entries;
+		while (current_part) {
+			struct ondisk_conflict_part ondisk;
+
+			conflict_to_ondisk(current_part, &ondisk);
+			if (ce_write(&crc, fd, (Bytef*)&ondisk, sizeof(struct ondisk_conflict_part)) < 0)
+				return 0;
+			current_part = current_part->next;
+		}
+		to_write = htonl(crc);
+		if (ce_write(NULL, fd, (Bytef*)&to_write, 4) < 0)
+			return -1;
+		current = current->next;
+	}
+	return 0;
+}
+
+static int write_conflicts(struct index_state *istate,
+			      struct directory_entry *de,
+			      int fd)
+{
+	struct directory_entry *current;
+
+	current = de;
+	while (current) {
+		if (current->de_ncr != 0) {
+			if (write_conflict(current->conflict, fd) < 0)
+				return -1;
+		}
+		current = current->next;
+	}
+	return 0;
+}
+
+static int write_index_v5(struct index_state *istate, int newfd)
+{
+	struct cache_header hdr;
+	struct cache_header_v5 hdr_v5;
+	struct cache_entry **cache = istate->cache;
+	struct directory_entry *de;
+	struct ondisk_directory_entry *ondisk;
+	unsigned int entries = istate->cache_nr;
+	unsigned int i, removed, total_dir_len, ondisk_directory_size;
+	unsigned int total_file_len, conflict_offset, foffsetblock;
+	unsigned int ndir;
+	uint32_t crc;
+
+	if (istate->filter_opts)
+		die("BUG: index: cannot write a partially read index");
+
+	for (i = removed = 0; i < entries; i++) {
+		if (cache[i]->ce_flags & CE_REMOVE)
+			removed++;
+	}
+	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
+	hdr.hdr_version = htonl(istate->version);
+	hdr.hdr_entries = htonl(entries - removed);
+	hdr_v5.hdr_nextension = htonl(0); /* Currently no extensions are supported */
+
+	total_dir_len = 0;
+	total_file_len = 0;
+	de = compile_directory_data(istate, entries, &ndir,
+				    &total_dir_len, &total_file_len);
+	hdr_v5.hdr_ndir = htonl(ndir);
+
+	/*
+	 * This is needed because the compiler aligns structs to sizes multipe
+	 * of 4
+	 */
+	ondisk_directory_size = sizeof(ondisk->flags)
+		+ sizeof(ondisk->foffset)
+		+ sizeof(ondisk->cr)
+		+ sizeof(ondisk->ncr)
+		+ sizeof(ondisk->nsubtrees)
+		+ sizeof(ondisk->nfiles)
+		+ sizeof(ondisk->nentries)
+		+ sizeof(ondisk->sha1);
+	foffsetblock = sizeof(hdr) + sizeof(hdr_v5) + 4
+		+ (ndir + 1) * 4
+		+ total_dir_len
+		+ ndir * (ondisk_directory_size + 4);
+	hdr_v5.hdr_fblockoffset = htonl(foffsetblock + (entries - removed + 1) * 4);
+	crc = 0;
+	if (ce_write(&crc, newfd, &hdr, sizeof(hdr)) < 0)
+		return -1;
+	if (ce_write(&crc, newfd, &hdr_v5, sizeof(hdr_v5)) < 0)
+		return -1;
+	crc = htonl(crc);
+	if (ce_write(NULL, newfd, &crc, 4) < 0)
+		return -1;
+
+	conflict_offset = foffsetblock +
+		+ (entries - removed + 1) * 4
+		+ total_file_len
+		+ (entries - removed) * (sizeof(struct ondisk_cache_entry) + 4);
+	if (write_directories(de, newfd, conflict_offset) < 0)
+		return -1;
+	if (write_entries(istate, de, entries, newfd, foffsetblock) < 0)
+		return -1;
+	if (write_conflicts(istate, de, newfd) < 0)
+		return -1;
+	return ce_flush(newfd);
+}
+
 struct index_ops v5_ops = {
 	match_stat_basic,
 	verify_hdr,
 	read_index_v5,
-	NULL
+	write_index_v5
 };
diff --git a/read-cache.c b/read-cache.c
index a232372..1d9b615 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -108,7 +108,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
 	return changed;
 }
 
-static uint32_t calculate_stat_crc(struct cache_entry *ce)
+uint32_t calculate_stat_crc(struct cache_entry *ce)
 {
 	unsigned int ctimens = 0;
 	uint32_t stat, stat_crc;
@@ -229,6 +229,8 @@ static void set_istate_ops(struct index_state *istate)
 {
 	if (istate->version >= 2 && istate->version <= 4)
 		istate->ops = &v2_ops;
+	if (istate->version == 5)
+		istate->ops = &v5_ops;
 }
 
 int ce_match_stat_basic(const struct index_state *istate,
diff --git a/read-cache.h b/read-cache.h
index 01c76de..27f862f 100644
--- a/read-cache.h
+++ b/read-cache.h
@@ -58,3 +58,4 @@ extern int ce_match_stat_basic(const struct index_state *istate,
 			       const struct cache_entry *ce, struct stat *st);
 extern int is_racy_timestamp(const struct index_state *istate, const struct cache_entry *ce);
 extern void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce);
+extern uint32_t calculate_stat_crc(struct cache_entry *ce);
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 19/24] read-cache: write index-v5 cache-tree data
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (17 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 18/24] read-cache: write index-v5 Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-18 19:42 ` [PATCH v3 20/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Write the cache-tree data for the index version 5 file format. The
in-memory cache-tree data is converted to the ondisk format, by adding
it to the directory entries, that were compiled from the cache-entries
in the step before.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache-v5.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/read-cache-v5.c b/read-cache-v5.c
index 85b912b..ed52b7c 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -891,6 +891,57 @@ static struct conflict_entry *create_conflict_entry_from_ce(struct cache_entry *
 	return create_new_conflict(ce->name, ce_namelen(ce), pathlen);
 }
 
+static void convert_one_to_ondisk_v5(struct hash_table *table, struct cache_tree *it,
+				const char *path, int pathlen, uint32_t crc)
+{
+	int i;
+	struct directory_entry *found, *search;
+
+	crc = crc32(crc, (Bytef*)path, pathlen);
+	found = lookup_hash(crc, table);
+	search = found;
+	while (search && strcmp(path, search->pathname + search->de_pathlen - strlen(path)) != 0)
+		search = search->next_hash;
+	if (!search)
+		return;
+	/*
+	 * The number of subtrees is already calculated by
+	 * compile_directory_data, therefore we only need to
+	 * add the entry_count
+	 */
+	search->de_nentries = it->entry_count;
+	if (0 <= it->entry_count)
+		hashcpy(search->sha1, it->sha1);
+	if (strcmp(path, "") != 0)
+		crc = crc32(crc, (Bytef*)"/", 1);
+
+#if DEBUG
+	if (0 <= it->entry_count)
+		fprintf(stderr, "cache-tree <%.*s> (%d ent, %d subtree) %s\n",
+			pathlen, path, it->entry_count, it->subtree_nr,
+			sha1_to_hex(it->sha1));
+	else
+		fprintf(stderr, "cache-tree <%.*s> (%d subtree) invalid\n",
+			pathlen, path, it->subtree_nr);
+#endif
+
+	for (i = 0; i < it->subtree_nr; i++) {
+		struct cache_tree_sub *down = it->down[i];
+		if (i) {
+			struct cache_tree_sub *prev = it->down[i-1];
+			if (subtree_name_cmp(down->name, down->namelen,
+					     prev->name, prev->namelen) <= 0)
+				die("fatal - unsorted cache subtree");
+		}
+		convert_one_to_ondisk_v5(table, down->cache_tree, down->name, down->namelen, crc);
+	}
+}
+
+static void cache_tree_to_ondisk_v5(struct hash_table *table, struct cache_tree *root)
+{
+	convert_one_to_ondisk_v5(table, root, "", 0, 0);
+}
+
 static void ce_queue_push(struct cache_entry **head,
 			  struct cache_entry **tail,
 			  struct cache_entry *ce)
@@ -961,6 +1012,8 @@ static struct directory_entry *compile_directory_data(struct index_state *istate
 			add_part_to_conflict_entry(search, conflict_entry, conflict_part);
 		}
 	}
+	if (istate->cache_tree)
+		cache_tree_to_ondisk_v5(&table, istate->cache_tree);
 	return de;
 }
 
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 20/24] read-cache: write resolve-undo data for index-v5
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (18 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 19/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-18 19:42 ` [PATCH v3 21/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Make git read the resolve-undo data from the index.

Since the resolve-undo data is joined with the conflicts in
the ondisk format of the index file version 5, conflicts and
resolved data is read at the same time, and the resolve-undo
data is then converted to the in-memory format.

Helped-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache-v5.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/read-cache-v5.c b/read-cache-v5.c
index ed52b7c..10960fd 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -942,6 +942,94 @@ static void cache_tree_to_ondisk_v5(struct hash_table *table, struct cache_tree
 	convert_one_to_ondisk_v5(table, root, "", 0, 0);
 }
 
+static void resolve_undo_to_ondisk_v5(struct hash_table *table,
+				      struct string_list *resolve_undo,
+				      unsigned int *ndir,
+				      unsigned int *total_dir_len,
+				      struct directory_entry *de)
+{
+	struct string_list_item *item;
+	struct directory_entry *search;
+
+	if (!resolve_undo)
+		return;
+	for_each_string_list_item(item, resolve_undo) {
+		struct conflict_entry *conflict_entry;
+		struct resolve_undo_info *ui = item->util;
+		char *super;
+		int i, dir_len, len;
+		uint32_t crc;
+		struct directory_entry *found, *current, *new_tree;
+
+		if (!ui)
+			continue;
+
+		super = super_directory(item->string);
+		dir_len = super ? strlen(super) : 0;
+		crc = crc32(0, (Bytef*)super, dir_len);
+		found = lookup_hash(crc, table);
+		current = NULL;
+		new_tree = NULL;
+
+		while (!found) {
+			struct directory_entry *new;
+
+			new = init_directory_entry(super, dir_len);
+			if (!current)
+				current = new;
+			insert_directory_entry(new, table, total_dir_len, ndir, crc);
+			if (new_tree != NULL)
+				new->de_nsubtrees = 1;
+			new->next = new_tree;
+			new_tree = new;
+			super = super_directory(super);
+			dir_len = super ? strlen(super) : 0;
+			crc = crc32(0, (Bytef*)super, dir_len);
+			found = lookup_hash(crc, table);
+		}
+		search = found;
+		while (search->next_hash && strcmp(super, search->pathname) != 0)
+			search = search->next_hash;
+		if (search && !current)
+			current = search;
+		if (!search && !current)
+			current = new_tree;
+		if (!super && new_tree) {
+			new_tree->next = de->next;
+			de->next = new_tree;
+			de->de_nsubtrees++;
+		} else if (new_tree) {
+			struct directory_entry *temp;
+
+			search = de->next;
+			while (strcmp(super, search->pathname))
+				search = search->next;
+			temp = new_tree;
+			while (temp->next)
+				temp = temp->next;
+			search->de_nsubtrees++;
+			temp->next = search->next;
+			search->next = new_tree;
+		}
+
+		len = strlen(item->string);
+		conflict_entry = create_new_conflict(item->string, len, current->de_pathlen);
+		add_conflict_to_directory_entry(current, conflict_entry);
+		for (i = 0; i < 3; i++) {
+			if (ui->mode[i]) {
+				struct conflict_part *cp;
+
+				cp = xmalloc(sizeof(struct conflict_part));
+				cp->flags = (i + 1) << CONFLICT_STAGESHIFT;
+				cp->entry_mode = ui->mode[i];
+				cp->next = NULL;
+				hashcpy(cp->sha1, ui->sha1[i]);
+				add_part_to_conflict_entry(current, conflict_entry, cp);
+			}
+		}
+	}
+}
+
 static void ce_queue_push(struct cache_entry **head,
 			  struct cache_entry **tail,
 			  struct cache_entry *ce)
@@ -1014,6 +1102,7 @@ static struct directory_entry *compile_directory_data(struct index_state *istate
 	}
 	if (istate->cache_tree)
 		cache_tree_to_ondisk_v5(&table, istate->cache_tree);
+	resolve_undo_to_ondisk_v5(&table, istate->resolve_undo, ndir, total_dir_len, de);
 	return de;
 }
 
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 21/24] update-index.c: rewrite index when index-version is given
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (19 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 20/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-18 19:42 ` [PATCH v3 22/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Make update-index always rewrite the index when a index-version
is given, even if the index already has the right version.
This option is used for performance testing the writer and
reader.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 builtin/update-index.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index c5bb889..8b3f7a0 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -6,6 +6,7 @@
 #include "cache.h"
 #include "quote.h"
 #include "cache-tree.h"
+#include "read-cache.h"
 #include "tree-walk.h"
 #include "builtin.h"
 #include "refs.h"
@@ -861,8 +862,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			    preferred_index_format,
 			    INDEX_FORMAT_LB, INDEX_FORMAT_UB);
 
-		if (the_index.version != preferred_index_format)
-			active_cache_changed = 1;
+		active_cache_changed = 1;
 		change_cache_version(preferred_index_format);
 	}
 
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 22/24] p0003-index.sh: add perf test for the index formats
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (20 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 21/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-18 19:42 ` [PATCH v3 23/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

From: Thomas Rast <trast@inf.ethz.ch>

Add a performance test for index version [23]/4/5 by using
git update-index --index-version=x, thus testing both the reader
and the writer speed of all index formats.

Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 t/perf/p0003-index.sh | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100755 t/perf/p0003-index.sh

diff --git a/t/perf/p0003-index.sh b/t/perf/p0003-index.sh
new file mode 100755
index 0000000..5360175
--- /dev/null
+++ b/t/perf/p0003-index.sh
@@ -0,0 +1,63 @@
+#!/bin/sh
+
+test_description="Tests index versions [23]/4/5"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success "convert to v3" "
+	git update-index --index-version=2
+"
+
+test_perf "v[23]: update-index" "
+	git update-index --index-version=2 >/dev/null
+"
+
+subdir=$(git ls-files | sed 's#/[^/]*$##' | grep -v '^$' | uniq | tail -n 30 | head -1)
+
+test_perf "v[23]: grep nonexistent -- subdir" "
+	test_must_fail git grep nonexistent -- $subdir >/dev/null
+"
+
+test_perf "v[23]: ls-files -- subdir" "
+	git ls-files $subdir >/dev/null
+"
+
+test_expect_success "convert to v4" "
+	git update-index --index-version=4
+"
+
+test_perf "v4: update-index" "
+	git update-index --index-version=4 >/dev/null
+"
+
+test_perf "v4: grep nonexistent -- subdir" "
+	test_must_fail git grep nonexistent -- $subdir >/dev/null
+"
+
+test_perf "v4: ls-files -- subdir" "
+	git ls-files $subdir >/dev/null
+"
+
+test_expect_success "convert to v5" "
+	git update-index --index-version=5
+"
+
+test_perf "v5: update-index" "
+	git update-index --index-version=5 >/dev/null
+"
+
+test_perf "v5: ls-files" "
+	git ls-files >/dev/null
+"
+
+test_perf "v5: grep nonexistent -- subdir" "
+	test_must_fail git grep nonexistent -- $subdir >/dev/null
+"
+
+test_perf "v5: ls-files -- subdir" "
+	git ls-files $subdir >/dev/null
+"
+
+test_done
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 23/24] introduce GIT_INDEX_VERSION environment variable
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (21 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 22/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-21  0:57   ` Duy Nguyen
  2013-08-18 19:42 ` [PATCH v3 24/24] test-lib: allow setting the index format version Thomas Gummerer
  2013-08-24  4:16 ` [PATCH v3 00/24] Index-v5 Duy Nguyen
  24 siblings, 1 reply; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

Respect a GIT_INDEX_VERSION environment variable, when a new index is
initialized.  Setting the environment variable will not cause existing
index files to be converted to another format for additional safety.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 read-cache.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 1d9b615..f820d8a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1235,8 +1235,13 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
 void initialize_index(struct index_state *istate, int version)
 {
 	istate->initialized = 1;
-	if (!version)
-		version = INDEX_FORMAT_DEFAULT;
+	if (!version) {
+		char *envversion = getenv("GIT_INDEX_VERSION");
+		if (!envversion)
+			version = INDEX_FORMAT_DEFAULT;
+		else
+			version = atoi(envversion);
+	}
 	istate->version = version;
 	set_istate_ops(istate);
 }
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v3 24/24] test-lib: allow setting the index format version
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (22 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 23/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
@ 2013-08-18 19:42 ` Thomas Gummerer
  2013-08-24  4:16 ` [PATCH v3 00/24] Index-v5 Duy Nguyen
  24 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-18 19:42 UTC (permalink / raw)
  To: git
  Cc: trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine,
	ramsay, t.gummerer

When running the test suite, it should be possible to set the default
index format for the tests.  Do that by allowing the user to add a
TEST_GIT_INDEX_VERSION variable in config.mak setting the index version.

If it isn't set, the default version given in the source code is
used (currently version 3).

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
---
 Makefile                | 7 +++++++
 t/test-lib-functions.sh | 5 +++++
 t/test-lib.sh           | 3 +++
 3 files changed, 15 insertions(+)

diff --git a/Makefile b/Makefile
index a55206d..ecae6b8 100644
--- a/Makefile
+++ b/Makefile
@@ -345,6 +345,10 @@ all::
 # Define DEFAULT_HELP_FORMAT to "man", "info" or "html"
 # (defaults to "man") if you want to have a different default when
 # "git help" is called without a parameter specifying the format.
+#
+# Define TESTGIT_INDEX_FORMAT to 2, 3, 4 or 5 to run the test suite
+# with a different indexfile format.  If it isn't set the index file
+# format used is index-v[23].
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -2229,6 +2233,9 @@ endif
 ifdef GIT_PERF_MAKE_OPTS
 	@echo GIT_PERF_MAKE_OPTS=\''$(subst ','\'',$(subst ','\'',$(GIT_PERF_MAKE_OPTS)))'\' >>$@
 endif
+ifdef TEST_GIT_INDEX_VERSION
+	@echo TEST_GIT_INDEX_VERSION='$(subst ','\'',$(subst ','\'',$(TEST_GIT_INDEX_VERSION)))' >>$@
+endif
 
 ### Detect Python interpreter path changes
 ifndef NO_PYTHON
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index a7e9aac..19cdf0b 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -31,6 +31,11 @@ test_set_editor () {
 	export EDITOR
 }
 
+test_set_index_version () {
+    GIT_INDEX_VERSION="$1"
+    export GIT_INDEX_VERSION
+}
+
 test_decode_color () {
 	awk '
 		function name(n) {
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 1aa27bd..9ca41e1 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -104,6 +104,9 @@ export GIT_AUTHOR_EMAIL GIT_AUTHOR_NAME
 export GIT_COMMITTER_EMAIL GIT_COMMITTER_NAME
 export EDITOR
 
+GIT_INDEX_VERSION="$TEST_GIT_INDEX_VERSION"
+export GIT_INDEX_VERSION
+
 # Add libc MALLOC and MALLOC_PERTURB test
 # only if we are not executing the test with valgrind
 if expr " $GIT_TEST_OPTS " : ".* --valgrind " >/dev/null ||
-- 
1.8.3.4.1231.g9fbf354.dirty

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 02/24] read-cache: use fixed width integer types
  2013-08-18 19:41 ` [PATCH v3 02/24] read-cache: use fixed width integer types Thomas Gummerer
@ 2013-08-18 20:21   ` Eric Sunshine
  2013-08-20 19:30   ` Junio C Hamano
  1 sibling, 0 replies; 55+ messages in thread
From: Eric Sunshine @ 2013-08-18 20:21 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Robin Rosenberg,
	Ramsay Jones

On Sun, Aug 18, 2013 at 3:41 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Use the fixed width integer types uint16_t and uint32_t for ondisk
> structures, because unsigned short and unsigned int do not hae a

s/hae/have/

> guaranteed size.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 08/24] add documentation for the index api
  2013-08-18 19:41 ` [PATCH v3 08/24] add documentation for the index api Thomas Gummerer
@ 2013-08-18 20:50   ` Eric Sunshine
  0 siblings, 0 replies; 55+ messages in thread
From: Eric Sunshine @ 2013-08-18 20:50 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, trast, mhagger, gitster, pclouds, robin.rosenberg, ramsay

On Aug 18, 2013, at 3:41 PM, Thomas Gummerer wrote:
> Add documentation for the index reading api.  This also includes
> documentation for the new api functions introduced in the next patch.
> 
> Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
> Documentation/technical/api-in-core-index.txt | 54 +++++++++++++++++++++++++--
> 1 file changed, 50 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/technical/api-in-core-index.txt b/Documentation/technical/api-in-core-index.txt
> index adbdbf5..9b8c37c 100644
> --- a/Documentation/technical/api-in-core-index.txt
> +++ b/Documentation/technical/api-in-core-index.txt
> @@ -1,14 +1,60 @@
> in-core index API
> =================
> 
> +Reading API
> +-----------
> +
> +`cache`::
> +
> +	An array of cache entries.  This is used to access the cache
> +	entries directly.  Use `index_name_pos` to search for the
> +	index of a specific cache entry.
> +
> +`read_index_filtered`::
> +
> +	Read a part of the index, filtered by the pathspec given in
> +	the opts.  The function may load more than necessary, so the
> +	caller still responsible to apply filters appropriately.  The

Grammatical nit: "...the caller is still responsible for applying…"

> +	filtering is only done for performance reasons, as it's
> +	possible to only read part of the index when the on-disk
> +	format is index-v5.
> +
> +	To iterate only over the entries that match the pathspec, use
> +	the for_each_index_entry function.
> +
> +`read_index`::
> +
> +	Read the whole index file from disk.
> +
> +`index_name_pos`::
> +
> +	Find a cache_entry with name in the index.  Returns pos if an
> +	entry is matched exactly and -1-pos if an entry is matched
> +	partially.
> +	e.g.
> +	index:
> +	file1
> +	file2
> +	path/file1
> +	zzz
> +
> +	index_name_pos("path/file1", 10) returns 2, while
> +	index_name_pos("path", 4) returns -3

A couple of these entries won't format correctly. You may want to squash in something like this (sans whitespace damage):

-->8--
diff --git a/Documentation/technical/api-in-core-index.txt b/Documentation/technical/api-in-core-index.txt
index 9b8c37c..d2518c8 100644
--- a/Documentation/technical/api-in-core-index.txt
+++ b/Documentation/technical/api-in-core-index.txt
@@ -18,9 +18,9 @@ Reading API
 	filtering is only done for performance reasons, as it's
 	possible to only read part of the index when the on-disk
 	format is index-v5.
-
-	To iterate only over the entries that match the pathspec, use
-	the for_each_index_entry function.
++
+To iterate only over the entries that match the pathspec, use
+the for_each_index_entry function.
 
 `read_index`::
 
@@ -30,16 +30,18 @@ Reading API
 
 	Find a cache_entry with name in the index.  Returns pos if an
 	entry is matched exactly and -1-pos if an entry is matched
-	partially.
-	e.g.
-	index:
+	partially. e.g.
++
+....
+index:
 	file1
 	file2
 	path/file1
 	zzz
-
-	index_name_pos("path/file1", 10) returns 2, while
-	index_name_pos("path", 4) returns -3
+....
++
+`index_name_pos("path/file1", 10)` returns 2, while
+`index_name_pos("path", 4)` returns -3
 
 `for_each_index_entry`::
 
-- 
1.8.4.rc3.500.gc3113b0
-->8--

> +
> +`for_each_index_entry`::
> +
> +	Iterates over all cache_entries in the index filtered by
> +	filter_opts in the index_state.  For each cache entry fn is
> +	executed with cb_data as callback data.  From within the loop
> +	do `return 0` to continue, or `return 1` to break the loop.
> +
> +TODO
> +----
> Talk about <read-cache.c> and <cache-tree.c>, things like:
> 
> -* cache -> the_index macros
> -* read_index()
> * write_index()
> * ie_match_stat() and ie_modified(); how they are different and when to
>   use which.
> -* index_name_pos()
> * remove_index_entry_at()
> * remove_file_from_index()
> * add_file_to_index()
> @@ -18,4 +64,4 @@ Talk about <read-cache.c> and <cache-tree.c>, things like:
> * cache_tree_invalidate_path()
> * cache_tree_update()
> 
> -(JC, Linus)
> +(JC, Linus, Thomas Gummerer)
> -- 
> 1.8.3.4.1231.g9fbf354.dirty
> 

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 10/24] make sure partially read index is not changed
  2013-08-18 19:41 ` [PATCH v3 10/24] make sure partially read index is not changed Thomas Gummerer
@ 2013-08-18 21:06   ` Eric Sunshine
  2013-08-20  8:46     ` Thomas Gummerer
  0 siblings, 1 reply; 55+ messages in thread
From: Eric Sunshine @ 2013-08-18 21:06 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Robin Rosenberg,
	Ramsay Jones

On Sun, Aug 18, 2013 at 3:41 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> A partially read index file currently cannot be written to disk.  Make
> sure that never happens, by erroring out when a caller tries to write a

s/,//

> partially read index.  Do the same when trying to re-read a partially
> read index without having discarded it first to avoid loosing any

s/loosing/losing/

> information.
>
> Forcing the caller to load the right part of the index file instead of
> re-reading it when changing it, gives a bit of a performance advantage,

s/it,/it/  (or s/file instead/file, instead/)
s/advantage,/advantage/

> by avoiding to read parts of the index twice.

/to read/reading/

More below...

>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  read-cache.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/read-cache.c b/read-cache.c
> index 38b9a04..7a27f9b 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1332,6 +1332,8 @@ int read_index_filtered_from(struct index_state *istate, const char *path,
>         void *mmap;
>         size_t mmap_size;
>
> +       if (istate->filter_opts)
> +               die("BUG: Can't re-read partially read index");
>         errno = EBUSY;
>         if (istate->initialized)
>                 return istate->cache_nr;
> @@ -1455,6 +1457,8 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
>
>  int write_index(struct index_state *istate, int newfd)
>  {
> +       if (istate->filter_opts)
> +               die("BUG: index: cannot write a partially read index");

Consistency nit:

In the preceding hunk, the error message starts "BUG: Can't...", but
in this hunk we have "BUG: index: cannot...".

So, "BUG:" is the prefix of one, but "BUG: index:" is the prefix of the other.

Spelling difference: "Can't" vs. "cannot".

Capitalization difference: "Can't" vs. "cannot".

>         return istate->ops->write_index(istate, newfd);
>  }
>
> --
> 1.8.3.4.1231.g9fbf354.dirty
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/24] read-cache: Don't compare uid, gid and ino on cygwin
  2013-08-18 19:41 ` [PATCH v3 06/24] read-cache: Don't compare uid, gid and ino on cygwin Thomas Gummerer
@ 2013-08-18 22:34   ` Ramsay Jones
  2013-08-20  8:36     ` Thomas Gummerer
  0 siblings, 1 reply; 55+ messages in thread
From: Ramsay Jones @ 2013-08-18 22:34 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine

On 18/08/2013 08:41 PM, Thomas Gummerer wrote:
> Cygwin doesn't have uid, gid and ino stats fields.  Therefore we should
> never check them in the match_stat_data when working on the CYGWIN
> platform.

Hmm, this is simply not true ... ;-)

The need to omit the uid, gid and ino fields from the stat checks in
your original code was caused by the "schizophrenic stat" implementation
in cygwin. (This was also before "core.checkstat" was implemented; note
the 'check_stat' conditional below ...)

However, since commit f66450ae ("cygwin: Remove the Win32 l/stat()
implementation", 22-06-2013), this patch is no longer necessary and
can simply be dropped from this series.

[I have not had time to read your new patches yet, but I seem to remember
being concerned about those platforms which have UNRELIABLE_FSTAT set.
(ie cygwin, MinGW and Windows.)]

ATB,
Ramsay Jones

> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
> 
> This patch was not tested on Cygwin yet.  I think it's needed though,
> because the re-reading of the index if it changed will no longer use
> it's own index_changed function, but use the stat_validity_check
> function instead.  Would be great if someone running Cygwin could test
> this.
> 
>  read-cache.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/read-cache.c b/read-cache.c
> index 1f827de..aa17ce7 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -82,6 +82,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
>  		changed |= CTIME_CHANGED;
>  #endif
>  
> +#if !defined (__CYGWIN__)
>  	if (check_stat) {
>  		if (sd->sd_uid != (unsigned int) st->st_uid ||
>  			sd->sd_gid != (unsigned int) st->st_gid)
> @@ -89,6 +90,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
>  		if (sd->sd_ino != (unsigned int) st->st_ino)
>  			changed |= INODE_CHANGED;
>  	}
> +#endif
>  
>  #ifdef USE_STDEV
>  	/*
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 15/24] read-cache: read index-v5
  2013-08-18 19:42 ` [PATCH v3 15/24] read-cache: read index-v5 Thomas Gummerer
@ 2013-08-19  1:57   ` Eric Sunshine
  2013-08-20 14:01   ` Duy Nguyen
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 55+ messages in thread
From: Eric Sunshine @ 2013-08-19  1:57 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Robin Rosenberg,
	Ramsay Jones

On Sun, Aug 18, 2013 at 3:42 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Make git read the index file version 5 without complaining.
>
> This version of the reader doesn't read neither the cache-tree
> nor the resolve undo data, but doesn't choke on an index that
> includes such data.

The double-negatives are difficult to digest. Grammatical fixup:

-->8--
This version of the reader reads neither the cache-tree
nor the resolve undo data, however, it won't choke on an
index that includes such data.
-->8--

> Helped-by: Junio C Hamano <gitster@pobox.com>
> Helped-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
> Helped-by: Thomas Rast <trast@student.ethz.ch>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 16/24] read-cache: read resolve-undo data
  2013-08-18 19:42 ` [PATCH v3 16/24] read-cache: read resolve-undo data Thomas Gummerer
@ 2013-08-19  1:59   ` Eric Sunshine
  0 siblings, 0 replies; 55+ messages in thread
From: Eric Sunshine @ 2013-08-19  1:59 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Robin Rosenberg,
	Ramsay Jones

On Sun, Aug 18, 2013 at 3:42 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Make git read the resolve-undo data from the index.
>
> Since the resolve-undo data is joined with the conflicts in
> the ondisk format of the index file version 5, conflicts and
> resolved data is read at the same time, and the resolve-undo

s/is/are/

> data is then converted to the in-memory format.
>
> Helped-by: Thomas Rast <trast@student.ethz.ch>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 06/24] read-cache: Don't compare uid, gid and ino on cygwin
  2013-08-18 22:34   ` Ramsay Jones
@ 2013-08-20  8:36     ` Thomas Gummerer
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-20  8:36 UTC (permalink / raw)
  To: Ramsay Jones
  Cc: git, trast, mhagger, gitster, pclouds, robin.rosenberg, sunshine

Ramsay Jones <ramsay@ramsay1.demon.co.uk> writes:

> On 18/08/2013 08:41 PM, Thomas Gummerer wrote:
>> Cygwin doesn't have uid, gid and ino stats fields.  Therefore we should
>> never check them in the match_stat_data when working on the CYGWIN
>> platform.
>
> Hmm, this is simply not true ... ;-)
>
> The need to omit the uid, gid and ino fields from the stat checks in
> your original code was caused by the "schizophrenic stat" implementation
> in cygwin. (This was also before "core.checkstat" was implemented; note
> the 'check_stat' conditional below ...)
>
> However, since commit f66450ae ("cygwin: Remove the Win32 l/stat()
> implementation", 22-06-2013), this patch is no longer necessary and
> can simply be dropped from this series.
>
> [I have not had time to read your new patches yet, but I seem to remember
> being concerned about those platforms which have UNRELIABLE_FSTAT set.
> (ie cygwin, MinGW and Windows.)]

Ah ok, thanks for the clarification.  I misinterpreted your message in
the previous thread, thinking it would still be necessary.  I'll drop
this patch.

I can't recall anything about UNRELIABLE_FSTAT though.

> ATB,
> Ramsay Jones
>
>> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
>> ---
>>
>> This patch was not tested on Cygwin yet.  I think it's needed though,
>> because the re-reading of the index if it changed will no longer use
>> it's own index_changed function, but use the stat_validity_check
>> function instead.  Would be great if someone running Cygwin could test
>> this.
>>
>>  read-cache.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/read-cache.c b/read-cache.c
>> index 1f827de..aa17ce7 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -82,6 +82,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
>>  		changed |= CTIME_CHANGED;
>>  #endif
>>
>> +#if !defined (__CYGWIN__)
>>  	if (check_stat) {
>>  		if (sd->sd_uid != (unsigned int) st->st_uid ||
>>  			sd->sd_gid != (unsigned int) st->st_gid)
>> @@ -89,6 +90,7 @@ int match_stat_data(const struct stat_data *sd, struct stat *st)
>>  		if (sd->sd_ino != (unsigned int) st->st_ino)
>>  			changed |= INODE_CHANGED;
>>  	}
>> +#endif
>>
>>  #ifdef USE_STDEV
>>  	/*
>>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 10/24] make sure partially read index is not changed
  2013-08-18 21:06   ` Eric Sunshine
@ 2013-08-20  8:46     ` Thomas Gummerer
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-20  8:46 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Git List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, Robin Rosenberg,
	Ramsay Jones

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Sun, Aug 18, 2013 at 3:41 PM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> A partially read index file currently cannot be written to disk.  Make
>> sure that never happens, by erroring out when a caller tries to write a
>
> s/,//
>
>> partially read index.  Do the same when trying to re-read a partially
>> read index without having discarded it first to avoid loosing any
>
> s/loosing/losing/
>
>> information.
>>
>> Forcing the caller to load the right part of the index file instead of
>> re-reading it when changing it, gives a bit of a performance advantage,
>
> s/it,/it/  (or s/file instead/file, instead/)
> s/advantage,/advantage/
>
>> by avoiding to read parts of the index twice.
>
> /to read/reading/
>
> More below...
>
>>
>> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
>> ---
>>  read-cache.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/read-cache.c b/read-cache.c
>> index 38b9a04..7a27f9b 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1332,6 +1332,8 @@ int read_index_filtered_from(struct index_state *istate, const char *path,
>>         void *mmap;
>>         size_t mmap_size;
>>
>> +       if (istate->filter_opts)
>> +               die("BUG: Can't re-read partially read index");
>>         errno = EBUSY;
>>         if (istate->initialized)
>>                 return istate->cache_nr;
>> @@ -1455,6 +1457,8 @@ void update_index_if_able(struct index_state *istate, struct lock_file *lockfile
>>
>>  int write_index(struct index_state *istate, int newfd)
>>  {
>> +       if (istate->filter_opts)
>> +               die("BUG: index: cannot write a partially read index");
>
> Consistency nit:
>
> In the preceding hunk, the error message starts "BUG: Can't...", but
> in this hunk we have "BUG: index: cannot...".
>
> So, "BUG:" is the prefix of one, but "BUG: index:" is the prefix of the other.
>
> Spelling difference: "Can't" vs. "cannot".
>
> Capitalization difference: "Can't" vs. "cannot".

Thanks for catching this.  From quick grepping it seems the preferred
version seems to be the one with only "BUG:" as prefix and starting with
a lower case letter after this.  Both can't and cannot are used in the
codebase, but cannot seems to be used more often.  I'll use that.

Will fix this and the rest of the style/spelling/grammar fixes you
suggested.  Thanks.

>>         return istate->ops->write_index(istate, newfd);
>>  }
>>
>> --
>> 1.8.3.4.1231.g9fbf354.dirty
>>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 15/24] read-cache: read index-v5
  2013-08-18 19:42 ` [PATCH v3 15/24] read-cache: read index-v5 Thomas Gummerer
  2013-08-19  1:57   ` Eric Sunshine
@ 2013-08-20 14:01   ` Duy Nguyen
  2013-08-20 20:59     ` Thomas Gummerer
  2013-08-20 14:16   ` Duy Nguyen
  2013-08-23 23:52   ` Duy Nguyen
  3 siblings, 1 reply; 55+ messages in thread
From: Duy Nguyen @ 2013-08-20 14:01 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

General comment: a short comment before each function describing what
the function does would be helpful. This only applies for complex
functions (read_* ones). Of course verify_hdr does not require extra
explanantion.

 On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> +static struct directory_entry *directory_entry_from_ondisk(struct ondisk_directory_entry *ondisk,
> +                                                  const char *name,
> +                                                  size_t len)
> +{
> +       struct directory_entry *de = xmalloc(directory_entry_size(len));
> +
> +       memcpy(de->pathname, name, len);
> +       de->pathname[len] = '\0';
> +       de->de_flags      = ntoh_s(ondisk->flags);
> +       de->de_foffset    = ntoh_l(ondisk->foffset);
> +       de->de_cr         = ntoh_l(ondisk->cr);
> +       de->de_ncr        = ntoh_l(ondisk->ncr);
> +       de->de_nsubtrees  = ntoh_l(ondisk->nsubtrees);
> +       de->de_nfiles     = ntoh_l(ondisk->nfiles);
> +       de->de_nentries   = ntoh_l(ondisk->nentries);
> +       de->de_pathlen    = len;
> +       hashcpy(de->sha1, ondisk->sha1);
> +       return de;
> +}

This function leaves a lot of fields uninitialized..

> +static struct directory_entry *read_directories(unsigned int *dir_offset,
> +                               unsigned int *dir_table_offset,
> +                               void *mmap,
> +                               int mmap_size)
> +{
> ....
> +       de = directory_entry_from_ondisk(disk_de, name, len);
> +       de->next = NULL;
> +       de->sub = NULL;

..and two of them are set to NULL here. Maybe
directory_entry_from_ondisk() could be made to call
init_directory_entry() instead so that we don't need to manually reset
some fields here, which leaves me wondering why other fields are not
important to reset. init_directory_entry() is introduced later in
"write index-v5" patch, you so may want to move it up a few patches.

> +static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
> +                     void *mmap, unsigned long mmap_size,
> +                     unsigned int first_entry_offset,
> +                     unsigned int foffsetblock)
> +{
> +       int len, offset_to_offset;
> +       char *name;
> +       uint32_t foffsetblockcrc, *filecrc, *beginning, *end, entry_offset;
> +       struct ondisk_cache_entry *disk_ce;
> +
> +       beginning = ptr_add(mmap, foffsetblock);
> +       end = ptr_add(mmap, foffsetblock + 4);
> +       len = ntoh_l(*end) - ntoh_l(*beginning) - sizeof(struct ondisk_cache_entry) - 5;

It took me a while to check and figure out " - 5" here means minus NUL
and the crc. A short comment would help. I think there's also another
-5 in read_directories(). Or maybe just rename len to namelen.

> +struct conflict_entry *create_new_conflict(char *name, int len, int pathlen)
> +{
> +       struct conflict_entry *conflict_entry;
> +
> +       if (pathlen)
> +               pathlen++;
> +       conflict_entry = xmalloc(conflict_entry_size(len));
> +       conflict_entry->entries = NULL;
> +       conflict_entry->nfileconflicts = 0;
> +       conflict_entry->namelen = len;
> +       memcpy(conflict_entry->name, name, len);
> +       conflict_entry->name[len] = '\0';
> +       conflict_entry->pathlen = pathlen;
> +       conflict_entry->next = NULL;

A memset followed by memcpy and conflict_entry->pathlen = pathlen
would make this shorter and won't miss new fields added in future.

> +static int read_entries(struct index_state *istate, struct directory_entry *de,
> +                       unsigned int first_entry_offset, void *mmap,
> +                       unsigned long mmap_size, unsigned int *nr,
> +                       unsigned int foffsetblock)
> +{
> +       struct cache_entry *ce;
> +       int i, subdir = 0;
> +
> +       for (i = 0; i < de->de_nfiles; i++) {
> +               unsigned int subdir_foffsetblock = de->de_foffset + foffsetblock + (i * 4);
> +               if (read_entry(&ce, de->pathname, de->de_pathlen, mmap, mmap_size,
> +                              first_entry_offset, subdir_foffsetblock) < 0)
> +                       return -1;

You read one file entry, say abc/def...

> +               while (subdir < de->de_nsubtrees &&
> +                      cache_name_compare(ce->name + de->de_pathlen,
> +                                         ce_namelen(ce) - de->de_pathlen,
> +                                         de->sub[subdir]->pathname + de->de_pathlen,
> +                                         de->sub[subdir]->de_pathlen - de->de_pathlen) > 0) {

Oh right the entry belongs the the substree "abc" so..

> +                       read_entries(istate, de->sub[subdir], first_entry_offset, mmap,
> +                                    mmap_size, nr, foffsetblock);

you recurse in, which will add following entries like abc/def and abc/xyz...

> +                       subdir++;
> +               }
> +               if (!ce)
> +                       continue;
> +               set_index_entry(istate, (*nr)++, ce);

then back here after recusion and add abc/def, again, after abc/xyz.
Did I read this code correctly?

> +       }
> +       for (i = subdir; i < de->de_nsubtrees; i++) {
> +               read_entries(istate, de->sub[i], first_entry_offset, mmap,
> +                            mmap_size, nr, foffsetblock);
> +       }
> +       return 0;
> +}
> +

> +static int read_index_v5(struct index_state *istate, void *mmap,
> +                        unsigned long mmap_size, struct filter_opts *opts)
> +{
> +       unsigned int entry_offset, ndirs, foffsetblock, nr = 0;
> +       struct directory_entry *root_directory, *de, *last_de;
> +       const char **paths = NULL;
> +       struct pathspec adjusted_pathspec;
> +       int need_root = 0, i;
> +
> +       root_directory = read_all_directories(istate, &entry_offset,
> +                                             &foffsetblock, &ndirs,
> +                                             mmap, mmap_size);
> +
> +       if (opts && opts->pathspec && opts->pathspec->nr) {
> +               need_root = 0;

need_root is already initialized at declaration.

> +               paths = xmalloc((opts->pathspec->nr + 1)*sizeof(char *));
> +               paths[opts->pathspec->nr] = NULL;
> +               for (i = 0; i < opts->pathspec->nr; i++) {
> +                       char *super = strdup(opts->pathspec->items[i].match);
> +                       int len = strlen(super);
> +                       while (len && super[len - 1] == '/' && super[len - 2] == '/')
> +                               super[--len] = '\0'; /* strip all but one trailing slash */
> +                       while (len && super[--len] != '/')
> +                               ; /* scan backwards to next / */
> +                       if (len >= 0)
> +                               super[len--] = '\0';
> +                       if (len <= 0) {
> +                               need_root = 1;
> +                               break;
> +                       }
> +                       paths[i] = super;
> +               }
> +       }
> +
> +       if (!need_root)
> +               parse_pathspec(&adjusted_pathspec, PATHSPEC_ALL_MAGIC, PATHSPEC_PREFER_CWD, NULL, paths);
> +
> +       de = root_directory;
> +       last_de = de;
> +       while (de) {
> +               if (need_root ||
> +                   match_pathspec_depth(&adjusted_pathspec, de->pathname, de->de_pathlen, 0, NULL)) {
> +                       if (read_entries(istate, de, entry_offset,
> +                                        mmap, mmap_size, &nr,
> +                                        foffsetblock) < 0)
> +                               return -1;
> +               } else {
> +                       for (i = 0; i < de->de_nsubtrees; i++) {
> +                               last_de->next = de->sub[i];
> +                               last_de = last_de->next;
> +                       }
> +               }
> +               de = de->next;

I'm missing something here. read_entries is a function that reads all
entries inside "de" including subdirectories and the first "de" is
root_directory, which makes it read the whole index in. Because
de->next is only set in this function, de->next after read_entries()
is NULL, which termintates the loop and the else block never runs. It
does not sound right..
-- 
Duy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 15/24] read-cache: read index-v5
  2013-08-18 19:42 ` [PATCH v3 15/24] read-cache: read index-v5 Thomas Gummerer
  2013-08-19  1:57   ` Eric Sunshine
  2013-08-20 14:01   ` Duy Nguyen
@ 2013-08-20 14:16   ` Duy Nguyen
  2013-08-20 21:13     ` Thomas Gummerer
  2013-08-23 23:52   ` Duy Nguyen
  3 siblings, 1 reply; 55+ messages in thread
From: Duy Nguyen @ 2013-08-20 14:16 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> +static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
> +                     void *mmap, unsigned long mmap_size,
> +                     unsigned int first_entry_offset,
> +                     unsigned int foffsetblock)
> +{
> +       int len, offset_to_offset;
> +       char *name;
> +       uint32_t foffsetblockcrc, *filecrc, *beginning, *end, entry_offset;
> +       struct ondisk_cache_entry *disk_ce;
> +
> +       beginning = ptr_add(mmap, foffsetblock);
> +       end = ptr_add(mmap, foffsetblock + 4);
> +       len = ntoh_l(*end) - ntoh_l(*beginning) - sizeof(struct ondisk_cache_entry) - 5;
> +       entry_offset = first_entry_offset + ntoh_l(*beginning);
> +       name = ptr_add(mmap, entry_offset);
> +       disk_ce = ptr_add(mmap, entry_offset + len + 1);
> +       *ce = cache_entry_from_ondisk(disk_ce, pathname, name, len, pathlen);
> +       filecrc = ptr_add(mmap, entry_offset + len + 1 + sizeof(*disk_ce));
> +       offset_to_offset = htonl(foffsetblock);
> +       foffsetblockcrc = crc32(0, (Bytef*)&offset_to_offset, 4);
> +       if (!check_crc32(foffsetblockcrc,
> +               ptr_add(mmap, entry_offset), len + 1 + sizeof(*disk_ce),
> +               ntoh_l(*filecrc)))
> +               return -1;
> +
> +       return 0;
> +}

Last thought before book+bed time. I wonder if moving the name part to
the end of the entry (i.e. chaging on disk format) would simplify this
code. The new ondisk_cache_entry would be something like this

struct ondisk_cache_entry {
   uint16_t flags;
   uint16_t mode;
   struct cache_time mtime;
   uint32_t size;
   int stat_crc;
   unsigned char sha1[20];
   char name[FLEX_ARRAY];
};
-- 
Duy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 02/24] read-cache: use fixed width integer types
  2013-08-18 19:41 ` [PATCH v3 02/24] read-cache: use fixed width integer types Thomas Gummerer
  2013-08-18 20:21   ` Eric Sunshine
@ 2013-08-20 19:30   ` Junio C Hamano
  2013-08-21  3:05     ` Thomas Gummerer
  1 sibling, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2013-08-20 19:30 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, trast, mhagger, pclouds, robin.rosenberg, sunshine, ramsay

Thomas Gummerer <t.gummerer@gmail.com> writes:

> Use the fixed width integer types uint16_t and uint32_t for ondisk
> structures, because unsigned short and unsigned int do not hae a
> guaranteed size.

This sounds like an independent fix to me.  I'd queue this early
independent from the rest of the series.

Thanks.

>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  cache.h      | 10 +++++-----
>  read-cache.c | 30 +++++++++++++++---------------
>  2 files changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index bd6fb9f..9ef778a 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -101,9 +101,9 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
>  
>  #define CACHE_SIGNATURE 0x44495243	/* "DIRC" */
>  struct cache_header {
> -	unsigned int hdr_signature;
> -	unsigned int hdr_version;
> -	unsigned int hdr_entries;
> +	uint32_t hdr_signature;
> +	uint32_t hdr_version;
> +	uint32_t hdr_entries;
>  };
>  
>  #define INDEX_FORMAT_LB 2
> @@ -115,8 +115,8 @@ struct cache_header {
>   * check it for equality in the 32 bits we save.
>   */
>  struct cache_time {
> -	unsigned int sec;
> -	unsigned int nsec;
> +	uint32_t sec;
> +	uint32_t nsec;
>  };
>  
>  struct stat_data {
> diff --git a/read-cache.c b/read-cache.c
> index ceaf207..0df5b31 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1230,14 +1230,14 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
>  struct ondisk_cache_entry {
>  	struct cache_time ctime;
>  	struct cache_time mtime;
> -	unsigned int dev;
> -	unsigned int ino;
> -	unsigned int mode;
> -	unsigned int uid;
> -	unsigned int gid;
> -	unsigned int size;
> +	uint32_t dev;
> +	uint32_t ino;
> +	uint32_t mode;
> +	uint32_t uid;
> +	uint32_t gid;
> +	uint32_t size;
>  	unsigned char sha1[20];
> -	unsigned short flags;
> +	uint16_t flags;
>  	char name[FLEX_ARRAY]; /* more */
>  };
>  
> @@ -1249,15 +1249,15 @@ struct ondisk_cache_entry {
>  struct ondisk_cache_entry_extended {
>  	struct cache_time ctime;
>  	struct cache_time mtime;
> -	unsigned int dev;
> -	unsigned int ino;
> -	unsigned int mode;
> -	unsigned int uid;
> -	unsigned int gid;
> -	unsigned int size;
> +	uint32_t dev;
> +	uint32_t ino;
> +	uint32_t mode;
> +	uint32_t uid;
> +	uint32_t gid;
> +	uint32_t size;
>  	unsigned char sha1[20];
> -	unsigned short flags;
> -	unsigned short flags2;
> +	uint16_t flags;
> +	uint16_t flags2;
>  	char name[FLEX_ARRAY]; /* more */
>  };

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 04/24] read-cache: clear version in discard_index()
  2013-08-18 19:41 ` [PATCH v3 04/24] read-cache: clear version in discard_index() Thomas Gummerer
@ 2013-08-20 19:34   ` Junio C Hamano
  2013-08-21  3:06     ` Thomas Gummerer
  0 siblings, 1 reply; 55+ messages in thread
From: Junio C Hamano @ 2013-08-20 19:34 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: git, trast, mhagger, pclouds, robin.rosenberg, sunshine, ramsay

Thomas Gummerer <t.gummerer@gmail.com> writes:

> All fields except index_state->version are reset in discard_index.
> Reset the version too.

What is the practical consequence of not clearing this field?  I
somehow have a feeling that this was done deliberately, so that we
can stick to the version of the index file format better, once the
user said "update-index --index-version $N" to set it up.  I suspect
that the patch would affect a codepath that does read_cache(), calls
discard_index(), populates the index and then does write_cache().
We stick to the version the user specified earlier in our current
code, while the patched code will revert to whatever default built
into your Git binary, no?

>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  read-cache.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/read-cache.c b/read-cache.c
> index de0bbcd..1e22f6f 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1558,6 +1558,7 @@ int discard_index(struct index_state *istate)
>  	for (i = 0; i < istate->cache_nr; i++)
>  		free(istate->cache[i]);
>  	resolve_undo_clear_index(istate);
> +	istate->version = 0;
>  	istate->cache_nr = 0;
>  	istate->cache_changed = 0;
>  	istate->timestamp.sec = 0;

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 15/24] read-cache: read index-v5
  2013-08-20 14:01   ` Duy Nguyen
@ 2013-08-20 20:59     ` Thomas Gummerer
  2013-08-21  0:44       ` Duy Nguyen
  0 siblings, 1 reply; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-20 20:59 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

Duy Nguyen <pclouds@gmail.com> writes:

> General comment: a short comment before each function describing what
> the function does would be helpful. This only applies for complex
> functions (read_* ones). Of course verify_hdr does not require extra
> explanantion.

Yes, makes sense, I'll do that in the re-roll.

>  On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +static struct directory_entry *directory_entry_from_ondisk(struct ondisk_directory_entry *ondisk,
>> +                                                  const char *name,
>> +                                                  size_t len)
>> +{
>> +       struct directory_entry *de = xmalloc(directory_entry_size(len));
>> +
>> +       memcpy(de->pathname, name, len);
>> +       de->pathname[len] = '\0';
>> +       de->de_flags      = ntoh_s(ondisk->flags);
>> +       de->de_foffset    = ntoh_l(ondisk->foffset);
>> +       de->de_cr         = ntoh_l(ondisk->cr);
>> +       de->de_ncr        = ntoh_l(ondisk->ncr);
>> +       de->de_nsubtrees  = ntoh_l(ondisk->nsubtrees);
>> +       de->de_nfiles     = ntoh_l(ondisk->nfiles);
>> +       de->de_nentries   = ntoh_l(ondisk->nentries);
>> +       de->de_pathlen    = len;
>> +       hashcpy(de->sha1, ondisk->sha1);
>> +       return de;
>> +}
>
> This function leaves a lot of fields uninitialized..
>
>> +static struct directory_entry *read_directories(unsigned int *dir_offset,
>> +                               unsigned int *dir_table_offset,
>> +                               void *mmap,
>> +                               int mmap_size)
>> +{
>> ....
>> +       de = directory_entry_from_ondisk(disk_de, name, len);
>> +       de->next = NULL;
>> +       de->sub = NULL;
>
> ..and two of them are set to NULL here. Maybe
> directory_entry_from_ondisk() could be made to call
> init_directory_entry() instead so that we don't need to manually reset
> some fields here, which leaves me wondering why other fields are not
> important to reset. init_directory_entry() is introduced later in
> "write index-v5" patch, you so may want to move it up a few patches.

The rest of the fields are only used for compiling the data that will be
written.  Using init_directory_entry() here makes sense anyway though,
thanks for the suggestion.

>> +static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
>> +                     void *mmap, unsigned long mmap_size,
>> +                     unsigned int first_entry_offset,
>> +                     unsigned int foffsetblock)
>> +{
>> +       int len, offset_to_offset;
>> +       char *name;
>> +       uint32_t foffsetblockcrc, *filecrc, *beginning, *end, entry_offset;
>> +       struct ondisk_cache_entry *disk_ce;
>> +
>> +       beginning = ptr_add(mmap, foffsetblock);
>> +       end = ptr_add(mmap, foffsetblock + 4);
>> +       len = ntoh_l(*end) - ntoh_l(*beginning) - sizeof(struct ondisk_cache_entry) - 5;
>
> It took me a while to check and figure out " - 5" here means minus NUL
> and the crc. A short comment would help. I think there's also another
> -5 in read_directories(). Or maybe just rename len to namelen.

Will add a short comment.

>> +struct conflict_entry *create_new_conflict(char *name, int len, int pathlen)
>> +{
>> +       struct conflict_entry *conflict_entry;
>> +
>> +       if (pathlen)
>> +               pathlen++;
>> +       conflict_entry = xmalloc(conflict_entry_size(len));
>> +       conflict_entry->entries = NULL;
>> +       conflict_entry->nfileconflicts = 0;
>> +       conflict_entry->namelen = len;
>> +       memcpy(conflict_entry->name, name, len);
>> +       conflict_entry->name[len] = '\0';
>> +       conflict_entry->pathlen = pathlen;
>> +       conflict_entry->next = NULL;
>
> A memset followed by memcpy and conflict_entry->pathlen = pathlen
> would make this shorter and won't miss new fields added in future.

Makes sense, thanks.

>> +static int read_entries(struct index_state *istate, struct directory_entry *de,
>> +                       unsigned int first_entry_offset, void *mmap,
>> +                       unsigned long mmap_size, unsigned int *nr,
>> +                       unsigned int foffsetblock)
>> +{
>> +       struct cache_entry *ce;
>> +       int i, subdir = 0;
>> +
>> +       for (i = 0; i < de->de_nfiles; i++) {
>> +               unsigned int subdir_foffsetblock = de->de_foffset + foffsetblock + (i * 4);
>> +               if (read_entry(&ce, de->pathname, de->de_pathlen, mmap, mmap_size,
>> +                              first_entry_offset, subdir_foffsetblock) < 0)
>> +                       return -1;
>
> You read one file entry, say abc/def...

You're not quite right here.  I'm reading def here, de is the root
directory and de->sub[subdir] is the first sub directory, named abc/

>> +               while (subdir < de->de_nsubtrees &&
>> +                      cache_name_compare(ce->name + de->de_pathlen,
>> +                                         ce_namelen(ce) - de->de_pathlen,
>> +                                         de->sub[subdir]->pathname + de->de_pathlen,
>> +                                         de->sub[subdir]->de_pathlen - de->de_pathlen) > 0) {
>
> Oh right the entry belongs the the substree "abc" so..

abc/ comes before def, so lets read everything in that directory first.

>> +                       read_entries(istate, de->sub[subdir], first_entry_offset, mmap,
>> +                                    mmap_size, nr, foffsetblock);
>
> you recurse in, which will add following entries like abc/def and abc/xyz...

Recurse in, add abc/def and abc/xyz, and increase nr in the recursion,
so the new entry gets added at the right place.

>> +                       subdir++;
>> +               }
>> +               if (!ce)
>> +                       continue;
>> +               set_index_entry(istate, (*nr)++, ce);
>
> then back here after recusion and add abc/def, again, after abc/xyz.
> Did I read this code correctly?

After the recursion add def to at the 3rd position in the index.  After
that it looks like:
abc/def
abc/xyz
def

I hope that makes it a little clearer.

>> +       }
>> +       for (i = subdir; i < de->de_nsubtrees; i++) {
>> +               read_entries(istate, de->sub[i], first_entry_offset, mmap,
>> +                            mmap_size, nr, foffsetblock);
>> +       }
>> +       return 0;
>> +}
>> +
>
>> +static int read_index_v5(struct index_state *istate, void *mmap,
>> +                        unsigned long mmap_size, struct filter_opts *opts)
>> +{
>> +       unsigned int entry_offset, ndirs, foffsetblock, nr = 0;
>> +       struct directory_entry *root_directory, *de, *last_de;
>> +       const char **paths = NULL;
>> +       struct pathspec adjusted_pathspec;
>> +       int need_root = 0, i;
>> +
>> +       root_directory = read_all_directories(istate, &entry_offset,
>> +                                             &foffsetblock, &ndirs,
>> +                                             mmap, mmap_size);
>> +
>> +       if (opts && opts->pathspec && opts->pathspec->nr) {
>> +               need_root = 0;
>
> need_root is already initialized at declaration.

Right, thanks.

>> +               paths = xmalloc((opts->pathspec->nr + 1)*sizeof(char *));
>> +               paths[opts->pathspec->nr] = NULL;
>> +               for (i = 0; i < opts->pathspec->nr; i++) {
>> +                       char *super = strdup(opts->pathspec->items[i].match);
>> +                       int len = strlen(super);
>> +                       while (len && super[len - 1] == '/' && super[len - 2] == '/')
>> +                               super[--len] = '\0'; /* strip all but one trailing slash */
>> +                       while (len && super[--len] != '/')
>> +                               ; /* scan backwards to next / */
>> +                       if (len >= 0)
>> +                               super[len--] = '\0';
>> +                       if (len <= 0) {
>> +                               need_root = 1;
>> +                               break;
>> +                       }
>> +                       paths[i] = super;
>> +               }
>> +       }
>> +
>> +       if (!need_root)
>> +               parse_pathspec(&adjusted_pathspec, PATHSPEC_ALL_MAGIC, PATHSPEC_PREFER_CWD, NULL, paths);
>> +
>> +       de = root_directory;
>> +       last_de = de;
>> +       while (de) {
>> +               if (need_root ||
>> +                   match_pathspec_depth(&adjusted_pathspec, de->pathname, de->de_pathlen, 0, NULL)) {
>> +                       if (read_entries(istate, de, entry_offset,
>> +                                        mmap, mmap_size, &nr,
>> +                                        foffsetblock) < 0)
>> +                               return -1;
>> +               } else {
>> +                       for (i = 0; i < de->de_nsubtrees; i++) {
>> +                               last_de->next = de->sub[i];
>> +                               last_de = last_de->next;
>> +                       }
>> +               }
>> +               de = de->next;
>
> I'm missing something here. read_entries is a function that reads all
> entries inside "de" including subdirectories and the first "de" is
> root_directory, which makes it read the whole index in.

It does, except when the adjusted_pathspec doesn't match the
root_directory.  In that case all the subdirectories of the
root_directory are added to a queue, which will then be iterated over
and tried to match with the adjusted_pathspec.

This has a bug not covered by the test suite described below when
checking against pathspecs with different levels.

> Because
> de->next is only set in this function, de->next after read_entries()
> is NULL, which termintates the loop and the else block never runs. It
> does not sound right..

If the subdirectory is read it does and the loop should terminate,
because the whole index is read.

It does have a bug in the following test case though, which is not
covered by the test suite.  I'll add this and the test and the fix to
the test-suite:

#!/bin/sh

test_description="Test index-v5 specific corner cases"

. ./test-lib.sh

test_set_index_version 5

test_expect_success 'setup' '
	mkdir -p abc/def def &&
	touch abc/def/xyz def/xyz &&
	git add . &&
	git commit -m "test commit"
'

test_expect_success 'ls-files ordering correct' '
	cat <<-\EOF >expected &&
	abc/def/xyz
	def/xyz
	EOF
	git ls-files abc/def/xyz def/xyz >actual &&
	test_cmp expected actual
'

test_done

This can be solved by the following:

diff --git a/read-cache-v5.c b/read-cache-v5.c
index 10960fd..9963d1f 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -661,7 +661,9 @@ static int read_index_v5(struct index_state *istate, void *mmap,
                                         foffsetblock) < 0)
                                return -1;
                } else {
+                       last_de = de;
                        for (i = 0; i < de->de_nsubtrees; i++) {
+                               de->sub[i]->next = last_de->next;
                                last_de->next = de->sub[i];
                                last_de = last_de->next;
                        }

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 15/24] read-cache: read index-v5
  2013-08-20 14:16   ` Duy Nguyen
@ 2013-08-20 21:13     ` Thomas Gummerer
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-20 21:13 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

Duy Nguyen <pclouds@gmail.com> writes:

> On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +static int read_entry(struct cache_entry **ce, char *pathname, size_t pathlen,
>> +                     void *mmap, unsigned long mmap_size,
>> +                     unsigned int first_entry_offset,
>> +                     unsigned int foffsetblock)
>> +{
>> +       int len, offset_to_offset;
>> +       char *name;
>> +       uint32_t foffsetblockcrc, *filecrc, *beginning, *end, entry_offset;
>> +       struct ondisk_cache_entry *disk_ce;
>> +
>> +       beginning = ptr_add(mmap, foffsetblock);
>> +       end = ptr_add(mmap, foffsetblock + 4);
>> +       len = ntoh_l(*end) - ntoh_l(*beginning) - sizeof(struct ondisk_cache_entry) - 5;
>> +       entry_offset = first_entry_offset + ntoh_l(*beginning);
>> +       name = ptr_add(mmap, entry_offset);
>> +       disk_ce = ptr_add(mmap, entry_offset + len + 1);
>> +       *ce = cache_entry_from_ondisk(disk_ce, pathname, name, len, pathlen);
>> +       filecrc = ptr_add(mmap, entry_offset + len + 1 + sizeof(*disk_ce));
>> +       offset_to_offset = htonl(foffsetblock);
>> +       foffsetblockcrc = crc32(0, (Bytef*)&offset_to_offset, 4);
>> +       if (!check_crc32(foffsetblockcrc,
>> +               ptr_add(mmap, entry_offset), len + 1 + sizeof(*disk_ce),
>> +               ntoh_l(*filecrc)))
>> +               return -1;
>> +
>> +       return 0;
>> +}
>
> Last thought before book+bed time. I wonder if moving the name part to
> the end of the entry (i.e. chaging on disk format) would simplify this
> code. The new ondisk_cache_entry would be something like this
>
> struct ondisk_cache_entry {
>    uint16_t flags;
>    uint16_t mode;
>    struct cache_time mtime;
>    uint32_t size;
>    int stat_crc;
>    unsigned char sha1[20];
>    char name[FLEX_ARRAY];
> };

I think it simplifies it a bit, but not too much, the only thing I see
avoiding the use of the name variable.  I think it will also simplify
the writing code a bit.  The only negative part would be for bisecting
the index, but that would still be possible, and only slightly more
complicated.  I'll give it a try tomorrow and check if it's worth it.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 15/24] read-cache: read index-v5
  2013-08-20 20:59     ` Thomas Gummerer
@ 2013-08-21  0:44       ` Duy Nguyen
  0 siblings, 0 replies; 55+ messages in thread
From: Duy Nguyen @ 2013-08-21  0:44 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Wed, Aug 21, 2013 at 3:59 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>>> +static int read_entries(struct index_state *istate, struct directory_entry *de,
>>> +                       unsigned int first_entry_offset, void *mmap,
>>> +                       unsigned long mmap_size, unsigned int *nr,
>>> +                       unsigned int foffsetblock)
>>> +{
>>> +       struct cache_entry *ce;
>>> +       int i, subdir = 0;
>>> +
>>> +       for (i = 0; i < de->de_nfiles; i++) {
>>> +               unsigned int subdir_foffsetblock = de->de_foffset + foffsetblock + (i * 4);
>>> +               if (read_entry(&ce, de->pathname, de->de_pathlen, mmap, mmap_size,
>>> +                              first_entry_offset, subdir_foffsetblock) < 0)
>>> +                       return -1;
>>
>> You read one file entry, say abc/def...
>
> You're not quite right here.  I'm reading def here, de is the root
> directory and de->sub[subdir] is the first sub directory, named abc/
>
>>> +               while (subdir < de->de_nsubtrees &&
>>> +                      cache_name_compare(ce->name + de->de_pathlen,
>>> +                                         ce_namelen(ce) - de->de_pathlen,
>>> +                                         de->sub[subdir]->pathname + de->de_pathlen,
>>> +                                         de->sub[subdir]->de_pathlen - de->de_pathlen) > 0) {
>>
>> Oh right the entry belongs the the substree "abc" so..
>
> abc/ comes before def, so lets read everything in that directory first.
>
>>> +                       read_entries(istate, de->sub[subdir], first_entry_offset, mmap,
>>> +                                    mmap_size, nr, foffsetblock);
>>
>> you recurse in, which will add following entries like abc/def and abc/xyz...
>
> Recurse in, add abc/def and abc/xyz, and increase nr in the recursion,
> so the new entry gets added at the right place.
>
>>> +                       subdir++;
>>> +               }
>>> +               if (!ce)
>>> +                       continue;
>>> +               set_index_entry(istate, (*nr)++, ce);
>>
>> then back here after recusion and add abc/def, again, after abc/xyz.
>> Did I read this code correctly?
>
> After the recursion add def to at the 3rd position in the index.  After
> that it looks like:
> abc/def
> abc/xyz
> def
>
> I hope that makes it a little clearer.

It does. Thanks.

>>> +       de = root_directory;
>>> +       last_de = de;
>>> +       while (de) {
>>> +               if (need_root ||
>>> +                   match_pathspec_depth(&adjusted_pathspec, de->pathname, de->de_pathlen, 0, NULL)) {
>>> +                       if (read_entries(istate, de, entry_offset,
>>> +                                        mmap, mmap_size, &nr,
>>> +                                        foffsetblock) < 0)
>>> +                               return -1;
>>> +               } else {
>>> +                       for (i = 0; i < de->de_nsubtrees; i++) {
>>> +                               last_de->next = de->sub[i];
>>> +                               last_de = last_de->next;
>>> +                       }
>>> +               }
>>> +               de = de->next;
>>
>> I'm missing something here. read_entries is a function that reads all
>> entries inside "de" including subdirectories and the first "de" is
>> root_directory, which makes it read the whole index in.
>
> It does, except when the adjusted_pathspec doesn't match the
> root_directory.  In that case all the subdirectories of the
> root_directory are added to a queue, which will then be iterated over
> and tried to match with the adjusted_pathspec.

That's what I missed. Thanks.
-- 
Duy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 23/24] introduce GIT_INDEX_VERSION environment variable
  2013-08-18 19:42 ` [PATCH v3 23/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
@ 2013-08-21  0:57   ` Duy Nguyen
  2013-08-21  4:01     ` Thomas Gummerer
  0 siblings, 1 reply; 55+ messages in thread
From: Duy Nguyen @ 2013-08-21  0:57 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Respect a GIT_INDEX_VERSION environment variable, when a new index is
> initialized.  Setting the environment variable will not cause existing
> index files to be converted to another format for additional safety.
>
> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
> ---
>  read-cache.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)

There should be a line or two about this variable in git.txt, section
"Environment variables". We could even have core.defaultIndexVersion
for people who don't want to set environment variables (and set this
key in ~/.gitconfig instead) but this is not important now.
-- 
Duy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 02/24] read-cache: use fixed width integer types
  2013-08-20 19:30   ` Junio C Hamano
@ 2013-08-21  3:05     ` Thomas Gummerer
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-21  3:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, trast, mhagger, pclouds, robin.rosenberg, sunshine, ramsay

Junio C Hamano <gitster@pobox.com> writes:

> Thomas Gummerer <t.gummerer@gmail.com> writes:
>
>> Use the fixed width integer types uint16_t and uint32_t for ondisk
>> structures, because unsigned short and unsigned int do not hae a
>> guaranteed size.
>
> This sounds like an independent fix to me.  I'd queue this early
> independent from the rest of the series.
>
> Thanks.

Sounds good to me.  Thanks.

>>
>> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
>> ---
>>  cache.h      | 10 +++++-----
>>  read-cache.c | 30 +++++++++++++++---------------
>>  2 files changed, 20 insertions(+), 20 deletions(-)
>>
>> diff --git a/cache.h b/cache.h
>> index bd6fb9f..9ef778a 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -101,9 +101,9 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
>>  
>>  #define CACHE_SIGNATURE 0x44495243	/* "DIRC" */
>>  struct cache_header {
>> -	unsigned int hdr_signature;
>> -	unsigned int hdr_version;
>> -	unsigned int hdr_entries;
>> +	uint32_t hdr_signature;
>> +	uint32_t hdr_version;
>> +	uint32_t hdr_entries;
>>  };
>>  
>>  #define INDEX_FORMAT_LB 2
>> @@ -115,8 +115,8 @@ struct cache_header {
>>   * check it for equality in the 32 bits we save.
>>   */
>>  struct cache_time {
>> -	unsigned int sec;
>> -	unsigned int nsec;
>> +	uint32_t sec;
>> +	uint32_t nsec;
>>  };
>>  
>>  struct stat_data {
>> diff --git a/read-cache.c b/read-cache.c
>> index ceaf207..0df5b31 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1230,14 +1230,14 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce, int reall
>>  struct ondisk_cache_entry {
>>  	struct cache_time ctime;
>>  	struct cache_time mtime;
>> -	unsigned int dev;
>> -	unsigned int ino;
>> -	unsigned int mode;
>> -	unsigned int uid;
>> -	unsigned int gid;
>> -	unsigned int size;
>> +	uint32_t dev;
>> +	uint32_t ino;
>> +	uint32_t mode;
>> +	uint32_t uid;
>> +	uint32_t gid;
>> +	uint32_t size;
>>  	unsigned char sha1[20];
>> -	unsigned short flags;
>> +	uint16_t flags;
>>  	char name[FLEX_ARRAY]; /* more */
>>  };
>>  
>> @@ -1249,15 +1249,15 @@ struct ondisk_cache_entry {
>>  struct ondisk_cache_entry_extended {
>>  	struct cache_time ctime;
>>  	struct cache_time mtime;
>> -	unsigned int dev;
>> -	unsigned int ino;
>> -	unsigned int mode;
>> -	unsigned int uid;
>> -	unsigned int gid;
>> -	unsigned int size;
>> +	uint32_t dev;
>> +	uint32_t ino;
>> +	uint32_t mode;
>> +	uint32_t uid;
>> +	uint32_t gid;
>> +	uint32_t size;
>>  	unsigned char sha1[20];
>> -	unsigned short flags;
>> -	unsigned short flags2;
>> +	uint16_t flags;
>> +	uint16_t flags2;
>>  	char name[FLEX_ARRAY]; /* more */
>>  };

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 04/24] read-cache: clear version in discard_index()
  2013-08-20 19:34   ` Junio C Hamano
@ 2013-08-21  3:06     ` Thomas Gummerer
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-21  3:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, trast, mhagger, pclouds, robin.rosenberg, sunshine, ramsay

Junio C Hamano <gitster@pobox.com> writes:

> Thomas Gummerer <t.gummerer@gmail.com> writes:
>
>> All fields except index_state->version are reset in discard_index.
>> Reset the version too.
>
> What is the practical consequence of not clearing this field?  I
> somehow have a feeling that this was done deliberately, so that we
> can stick to the version of the index file format better, once the
> user said "update-index --index-version $N" to set it up.  I suspect
> that the patch would affect a codepath that does read_cache(), calls
> discard_index(), populates the index and then does write_cache().
> We stick to the version the user specified earlier in our current
> code, while the patched code will revert to whatever default built
> into your Git binary, no?

Yeah you're right, I missed that use-case.  I'll drop this patch from
the re-roll.  Sorry for the noise.

>>
>> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
>> ---
>>  read-cache.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/read-cache.c b/read-cache.c
>> index de0bbcd..1e22f6f 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1558,6 +1558,7 @@ int discard_index(struct index_state *istate)
>>  	for (i = 0; i < istate->cache_nr; i++)
>>  		free(istate->cache[i]);
>>  	resolve_undo_clear_index(istate);
>> +	istate->version = 0;
>>  	istate->cache_nr = 0;
>>  	istate->cache_changed = 0;
>>  	istate->timestamp.sec = 0;

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 23/24] introduce GIT_INDEX_VERSION environment variable
  2013-08-21  0:57   ` Duy Nguyen
@ 2013-08-21  4:01     ` Thomas Gummerer
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-21  4:01 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

Duy Nguyen <pclouds@gmail.com> writes:

> On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> Respect a GIT_INDEX_VERSION environment variable, when a new index is
>> initialized.  Setting the environment variable will not cause existing
>> index files to be converted to another format for additional safety.
>>
>> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
>> ---
>>  read-cache.c | 9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> There should be a line or two about this variable in git.txt, section
> "Environment variables". We could even have core.defaultIndexVersion
> for people who don't want to set environment variables (and set this
> key in ~/.gitconfig instead) but this is not important now.

Ok, I'll add it in git.txt.  I agree, core.defaultIndexVersion can still
be done in a follow-up patch, the environment variable is the important
thing now because it's used for testing.  Existing repositories have to
be converted with git-update-index anyway.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 15/24] read-cache: read index-v5
  2013-08-18 19:42 ` [PATCH v3 15/24] read-cache: read index-v5 Thomas Gummerer
                     ` (2 preceding siblings ...)
  2013-08-20 14:16   ` Duy Nguyen
@ 2013-08-23 23:52   ` Duy Nguyen
  3 siblings, 0 replies; 55+ messages in thread
From: Duy Nguyen @ 2013-08-23 23:52 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

Nit, add_part_to_conflict_entry(), create_new_conflict() and related
structures/macros are not used in this patch. The first caller is in
the next patch (read resolve-undo data).
--
Duy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 17/24] read-cache: read cache-tree in index-v5
  2013-08-18 19:42 ` [PATCH v3 17/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
@ 2013-08-24  0:09   ` Duy Nguyen
  2013-11-25 15:41     ` Thomas Gummerer
  0 siblings, 1 reply; 55+ messages in thread
From: Duy Nguyen @ 2013-08-24  0:09 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> +/*
> + * This function modifies the directory argument that is given to it.
> + * Don't use it if the directory entries are still needed after.
> + */

There goes my hope of keeping directory_entry* in core so that at
write-time, after validation, we only need to recreate some trees
instead of all of them..

Or we could make cache-tree keep references to directory_entry. If a
cache-tree is not invalidated, then the attached directory_tree should
be reused..

> +static struct cache_tree *cache_tree_convert_v5(struct directory_entry *de)
> +{
> +       if (!de->de_nentries)
> +               return NULL;
> +       sort_directories(de);
> +       return convert_one(de);
> +}
> +
>  static int read_entries(struct index_state *istate, struct directory_entry *de,
>                         unsigned int first_entry_offset, void *mmap,
>                         unsigned long mmap_size, unsigned int *nr,
> @@ -591,6 +668,7 @@ static int read_index_v5(struct index_state *istate, void *mmap,
>                 }
>                 de = de->next;
>         }
> +       istate->cache_tree = cache_tree_convert_v5(root_directory);
>         istate->cache_nr = nr;
>         return 0;
>  }

Otherwise we do need to free root_directory down to the deepest
subtrees, I think. People have been complaining about read-cache
leaking memory like mad, so this is a real issue. Even if you keep
references in cache-tree, you still need to free it
cache_tree_invalidate_path() to avoid leaking
-- 
Duy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 18/24] read-cache: write index-v5
  2013-08-18 19:42 ` [PATCH v3 18/24] read-cache: write index-v5 Thomas Gummerer
@ 2013-08-24  3:58   ` Duy Nguyen
  2013-11-25 15:37     ` Thomas Gummerer
  2013-08-24  4:07   ` Duy Nguyen
  1 sibling, 1 reply; 55+ messages in thread
From: Duy Nguyen @ 2013-08-24  3:58 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> +char *super_directory(const char *filename)
> +{
> +       char *slash;
> +
> +       slash = strrchr(filename, '/');
> +       if (slash)
> +               return xmemdupz(filename, slash-filename);
> +       return NULL;
> +}
> +

Why is this function not static? There are a few other in this patch
(I did not notice in others, but I wasn't looking for them..)

But isn't this what dirname() is?

Another point about this function is it returns a new allocated
string, but I see no free() anywhere in this patch. Leak alert!

> +struct directory_entry *init_directory_entry(char *pathname, int len)
> +{
> +       struct directory_entry *de = xmalloc(directory_entry_size(len));
> +
> +       memcpy(de->pathname, pathname, len);
> +       de->pathname[len] = '\0';
> +       de->de_flags      = 0;
> +       de->de_foffset    = 0;
> +       de->de_cr         = 0;
> +       de->de_ncr        = 0;
> +       de->de_nsubtrees  = 0;
> +       de->de_nfiles     = 0;
> +       de->de_nentries   = 0;
> +       memset(de->sha1, 0, 20);
> +       de->de_pathlen    = len;
> +       de->next          = NULL;
> +       de->next_hash     = NULL;
> +       de->ce            = NULL;
> +       de->ce_last       = NULL;
> +       de->conflict      = NULL;
> +       de->conflict_last = NULL;
> +       de->conflict_size = 0;
> +       return de;
> +}

I think this function could be shortened to

struct directory_entry *de = xcalloc(1, directory_entry_size(len));
memcpy(de->pathname, pathname, len);
de->de_pathlen = len;
return de;

> +static struct directory_entry *get_directory(char *dir, unsigned int dir_len,
> +                                            struct hash_table *table,
> +                                            unsigned int *total_dir_len,
> +                                            unsigned int *ndir,
> +                                            struct directory_entry **current)
> +{
> +       struct directory_entry *tmp = NULL, *search, *new, *ret;
> +       uint32_t crc;
> +
> +       search = find_directory(dir, dir_len, &crc, table);
> +       if (search)
> +               return search;
> +       while (!search) {
> +               new = init_directory_entry(dir, dir_len);
> +               insert_directory_entry(new, table, total_dir_len, ndir, crc);
> +               if (!tmp)
> +                       ret = new;
> +               else
> +                       new->de_nsubtrees = 1;
> +               new->next = tmp;
> +               tmp = new;
> +               dir = super_directory(dir);

It feels more natural to create directory_entry(s) from parent to
subdir. If you do so you could reset dir to the remaining of directory
and perform strchr() and do not need to allocate new memory everytime
you call super_directory (because it relies on NUL at the end of the
string).

> +               dir_len = dir ? strlen(dir) : 0;
> +               search = find_directory(dir, dir_len, &crc, table);
> +       }
> +       search->de_nsubtrees++;
> +       (*current)->next = tmp;
> +       while ((*current)->next)
> +               *current = (*current)->next;
> +
> +       return ret;
> +}

> +static struct directory_entry *compile_directory_data(struct index_state *istate,
> +                                                     int nfile,
> +                                                     unsigned int *ndir,
> +                                                     unsigned int *total_dir_len,
> +                                                     unsigned int *total_file_len)
> +{
> +       int i, dir_len = -1;
> +       char *dir;
> +       struct directory_entry *de, *current, *search;
> +       struct cache_entry **cache = istate->cache;
> +       struct conflict_entry *conflict_entry;
> +       struct hash_table table;
> +       uint32_t crc;
> +
> +       init_hash(&table);
> +       de = init_directory_entry("", 0);
> +       current = de;
> +       *ndir = 1;
> +       *total_dir_len = 1;
> +       crc = crc32(0, (Bytef*)de->pathname, de->de_pathlen);
> +       insert_hash(crc, de, &table);
> +       conflict_entry = NULL;
> +       for (i = 0; i < nfile; i++) {
> +               if (cache[i]->ce_flags & CE_REMOVE)
> +                       continue;
> +
> +               if (dir_len < 0
> +                   || cache[i]->name[dir_len] != '/'

Need a check to make sure name[dir_len] is not out of bound

> +                   || strchr(cache[i]->name + dir_len + 1, '/')
> +                   || cache_name_compare(cache[i]->name, ce_namelen(cache[i]),
> +                                         dir, dir_len)) {

In my opinon, "if (dir_len < 0 || !(must && be && a && subdirectory))"
is easier to read..

> +                       dir = super_directory(cache[i]->name);
> +                       dir_len = dir ? strlen(dir) : 0;
> +                       search = get_directory(dir, dir_len, &table,
> +                                              total_dir_len, ndir,
> +                                              &current);
> +               }
> +               search->de_nfiles++;
> +               *total_file_len += ce_namelen(cache[i]) + 1;
> +               if (search->de_pathlen)
> +                       *total_file_len -= search->de_pathlen + 1;
> +               ce_queue_push(&(search->ce), &(search->ce_last), cache[i]);
> +
> +               if (ce_stage(cache[i]) > 0) {
> +                       struct conflict_part *conflict_part;
> +                       if (!conflict_entry ||
> +                           cache_name_compare(conflict_entry->name, conflict_entry->namelen,
> +                                              cache[i]->name, ce_namelen(cache[i]))) {
> +                               conflict_entry = create_conflict_entry_from_ce(cache[i], search->de_pathlen);
> +                               add_conflict_to_directory_entry(search, conflict_entry);
> +                       }
> +                       conflict_part = conflict_part_from_inmemory(cache[i]);
> +                       add_part_to_conflict_entry(search, conflict_entry, conflict_part);
> +               }
> +       }
> +       return de;
> +}
> +

> +static int write_directories(struct directory_entry *de, int fd, int conflict_offset)
> +{
> +       struct directory_entry *current;
> +       struct ondisk_directory_entry ondisk;
> +       int current_offset, offset_write, ondisk_size, foffset;
> +       uint32_t crc;
> +
> +       /*
> +        * This is needed because the compiler aligns structs to sizes multiple
> +        * of 4
> +        */
> +       ondisk_size = sizeof(ondisk.flags)
> +               + sizeof(ondisk.foffset)
> +               + sizeof(ondisk.cr)
> +               + sizeof(ondisk.ncr)
> +               + sizeof(ondisk.nsubtrees)
> +               + sizeof(ondisk.nfiles)
> +               + sizeof(ondisk.nentries)
> +               + sizeof(ondisk.sha1);
> +       current = de;
> +       current_offset = 0;
> +       foffset = 0;
> +       while (current) {
> +               int pathlen;
> +
> +               offset_write = htonl(current_offset);
> +               if (ce_write(NULL, fd, &offset_write, 4) < 0)
> +                       return -1;
> +               if (current->de_pathlen == 0)
> +                       pathlen = 0;
> +               else
> +                       pathlen = current->de_pathlen + 1;
> +               current_offset += pathlen + 1 + ondisk_size + 4;
> +               current = current->next;
> +       }
> +       /*
> +        * Write one more offset, which points to the end of the entries,
> +        * because we use it for calculating the dir length, instead of
> +        * using strlen.
> +        */
> +       offset_write = htonl(current_offset);
> +       if (ce_write(NULL, fd, &offset_write, 4) < 0)
> +               return -1;
> +       current = de;
> +       while (current) {
> +               crc = 0;
> +               if (current->de_pathlen == 0) {
> +                       if (ce_write(&crc, fd, current->pathname, 1) < 0)
> +                               return -1;
> +               } else {
> +                       char *path;
> +                       path = xmalloc(sizeof(char) * (current->de_pathlen + 2));
> +                       memcpy(path, current->pathname, current->de_pathlen);
> +                       memcpy(path + current->de_pathlen, "/\0", 2);
> +                       if (ce_write(&crc, fd, path, current->de_pathlen + 2) < 0)
> +                               return -1;

xmalloc without free

> +               }
> +               current->de_foffset = foffset;
> +               current->de_cr = conflict_offset;
> +               ondisk_from_directory_entry(current, &ondisk);
> +               if (ce_write(&crc, fd, &ondisk, ondisk_size) < 0)
> +                       return -1;
> +               crc = htonl(crc);
> +               if (ce_write(NULL, fd, &crc, 4) < 0)
> +                       return -1;
> +               conflict_offset += current->conflict_size;
> +               foffset += current->de_nfiles * 4;
> +               current = current->next;
> +       }
> +       return 0;
> +}
> +
> +static int write_entries(struct index_state *istate,
> +                           struct directory_entry *de,
> +                           int entries,
> +                           int fd,
> +                           int offset_to_offset)
> +{
> +       int offset, offset_write, ondisk_size;
> +       struct directory_entry *current;
> +
> +       offset = 0;
> +       ondisk_size = sizeof(struct ondisk_cache_entry);
> +       current = de;
> +       while (current) {

A short comment a the beginning of this block saying this writes
fileoffsets table would be nice.

> +               int pathlen;
> +               struct cache_entry *ce = current->ce;
> +
> +               if (current->de_pathlen == 0)
> +                       pathlen = 0;
> +               else
> +                       pathlen = current->de_pathlen + 1;
> +               while (ce) {
> +                       if (ce->ce_flags & CE_REMOVE)
> +                               continue;

How come CE_REMOVE'd entries get here? I thought they were all ignored
at compile_directory_data()

> +                       if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
> +                               ce_smudge_racily_clean_entry(ce);
> +                       if (is_null_sha1(ce->sha1))
> +                               return error("cache entry has null sha1: %s", ce->name);
> +
> +                       offset_write = htonl(offset);
> +                       if (ce_write(NULL, fd, &offset_write, 4) < 0)
> +                               return -1;
> +                       offset += ce_namelen(ce) - pathlen + 1 + ondisk_size + 4;
> +                       ce = ce->next_ce;
> +               }
> +               current = current->next;
> +       }
> +       /*
> +        * Write one more offset, which points to the end of the entries,
> +        * because we use it for calculating the file length, instead of
> +        * using strlen.
> +        */
> +       offset_write = htonl(offset);
> +       if (ce_write(NULL, fd, &offset_write, 4) < 0)
> +               return -1;
> +
> +       offset = offset_to_offset;
> +       current = de;
> +       while (current) {
> +               int pathlen;
> +               struct cache_entry *ce = current->ce;
> +
> +               if (current->de_pathlen == 0)
> +                       pathlen = 0;
> +               else
> +                       pathlen = current->de_pathlen + 1;
> +               while (ce) {
> +                       struct ondisk_cache_entry ondisk;
> +                       uint32_t crc, calc_crc;
> +
> +                       if (ce->ce_flags & CE_REMOVE)
> +                               continue;
> +                       calc_crc = htonl(offset);
> +                       crc = crc32(0, (Bytef*)&calc_crc, 4);
> +                       if (ce_write(&crc, fd, ce->name + pathlen,
> +                                       ce_namelen(ce) - pathlen + 1) < 0)
> +                               return -1;
> +                       ondisk_from_cache_entry(ce, &ondisk);
> +                       if (ce_write(&crc, fd, &ondisk, ondisk_size) < 0)
> +                               return -1;
> +                       crc = htonl(crc);
> +                       if (ce_write(NULL, fd, &crc, 4) < 0)
> +                               return -1;
> +                       offset += 4;
> +                       ce = ce->next_ce;
> +               }
> +               current = current->next;
> +       }
> +       return 0;
> +}

> +static int write_index_v5(struct index_state *istate, int newfd)
> +{
> +       struct cache_header hdr;
> +       struct cache_header_v5 hdr_v5;
> +       struct cache_entry **cache = istate->cache;
> +       struct directory_entry *de;
> +       struct ondisk_directory_entry *ondisk;
> +       unsigned int entries = istate->cache_nr;
> +       unsigned int i, removed, total_dir_len, ondisk_directory_size;
> +       unsigned int total_file_len, conflict_offset, foffsetblock;
> +       unsigned int ndir;
> +       uint32_t crc;
> +
> +       if (istate->filter_opts)
> +               die("BUG: index: cannot write a partially read index");
> +
> +       for (i = removed = 0; i < entries; i++) {
> +               if (cache[i]->ce_flags & CE_REMOVE)
> +                       removed++;
> +       }
> +       hdr.hdr_signature = htonl(CACHE_SIGNATURE);
> +       hdr.hdr_version = htonl(istate->version);
> +       hdr.hdr_entries = htonl(entries - removed);
> +       hdr_v5.hdr_nextension = htonl(0); /* Currently no extensions are supported */
> +
> +       total_dir_len = 0;
> +       total_file_len = 0;
> +       de = compile_directory_data(istate, entries, &ndir,
> +                                   &total_dir_len, &total_file_len);
> +       hdr_v5.hdr_ndir = htonl(ndir);
> +
> +       /*
> +        * This is needed because the compiler aligns structs to sizes multipe
> +        * of 4
> +        */
> +       ondisk_directory_size = sizeof(ondisk->flags)
> +               + sizeof(ondisk->foffset)
> +               + sizeof(ondisk->cr)
> +               + sizeof(ondisk->ncr)
> +               + sizeof(ondisk->nsubtrees)
> +               + sizeof(ondisk->nfiles)
> +               + sizeof(ondisk->nentries)
> +               + sizeof(ondisk->sha1);

There is a similar statement in read code. This calls for a macro to
share this sum.

> +       foffsetblock = sizeof(hdr) + sizeof(hdr_v5) + 4
> +               + (ndir + 1) * 4
> +               + total_dir_len
> +               + ndir * (ondisk_directory_size + 4);
> +       hdr_v5.hdr_fblockoffset = htonl(foffsetblock + (entries - removed + 1) * 4);
> +       crc = 0;
> +       if (ce_write(&crc, newfd, &hdr, sizeof(hdr)) < 0)
> +               return -1;
> +       if (ce_write(&crc, newfd, &hdr_v5, sizeof(hdr_v5)) < 0)
> +               return -1;
> +       crc = htonl(crc);
> +       if (ce_write(NULL, newfd, &crc, 4) < 0)
> +               return -1;
> +
> +       conflict_offset = foffsetblock +
> +               + (entries - removed + 1) * 4
> +               + total_file_len
> +               + (entries - removed) * (sizeof(struct ondisk_cache_entry) + 4);
> +       if (write_directories(de, newfd, conflict_offset) < 0)
> +               return -1;
> +       if (write_entries(istate, de, entries, newfd, foffsetblock) < 0)
> +               return -1;
> +       if (write_conflicts(istate, de, newfd) < 0)
> +               return -1;
> +       return ce_flush(newfd);
> +}
-- 
Duy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 18/24] read-cache: write index-v5
  2013-08-18 19:42 ` [PATCH v3 18/24] read-cache: write index-v5 Thomas Gummerer
  2013-08-24  3:58   ` Duy Nguyen
@ 2013-08-24  4:07   ` Duy Nguyen
  2013-08-24  9:56     ` Duy Nguyen
  1 sibling, 1 reply; 55+ messages in thread
From: Duy Nguyen @ 2013-08-24  4:07 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Write the index version 5 file format to disk. This version doesn't
> write the cache-tree data and resolve-undo data to the file.

I keep having things to add after sending my emails. Now that we have
all conflicted entries in file block, the conflict data block becomes
optional, it functions exactly (I think) like resolve-undo extension,
which makes me think it might make sense to make conflict data block
an extension.

If we make it so, we might want to move "cr" and "ncr" fields out of
direntries. I don't see a solution yet, but I think it's interesting
because future extensions might want to attach stuff to direntries,
just like "cr"/"ncr" from conflict extension. We may want to think now
how that might be done (and conflict extension is a good exercise to
see how it works out)
-- 
Duy
I

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 00/24] Index-v5
  2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
                   ` (23 preceding siblings ...)
  2013-08-18 19:42 ` [PATCH v3 24/24] test-lib: allow setting the index format version Thomas Gummerer
@ 2013-08-24  4:16 ` Duy Nguyen
  2013-08-25  3:07   ` Junio C Hamano
  24 siblings, 1 reply; 55+ messages in thread
From: Duy Nguyen @ 2013-08-24  4:16 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
> Hi,
>
> previous rounds (without api) are at $gmane/202752, $gmane/202923,
> $gmane/203088 and $gmane/203517, the previous rounds with api were at
> $gmane/229732 and $gmane/230210.  Thanks to Duy for reviewing the the
> last round and Junio and Ramsay for additional comments.

I'm done reviewing this version (I neglected the extension writing
patches because after spending hours on the main write patch I don't
want to look at them anymore :p). Now that rc period is over, with a
partial write proof-of-concept, I think it's enough to call Junio's
attention on the series, see if we have any chance of merging it. The
partial write POC is needed to make sure we don't overlook anything,
just support update-index is enough.
-- 
Duy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 18/24] read-cache: write index-v5
  2013-08-24  4:07   ` Duy Nguyen
@ 2013-08-24  9:56     ` Duy Nguyen
  0 siblings, 0 replies; 55+ messages in thread
From: Duy Nguyen @ 2013-08-24  9:56 UTC (permalink / raw)
  To: Thomas Gummerer
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Sat, Aug 24, 2013 at 11:07 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> Write the index version 5 file format to disk. This version doesn't
>> write the cache-tree data and resolve-undo data to the file.
>
> I keep having things to add after sending my emails. Now that we have
> all conflicted entries in file block, the conflict data block becomes
> optional, it functions exactly (I think) like resolve-undo extension,
> which makes me think it might make sense to make conflict data block
> an extension.
>
> If we make it so, we might want to move "cr" and "ncr" fields out of
> direntries. I don't see a solution yet, but I think it's interesting
> because future extensions might want to attach stuff to direntries,
> just like "cr"/"ncr" from conflict extension. We may want to think now
> how that might be done (and conflict extension is a good exercise to
> see how it works out)

And the solution is pretty obvious: keep resolve-undo extension as
_extension_ just like in v2 (and no "cr/ncr" in direntries). The only
difference is the time resolve-undo extension is updated: in v2, new
entries are added at remove_index_entry_at(), in v5 new entries are
added when new stage entries  are detected at write_index().
-- 
Duy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 00/24] Index-v5
  2013-08-24  4:16 ` [PATCH v3 00/24] Index-v5 Duy Nguyen
@ 2013-08-25  3:07   ` Junio C Hamano
  2013-08-25  4:40     ` Duy Nguyen
  2013-08-31  5:23     ` Thomas Gummerer
  0 siblings, 2 replies; 55+ messages in thread
From: Junio C Hamano @ 2013-08-25  3:07 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Thomas Gummerer, Git Mailing List, Thomas Rast, Michael Haggerty,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

Duy Nguyen <pclouds@gmail.com> writes:

> On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> I'm done reviewing this version (I neglected the extension writing
> patches because after spending hours on the main write patch I don't
> want to look at them anymore :p). Now that rc period is over, with a
> partial write proof-of-concept, I think it's enough to call Junio's
> attention on the series, see if we have any chance of merging it. The
> partial write POC is needed to make sure we don't overlook anything,
> just support update-index is enough.

I've been following the review comment threads after looking at the
patches myself when they were posted. I was hoping to see some API
improvement over the current "we (have to) have everything available
in-core in a flat array" model, which gives a lot of convenience and
IO overhead at the same time, that would make me say "yes, this
operation, that we need to do very often, will certainly be helped
by this new API, and in order to support that style of API better,
the current file format is inadequate and we do need to go to the
proposed tree like on-disk format" for at least one, but
unfortunately I haven't found any (yet).

So...

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 00/24] Index-v5
  2013-08-25  3:07   ` Junio C Hamano
@ 2013-08-25  4:40     ` Duy Nguyen
  2013-08-31  5:23     ` Thomas Gummerer
  1 sibling, 0 replies; 55+ messages in thread
From: Duy Nguyen @ 2013-08-25  4:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Thomas Gummerer, Git Mailing List, Thomas Rast, Michael Haggerty,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

On Sun, Aug 25, 2013 at 10:07 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>>
>> I'm done reviewing this version (I neglected the extension writing
>> patches because after spending hours on the main write patch I don't
>> want to look at them anymore :p). Now that rc period is over, with a
>> partial write proof-of-concept, I think it's enough to call Junio's
>> attention on the series, see if we have any chance of merging it. The
>> partial write POC is needed to make sure we don't overlook anything,
>> just support update-index is enough.
>
> I've been following the review comment threads after looking at the
> patches myself when they were posted. I was hoping to see some API
> improvement over the current "we (have to) have everything available
> in-core in a flat array" model, which gives a lot of convenience and
> IO overhead at the same time, that would make me say "yes, this
> operation, that we need to do very often, will certainly be helped
> by this new API, and in order to support that style of API better,
> the current file format is inadequate and we do need to go to the
> proposed tree like on-disk format" for at least one, but
> unfortunately I haven't found any (yet).

Thomas is in the best position to answer this, but I'll give it a try.
In my opinon, v2-4 works well for moderate-sized worktrees, v5 aims to
make the index scale better. One way to make it scale is not to read
the whole index up when you only need a portion of the index.
read_index_filtered() enables this. We could implement
read_index_filtered() on v2 too, but because v2 lacks proper data
structure to support it, we need to scan through all on-disk entries.
"git diff" and "git status" with pathspec may benefit from this (and
for large worktrees, people better use pathspec than whole-tree
"status"). The flat (but not full) array model seems best fit because
we still need to support v2. Another v5 improvement is fast "git add
-u/git commit -a" when partial write is implemented. I don't think
such a patch is posted. There may be API addition to aid v5 code but
it should not be big API change.
-- 
Duy

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 00/24] Index-v5
  2013-08-25  3:07   ` Junio C Hamano
  2013-08-25  4:40     ` Duy Nguyen
@ 2013-08-31  5:23     ` Thomas Gummerer
  1 sibling, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-08-31  5:23 UTC (permalink / raw)
  To: Junio C Hamano, Duy Nguyen
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Robin Rosenberg,
	Eric Sunshine, Ramsay Jones

Junio C Hamano <gitster@pobox.com> writes:

> Duy Nguyen <pclouds@gmail.com> writes:
>
>> On Mon, Aug 19, 2013 at 2:41 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>>
>> I'm done reviewing this version (I neglected the extension writing
>> patches because after spending hours on the main write patch I don't
>> want to look at them anymore :p). Now that rc period is over, with a
>> partial write proof-of-concept, I think it's enough to call Junio's
>> attention on the series, see if we have any chance of merging it. The
>> partial write POC is needed to make sure we don't overlook anything,
>> just support update-index is enough.
>
> I've been following the review comment threads after looking at the
> patches myself when they were posted. I was hoping to see some API
> improvement over the current "we (have to) have everything available
> in-core in a flat array" model, which gives a lot of convenience and
> IO overhead at the same time, that would make me say "yes, this
> operation, that we need to do very often, will certainly be helped
> by this new API, and in order to support that style of API better,
> the current file format is inadequate and we do need to go to the
> proposed tree like on-disk format" for at least one, but
> unfortunately I haven't found any (yet).
>
> So...

I think the issue is a bit different.  The current API, with some small
additions (e.g. read_index_filtered()) works well as in-memory format,
even for partial reading/writing.  I will try to write a POC for partial
writing to show that the current in-memory format works for this too.
As Duy wrote in the other email, some API changes will be necessary to
allow that, but not a big API change moving from a flat array to a tree
based format.

I think it comes down to "this operation will be helped by partial
loading/writing and we need this small API changes
(read_index_filtered() for now, more to follow) and the index format
change to be able to do that".

Does that make sense, with at least Duy's comments in the review
addressed and a POC for partial writing?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 18/24] read-cache: write index-v5
  2013-08-24  3:58   ` Duy Nguyen
@ 2013-11-25 15:37     ` Thomas Gummerer
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-11-25 15:37 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones


[sorry about taking so much time to get back to you, was too busy with
other stuff to work on git]

Duy Nguyen <pclouds@gmail.com> writes:

> On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +char *super_directory(const char *filename)
>> +{
>> +       char *slash;
>> +
>> +       slash = strrchr(filename, '/');
>> +       if (slash)
>> +               return xmemdupz(filename, slash-filename);
>> +       return NULL;
>> +}
>> +
>
> Why is this function not static? There are a few other in this patch
> (I did not notice in others, but I wasn't looking for them..)

That's just an oversight, it should be static.  Will look for the others
and change them to static too.

> But isn't this what dirname() is?
>
> Another point about this function is it returns a new allocated
> string, but I see no free() anywhere in this patch. Leak alert!

Yes, you're right, I'll use dirname instead.  That also solves the free
problem, as dirname modifies the string.  I'll still use a wrapper
around it, because dirname returns "." when there is no super_directory,
but using NULL for that makes it simpler to check it.

>> +struct directory_entry *init_directory_entry(char *pathname, int len)
>> +{
>> +       struct directory_entry *de = xmalloc(directory_entry_size(len));
>> +
>> +       memcpy(de->pathname, pathname, len);
>> +       de->pathname[len] = '\0';
>> +       de->de_flags      = 0;
>> +       de->de_foffset    = 0;
>> +       de->de_cr         = 0;
>> +       de->de_ncr        = 0;
>> +       de->de_nsubtrees  = 0;
>> +       de->de_nfiles     = 0;
>> +       de->de_nentries   = 0;
>> +       memset(de->sha1, 0, 20);
>> +       de->de_pathlen    = len;
>> +       de->next          = NULL;
>> +       de->next_hash     = NULL;
>> +       de->ce            = NULL;
>> +       de->ce_last       = NULL;
>> +       de->conflict      = NULL;
>> +       de->conflict_last = NULL;
>> +       de->conflict_size = 0;
>> +       return de;
>> +}
>
> I think this function could be shortened to
>
> struct directory_entry *de = xcalloc(1, directory_entry_size(len));
> memcpy(de->pathname, pathname, len);
> de->de_pathlen = len;
> return de;

Makes sense, thanks.

>> +static struct directory_entry *get_directory(char *dir, unsigned int dir_len,
>> +                                            struct hash_table *table,
>> +                                            unsigned int *total_dir_len,
>> +                                            unsigned int *ndir,
>> +                                            struct directory_entry **current)
>> +{
>> +       struct directory_entry *tmp = NULL, *search, *new, *ret;
>> +       uint32_t crc;
>> +
>> +       search = find_directory(dir, dir_len, &crc, table);
>> +       if (search)
>> +               return search;
>> +       while (!search) {
>> +               new = init_directory_entry(dir, dir_len);
>> +               insert_directory_entry(new, table, total_dir_len, ndir, crc);
>> +               if (!tmp)
>> +                       ret = new;
>> +               else
>> +                       new->de_nsubtrees = 1;
>> +               new->next = tmp;
>> +               tmp = new;
>> +               dir = super_directory(dir);
>
> It feels more natural to create directory_entry(s) from parent to
> subdir. If you do so you could reset dir to the remaining of directory
> and perform strchr() and do not need to allocate new memory everytime
> you call super_directory (because it relies on NUL at the end of the
> string).

I'm creating them from subdir to parent because it saves a few calls
whenever a superdir is already in the hash_table.  And I'm using dirname
so the new memory allocations are not a problem anymore.

>> +               dir_len = dir ? strlen(dir) : 0;
>> +               search = find_directory(dir, dir_len, &crc, table);
>> +       }
>> +       search->de_nsubtrees++;
>> +       (*current)->next = tmp;
>> +       while ((*current)->next)
>> +               *current = (*current)->next;
>> +
>> +       return ret;
>> +}
>
>> +static struct directory_entry *compile_directory_data(struct index_state *istate,
>> +                                                     int nfile,
>> +                                                     unsigned int *ndir,
>> +                                                     unsigned int *total_dir_len,
>> +                                                     unsigned int *total_file_len)
>> +{
>> +       int i, dir_len = -1;
>> +       char *dir;
>> +       struct directory_entry *de, *current, *search;
>> +       struct cache_entry **cache = istate->cache;
>> +       struct conflict_entry *conflict_entry;
>> +       struct hash_table table;
>> +       uint32_t crc;
>> +
>> +       init_hash(&table);
>> +       de = init_directory_entry("", 0);
>> +       current = de;
>> +       *ndir = 1;
>> +       *total_dir_len = 1;
>> +       crc = crc32(0, (Bytef*)de->pathname, de->de_pathlen);
>> +       insert_hash(crc, de, &table);
>> +       conflict_entry = NULL;
>> +       for (i = 0; i < nfile; i++) {
>> +               if (cache[i]->ce_flags & CE_REMOVE)
>> +                       continue;
>> +
>> +               if (dir_len < 0
>> +                   || cache[i]->name[dir_len] != '/'
>
> Need a check to make sure name[dir_len] is not out of bound

Thanks.

>> +                   || strchr(cache[i]->name + dir_len + 1, '/')
>> +                   || cache_name_compare(cache[i]->name, ce_namelen(cache[i]),
>> +                                         dir, dir_len)) {
>
> In my opinon, "if (dir_len < 0 || !(must && be && a && subdirectory))"
> is easier to read..

Makes sense, will change.

>> +                       dir = super_directory(cache[i]->name);
>> +                       dir_len = dir ? strlen(dir) : 0;
>> +                       search = get_directory(dir, dir_len, &table,
>> +                                              total_dir_len, ndir,
>> +                                              &current);
>> +               }
>> +               search->de_nfiles++;
>> +               *total_file_len += ce_namelen(cache[i]) + 1;
>> +               if (search->de_pathlen)
>> +                       *total_file_len -= search->de_pathlen + 1;
>> +               ce_queue_push(&(search->ce), &(search->ce_last), cache[i]);
>> +
>> +               if (ce_stage(cache[i]) > 0) {
>> +                       struct conflict_part *conflict_part;
>> +                       if (!conflict_entry ||
>> +                           cache_name_compare(conflict_entry->name, conflict_entry->namelen,
>> +                                              cache[i]->name, ce_namelen(cache[i]))) {
>> +                               conflict_entry = create_conflict_entry_from_ce(cache[i], search->de_pathlen);
>> +                               add_conflict_to_directory_entry(search, conflict_entry);
>> +                       }
>> +                       conflict_part = conflict_part_from_inmemory(cache[i]);
>> +                       add_part_to_conflict_entry(search, conflict_entry, conflict_part);
>> +               }
>> +       }
>> +       return de;
>> +}
>> +
>
>> +static int write_directories(struct directory_entry *de, int fd, int conflict_offset)
>> +{
>> +       struct directory_entry *current;
>> +       struct ondisk_directory_entry ondisk;
>> +       int current_offset, offset_write, ondisk_size, foffset;
>> +       uint32_t crc;
>> +
>> +       /*
>> +        * This is needed because the compiler aligns structs to sizes multiple
>> +        * of 4
>> +        */
>> +       ondisk_size = sizeof(ondisk.flags)
>> +               + sizeof(ondisk.foffset)
>> +               + sizeof(ondisk.cr)
>> +               + sizeof(ondisk.ncr)
>> +               + sizeof(ondisk.nsubtrees)
>> +               + sizeof(ondisk.nfiles)
>> +               + sizeof(ondisk.nentries)
>> +               + sizeof(ondisk.sha1);
>> +       current = de;
>> +       current_offset = 0;
>> +       foffset = 0;
>> +       while (current) {
>> +               int pathlen;
>> +
>> +               offset_write = htonl(current_offset);
>> +               if (ce_write(NULL, fd, &offset_write, 4) < 0)
>> +                       return -1;
>> +               if (current->de_pathlen == 0)
>> +                       pathlen = 0;
>> +               else
>> +                       pathlen = current->de_pathlen + 1;
>> +               current_offset += pathlen + 1 + ondisk_size + 4;
>> +               current = current->next;
>> +       }
>> +       /*
>> +        * Write one more offset, which points to the end of the entries,
>> +        * because we use it for calculating the dir length, instead of
>> +        * using strlen.
>> +        */
>> +       offset_write = htonl(current_offset);
>> +       if (ce_write(NULL, fd, &offset_write, 4) < 0)
>> +               return -1;
>> +       current = de;
>> +       while (current) {
>> +               crc = 0;
>> +               if (current->de_pathlen == 0) {
>> +                       if (ce_write(&crc, fd, current->pathname, 1) < 0)
>> +                               return -1;
>> +               } else {
>> +                       char *path;
>> +                       path = xmalloc(sizeof(char) * (current->de_pathlen + 2));
>> +                       memcpy(path, current->pathname, current->de_pathlen);
>> +                       memcpy(path + current->de_pathlen, "/\0", 2);
>> +                       if (ce_write(&crc, fd, path, current->de_pathlen + 2) < 0)
>> +                               return -1;
>
> xmalloc without free

In the new version the pathname is included at the end of the ondisk
struct, for which I added a free.

>> +               }
>> +               current->de_foffset = foffset;
>> +               current->de_cr = conflict_offset;
>> +               ondisk_from_directory_entry(current, &ondisk);
>> +               if (ce_write(&crc, fd, &ondisk, ondisk_size) < 0)
>> +                       return -1;
>> +               crc = htonl(crc);
>> +               if (ce_write(NULL, fd, &crc, 4) < 0)
>> +                       return -1;
>> +               conflict_offset += current->conflict_size;
>> +               foffset += current->de_nfiles * 4;
>> +               current = current->next;
>> +       }
>> +       return 0;
>> +}
>> +
>> +static int write_entries(struct index_state *istate,
>> +                           struct directory_entry *de,
>> +                           int entries,
>> +                           int fd,
>> +                           int offset_to_offset)
>> +{
>> +       int offset, offset_write, ondisk_size;
>> +       struct directory_entry *current;
>> +
>> +       offset = 0;
>> +       ondisk_size = sizeof(struct ondisk_cache_entry);
>> +       current = de;
>> +       while (current) {
>
> A short comment a the beginning of this block saying this writes
> fileoffsets table would be nice.

Done, thanks.

>> +               int pathlen;
>> +               struct cache_entry *ce = current->ce;
>> +
>> +               if (current->de_pathlen == 0)
>> +                       pathlen = 0;
>> +               else
>> +                       pathlen = current->de_pathlen + 1;
>> +               while (ce) {
>> +                       if (ce->ce_flags & CE_REMOVE)
>> +                               continue;
>
> How come CE_REMOVE'd entries get here? I thought they were all ignored
> at compile_directory_data()

They don't, I've added this line unnecessarily.  Will remove it.

>> +                       if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
>> +                               ce_smudge_racily_clean_entry(ce);
>> +                       if (is_null_sha1(ce->sha1))
>> +                               return error("cache entry has null sha1: %s", ce->name);
>> +
>> +                       offset_write = htonl(offset);
>> +                       if (ce_write(NULL, fd, &offset_write, 4) < 0)
>> +                               return -1;
>> +                       offset += ce_namelen(ce) - pathlen + 1 + ondisk_size + 4;
>> +                       ce = ce->next_ce;
>> +               }
>> +               current = current->next;
>> +       }
>> +       /*
>> +        * Write one more offset, which points to the end of the entries,
>> +        * because we use it for calculating the file length, instead of
>> +        * using strlen.
>> +        */
>> +       offset_write = htonl(offset);
>> +       if (ce_write(NULL, fd, &offset_write, 4) < 0)
>> +               return -1;
>> +
>> +       offset = offset_to_offset;
>> +       current = de;
>> +       while (current) {
>> +               int pathlen;
>> +               struct cache_entry *ce = current->ce;
>> +
>> +               if (current->de_pathlen == 0)
>> +                       pathlen = 0;
>> +               else
>> +                       pathlen = current->de_pathlen + 1;
>> +               while (ce) {
>> +                       struct ondisk_cache_entry ondisk;
>> +                       uint32_t crc, calc_crc;
>> +
>> +                       if (ce->ce_flags & CE_REMOVE)
>> +                               continue;
>> +                       calc_crc = htonl(offset);
>> +                       crc = crc32(0, (Bytef*)&calc_crc, 4);
>> +                       if (ce_write(&crc, fd, ce->name + pathlen,
>> +                                       ce_namelen(ce) - pathlen + 1) < 0)
>> +                               return -1;
>> +                       ondisk_from_cache_entry(ce, &ondisk);
>> +                       if (ce_write(&crc, fd, &ondisk, ondisk_size) < 0)
>> +                               return -1;
>> +                       crc = htonl(crc);
>> +                       if (ce_write(NULL, fd, &crc, 4) < 0)
>> +                               return -1;
>> +                       offset += 4;
>> +                       ce = ce->next_ce;
>> +               }
>> +               current = current->next;
>> +       }
>> +       return 0;
>> +}
>
>> +static int write_index_v5(struct index_state *istate, int newfd)
>> +{
>> +       struct cache_header hdr;
>> +       struct cache_header_v5 hdr_v5;
>> +       struct cache_entry **cache = istate->cache;
>> +       struct directory_entry *de;
>> +       struct ondisk_directory_entry *ondisk;
>> +       unsigned int entries = istate->cache_nr;
>> +       unsigned int i, removed, total_dir_len, ondisk_directory_size;
>> +       unsigned int total_file_len, conflict_offset, foffsetblock;
>> +       unsigned int ndir;
>> +       uint32_t crc;
>> +
>> +       if (istate->filter_opts)
>> +               die("BUG: index: cannot write a partially read index");
>> +
>> +       for (i = removed = 0; i < entries; i++) {
>> +               if (cache[i]->ce_flags & CE_REMOVE)
>> +                       removed++;
>> +       }
>> +       hdr.hdr_signature = htonl(CACHE_SIGNATURE);
>> +       hdr.hdr_version = htonl(istate->version);
>> +       hdr.hdr_entries = htonl(entries - removed);
>> +       hdr_v5.hdr_nextension = htonl(0); /* Currently no extensions are supported */
>> +
>> +       total_dir_len = 0;
>> +       total_file_len = 0;
>> +       de = compile_directory_data(istate, entries, &ndir,
>> +                                   &total_dir_len, &total_file_len);
>> +       hdr_v5.hdr_ndir = htonl(ndir);
>> +
>> +       /*
>> +        * This is needed because the compiler aligns structs to sizes multipe
>> +        * of 4
>> +        */
>> +       ondisk_directory_size = sizeof(ondisk->flags)
>> +               + sizeof(ondisk->foffset)
>> +               + sizeof(ondisk->cr)
>> +               + sizeof(ondisk->ncr)
>> +               + sizeof(ondisk->nsubtrees)
>> +               + sizeof(ondisk->nfiles)
>> +               + sizeof(ondisk->nentries)
>> +               + sizeof(ondisk->sha1);
>
> There is a similar statement in read code. This calls for a macro to
> share this sum.

This is no longer needed, as I switched the flag size to 32 bits, to
enable adding the pathname at the end of the struct.

>> +       foffsetblock = sizeof(hdr) + sizeof(hdr_v5) + 4
>> +               + (ndir + 1) * 4
>> +               + total_dir_len
>> +               + ndir * (ondisk_directory_size + 4);
>> +       hdr_v5.hdr_fblockoffset = htonl(foffsetblock + (entries - removed + 1) * 4);
>> +       crc = 0;
>> +       if (ce_write(&crc, newfd, &hdr, sizeof(hdr)) < 0)
>> +               return -1;
>> +       if (ce_write(&crc, newfd, &hdr_v5, sizeof(hdr_v5)) < 0)
>> +               return -1;
>> +       crc = htonl(crc);
>> +       if (ce_write(NULL, newfd, &crc, 4) < 0)
>> +               return -1;
>> +
>> +       conflict_offset = foffsetblock +
>> +               + (entries - removed + 1) * 4
>> +               + total_file_len
>> +               + (entries - removed) * (sizeof(struct ondisk_cache_entry) + 4);
>> +       if (write_directories(de, newfd, conflict_offset) < 0)
>> +               return -1;
>> +       if (write_entries(istate, de, entries, newfd, foffsetblock) < 0)
>> +               return -1;
>> +       if (write_conflicts(istate, de, newfd) < 0)
>> +               return -1;
>> +       return ce_flush(newfd);
>> +}

In the re-roll the conflicted and the resolve-undo entries will also be
written in an extension as you suggested in the other email.  I'll
re-roll the series with a POC for partial writing tomorrow if all goes
well.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v3 17/24] read-cache: read cache-tree in index-v5
  2013-08-24  0:09   ` Duy Nguyen
@ 2013-11-25 15:41     ` Thomas Gummerer
  0 siblings, 0 replies; 55+ messages in thread
From: Thomas Gummerer @ 2013-11-25 15:41 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Thomas Rast, Michael Haggerty, Junio C Hamano,
	Robin Rosenberg, Eric Sunshine, Ramsay Jones

Duy Nguyen <pclouds@gmail.com> writes:

> On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>> +/*
>> + * This function modifies the directory argument that is given to it.
>> + * Don't use it if the directory entries are still needed after.
>> + */
>
> There goes my hope of keeping directory_entry* in core so that at
> write-time, after validation, we only need to recreate some trees
> instead of all of them..
>
> Or we could make cache-tree keep references to directory_entry. If a
> cache-tree is not invalidated, then the attached directory_tree should
> be reused..

I've now re-written the algorithm that converts the directory entries to
cache entries, and it's now no longer destructive.  For now the
directory entries are not needed in core, so I'll free them when done
with reading the index, but it's possible to keep it.

>> +static struct cache_tree *cache_tree_convert_v5(struct directory_entry *de)
>> +{
>> +       if (!de->de_nentries)
>> +               return NULL;
>> +       sort_directories(de);
>> +       return convert_one(de);
>> +}
>> +
>>  static int read_entries(struct index_state *istate, struct directory_entry *de,
>>                         unsigned int first_entry_offset, void *mmap,
>>                         unsigned long mmap_size, unsigned int *nr,
>> @@ -591,6 +668,7 @@ static int read_index_v5(struct index_state *istate, void *mmap,
>>                 }
>>                 de = de->next;
>>         }
>> +       istate->cache_tree = cache_tree_convert_v5(root_directory);
>>         istate->cache_nr = nr;
>>         return 0;
>>  }
>
> Otherwise we do need to free root_directory down to the deepest
> subtrees, I think. People have been complaining about read-cache
> leaking memory like mad, so this is a real issue. Even if you keep
> references in cache-tree, you still need to free it
> cache_tree_invalidate_path() to avoid leaking

I'm freeing them for now, as they are not used anywhere, but in the
future we might want to keep them for some optimizations.

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2013-11-25 15:41 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-18 19:41 [PATCH v3 00/24] Index-v5 Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 01/24] t2104: Don't fail for index versions other than [23] Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 02/24] read-cache: use fixed width integer types Thomas Gummerer
2013-08-18 20:21   ` Eric Sunshine
2013-08-20 19:30   ` Junio C Hamano
2013-08-21  3:05     ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 03/24] read-cache: split index file version specific functionality Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 04/24] read-cache: clear version in discard_index() Thomas Gummerer
2013-08-20 19:34   ` Junio C Hamano
2013-08-21  3:06     ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 05/24] read-cache: move index v2 specific functions to their own file Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 06/24] read-cache: Don't compare uid, gid and ino on cygwin Thomas Gummerer
2013-08-18 22:34   ` Ramsay Jones
2013-08-20  8:36     ` Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 07/24] read-cache: Re-read index if index file changed Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 08/24] add documentation for the index api Thomas Gummerer
2013-08-18 20:50   ` Eric Sunshine
2013-08-18 19:41 ` [PATCH v3 09/24] read-cache: add index reading api Thomas Gummerer
2013-08-18 19:41 ` [PATCH v3 10/24] make sure partially read index is not changed Thomas Gummerer
2013-08-18 21:06   ` Eric Sunshine
2013-08-20  8:46     ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 11/24] grep.c: use index api Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 12/24] ls-files.c: " Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 13/24] documentation: add documentation of the index-v5 file format Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 14/24] read-cache: make in-memory format aware of stat_crc Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 15/24] read-cache: read index-v5 Thomas Gummerer
2013-08-19  1:57   ` Eric Sunshine
2013-08-20 14:01   ` Duy Nguyen
2013-08-20 20:59     ` Thomas Gummerer
2013-08-21  0:44       ` Duy Nguyen
2013-08-20 14:16   ` Duy Nguyen
2013-08-20 21:13     ` Thomas Gummerer
2013-08-23 23:52   ` Duy Nguyen
2013-08-18 19:42 ` [PATCH v3 16/24] read-cache: read resolve-undo data Thomas Gummerer
2013-08-19  1:59   ` Eric Sunshine
2013-08-18 19:42 ` [PATCH v3 17/24] read-cache: read cache-tree in index-v5 Thomas Gummerer
2013-08-24  0:09   ` Duy Nguyen
2013-11-25 15:41     ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 18/24] read-cache: write index-v5 Thomas Gummerer
2013-08-24  3:58   ` Duy Nguyen
2013-11-25 15:37     ` Thomas Gummerer
2013-08-24  4:07   ` Duy Nguyen
2013-08-24  9:56     ` Duy Nguyen
2013-08-18 19:42 ` [PATCH v3 19/24] read-cache: write index-v5 cache-tree data Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 20/24] read-cache: write resolve-undo data for index-v5 Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 21/24] update-index.c: rewrite index when index-version is given Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 22/24] p0003-index.sh: add perf test for the index formats Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 23/24] introduce GIT_INDEX_VERSION environment variable Thomas Gummerer
2013-08-21  0:57   ` Duy Nguyen
2013-08-21  4:01     ` Thomas Gummerer
2013-08-18 19:42 ` [PATCH v3 24/24] test-lib: allow setting the index format version Thomas Gummerer
2013-08-24  4:16 ` [PATCH v3 00/24] Index-v5 Duy Nguyen
2013-08-25  3:07   ` Junio C Hamano
2013-08-25  4:40     ` Duy Nguyen
2013-08-31  5:23     ` Thomas Gummerer

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).