git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 00/15] index-helper, watchman
@ 2016-03-19  1:04 David Turner
  2016-03-19  1:04 ` [PATCH v2 01/17] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (16 more replies)
  0 siblings, 17 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

In this version:

I removed the statistics since they're not really a core part of the
series and we can always add them later.

I added a little more testing.

I merged the untracked cache/watchman mashup into the relevant
patches.  Hopefully I didn't screw it up -- I got into the rebase
weeds here and took a while to get out.  I wish I were better at
rebasing.

I moved the index-helper to a named-pipe based IPC mechanism instead
of a signal-based one.  I don't actualy know much about named pipes
other than what I learned while writing this, so someone with clue
should probably take a peek.

The index-helper pid files was kind of a hassle -- there was a
lot of mechanism involved in making sure that they were in fact
pointing to a live index-helper process.  With named pipes, we don't
really need them: if a live index-helper is running, it can simply be
instructed to die, and if it's not, then the pipe can be safely
removed and a new index-helper started.

Because there is no longer a pid file for index-helper, index-helper
has no way to let git know that it should wait for a watchman update
reply when poked.  I worked around this by making the waiting
configurable, and giving a sensible default.  Note that in the
no-watchman case, the wait will be short even if misconfigured,
because the daemon can immediately send an "OK" message.

I updated some commit messages, following Junio and Duy's suggestions.

Johannes Schindelin said that he would work on the Windows side, so
I've dropped the Windows bits (including one patch).  Since I don't
know anything about Windows, it's probably for the best that I'm not
coding for it.

I eliminated the ability to start up an index-helper when there was
already one running because I don't see why you would want to do that,
and because it would lead to multiple readers on the same named pipe,
which has undefined behavior.  Just kill the old one if you're done
with it.

David Turner (7):
  read-cache: invalidate untracked cache data when reading WAMA
  unpack-trees: preserve index extensions
  index-helper: kill mode
  index-helper: don't run if already running
  index-helper: autorun mode
  index-helper: optionally automatically run
  read-cache: config for waiting for index-helper

Nguyễn Thái Ngọc Duy (10):
  read-cache.c: fix constness of verify_hdr()
  read-cache: allow to keep mmap'd memory after reading
  index-helper: new daemon for caching index and related stuff
  index-helper: add --strict
  daemonize(): set a flag before exiting the main process
  index-helper: add --detach
  read-cache: add watchman 'WAMA' extension
  Add watchman support to reduce index refresh cost
  index-helper: use watchman to avoid refreshing index with lstat()
  update-index: enable/disable watchman support

 .gitignore                         |   1 +
 Documentation/git-index-helper.txt |  64 ++++++
 Makefile                           |  21 ++
 builtin/gc.c                       |   2 +-
 builtin/update-index.c             |  11 +
 cache.h                            |  18 +-
 config.c                           |  10 +
 config.mak.uname                   |   1 +
 configure.ac                       |   8 +
 daemon.c                           |   2 +-
 dir.c                              |  23 +-
 dir.h                              |   6 +
 environment.c                      |   8 +
 git-compat-util.h                  |   1 +
 git.c                              |  35 ++-
 index-helper.c                     | 439 +++++++++++++++++++++++++++++++++++
 read-cache.c                       | 455 +++++++++++++++++++++++++++++++++++--
 setup.c                            |   4 +-
 shm.c                              |  67 ++++++
 shm.h                              |  23 ++
 t/t7063-status-untracked-cache.sh  |  23 ++
 t/t7900-index-helper.sh            |  60 +++++
 t/test-lib-functions.sh            |   4 +
 unpack-trees.c                     |   1 +
 watchman-support.c                 | 134 +++++++++++
 watchman-support.h                 |   7 +
 26 files changed, 1406 insertions(+), 22 deletions(-)
 create mode 100644 Documentation/git-index-helper.txt
 create mode 100644 index-helper.c
 create mode 100644 shm.c
 create mode 100644 shm.h
 create mode 100755 t/t7900-index-helper.sh
 create mode 100644 watchman-support.c
 create mode 100644 watchman-support.h

-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 01/17] read-cache.c: fix constness of verify_hdr()
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 02/17] read-cache: allow to keep mmap'd memory after reading David Turner
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index d9fb78b..16cc487 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1345,7 +1345,7 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 {
 	git_SHA_CTX c;
 	unsigned char sha1[20];
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 02/17] read-cache: allow to keep mmap'd memory after reading
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
  2016-03-19  1:04 ` [PATCH v2 01/17] read-cache.c: fix constness of verify_hdr() David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 03/17] index-helper: new daemon for caching index and related stuff David Turner
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Later, we will introduce git index-helper to share this memory with
other git processes.

Since the memory will be shared, it will never be unmapped (although
the kernel may of course choose to page it out).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      |  3 +++
 read-cache.c | 13 ++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index b829410..4180e2b 100644
--- a/cache.h
+++ b/cache.h
@@ -333,11 +333,14 @@ struct index_state {
 	struct split_index *split_index;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
+		 keep_mmap : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
 	struct untracked_cache *untracked;
+	void *mmap;
+	size_t mmap_size;
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index 16cc487..7e387e9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1574,6 +1574,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
+	if (istate->keep_mmap) {
+		istate->mmap = mmap;
+		istate->mmap_size = mmap_size;
+	}
 	close(fd);
 
 	hdr = mmap;
@@ -1626,10 +1630,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		src_offset += 8;
 		src_offset += extsize;
 	}
-	munmap(mmap, mmap_size);
+	if (!istate->keep_mmap)
+		munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
 unmap:
+	istate->mmap = NULL;
 	munmap(mmap, mmap_size);
 	die("index file corrupt");
 }
@@ -1655,6 +1661,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		discard_index(split_index->base);
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
+	split_index->base->keep_mmap = istate->keep_mmap;
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -1698,6 +1705,10 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	if (istate->keep_mmap && istate->mmap) {
+		munmap(istate->mmap, istate->mmap_size);
+		istate->mmap = NULL;
+	}
 	discard_split_index(istate);
 	free_untracked_cache(istate->untracked);
 	istate->untracked = NULL;
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 03/17] index-helper: new daemon for caching index and related stuff
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
  2016-03-19  1:04 ` [PATCH v2 01/17] read-cache.c: fix constness of verify_hdr() David Turner
  2016-03-19  1:04 ` [PATCH v2 02/17] read-cache: allow to keep mmap'd memory after reading David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-29  2:31   ` Duy Nguyen
  2016-03-19  1:04 ` [PATCH v2 04/17] index-helper: add --strict David Turner
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Instead of reading the index from disk and worrying about disk
corruption, the index is cached in memory (memory bit-flips happen
too, but hopefully less often). The result is faster read. Read time
is reduced by 70%.

The biggest gain is not having to verify the trailing SHA-1, which
takes lots of time especially on large index files. But this also
opens doors for further optimiztions:

 - we could create an in-memory format that's essentially the memory
   dump of the index to eliminate most of parsing/allocation
   overhead. The mmap'd memory can be used straight away. Experiment
   [1] shows we could reduce read time by 88%.

 - we could cache non-index info such as name hash

The shared memory's name folows the template "git-<object>-<SHA1>"
where <SHA1> is the trailing SHA-1 of the index file. <object> is
"index" for cached index files (and may be "name-hash" for name-hash
cache). If such shared memory exists, it contains the same index
content as on disk. The content is already validated by the daemon and
git won't validate it again (except comparing the trailing SHA-1s).

We keep this daemon's logic as thin as possible. The "brain" stays in
git. So the daemon can read and validate stuff, but that's all it's
allowed to do. It does not add/create new information. It doesn't even
accept direct updates from git.

Git can poke the daemon via named pipes to tell it to refresh the
index cache, or to keep it alive some more minutes. It can't give any
real index data directly to the daemon. Real data goes to disk first,
then the daemon reads and verifies it from there. Poking only happens
for $GIT_DIR/index, not temporary index files.

$GIT_DIR/index-helper.pipe is the named pipe for daemon process. The
daemon reads from the pipe and executes commands.  Commands that need
replies from the daemon will have to open their own pipe, since a
named pipe should only have one reader.  Unix domain sockets don't
have this problem, but are less portable.

index-helper requires POSIX realtime extension. POSIX shm interface
however is abstracted away so that Windows support can be added later.

On webkit.git with index format v2, duplicating 8 times to 1.4m
entries and 200MB in size:

(vanilla)      0.986986364 s: read_index_from .git/index
(index-helper) 0.267850279 s: read_index_from .git/index

Interestingly with index v4, we get less out of index-helper. It makes
sense as v4 requires more processing after loading the index:

(vanilla)      0.722496666 s: read_index_from .git/index
(index-helper) 0.302741500 s: read_index_from .git/index

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 .gitignore                         |   1 +
 Documentation/git-index-helper.txt |  46 ++++++++
 Makefile                           |   9 ++
 cache.h                            |   3 +
 config.mak.uname                   |   1 +
 git-compat-util.h                  |   1 +
 index-helper.c                     | 215 +++++++++++++++++++++++++++++++++++++
 read-cache.c                       |  99 +++++++++++++++--
 shm.c                              |  67 ++++++++++++
 shm.h                              |  23 ++++
 t/t7900-index-helper.sh            |  18 ++++
 11 files changed, 474 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/git-index-helper.txt
 create mode 100644 index-helper.c
 create mode 100644 shm.c
 create mode 100644 shm.h
 create mode 100755 t/t7900-index-helper.sh

diff --git a/.gitignore b/.gitignore
index 5087ce1..b92f122 100644
--- a/.gitignore
+++ b/.gitignore
@@ -71,6 +71,7 @@
 /git-http-fetch
 /git-http-push
 /git-imap-send
+/git-index-helper
 /git-index-pack
 /git-init
 /git-init-db
diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
new file mode 100644
index 0000000..59a5abc
--- /dev/null
+++ b/Documentation/git-index-helper.txt
@@ -0,0 +1,46 @@
+git-index-helper(1)
+===================
+
+NAME
+----
+git-index-helper - A simple cache daemon for speeding up index file access
+
+SYNOPSIS
+--------
+[verse]
+'git index-helper' [options]
+
+DESCRIPTION
+-----------
+Keep the index file in memory for faster access. This daemon is per
+repository.
+
+OPTIONS
+-------
+
+--exit-after=<n>::
+	Exit if the cached index is not accessed for `<n>`
+	minutes. Specify 0 to wait forever. Default is 10.
+
+NOTES
+-----
+$GIT_DIR/index-helper.pipe is a named pipe that the daemon reads
+commands from. At least on Linux, shared memory objects are availble
+via /dev/shm with the name pattern "git-<something>-<SHA1>".  Normally
+the daemon will clean up shared memory objects when it exits.  But if
+it crashes, some objects could remain there and they can be safely
+deleted with "rm" command. The following commands are used to control
+the daemon:
+
+"refresh"::
+	Reread the index.
+
+"poke":
+	Let the daemon know the index is to be read. It keeps the
+	daemon alive longer, unless `--exit-after=0` is used.
+
+All commands and replies are terminated by a 0 byte.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 2742a69..2d72771 100644
--- a/Makefile
+++ b/Makefile
@@ -370,6 +370,8 @@ all::
 # Define HAVE_BSD_SYSCTL if your platform has a BSD-compatible sysctl function.
 #
 # Define HAVE_GETDELIM if your system has the getdelim() function.
+#
+# Define HAVE_SHM if you platform support shm_* functions in librt.
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -805,6 +807,7 @@ LIB_OBJS += sha1-lookup.o
 LIB_OBJS += sha1_file.o
 LIB_OBJS += sha1_name.o
 LIB_OBJS += shallow.o
+LIB_OBJS += shm.o
 LIB_OBJS += sideband.o
 LIB_OBJS += sigchain.o
 LIB_OBJS += split-index.o
@@ -1433,6 +1436,12 @@ ifdef HAVE_DEV_TTY
 	BASIC_CFLAGS += -DHAVE_DEV_TTY
 endif
 
+ifdef HAVE_SHM
+	BASIC_CFLAGS	+= -DHAVE_SHM
+	EXTLIBS	        += -lrt
+	PROGRAM_OBJS	+= index-helper.o
+endif
+
 ifdef DIR_HAS_BSD_GROUP_SEMANTICS
 	COMPAT_CFLAGS += -DDIR_HAS_BSD_GROUP_SEMANTICS
 endif
diff --git a/cache.h b/cache.h
index 4180e2b..c3c6d85 100644
--- a/cache.h
+++ b/cache.h
@@ -334,6 +334,8 @@ struct index_state {
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
 		 keep_mmap : 1,
+		 from_shm : 1,
+		 to_shm : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
@@ -560,6 +562,7 @@ extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
 #define COMMIT_LOCK		(1 << 0)
 #define CLOSE_LOCK		(1 << 1)
+#define REFRESH_DAEMON		(1 << 2)
 extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
 extern int discard_index(struct index_state *);
 extern int unmerged_index(const struct index_state *);
diff --git a/config.mak.uname b/config.mak.uname
index 1139b44..d28d05d 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -38,6 +38,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_CLOCK_MONOTONIC = YesPlease
 	HAVE_GETDELIM = YesPlease
 	SANE_TEXT_GREP=-a
+	HAVE_SHM = YesPlease
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
 	HAVE_ALLOCA_H = YesPlease
diff --git a/git-compat-util.h b/git-compat-util.h
index c07e0c1..56945a7 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -513,6 +513,7 @@ static inline int ends_with(const char *str, const char *suffix)
 #define PROT_READ 1
 #define PROT_WRITE 2
 #define MAP_PRIVATE 1
+#define MAP_SHARED 2
 #endif
 
 #define mmap git_mmap
diff --git a/index-helper.c b/index-helper.c
new file mode 100644
index 0000000..962c973
--- /dev/null
+++ b/index-helper.c
@@ -0,0 +1,215 @@
+#include "cache.h"
+#include "parse-options.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "exec_cmd.h"
+#include "split-index.h"
+#include "shm.h"
+#include "lockfile.h"
+
+struct shm {
+	unsigned char sha1[20];
+	void *shm;
+	size_t size;
+};
+
+static struct shm shm_index;
+static struct shm shm_base_index;
+
+static void release_index_shm(struct shm *is)
+{
+	if (!is->shm)
+		return;
+	munmap(is->shm, is->size);
+	git_shm_unlink("git-index-%s", sha1_to_hex(is->sha1));
+	is->shm = NULL;
+}
+
+static void cleanup_shm(void)
+{
+	release_index_shm(&shm_index);
+	release_index_shm(&shm_base_index);
+}
+
+static void cleanup(void)
+{
+	unlink(git_path("index-helper.pipe"));
+	cleanup_shm();
+}
+
+static void cleanup_on_signal(int sig)
+{
+	/* We ignore sigpipes -- that's just a client being broken. */
+	if (sig == SIGPIPE)
+		return;
+	cleanup();
+	sigchain_pop(sig);
+	raise(sig);
+}
+
+static void share_index(struct index_state *istate, struct shm *is)
+{
+	void *new_mmap;
+	if (istate->mmap_size <= 20 ||
+	    hashcmp(istate->sha1,
+		    (unsigned char *)istate->mmap + istate->mmap_size - 20) ||
+	    !hashcmp(istate->sha1, is->sha1) ||
+	    git_shm_map(O_CREAT | O_EXCL | O_RDWR, 0700, istate->mmap_size,
+			&new_mmap, PROT_READ | PROT_WRITE, MAP_SHARED,
+			"git-index-%s", sha1_to_hex(istate->sha1)) < 0)
+		return;
+
+	release_index_shm(is);
+	is->size = istate->mmap_size;
+	is->shm = new_mmap;
+	hashcpy(is->sha1, istate->sha1);
+	memcpy(new_mmap, istate->mmap, istate->mmap_size - 20);
+
+	/*
+	 * The trailing hash must be written last after everything is
+	 * written. It's the indication that the shared memory is now
+	 * ready.
+	 */
+	hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);
+}
+
+static void share_the_index(void)
+{
+	if (the_index.split_index && the_index.split_index->base)
+		share_index(the_index.split_index->base, &shm_base_index);
+	share_index(&the_index, &shm_index);
+	discard_index(&the_index);
+}
+
+static void refresh(void)
+{
+	the_index.keep_mmap = 1;
+	the_index.to_shm    = 1;
+	if (read_cache() < 0)
+		die(_("could not read index"));
+	share_the_index();
+}
+
+#ifdef HAVE_SHM
+
+static void loop(int fd, int idle_in_seconds)
+{
+	struct timeval timeout;
+	struct timeval *timeout_p;
+
+	if (idle_in_seconds) {
+		timeout.tv_usec = 0;
+		timeout.tv_sec = idle_in_seconds;
+		timeout_p = &timeout;
+	} else {
+		timeout_p = NULL;
+	}
+
+	while (1) {
+		fd_set readfds;
+
+		/* Wait for a request */
+		FD_ZERO(&readfds);
+		FD_SET(fd, &readfds);
+		if (select(fd + 1, &readfds, NULL, NULL, timeout_p)) {
+			/* Got one! */
+			struct strbuf command = STRBUF_INIT;
+			/*
+			 * We assume that after the client does a
+			 * write, we can do a full read of the data
+			 * they wrote (which will be less than
+			 * PIPE_BUF).
+			 */
+			if (strbuf_getwholeline_fd(&command, fd, '\0') == 0) {
+				if (!strcmp(command.buf, "refresh")) {
+					refresh();
+				} else if (!strcmp(command.buf, "poke")) {
+					  /*
+					   * Just a poke to keep us
+					   * alive, nothing to do.
+					   */
+				} else {
+					warning("BUG: Bogus command %s", command.buf);
+				}
+			}
+		} else {
+			/* No request before timeout */
+			break;
+		}
+	}
+
+	close(fd);
+}
+
+#else
+
+static void loop(int fd, int idle_in_seconds)
+{
+	die(_("index-helper is not supported on this platform"));
+}
+
+#endif
+
+static int setup_pipe(const char *pipe_path)
+{
+	int fd;
+
+	if (mkfifo(pipe_path, 0600)) {
+		if (errno != EEXIST)
+			die(_("failed to create pipe %s"), pipe_path);
+
+		/* Left over from a previous run, delete & retry */
+		if (unlink(pipe_path))
+			die(_("failed to delete %s"), pipe_path);
+		if (mkfifo(pipe_path, 0600))
+			die(_("failed to create pipe %s"), pipe_path);
+	}
+
+	/*
+	 * Even though we never write to this pipe, we need to open
+	 * O_RDWR to prevent select() looping on EOF.
+	 */
+	fd = open(pipe_path, O_RDWR | O_NONBLOCK);
+	if (fd < 0)
+		die(_("Failed to open %s"), pipe_path);
+	return fd;
+}
+
+static const char * const usage_text[] = {
+	N_("git index-helper [options]"),
+	NULL
+};
+
+int main(int argc, char **argv)
+{
+	const char *prefix;
+	int idle_in_seconds = 600;
+	int fd;
+	struct strbuf pipe_path = STRBUF_INIT;
+	struct option options[] = {
+		OPT_INTEGER(0, "exit-after", &idle_in_seconds,
+			    N_("exit if not used after some seconds")),
+		OPT_END()
+	};
+
+	git_extract_argv0_path(argv[0]);
+	git_setup_gettext();
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(usage_text, options);
+
+	prefix = setup_git_directory();
+	if (parse_options(argc, (const char **)argv, prefix,
+			  options, usage_text, 0))
+		die(_("too many arguments"));
+
+	atexit(cleanup);
+	sigchain_push_common(cleanup_on_signal);
+
+	strbuf_git_path(&pipe_path, "index-helper.pipe");
+	fd = setup_pipe(pipe_path.buf);
+	if (fd < 0)
+		die_errno("Could not set up index-helper.pipe");
+	loop(fd, idle_in_seconds);
+	return 0;
+}
diff --git a/read-cache.c b/read-cache.c
index 7e387e9..f5fafd6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -18,6 +18,7 @@
 #include "varint.h"
 #include "split-index.h"
 #include "utf8.h"
+#include "shm.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -1541,6 +1542,74 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_untracked_cache(istate);
 }
 
+static int poke_daemon(struct index_state *istate,
+		       const struct stat *st, int refresh_cache)
+{
+	int fd;
+	int ret = 0;
+
+	/* if this is from index-helper, do not poke itself (recursively) */
+	if (istate->to_shm)
+		return 0;
+
+	fd = open(git_path("index-helper.pipe"), O_WRONLY | O_NONBLOCK);
+	if (fd < 0)
+		return -1;
+	if (refresh_cache) {
+		ret = write_in_full(fd, "refresh", 8) == 8;
+	} else {
+		ret = write_in_full(fd, "poke", 5) == 5;
+	}
+
+	close(fd);
+	return ret;
+}
+
+static int is_main_index(struct index_state *istate)
+{
+	return istate == &the_index ||
+		(the_index.split_index &&
+		 istate == the_index.split_index->base);
+}
+
+/*
+ * Try to open and verify a cached shm index if available. Return 0 if
+ * succeeds (istate->mmap and istate->mmap_size are updated). Return
+ * negative otherwise.
+ */
+static int try_shm(struct index_state *istate)
+{
+	void *new_mmap = NULL;
+	size_t old_size = istate->mmap_size;
+	ssize_t new_size;
+	const unsigned char *sha1;
+	struct stat st;
+
+	if (!is_main_index(istate) ||
+	    old_size <= 20 ||
+	    stat(git_path("index-helper.pipe"), &st))
+		return -1;
+	if (poke_daemon(istate, &st, 0))
+		return -1;
+	sha1 = (unsigned char *)istate->mmap + old_size - 20;
+	new_size = git_shm_map(O_RDONLY, 0700, -1, &new_mmap,
+				 PROT_READ, MAP_SHARED,
+				 "git-index-%s", sha1_to_hex(sha1));
+	if (new_size <= 20 ||
+	    hashcmp((unsigned char *)istate->mmap + old_size - 20,
+		    (unsigned char *)new_mmap + new_size - 20)) {
+		if (new_mmap)
+			munmap(new_mmap, new_size);
+		poke_daemon(istate, &st, 1);
+		return -1;
+	}
+	munmap(istate->mmap, istate->mmap_size);
+	istate->mmap = new_mmap;
+	istate->mmap_size = new_size;
+	istate->from_shm = 1;
+	return 0;
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1555,6 +1624,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	istate->from_shm = 0;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
 	fd = open(path, O_RDONLY);
@@ -1574,15 +1644,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
-	if (istate->keep_mmap) {
-		istate->mmap = mmap;
-		istate->mmap_size = mmap_size;
-	}
 	close(fd);
 
-	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
+	istate->mmap = mmap;
+	istate->mmap_size = mmap_size;
+	if (try_shm(istate) &&
+	    verify_hdr(istate->mmap, istate->mmap_size) < 0)
 		goto unmap;
+	hdr = mmap = istate->mmap;
+	mmap_size = istate->mmap_size;
+	if (!istate->keep_mmap)
+		istate->mmap = NULL;
 
 	hashcpy(istate->sha1, (const unsigned char *)hdr + mmap_size - 20);
 	istate->version = ntohl(hdr->hdr_version);
@@ -1662,6 +1734,8 @@ int read_index_from(struct index_state *istate, const char *path)
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
 	split_index->base->keep_mmap = istate->keep_mmap;
+	split_index->base->to_shm    = istate->to_shm;
+	split_index->base->from_shm  = istate->from_shm;
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -1712,6 +1786,8 @@ int discard_index(struct index_state *istate)
 	discard_split_index(istate);
 	free_untracked_cache(istate->untracked);
 	istate->untracked = NULL;
+	istate->from_shm = 0;
+	istate->to_shm   = 0;
 	return 0;
 }
 
@@ -2138,9 +2214,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 		return ret;
 	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
 	       (COMMIT_LOCK | CLOSE_LOCK));
-	if (flags & COMMIT_LOCK)
-		return commit_locked_index(lock);
-	else if (flags & CLOSE_LOCK)
+	if (flags & COMMIT_LOCK) {
+		struct stat st;
+		ret = commit_locked_index(lock);
+		if (!ret && is_main_index(istate) &&
+		    !stat(git_path("index-helper.pipe"), &st))
+			poke_daemon(istate, &st, 1);
+		return ret;
+	} else if (flags & CLOSE_LOCK)
 		return close_lock_file(lock);
 	else
 		return ret;
diff --git a/shm.c b/shm.c
new file mode 100644
index 0000000..4ec1a00
--- /dev/null
+++ b/shm.c
@@ -0,0 +1,67 @@
+#include "git-compat-util.h"
+#include "shm.h"
+
+#ifdef HAVE_SHM
+
+#define SHM_PATH_LEN 72		/* we don't create very long paths.. */
+
+ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
+		    int prot, int flags, const char *fmt, ...)
+{
+	va_list ap;
+	char path[SHM_PATH_LEN];
+	int fd;
+
+	path[0] = '/';
+	va_start(ap, fmt);
+	vsprintf(path + 1, fmt, ap);
+	va_end(ap);
+	fd = shm_open(path, oflag, perm);
+	if (fd < 0)
+		return -1;
+	if (length > 0 && ftruncate(fd, length)) {
+		shm_unlink(path);
+		close(fd);
+		return -1;
+	}
+	if (length < 0 && !(oflag & O_CREAT)) {
+		struct stat st;
+		if (fstat(fd, &st))
+			die_errno("unable to stat %s", path);
+		length = st.st_size;
+	}
+	*mmap = xmmap(NULL, length, prot, flags, fd, 0);
+	close(fd);
+	if (*mmap == MAP_FAILED) {
+		*mmap = NULL;
+		shm_unlink(path);
+		return -1;
+	}
+	return length;
+}
+
+void git_shm_unlink(const char *fmt, ...)
+{
+	va_list ap;
+	char path[SHM_PATH_LEN];
+
+	path[0] = '/';
+	va_start(ap, fmt);
+	vsprintf(path + 1, fmt, ap);
+	va_end(ap);
+	shm_unlink(path);
+}
+
+#else
+
+ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
+		    int prot, int flags, const char *fmt, ...)
+{
+	return -1;
+}
+
+void git_shm_unlink(const char *fmt, ...)
+{
+}
+
+#endif
diff --git a/shm.h b/shm.h
new file mode 100644
index 0000000..798d3fd
--- /dev/null
+++ b/shm.h
@@ -0,0 +1,23 @@
+#ifndef SHM_H
+#define SHM_H
+
+/*
+ * Create or open a shared memory and mmap it. Return mmap size if
+ * successful, -1 otherwise. If successful mmap contains the mmap'd
+ * pointer. If oflag does not contain O_CREAT and length is negative,
+ * the mmap size is retrieved from existing shared memory object.
+ *
+ * The mmap could be freed by munmap, even on Windows. Note that on
+ * Windows, git_shm_unlink() is no-op, so the last unmap will destroy
+ * the shared memory.
+ */
+ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
+		    int prot, int flags, const char *fmt, ...);
+
+/*
+ * Unlink a shared memory object. Only needed on POSIX platforms. On
+ * Windows this is no-op.
+ */
+void git_shm_unlink(const char *fmt, ...);
+
+#endif
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
new file mode 100755
index 0000000..7126037
--- /dev/null
+++ b/t/t7900-index-helper.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+#
+# Copyright (c) 2016, Twitter, Inc
+#
+
+test_description='git-index-helper
+
+Testing git index-helper
+'
+
+. ./test-lib.sh
+
+test_expect_success 'index-helper smoke test' '
+	git index-helper --exit-after 1 &&
+	test_path_is_missing .git/index-helper.pipe
+'
+
+test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 04/17] index-helper: add --strict
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (2 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 03/17] index-helper: new daemon for caching index and related stuff David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 05/17] daemonize(): set a flag before exiting the main process David Turner
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

There are "holes" in the index-helper approach because the shared
memory is not verified again by git. If $USER is compromised, shared
memory could be modified. But anyone who could do this could already
modify $GIT_DIR/index. A more realistic risk is some bugs in
index-helper that produce corrupt shared memory. --strict is added to
avoid that.

Strictly speaking there's still a very small gap where corrupt shared
memory could still be read by git: after we write the trailing SHA-1 in
the shared memory (thus signaling "this shm is ready") and before
verify_shm() detects an error.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-index-helper.txt |  9 +++++++
 cache.h                            |  1 +
 index-helper.c                     | 48 ++++++++++++++++++++++++++++++++++++++
 read-cache.c                       |  9 ++++---
 4 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index 59a5abc..7afc5c9 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -22,6 +22,15 @@ OPTIONS
 	Exit if the cached index is not accessed for `<n>`
 	minutes. Specify 0 to wait forever. Default is 10.
 
+--strict::
+--no-strict::
+	Strict mode makes index-helper verify the shared memory after
+	it's created. If the result does not match what's read from
+	$GIT_DIR/index, the shared memory is destroyed. This makes
+	index-helper take more than double the amount of time required
+	for reading an index, but because it will happen in the
+	background, it's not noticable. `--strict` is enabled by default.
+
 NOTES
 -----
 $GIT_DIR/index-helper.pipe is a named pipe that the daemon reads
diff --git a/cache.h b/cache.h
index c3c6d85..5805962 100644
--- a/cache.h
+++ b/cache.h
@@ -336,6 +336,7 @@ struct index_state {
 		 keep_mmap : 1,
 		 from_shm : 1,
 		 to_shm : 1,
+		 always_verify_trailing_sha1 : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
diff --git a/index-helper.c b/index-helper.c
index 962c973..8d221e0 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -15,6 +15,7 @@ struct shm {
 
 static struct shm shm_index;
 static struct shm shm_base_index;
+static int to_verify = 1;
 
 static void release_index_shm(struct shm *is)
 {
@@ -73,11 +74,56 @@ static void share_index(struct index_state *istate, struct shm *is)
 	hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);
 }
 
+static int verify_shm(void)
+{
+	int i;
+	struct index_state istate;
+	memset(&istate, 0, sizeof(istate));
+	istate.always_verify_trailing_sha1 = 1;
+	istate.to_shm = 1;
+	i = read_index(&istate);
+	if (i != the_index.cache_nr)
+		goto done;
+	for (; i < the_index.cache_nr; i++) {
+		struct cache_entry *base, *ce;
+		/* namelen is checked separately */
+		const unsigned int ondisk_flags =
+			CE_STAGEMASK | CE_VALID | CE_EXTENDED_FLAGS;
+		unsigned int ce_flags, base_flags, ret;
+		base = the_index.cache[i];
+		ce = istate.cache[i];
+		if (ce->ce_namelen != base->ce_namelen ||
+		    strcmp(ce->name, base->name)) {
+			warning("mismatch at entry %d", i);
+			break;
+		}
+		ce_flags = ce->ce_flags;
+		base_flags = base->ce_flags;
+		/* only on-disk flags matter */
+		ce->ce_flags   &= ondisk_flags;
+		base->ce_flags &= ondisk_flags;
+		ret = memcmp(&ce->ce_stat_data, &base->ce_stat_data,
+			     offsetof(struct cache_entry, name) -
+			     offsetof(struct cache_entry, ce_stat_data));
+		ce->ce_flags = ce_flags;
+		base->ce_flags = base_flags;
+		if (ret) {
+			warning("mismatch at entry %d", i);
+			break;
+		}
+	}
+done:
+	discard_index(&istate);
+	return i == the_index.cache_nr;
+}
+
 static void share_the_index(void)
 {
 	if (the_index.split_index && the_index.split_index->base)
 		share_index(the_index.split_index->base, &shm_base_index);
 	share_index(&the_index, &shm_index);
+	if (to_verify && !verify_shm())
+		cleanup_shm();
 	discard_index(&the_index);
 }
 
@@ -189,6 +235,8 @@ int main(int argc, char **argv)
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_seconds,
 			    N_("exit if not used after some seconds")),
+		OPT_BOOL(0, "strict", &to_verify,
+			 "verify shared memory after creating"),
 		OPT_END()
 	};
 
diff --git a/read-cache.c b/read-cache.c
index f5fafd6..fe73828 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1648,9 +1648,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 
 	istate->mmap = mmap;
 	istate->mmap_size = mmap_size;
-	if (try_shm(istate) &&
-	    verify_hdr(istate->mmap, istate->mmap_size) < 0)
-		goto unmap;
+	if (try_shm(istate)) {
+		if (verify_hdr(istate->mmap, istate->mmap_size) < 0)
+			goto unmap;
+	} else if (istate->always_verify_trailing_sha1 &&
+		   verify_hdr(istate->mmap, istate->mmap_size) < 0)
+			goto unmap;
 	hdr = mmap = istate->mmap;
 	mmap_size = istate->mmap_size;
 	if (!istate->keep_mmap)
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 05/17] daemonize(): set a flag before exiting the main process
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (3 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 04/17] index-helper: add --strict David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 06/17] index-helper: add --detach David Turner
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

This allows signal handlers and atexit functions to realize this
situation and not clean up.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/gc.c | 2 +-
 cache.h      | 2 +-
 daemon.c     | 2 +-
 setup.c      | 4 +++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c583aad..37180de 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -385,7 +385,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			 * failure to daemonize is ok, we'll continue
 			 * in foreground
 			 */
-			daemonized = !daemonize();
+			daemonized = !daemonize(NULL);
 		}
 	} else
 		add_repack_all_option();
diff --git a/cache.h b/cache.h
index 5805962..d386722 100644
--- a/cache.h
+++ b/cache.h
@@ -530,7 +530,7 @@ extern int set_git_dir_init(const char *git_dir, const char *real_git_dir, int);
 extern int init_db(const char *template_dir, unsigned int flags);
 
 extern void sanitize_stdfds(void);
-extern int daemonize(void);
+extern int daemonize(int *);
 
 #define alloc_nr(x) (((x)+16)*3/2)
 
diff --git a/daemon.c b/daemon.c
index 8d45c33..a5cf954 100644
--- a/daemon.c
+++ b/daemon.c
@@ -1365,7 +1365,7 @@ int main(int argc, char **argv)
 		return execute();
 
 	if (detach) {
-		if (daemonize())
+		if (daemonize(NULL))
 			die("--detach not supported on this platform");
 	} else
 		sanitize_stdfds();
diff --git a/setup.c b/setup.c
index de1a2a7..9adf13f 100644
--- a/setup.c
+++ b/setup.c
@@ -1017,7 +1017,7 @@ void sanitize_stdfds(void)
 		close(fd);
 }
 
-int daemonize(void)
+int daemonize(int *daemonized)
 {
 #ifdef NO_POSIX_GOODIES
 	errno = ENOSYS;
@@ -1029,6 +1029,8 @@ int daemonize(void)
 		case -1:
 			die_errno("fork failed");
 		default:
+			if (daemonized)
+				*daemonized = 1;
 			exit(0);
 	}
 	if (setsid() == -1)
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 06/17] index-helper: add --detach
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (4 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 05/17] daemonize(): set a flag before exiting the main process David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-29  2:35   ` Duy Nguyen
  2016-03-19  1:04 ` [PATCH v2 07/17] read-cache: add watchman 'WAMA' extension David Turner
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

We detach after creating and opening the named pipe, because otherwise
we might return control to the shell before index-helper is ready to
accept commands.  This might lead to flaky tests.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-index-helper.txt |  3 +++
 index-helper.c                     | 11 +++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index 7afc5c9..b858a8d 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -31,6 +31,9 @@ OPTIONS
 	for reading an index, but because it will happen in the
 	background, it's not noticable. `--strict` is enabled by default.
 
+--detach::
+	Detach from the shell.
+
 NOTES
 -----
 $GIT_DIR/index-helper.pipe is a named pipe that the daemon reads
diff --git a/index-helper.c b/index-helper.c
index 8d221e0..a854ed8 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -15,7 +15,7 @@ struct shm {
 
 static struct shm shm_index;
 static struct shm shm_base_index;
-static int to_verify = 1;
+static int daemonized, to_verify = 1;
 
 static void release_index_shm(struct shm *is)
 {
@@ -34,6 +34,8 @@ static void cleanup_shm(void)
 
 static void cleanup(void)
 {
+	if (daemonized)
+		return;
 	unlink(git_path("index-helper.pipe"));
 	cleanup_shm();
 }
@@ -229,7 +231,7 @@ static const char * const usage_text[] = {
 int main(int argc, char **argv)
 {
 	const char *prefix;
-	int idle_in_seconds = 600;
+	int idle_in_seconds = 600, detach = 0;
 	int fd;
 	struct strbuf pipe_path = STRBUF_INIT;
 	struct option options[] = {
@@ -237,6 +239,7 @@ int main(int argc, char **argv)
 			    N_("exit if not used after some seconds")),
 		OPT_BOOL(0, "strict", &to_verify,
 			 "verify shared memory after creating"),
+		OPT_BOOL(0, "detach", &detach, "detach the process"),
 		OPT_END()
 	};
 
@@ -258,6 +261,10 @@ int main(int argc, char **argv)
 	fd = setup_pipe(pipe_path.buf);
 	if (fd < 0)
 		die_errno("Could not set up index-helper.pipe");
+
+	if (detach && daemonize(&daemonized))
+		die_errno("unable to detach");
+
 	loop(fd, idle_in_seconds);
 	return 0;
 }
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 07/17] read-cache: add watchman 'WAMA' extension
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (5 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 06/17] index-helper: add --detach David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 08/17] read-cache: invalidate untracked cache data when reading WAMA David Turner
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

The extension contains a bitmap, one bit for each entry in the
index. If the n-th bit is zero, the n-th entry is considered
unchanged, we can ce_mark_uptodate() it without refreshing. If the bit
is non-zero and we found out the corresponding file is clean after
refresh, we can clear the bit.

In addition, there's a list of directories in the untracked-cache
to invalidate (because they have new or modified entries).

The 'skipping refresh' bit is not in this patch yet as we would need
watchman. More details in later patches.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 cache.h      |   4 ++
 dir.h        |   3 ++
 read-cache.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index d386722..5b10d52 100644
--- a/cache.h
+++ b/cache.h
@@ -182,6 +182,8 @@ struct cache_entry {
 #define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
+#define CE_WATCHMAN_DIRTY  (0x0001)
+
 /*
  * Range 0xFFFF0FFF in ce_flags is divided into
  * two parts: in-memory flags and on-disk ones.
@@ -320,6 +322,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED	(1 << 5)
 #define SPLIT_INDEX_ORDERED	(1 << 6)
 #define UNTRACKED_CHANGED	(1 << 7)
+#define WATCHMAN_CHANGED	(1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -344,6 +347,7 @@ struct index_state {
 	struct untracked_cache *untracked;
 	void *mmap;
 	size_t mmap_size;
+	char *last_update;
 };
 
 extern struct index_state the_index;
diff --git a/dir.h b/dir.h
index 3ec3fb0..3d540de 100644
--- a/dir.h
+++ b/dir.h
@@ -142,6 +142,9 @@ struct untracked_cache {
 	int gitignore_invalidated;
 	int dir_invalidated;
 	int dir_opened;
+	/* watchman invalidation data */
+	unsigned int use_watchman : 1;
+	struct string_list invalid_untracked;
 };
 
 struct dir_struct {
diff --git a/read-cache.c b/read-cache.c
index fe73828..6d5e871 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -19,6 +19,7 @@
 #include "split-index.h"
 #include "utf8.h"
 #include "shm.h"
+#include "ewah/ewok.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -41,11 +42,13 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
+#define CACHE_EXT_WATCHMAN 0x57414D41	  /* "WAMA" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
 		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
-		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED)
+		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | \
+		 WATCHMAN_CHANGED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -1220,8 +1223,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 			continue;
 
 		new = refresh_cache_ent(istate, ce, options, &cache_errno, &changed);
-		if (new == ce)
+		if (new == ce) {
+			if (ce->ce_flags & CE_WATCHMAN_DIRTY) {
+				ce->ce_flags          &= ~CE_WATCHMAN_DIRTY;
+				istate->cache_changed |= WATCHMAN_CHANGED;
+			}
 			continue;
+		}
 		if (!new) {
 			const char *fmt;
 
@@ -1365,6 +1373,94 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	return 0;
 }
 
+static void mark_no_watchman(size_t pos, void *data)
+{
+	struct index_state *istate = data;
+	assert(pos < istate->cache_nr);
+	istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+}
+
+static int read_watchman_ext(struct index_state *istate, const void *data,
+			     unsigned long sz)
+{
+	struct ewah_bitmap *bitmap;
+	int ret, len;
+	uint32_t bitmap_size;
+	uint32_t untracked_nr;
+
+	if (memchr(data, 0, sz) == NULL)
+		return error("invalid extension");
+
+	len = strlen(data) + 1;
+	memcpy(&bitmap_size, (const char *)data + len, 4);
+	memcpy(&untracked_nr, (const char *)data + len + 4, 4);
+	untracked_nr = ntohl(untracked_nr);
+	bitmap_size = ntohl(bitmap_size);
+
+	bitmap = ewah_new();
+	ret = ewah_read_mmap(bitmap, (const char *)data + len + 8, bitmap_size);
+	if (ret != bitmap_size) {
+		ewah_free(bitmap);
+		return error("failed to parse ewah bitmap reading watchman index extension");
+	}
+	istate->last_update = xstrdup(data);
+	ewah_each_bit(bitmap, mark_no_watchman, istate);
+	ewah_free(bitmap);
+
+	/*
+	 * TODO: update the untracked cache from the untracked data in this
+	 * extension.
+	 */
+	return 0;
+}
+
+static int untracked_entry_append(struct string_list_item *item, void *sbvoid)
+{
+	struct strbuf *sb = sbvoid;
+
+	strbuf_addstr(sb, item->string);
+	strbuf_addch(sb, 0);
+	return 0;
+}
+
+void write_watchman_ext(struct strbuf *sb, struct index_state* istate)
+{
+	struct ewah_bitmap *bitmap;
+	int i;
+	int ewah_start;
+	int ewah_size = 0;
+	int fixup = 0;
+
+	strbuf_add(sb, istate->last_update, strlen(istate->last_update) + 1);
+	fixup = sb->len;
+	strbuf_add(sb, &ewah_size, 4); /* we'll fix this up later */
+	if (istate->untracked) {
+		uint32_t nr = istate->untracked->invalid_untracked.nr;
+		nr = htonl(nr);
+		strbuf_add(sb, &nr, 4);
+	} else {
+		/* zero */
+		strbuf_add(sb, &ewah_size, 4);
+	}
+
+	ewah_start = sb->len;
+	bitmap = ewah_new();
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_WATCHMAN_DIRTY)
+			ewah_set(bitmap, i);
+	ewah_serialize_strbuf(bitmap, sb);
+	ewah_free(bitmap);
+
+	/* fix up size field */
+	ewah_size = sb->len - ewah_start;
+	ewah_size = htonl(ewah_size);
+	memcpy(sb->buf + fixup, &ewah_size, 4);
+
+	if (istate->untracked)
+		for_each_string_list(&istate->untracked->invalid_untracked,
+				     untracked_entry_append, sb);
+}
+
 static int read_index_extension(struct index_state *istate,
 				const char *ext, void *data, unsigned long sz)
 {
@@ -1382,6 +1478,11 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_UNTRACKED:
 		istate->untracked = read_untracked_extension(data, sz);
 		break;
+
+	case CACHE_EXT_WATCHMAN:
+		read_watchman_ext(istate, data, sz);
+		break;
+
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1791,6 +1892,8 @@ int discard_index(struct index_state *istate)
 	istate->untracked = NULL;
 	istate->from_shm = 0;
 	istate->to_shm   = 0;
+	free(istate->last_update);
+	istate->last_update = NULL;
 	return 0;
 }
 
@@ -2188,6 +2291,16 @@ static int do_write_index(struct index_state *istate, int newfd,
 		if (err)
 			return -1;
 	}
+	if (!strip_extensions && istate->last_update) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_watchman_ext(&sb, istate);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCHMAN, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
 		return -1;
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 08/17] read-cache: invalidate untracked cache data when reading WAMA
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (6 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 07/17] read-cache: add watchman 'WAMA' extension David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-29  2:50   ` Duy Nguyen
  2016-03-19  1:04 ` [PATCH v2 09/17] Add watchman support to reduce index refresh cost David Turner
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

When reading the watchman extension, invalidate the listed
untracked-cache entries.

We don't clear these entries yet; we can only do that once we know
that they came from a live watchman (rather than from disk).

Signed-off-by: David Turner <dturner@twopensource.com>
---
 dir.c        | 11 +++++---
 dir.h        |  3 ++
 read-cache.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 94 insertions(+), 9 deletions(-)

diff --git a/dir.c b/dir.c
index 69e0be6..9b659e6 100644
--- a/dir.c
+++ b/dir.c
@@ -597,9 +597,9 @@ static void trim_trailing_spaces(char *buf)
  *
  * If "name" has the trailing slash, it'll be excluded in the search.
  */
-static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
-						    struct untracked_cache_dir *dir,
-						    const char *name, int len)
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+					     struct untracked_cache_dir *dir,
+					     const char *name, int len)
 {
 	int first, last;
 	struct untracked_cache_dir *d;
@@ -2625,8 +2625,10 @@ static void free_untracked(struct untracked_cache_dir *ucd)
 
 void free_untracked_cache(struct untracked_cache *uc)
 {
-	if (uc)
+	if (uc) {
 		free_untracked(uc->root);
+		string_list_clear(&uc->invalid_untracked, 0);
+	}
 	free(uc);
 }
 
@@ -2775,6 +2777,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 		return NULL;
 
 	uc = xcalloc(1, sizeof(*uc));
+	string_list_init(&uc->invalid_untracked, 1);
 	strbuf_init(&uc->ident, ident_len);
 	strbuf_add(&uc->ident, ident, ident_len);
 	load_sha1_stat(&uc->ss_info_exclude, &ouc->info_exclude_stat,
diff --git a/dir.h b/dir.h
index 3d540de..8fd3f9e 100644
--- a/dir.h
+++ b/dir.h
@@ -315,4 +315,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked);
 void add_untracked_cache(struct index_state *istate);
 void remove_untracked_cache(struct index_state *istate);
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+					     struct untracked_cache_dir *dir,
+					     const char *name, int len);
 #endif
diff --git a/read-cache.c b/read-cache.c
index 6d5e871..b4bd15c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1373,11 +1373,76 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	return 0;
 }
 
+static struct untracked_cache_dir *find_untracked_cache_dir(
+	struct untracked_cache *uc, struct untracked_cache_dir *ucd,
+	const char *name)
+{
+	int component_len;
+	const char *end;
+	struct untracked_cache_dir *dir;
+
+	if (!*name)
+		return ucd;
+
+	end = strchr(name, '/');
+	if (end)
+		component_len = end - name;
+	else
+		component_len = strlen(name);
+
+	dir = lookup_untracked(uc, ucd, name, component_len);
+	if (dir)
+		return find_untracked_cache_dir(uc, dir, name + component_len + 1);
+
+	return NULL;
+}
+
 static void mark_no_watchman(size_t pos, void *data)
 {
 	struct index_state *istate = data;
+	struct cache_entry *ce = istate->cache[pos];
+	struct strbuf sb = STRBUF_INIT;
+	char *c;
+	struct untracked_cache_dir *dir;
+
 	assert(pos < istate->cache_nr);
-	istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+	ce->ce_flags |= CE_WATCHMAN_DIRTY;
+
+	if (!istate->untracked || !istate->untracked->root)
+		return;
+
+	strbuf_add(&sb, ce->name, ce_namelen(ce));
+
+	for (c = sb.buf + sb.len - 1; c > sb.buf; c--) {
+		if (*c == '/') {
+			strbuf_setlen(&sb, c - sb.buf);
+			break;
+		}
+	}
+
+	if (c == sb.buf) {
+		strbuf_setlen(&sb, 0);
+	}
+
+	dir = find_untracked_cache_dir(istate->untracked,
+				       istate->untracked->root, sb.buf);
+	if (dir)
+		dir->valid = 0;
+
+	strbuf_release(&sb);
+}
+
+static int mark_untracked_invalid(struct string_list_item *item, void *uc)
+{
+	struct untracked_cache *untracked = uc;
+	struct untracked_cache_dir *dir;
+
+	dir = find_untracked_cache_dir(untracked, untracked->root,
+				       item->string);
+	if (dir)
+		dir->valid = 0;
+
+	return 0;
 }
 
 static int read_watchman_ext(struct index_state *istate, const void *data,
@@ -1407,10 +1472,24 @@ static int read_watchman_ext(struct index_state *istate, const void *data,
 	ewah_each_bit(bitmap, mark_no_watchman, istate);
 	ewah_free(bitmap);
 
-	/*
-	 * TODO: update the untracked cache from the untracked data in this
-	 * extension.
-	 */
+	if (istate->untracked && istate->untracked->root) {
+		int i;
+		const char *untracked;
+
+		untracked = data + len + 8 + bitmap_size;
+		for (i = 0; i < untracked_nr; ++i) {
+			int len = strlen(untracked);
+			string_list_append(&istate->untracked->invalid_untracked,
+					   untracked);
+			untracked += len + 1;
+		}
+
+		for_each_string_list(&istate->untracked->invalid_untracked,
+			 mark_untracked_invalid, istate->untracked);
+
+		if (untracked_nr)
+			istate->cache_changed |= WATCHMAN_CHANGED;
+	}
 	return 0;
 }
 
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 09/17] Add watchman support to reduce index refresh cost
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (7 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 08/17] read-cache: invalidate untracked cache data when reading WAMA David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-24 13:47   ` Jeff Hostetler
  2016-03-19  1:04 ` [PATCH v2 10/17] index-helper: use watchman to avoid refreshing index with lstat() David Turner
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

The previous patch has the logic to clear bits in 'WAMA' bitmap. This
patch has logic to set bits as told by watchman. The missing bit,
_using_ these bits, are not here yet.

A lot of this code is written by David Turner originally, mostly from
[1]. I'm just copying and polishing it a bit.

[1] http://article.gmane.org/gmane.comp.version-control.git/248006

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Makefile           |  12 +++++
 cache.h            |   1 +
 config.c           |   5 ++
 configure.ac       |   8 ++++
 environment.c      |   3 ++
 watchman-support.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 watchman-support.h |   7 +++
 7 files changed, 170 insertions(+)
 create mode 100644 watchman-support.c
 create mode 100644 watchman-support.h

diff --git a/Makefile b/Makefile
index 2d72771..8bf705b 100644
--- a/Makefile
+++ b/Makefile
@@ -453,6 +453,7 @@ MSGFMT = msgfmt
 CURL_CONFIG = curl-config
 PTHREAD_LIBS = -lpthread
 PTHREAD_CFLAGS =
+WATCHMAN_LIBS =
 GCOV = gcov
 
 export TCL_PATH TCLTK_PATH
@@ -1419,6 +1420,13 @@ else
 	LIB_OBJS += thread-utils.o
 endif
 
+ifdef USE_WATCHMAN
+	LIB_H += watchman-support.h
+	LIB_OBJS += watchman-support.o
+	WATCHMAN_LIBS = -lwatchman
+	BASIC_CFLAGS += -DUSE_WATCHMAN
+endif
+
 ifdef HAVE_PATHS_H
 	BASIC_CFLAGS += -DHAVE_PATHS_H
 endif
@@ -2030,6 +2038,9 @@ git-remote-testsvn$X: remote-testsvn.o GIT-LDFLAGS $(GITLIBS) $(VCSSVN_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) \
 	$(VCSSVN_LIB)
 
+git-index-helper$X: index-helper.o GIT-LDFLAGS $(GITLIBS)
+	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) $(WATCHMAN_LIBS)
+
 $(REMOTE_CURL_ALIASES): $(REMOTE_CURL_PRIMARY)
 	$(QUIET_LNCP)$(RM) $@ && \
 	ln $< $@ 2>/dev/null || \
@@ -2168,6 +2179,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
+	@echo USE_WATCHMAN=\''$(subst ','\'',$(subst ','\'',$(USE_WATCHMAN)))'\' >>$@+
 ifdef TEST_OUTPUT_DIRECTORY
 	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
 endif
diff --git a/cache.h b/cache.h
index 5b10d52..95715fd 100644
--- a/cache.h
+++ b/cache.h
@@ -688,6 +688,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_watchman_sync_timeout;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index 9ba40bc..e6dc141 100644
--- a/config.c
+++ b/config.c
@@ -882,6 +882,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.watchmansynctimeout")) {
+		core_watchman_sync_timeout = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.createobject")) {
 		if (!strcmp(value, "rename"))
 			object_creation_mode = OBJECT_CREATION_USES_RENAMES;
diff --git a/configure.ac b/configure.ac
index 0cd9f46..334d63b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1099,6 +1099,14 @@ AC_COMPILE_IFELSE([BSD_SYSCTL_SRC],
 	HAVE_BSD_SYSCTL=])
 GIT_CONF_SUBST([HAVE_BSD_SYSCTL])
 
+#
+# Check for watchman client library
+
+AC_CHECK_LIB([watchman], [watchman_connect],
+	[USE_WATCHMAN=YesPlease],
+	[USE_WATCHMAN=])
+GIT_CONF_SUBST([USE_WATCHMAN])
+
 ## Other checks.
 # Define USE_PIC if you need the main git objects to be built with -fPIC
 # in order to build and link perl/Git.so.  x86-64 seems to need this.
diff --git a/environment.c b/environment.c
index 6dec9d0..35e03c7 100644
--- a/environment.c
+++ b/environment.c
@@ -94,6 +94,9 @@ int core_preload_index = 1;
  */
 int ignore_untracked_cache_config;
 
+int core_watchman_sync_timeout = 300;
+
+
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
 static char *work_tree;
diff --git a/watchman-support.c b/watchman-support.c
new file mode 100644
index 0000000..b7302b9
--- /dev/null
+++ b/watchman-support.c
@@ -0,0 +1,134 @@
+#include "cache.h"
+#include "watchman-support.h"
+#include "strbuf.h"
+#include "dir.h"
+#include <watchman.h>
+
+static struct watchman_query *make_query(const char *last_update)
+{
+	struct watchman_query *query = watchman_query();
+	watchman_query_set_fields(query, WATCHMAN_FIELD_NAME |
+					 WATCHMAN_FIELD_EXISTS |
+					 WATCHMAN_FIELD_NEWER);
+	watchman_query_set_empty_on_fresh(query, 1);
+	query->sync_timeout = core_watchman_sync_timeout;
+	if (*last_update)
+		watchman_query_set_since_oclock(query, last_update);
+	return query;
+}
+
+static struct watchman_query_result* query_watchman(
+	struct index_state *istate, struct watchman_connection *connection,
+	const char *fs_path, const char *last_update)
+{
+	struct watchman_error wm_error;
+	struct watchman_query *query;
+	struct watchman_expression *expr;
+	struct watchman_query_result *result;
+
+	query = make_query(last_update);
+	expr = watchman_true_expression();
+	result = watchman_do_query(connection, fs_path, query, expr, &wm_error);
+	watchman_free_query(query);
+	watchman_free_expression(expr);
+
+	if (!result)
+		warning("Watchman query error: %s (at %s)",
+			wm_error.message,
+			*last_update ? last_update : "the beginning");
+
+	return result;
+}
+
+static void update_index(struct index_state *istate,
+			 struct watchman_query_result *result)
+{
+	int i;
+
+	if (result->is_fresh_instance) {
+		/* let refresh clear them later */
+		for (i = 0; i < istate->cache_nr; i++)
+			istate->cache[i]->ce_flags |= CE_WATCHMAN_DIRTY;
+		goto done;
+	}
+
+	for (i = 0; i < result->nr; i++) {
+		struct watchman_stat *wm = result->stats + i;
+		int pos;
+
+		if (!strncmp(wm->name, ".git/", 5) ||
+		    strstr(wm->name, "/.git/"))
+			continue;
+
+		pos = index_name_pos(istate, wm->name, strlen(wm->name));
+		if (pos < 0) {
+			if (istate->untracked) {
+				char *name = xstrdup(wm->name);
+				char *dname = dirname(name);
+
+				/*
+				 * dirname() returns '.' for the root,
+				 * but we call it ''.
+				 */
+				if (dname[0] == '.' && dname[1] == 0)
+					string_list_append(&istate->untracked->invalid_untracked, "");
+				else
+					string_list_append(&istate->untracked->invalid_untracked,
+							   dname);
+				free(name);
+			}
+			continue;
+		}
+		/* FIXME: ignore staged entries and gitlinks too? */
+
+		istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+	}
+
+done:
+	free(istate->last_update);
+	istate->last_update    = xstrdup(result->clock);
+	istate->cache_changed |= WATCHMAN_CHANGED;
+	if (istate->untracked)
+		string_list_remove_duplicates(&istate->untracked->invalid_untracked, 0);
+}
+
+int check_watchman(struct index_state *istate)
+{
+	struct watchman_error wm_error;
+	struct watchman_connection *connection;
+	struct watchman_query_result *result;
+	const char *fs_path;
+	struct timeval timeout;
+	/*
+	 * Convert core_watchman_sync_timeout, in milliseconds, to
+	 * struct timeval, in seconds and microseconds.
+	 */
+
+	fs_path = get_git_work_tree();
+	if (!fs_path)
+		return -1;
+
+	timeout.tv_sec = core_watchman_sync_timeout / 1000;
+	timeout.tv_usec = (core_watchman_sync_timeout % 1000) * 1000;
+	connection = watchman_connect(timeout, &wm_error);
+
+	if (!connection) {
+		warning("Watchman watch error: %s", wm_error.message);
+		return -1;
+	}
+
+	if (watchman_watch(connection, fs_path, &wm_error)) {
+		warning("Watchman watch error: %s", wm_error.message);
+		watchman_connection_close(connection);
+		return -1;
+	}
+
+
+	result = query_watchman(istate, connection, fs_path, istate->last_update);
+	watchman_connection_close(connection);
+	if (!result)
+		return -1;
+	update_index(istate, result);
+	watchman_free_query_result(result);
+	return 0;
+}
diff --git a/watchman-support.h b/watchman-support.h
new file mode 100644
index 0000000..ee1ef2c
--- /dev/null
+++ b/watchman-support.h
@@ -0,0 +1,7 @@
+#ifndef WATCHMAN_SUPPORT_H
+#define WATCHMAN_SUPPORT_H
+
+struct index_state;
+int check_watchman(struct index_state *index);
+
+#endif /* WATCHMAN_SUPPORT_H */
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 10/17] index-helper: use watchman to avoid refreshing index with lstat()
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (8 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 09/17] Add watchman support to reduce index refresh cost David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 11/17] update-index: enable/disable watchman support David Turner
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Watchman is hidden behind index-helper. Before git tries to read the
index from shm, it notifies index-helper through the named pipe and
waits for index-helper to prepare shm. index-helper then contacts
watchman, updates 'WAMA' extension and put it in a separate shm and
wakes git up with a write to git's named pipe.

Git uses this extension to not lstat unchanged entries. Git only
trusts the 'WAMA' extension when it's received from the separate shm,
not from disk. Unmarked entries are "clean". Marked entries are dirty
from watchman point of view. If it finds out some entries are
'watchman-dirty', but are really unchanged (e.g. the file was changed,
then reverted back), then Git will clear the marking in 'WAMA' before
writing it down.

Hiding watchman behind index-helper means you need both daemons. You
can't run watchman alone. Not so good. But on the other hand, 'git'
binary is not linked to watchman/json libraries, which is good for
packaging. Core git package will run fine without watchman-related
packages. If they need watchman, they can install git-index-helper and
dependencies.

This also lets us trust anything in the untracked cache that we haven't
marked invalid, saving those stat() calls.

Another reason for tying watchman to index-helper is, when used with
untracked cache, we need to keep track of $GIT_WORK_TREE file
listing. That kind of list can be kept in index-helper.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds <at> gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 Documentation/git-index-helper.txt |   6 ++
 cache.h                            |   2 +
 dir.c                              |  12 ++++
 index-helper.c                     | 128 ++++++++++++++++++++++++++++++---
 read-cache.c                       | 141 +++++++++++++++++++++++++++++++++++--
 5 files changed, 275 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index b858a8d..7a8042e 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -51,6 +51,12 @@ the daemon:
 	Let the daemon know the index is to be read. It keeps the
 	daemon alive longer, unless `--exit-after=0` is used.
 
+"poke <path>":
+	Like "poke", but replies with "OK" by opening the named pipe
+	at <path> and writing the string.  If watchman is configured,
+	index-helper queries watchman, then prepares a shared memory
+	object with the watchman index extension before replying.
+
 All commands and replies are terminated by a 0 byte.
 
 GIT
diff --git a/cache.h b/cache.h
index 95715fd..9fa339a 100644
--- a/cache.h
+++ b/cache.h
@@ -558,6 +558,7 @@ extern int daemonize(int *);
 
 /* Initialize and use the cache information */
 struct lock_file;
+extern int verify_index(const struct index_state *);
 extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec *pathspec);
 extern int do_read_index(struct index_state *istate, const char *path,
@@ -565,6 +566,7 @@ extern int do_read_index(struct index_state *istate, const char *path,
 extern int read_index_from(struct index_state *, const char *path);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
+extern void write_watchman_ext(struct strbuf *sb, struct index_state* istate);
 #define COMMIT_LOCK		(1 << 0)
 #define CLOSE_LOCK		(1 << 1)
 #define REFRESH_DAEMON		(1 << 2)
diff --git a/dir.c b/dir.c
index 9b659e6..5058b29 100644
--- a/dir.c
+++ b/dir.c
@@ -1726,6 +1726,17 @@ static int valid_cached_dir(struct dir_struct *dir,
 	if (!untracked)
 		return 0;
 
+	if (dir->untracked->use_watchman) {
+		/*
+		 * With watchman, we can trust the untracked cache's
+		 * valid field.
+		 */
+		if (untracked->valid)
+			goto skip_stat;
+		else
+			invalidate_directory(dir->untracked, untracked);
+	}
+
 	if (stat(path->len ? path->buf : ".", &st)) {
 		invalidate_directory(dir->untracked, untracked);
 		memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
@@ -1739,6 +1750,7 @@ static int valid_cached_dir(struct dir_struct *dir,
 		return 0;
 	}
 
+skip_stat:
 	if (untracked->check_only != !!check_only) {
 		invalidate_directory(dir->untracked, untracked);
 		return 0;
diff --git a/index-helper.c b/index-helper.c
index a854ed8..445a0ac 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -6,15 +6,18 @@
 #include "split-index.h"
 #include "shm.h"
 #include "lockfile.h"
+#include "watchman-support.h"
 
 struct shm {
 	unsigned char sha1[20];
 	void *shm;
 	size_t size;
+	pid_t pid;
 };
 
 static struct shm shm_index;
 static struct shm shm_base_index;
+static struct shm shm_watchman;
 static int daemonized, to_verify = 1;
 
 static void release_index_shm(struct shm *is)
@@ -26,10 +29,21 @@ static void release_index_shm(struct shm *is)
 	is->shm = NULL;
 }
 
+static void release_watchman_shm(struct shm *is)
+{
+	if (!is->shm)
+		return;
+	munmap(is->shm, is->size);
+	git_shm_unlink("git-watchman-%s-%" PRIuMAX,
+		       sha1_to_hex(is->sha1), (uintmax_t)is->pid);
+	is->shm = NULL;
+}
+
 static void cleanup_shm(void)
 {
 	release_index_shm(&shm_index);
 	release_index_shm(&shm_base_index);
+	release_watchman_shm(&shm_watchman);
 }
 
 static void cleanup(void)
@@ -124,13 +138,62 @@ static void share_the_index(void)
 	if (the_index.split_index && the_index.split_index->base)
 		share_index(the_index.split_index->base, &shm_base_index);
 	share_index(&the_index, &shm_index);
-	if (to_verify && !verify_shm())
+	if (to_verify && !verify_shm()) {
 		cleanup_shm();
-	discard_index(&the_index);
+		discard_index(&the_index);
+	}
+}
+
+#ifdef HAVE_SHM
+
+#ifdef USE_WATCHMAN
+static void share_watchman(struct index_state *istate,
+			   struct shm *is, pid_t pid)
+{
+	struct strbuf sb = STRBUF_INIT;
+	void *shm;
+
+	write_watchman_ext(&sb, istate);
+	if (git_shm_map(O_CREAT | O_EXCL | O_RDWR, 0700, sb.len + 20,
+			&shm, PROT_READ | PROT_WRITE, MAP_SHARED,
+			"git-watchman-%s-%" PRIuMAX,
+			sha1_to_hex(istate->sha1), (uintmax_t)pid) == sb.len + 20) {
+		is->size = sb.len + 20;
+		is->shm = shm;
+		is->pid = pid;
+		hashcpy(is->sha1, istate->sha1);
+
+		memcpy(shm, sb.buf, sb.len);
+		hashcpy((unsigned char *)shm + is->size - 20, is->sha1);
+	}
+	strbuf_release(&sb);
 }
 
-static void refresh(void)
+
+static void prepare_with_watchman(pid_t pid)
 {
+	/*
+	 * TODO: with the help of watchman, maybe we could detect if
+	 * $GIT_DIR/index is updated.
+	 */
+	if (check_watchman(&the_index))
+		return;
+
+	share_watchman(&the_index, &shm_watchman, pid);
+}
+
+static void prepare_index(pid_t pid)
+{
+	release_watchman_shm(&shm_watchman);
+	if (the_index.last_update)
+		prepare_with_watchman(pid);
+}
+
+#endif
+
+static void refresh()
+{
+	discard_index(&the_index);
 	the_index.keep_mmap = 1;
 	the_index.to_shm    = 1;
 	if (read_cache() < 0)
@@ -138,7 +201,50 @@ static void refresh(void)
 	share_the_index();
 }
 
-#ifdef HAVE_SHM
+static int send_response(const char *client_pipe_path, const char *response)
+{
+	int fd;
+	int len;
+
+	fd = open(client_pipe_path, O_WRONLY | O_NONBLOCK);
+	if (fd < 0)
+		return -1;
+
+	len = strlen(response) + 1;
+	assert(len < PIPE_BUF);
+	if (write_in_full(fd, response, len) != len) {
+		close(fd);
+		return -1;
+	}
+
+	close(fd);
+	return 0;
+}
+
+static uintmax_t get_trailing_digits(const char *path)
+{
+	const char *start = strrchr(path, '/');
+	if (!start)
+		return 0;
+	while (*start && !isdigit(*start)) start ++;
+	if (!*start)
+		return 0;
+	return strtoull(start, NULL, 10);
+}
+
+static void reply(const char *path)
+{
+	uintmax_t pid;
+	/*
+	 * Parse caller pid out of provided path.  It'll be some
+	 * digits on the end.
+	 */
+	pid = (pid_t)get_trailing_digits(path);
+#ifdef USE_WATCHMAN
+	prepare_index(pid);
+#endif
+	send_response(path, "OK");
+}
 
 static void loop(int fd, int idle_in_seconds)
 {
@@ -171,11 +277,15 @@ static void loop(int fd, int idle_in_seconds)
 			if (strbuf_getwholeline_fd(&command, fd, '\0') == 0) {
 				if (!strcmp(command.buf, "refresh")) {
 					refresh();
-				} else if (!strcmp(command.buf, "poke")) {
-					  /*
-					   * Just a poke to keep us
-					   * alive, nothing to do.
-					   */
+				} else if (starts_with(command.buf, "poke")) {
+					if (command.buf[4] == ' ')
+						reply(command.buf + 5);
+					else
+						/*
+						 * Just a poke to keep us
+						 * alive, nothing to do.
+						 */
+						;
 				} else {
 					warning("BUG: Bogus command %s", command.buf);
 				}
diff --git a/read-cache.c b/read-cache.c
index b4bd15c..8e886d1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1233,7 +1233,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (!new) {
 			const char *fmt;
 
-			if (really && cache_errno == EINVAL) {
+			if (really || cache_errno == EINVAL) {
 				/* If we are doing --really-refresh that
 				 * means the index is not valid anymore.
 				 */
@@ -1722,11 +1722,94 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_untracked_cache(istate);
 }
 
+static int poke_and_wait_for_reply(int fd)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct strbuf reply = STRBUF_INIT;
+	int pid = getpid();
+	int read_fd = -1;
+	int ret = -1;
+	fd_set fds;
+	struct timeval timeout;
+	char *my_pipe_path;
+
+	timeout.tv_usec = 0;
+	timeout.tv_sec = 1;
+
+	/*
+	 * Create a fifo so that index-helper can reply (we don't want
+	 * it to reply on its own fifo because then we maybe have a
+	 * fifo with multiple readers, which causes doom).
+	 */
+	my_pipe_path = xstrdup(real_path(git_path("%d.pipe", pid)));
+
+	/*
+	 * It's important that the command fit in one PIPE_BUF
+	 * so that it doesn't interleave with other messages.
+	 *
+	 * POSIX specifies a 512 byte minimum for PIPE_BUF, so
+	 * this should rarely be a problem.
+	 */
+	if (strlen(my_pipe_path) + 6 /* "poke \0" */ > PIPE_BUF) {
+		const char *tmp = getenv("TMPDIR");
+
+		if (!tmp)
+			tmp = "/tmp";
+
+		free(my_pipe_path);
+		strbuf_addf(&buf, "%s/git-%d.pipe", tmp, pid);
+		my_pipe_path = strbuf_detach(&buf, NULL);
+		if (buf.len + 6 > PIPE_BUF)
+			goto no_fifo;
+	}
+	if (mkfifo(my_pipe_path, 0600)) {
+		if (errno != EEXIST)
+			goto no_fifo;
+
+		unlink(my_pipe_path);
+		if (mkfifo(my_pipe_path, 0600))
+			goto no_fifo;
+	}
+	read_fd = open(my_pipe_path, O_RDONLY | O_NONBLOCK);
+	if (read_fd < 0)
+		goto done_poke;
+
+	strbuf_addstr(&buf, "poke ");
+	strbuf_addstr(&buf, my_pipe_path);
+	if (write_in_full(fd, buf.buf, buf.len + 1) != buf.len + 1)
+		goto done_poke;
+
+	/* Now wait for a reply */
+	FD_ZERO(&fds);
+	FD_SET(read_fd, &fds);
+	if (select(read_fd + 1, &fds, NULL, NULL, &timeout) == 0)
+		/* No reply, giving up */
+		goto done_poke;
+
+	if (strbuf_getwholeline_fd(&reply, read_fd, 0))
+		goto done_poke;
+
+	if (!starts_with(reply.buf, "OK"))
+		goto done_poke;
+
+	ret = 0;
+done_poke:
+	unlink(my_pipe_path);
+no_fifo:
+	free(my_pipe_path);
+	if (read_fd != -1)
+		close(read_fd);
+	strbuf_release(&buf);
+	strbuf_release(&reply);
+
+	return ret;
+}
+
 static int poke_daemon(struct index_state *istate,
 		       const struct stat *st, int refresh_cache)
 {
 	int fd;
-	int ret = 0;
+	int ret = -1;
 
 	/* if this is from index-helper, do not poke itself (recursively) */
 	if (istate->to_shm)
@@ -1738,7 +1821,7 @@ static int poke_daemon(struct index_state *istate,
 	if (refresh_cache) {
 		ret = write_in_full(fd, "refresh", 8) == 8;
 	} else {
-		ret = write_in_full(fd, "poke", 5) == 5;
+		ret = poke_and_wait_for_reply(fd);
 	}
 
 	close(fd);
@@ -1790,6 +1873,50 @@ static int try_shm(struct index_state *istate)
 	return 0;
 }
 
+static void refresh_by_watchman(struct index_state *istate)
+{
+	void *shm = NULL;
+	int length;
+	int i;
+
+	length = git_shm_map(O_RDONLY, 0700, -1, &shm,
+			     PROT_READ, MAP_SHARED,
+			     "git-watchman-%s-%" PRIuMAX,
+			     sha1_to_hex(istate->sha1),
+			     (uintmax_t)getpid());
+
+	if (length <= 20 ||
+	    hashcmp(istate->sha1, (unsigned char *)shm + length - 20) ||
+	    /*
+	     * No need to clear CE_WATCHMAN_DIRTY set by 'WAMA' on
+	     * disk. Watchman can only set more, not clear any, so
+	     * this is OR mask.
+	     */
+	    read_watchman_ext(istate, shm, length - 20))
+		goto done;
+
+	/*
+	 * Now that we've marked the invalid entries in the
+	 * untracked-cache itself, we can erase them from the list of
+	 * entries to be processed and mark the untracked cache for
+	 * watchman usage.
+	 */
+	if (istate->untracked) {
+		string_list_clear(&istate->untracked->invalid_untracked, 0);
+		istate->untracked->use_watchman = 1;
+	}
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		if (ce_stage(ce) || (ce->ce_flags & CE_WATCHMAN_DIRTY))
+			continue;
+		ce_mark_uptodate(ce);
+	}
+done:
+	if (shm)
+		munmap(shm, length);
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1909,7 +2036,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	split_index = istate->split_index;
 	if (!split_index || is_null_sha1(split_index->base_sha1)) {
 		post_read_index_from(istate);
-		return ret;
+		goto done;
 	}
 
 	if (split_index->base)
@@ -1930,6 +2057,10 @@ int read_index_from(struct index_state *istate, const char *path)
 		    sha1_to_hex(split_index->base->sha1));
 	merge_base_index(istate);
 	post_read_index_from(istate);
+
+done:
+	if (ret > 0 && istate->from_shm && istate->last_update)
+		refresh_by_watchman(istate);
 	return ret;
 }
 
@@ -2231,7 +2362,7 @@ out:
 	return 0;
 }
 
-static int verify_index(const struct index_state *istate)
+int verify_index(const struct index_state *istate)
 {
 	return verify_index_from(istate, get_index_file());
 }
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 11/17] update-index: enable/disable watchman support
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (9 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 10/17] index-helper: use watchman to avoid refreshing index with lstat() David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 12/17] unpack-trees: preserve index extensions David Turner
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 builtin/update-index.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 1c94ca5..7c08e1c 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -914,6 +914,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 {
 	int newfd, entries, has_errors = 0, nul_term_line = 0;
 	enum uc_mode untracked_cache = UC_UNSPECIFIED;
+	int use_watchman = -1;
 	int read_from_stdin = 0;
 	int prefix_length = prefix ? strlen(prefix) : 0;
 	int preferred_index_format = 0;
@@ -1012,6 +1013,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			    N_("test if the filesystem supports untracked cache"), UC_TEST),
 		OPT_SET_INT(0, "force-untracked-cache", &untracked_cache,
 			    N_("enable untracked cache without testing the filesystem"), UC_FORCE),
+		OPT_BOOL(0, "watchman", &use_watchman,
+			N_("use or not use watchman to reduce refresh cost")),
 		OPT_END()
 	};
 
@@ -1149,6 +1152,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		die("Bug: bad untracked_cache value: %d", untracked_cache);
 	}
 
+	if (use_watchman > 0) {
+		the_index.last_update    = xstrdup("");
+		the_index.cache_changed |= WATCHMAN_CHANGED;
+	} else if (!use_watchman) {
+		the_index.last_update    = NULL;
+		the_index.cache_changed |= WATCHMAN_CHANGED;
+	}
+
 	if (active_cache_changed) {
 		if (newfd < 0) {
 			if (refresh_args.flags & REFRESH_QUIET)
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 12/17] unpack-trees: preserve index extensions
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (10 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 11/17] update-index: enable/disable watchman support David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 13/17] index-helper: kill mode David Turner
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

Make git checkout (and other unpack_tree operations) preserve the
untracked cache and watchman status. This is valuable for two reasons:

1. Often, an unpack_tree operation will not touch large parts of the
working tree, and thus most of the untracked cache will continue to be
valid.

2. Even if the untracked cache were entirely invalidated by such an
operation, the user has signaled their intention to have such a cache,
and we don't want to throw it away.

The same logic applies to the watchman state.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 cache.h                           |  1 +
 read-cache.c                      |  8 ++++++++
 t/t7063-status-untracked-cache.sh | 23 +++++++++++++++++++++++
 t/test-lib-functions.sh           |  4 ++++
 unpack-trees.c                    |  1 +
 5 files changed, 37 insertions(+)

diff --git a/cache.h b/cache.h
index 9fa339a..4ae7dd0 100644
--- a/cache.h
+++ b/cache.h
@@ -572,6 +572,7 @@ extern void write_watchman_ext(struct strbuf *sb, struct index_state* istate);
 #define REFRESH_DAEMON		(1 << 2)
 extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
 extern int discard_index(struct index_state *);
+extern void move_index_extensions(struct index_state *dst, struct index_state *src);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
 extern int index_dir_exists(struct index_state *istate, const char *name, int namelen);
diff --git a/read-cache.c b/read-cache.c
index 8e886d1..c141fec 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2742,3 +2742,11 @@ void stat_validity_update(struct stat_validity *sv, int fd)
 		fill_stat_data(sv->sd, &st);
 	}
 }
+
+void move_index_extensions(struct index_state *dst, struct index_state *src)
+{
+	dst->untracked = src->untracked;
+	src->untracked = NULL;
+	dst->last_update = src->last_update;
+	src->last_update = NULL;
+}
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index a971884..a2c8535 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -646,4 +646,27 @@ test_expect_success 'test ident field is working' '
 	test_cmp ../expect ../err
 '
 
+test_expect_success 'untracked cache survives a checkout' '
+	git commit --allow-empty -m empty &&
+	test-dump-untracked-cache >../before &&
+	test_when_finished  "git checkout master" &&
+	git checkout -b other_branch &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after &&
+	test_commit test &&
+	test-dump-untracked-cache >../before &&
+	git checkout master &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after
+'
+
+
+test_expect_success 'untracked cache survives a commit' '
+	test-dump-untracked-cache >../before &&
+	git add done/two &&
+	git commit -m commit &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after
+'
+
 test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 8d99eb3..e974b5b 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -186,6 +186,10 @@ test_commit () {
 		test_tick
 	fi &&
 	git commit $signoff -m "$1" &&
+	if [ "$(git config core.bare)" = false ]
+	then
+	    git update-index --force-untracked-cache
+	fi
 	git tag "${4:-$1}"
 }
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 9f55cc2..fc90eb3 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1215,6 +1215,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
 		}
+		move_index_extensions(&o->result, o->dst_index);
 		discard_index(o->dst_index);
 		*o->dst_index = o->result;
 	} else {
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 13/17] index-helper: kill mode
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (11 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 12/17] unpack-trees: preserve index extensions David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 14/17] index-helper: don't run if already running David Turner
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

Add a new command (and command-line arg) to allow index-helpers to
exit cleanly.

This is mainly useful for tests.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 index-helper.c          | 39 +++++++++++++++++++++++++++++++++++++--
 t/t7900-index-helper.sh |  9 +++++++++
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/index-helper.c b/index-helper.c
index 445a0ac..ffa3c2a 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -286,6 +286,8 @@ static void loop(int fd, int idle_in_seconds)
 						 * alive, nothing to do.
 						 */
 						;
+				} else if (!strcmp(command.buf, "die")) {
+					break;
 				} else {
 					warning("BUG: Bogus command %s", command.buf);
 				}
@@ -338,10 +340,35 @@ static const char * const usage_text[] = {
 	NULL
 };
 
+static void request_kill(const char *pipe_path)
+{
+	int fd;
+
+	fd = open(pipe_path, O_WRONLY | O_NONBLOCK);
+	if (fd < 0) {
+		warning("No existing pipe; can't send kill message to old process");
+		goto done;
+	}
+
+	write_in_full(fd, "die", 4);
+	close(fd);
+
+done:
+	/*
+	 * The child will try to do this anyway, but we want to be
+	 * ready to launch a new daemon immediately after this command
+	 * returns.
+	 */
+
+	unlink(pipe_path);
+	return;
+}
+
+
 int main(int argc, char **argv)
 {
 	const char *prefix;
-	int idle_in_seconds = 600, detach = 0;
+	int idle_in_seconds = 600, detach = 0, kill = 0;
 	int fd;
 	struct strbuf pipe_path = STRBUF_INIT;
 	struct option options[] = {
@@ -350,6 +377,7 @@ int main(int argc, char **argv)
 		OPT_BOOL(0, "strict", &to_verify,
 			 "verify shared memory after creating"),
 		OPT_BOOL(0, "detach", &detach, "detach the process"),
+		OPT_BOOL(0, "kill", &kill, "request that existing index helper processes exit"),
 		OPT_END()
 	};
 
@@ -364,10 +392,17 @@ int main(int argc, char **argv)
 			  options, usage_text, 0))
 		die(_("too many arguments"));
 
+	strbuf_git_path(&pipe_path, "index-helper.pipe");
+	if (kill) {
+		if (detach)
+			die(_("--kill doesn't want any other options"));
+		request_kill(pipe_path.buf);
+		return 0;
+	}
+
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
-	strbuf_git_path(&pipe_path, "index-helper.pipe");
 	fd = setup_pipe(pipe_path.buf);
 	if (fd < 0)
 		die_errno("Could not set up index-helper.pipe");
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index 7126037..14b5a6c 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -15,4 +15,13 @@ test_expect_success 'index-helper smoke test' '
 	test_path_is_missing .git/index-helper.pipe
 '
 
+test_expect_success 'index-helper creates usable pipe file and can be killed' '
+	test_when_finished "git index-helper --kill" &&
+	test_path_is_missing .git/index-helper.pipe &&
+	git index-helper --detach &&
+	test -p .git/index-helper.pipe &&
+	git index-helper --kill &&
+	test_path_is_missing .git/index-helper.pipe
+'
+
 test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 14/17] index-helper: don't run if already running
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (12 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 13/17] index-helper: kill mode David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 15/17] index-helper: autorun mode David Turner
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

Signed-off-by: David Turner <dturner@twopensource.com>
---
 index-helper.c          | 7 +++++++
 t/t7900-index-helper.sh | 9 +++++++++
 2 files changed, 16 insertions(+)

diff --git a/index-helper.c b/index-helper.c
index ffa3c2a..7362abb 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -400,6 +400,13 @@ int main(int argc, char **argv)
 		return 0;
 	}
 
+	/* check that no other copy is running */
+	fd = open(pipe_path.buf, O_RDONLY | O_NONBLOCK);
+	if (fd > 0)
+		die(_("Already running"));
+	if (errno != ENXIO && errno != ENOENT)
+		die_errno(_("Unexpected error checking pipe"));
+
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index 14b5a6c..ce0cc27 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -24,4 +24,13 @@ test_expect_success 'index-helper creates usable pipe file and can be killed' '
 	test_path_is_missing .git/index-helper.pipe
 '
 
+test_expect_success 'index-helper will not start if already running' '
+	test_when_finished "git index-helper --kill" &&
+	git index-helper --detach &&
+	test -p .git/index-helper.pipe &&
+	test_must_fail git index-helper 2>err &&
+	test -p .git/index-helper.pipe &&
+	grep "Already running" err
+'
+
 test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 15/17] index-helper: autorun mode
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (13 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 14/17] index-helper: don't run if already running David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 16/17] index-helper: optionally automatically run David Turner
  2016-03-19  1:04 ` [PATCH v2 17/17] read-cache: config for waiting for index-helper David Turner
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

Soon, we'll want to automatically start index-helper, so we need
a mode that silently exists if it can't start up (either because
it's not in a git dir, or because another one is already running).

Signed-off-by: David Turner <dturner@twopensource.com>
---
 index-helper.c          | 29 +++++++++++++++++++++++------
 t/t7900-index-helper.sh |  8 ++++++++
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/index-helper.c b/index-helper.c
index 7362abb..ab96ded 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -368,8 +368,9 @@ done:
 int main(int argc, char **argv)
 {
 	const char *prefix;
-	int idle_in_seconds = 600, detach = 0, kill = 0;
+	int idle_in_seconds = 600, detach = 0, kill = 0, autorun = 0;
 	int fd;
+	int nongit;
 	struct strbuf pipe_path = STRBUF_INIT;
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_seconds,
@@ -378,6 +379,7 @@ int main(int argc, char **argv)
 			 "verify shared memory after creating"),
 		OPT_BOOL(0, "detach", &detach, "detach the process"),
 		OPT_BOOL(0, "kill", &kill, "request that existing index helper processes exit"),
+		OPT_BOOL(0, "autorun", &autorun, "this is an automatic run of git index-helper, so certain errors can be solved by silently exiting"),
 		OPT_END()
 	};
 
@@ -387,7 +389,14 @@ int main(int argc, char **argv)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(usage_text, options);
 
-	prefix = setup_git_directory();
+	prefix = setup_git_directory_gently(&nongit);
+	if (nongit) {
+		if (autorun)
+			exit(0);
+		else
+			die("Not a git repository");
+	}
+
 	if (parse_options(argc, (const char **)argv, prefix,
 			  options, usage_text, 0))
 		die(_("too many arguments"));
@@ -402,10 +411,18 @@ int main(int argc, char **argv)
 
 	/* check that no other copy is running */
 	fd = open(pipe_path.buf, O_RDONLY | O_NONBLOCK);
-	if (fd > 0)
-		die(_("Already running"));
-	if (errno != ENXIO && errno != ENOENT)
-		die_errno(_("Unexpected error checking pipe"));
+	if (fd > 0) {
+		if (autorun)
+			return 0;
+		else
+			die(_("Already running"));
+	}
+	if (errno != ENXIO && errno != ENOENT) {
+		if (autorun)
+			return 0;
+		else
+			die_errno(_("Unexpected error checking pipe"));
+	}
 
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index ce0cc27..6020fe4 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -33,4 +33,12 @@ test_expect_success 'index-helper will not start if already running' '
 	grep "Already running" err
 '
 
+test_expect_success 'index-helper is quiet with --autorun' '
+	test_when_finished "git index-helper --kill" &&
+	git index-helper --kill &&
+	git index-helper --detach &&
+	test -p .git/index-helper.pipe &&
+	git index-helper --autorun
+'
+
 test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 16/17] index-helper: optionally automatically run
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (14 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 15/17] index-helper: autorun mode David Turner
@ 2016-03-19  1:04 ` David Turner
  2016-03-19  1:04 ` [PATCH v2 17/17] read-cache: config for waiting for index-helper David Turner
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

Introduce a new config option, indexhelper.autorun, to automatically
run git index-helper before starting up a builtin git command.  This
enables users to keep index-helper running without manual
intervention.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 git.c                   | 35 ++++++++++++++++++++++++++++++++++-
 t/t7900-index-helper.sh | 16 ++++++++++++++++
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/git.c b/git.c
index 6cc0c07..7d27782 100644
--- a/git.c
+++ b/git.c
@@ -521,6 +521,37 @@ static void strip_extension(const char **argv)
 #define strip_extension(cmd)
 #endif
 
+static int want_auto_index_helper(void)
+{
+	int want = -1;
+	const char *value = NULL;
+	const char *key = "indexhelper.autorun";
+
+	if (git_config_key_is_valid(key) &&
+	    !git_config_get_value(key, &value)) {
+		int b = git_config_maybe_bool(key, value);
+		want = b >= 0 ? b : 0;
+	}
+	return want;
+}
+
+static void maybe_run_index_helper(struct cmd_struct *cmd)
+{
+	const char *argv[] = {"git-index-helper", "--detach", "--autorun", NULL};
+
+	if (!(cmd->option & NEED_WORK_TREE))
+		return;
+
+	if (want_auto_index_helper() <= 0)
+		return;
+
+	trace_argv_printf(argv, "trace: auto index-helper:");
+
+	if (run_command_v_opt(argv,
+			      RUN_SILENT_EXEC_FAILURE | RUN_CLEAN_ON_EXIT))
+		warning(_("You specified indexhelper.autorun, but running git-index-helper failed."));
+}
+
 static void handle_builtin(int argc, const char **argv)
 {
 	const char *cmd;
@@ -536,8 +567,10 @@ static void handle_builtin(int argc, const char **argv)
 	}
 
 	builtin = get_builtin(cmd);
-	if (builtin)
+	if (builtin) {
+		maybe_run_index_helper(builtin);
 		exit(run_builtin(builtin, argc, argv));
+	}
 }
 
 static void execv_dashed_external(const char **argv)
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index 6020fe4..7fe66b7 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -41,4 +41,20 @@ test_expect_success 'index-helper is quiet with --autorun' '
 	git index-helper --autorun
 '
 
+test_expect_success 'index-helper autorun works' '
+	rm -f .git/index-helper.pipe &&
+	git status &&
+	test_path_is_missing .git/index-helper.pipe &&
+	test_config indexhelper.autorun true &&
+	git status &&
+	test -p .git/index-helper.pipe &&
+	git status 2>err &&
+	test -p .git/index-helper.pipe &&
+	! grep -q . err &&
+	git index-helper --kill &&
+	test_config indexhelper.autorun false &&
+	git status &&
+	test_path_is_missing .git/index-helper.pipe
+'
+
 test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 17/17] read-cache: config for waiting for index-helper
  2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
                   ` (15 preceding siblings ...)
  2016-03-19  1:04 ` [PATCH v2 16/17] index-helper: optionally automatically run David Turner
@ 2016-03-19  1:04 ` David Turner
  16 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-19  1:04 UTC (permalink / raw)
  To: git, pclouds, Johannes Schindelin; +Cc: David Turner

When we poke index-helper, and index-helper is using watchman, we want
to wait for a response (showing that the watchman extension shm has
been prepared).  If it's not using watchman, we don't.

So add a new config, core.waitforindexhelper, with sensible defaults, to
configure this behavior.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 cache.h       | 1 +
 config.c      | 5 +++++
 environment.c | 5 +++++
 read-cache.c  | 5 ++++-
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 4ae7dd0..f8b8dbf 100644
--- a/cache.h
+++ b/cache.h
@@ -692,6 +692,7 @@ extern char *git_replace_ref_base;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_watchman_sync_timeout;
+extern int wait_for_index_helper;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index e6dc141..5f1b8bd 100644
--- a/config.c
+++ b/config.c
@@ -887,6 +887,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.waitforindexhelper")) {
+		wait_for_index_helper = git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.createobject")) {
 		if (!strcmp(value, "rename"))
 			object_creation_mode = OBJECT_CREATION_USES_RENAMES;
diff --git a/environment.c b/environment.c
index 35e03c7..c7fb0a9 100644
--- a/environment.c
+++ b/environment.c
@@ -95,6 +95,11 @@ int core_preload_index = 1;
 int ignore_untracked_cache_config;
 
 int core_watchman_sync_timeout = 300;
+#ifdef USE_WATCHMAN
+int wait_for_index_helper = 1;
+#else
+int wait_for_index_helper = 0;
+#endif
 
 
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
diff --git a/read-cache.c b/read-cache.c
index c141fec..7fd9b2c 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1821,7 +1821,10 @@ static int poke_daemon(struct index_state *istate,
 	if (refresh_cache) {
 		ret = write_in_full(fd, "refresh", 8) == 8;
 	} else {
-		ret = poke_and_wait_for_reply(fd);
+		if (wait_for_index_helper)
+			ret = poke_and_wait_for_reply(fd);
+		else
+			ret = write_in_full(fd, "poke", 5) == 5;
 	}
 
 	close(fd);
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 09/17] Add watchman support to reduce index refresh cost
  2016-03-19  1:04 ` [PATCH v2 09/17] Add watchman support to reduce index refresh cost David Turner
@ 2016-03-24 13:47   ` Jeff Hostetler
  2016-03-24 17:58     ` David Turner
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Hostetler @ 2016-03-24 13:47 UTC (permalink / raw)
  To: David Turner, git, pclouds, Johannes Schindelin


I'm seeing wm->name have value ".git" rather than ".git/" on Linux.


On 03/18/2016 09:04 PM, David Turner wrote:
> +		if (!strncmp(wm->name, ".git/", 5) ||
> +		    strstr(wm->name, "/.git/"))
> +			continue;


thanks,
Jeff

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 09/17] Add watchman support to reduce index refresh cost
  2016-03-24 13:47   ` Jeff Hostetler
@ 2016-03-24 17:58     ` David Turner
  0 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-24 17:58 UTC (permalink / raw)
  To: Jeff Hostetler, git, pclouds, Johannes Schindelin

On Thu, 2016-03-24 at 09:47 -0400, Jeff Hostetler wrote:
> I'm seeing wm->name have value ".git" rather than ".git/" on Linux.
> 
> 
> On 03/18/2016 09:04 PM, David Turner wrote:
> > +		if (!strncmp(wm->name, ".git/", 5) ||
> > +		    strstr(wm->name, "/.git/"))
> > +			continue;
> 

Thanks.  I don't think I considered the case of .git itself being
modified, although clearly files are occasionally created/deleted in
there.  I'm going to fix this by ignoring all directories, since we'll
capture untracked-cache changes via files in those dirs.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 03/17] index-helper: new daemon for caching index and related stuff
  2016-03-19  1:04 ` [PATCH v2 03/17] index-helper: new daemon for caching index and related stuff David Turner
@ 2016-03-29  2:31   ` Duy Nguyen
  2016-03-29 21:51     ` David Turner
  0 siblings, 1 reply; 25+ messages in thread
From: Duy Nguyen @ 2016-03-29  2:31 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List, Johannes Schindelin

On Sat, Mar 19, 2016 at 8:04 AM, David Turner <dturner@twopensource.com> wrote:
> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>
> Instead of reading the index from disk and worrying about disk
> corruption, the index is cached in memory (memory bit-flips happen
> too, but hopefully less often). The result is faster read. Read time
> is reduced by 70%.
>
> The biggest gain is not having to verify the trailing SHA-1, which
> takes lots of time especially on large index files. But this also
> opens doors for further optimiztions:
>
>  - we could create an in-memory format that's essentially the memory
>    dump of the index to eliminate most of parsing/allocation
>    overhead. The mmap'd memory can be used straight away. Experiment
>    [1] shows we could reduce read time by 88%.

This reference [1] is missing (even in my old version). I believe it's
http://thread.gmane.org/gmane.comp.version-control.git/247268/focus=248771,
comparing 256.442ms in that mail with v4 number, 2245.113ms in 0/8
mail from the same thread.

> Git can poke the daemon via named pipes to tell it to refresh the
> index cache, or to keep it alive some more minutes. It can't give any
> real index data directly to the daemon. Real data goes to disk first,
> then the daemon reads and verifies it from there. Poking only happens
> for $GIT_DIR/index, not temporary index files.

I think we should go with unix socket on *nix platform instead of
named pipe. UNIX named pipe only allows one communication channel at a
time. Windows named pipe is different and allows multiple clients,
which is the same as unix socket.


> $GIT_DIR/index-helper.pipe is the named pipe for daemon process. The
> daemon reads from the pipe and executes commands.  Commands that need
> replies from the daemon will have to open their own pipe, since a
> named pipe should only have one reader.  Unix domain sockets don't
> have this problem, but are less portable.

Hm..NO_UNIX_SOCKETS is only set for Windows in config.mak.uname and
Windows will need to be specially tailored anyway, I think unix socket
would be more elegant.

> +static void share_index(struct index_state *istate, struct shm *is)
> +{
> +       void *new_mmap;
> +       if (istate->mmap_size <= 20 ||
> +           hashcmp(istate->sha1,
> +                   (unsigned char *)istate->mmap + istate->mmap_size - 20) ||
> +           !hashcmp(istate->sha1, is->sha1) ||
> +           git_shm_map(O_CREAT | O_EXCL | O_RDWR, 0700, istate->mmap_size,
> +                       &new_mmap, PROT_READ | PROT_WRITE, MAP_SHARED,
> +                       "git-index-%s", sha1_to_hex(istate->sha1)) < 0)
> +               return;
> +
> +       release_index_shm(is);
> +       is->size = istate->mmap_size;
> +       is->shm = new_mmap;
> +       hashcpy(is->sha1, istate->sha1);
> +       memcpy(new_mmap, istate->mmap, istate->mmap_size - 20);
> +
> +       /*
> +        * The trailing hash must be written last after everything is
> +        * written. It's the indication that the shared memory is now
> +        * ready.
> +        */
> +       hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);

You commented here [1] a long time ago about memory barrier. I'm not
entirely sure if compilers dare to reorder function calls, but when
hashcpy is inlined and memcpy is builtin, I suppose that's possible...

[1] http://article.gmane.org/gmane.comp.version-control.git/280729

> +}
-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/17] index-helper: add --detach
  2016-03-19  1:04 ` [PATCH v2 06/17] index-helper: add --detach David Turner
@ 2016-03-29  2:35   ` Duy Nguyen
  0 siblings, 0 replies; 25+ messages in thread
From: Duy Nguyen @ 2016-03-29  2:35 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List, Johannes Schindelin

On Sat, Mar 19, 2016 at 8:04 AM, David Turner <dturner@twopensource.com> wrote:
> @@ -237,6 +239,7 @@ int main(int argc, char **argv)
>                             N_("exit if not used after some seconds")),
>                 OPT_BOOL(0, "strict", &to_verify,
>                          "verify shared memory after creating"),

N_() here (my bad)

> +               OPT_BOOL(0, "detach", &detach, "detach the process"),

ditto

>                 OPT_END()
>         };
>
> @@ -258,6 +261,10 @@ int main(int argc, char **argv)
>         fd = setup_pipe(pipe_path.buf);
>         if (fd < 0)
>                 die_errno("Could not set up index-helper.pipe");

_() here

> +
> +       if (detach && daemonize(&daemonized))
> +               die_errno("unable to detach");

and here

> +
>         loop(fd, idle_in_seconds);
>         return 0;
>  }
> --
> 2.4.2.767.g62658d5-twtrsrc
>



-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/17] read-cache: invalidate untracked cache data when reading WAMA
  2016-03-19  1:04 ` [PATCH v2 08/17] read-cache: invalidate untracked cache data when reading WAMA David Turner
@ 2016-03-29  2:50   ` Duy Nguyen
  2016-03-29 21:22     ` David Turner
  0 siblings, 1 reply; 25+ messages in thread
From: Duy Nguyen @ 2016-03-29  2:50 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List, Johannes Schindelin

On Sat, Mar 19, 2016 at 8:04 AM, David Turner <dturner@twopensource.com> wrote:
> @@ -1407,10 +1472,24 @@ static int read_watchman_ext(struct index_state *istate, const void *data,
>         ewah_each_bit(bitmap, mark_no_watchman, istate);
>         ewah_free(bitmap);
>
> -       /*
> -        * TODO: update the untracked cache from the untracked data in this
> -        * extension.
> -        */
> +       if (istate->untracked && istate->untracked->root) {
> +               int i;
> +               const char *untracked;
> +
> +               untracked = data + len + 8 + bitmap_size;
> +               for (i = 0; i < untracked_nr; ++i) {
> +                       int len = strlen(untracked);
> +                       string_list_append(&istate->untracked->invalid_untracked,
> +                                          untracked);
> +                       untracked += len + 1;
> +               }
> +
> +               for_each_string_list(&istate->untracked->invalid_untracked,
> +                        mark_untracked_invalid, istate->untracked);

I think it's a bit early to invalidate untracked cache here. We can do
that in refresh_by_watchman() in 10/17, where ce_mark_uptodate() to
prevent lstat() is also done.

> +
> +               if (untracked_nr)
> +                       istate->cache_changed |= WATCHMAN_CHANGED;
> +       }
>         return 0;
>  }
-- 
Duy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/17] read-cache: invalidate untracked cache data when reading WAMA
  2016-03-29  2:50   ` Duy Nguyen
@ 2016-03-29 21:22     ` David Turner
  0 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-29 21:22 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Johannes Schindelin

On Tue, 2016-03-29 at 09:50 +0700, Duy Nguyen wrote:
> On Sat, Mar 19, 2016 at 8:04 AM, David Turner <
> dturner@twopensource.com> wrote:
> > @@ -1407,10 +1472,24 @@ static int read_watchman_ext(struct
> > index_state *istate, const void *data,
> >         ewah_each_bit(bitmap, mark_no_watchman, istate);
> >         ewah_free(bitmap);
> > 
> > -       /*
> > -        * TODO: update the untracked cache from the untracked data
> > in this
> > -        * extension.
> > -        */
> > +       if (istate->untracked && istate->untracked->root) {
> > +               int i;
> > +               const char *untracked;
> > +
> > +               untracked = data + len + 8 + bitmap_size;
> > +               for (i = 0; i < untracked_nr; ++i) {
> > +                       int len = strlen(untracked);
> > +                       string_list_append(&istate->untracked
> > ->invalid_untracked,
> > +                                          untracked);
> > +                       untracked += len + 1;
> > +               }
> > +
> > +               for_each_string_list(&istate->untracked
> > ->invalid_untracked,
> > +                        mark_untracked_invalid, istate
> > ->untracked);
> 
> I think it's a bit early to invalidate untracked cache here. We can
> do
> that in refresh_by_watchman() in 10/17, where ce_mark_uptodate() to
> prevent lstat() is also done.

Will move/squash

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 03/17] index-helper: new daemon for caching index and related stuff
  2016-03-29  2:31   ` Duy Nguyen
@ 2016-03-29 21:51     ` David Turner
  0 siblings, 0 replies; 25+ messages in thread
From: David Turner @ 2016-03-29 21:51 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List, Johannes Schindelin

On Tue, 2016-03-29 at 09:31 +0700, Duy Nguyen wrote:
> On Sat, Mar 19, 2016 at 8:04 AM, David Turner <
> dturner@twopensource.com> wrote:
> > From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> > 
> > Instead of reading the index from disk and worrying about disk
> > corruption, the index is cached in memory (memory bit-flips happen
> > too, but hopefully less often). The result is faster read. Read
> > time
> > is reduced by 70%.
> > 
> > The biggest gain is not having to verify the trailing SHA-1, which
> > takes lots of time especially on large index files. But this also
> > opens doors for further optimiztions:
> > 
> >  - we could create an in-memory format that's essentially the
> > memory
> >    dump of the index to eliminate most of parsing/allocation
> >    overhead. The mmap'd memory can be used straight away.
> > Experiment
> >    [1] shows we could reduce read time by 88%.
> 
> This reference [1] is missing (even in my old version). I believe
> it's
> http://thread.gmane.org/gmane.comp.version-control.git/247268/focus=2
> 48771,
> comparing 256.442ms in that mail with v4 number, 2245.113ms in 0/8
> mail from the same thread.
> 
> > Git can poke the daemon via named pipes to tell it to refresh the
> > index cache, or to keep it alive some more minutes. It can't give
> > any
> > real index data directly to the daemon. Real data goes to disk
> > first,
> > then the daemon reads and verifies it from there. Poking only
> > happens
> > for $GIT_DIR/index, not temporary index files.
> 
> I think we should go with unix socket on *nix platform instead of
> named pipe. UNIX named pipe only allows one communication channel at
> a
> time. Windows named pipe is different and allows multiple clients,
> which is the same as unix socket.
> 
> 
> > $GIT_DIR/index-helper.pipe is the named pipe for daemon process.
> > The
> > daemon reads from the pipe and executes commands.  Commands that
> > need
> > replies from the daemon will have to open their own pipe, since a
> > named pipe should only have one reader.  Unix domain sockets don't
> > have this problem, but are less portable.
> 
> Hm..NO_UNIX_SOCKETS is only set for Windows in config.mak.uname and
> Windows will need to be specially tailored anyway, I think unix
> socket
> would be more elegant.

One annoyance with unix sockets is that they must have short paths
(UNIX_PATH_MAX -- about a hundred characters).  This basically means
they should be in $TMPDIR.  I guess we could go back to pid files in
$GIT_DIR, and then have a socket named after the pid.  There's also
some security issues, but it actually looks like there's a simple
enough workaround for them.

I'll change this, but it might take a bit as I'm busy with other things
this week.

> > +static void share_index(struct index_state *istate, struct shm
> > *is)
> > +{
> > +       void *new_mmap;
> > +       if (istate->mmap_size <= 20 ||
> > +           hashcmp(istate->sha1,
> > +                   (unsigned char *)istate->mmap + istate
> > ->mmap_size - 20) ||
> > +           !hashcmp(istate->sha1, is->sha1) ||
> > +           git_shm_map(O_CREAT | O_EXCL | O_RDWR, 0700, istate
> > ->mmap_size,
> > +                       &new_mmap, PROT_READ | PROT_WRITE,
> > MAP_SHARED,
> > +                       "git-index-%s", sha1_to_hex(istate->sha1))
> > < 0)
> > +               return;
> > +
> > +       release_index_shm(is);
> > +       is->size = istate->mmap_size;
> > +       is->shm = new_mmap;
> > +       hashcpy(is->sha1, istate->sha1);
> > +       memcpy(new_mmap, istate->mmap, istate->mmap_size - 20);
> > +
> > +       /*
> > +        * The trailing hash must be written last after everything
> > is
> > +        * written. It's the indication that the shared memory is
> > now
> > +        * ready.
> > +        */
> > +       hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20,
> > is->sha1);
> 
> You commented here [1] a long time ago about memory barrier. I'm not
> entirely sure if compilers dare to reorder function calls, but when
> hashcpy is inlined and memcpy is builtin, I suppose that's
> possible...
> 
> [1] http://article.gmane.org/gmane.comp.version-control.git/280729

Oh, right.  Will fix.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-03-29 21:52 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-19  1:04 [PATCH v2 00/15] index-helper, watchman David Turner
2016-03-19  1:04 ` [PATCH v2 01/17] read-cache.c: fix constness of verify_hdr() David Turner
2016-03-19  1:04 ` [PATCH v2 02/17] read-cache: allow to keep mmap'd memory after reading David Turner
2016-03-19  1:04 ` [PATCH v2 03/17] index-helper: new daemon for caching index and related stuff David Turner
2016-03-29  2:31   ` Duy Nguyen
2016-03-29 21:51     ` David Turner
2016-03-19  1:04 ` [PATCH v2 04/17] index-helper: add --strict David Turner
2016-03-19  1:04 ` [PATCH v2 05/17] daemonize(): set a flag before exiting the main process David Turner
2016-03-19  1:04 ` [PATCH v2 06/17] index-helper: add --detach David Turner
2016-03-29  2:35   ` Duy Nguyen
2016-03-19  1:04 ` [PATCH v2 07/17] read-cache: add watchman 'WAMA' extension David Turner
2016-03-19  1:04 ` [PATCH v2 08/17] read-cache: invalidate untracked cache data when reading WAMA David Turner
2016-03-29  2:50   ` Duy Nguyen
2016-03-29 21:22     ` David Turner
2016-03-19  1:04 ` [PATCH v2 09/17] Add watchman support to reduce index refresh cost David Turner
2016-03-24 13:47   ` Jeff Hostetler
2016-03-24 17:58     ` David Turner
2016-03-19  1:04 ` [PATCH v2 10/17] index-helper: use watchman to avoid refreshing index with lstat() David Turner
2016-03-19  1:04 ` [PATCH v2 11/17] update-index: enable/disable watchman support David Turner
2016-03-19  1:04 ` [PATCH v2 12/17] unpack-trees: preserve index extensions David Turner
2016-03-19  1:04 ` [PATCH v2 13/17] index-helper: kill mode David Turner
2016-03-19  1:04 ` [PATCH v2 14/17] index-helper: don't run if already running David Turner
2016-03-19  1:04 ` [PATCH v2 15/17] index-helper: autorun mode David Turner
2016-03-19  1:04 ` [PATCH v2 16/17] index-helper: optionally automatically run David Turner
2016-03-19  1:04 ` [PATCH v2 17/17] read-cache: config for waiting for index-helper David Turner

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).