git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v5 00/15] index-helper/watchman
@ 2016-04-19 23:27 David Turner
  2016-04-19 23:27 ` [PATCH v5 01/15] read-cache.c: fix constness of verify_hdr() David Turner
                   ` (14 more replies)
  0 siblings, 15 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:27 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

This version includes the following changes since v4:
1. The last patch has been removed; it's pretty much always a good
idea to wait for the index-helper

2. Documentation for index-helper --kill and --autorun.  Documentation
for update-index --watchman.  Documentation for index-format for WAMA.
Documentation for index-helper suggesting running update-index --watchman.

3. index-helper autorun moved to read-cache so it's only run on relevant
commands.

4. Tests: Broken && chain fixed; removed a subshell

5. A couple of fd leaks fixed.

6. Cruft removed.

7. New perf numbers.

David Turner (5):
  unpack-trees: preserve index extensions
  index-helper: kill mode
  index-helper: don't run if already running
  index-helper: autorun mode
  index-helper: optionally automatically run

Nguyễn Thái Ngọc Duy (10):
  read-cache.c: fix constness of verify_hdr()
  read-cache: allow to keep mmap'd memory after reading
  index-helper: new daemon for caching index and related stuff
  index-helper: add --strict
  daemonize(): set a flag before exiting the main process
  index-helper: add --detach
  read-cache: add watchman 'WAMA' extension
  Add watchman support to reduce index refresh cost
  index-helper: use watchman to avoid refreshing index with lstat()
  update-index: enable/disable watchman support

 .gitignore                               |   1 +
 Documentation/config.txt                 |   4 +
 Documentation/git-index-helper.txt       |  75 +++++
 Documentation/git-update-index.txt       |   6 +
 Documentation/technical/index-format.txt |  22 ++
 Makefile                                 |  17 ++
 builtin/gc.c                             |   2 +-
 builtin/update-index.c                   |  11 +
 cache.h                                  |  16 +-
 config.c                                 |   5 +
 configure.ac                             |   8 +
 daemon.c                                 |   2 +-
 dir.c                                    |  23 +-
 dir.h                                    |   6 +
 environment.c                            |   3 +
 git-compat-util.h                        |   1 +
 index-helper.c                           | 454 ++++++++++++++++++++++++++++
 read-cache.c                             | 488 ++++++++++++++++++++++++++++++-
 setup.c                                  |   4 +-
 t/t7063-status-untracked-cache.sh        |  22 ++
 t/t7900-index-helper.sh                  |  68 +++++
 t/test-lib-functions.sh                  |   4 +
 unpack-trees.c                           |   1 +
 watchman-support.c                       | 135 +++++++++
 watchman-support.h                       |   7 +
 25 files changed, 1364 insertions(+), 21 deletions(-)
 create mode 100644 Documentation/git-index-helper.txt
 create mode 100644 index-helper.c
 create mode 100755 t/t7900-index-helper.sh
 create mode 100644 watchman-support.c
 create mode 100644 watchman-support.h

-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v5 01/15] read-cache.c: fix constness of verify_hdr()
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
@ 2016-04-19 23:27 ` David Turner
  2016-04-19 23:27 ` [PATCH v5 02/15] read-cache: allow to keep mmap'd memory after reading David Turner
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:27 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index d9fb78b..16cc487 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1345,7 +1345,7 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 {
 	git_SHA_CTX c;
 	unsigned char sha1[20];
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 02/15] read-cache: allow to keep mmap'd memory after reading
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
  2016-04-19 23:27 ` [PATCH v5 01/15] read-cache.c: fix constness of verify_hdr() David Turner
@ 2016-04-19 23:27 ` David Turner
  2016-04-20  9:01   ` Johannes Schindelin
  2016-04-20  9:26   ` Duy Nguyen
  2016-04-19 23:27 ` [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff David Turner
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:27 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Later, we will introduce git index-helper to share this memory with
other git processes.

Since the memory will be shared, it will never be unmapped (although
the kernel may of course choose to page it out).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 cache.h      |  3 +++
 read-cache.c | 13 ++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index b829410..4180e2b 100644
--- a/cache.h
+++ b/cache.h
@@ -333,11 +333,14 @@ struct index_state {
 	struct split_index *split_index;
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
+		 keep_mmap : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
 	struct untracked_cache *untracked;
+	void *mmap;
+	size_t mmap_size;
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index 16cc487..7e387e9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1574,6 +1574,10 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
+	if (istate->keep_mmap) {
+		istate->mmap = mmap;
+		istate->mmap_size = mmap_size;
+	}
 	close(fd);
 
 	hdr = mmap;
@@ -1626,10 +1630,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 		src_offset += 8;
 		src_offset += extsize;
 	}
-	munmap(mmap, mmap_size);
+	if (!istate->keep_mmap)
+		munmap(mmap, mmap_size);
 	return istate->cache_nr;
 
 unmap:
+	istate->mmap = NULL;
 	munmap(mmap, mmap_size);
 	die("index file corrupt");
 }
@@ -1655,6 +1661,7 @@ int read_index_from(struct index_state *istate, const char *path)
 		discard_index(split_index->base);
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
+	split_index->base->keep_mmap = istate->keep_mmap;
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -1698,6 +1705,10 @@ int discard_index(struct index_state *istate)
 	free(istate->cache);
 	istate->cache = NULL;
 	istate->cache_alloc = 0;
+	if (istate->keep_mmap && istate->mmap) {
+		munmap(istate->mmap, istate->mmap_size);
+		istate->mmap = NULL;
+	}
 	discard_split_index(istate);
 	free_untracked_cache(istate->untracked);
 	istate->untracked = NULL;
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
  2016-04-19 23:27 ` [PATCH v5 01/15] read-cache.c: fix constness of verify_hdr() David Turner
  2016-04-19 23:27 ` [PATCH v5 02/15] read-cache: allow to keep mmap'd memory after reading David Turner
@ 2016-04-19 23:27 ` David Turner
  2016-04-20 12:17   ` Johannes Schindelin
  2016-04-19 23:27 ` [PATCH v5 04/15] index-helper: add --strict David Turner
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: David Turner @ 2016-04-19 23:27 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner, Ramsay Jones

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Instead of reading the index from disk and worrying about disk
corruption, the index is cached in memory (memory bit-flips happen
too, but hopefully less often). The result is faster read. Read time
is reduced by 70%.

The biggest gain is not having to verify the trailing SHA-1, which
takes lots of time especially on large index files. But this also
opens doors for further optimiztions:

 - we could create an in-memory format that's essentially the memory
   dump of the index to eliminate most of parsing/allocation
   overhead. The mmap'd memory can be used straight away. Experiment
   [1] shows we could reduce read time by 88%.

 - we could cache non-index info such as name hash

Shared memory is done by storing files in a per-repository temporary
directory.  This is more portable than shm (which requires
posix-realtime and has various quirks on OS X).  It might even work on
Windows, although this has not been tested. The shared memory file's
name folows the template "shm-<object>-<SHA1>" where <SHA1> is the
trailing SHA-1 of the index file. <object> is "index" for cached index
files (and might later be "name-hash" for name-hash cache). If such
shared memory exists, it contains the same index content as on
disk. The content is already validated by the daemon and git won't
validate it again (except comparing the trailing SHA-1s).

We keep this daemon's logic as thin as possible. The "brain" stays in
git. So the daemon can read and validate stuff, but that's all it's
allowed to do. It does not add/create new information. It doesn't even
accept direct updates from git.

Git can poke the daemon via unix domain sockets to tell it to refresh
the index cache, or to keep it alive some more minutes. It can't give
any real index data directly to the daemon. Real data goes to disk
first, then the daemon reads and verifies it from there. Poking only
happens for $GIT_DIR/index, not temporary index files.

$GIT_DIR/index-helper.sock is the socket for the daemon process. The
daemon reads from the socket and executes commands.

Named pipes were considered for portability reasons, but then commands
that need replies from the daemon would have open their own pipes,
since a named pipe should only have one reader.  Unix domain sockets
don't have this problem.

On webkit.git with index format v2, duplicating 8 times to 1.5m
entries and 236MB in size:

(vanilla)      0.50 s: read_index_from .git/index
(index-helper) 0.18 s: read_index_from .git/index

Interestingly with index v4, we get less out of index-helper. It makes
sense as v4 requires more processing after loading the index:

(vanilla)      0.37 s: read_index_from .git/index
(index-helper) 0.22 s: read_index_from .git/index

[1] http://thread.gmane.org/gmane.comp.version-control.git/247268/focus=248771

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
---
 .gitignore                         |   1 +
 Documentation/git-index-helper.txt |  47 +++++++
 Makefile                           |   5 +
 cache.h                            |   2 +
 git-compat-util.h                  |   1 +
 index-helper.c                     | 276 +++++++++++++++++++++++++++++++++++++
 read-cache.c                       | 120 ++++++++++++++--
 t/t7900-index-helper.sh            |  23 ++++
 8 files changed, 466 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/git-index-helper.txt
 create mode 100644 index-helper.c
 create mode 100755 t/t7900-index-helper.sh

diff --git a/.gitignore b/.gitignore
index 5087ce1..b92f122 100644
--- a/.gitignore
+++ b/.gitignore
@@ -71,6 +71,7 @@
 /git-http-fetch
 /git-http-push
 /git-imap-send
+/git-index-helper
 /git-index-pack
 /git-init
 /git-init-db
diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
new file mode 100644
index 0000000..77687c0
--- /dev/null
+++ b/Documentation/git-index-helper.txt
@@ -0,0 +1,47 @@
+git-index-helper(1)
+===================
+
+NAME
+----
+git-index-helper - A simple cache daemon for speeding up index file access
+
+SYNOPSIS
+--------
+[verse]
+'git index-helper' [options]
+
+DESCRIPTION
+-----------
+Keep the index file in memory for faster access. This daemon is per
+repository.
+
+OPTIONS
+-------
+
+--exit-after=<n>::
+	Exit if the cached index is not accessed for `<n>`
+	seconds. Specify 0 to wait forever. Default is 600.
+
+NOTES
+-----
+
+$GIT_DIR/index-helper.sock a Unix domain socket that the daemon reads
+commands from.  The directory will also contain files named
+"shm-index-<SHA1>".  These are used as backing stores for shared
+memory.  Normally the daemon will clean up these files when it exits
+or when they are no longer relevant.  But if it crashes, some objects
+could remain there and they can be safely deleted with "rm"
+command. The following commands are used to control the daemon:
+
+"refresh"::
+	Reread the index.
+
+"poke":
+	Let the daemon know the index is to be read. It keeps the
+	daemon alive longer, unless `--exit-after=0` is used.
+
+All commands and replies are terminated by a 0 byte.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index 2742a69..c8be0e7 100644
--- a/Makefile
+++ b/Makefile
@@ -1433,6 +1433,10 @@ ifdef HAVE_DEV_TTY
 	BASIC_CFLAGS += -DHAVE_DEV_TTY
 endif
 
+ifndef NO_MMAP
+	PROGRAM_OBJS += index-helper.o
+endif
+
 ifdef DIR_HAS_BSD_GROUP_SEMANTICS
 	COMPAT_CFLAGS += -DDIR_HAS_BSD_GROUP_SEMANTICS
 endif
@@ -2159,6 +2163,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_PERL=\''$(subst ','\'',$(subst ','\'',$(NO_PERL)))'\' >>$@+
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
+	@echo NO_MMAP=\''$(subst ','\'',$(subst ','\'',$(NO_MMAP)))'\' >>$@+
 ifdef TEST_OUTPUT_DIRECTORY
 	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
 endif
diff --git a/cache.h b/cache.h
index 4180e2b..43fb314 100644
--- a/cache.h
+++ b/cache.h
@@ -334,6 +334,8 @@ struct index_state {
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
 		 keep_mmap : 1,
+		 from_shm : 1,
+		 to_shm : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
diff --git a/git-compat-util.h b/git-compat-util.h
index c07e0c1..56945a7 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -513,6 +513,7 @@ static inline int ends_with(const char *str, const char *suffix)
 #define PROT_READ 1
 #define PROT_WRITE 2
 #define MAP_PRIVATE 1
+#define MAP_SHARED 2
 #endif
 
 #define mmap git_mmap
diff --git a/index-helper.c b/index-helper.c
new file mode 100644
index 0000000..d64e49b
--- /dev/null
+++ b/index-helper.c
@@ -0,0 +1,276 @@
+#include "cache.h"
+#include "parse-options.h"
+#include "sigchain.h"
+#include "strbuf.h"
+#include "exec_cmd.h"
+#include "split-index.h"
+#include "lockfile.h"
+#include "cache.h"
+#include "unix-socket.h"
+
+struct shm {
+	unsigned char sha1[20];
+	void *shm;
+	size_t size;
+};
+
+static struct shm shm_index;
+static struct shm shm_base_index;
+
+static void release_index_shm(struct shm *is)
+{
+	if (!is->shm)
+		return;
+	munmap(is->shm, is->size);
+	unlink(git_path("shm-index-%s", sha1_to_hex(is->sha1)));
+	is->shm = NULL;
+}
+
+static void cleanup_shm(void)
+{
+	release_index_shm(&shm_index);
+	release_index_shm(&shm_base_index);
+}
+
+static void cleanup(void)
+{
+	unlink(git_path("index-helper.sock"));
+	cleanup_shm();
+}
+
+static void cleanup_on_signal(int sig)
+{
+	/* We ignore sigpipes -- that's just a client being broken. */
+	if (sig == SIGPIPE)
+		return;
+	cleanup();
+	sigchain_pop(sig);
+	raise(sig);
+}
+
+static int shared_mmap_create(int file_flags, int file_mode, size_t size,
+			      void **new_mmap, int mmap_prot, int mmap_flags,
+			      const char *path)
+{
+	int fd = -1;
+	int ret = -1;
+
+	fd = open(path, file_flags, file_mode);
+
+	if (fd < 0)
+		goto done;
+
+	if (ftruncate(fd, size))
+		goto done;
+
+	*new_mmap = mmap(NULL, size, mmap_prot, mmap_flags, fd, 0);
+
+	if (*new_mmap == MAP_FAILED) {
+		*new_mmap = NULL;
+		goto done;
+	}
+	madvise(new_mmap, size, MADV_WILLNEED);
+
+	ret = 0;
+done:
+	if (fd > 0)
+		close(fd);
+	return ret;
+}
+
+static void share_index(struct index_state *istate, struct shm *is)
+{
+	void *new_mmap;
+	if (istate->mmap_size <= 20 ||
+	    hashcmp(istate->sha1,
+		    (unsigned char *)istate->mmap + istate->mmap_size - 20) ||
+	    !hashcmp(istate->sha1, is->sha1) ||
+	    shared_mmap_create(O_CREAT | O_EXCL | O_RDWR, 0700,
+			       istate->mmap_size, &new_mmap,
+			       PROT_READ | PROT_WRITE, MAP_SHARED,
+			       git_path("shm-index-%s",
+					sha1_to_hex(istate->sha1))) < 0)
+		return;
+
+	release_index_shm(is);
+	is->size = istate->mmap_size;
+	is->shm = new_mmap;
+	hashcpy(is->sha1, istate->sha1);
+
+	memcpy(new_mmap, istate->mmap, istate->mmap_size - 20);
+
+	/*
+	 * The trailing hash must be written last after everything is
+	 * written. It's the indication that the shared memory is now
+	 * ready.
+	 * The memory barrier here matches read-cache.c:try_shm.
+	 */
+	__sync_synchronize();
+
+	hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);
+}
+
+static void share_the_index(void)
+{
+	if (the_index.split_index && the_index.split_index->base)
+		share_index(the_index.split_index->base, &shm_base_index);
+	share_index(&the_index, &shm_index);
+	discard_index(&the_index);
+}
+
+static void set_socket_blocking_flag(int fd, int make_nonblocking)
+{
+	int flags;
+
+	flags = fcntl(fd, F_GETFL, NULL);
+
+	if (flags < 0)
+		die(_("fcntl failed"));
+
+	if (make_nonblocking)
+		flags |= O_NONBLOCK;
+	else
+		flags &= ~O_NONBLOCK;
+
+	if (fcntl(fd, F_SETFL, flags) < 0)
+		die(_("fcntl failed"));
+}
+
+static void refresh(void)
+{
+	discard_index(&the_index);
+	the_index.keep_mmap = 1;
+	the_index.to_shm    = 1;
+	if (read_cache() < 0)
+		die(_("could not read index"));
+	share_the_index();
+}
+
+#ifndef NO_MMAP
+
+static void loop(int fd, int idle_in_seconds)
+{
+	struct timeval timeout;
+	struct timeval *timeout_p;
+
+	while (1) {
+		fd_set readfds;
+		int result, client_fd;
+		struct strbuf command = STRBUF_INIT;
+
+		/* need to reset timer in case select() decremented it */
+		if (idle_in_seconds) {
+			timeout.tv_usec = 0;
+			timeout.tv_sec = idle_in_seconds;
+			timeout_p = &timeout;
+		} else {
+			timeout_p = NULL;
+		}
+
+		/* Wait for a request */
+		FD_ZERO(&readfds);
+		FD_SET(fd, &readfds);
+		result = select(fd + 1, &readfds, NULL, NULL, timeout_p);
+		if (result < 0) {
+			if (errno == EINTR)
+				/*
+				 * This can lead to an overlong keepalive,
+				 * but that is better than a premature exit.
+				 */
+				continue;
+			die_errno(_("select() failed"));
+		}
+		if (result == 0)
+			/* timeout */
+			break;
+
+		client_fd = accept(fd, NULL, NULL);
+		if (client_fd < 0)
+			/*
+			 * An error here is unlikely -- it probably
+			 * indicates that the connecting process has
+			 * already dropped the connection.
+			 */
+			continue;
+
+		/*
+		 * Our connection to the client is blocking since a client
+		 * can always be killed by SIGINT or similar.
+		 */
+		set_socket_blocking_flag(client_fd, 0);
+
+		if (strbuf_getwholeline_fd(&command, client_fd, '\0') == 0) {
+			if (!strcmp(command.buf, "refresh")) {
+				refresh();
+			} else if (!strcmp(command.buf, "poke")) {
+				/*
+				 * Just a poke to keep us
+				 * alive, nothing to do.
+				 */
+			} else {
+				warning("BUG: Bogus command %s", command.buf);
+			}
+		} else {
+			/*
+			 * No command from client.  Probably it's just
+			 * a liveness check.  Just close up.
+			 */
+		}
+		close(client_fd);
+		strbuf_release(&command);
+	}
+
+	close(fd);
+}
+
+#else
+
+static void loop(int fd, int idle_in_seconds)
+{
+	die(_("index-helper is not supported on this platform"));
+}
+
+#endif
+
+static const char * const usage_text[] = {
+	N_("git index-helper [options]"),
+	NULL
+};
+
+int main(int argc, char **argv)
+{
+	const char *prefix;
+	int idle_in_seconds = 600;
+	int fd;
+	struct strbuf socket_path = STRBUF_INIT;
+	struct option options[] = {
+		OPT_INTEGER(0, "exit-after", &idle_in_seconds,
+			    N_("exit if not used after some seconds")),
+		OPT_END()
+	};
+
+	git_extract_argv0_path(argv[0]);
+	git_setup_gettext();
+
+	if (argc == 2 && !strcmp(argv[1], "-h"))
+		usage_with_options(usage_text, options);
+
+	prefix = setup_git_directory();
+	if (parse_options(argc, (const char **)argv, prefix,
+			  options, usage_text, 0))
+		die(_("too many arguments"));
+
+	atexit(cleanup);
+	sigchain_push_common(cleanup_on_signal);
+
+	strbuf_git_path(&socket_path, "index-helper.sock");
+
+	fd = unix_stream_listen(socket_path.buf);
+	if (fd < 0)
+		die_errno(_("could not set up index-helper socket"));
+
+	loop(fd, idle_in_seconds);
+
+	close(fd);
+	return 0;
+}
diff --git a/read-cache.c b/read-cache.c
index 7e387e9..af2101f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -18,6 +18,7 @@
 #include "varint.h"
 #include "split-index.h"
 #include "utf8.h"
+#include "unix-socket.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -1541,6 +1542,95 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_untracked_cache(istate);
 }
 
+static int poke_daemon(struct index_state *istate,
+		       const struct stat *st, int refresh_cache)
+{
+	int fd;
+	int ret = 0;
+	const char *socket_path;
+
+	/* if this is from index-helper, do not poke itself (recursively) */
+	if (istate->to_shm)
+		return 0;
+
+	socket_path = git_path("index-helper.sock");
+	if (!socket_path)
+		return -1;
+
+	fd = unix_stream_connect(socket_path);
+	if (refresh_cache) {
+		ret = write_in_full(fd, "refresh", 8) != 8;
+	} else {
+		ret = write_in_full(fd, "poke", 5) != 5;
+	}
+
+	close(fd);
+	return ret;
+}
+
+static int is_main_index(struct index_state *istate)
+{
+	return istate == &the_index ||
+		(the_index.split_index &&
+		 istate == the_index.split_index->base);
+}
+
+/*
+ * Try to open and verify a cached shm index if available. Return 0 if
+ * succeeds (istate->mmap and istate->mmap_size are updated). Return
+ * negative otherwise.
+ */
+static int try_shm(struct index_state *istate)
+{
+	void *new_mmap = NULL;
+	size_t old_size = istate->mmap_size;
+	ssize_t new_size;
+	const unsigned char *sha1;
+	struct stat st;
+	int fd = -1;
+
+	if (!is_main_index(istate) ||
+	    old_size <= 20 ||
+	    stat(git_path("index-helper.sock"), &st))
+		return -1;
+	if (poke_daemon(istate, &st, 0))
+		return -1;
+	sha1 = (unsigned char *)istate->mmap + old_size - 20;
+
+	fd = open(git_path("shm-index-%s", sha1_to_hex(sha1)), O_RDONLY);
+	if (fd < 0)
+		goto fail;
+
+	if (fstat(fd, &st))
+		goto fail;
+
+	new_size = st.st_size;
+	new_mmap = mmap(NULL, new_size, PROT_READ, MAP_SHARED, fd, 0);
+	if (new_size <= 20 ||
+	    hashcmp((unsigned char *)istate->mmap + old_size - 20,
+		    (unsigned char *)new_mmap + new_size - 20)) {
+		if (new_mmap)
+			munmap(new_mmap, new_size);
+		goto fail;
+	}
+
+	/* The memory barrier here matches index-helper.c:share_index. */
+	__sync_synchronize();
+
+	munmap(istate->mmap, istate->mmap_size);
+	istate->mmap = new_mmap;
+	istate->mmap_size = new_size;
+	istate->from_shm = 1;
+	close(fd);
+	return 0;
+
+fail:
+	if (fd >= 0)
+		close(fd);
+	poke_daemon(istate, &st, 1);
+	return -1;
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1555,6 +1645,7 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	istate->from_shm = 0;
 	istate->timestamp.sec = 0;
 	istate->timestamp.nsec = 0;
 	fd = open(path, O_RDONLY);
@@ -1574,15 +1665,17 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 	mmap = xmmap(NULL, mmap_size, PROT_READ, MAP_PRIVATE, fd, 0);
 	if (mmap == MAP_FAILED)
 		die_errno("unable to map index file");
-	if (istate->keep_mmap) {
-		istate->mmap = mmap;
-		istate->mmap_size = mmap_size;
-	}
 	close(fd);
 
-	hdr = mmap;
-	if (verify_hdr(hdr, mmap_size) < 0)
+	istate->mmap = mmap;
+	istate->mmap_size = mmap_size;
+	if (try_shm(istate) &&
+	    verify_hdr(istate->mmap, istate->mmap_size) < 0)
 		goto unmap;
+	hdr = mmap = istate->mmap;
+	mmap_size = istate->mmap_size;
+	if (!istate->keep_mmap)
+		istate->mmap = NULL;
 
 	hashcpy(istate->sha1, (const unsigned char *)hdr + mmap_size - 20);
 	istate->version = ntohl(hdr->hdr_version);
@@ -1662,6 +1755,8 @@ int read_index_from(struct index_state *istate, const char *path)
 	else
 		split_index->base = xcalloc(1, sizeof(*split_index->base));
 	split_index->base->keep_mmap = istate->keep_mmap;
+	split_index->base->to_shm    = istate->to_shm;
+	split_index->base->from_shm  = istate->from_shm;
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -1712,6 +1807,8 @@ int discard_index(struct index_state *istate)
 	discard_split_index(istate);
 	free_untracked_cache(istate->untracked);
 	istate->untracked = NULL;
+	istate->from_shm = 0;
+	istate->to_shm   = 0;
 	return 0;
 }
 
@@ -2138,9 +2235,14 @@ static int do_write_locked_index(struct index_state *istate, struct lock_file *l
 		return ret;
 	assert((flags & (COMMIT_LOCK | CLOSE_LOCK)) !=
 	       (COMMIT_LOCK | CLOSE_LOCK));
-	if (flags & COMMIT_LOCK)
-		return commit_locked_index(lock);
-	else if (flags & CLOSE_LOCK)
+	if (flags & COMMIT_LOCK) {
+		struct stat st;
+		ret = commit_locked_index(lock);
+		if (!ret && is_main_index(istate) &&
+		    !stat(git_path("index-helper.sock"), &st))
+			poke_daemon(istate, &st, 1);
+		return ret;
+	} else if (flags & CLOSE_LOCK)
 		return close_lock_file(lock);
 	else
 		return ret;
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
new file mode 100755
index 0000000..114c112
--- /dev/null
+++ b/t/t7900-index-helper.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+#
+# Copyright (c) 2016, Twitter, Inc
+#
+
+test_description='git-index-helper
+
+Testing git index-helper
+'
+
+. ./test-lib.sh
+
+test -n "$NO_MMAP" && {
+	skip_all='skipping index-helper tests: no mmap'
+	test_done
+}
+
+test_expect_success 'index-helper smoke test' '
+	git index-helper --exit-after 1 &&
+	test_path_is_missing .git/index-helper.sock
+'
+
+test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 04/15] index-helper: add --strict
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (2 preceding siblings ...)
  2016-04-19 23:27 ` [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff David Turner
@ 2016-04-19 23:27 ` David Turner
  2016-04-19 23:27 ` [PATCH v5 05/15] daemonize(): set a flag before exiting the main process David Turner
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:27 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

There are "holes" in the index-helper approach because the shared
memory is not verified again by git. If $USER is compromised, shared
memory could be modified. But anyone who could do this could already
modify $GIT_DIR/index. A more realistic risk is some bugs in
index-helper that produce corrupt shared memory. --strict is added to
avoid that.

Strictly speaking there's still a very small gap where corrupt shared
memory could still be read by git: after we write the trailing SHA-1 in
the shared memory (thus signaling "this shm is ready") and before
verify_shm() detects an error.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 Documentation/git-index-helper.txt |  9 +++++++
 cache.h                            |  1 +
 index-helper.c                     | 48 ++++++++++++++++++++++++++++++++++++++
 read-cache.c                       |  9 ++++---
 4 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index 77687c0..a5c058f 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -22,6 +22,15 @@ OPTIONS
 	Exit if the cached index is not accessed for `<n>`
 	seconds. Specify 0 to wait forever. Default is 600.
 
+--strict::
+--no-strict::
+	Strict mode makes index-helper verify the shared memory after
+	it's created. If the result does not match what's read from
+	$GIT_DIR/index, the shared memory is destroyed. This makes
+	index-helper take more than double the amount of time required
+	for reading an index, but because it will happen in the
+	background, it's not noticable. `--strict` is enabled by default.
+
 NOTES
 -----
 
diff --git a/cache.h b/cache.h
index 43fb314..4b678e9 100644
--- a/cache.h
+++ b/cache.h
@@ -336,6 +336,7 @@ struct index_state {
 		 keep_mmap : 1,
 		 from_shm : 1,
 		 to_shm : 1,
+		 always_verify_trailing_sha1 : 1,
 		 initialized : 1;
 	struct hashmap name_hash;
 	struct hashmap dir_hash;
diff --git a/index-helper.c b/index-helper.c
index d64e49b..e8d6b64 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -16,6 +16,7 @@ struct shm {
 
 static struct shm shm_index;
 static struct shm shm_base_index;
+static int to_verify = 1;
 
 static void release_index_shm(struct shm *is)
 {
@@ -110,11 +111,56 @@ static void share_index(struct index_state *istate, struct shm *is)
 	hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);
 }
 
+static int verify_shm(void)
+{
+	int i;
+	struct index_state istate;
+	memset(&istate, 0, sizeof(istate));
+	istate.always_verify_trailing_sha1 = 1;
+	istate.to_shm = 1;
+	i = read_index(&istate);
+	if (i != the_index.cache_nr)
+		goto done;
+	for (; i < the_index.cache_nr; i++) {
+		struct cache_entry *base, *ce;
+		/* namelen is checked separately */
+		const unsigned int ondisk_flags =
+			CE_STAGEMASK | CE_VALID | CE_EXTENDED_FLAGS;
+		unsigned int ce_flags, base_flags, ret;
+		base = the_index.cache[i];
+		ce = istate.cache[i];
+		if (ce->ce_namelen != base->ce_namelen ||
+		    strcmp(ce->name, base->name)) {
+			warning("mismatch at entry %d", i);
+			break;
+		}
+		ce_flags = ce->ce_flags;
+		base_flags = base->ce_flags;
+		/* only on-disk flags matter */
+		ce->ce_flags   &= ondisk_flags;
+		base->ce_flags &= ondisk_flags;
+		ret = memcmp(&ce->ce_stat_data, &base->ce_stat_data,
+			     offsetof(struct cache_entry, name) -
+			     offsetof(struct cache_entry, ce_stat_data));
+		ce->ce_flags = ce_flags;
+		base->ce_flags = base_flags;
+		if (ret) {
+			warning("mismatch at entry %d", i);
+			break;
+		}
+	}
+done:
+	discard_index(&istate);
+	return i == the_index.cache_nr;
+}
+
 static void share_the_index(void)
 {
 	if (the_index.split_index && the_index.split_index->base)
 		share_index(the_index.split_index->base, &shm_base_index);
 	share_index(&the_index, &shm_index);
+	if (to_verify && !verify_shm())
+		cleanup_shm();
 	discard_index(&the_index);
 }
 
@@ -246,6 +292,8 @@ int main(int argc, char **argv)
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_seconds,
 			    N_("exit if not used after some seconds")),
+		OPT_BOOL(0, "strict", &to_verify,
+			 "verify shared memory after creating"),
 		OPT_END()
 	};
 
diff --git a/read-cache.c b/read-cache.c
index af2101f..5df5b21 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1669,9 +1669,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
 
 	istate->mmap = mmap;
 	istate->mmap_size = mmap_size;
-	if (try_shm(istate) &&
-	    verify_hdr(istate->mmap, istate->mmap_size) < 0)
-		goto unmap;
+	if (try_shm(istate)) {
+		if (verify_hdr(istate->mmap, istate->mmap_size) < 0)
+			goto unmap;
+	} else if (istate->always_verify_trailing_sha1 &&
+		   verify_hdr(istate->mmap, istate->mmap_size) < 0)
+			goto unmap;
 	hdr = mmap = istate->mmap;
 	mmap_size = istate->mmap_size;
 	if (!istate->keep_mmap)
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 05/15] daemonize(): set a flag before exiting the main process
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (3 preceding siblings ...)
  2016-04-19 23:27 ` [PATCH v5 04/15] index-helper: add --strict David Turner
@ 2016-04-19 23:27 ` David Turner
  2016-04-19 23:28 ` [PATCH v5 06/15] index-helper: add --detach David Turner
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:27 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

This allows signal handlers and atexit functions to realize this
situation and not clean up.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 builtin/gc.c | 2 +-
 cache.h      | 2 +-
 daemon.c     | 2 +-
 setup.c      | 4 +++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c583aad..37180de 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -385,7 +385,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 			 * failure to daemonize is ok, we'll continue
 			 * in foreground
 			 */
-			daemonized = !daemonize();
+			daemonized = !daemonize(NULL);
 		}
 	} else
 		add_repack_all_option();
diff --git a/cache.h b/cache.h
index 4b678e9..0aeb994 100644
--- a/cache.h
+++ b/cache.h
@@ -530,7 +530,7 @@ extern int set_git_dir_init(const char *git_dir, const char *real_git_dir, int);
 extern int init_db(const char *template_dir, unsigned int flags);
 
 extern void sanitize_stdfds(void);
-extern int daemonize(void);
+extern int daemonize(int *);
 
 #define alloc_nr(x) (((x)+16)*3/2)
 
diff --git a/daemon.c b/daemon.c
index 8d45c33..a5cf954 100644
--- a/daemon.c
+++ b/daemon.c
@@ -1365,7 +1365,7 @@ int main(int argc, char **argv)
 		return execute();
 
 	if (detach) {
-		if (daemonize())
+		if (daemonize(NULL))
 			die("--detach not supported on this platform");
 	} else
 		sanitize_stdfds();
diff --git a/setup.c b/setup.c
index de1a2a7..9adf13f 100644
--- a/setup.c
+++ b/setup.c
@@ -1017,7 +1017,7 @@ void sanitize_stdfds(void)
 		close(fd);
 }
 
-int daemonize(void)
+int daemonize(int *daemonized)
 {
 #ifdef NO_POSIX_GOODIES
 	errno = ENOSYS;
@@ -1029,6 +1029,8 @@ int daemonize(void)
 		case -1:
 			die_errno("fork failed");
 		default:
+			if (daemonized)
+				*daemonized = 1;
 			exit(0);
 	}
 	if (setsid() == -1)
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 06/15] index-helper: add --detach
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (4 preceding siblings ...)
  2016-04-19 23:27 ` [PATCH v5 05/15] daemonize(): set a flag before exiting the main process David Turner
@ 2016-04-19 23:28 ` David Turner
  2016-04-19 23:50   ` Duy Nguyen
  2016-04-19 23:28 ` [PATCH v5 07/15] read-cache: add watchman 'WAMA' extension David Turner
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: David Turner @ 2016-04-19 23:28 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

We detach after creating and opening the socket, because otherwise
we might return control to the shell before index-helper is ready to
accept commands.  This might lead to flaky tests.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 Documentation/git-index-helper.txt | 3 +++
 index-helper.c                     | 9 +++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index a5c058f..f6aa525 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -31,6 +31,9 @@ OPTIONS
 	for reading an index, but because it will happen in the
 	background, it's not noticable. `--strict` is enabled by default.
 
+--detach::
+	Detach from the shell.
+
 NOTES
 -----
 
diff --git a/index-helper.c b/index-helper.c
index e8d6b64..80b1dbe 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -16,7 +16,7 @@ struct shm {
 
 static struct shm shm_index;
 static struct shm shm_base_index;
-static int to_verify = 1;
+static int daemonized, to_verify = 1;
 
 static void release_index_shm(struct shm *is)
 {
@@ -35,6 +35,8 @@ static void cleanup_shm(void)
 
 static void cleanup(void)
 {
+	if (daemonized)
+		return;
 	unlink(git_path("index-helper.sock"));
 	cleanup_shm();
 }
@@ -286,7 +288,7 @@ static const char * const usage_text[] = {
 int main(int argc, char **argv)
 {
 	const char *prefix;
-	int idle_in_seconds = 600;
+	int idle_in_seconds = 600, detach = 0;
 	int fd;
 	struct strbuf socket_path = STRBUF_INIT;
 	struct option options[] = {
@@ -294,6 +296,7 @@ int main(int argc, char **argv)
 			    N_("exit if not used after some seconds")),
 		OPT_BOOL(0, "strict", &to_verify,
 			 "verify shared memory after creating"),
+		OPT_BOOL(0, "detach", &detach, "detach the process"),
 		OPT_END()
 	};
 
@@ -317,6 +320,8 @@ int main(int argc, char **argv)
 	if (fd < 0)
 		die_errno(_("could not set up index-helper socket"));
 
+	if (detach && daemonize(&daemonized))
+		die_errno(_("unable to detach"));
 	loop(fd, idle_in_seconds);
 
 	close(fd);
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 07/15] read-cache: add watchman 'WAMA' extension
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (5 preceding siblings ...)
  2016-04-19 23:28 ` [PATCH v5 06/15] index-helper: add --detach David Turner
@ 2016-04-19 23:28 ` David Turner
  2016-04-19 23:28 ` [PATCH v5 08/15] Add watchman support to reduce index refresh cost David Turner
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:28 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

The extension contains a bitmap, one bit for each entry in the
index. If the n-th bit is zero, the n-th entry is considered
unchanged, we can ce_mark_uptodate() it without refreshing. If the bit
is non-zero and we found out the corresponding file is clean after
refresh, we can clear the bit.

In addition, there's a list of directories in the untracked-cache
to invalidate (because they have new or modified entries).

The 'skipping refresh' bit is not in this patch yet as we would need
watchman. More details in later patches.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 Documentation/technical/index-format.txt |  22 ++++++
 cache.h                                  |   4 ++
 dir.h                                    |   3 +
 read-cache.c                             | 117 ++++++++++++++++++++++++++++++-
 4 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index ade0b0c..86ed3a6 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -295,3 +295,25 @@ The remaining data of each directory block is grouped by type:
     in the previous ewah bitmap.
 
   - One NUL.
+
+== Watchman cache
+
+  The watchman cache tracks files for which watchman has told us about
+  changes.  The signature for this extension is { 'W', 'A', 'M', 'A' }.
+
+  The extension starts with
+
+  - A NUL-terminated string: the watchman vector clock at the last
+    time we heard from watchman.
+
+  - 32-bit bitmap size: the size of the CE_WATCHMAN_DIRTY bitmap
+
+  - 32-bit untracked cache entry count: the number of dirty untracked
+    cache entries
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+    is CE_WATCHMAN_DIRTY.
+
+  - a list of N NUL-terminated strings.  Each is a directory that should
+    be marked dirty in the untracked cache because watchman has told us
+    about an update to a file in it.
diff --git a/cache.h b/cache.h
index 0aeb994..f4f7eef 100644
--- a/cache.h
+++ b/cache.h
@@ -182,6 +182,8 @@ struct cache_entry {
 #define CE_VALID     (0x8000)
 #define CE_STAGESHIFT 12
 
+#define CE_WATCHMAN_DIRTY  (0x0001)
+
 /*
  * Range 0xFFFF0FFF in ce_flags is divided into
  * two parts: in-memory flags and on-disk ones.
@@ -320,6 +322,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED	(1 << 5)
 #define SPLIT_INDEX_ORDERED	(1 << 6)
 #define UNTRACKED_CHANGED	(1 << 7)
+#define WATCHMAN_CHANGED	(1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -344,6 +347,7 @@ struct index_state {
 	struct untracked_cache *untracked;
 	void *mmap;
 	size_t mmap_size;
+	char *last_update;
 };
 
 extern struct index_state the_index;
diff --git a/dir.h b/dir.h
index 3ec3fb0..3d540de 100644
--- a/dir.h
+++ b/dir.h
@@ -142,6 +142,9 @@ struct untracked_cache {
 	int gitignore_invalidated;
 	int dir_invalidated;
 	int dir_opened;
+	/* watchman invalidation data */
+	unsigned int use_watchman : 1;
+	struct string_list invalid_untracked;
 };
 
 struct dir_struct {
diff --git a/read-cache.c b/read-cache.c
index 5df5b21..050668f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -19,6 +19,7 @@
 #include "split-index.h"
 #include "utf8.h"
 #include "unix-socket.h"
+#include "ewah/ewok.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -41,11 +42,13 @@ static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
+#define CACHE_EXT_WATCHMAN 0x57414D41	  /* "WAMA" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
 		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
-		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED)
+		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | \
+		 WATCHMAN_CHANGED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -1220,8 +1223,13 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 			continue;
 
 		new = refresh_cache_ent(istate, ce, options, &cache_errno, &changed);
-		if (new == ce)
+		if (new == ce) {
+			if (ce->ce_flags & CE_WATCHMAN_DIRTY) {
+				ce->ce_flags          &= ~CE_WATCHMAN_DIRTY;
+				istate->cache_changed |= WATCHMAN_CHANGED;
+			}
 			continue;
+		}
 		if (!new) {
 			const char *fmt;
 
@@ -1365,6 +1373,94 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	return 0;
 }
 
+static void mark_no_watchman(size_t pos, void *data)
+{
+	struct index_state *istate = data;
+	assert(pos < istate->cache_nr);
+	istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+}
+
+static int read_watchman_ext(struct index_state *istate, const void *data,
+			     unsigned long sz)
+{
+	struct ewah_bitmap *bitmap;
+	int ret, len;
+	uint32_t bitmap_size;
+	uint32_t untracked_nr;
+
+	if (memchr(data, 0, sz) == NULL)
+		return error("invalid extension");
+
+	len = strlen(data) + 1;
+	memcpy(&bitmap_size, (const char *)data + len, 4);
+	memcpy(&untracked_nr, (const char *)data + len + 4, 4);
+	untracked_nr = ntohl(untracked_nr);
+	bitmap_size = ntohl(bitmap_size);
+
+	bitmap = ewah_new();
+	ret = ewah_read_mmap(bitmap, (const char *)data + len + 8, bitmap_size);
+	if (ret != bitmap_size) {
+		ewah_free(bitmap);
+		return error("failed to parse ewah bitmap reading watchman index extension");
+	}
+	istate->last_update = xstrdup(data);
+	ewah_each_bit(bitmap, mark_no_watchman, istate);
+	ewah_free(bitmap);
+
+	/*
+	 * TODO: update the untracked cache from the untracked data in this
+	 * extension.
+	 */
+	return 0;
+}
+
+static int untracked_entry_append(struct string_list_item *item, void *sbvoid)
+{
+	struct strbuf *sb = sbvoid;
+
+	strbuf_addstr(sb, item->string);
+	strbuf_addch(sb, 0);
+	return 0;
+}
+
+void write_watchman_ext(struct strbuf *sb, struct index_state* istate)
+{
+	struct ewah_bitmap *bitmap;
+	int i;
+	int ewah_start;
+	int ewah_size = 0;
+	int fixup = 0;
+
+	strbuf_add(sb, istate->last_update, strlen(istate->last_update) + 1);
+	fixup = sb->len;
+	strbuf_add(sb, &ewah_size, 4); /* we'll fix this up later */
+	if (istate->untracked) {
+		uint32_t nr = istate->untracked->invalid_untracked.nr;
+		nr = htonl(nr);
+		strbuf_add(sb, &nr, 4);
+	} else {
+		/* zero */
+		strbuf_add(sb, &ewah_size, 4);
+	}
+
+	ewah_start = sb->len;
+	bitmap = ewah_new();
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_WATCHMAN_DIRTY)
+			ewah_set(bitmap, i);
+	ewah_serialize_strbuf(bitmap, sb);
+	ewah_free(bitmap);
+
+	/* fix up size field */
+	ewah_size = sb->len - ewah_start;
+	ewah_size = htonl(ewah_size);
+	memcpy(sb->buf + fixup, &ewah_size, 4);
+
+	if (istate->untracked)
+		for_each_string_list(&istate->untracked->invalid_untracked,
+				     untracked_entry_append, sb);
+}
+
 static int read_index_extension(struct index_state *istate,
 				const char *ext, void *data, unsigned long sz)
 {
@@ -1382,6 +1478,11 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_UNTRACKED:
 		istate->untracked = read_untracked_extension(data, sz);
 		break;
+
+	case CACHE_EXT_WATCHMAN:
+		read_watchman_ext(istate, data, sz);
+		break;
+
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1812,6 +1913,8 @@ int discard_index(struct index_state *istate)
 	istate->untracked = NULL;
 	istate->from_shm = 0;
 	istate->to_shm   = 0;
+	free(istate->last_update);
+	istate->last_update = NULL;
 	return 0;
 }
 
@@ -2209,6 +2312,16 @@ static int do_write_index(struct index_state *istate, int newfd,
 		if (err)
 			return -1;
 	}
+	if (!strip_extensions && istate->last_update) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_watchman_ext(&sb, istate);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_WATCHMAN, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
 		return -1;
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 08/15] Add watchman support to reduce index refresh cost
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (6 preceding siblings ...)
  2016-04-19 23:28 ` [PATCH v5 07/15] read-cache: add watchman 'WAMA' extension David Turner
@ 2016-04-19 23:28 ` David Turner
  2016-04-19 23:28 ` [PATCH v5 09/15] index-helper: use watchman to avoid refreshing index with lstat() David Turner
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:28 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

The previous patch has the logic to clear bits in 'WAMA' bitmap. This
patch has logic to set bits as told by watchman. The missing bit,
_using_ these bits, are not here yet.

A lot of this code is written by David Turner originally, mostly from
[1]. I'm just copying and polishing it a bit.

[1] http://article.gmane.org/gmane.comp.version-control.git/248006

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 Makefile           |  12 +++++
 cache.h            |   1 +
 config.c           |   5 ++
 configure.ac       |   8 ++++
 environment.c      |   3 ++
 watchman-support.c | 135 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 watchman-support.h |   7 +++
 7 files changed, 171 insertions(+)
 create mode 100644 watchman-support.c
 create mode 100644 watchman-support.h

diff --git a/Makefile b/Makefile
index c8be0e7..65ab0f4 100644
--- a/Makefile
+++ b/Makefile
@@ -451,6 +451,7 @@ MSGFMT = msgfmt
 CURL_CONFIG = curl-config
 PTHREAD_LIBS = -lpthread
 PTHREAD_CFLAGS =
+WATCHMAN_LIBS =
 GCOV = gcov
 
 export TCL_PATH TCLTK_PATH
@@ -1416,6 +1417,13 @@ else
 	LIB_OBJS += thread-utils.o
 endif
 
+ifdef USE_WATCHMAN
+	LIB_H += watchman-support.h
+	LIB_OBJS += watchman-support.o
+	WATCHMAN_LIBS = -lwatchman
+	BASIC_CFLAGS += -DUSE_WATCHMAN
+endif
+
 ifdef HAVE_PATHS_H
 	BASIC_CFLAGS += -DHAVE_PATHS_H
 endif
@@ -2025,6 +2033,9 @@ git-remote-testsvn$X: remote-testsvn.o GIT-LDFLAGS $(GITLIBS) $(VCSSVN_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) \
 	$(VCSSVN_LIB)
 
+git-index-helper$X: index-helper.o GIT-LDFLAGS $(GITLIBS)
+	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(LIBS) $(WATCHMAN_LIBS)
+
 $(REMOTE_CURL_ALIASES): $(REMOTE_CURL_PRIMARY)
 	$(QUIET_LNCP)$(RM) $@ && \
 	ln $< $@ 2>/dev/null || \
@@ -2164,6 +2175,7 @@ GIT-BUILD-OPTIONS: FORCE
 	@echo NO_PYTHON=\''$(subst ','\'',$(subst ','\'',$(NO_PYTHON)))'\' >>$@+
 	@echo NO_UNIX_SOCKETS=\''$(subst ','\'',$(subst ','\'',$(NO_UNIX_SOCKETS)))'\' >>$@+
 	@echo NO_MMAP=\''$(subst ','\'',$(subst ','\'',$(NO_MMAP)))'\' >>$@+
+	@echo USE_WATCHMAN=\''$(subst ','\'',$(subst ','\'',$(USE_WATCHMAN)))'\' >>$@+
 ifdef TEST_OUTPUT_DIRECTORY
 	@echo TEST_OUTPUT_DIRECTORY=\''$(subst ','\'',$(subst ','\'',$(TEST_OUTPUT_DIRECTORY)))'\' >>$@+
 endif
diff --git a/cache.h b/cache.h
index f4f7eef..37f211b 100644
--- a/cache.h
+++ b/cache.h
@@ -687,6 +687,7 @@ extern char *git_replace_ref_base;
 
 extern int fsync_object_files;
 extern int core_preload_index;
+extern int core_watchman_sync_timeout;
 extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
diff --git a/config.c b/config.c
index 9ba40bc..e6dc141 100644
--- a/config.c
+++ b/config.c
@@ -882,6 +882,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.watchmansynctimeout")) {
+		core_watchman_sync_timeout = git_config_int(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.createobject")) {
 		if (!strcmp(value, "rename"))
 			object_creation_mode = OBJECT_CREATION_USES_RENAMES;
diff --git a/configure.ac b/configure.ac
index 0cd9f46..334d63b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1099,6 +1099,14 @@ AC_COMPILE_IFELSE([BSD_SYSCTL_SRC],
 	HAVE_BSD_SYSCTL=])
 GIT_CONF_SUBST([HAVE_BSD_SYSCTL])
 
+#
+# Check for watchman client library
+
+AC_CHECK_LIB([watchman], [watchman_connect],
+	[USE_WATCHMAN=YesPlease],
+	[USE_WATCHMAN=])
+GIT_CONF_SUBST([USE_WATCHMAN])
+
 ## Other checks.
 # Define USE_PIC if you need the main git objects to be built with -fPIC
 # in order to build and link perl/Git.so.  x86-64 seems to need this.
diff --git a/environment.c b/environment.c
index 6dec9d0..35e03c7 100644
--- a/environment.c
+++ b/environment.c
@@ -94,6 +94,9 @@ int core_preload_index = 1;
  */
 int ignore_untracked_cache_config;
 
+int core_watchman_sync_timeout = 300;
+
+
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
 static char *work_tree;
diff --git a/watchman-support.c b/watchman-support.c
new file mode 100644
index 0000000..b168e88
--- /dev/null
+++ b/watchman-support.c
@@ -0,0 +1,135 @@
+#include "cache.h"
+#include "watchman-support.h"
+#include "strbuf.h"
+#include "dir.h"
+#include <watchman.h>
+
+static struct watchman_query *make_query(const char *last_update)
+{
+	struct watchman_query *query = watchman_query();
+	watchman_query_set_fields(query, WATCHMAN_FIELD_NAME |
+					 WATCHMAN_FIELD_EXISTS |
+					 WATCHMAN_FIELD_NEWER);
+	watchman_query_set_empty_on_fresh(query, 1);
+	query->sync_timeout = core_watchman_sync_timeout;
+	if (*last_update)
+		watchman_query_set_since_oclock(query, last_update);
+	return query;
+}
+
+static struct watchman_query_result* query_watchman(
+	struct index_state *istate, struct watchman_connection *connection,
+	const char *fs_path, const char *last_update)
+{
+	struct watchman_error wm_error;
+	struct watchman_query *query;
+	struct watchman_expression *expr;
+	struct watchman_query_result *result;
+
+	query = make_query(last_update);
+	expr = watchman_true_expression();
+	result = watchman_do_query(connection, fs_path, query, expr, &wm_error);
+	watchman_free_query(query);
+	watchman_free_expression(expr);
+
+	if (!result)
+		warning("Watchman query error: %s (at %s)",
+			wm_error.message,
+			*last_update ? last_update : "the beginning");
+
+	return result;
+}
+
+static void update_index(struct index_state *istate,
+			 struct watchman_query_result *result)
+{
+	int i;
+
+	if (result->is_fresh_instance) {
+		/* let refresh clear them later */
+		for (i = 0; i < istate->cache_nr; i++)
+			istate->cache[i]->ce_flags |= CE_WATCHMAN_DIRTY;
+		goto done;
+	}
+
+	for (i = 0; i < result->nr; i++) {
+		struct watchman_stat *wm = result->stats + i;
+		int pos;
+
+		if (S_ISDIR(wm->mode) ||
+		    !strncmp(wm->name, ".git/", 5) ||
+		    strstr(wm->name, "/.git/"))
+			continue;
+
+		pos = index_name_pos(istate, wm->name, strlen(wm->name));
+		if (pos < 0) {
+			if (istate->untracked) {
+				char *name = xstrdup(wm->name);
+				char *dname = dirname(name);
+
+				/*
+				 * dirname() returns '.' for the root,
+				 * but we call it ''.
+				 */
+				if (dname[0] == '.' && dname[1] == 0)
+					string_list_append(&istate->untracked->invalid_untracked, "");
+				else
+					string_list_append(&istate->untracked->invalid_untracked,
+							   dname);
+				free(name);
+			}
+			continue;
+		}
+		/* FIXME: ignore staged entries and gitlinks too? */
+
+		istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+	}
+
+done:
+	free(istate->last_update);
+	istate->last_update    = xstrdup(result->clock);
+	istate->cache_changed |= WATCHMAN_CHANGED;
+	if (istate->untracked)
+		string_list_remove_duplicates(&istate->untracked->invalid_untracked, 0);
+}
+
+int check_watchman(struct index_state *istate)
+{
+	struct watchman_error wm_error;
+	struct watchman_connection *connection;
+	struct watchman_query_result *result;
+	const char *fs_path;
+	struct timeval timeout;
+	/*
+	 * Convert core_watchman_sync_timeout, in milliseconds, to
+	 * struct timeval, in seconds and microseconds.
+	 */
+
+	fs_path = get_git_work_tree();
+	if (!fs_path)
+		return -1;
+
+	timeout.tv_sec = core_watchman_sync_timeout / 1000;
+	timeout.tv_usec = (core_watchman_sync_timeout % 1000) * 1000;
+	connection = watchman_connect(timeout, &wm_error);
+
+	if (!connection) {
+		warning("Watchman watch error: %s", wm_error.message);
+		return -1;
+	}
+
+	if (watchman_watch(connection, fs_path, &wm_error)) {
+		warning("Watchman watch error: %s", wm_error.message);
+		watchman_connection_close(connection);
+		return -1;
+	}
+
+
+	result = query_watchman(istate, connection, fs_path, istate->last_update);
+	watchman_connection_close(connection);
+	if (!result)
+		return -1;
+	update_index(istate, result);
+	watchman_free_query_result(result);
+	return 0;
+}
diff --git a/watchman-support.h b/watchman-support.h
new file mode 100644
index 0000000..ee1ef2c
--- /dev/null
+++ b/watchman-support.h
@@ -0,0 +1,7 @@
+#ifndef WATCHMAN_SUPPORT_H
+#define WATCHMAN_SUPPORT_H
+
+struct index_state;
+int check_watchman(struct index_state *index);
+
+#endif /* WATCHMAN_SUPPORT_H */
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 09/15] index-helper: use watchman to avoid refreshing index with lstat()
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (7 preceding siblings ...)
  2016-04-19 23:28 ` [PATCH v5 08/15] Add watchman support to reduce index refresh cost David Turner
@ 2016-04-19 23:28 ` David Turner
  2016-04-20  0:15   ` Duy Nguyen
  2016-04-19 23:28 ` [PATCH v5 10/15] update-index: enable/disable watchman support David Turner
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: David Turner @ 2016-04-19 23:28 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Watchman is hidden behind index-helper. Before git tries to read the
index from shm, it notifies index-helper through the socket and waits
for index-helper to prepare a file for sharing memory (with
MAP_SHARED). index-helper then contacts watchman, updates 'WAMA'
extension and put it in a separate file and wakes git up with a reply
to git's socket.

Git uses this extension to not lstat unchanged entries. Git only
trusts the 'WAMA' extension when it's received from the separate file,
not from disk. Unmarked entries are "clean". Marked entries are dirty
from watchman point of view. If it finds out some entries are
'watchman-dirty', but are really unchanged (e.g. the file was changed,
then reverted back), then Git will clear the marking in 'WAMA' before
writing it down.

Hiding watchman behind index-helper means you need both daemons. You
can't run watchman alone. Not so good. But on the other hand, 'git'
binary is not linked to watchman/json libraries, which is good for
packaging. Core git package will run fine without watchman-related
packages. If they need watchman, they can install git-index-helper and
dependencies.

This also lets us trust anything in the untracked cache that we haven't
marked invalid, saving those stat() calls.

Another reason for tying watchman to index-helper is, when used with
untracked cache, we need to keep track of $GIT_WORK_TREE file
listing. That kind of list can be kept in index-helper.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 Documentation/git-index-helper.txt |   6 +
 cache.h                            |   2 +
 dir.c                              |  23 +++-
 dir.h                              |   3 +
 index-helper.c                     |  86 +++++++++++++--
 read-cache.c                       | 220 ++++++++++++++++++++++++++++++++++---
 6 files changed, 315 insertions(+), 25 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index f6aa525..b1fa49e 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -52,6 +52,12 @@ command. The following commands are used to control the daemon:
 	Let the daemon know the index is to be read. It keeps the
 	daemon alive longer, unless `--exit-after=0` is used.
 
+"poke <path>":
+	Like "poke", but replies with "OK".  If the index has the
+	watchman extension, index-helper queries watchman, then
+	prepares a shared memory object with the watchman index
+	extension before replying.
+
 All commands and replies are terminated by a 0 byte.
 
 GIT
diff --git a/cache.h b/cache.h
index 37f211b..4cc89bb 100644
--- a/cache.h
+++ b/cache.h
@@ -558,6 +558,7 @@ extern int daemonize(int *);
 
 /* Initialize and use the cache information */
 struct lock_file;
+extern int verify_index(const struct index_state *);
 extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const struct pathspec *pathspec);
 extern int do_read_index(struct index_state *istate, const char *path,
@@ -565,6 +566,7 @@ extern int do_read_index(struct index_state *istate, const char *path,
 extern int read_index_from(struct index_state *, const char *path);
 extern int is_index_unborn(struct index_state *);
 extern int read_index_unmerged(struct index_state *);
+extern void write_watchman_ext(struct strbuf *sb, struct index_state* istate);
 #define COMMIT_LOCK		(1 << 0)
 #define CLOSE_LOCK		(1 << 1)
 extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
diff --git a/dir.c b/dir.c
index 69e0be6..5058b29 100644
--- a/dir.c
+++ b/dir.c
@@ -597,9 +597,9 @@ static void trim_trailing_spaces(char *buf)
  *
  * If "name" has the trailing slash, it'll be excluded in the search.
  */
-static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
-						    struct untracked_cache_dir *dir,
-						    const char *name, int len)
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+					     struct untracked_cache_dir *dir,
+					     const char *name, int len)
 {
 	int first, last;
 	struct untracked_cache_dir *d;
@@ -1726,6 +1726,17 @@ static int valid_cached_dir(struct dir_struct *dir,
 	if (!untracked)
 		return 0;
 
+	if (dir->untracked->use_watchman) {
+		/*
+		 * With watchman, we can trust the untracked cache's
+		 * valid field.
+		 */
+		if (untracked->valid)
+			goto skip_stat;
+		else
+			invalidate_directory(dir->untracked, untracked);
+	}
+
 	if (stat(path->len ? path->buf : ".", &st)) {
 		invalidate_directory(dir->untracked, untracked);
 		memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
@@ -1739,6 +1750,7 @@ static int valid_cached_dir(struct dir_struct *dir,
 		return 0;
 	}
 
+skip_stat:
 	if (untracked->check_only != !!check_only) {
 		invalidate_directory(dir->untracked, untracked);
 		return 0;
@@ -2625,8 +2637,10 @@ static void free_untracked(struct untracked_cache_dir *ucd)
 
 void free_untracked_cache(struct untracked_cache *uc)
 {
-	if (uc)
+	if (uc) {
 		free_untracked(uc->root);
+		string_list_clear(&uc->invalid_untracked, 0);
+	}
 	free(uc);
 }
 
@@ -2775,6 +2789,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 		return NULL;
 
 	uc = xcalloc(1, sizeof(*uc));
+	string_list_init(&uc->invalid_untracked, 1);
 	strbuf_init(&uc->ident, ident_len);
 	strbuf_add(&uc->ident, ident, ident_len);
 	load_sha1_stat(&uc->ss_info_exclude, &ouc->info_exclude_stat,
diff --git a/dir.h b/dir.h
index 3d540de..8fd3f9e 100644
--- a/dir.h
+++ b/dir.h
@@ -315,4 +315,7 @@ struct untracked_cache *read_untracked_extension(const void *data, unsigned long
 void write_untracked_extension(struct strbuf *out, struct untracked_cache *untracked);
 void add_untracked_cache(struct index_state *istate);
 void remove_untracked_cache(struct index_state *istate);
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+					     struct untracked_cache_dir *dir,
+					     const char *name, int len);
 #endif
diff --git a/index-helper.c b/index-helper.c
index 80b1dbe..9d85a7c 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -7,15 +7,18 @@
 #include "lockfile.h"
 #include "cache.h"
 #include "unix-socket.h"
+#include "watchman-support.h"
 
 struct shm {
 	unsigned char sha1[20];
 	void *shm;
 	size_t size;
+	pid_t pid;
 };
 
 static struct shm shm_index;
 static struct shm shm_base_index;
+static struct shm shm_watchman;
 static int daemonized, to_verify = 1;
 
 static void release_index_shm(struct shm *is)
@@ -27,10 +30,21 @@ static void release_index_shm(struct shm *is)
 	is->shm = NULL;
 }
 
+static void release_watchman_shm(struct shm *is)
+{
+	if (!is->shm)
+		return;
+	munmap(is->shm, is->size);
+	unlink(git_path("shm-watchman-%s-%" PRIuMAX,
+			sha1_to_hex(is->sha1), (uintmax_t)is->pid));
+	is->shm = NULL;
+}
+
 static void cleanup_shm(void)
 {
 	release_index_shm(&shm_index);
 	release_index_shm(&shm_base_index);
+	release_watchman_shm(&shm_watchman);
 }
 
 static void cleanup(void)
@@ -161,9 +175,10 @@ static void share_the_index(void)
 	if (the_index.split_index && the_index.split_index->base)
 		share_index(the_index.split_index->base, &shm_base_index);
 	share_index(&the_index, &shm_index);
-	if (to_verify && !verify_shm())
+	if (to_verify && !verify_shm()) {
 		cleanup_shm();
-	discard_index(&the_index);
+		discard_index(&the_index);
+	}
 }
 
 static void set_socket_blocking_flag(int fd, int make_nonblocking)
@@ -196,6 +211,54 @@ static void refresh(void)
 
 #ifndef NO_MMAP
 
+#ifdef USE_WATCHMAN
+static void share_watchman(struct index_state *istate,
+			   struct shm *is, pid_t pid)
+{
+	struct strbuf sb = STRBUF_INIT;
+	void *shm;
+
+	write_watchman_ext(&sb, istate);
+	if (!shared_mmap_create(O_CREAT | O_EXCL | O_RDWR, 0700, sb.len + 20,
+				&shm, PROT_READ | PROT_WRITE, MAP_SHARED,
+				git_path("shm-watchman-%s-%" PRIuMAX,
+					 sha1_to_hex(istate->sha1),
+					 (uintmax_t)pid))) {
+		is->size = sb.len + 20;
+		is->shm = shm;
+		is->pid = pid;
+		hashcpy(is->sha1, istate->sha1);
+
+		memcpy(shm, sb.buf, sb.len);
+		hashcpy((unsigned char *)shm + is->size - 20, is->sha1);
+	}
+	strbuf_release(&sb);
+}
+
+
+static void prepare_with_watchman(pid_t pid)
+{
+	/*
+	 * TODO: with the help of watchman, maybe we could detect if
+	 * $GIT_DIR/index is updated.
+	 */
+	if (check_watchman(&the_index))
+		return;
+
+	share_watchman(&the_index, &shm_watchman, pid);
+}
+
+static void prepare_index(pid_t pid)
+{
+	if (shm_index.shm == NULL)
+		refresh();
+	release_watchman_shm(&shm_watchman);
+	if (the_index.last_update)
+		prepare_with_watchman(pid);
+}
+
+#endif
+
 static void loop(int fd, int idle_in_seconds)
 {
 	struct timeval timeout;
@@ -250,11 +313,20 @@ static void loop(int fd, int idle_in_seconds)
 		if (strbuf_getwholeline_fd(&command, client_fd, '\0') == 0) {
 			if (!strcmp(command.buf, "refresh")) {
 				refresh();
-			} else if (!strcmp(command.buf, "poke")) {
-				/*
-				 * Just a poke to keep us
-				 * alive, nothing to do.
-				 */
+			} else if (starts_with(command.buf, "poke")) {
+				if (command.buf[4] == ' ') {
+#ifdef USE_WATCHMAN
+					pid_t client_pid = strtoull(command.buf + 5, NULL, 10);
+					prepare_index(client_pid);
+#endif
+					if (write_in_full(client_fd, "OK", 3) != 3)
+						warning(_("client write failed"));
+				} else {
+					/*
+					 * Just a poke to keep us
+					 * alive, nothing to do.
+					 */
+				}
 			} else {
 				warning("BUG: Bogus command %s", command.buf);
 			}
diff --git a/read-cache.c b/read-cache.c
index 050668f..fb0168e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1233,7 +1233,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 		if (!new) {
 			const char *fmt;
 
-			if (really && cache_errno == EINVAL) {
+			if (really || cache_errno == EINVAL) {
 				/* If we are doing --really-refresh that
 				 * means the index is not valid anymore.
 				 */
@@ -1373,11 +1373,76 @@ static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 	return 0;
 }
 
+static struct untracked_cache_dir *find_untracked_cache_dir(
+	struct untracked_cache *uc, struct untracked_cache_dir *ucd,
+	const char *name)
+{
+	int component_len;
+	const char *end;
+	struct untracked_cache_dir *dir;
+
+	if (!*name)
+		return ucd;
+
+	end = strchr(name, '/');
+	if (end)
+		component_len = end - name;
+	else
+		component_len = strlen(name);
+
+	dir = lookup_untracked(uc, ucd, name, component_len);
+	if (dir)
+		return find_untracked_cache_dir(uc, dir, name + component_len + 1);
+
+	return NULL;
+}
+
 static void mark_no_watchman(size_t pos, void *data)
 {
 	struct index_state *istate = data;
+	struct cache_entry *ce = istate->cache[pos];
+	struct strbuf sb = STRBUF_INIT;
+	char *c;
+	struct untracked_cache_dir *dir;
+
 	assert(pos < istate->cache_nr);
-	istate->cache[pos]->ce_flags |= CE_WATCHMAN_DIRTY;
+	ce->ce_flags |= CE_WATCHMAN_DIRTY;
+
+	if (!istate->untracked || !istate->untracked->root)
+		return;
+
+	strbuf_add(&sb, ce->name, ce_namelen(ce));
+
+	for (c = sb.buf + sb.len - 1; c > sb.buf; c--) {
+		if (*c == '/') {
+			strbuf_setlen(&sb, c - sb.buf);
+			break;
+		}
+	}
+
+	if (c == sb.buf) {
+		strbuf_setlen(&sb, 0);
+	}
+
+	dir = find_untracked_cache_dir(istate->untracked,
+				       istate->untracked->root, sb.buf);
+	if (dir)
+		dir->valid = 0;
+
+	strbuf_release(&sb);
+}
+
+static int mark_untracked_invalid(struct string_list_item *item, void *uc)
+{
+	struct untracked_cache *untracked = uc;
+	struct untracked_cache_dir *dir;
+
+	dir = find_untracked_cache_dir(untracked, untracked->root,
+				       item->string);
+	if (dir)
+		dir->valid = 0;
+
+	return 0;
 }
 
 static int read_watchman_ext(struct index_state *istate, const void *data,
@@ -1407,10 +1472,24 @@ static int read_watchman_ext(struct index_state *istate, const void *data,
 	ewah_each_bit(bitmap, mark_no_watchman, istate);
 	ewah_free(bitmap);
 
-	/*
-	 * TODO: update the untracked cache from the untracked data in this
-	 * extension.
-	 */
+	if (istate->untracked && istate->untracked->root) {
+		int i;
+		const char *untracked;
+
+		untracked = (const char *)data + len + 8 + bitmap_size;
+		for (i = 0; i < untracked_nr; ++i) {
+			int len = strlen(untracked);
+			string_list_append(&istate->untracked->invalid_untracked,
+					   untracked);
+			untracked += len + 1;
+		}
+
+		for_each_string_list(&istate->untracked->invalid_untracked,
+			 mark_untracked_invalid, istate->untracked);
+
+		if (untracked_nr)
+			istate->cache_changed |= WATCHMAN_CHANGED;
+	}
 	return 0;
 }
 
@@ -1643,26 +1722,67 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_untracked_cache(istate);
 }
 
+static int poke_and_wait_for_reply(int fd)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct strbuf reply = STRBUF_INIT;
+	int ret = -1;
+	fd_set fds;
+	struct timeval timeout;
+
+	timeout.tv_usec = 0;
+	timeout.tv_sec = 1;
+
+	if (fd < 0)
+		return -1;
+
+	strbuf_addf(&buf, "poke %d", getpid());
+	if (write_in_full(fd, buf.buf, buf.len + 1) != buf.len + 1)
+		goto done_poke;
+
+	/* Now wait for a reply */
+	FD_ZERO(&fds);
+	FD_SET(fd, &fds);
+	if (select(fd + 1, &fds, NULL, NULL, &timeout) == 0)
+		/* No reply, giving up */
+		goto done_poke;
+
+	if (strbuf_getwholeline_fd(&reply, fd, 0))
+		goto done_poke;
+
+	if (!starts_with(reply.buf, "OK"))
+		goto done_poke;
+
+	ret = 0;
+done_poke:
+	close(fd);
+	strbuf_release(&buf);
+	strbuf_release(&reply);
+
+	return ret;
+}
+
 static int poke_daemon(struct index_state *istate,
 		       const struct stat *st, int refresh_cache)
 {
 	int fd;
-	int ret = 0;
-	const char *socket_path;
+	int ret = -1;
 
 	/* if this is from index-helper, do not poke itself (recursively) */
 	if (istate->to_shm)
 		return 0;
 
-	socket_path = git_path("index-helper.sock");
-	if (!socket_path)
+	fd = unix_stream_connect(git_path("index-helper.sock"));
+	if (fd < 0) {
+		warning("Failed to connect to index-helper socket");
+		unlink(git_path("index-helper.sock"));
 		return -1;
+	}
 
-	fd = unix_stream_connect(socket_path);
 	if (refresh_cache) {
 		ret = write_in_full(fd, "refresh", 8) != 8;
 	} else {
-		ret = write_in_full(fd, "poke", 5) != 5;
+		ret = poke_and_wait_for_reply(fd);
 	}
 
 	close(fd);
@@ -1732,6 +1852,74 @@ fail:
 	return -1;
 }
 
+static void refresh_by_watchman(struct index_state *istate)
+{
+	void *shm = NULL;
+	int length;
+	int i;
+	struct stat st;
+	int fd = -1;
+	const char *path = git_path("shm-watchman-%s-%"PRIuMAX,
+				    sha1_to_hex(istate->sha1),
+				    (uintmax_t)getpid());
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return;
+
+	/*
+	 * This watchman data is just for us -- no need to keep it
+	 * around once we've got it open.
+	 */
+	unlink(path);
+
+	if (fstat(fd, &st) < 0)
+		goto done;
+
+	length = st.st_size;
+	shm = mmap(NULL, length, PROT_READ, MAP_SHARED, fd, 0);
+
+	if (shm == MAP_FAILED)
+		goto done;
+
+	close(fd);
+	fd = -1;
+
+	if (length <= 20 ||
+	    hashcmp(istate->sha1, (unsigned char *)shm + length - 20) ||
+	    /*
+	     * No need to clear CE_WATCHMAN_DIRTY set by 'WAMA' on
+	     * disk. Watchman can only set more, not clear any, so
+	     * this is OR mask.
+	     */
+	    read_watchman_ext(istate, shm, length - 20))
+		goto done;
+
+	/*
+	 * Now that we've marked the invalid entries in the
+	 * untracked-cache itself, we can erase them from the list of
+	 * entries to be processed and mark the untracked cache for
+	 * watchman usage.
+	 */
+	if (istate->untracked) {
+		string_list_clear(&istate->untracked->invalid_untracked, 0);
+		istate->untracked->use_watchman = 1;
+	}
+
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		if (ce_stage(ce) || (ce->ce_flags & CE_WATCHMAN_DIRTY))
+			continue;
+		ce_mark_uptodate(ce);
+	}
+done:
+	if (shm)
+		munmap(shm, length);
+
+	if (fd >= 0)
+		close(fd);
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1851,7 +2039,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	split_index = istate->split_index;
 	if (!split_index || is_null_sha1(split_index->base_sha1)) {
 		post_read_index_from(istate);
-		return ret;
+		goto done;
 	}
 
 	if (split_index->base)
@@ -1872,6 +2060,10 @@ int read_index_from(struct index_state *istate, const char *path)
 		    sha1_to_hex(split_index->base->sha1));
 	merge_base_index(istate);
 	post_read_index_from(istate);
+
+done:
+	if (ret > 0 && istate->from_shm && istate->last_update)
+		refresh_by_watchman(istate);
 	return ret;
 }
 
@@ -2173,7 +2365,7 @@ out:
 	return 0;
 }
 
-static int verify_index(const struct index_state *istate)
+int verify_index(const struct index_state *istate)
 {
 	return verify_index_from(istate, get_index_file());
 }
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 10/15] update-index: enable/disable watchman support
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (8 preceding siblings ...)
  2016-04-19 23:28 ` [PATCH v5 09/15] index-helper: use watchman to avoid refreshing index with lstat() David Turner
@ 2016-04-19 23:28 ` David Turner
  2016-04-19 23:45   ` Duy Nguyen
  2016-04-19 23:28 ` [PATCH v5 11/15] unpack-trees: preserve index extensions David Turner
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 35+ messages in thread
From: David Turner @ 2016-04-19 23:28 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: David Turner <dturner@twopensource.com>
---
 Documentation/git-index-helper.txt |  3 +++
 Documentation/git-update-index.txt |  6 ++++++
 builtin/update-index.c             | 11 +++++++++++
 3 files changed, 20 insertions(+)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index b1fa49e..dd039a1 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -15,6 +15,9 @@ DESCRIPTION
 Keep the index file in memory for faster access. This daemon is per
 repository.
 
+If you want the index-helper to accelerate untracked file checking,
+run git update-index --watchman before using it.
+
 OPTIONS
 -------
 
diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index c6cbed1..6736487 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -19,6 +19,7 @@ SYNOPSIS
 	     [--ignore-submodules]
 	     [--[no-]split-index]
 	     [--[no-|test-|force-]untracked-cache]
+	     [--[no-]watchman]
 	     [--really-refresh] [--unresolve] [--again | -g]
 	     [--info-only] [--index-info]
 	     [-z] [--stdin] [--index-version <n>]
@@ -176,6 +177,11 @@ may not support it yet.
 --no-untracked-cache::
 	Enable or disable untracked cache feature. Please use
 	`--test-untracked-cache` before enabling it.
+
+--watchman::
+--no-watchman::
+	Enable or disable watchman support. This is, at present,
+	only useful with git index-helper.
 +
 These options take effect whatever the value of the `core.untrackedCache`
 configuration variable (see linkgit:git-config[1]). But a warning is
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 1c94ca5..7c08e1c 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -914,6 +914,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 {
 	int newfd, entries, has_errors = 0, nul_term_line = 0;
 	enum uc_mode untracked_cache = UC_UNSPECIFIED;
+	int use_watchman = -1;
 	int read_from_stdin = 0;
 	int prefix_length = prefix ? strlen(prefix) : 0;
 	int preferred_index_format = 0;
@@ -1012,6 +1013,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			    N_("test if the filesystem supports untracked cache"), UC_TEST),
 		OPT_SET_INT(0, "force-untracked-cache", &untracked_cache,
 			    N_("enable untracked cache without testing the filesystem"), UC_FORCE),
+		OPT_BOOL(0, "watchman", &use_watchman,
+			N_("use or not use watchman to reduce refresh cost")),
 		OPT_END()
 	};
 
@@ -1149,6 +1152,14 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		die("Bug: bad untracked_cache value: %d", untracked_cache);
 	}
 
+	if (use_watchman > 0) {
+		the_index.last_update    = xstrdup("");
+		the_index.cache_changed |= WATCHMAN_CHANGED;
+	} else if (!use_watchman) {
+		the_index.last_update    = NULL;
+		the_index.cache_changed |= WATCHMAN_CHANGED;
+	}
+
 	if (active_cache_changed) {
 		if (newfd < 0) {
 			if (refresh_args.flags & REFRESH_QUIET)
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 11/15] unpack-trees: preserve index extensions
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (9 preceding siblings ...)
  2016-04-19 23:28 ` [PATCH v5 10/15] update-index: enable/disable watchman support David Turner
@ 2016-04-19 23:28 ` David Turner
  2016-04-19 23:28 ` [PATCH v5 12/15] index-helper: kill mode David Turner
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:28 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

Make git checkout (and other unpack_tree operations) preserve the
untracked cache and watchman status. This is valuable for two reasons:

1. Often, an unpack_tree operation will not touch large parts of the
working tree, and thus most of the untracked cache will continue to be
valid.

2. Even if the untracked cache were entirely invalidated by such an
operation, the user has signaled their intention to have such a cache,
and we don't want to throw it away.

The same logic applies to the watchman state.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 cache.h                           |  1 +
 read-cache.c                      |  8 ++++++++
 t/t7063-status-untracked-cache.sh | 22 ++++++++++++++++++++++
 t/test-lib-functions.sh           |  4 ++++
 unpack-trees.c                    |  1 +
 5 files changed, 36 insertions(+)

diff --git a/cache.h b/cache.h
index 4cc89bb..49fa128 100644
--- a/cache.h
+++ b/cache.h
@@ -571,6 +571,7 @@ extern void write_watchman_ext(struct strbuf *sb, struct index_state* istate);
 #define CLOSE_LOCK		(1 << 1)
 extern int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
 extern int discard_index(struct index_state *);
+extern void move_index_extensions(struct index_state *dst, struct index_state *src);
 extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
 extern int index_dir_exists(struct index_state *istate, const char *name, int namelen);
diff --git a/read-cache.c b/read-cache.c
index fb0168e..65f22f9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2745,3 +2745,11 @@ void stat_validity_update(struct stat_validity *sv, int fd)
 		fill_stat_data(sv->sd, &st);
 	}
 }
+
+void move_index_extensions(struct index_state *dst, struct index_state *src)
+{
+	dst->untracked = src->untracked;
+	src->untracked = NULL;
+	dst->last_update = src->last_update;
+	src->last_update = NULL;
+}
diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
index a971884..083516d 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -646,4 +646,26 @@ test_expect_success 'test ident field is working' '
 	test_cmp ../expect ../err
 '
 
+test_expect_success 'untracked cache survives a checkout' '
+	git commit --allow-empty -m empty &&
+	test-dump-untracked-cache >../before &&
+	test_when_finished  "git checkout master" &&
+	git checkout -b other_branch &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after &&
+	test_commit test &&
+	test-dump-untracked-cache >../before &&
+	git checkout master &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after
+'
+
+test_expect_success 'untracked cache survives a commit' '
+	test-dump-untracked-cache >../before &&
+	git add done/two &&
+	git commit -m commit &&
+	test-dump-untracked-cache >../after &&
+	test_cmp ../before ../after
+'
+
 test_done
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 8d99eb3..e974b5b 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -186,6 +186,10 @@ test_commit () {
 		test_tick
 	fi &&
 	git commit $signoff -m "$1" &&
+	if [ "$(git config core.bare)" = false ]
+	then
+	    git update-index --force-untracked-cache
+	fi
 	git tag "${4:-$1}"
 }
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 9f55cc2..fc90eb3 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1215,6 +1215,7 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 						  WRITE_TREE_SILENT |
 						  WRITE_TREE_REPAIR);
 		}
+		move_index_extensions(&o->result, o->dst_index);
 		discard_index(o->dst_index);
 		*o->dst_index = o->result;
 	} else {
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 12/15] index-helper: kill mode
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (10 preceding siblings ...)
  2016-04-19 23:28 ` [PATCH v5 11/15] unpack-trees: preserve index extensions David Turner
@ 2016-04-19 23:28 ` David Turner
  2016-04-19 23:28 ` [PATCH v5 13/15] index-helper: don't run if already running David Turner
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:28 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

Add a new command (and command-line arg) to allow index-helpers to
exit cleanly.

This is mainly useful for tests.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 Documentation/git-index-helper.txt |  3 +++
 index-helper.c                     | 31 ++++++++++++++++++++++++++++++-
 t/t7900-index-helper.sh            |  9 +++++++++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index dd039a1..7d80638 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -37,6 +37,9 @@ OPTIONS
 --detach::
 	Detach from the shell.
 
+--kill::
+	Kill any running index-helper and clean up the socket
+
 NOTES
 -----
 
diff --git a/index-helper.c b/index-helper.c
index 9d85a7c..6af01c9 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -327,6 +327,8 @@ static void loop(int fd, int idle_in_seconds)
 					 * alive, nothing to do.
 					 */
 				}
+			} else if (!strcmp(command.buf, "die")) {
+				break;
 			} else {
 				warning("BUG: Bogus command %s", command.buf);
 			}
@@ -357,10 +359,29 @@ static const char * const usage_text[] = {
 	NULL
 };
 
+static void request_kill(void)
+{
+	int fd = unix_stream_connect(git_path("index-helper.sock"));
+
+	if (fd >= 0) {
+		write_in_full(fd, "die", 4);
+		close(fd);
+	}
+
+	/*
+	 * The child will try to do this anyway, but we want to be
+	 * ready to launch a new daemon immediately after this command
+	 * returns.
+	 */
+
+	unlink(git_path("index-helper.sock"));
+	return;
+}
+
 int main(int argc, char **argv)
 {
 	const char *prefix;
-	int idle_in_seconds = 600, detach = 0;
+	int idle_in_seconds = 600, detach = 0, kill = 0;
 	int fd;
 	struct strbuf socket_path = STRBUF_INIT;
 	struct option options[] = {
@@ -369,6 +390,7 @@ int main(int argc, char **argv)
 		OPT_BOOL(0, "strict", &to_verify,
 			 "verify shared memory after creating"),
 		OPT_BOOL(0, "detach", &detach, "detach the process"),
+		OPT_BOOL(0, "kill", &kill, "request that existing index helper processes exit"),
 		OPT_END()
 	};
 
@@ -383,6 +405,13 @@ int main(int argc, char **argv)
 			  options, usage_text, 0))
 		die(_("too many arguments"));
 
+	if (kill) {
+		if (detach)
+			die(_("--kill doesn't want any other options"));
+		request_kill();
+		return 0;
+	}
+
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index 114c112..e71b5af 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -20,4 +20,13 @@ test_expect_success 'index-helper smoke test' '
 	test_path_is_missing .git/index-helper.sock
 '
 
+test_expect_success 'index-helper creates usable path file and can be killed' '
+	test_when_finished "git index-helper --kill" &&
+	test_path_is_missing .git/index-helper.sock &&
+	git index-helper --detach &&
+	test -S .git/index-helper.sock &&
+	git index-helper --kill &&
+	test_path_is_missing .git/index-helper.sock
+'
+
 test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 13/15] index-helper: don't run if already running
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (11 preceding siblings ...)
  2016-04-19 23:28 ` [PATCH v5 12/15] index-helper: kill mode David Turner
@ 2016-04-19 23:28 ` David Turner
  2016-04-19 23:28 ` [PATCH v5 14/15] index-helper: autorun mode David Turner
  2016-04-19 23:28 ` [PATCH v5 15/15] index-helper: optionally automatically run David Turner
  14 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:28 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

Signed-off-by: David Turner <dturner@twopensource.com>
---
 index-helper.c          | 7 +++++++
 t/t7900-index-helper.sh | 9 +++++++++
 2 files changed, 16 insertions(+)

diff --git a/index-helper.c b/index-helper.c
index 6af01c9..8fcb76e 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -412,6 +412,13 @@ int main(int argc, char **argv)
 		return 0;
 	}
 
+	/* check that no other copy is running */
+	fd = unix_stream_connect(git_path("index-helper.sock"));
+	if (fd > 0)
+		die(_("Already running"));
+	if (errno != ECONNREFUSED && errno != ENOENT)
+		die_errno(_("Unexpected error checking socket"));
+
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
 
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index e71b5af..7159971 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -29,4 +29,13 @@ test_expect_success 'index-helper creates usable path file and can be killed' '
 	test_path_is_missing .git/index-helper.sock
 '
 
+test_expect_success 'index-helper will not start if already running' '
+	test_when_finished "git index-helper --kill" &&
+	git index-helper --detach &&
+	test -S .git/index-helper.sock &&
+	test_must_fail git index-helper 2>err &&
+	test -S .git/index-helper.sock &&
+	grep "Already running" err
+'
+
 test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 14/15] index-helper: autorun mode
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (12 preceding siblings ...)
  2016-04-19 23:28 ` [PATCH v5 13/15] index-helper: don't run if already running David Turner
@ 2016-04-19 23:28 ` David Turner
  2016-04-19 23:28 ` [PATCH v5 15/15] index-helper: optionally automatically run David Turner
  14 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-19 23:28 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

Soon, we'll want to automatically start index-helper, so we need
a mode that silently exits if it can't start up (either because
it's not in a git dir, or because another one is already running).

Signed-off-by: David Turner <dturner@twopensource.com>
---
 Documentation/git-index-helper.txt |  4 ++++
 index-helper.c                     | 29 +++++++++++++++++++++++------
 t/t7900-index-helper.sh            |  8 ++++++++
 3 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-index-helper.txt b/Documentation/git-index-helper.txt
index 7d80638..91d7a36 100644
--- a/Documentation/git-index-helper.txt
+++ b/Documentation/git-index-helper.txt
@@ -40,6 +40,10 @@ OPTIONS
 --kill::
 	Kill any running index-helper and clean up the socket
 
+--autorun::
+	If index-helper is not already running, start it.  Else, do
+	nothing.
+
 NOTES
 -----
 
diff --git a/index-helper.c b/index-helper.c
index 8fcb76e..3d884da 100644
--- a/index-helper.c
+++ b/index-helper.c
@@ -381,8 +381,9 @@ static void request_kill(void)
 int main(int argc, char **argv)
 {
 	const char *prefix;
-	int idle_in_seconds = 600, detach = 0, kill = 0;
+	int idle_in_seconds = 600, detach = 0, kill = 0, autorun = 0;
 	int fd;
+	int nongit;
 	struct strbuf socket_path = STRBUF_INIT;
 	struct option options[] = {
 		OPT_INTEGER(0, "exit-after", &idle_in_seconds,
@@ -391,6 +392,7 @@ int main(int argc, char **argv)
 			 "verify shared memory after creating"),
 		OPT_BOOL(0, "detach", &detach, "detach the process"),
 		OPT_BOOL(0, "kill", &kill, "request that existing index helper processes exit"),
+		OPT_BOOL(0, "autorun", &autorun, "this is an automatic run of git index-helper, so certain errors can be solved by silently exiting"),
 		OPT_END()
 	};
 
@@ -400,7 +402,14 @@ int main(int argc, char **argv)
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage_with_options(usage_text, options);
 
-	prefix = setup_git_directory();
+	prefix = setup_git_directory_gently(&nongit);
+	if (nongit) {
+		if (autorun)
+			exit(0);
+		else
+			die(_("not a git repository"));
+	}
+
 	if (parse_options(argc, (const char **)argv, prefix,
 			  options, usage_text, 0))
 		die(_("too many arguments"));
@@ -414,10 +423,18 @@ int main(int argc, char **argv)
 
 	/* check that no other copy is running */
 	fd = unix_stream_connect(git_path("index-helper.sock"));
-	if (fd > 0)
-		die(_("Already running"));
-	if (errno != ECONNREFUSED && errno != ENOENT)
-		die_errno(_("Unexpected error checking socket"));
+	if (fd > 0) {
+		if (autorun)
+			exit(0);
+		else
+			die(_("Already running"));
+	}
+	if (errno != ECONNREFUSED && errno != ENOENT) {
+		if (autorun)
+			return 0;
+		else
+			die_errno(_("Unexpected error checking socket"));
+	}
 
 	atexit(cleanup);
 	sigchain_push_common(cleanup_on_signal);
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index 7159971..aa6e5fc 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -38,4 +38,12 @@ test_expect_success 'index-helper will not start if already running' '
 	grep "Already running" err
 '
 
+test_expect_success 'index-helper is quiet with --autorun' '
+	test_when_finished "git index-helper --kill" &&
+	git index-helper --kill &&
+	git index-helper --detach &&
+	test -S .git/index-helper.sock &&
+	git index-helper --autorun
+'
+
 test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 15/15] index-helper: optionally automatically run
  2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
                   ` (13 preceding siblings ...)
  2016-04-19 23:28 ` [PATCH v5 14/15] index-helper: autorun mode David Turner
@ 2016-04-19 23:28 ` David Turner
  2016-04-20  9:59   ` [PATCH 16/15] Add tracing to measure where most of the time is spent Nguyễn Thái Ngọc Duy
  14 siblings, 1 reply; 35+ messages in thread
From: David Turner @ 2016-04-19 23:28 UTC (permalink / raw)
  To: git, pclouds; +Cc: David Turner

Introduce a new config option, indexhelper.autorun, to automatically
run git index-helper before starting up a builtin git command.  This
enables users to keep index-helper running without manual
intervention.

Signed-off-by: David Turner <dturner@twopensource.com>
---
 Documentation/config.txt |  4 ++++
 read-cache.c             | 37 +++++++++++++++++++++++++++++++++++--
 t/t7900-index-helper.sh  | 19 +++++++++++++++++++
 3 files changed, 58 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2cd6bdd..8ec8824 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1852,6 +1852,10 @@ index.version::
 	Specify the version with which new index files should be
 	initialized.  This does not affect existing repositories.
 
+indexhelper.autorun::
+	Automatically run git index-helper when any builtin git
+	command is run inside a repository.
+
 init.templateDir::
 	Specify the directory from which templates will be copied.
 	(See the "TEMPLATE DIRECTORY" section of linkgit:git-init[1].)
diff --git a/read-cache.c b/read-cache.c
index 65f22f9..a73487e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -20,6 +20,7 @@
 #include "utf8.h"
 #include "unix-socket.h"
 #include "ewah/ewok.h"
+#include "run-command.h"
 
 static struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
 					       unsigned int options);
@@ -1722,6 +1723,33 @@ static void post_read_index_from(struct index_state *istate)
 	tweak_untracked_cache(istate);
 }
 
+static int want_auto_index_helper(void)
+{
+	int want = -1;
+	const char *value = NULL;
+	const char *key = "indexhelper.autorun";
+
+	if (git_config_key_is_valid(key) &&
+	    !git_config_get_value(key, &value)) {
+		int b = git_config_maybe_bool(key, value);
+		want = b >= 0 ? b : 0;
+	}
+	return want;
+}
+
+static void autorun_index_helper(void)
+{
+	const char *argv[] = {"git-index-helper", "--detach", "--autorun", NULL};
+	if (want_auto_index_helper() <= 0)
+		return;
+
+	trace_argv_printf(argv, "trace: auto index-helper:");
+
+	if (run_command_v_opt(argv,
+			      RUN_SILENT_EXEC_FAILURE | RUN_CLEAN_ON_EXIT))
+		warning(_("You specified indexhelper.autorun, but running git-index-helper failed."));
+}
+
 static int poke_and_wait_for_reply(int fd)
 {
 	struct strbuf buf = STRBUF_INIT;
@@ -1776,6 +1804,7 @@ static int poke_daemon(struct index_state *istate,
 	if (fd < 0) {
 		warning("Failed to connect to index-helper socket");
 		unlink(git_path("index-helper.sock"));
+		autorun_index_helper();
 		return -1;
 	}
 
@@ -1811,9 +1840,13 @@ static int try_shm(struct index_state *istate)
 	int fd = -1;
 
 	if (!is_main_index(istate) ||
-	    old_size <= 20 ||
-	    stat(git_path("index-helper.sock"), &st))
+	    old_size <= 20)
 		return -1;
+
+	if (stat(git_path("index-helper.sock"), &st)) {
+		autorun_index_helper();
+		return -1;
+	}
 	if (poke_daemon(istate, &st, 0))
 		return -1;
 	sha1 = (unsigned char *)istate->mmap + old_size - 20;
diff --git a/t/t7900-index-helper.sh b/t/t7900-index-helper.sh
index aa6e5fc..2d3ce3c 100755
--- a/t/t7900-index-helper.sh
+++ b/t/t7900-index-helper.sh
@@ -16,6 +16,9 @@ test -n "$NO_MMAP" && {
 }
 
 test_expect_success 'index-helper smoke test' '
+	# We need an existing commit so that the index exists (otherwise,
+	# the index-helper will not be autostarted)
+	test_commit x &&
 	git index-helper --exit-after 1 &&
 	test_path_is_missing .git/index-helper.sock
 '
@@ -46,4 +49,20 @@ test_expect_success 'index-helper is quiet with --autorun' '
 	git index-helper --autorun
 '
 
+test_expect_success 'index-helper autorun works' '
+	rm -f .git/index-helper.sock &&
+	git status &&
+	test_path_is_missing .git/index-helper.sock &&
+	test_config indexhelper.autorun true &&
+	git status &&
+	test -S .git/index-helper.sock &&
+	git status 2>err &&
+	test -S .git/index-helper.sock &&
+	! grep -q . err &&
+	git index-helper --kill &&
+	test_config indexhelper.autorun false &&
+	git status &&
+	test_path_is_missing .git/index-helper.sock
+'
+
 test_done
-- 
2.4.2.767.g62658d5-twtrsrc

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 10/15] update-index: enable/disable watchman support
  2016-04-19 23:28 ` [PATCH v5 10/15] update-index: enable/disable watchman support David Turner
@ 2016-04-19 23:45   ` Duy Nguyen
  2016-04-20 19:50     ` David Turner
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2016-04-19 23:45 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List

On Wed, Apr 20, 2016 at 6:28 AM, David Turner <dturner@twopensource.com> wrote:
> +       if (use_watchman > 0) {
> +               the_index.last_update    = xstrdup("");
> +               the_index.cache_changed |= WATCHMAN_CHANGED;
> +       } else if (!use_watchman) {
> +               the_index.last_update    = NULL;
> +               the_index.cache_changed |= WATCHMAN_CHANGED;
> +       }
> +

We probably should warn people if index-helper is not built with
watchman support, which makes this knob completely useless. If
watchman fails to start, that's a separate problem..
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 06/15] index-helper: add --detach
  2016-04-19 23:28 ` [PATCH v5 06/15] index-helper: add --detach David Turner
@ 2016-04-19 23:50   ` Duy Nguyen
  2016-04-20  1:04     ` David Turner
  2016-04-25 20:53     ` David Turner
  0 siblings, 2 replies; 35+ messages in thread
From: Duy Nguyen @ 2016-04-19 23:50 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List

On Wed, Apr 20, 2016 at 6:28 AM, David Turner <dturner@twopensource.com> wrote:
> @@ -317,6 +320,8 @@ int main(int argc, char **argv)
>         if (fd < 0)
>                 die_errno(_("could not set up index-helper socket"));
>
> +       if (detach && daemonize(&daemonized))
> +               die_errno(_("unable to detach"));

At the least, I think we need to redirect both stdout and stderr to a
file, so we can catch errors. The watchman patch uses warning() to
report errors, for example. And there is always a chance of die().

Then we need to report the errors back. I faced the same problem with
daemonizing git-gc, but I'm not sure if we can do exactly the same
here like in commit 329e6e8 (gc: save log from daemonized gc --auto
and print it next time - 2015-09-19)
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 09/15] index-helper: use watchman to avoid refreshing index with lstat()
  2016-04-19 23:28 ` [PATCH v5 09/15] index-helper: use watchman to avoid refreshing index with lstat() David Turner
@ 2016-04-20  0:15   ` Duy Nguyen
  2016-04-20  1:01     ` David Turner
  0 siblings, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2016-04-20  0:15 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List

Continuing my comment from the --use-watchman patch about watchman not
being supported...

On Wed, Apr 20, 2016 at 6:28 AM, David Turner <dturner@twopensource.com> wrote:
> +static int poke_and_wait_for_reply(int fd)
> +{
> +       struct strbuf buf = STRBUF_INIT;
> +       struct strbuf reply = STRBUF_INIT;
> +       int ret = -1;
> +       fd_set fds;
> +       struct timeval timeout;
> +
> +       timeout.tv_usec = 0;
> +       timeout.tv_sec = 1;
> +
> +       if (fd < 0)
> +               return -1;
> +
> +       strbuf_addf(&buf, "poke %d", getpid());
> +       if (write_in_full(fd, buf.buf, buf.len + 1) != buf.len + 1)
> +               goto done_poke;
> +
> +       /* Now wait for a reply */
> +       FD_ZERO(&fds);
> +       FD_SET(fd, &fds);
> +       if (select(fd + 1, &fds, NULL, NULL, &timeout) == 0)
> +               /* No reply, giving up */
> +               goto done_poke;
> +
> +       if (strbuf_getwholeline_fd(&reply, fd, 0))
> +               goto done_poke;
> +
> +       if (!starts_with(reply.buf, "OK"))
> +               goto done_poke;

... while we could simply check USE_WATCHMAN macro and reject in
update-index, a better solution is sending "poke %d watchman" and
returning "OK watchman" (vs "OK") when watchman is supported and
active. If the user already requests watchman and index-helper returns
just "OK" then we can warn the user the reason of possible performance
degradation. It's related to the error reporting, but I don't think
you can send straight errors over unix socket. It's possible but it's
a bit more complicated.

> +static void refresh_by_watchman(struct index_state *istate)
> +{
> +       void *shm = NULL;
> +       int length;
> +       int i;
> +       struct stat st;
> +       int fd = -1;
> +       const char *path = git_path("shm-watchman-%s-%"PRIuMAX,
> +                                   sha1_to_hex(istate->sha1),
> +                                   (uintmax_t)getpid());
> +
> +       fd = open(path, O_RDONLY);
> +       if (fd < 0)
> +               return;
> +
> +       /*
> +        * This watchman data is just for us -- no need to keep it
> +        * around once we've got it open.
> +        */
> +       unlink(path);

This will not play well when multiple processes read and refresh the
index at the same time. They could refresh non-overlapping
subdirectories, and I think it's perfectly ok for them to do so
(writing index down is a different matter). I don't have a good answer
for this. Perhaps if shm-watchman-%s-%d file is small enough (and it
should be, we store it in the index), then we can just send the
content straight over unix socket. I didn't have this option with my
signal-based communication model.

This is really extra. But if we know in advance that git does not need
refresh(), then we should be able to tell index-helper not to waste
cycles contacting watchman and preparing shm-watchman-%s-%d (the poke
line gets more parameters). Either that, or we decouple watchman
requests from read_cache() requests. Only when refresh_index() is
called that we send something to request shm-watchman-%s-%d. The same
for read_directory() (i.e. untracked cache stuff). Hmm?

Now that I think of it, with watchman backing us, we probably should
just do nothing in update_index_if_able() (or write_locked_index()
when we know only stat info is changed) when watchman is active. The
purpose of update_index_if_able() is to avoid costly refresh, but we
can already avoid that with watchman. And updating big index files is
always costly (even though it should cost less with split-index). Of
course this can only be done if watchman (inotify to be precise) can
cover whole worktree. I'm not sure how watchman behaves when there's
not enough inotify resource to cover full worktree.
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 09/15] index-helper: use watchman to avoid refreshing index with lstat()
  2016-04-20  0:15   ` Duy Nguyen
@ 2016-04-20  1:01     ` David Turner
  2016-04-20  9:36       ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: David Turner @ 2016-04-20  1:01 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Wed, 2016-04-20 at 07:15 +0700, Duy Nguyen wrote:
> Continuing my comment from the --use-watchman patch about watchman
> not
> being supported...
> 
> On Wed, Apr 20, 2016 at 6:28 AM, David Turner <
> dturner@twopensource.com> wrote:
> > +static int poke_and_wait_for_reply(int fd)
> > +{
> > +       struct strbuf buf = STRBUF_INIT;
> > +       struct strbuf reply = STRBUF_INIT;
> > +       int ret = -1;
> > +       fd_set fds;
> > +       struct timeval timeout;
> > +
> > +       timeout.tv_usec = 0;
> > +       timeout.tv_sec = 1;
> > +
> > +       if (fd < 0)
> > +               return -1;
> > +
> > +       strbuf_addf(&buf, "poke %d", getpid());
> > +       if (write_in_full(fd, buf.buf, buf.len + 1) != buf.len + 1)
> > +               goto done_poke;
> > +
> > +       /* Now wait for a reply */
> > +       FD_ZERO(&fds);
> > +       FD_SET(fd, &fds);
> > +       if (select(fd + 1, &fds, NULL, NULL, &timeout) == 0)
> > +               /* No reply, giving up */
> > +               goto done_poke;
> > +
> > +       if (strbuf_getwholeline_fd(&reply, fd, 0))
> > +               goto done_poke;
> > +
> > +       if (!starts_with(reply.buf, "OK"))
> > +               goto done_poke;
> 
> ... while we could simply check USE_WATCHMAN macro and reject in
> update-index, a better solution is sending "poke %d watchman" and
> returning "OK watchman" (vs "OK") when watchman is supported and
> active. If the user already requests watchman and index-helper
> returns
> just "OK" then we can warn the user the reason of possible
> performance
> degradation. It's related to the error reporting, but I don't think
> you can send straight errors over unix socket. It's possible but it's
> a bit more complicated.

Do you mean that we should do this here?  Or in update-index -
-watchman?  If the former, I agree.  If the latter, I'm not sure; maybe
you'll be setting up your index before you've started the index helper?

> > +static void refresh_by_watchman(struct index_state *istate)
> > +{
> > +       void *shm = NULL;
> > +       int length;
> > +       int i;
> > +       struct stat st;
> > +       int fd = -1;
> > +       const char *path = git_path("shm-watchman-%s-%"PRIuMAX,
> > +                                   sha1_to_hex(istate->sha1),
> > +                                   (uintmax_t)getpid());
> > +
> > +       fd = open(path, O_RDONLY);
> > +       if (fd < 0)
> > +               return;
> > +
> > +       /*
> > +        * This watchman data is just for us -- no need to keep it
> > +        * around once we've got it open.
> > +        */
> > +       unlink(path);
> 
> This will not play well when multiple processes read and refresh the
> index at the same time. 

Multiple processes will have different pids, right?  And the pid is
included in the filename.  Am I missing something?

> This is really extra. But if we know in advance that git does not 
> need refresh(), then we should be able to tell index-helper not to 
> waste cycles contacting watchman and preparing shm-watchman-%s-%d 
> (the poke line gets more parameters). Either that, or we decouple 
> watchman requests from read_cache() requests. Only when 
> refresh_index() is called that we send something to request shm-
> watchman-%s-%d. The same for read_directory() (i.e. untracked cache 
> stuff). Hmm?

It's true that we could decouple watchman requests.  I'll look and see
if that's reasonable.

> Now that I think of it, with watchman backing us, we probably should
> just do nothing in update_index_if_able() (or write_locked_index()
> when we know only stat info is changed) when watchman is active. The
> purpose of update_index_if_able() is to avoid costly refresh, but we
> can already avoid that with watchman. And updating big index files is
> always costly (even though it should cost less with split-index). 

That sounds like a change we could make in a separate series.  It's not
a bad idea, but if our goal is to get the basic version out, we should
start there.

> Of
> course this can only be done if watchman (inotify to be precise) can
> cover whole worktree. I'm not sure how watchman behaves when there's
> not enough inotify resource to cover full worktree.

It will detect this case and will either manually recrawl (in the event
of a max_queued_events overflow) or return an error (in the event of
too many watched directories).

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 06/15] index-helper: add --detach
  2016-04-19 23:50   ` Duy Nguyen
@ 2016-04-20  1:04     ` David Turner
  2016-04-20  9:33       ` Duy Nguyen
  2016-04-25 20:53     ` David Turner
  1 sibling, 1 reply; 35+ messages in thread
From: David Turner @ 2016-04-20  1:04 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Wed, 2016-04-20 at 06:50 +0700, Duy Nguyen wrote:
> On Wed, Apr 20, 2016 at 6:28 AM, David Turner <
> dturner@twopensource.com> wrote:
> > @@ -317,6 +320,8 @@ int main(int argc, char **argv)
> >         if (fd < 0)
> >                 die_errno(_("could not set up index-helper
> > socket"));
> > 
> > +       if (detach && daemonize(&daemonized))
> > +               die_errno(_("unable to detach"));
> 
> At the least, I think we need to redirect both stdout and stderr to a
> file, so we can catch errors. The watchman patch uses warning() to
> report errors, for example. And there is always a chance of die().
> 
> Then we need to report the errors back. I faced the same problem with
> daemonizing git-gc, but I'm not sure if we can do exactly the same
> here like in commit 329e6e8 (gc: save log from daemonized gc --auto
> and print it next time - 2015-09-19)

I'll add in code to log errors.  I'm not sure where it would make sense
to report the errors.  Generally, for errors during a client operation,
we would like to report them to the client, but the client might have
already disconnected.  I guess in that case it's OK if they just go to
the log?  The client could warn on a timeout while waiting for index
-helper and direct people to the log.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 02/15] read-cache: allow to keep mmap'd memory after reading
  2016-04-19 23:27 ` [PATCH v5 02/15] read-cache: allow to keep mmap'd memory after reading David Turner
@ 2016-04-20  9:01   ` Johannes Schindelin
  2016-04-20 19:41     ` David Turner
  2016-04-20  9:26   ` Duy Nguyen
  1 sibling, 1 reply; 35+ messages in thread
From: Johannes Schindelin @ 2016-04-20  9:01 UTC (permalink / raw)
  To: David Turner; +Cc: git, pclouds

Hi Dave,

On Tue, 19 Apr 2016, David Turner wrote:

>  unmap:
> +	istate->mmap = NULL;
>  	munmap(mmap, mmap_size);
>  	die("index file corrupt");
>  }
> [...]
> @@ -1698,6 +1705,10 @@ int discard_index(struct index_state *istate)
>  	free(istate->cache);
>  	istate->cache = NULL;
>  	istate->cache_alloc = 0;
> +	if (istate->keep_mmap && istate->mmap) {
> +		munmap(istate->mmap, istate->mmap_size);
> +		istate->mmap = NULL;
> +	}
>  	discard_split_index(istate);

Just curious: any reason why the first hunk munmap()s after resetting the
field to NULL and the second hunk does it in the opposite order?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 02/15] read-cache: allow to keep mmap'd memory after reading
  2016-04-19 23:27 ` [PATCH v5 02/15] read-cache: allow to keep mmap'd memory after reading David Turner
  2016-04-20  9:01   ` Johannes Schindelin
@ 2016-04-20  9:26   ` Duy Nguyen
  2016-04-20 19:43     ` David Turner
  1 sibling, 1 reply; 35+ messages in thread
From: Duy Nguyen @ 2016-04-20  9:26 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List

On Wed, Apr 20, 2016 at 6:27 AM, David Turner <dturner@twopensource.com> wrote:
> From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>
> Later, we will introduce git index-helper to share this memory with
> other git processes.
>
> Since the memory will be shared, it will never be unmapped (although
> the kernel may of course choose to page it out).

This is not entirely true. We do need to keep the mmap'd memory for
longer, even after read_index_from() finishes. But we do not share the
exact same address space to other processes (memcpy is used in
index-helper.c:share_index()). We could even discard_index() at the
end of share_index(), but I think we keep it anyway just in case
another process asks, or when index is not updated.
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 06/15] index-helper: add --detach
  2016-04-20  1:04     ` David Turner
@ 2016-04-20  9:33       ` Duy Nguyen
  0 siblings, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2016-04-20  9:33 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List

On Wed, Apr 20, 2016 at 8:04 AM, David Turner <dturner@twopensource.com> wrote:
> On Wed, 2016-04-20 at 06:50 +0700, Duy Nguyen wrote:
>> On Wed, Apr 20, 2016 at 6:28 AM, David Turner <
>> dturner@twopensource.com> wrote:
>> > @@ -317,6 +320,8 @@ int main(int argc, char **argv)
>> >         if (fd < 0)
>> >                 die_errno(_("could not set up index-helper
>> > socket"));
>> >
>> > +       if (detach && daemonize(&daemonized))
>> > +               die_errno(_("unable to detach"));
>>
>> At the least, I think we need to redirect both stdout and stderr to a
>> file, so we can catch errors. The watchman patch uses warning() to
>> report errors, for example. And there is always a chance of die().
>>
>> Then we need to report the errors back. I faced the same problem with
>> daemonizing git-gc, but I'm not sure if we can do exactly the same
>> here like in commit 329e6e8 (gc: save log from daemonized gc --auto
>> and print it next time - 2015-09-19)
>
> I'll add in code to log errors.  I'm not sure where it would make sense
> to report the errors.  Generally, for errors during a client operation,
> we would like to report them to the client, but the client might have
> already disconnected.  I guess in that case it's OK if they just go to
> the log?  The client could warn on a timeout while waiting for index
> -helper and direct people to the log.

Yeah if the client already disconnects, we have no way but saving the
errors somewhere. index-helper can pick up from the log and report to
the next client, if you want to keep it simple. If we're building a
more complicated protocol on top of unix socket, I suggest you use
pkt-line to wrap/unwrap messages. Bonus point, we can trace what's
sending/receiving with GIT_TRACE_PACKET.
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 09/15] index-helper: use watchman to avoid refreshing index with lstat()
  2016-04-20  1:01     ` David Turner
@ 2016-04-20  9:36       ` Duy Nguyen
  0 siblings, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2016-04-20  9:36 UTC (permalink / raw)
  To: David Turner; +Cc: Git Mailing List

On Wed, Apr 20, 2016 at 8:01 AM, David Turner <dturner@twopensource.com> wrote:
> On Wed, 2016-04-20 at 07:15 +0700, Duy Nguyen wrote:
>> Continuing my comment from the --use-watchman patch about watchman
>> not
>> being supported...
>>
>> On Wed, Apr 20, 2016 at 6:28 AM, David Turner <
>> dturner@twopensource.com> wrote:
>> > +static int poke_and_wait_for_reply(int fd)
>> > +{
>> > +       struct strbuf buf = STRBUF_INIT;
>> > +       struct strbuf reply = STRBUF_INIT;
>> > +       int ret = -1;
>> > +       fd_set fds;
>> > +       struct timeval timeout;
>> > +
>> > +       timeout.tv_usec = 0;
>> > +       timeout.tv_sec = 1;
>> > +
>> > +       if (fd < 0)
>> > +               return -1;
>> > +
>> > +       strbuf_addf(&buf, "poke %d", getpid());
>> > +       if (write_in_full(fd, buf.buf, buf.len + 1) != buf.len + 1)
>> > +               goto done_poke;
>> > +
>> > +       /* Now wait for a reply */
>> > +       FD_ZERO(&fds);
>> > +       FD_SET(fd, &fds);
>> > +       if (select(fd + 1, &fds, NULL, NULL, &timeout) == 0)
>> > +               /* No reply, giving up */
>> > +               goto done_poke;
>> > +
>> > +       if (strbuf_getwholeline_fd(&reply, fd, 0))
>> > +               goto done_poke;
>> > +
>> > +       if (!starts_with(reply.buf, "OK"))
>> > +               goto done_poke;
>>
>> ... while we could simply check USE_WATCHMAN macro and reject in
>> update-index, a better solution is sending "poke %d watchman" and
>> returning "OK watchman" (vs "OK") when watchman is supported and
>> active. If the user already requests watchman and index-helper
>> returns
>> just "OK" then we can warn the user the reason of possible
>> performance
>> degradation. It's related to the error reporting, but I don't think
>> you can send straight errors over unix socket. It's possible but it's
>> a bit more complicated.
>
> Do you mean that we should do this here?  Or in update-index -
> -watchman?  If the former, I agree.  If the latter, I'm not sure; maybe
> you'll be setting up your index before you've started the index helper?

Here is better than update-index because we can't know what
index-helper is capable of (the USE_WATCHMAN macro is more like a
suggestion)

>> > +static void refresh_by_watchman(struct index_state *istate)
>> > +{
>> > +       void *shm = NULL;
>> > +       int length;
>> > +       int i;
>> > +       struct stat st;
>> > +       int fd = -1;
>> > +       const char *path = git_path("shm-watchman-%s-%"PRIuMAX,
>> > +                                   sha1_to_hex(istate->sha1),
>> > +                                   (uintmax_t)getpid());
>> > +
>> > +       fd = open(path, O_RDONLY);
>> > +       if (fd < 0)
>> > +               return;
>> > +
>> > +       /*
>> > +        * This watchman data is just for us -- no need to keep it
>> > +        * around once we've got it open.
>> > +        */
>> > +       unlink(path);
>>
>> This will not play well when multiple processes read and refresh the
>> index at the same time.
>
> Multiple processes will have different pids, right?  And the pid is
> included in the filename.  Am I missing something?

Ahhh! I thought that pid was index-helper's. Silly me.

>> Now that I think of it, with watchman backing us, we probably should
>> just do nothing in update_index_if_able() (or write_locked_index()
>> when we know only stat info is changed) when watchman is active. The
>> purpose of update_index_if_able() is to avoid costly refresh, but we
>> can already avoid that with watchman. And updating big index files is
>> always costly (even though it should cost less with split-index).
>
> That sounds like a change we could make in a separate series.  It's not
> a bad idea, but if our goal is to get the basic version out, we should
> start there.

Agreed. More optimizations can always wait till later. We just need a
good foundation first.
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 16/15] Add tracing to measure where most of the time is spent
  2016-04-19 23:28 ` [PATCH v5 15/15] index-helper: optionally automatically run David Turner
@ 2016-04-20  9:59   ` Nguyễn Thái Ngọc Duy
  2016-04-20 12:28     ` Johannes Schindelin
  0 siblings, 1 reply; 35+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2016-04-20  9:59 UTC (permalink / raw)
  To: git; +Cc: dturner, Nguyễn Thái Ngọc Duy

All the known heavy code blocks are measured (except object database
access). This should help identify if an optimization is effective or
not. An unoptimized git-status would give something like below (92% of
time is accounted). To sum up the effort of making git scale better:

 - read cache line is being addressed by index-helper
 - preload/refresh index by watchman
 - read packed refs by lmdb backend
 - diff-index by rebuilding cache-tree
 - read directory by untracked cache and watchman
 - write index by split index
 - name hash potientially by index-helper

read-cache.c:2075         performance: 0.004058570 s: read cache .../index
preload-index.c:104       performance: 0.004419864 s: preload index
read-cache.c:1265         performance: 0.000185224 s: refresh index
refs/files-backend.c:1100 performance: 0.001935788 s: read packed refs
diff-lib.c:240            performance: 0.000144132 s: diff-files
diff-lib.c:506            performance: 0.013592000 s: diff-index
name-hash.c:128           performance: 0.000614177 s: initialize name hash
dir.c:2030                performance: 0.015814103 s: read directory
read-cache.c:2565         performance: 0.004052343 s: write index, changed mask = 2
trace.c:420               performance: 0.048365509 s: git command: './git' 'status'

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 This is worth merging, I think.

 diff-lib.c           |  4 ++++
 dir.c                |  2 ++
 name-hash.c          |  2 ++
 preload-index.c      |  2 ++
 read-cache.c         | 11 +++++++++++
 refs/files-backend.c |  2 ++
 6 files changed, 23 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index bc49c70..7af7f9a 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -90,6 +90,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 	int diff_unmerged_stage = revs->max_count;
 	unsigned ce_option = ((option & DIFF_RACY_IS_MODIFIED)
 			      ? CE_MATCH_RACY_IS_DIRTY : 0);
+	uint64_t start = getnanotime();
 
 	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
 
@@ -236,6 +237,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 	}
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
+	trace_performance_since(start, "diff-files");
 	return 0;
 }
 
@@ -491,6 +493,7 @@ static int diff_cache(struct rev_info *revs,
 int run_diff_index(struct rev_info *revs, int cached)
 {
 	struct object_array_entry *ent;
+	uint64_t start = getnanotime();
 
 	ent = revs->pending.objects;
 	if (diff_cache(revs, ent->item->oid.hash, ent->name, cached))
@@ -500,6 +503,7 @@ int run_diff_index(struct rev_info *revs, int cached)
 	diffcore_fix_diff_index(&revs->diffopt);
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
+	trace_performance_since(start, "diff-index");
 	return 0;
 }
 
diff --git a/dir.c b/dir.c
index 91003e5..1ebf9fe 100644
--- a/dir.c
+++ b/dir.c
@@ -1991,6 +1991,7 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const stru
 {
 	struct path_simplify *simplify;
 	struct untracked_cache_dir *untracked;
+	uint64_t start = getnanotime();
 
 	/*
 	 * Check out create_simplify()
@@ -2026,6 +2027,7 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const stru
 	free_simplify(simplify);
 	qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
 	qsort(dir->ignored, dir->ignored_nr, sizeof(struct dir_entry *), cmp_name);
+	trace_performance_since(start, "read directory %.*s", len, path);
 	if (dir->untracked) {
 		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
 		trace_printf_key(&trace_untracked_stats,
diff --git a/name-hash.c b/name-hash.c
index 6d9f23e..b3966d8 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -115,6 +115,7 @@ static int cache_entry_cmp(const struct cache_entry *ce1,
 static void lazy_init_name_hash(struct index_state *istate)
 {
 	int nr;
+	uint64_t start = getnanotime();
 
 	if (istate->name_hash_initialized)
 		return;
@@ -124,6 +125,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
 	istate->name_hash_initialized = 1;
+	trace_performance_since(start, "initialize name hash");
 }
 
 void add_name_hash(struct index_state *istate, struct cache_entry *ce)
diff --git a/preload-index.c b/preload-index.c
index c1fe3a3..7bb4809 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -72,6 +72,7 @@ static void preload_index(struct index_state *index,
 {
 	int threads, i, work, offset;
 	struct thread_data data[MAX_PARALLEL];
+	uint64_t start = getnanotime();
 
 	if (!core_preload_index)
 		return;
@@ -100,6 +101,7 @@ static void preload_index(struct index_state *index,
 		if (pthread_join(p->pthread, NULL))
 			die("unable to join threaded lstat");
 	}
+	trace_performance_since(start, "preload index");
 }
 #endif
 
diff --git a/read-cache.c b/read-cache.c
index a73487e..3a79581 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1187,6 +1187,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	const char *typechange_fmt;
 	const char *added_fmt;
 	const char *unmerged_fmt;
+	uint64_t start = getnanotime();
 
 	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
 	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
@@ -1261,6 +1262,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 
 		replace_index_entry(istate, i, new);
 	}
+	trace_performance_since(start, "refresh index");
 	return has_errors;
 }
 
@@ -2062,12 +2064,15 @@ int read_index_from(struct index_state *istate, const char *path)
 {
 	struct split_index *split_index;
 	int ret;
+	uint64_t start;
 
 	/* istate->initialized covers both .git/index and .git/sharedindex.xxx */
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	start = getnanotime();
 	ret = do_read_index(istate, path, 0);
+	trace_performance_since(start, "read cache %s", path);
 
 	split_index = istate->split_index;
 	if (!split_index || is_null_sha1(split_index->base_sha1)) {
@@ -2082,6 +2087,7 @@ int read_index_from(struct index_state *istate, const char *path)
 	split_index->base->keep_mmap = istate->keep_mmap;
 	split_index->base->to_shm    = istate->to_shm;
 	split_index->base->from_shm  = istate->from_shm;
+	start = getnanotime();
 	ret = do_read_index(split_index->base,
 			    git_path("sharedindex.%s",
 				     sha1_to_hex(split_index->base_sha1)), 1);
@@ -2093,6 +2099,9 @@ int read_index_from(struct index_state *istate, const char *path)
 		    sha1_to_hex(split_index->base->sha1));
 	merge_base_index(istate);
 	post_read_index_from(istate);
+	trace_performance_since(start, "read cache %s",
+				git_path("sharedindex.%s",
+					 sha1_to_hex(split_index->base_sha1)));
 
 done:
 	if (ret > 0 && istate->from_shm && istate->last_update)
@@ -2437,6 +2446,7 @@ static int do_write_index(struct index_state *istate, int newfd,
 	int entries = istate->cache_nr;
 	struct stat st;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+	uint64_t start = getnanotime();
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
@@ -2552,6 +2562,7 @@ static int do_write_index(struct index_state *istate, int newfd,
 		return -1;
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
+	trace_performance_since(start, "write index, changed mask = %x", istate->cache_changed);
 	return 0;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ea78ce9..b88913f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1048,6 +1048,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 	struct ref_entry *last = NULL;
 	struct strbuf line = STRBUF_INIT;
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
+	uint64_t start = getnanotime();
 
 	while (strbuf_getwholeline(&line, f, '\n') != EOF) {
 		unsigned char sha1[20];
@@ -1096,6 +1097,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 	}
 
 	strbuf_release(&line);
+	trace_performance_since(start, "read packed refs");
 }
 
 /*
-- 
2.8.0.rc0.210.gd302cd2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
  2016-04-19 23:27 ` [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff David Turner
@ 2016-04-20 12:17   ` Johannes Schindelin
  2016-04-20 12:31     ` Duy Nguyen
  2016-04-20 19:38     ` David Turner
  0 siblings, 2 replies; 35+ messages in thread
From: Johannes Schindelin @ 2016-04-20 12:17 UTC (permalink / raw)
  To: David Turner; +Cc: git, pclouds, Ramsay Jones

Hi Dave,

(apologies in advance if I may bring up anything that has been discussed
in earlier iterations; I simply was too busy with the rebase--helper
project to even look.)

On Tue, 19 Apr 2016, David Turner wrote:

> Shared memory is done by storing files in a per-repository temporary
> directory.  This is more portable than shm (which requires
> posix-realtime and has various quirks on OS X).  It might even work on
> Windows, although this has not been tested. The shared memory file's
> name folows the template "shm-<object>-<SHA1>" where <SHA1> is the

s/folows/follows/

And: now that it is no longer shared memory, should we not do away with
the "shm-" prefix?

> trailing SHA-1 of the index file. <object> is "index" for cached index
> files (and might later be "name-hash" for name-hash cache). If such
> shared memory exists, it contains the same index content as on
> disk. The content is already validated by the daemon and git won't
> validate it again (except comparing the trailing SHA-1s).

Excellent. I was worrying about the penalty of validating.

> We keep this daemon's logic as thin as possible. The "brain" stays in
> git. So the daemon can read and validate stuff, but that's all it's
> allowed to do. It does not add/create new information. It doesn't even
> accept direct updates from git.

I like this design. For now. Later, I really think we should add the
ability to update the index via the index-helper. I am thinking about a
method similar to watchman where a separate process (that may use the
inotify syscall on Linux, or tap into the NTFS journal on Windows) tells
the index-helper specifically which paths to look at, so that nobody ever
has to look at any files that were not modified at all.

> +	if (*new_mmap == MAP_FAILED) {
> +		*new_mmap = NULL;
> +		goto done;

Shouldn't we provide some sort of error message here?

> +	}
> +	madvise(new_mmap, size, MADV_WILLNEED);

I guess I'll need to add support for that to compat/win32mmap.c (most
likely using PrefetchVirtualMemory() when available, i.e. Windows >= 8, see
https://msdn.microsoft.com/en-us/library/windows/desktop/hh780543.aspx :-))

Likewise, I think it will be easy to use bidirectional named pipes on
Windows to accommodate for the lack of Unix domain sockets...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 16/15] Add tracing to measure where most of the time is spent
  2016-04-20  9:59   ` [PATCH 16/15] Add tracing to measure where most of the time is spent Nguyễn Thái Ngọc Duy
@ 2016-04-20 12:28     ` Johannes Schindelin
  2016-04-20 12:36       ` Duy Nguyen
  0 siblings, 1 reply; 35+ messages in thread
From: Johannes Schindelin @ 2016-04-20 12:28 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy; +Cc: git, dturner

[-- Attachment #1: Type: text/plain, Size: 1565 bytes --]

Hi Duy,

On Wed, 20 Apr 2016, Nguyễn Thái Ngọc Duy wrote:

> All the known heavy code blocks are measured (except object database
> access). This should help identify if an optimization is effective or
> not. An unoptimized git-status would give something like below (92% of
> time is accounted). To sum up the effort of making git scale better:
> 
>  - read cache line is being addressed by index-helper
>  - preload/refresh index by watchman
>  - read packed refs by lmdb backend
>  - diff-index by rebuilding cache-tree
>  - read directory by untracked cache and watchman
>  - write index by split index
>  - name hash potientially by index-helper
> 
> read-cache.c:2075         performance: 0.004058570 s: read cache .../index
> preload-index.c:104       performance: 0.004419864 s: preload index
> read-cache.c:1265         performance: 0.000185224 s: refresh index
> refs/files-backend.c:1100 performance: 0.001935788 s: read packed refs
> diff-lib.c:240            performance: 0.000144132 s: diff-files
> diff-lib.c:506            performance: 0.013592000 s: diff-index
> name-hash.c:128           performance: 0.000614177 s: initialize name hash
> dir.c:2030                performance: 0.015814103 s: read directory
> read-cache.c:2565         performance: 0.004052343 s: write index, changed mask = 2
> trace.c:420               performance: 0.048365509 s: git command: './git' 'status'

Thank you for doing this! It will be *highly* valuable to get the
performance on Windows where I want it to be, too.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
  2016-04-20 12:17   ` Johannes Schindelin
@ 2016-04-20 12:31     ` Duy Nguyen
  2016-04-20 19:38     ` David Turner
  1 sibling, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2016-04-20 12:31 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Turner, Git Mailing List, Ramsay Jones

On Wed, Apr 20, 2016 at 7:17 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>> We keep this daemon's logic as thin as possible. The "brain" stays in
>> git. So the daemon can read and validate stuff, but that's all it's
>> allowed to do. It does not add/create new information. It doesn't even
>> accept direct updates from git.
>
> I like this design. For now. Later, I really think we should add the
> ability to update the index via the index-helper.

Later we do. For watchman, index-helper creates and shares a piece of
information retrieved from watchman, which has to be combined back to
the index by git process. But it's still not a _direct_ index update.

> I am thinking about a
> method similar to watchman where a separate process (that may use the
> inotify syscall on Linux, or tap into the NTFS journal on Windows) tells
> the index-helper specifically which paths to look at, so that nobody ever
> has to look at any files that were not modified at all.

Am I missing something here? I thought watchman could already run on
Windows. We benefit from watchman, which was originally written for
Mercurial. If there's a better way to do inotify equivalent for
Windows, is it possible to do it in watchman so Mercurial can benefit
from it too?

On Wed, Apr 20, 2016 at 7:19 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>> On Wed, Apr 20, 2016 at 6:27 AM, David Turner <dturner@twopensource.com> wrote:
>> > Shared memory is done by storing files in a per-repository temporary
>> > directory.  This is more portable than shm (which requires
>> > posix-realtime and has various quirks on OS X).  It might even work on
>> > Windows, although this has not been tested.
>>
>> There's another option, but I'm not sure if it's too clever/tricky to
>> do. Anyway, on *nix we can send file descriptors over unix socket [2],
>> then mmap them back to access content.
>
> This sounds too clever to me ;-)

Well. At least we have considered everything (that I'm aware of).
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 16/15] Add tracing to measure where most of the time is spent
  2016-04-20 12:28     ` Johannes Schindelin
@ 2016-04-20 12:36       ` Duy Nguyen
  0 siblings, 0 replies; 35+ messages in thread
From: Duy Nguyen @ 2016-04-20 12:36 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Git Mailing List, David Turner

On Wed, Apr 20, 2016 at 7:28 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi Duy,
>
> On Wed, 20 Apr 2016, Nguyễn Thái Ngọc Duy wrote:
>
>> All the known heavy code blocks are measured (except object database
>> access). This should help identify if an optimization is effective or
>> not. An unoptimized git-status would give something like below (92% of
>> time is accounted). To sum up the effort of making git scale better:
>>
>>  - read cache line is being addressed by index-helper
>>  - preload/refresh index by watchman
>>  - read packed refs by lmdb backend
>>  - diff-index by rebuilding cache-tree
>>  - read directory by untracked cache and watchman
>>  - write index by split index
>>  - name hash potientially by index-helper
>>
>> read-cache.c:2075         performance: 0.004058570 s: read cache .../index
>> preload-index.c:104       performance: 0.004419864 s: preload index
>> read-cache.c:1265         performance: 0.000185224 s: refresh index
>> refs/files-backend.c:1100 performance: 0.001935788 s: read packed refs
>> diff-lib.c:240            performance: 0.000144132 s: diff-files
>> diff-lib.c:506            performance: 0.013592000 s: diff-index
>> name-hash.c:128           performance: 0.000614177 s: initialize name hash
>> dir.c:2030                performance: 0.015814103 s: read directory
>> read-cache.c:2565         performance: 0.004052343 s: write index, changed mask = 2
>> trace.c:420               performance: 0.048365509 s: git command: './git' 'status'
>
> Thank you for doing this! It will be *highly* valuable to get the
> performance on Windows where I want it to be, too.

Just to be clear, these are unoptimized numbers, no untracked cache,
no split index, no index helper, and on a tiny repository that is
git.git. It's mostly to show the percentage of time spent in each
phase. We probably can do similar measurement and record it in the
watchman patch (can't wait for lmdb to be here :).
-- 
Duy

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff
  2016-04-20 12:17   ` Johannes Schindelin
  2016-04-20 12:31     ` Duy Nguyen
@ 2016-04-20 19:38     ` David Turner
  1 sibling, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-20 19:38 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, pclouds, Ramsay Jones

On Wed, 2016-04-20 at 14:17 +0200, Johannes Schindelin wrote:
> Hi Dave,
> 
> (apologies in advance if I may bring up anything that has been
> discussed
> in earlier iterations; I simply was too busy with the rebase--helper
> project to even look.)
> 
> On Tue, 19 Apr 2016, David Turner wrote:
> 
> > Shared memory is done by storing files in a per-repository
> > temporary
> > directory.  This is more portable than shm (which requires
> > posix-realtime and has various quirks on OS X).  It might even work
> > on
> > Windows, although this has not been tested. The shared memory
> > file's
> > name folows the template "shm-<object>-<SHA1>" where <SHA1> is the
> 
> s/folows/follows/

Will fix, thanks.

> And: now that it is no longer shared memory, should we not do away
> with
> the "shm-" prefix?

Hm.  It's intended to be shared via MAP_SHARED.  So it is just a "disk
-backed" shared memory object instead of a POSIX shm object.

> > +	if (*new_mmap == MAP_FAILED) {
> > +		*new_mmap = NULL;
> > +		goto done;
> 
> Shouldn't we provide some sort of error message here?

We generally try to fail silently on index-helper failures -- the user
isn't necessarily expecting output at that point, and we can always
fall back to directly reading the index.

On the other hand, this error should pretty much never happen, so maybe
it's worth calling out?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 02/15] read-cache: allow to keep mmap'd memory after reading
  2016-04-20  9:01   ` Johannes Schindelin
@ 2016-04-20 19:41     ` David Turner
  0 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-20 19:41 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, pclouds

On Wed, 2016-04-20 at 11:01 +0200, Johannes Schindelin wrote:
> Hi Dave,
> 
> On Tue, 19 Apr 2016, David Turner wrote:
> 
> >  unmap:
> > +	istate->mmap = NULL;
> >  	munmap(mmap, mmap_size);
> >  	die("index file corrupt");
> >  }
> > [...]
> > @@ -1698,6 +1705,10 @@ int discard_index(struct index_state
> > *istate)
> >  	free(istate->cache);
> >  	istate->cache = NULL;
> >  	istate->cache_alloc = 0;
> > +	if (istate->keep_mmap && istate->mmap) {
> > +		munmap(istate->mmap, istate->mmap_size);
> > +		istate->mmap = NULL;
> > +	}
> >  	discard_split_index(istate);
> 
> Just curious: any reason why the first hunk munmap()s after resetting
> the
> field to NULL and the second hunk does it in the opposite order?

No idea.  Will change the first hunk.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 02/15] read-cache: allow to keep mmap'd memory after reading
  2016-04-20  9:26   ` Duy Nguyen
@ 2016-04-20 19:43     ` David Turner
  0 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-20 19:43 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Wed, 2016-04-20 at 16:26 +0700, Duy Nguyen wrote:
> On Wed, Apr 20, 2016 at 6:27 AM, David Turner <
> dturner@twopensource.com> wrote:
> > From: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> > 
> > Later, we will introduce git index-helper to share this memory with
> > other git processes.
> > 
> > Since the memory will be shared, it will never be unmapped
> > (although
> > the kernel may of course choose to page it out).
> 
> This is not entirely true. We do need to keep the mmap'd memory for
> longer, even after read_index_from() finishes. But we do not share
> the
> exact same address space to other processes (memcpy is used in
> index-helper.c:share_index()). We could even discard_index() at the
> end of share_index(), but I think we keep it anyway just in case
> another process asks, or when index is not updated.

Will reword.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 10/15] update-index: enable/disable watchman support
  2016-04-19 23:45   ` Duy Nguyen
@ 2016-04-20 19:50     ` David Turner
  0 siblings, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-20 19:50 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Wed, 2016-04-20 at 06:45 +0700, Duy Nguyen wrote:
> On Wed, Apr 20, 2016 at 6:28 AM, David Turner <
> dturner@twopensource.com> wrote:
> > +       if (use_watchman > 0) {
> > +               the_index.last_update    = xstrdup("");
> > +               the_index.cache_changed |= WATCHMAN_CHANGED;
> > +       } else if (!use_watchman) {
> > +               the_index.last_update    = NULL;
> > +               the_index.cache_changed |= WATCHMAN_CHANGED;
> > +       }
> > +
> 
> We probably should warn people if index-helper is not built with
> watchman support, which makes this knob completely useless. If
> watchman fails to start, that's a separate problem..

Will warn.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 06/15] index-helper: add --detach
  2016-04-19 23:50   ` Duy Nguyen
  2016-04-20  1:04     ` David Turner
@ 2016-04-25 20:53     ` David Turner
  1 sibling, 0 replies; 35+ messages in thread
From: David Turner @ 2016-04-25 20:53 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Wed, 2016-04-20 at 06:50 +0700, Duy Nguyen wrote:
> On Wed, Apr 20, 2016 at 6:28 AM, David Turner <
> dturner@twopensource.com> wrote:
> > @@ -317,6 +320,8 @@ int main(int argc, char **argv)
> >         if (fd < 0)
> >                 die_errno(_("could not set up index-helper
> > socket"));
> > 
> > +       if (detach && daemonize(&daemonized))
> > +               die_errno(_("unable to detach"));
> 
> At the least, I think we need to redirect both stdout and stderr to a
> file, so we can catch errors. The watchman patch uses warning() to
> report errors, for example. And there is always a chance of die().
> 
> Then we need to report the errors back. I faced the same problem with
> daemonizing git-gc, but I'm not sure if we can do exactly the same
> here like in commit 329e6e8 (gc: save log from daemonized gc --auto
> and print it next time - 2015-09-19)

On reflection, I decided not to do a complicated system for replaying
warnings.  Here are my reasons why:

1. A user will not be expecting to see warnings from previous git
commands.  

2. It's not super-important that users see most of these warnings.  GC
is different because it's critical to the health of a repository so it
really matters that users learn about issues.  

3. It involves many complications to the (presently very simple)
protocol. We would need to distinguish between messages, warnings, and
previous-session warnings.

4. There are only a few cases where errors will happen, and none seem
to be exciting to clients.

Instead, I'll just log errors.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-04-25 20:53 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-19 23:27 [PATCH v5 00/15] index-helper/watchman David Turner
2016-04-19 23:27 ` [PATCH v5 01/15] read-cache.c: fix constness of verify_hdr() David Turner
2016-04-19 23:27 ` [PATCH v5 02/15] read-cache: allow to keep mmap'd memory after reading David Turner
2016-04-20  9:01   ` Johannes Schindelin
2016-04-20 19:41     ` David Turner
2016-04-20  9:26   ` Duy Nguyen
2016-04-20 19:43     ` David Turner
2016-04-19 23:27 ` [PATCH v5 03/15] index-helper: new daemon for caching index and related stuff David Turner
2016-04-20 12:17   ` Johannes Schindelin
2016-04-20 12:31     ` Duy Nguyen
2016-04-20 19:38     ` David Turner
2016-04-19 23:27 ` [PATCH v5 04/15] index-helper: add --strict David Turner
2016-04-19 23:27 ` [PATCH v5 05/15] daemonize(): set a flag before exiting the main process David Turner
2016-04-19 23:28 ` [PATCH v5 06/15] index-helper: add --detach David Turner
2016-04-19 23:50   ` Duy Nguyen
2016-04-20  1:04     ` David Turner
2016-04-20  9:33       ` Duy Nguyen
2016-04-25 20:53     ` David Turner
2016-04-19 23:28 ` [PATCH v5 07/15] read-cache: add watchman 'WAMA' extension David Turner
2016-04-19 23:28 ` [PATCH v5 08/15] Add watchman support to reduce index refresh cost David Turner
2016-04-19 23:28 ` [PATCH v5 09/15] index-helper: use watchman to avoid refreshing index with lstat() David Turner
2016-04-20  0:15   ` Duy Nguyen
2016-04-20  1:01     ` David Turner
2016-04-20  9:36       ` Duy Nguyen
2016-04-19 23:28 ` [PATCH v5 10/15] update-index: enable/disable watchman support David Turner
2016-04-19 23:45   ` Duy Nguyen
2016-04-20 19:50     ` David Turner
2016-04-19 23:28 ` [PATCH v5 11/15] unpack-trees: preserve index extensions David Turner
2016-04-19 23:28 ` [PATCH v5 12/15] index-helper: kill mode David Turner
2016-04-19 23:28 ` [PATCH v5 13/15] index-helper: don't run if already running David Turner
2016-04-19 23:28 ` [PATCH v5 14/15] index-helper: autorun mode David Turner
2016-04-19 23:28 ` [PATCH v5 15/15] index-helper: optionally automatically run David Turner
2016-04-20  9:59   ` [PATCH 16/15] Add tracing to measure where most of the time is spent Nguyễn Thái Ngọc Duy
2016-04-20 12:28     ` Johannes Schindelin
2016-04-20 12:36       ` Duy Nguyen

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).