git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2 0/6] Fast git status via a file system watcher
@ 2017-05-18 20:13 Ben Peart
  2017-05-18 20:13 ` [PATCH v2 1/6] bswap: add 64 bit endianness helper get_be64 Ben Peart
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Ben Peart @ 2017-05-18 20:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 6184 bytes --]

Changes from V1 include:

 - add 64 bit endianness helper get_be64 in compat/bswap.h
 - switch to using get_be helpers when reading index extension
 - fix leak of strbuf in refresh_by_fsmonitor
 - rename update_istate and clean up parameter list
 - make t/t7519-status-fsmonitor.sh executable
 - update test cases to ensure fsmonitor extension is being used
 - update commit messages and titles


Goal
~~~~
 
Today, git must check existing files to see if there have been changes
and scan the working directory looking for new, untracked files.  As the
number of files and folders in the working directory increases, the time
to perform these checks can become very expensive O(# files in working
directory).

Given the number of new or modified files is typically a very small
percentage of the total number of files, it would be much more
performant if git only had to check files and folders that potentially
had changes. This reduces the cost to O(# modified files).

This patch series makes it possible to optionally add a hook process
that can return the set of files that may have been changed since the
requested time.  Git can then use this to limit its scan to only those
files and folders that potentially have changes.

Design
~~~~~~

A new git hook (query-fsmonitor) must exist and be enabled 
(core.fsmonitor=true) that takes a time_t formatted as a string and
outputs to stdout all files that have been modified since the requested
time.

A new 'fsmonitor' index extension has been added to store the time the
fsmonitor hook was last queried and a ewah bitmap of the current
'fsmonitor-dirty' files. Unmarked entries are 'fsmonitor-clean', marked
entries are 'fsmonitor-dirty.'

As needed, git will call the query-fsmonitor hook proc for the set of
changes since the index was last updated. Git then uses this set of
files along with the list saved in the fsmonitor index extension to flag
the potentially dirty index and untracked cache entries.  

refresh_index() and valid_cached_dir() are updated so that any entry not
flagged as potentially dirty is not checked as it cannot have any
changes. This saves all the work of checking files and folders for
changes that are already known to be clean.

If git finds out some entries are 'fsmonitor-dirty', but are really
unchanged (e.g. the file was changed, then reverted back), then Git will
clear the marking in the extension. If git adds or updates an index
entry, it is marked 'fsmonitor-dirty' to ensure it is checked for
changes.

The code is conservative so in case of any error (missing index
extension, error from hook, etc) it falls back to normal logic of
checking everything.

A sample hook is provided in query-fsmonitor.sample to integrate with
the cross platform Watchman file watching service
https://facebook.github.io/watchman/


Performance
~~~~~~~~~~~

The performance wins of this model are pretty dramatic. Each test was
run 3 times and averaged.  "Files" is the number of files in the working
directory.  Tests were done with a cold file system cache as well as
with a warm file system cache on a HDD.  SSD speeds were typically about
10x faster than the HDD.  Typical real world results would fall
somewhere between these extremes. 

*--------------------------------------------------------*
| Repo on HDD | Cache | fsmonitor=false | fsmonitor=true |
*--------------------------------------------------------*
| 3K Files    | Cold  |           0.77s |          0.55s |
+--------------------------------------------------------+
| 100K Files  | Cold  |          38.76s |          2.17s |
+--------------------------------------------------------+
| 3M Files    | Cold  |         421.55s |         18.57s |
+--------------------------------------------------------+
| 3K Files    | Warm  |           0.05s |          0.24s |
+--------------------------------------------------------+
| 100K Files  | Warm  |           1.13s |          0.40s |
+--------------------------------------------------------+
| 3M Files    | Warm  |          59.33s |          4.19s |
+--------------------------------------------------------+

Note that with the smallest repo, warm times actually increase slightly
as the overhead of calling the hook, watchman and perl outweighs the
savings of not scanning the working directory.


Credits
~~~~~~~

Idea taken and code refactored from 
http://public-inbox.org/git/1466914464-10358-1-git-send-email-novalis@novalis.org/

Current version as a fork of GFW on GitHub here: 
https://github.com/benpeart/git-for-windows/tree/fsmonitor

Ben Peart (6):
  bswap: add 64 bit endianness helper get_be64
  dir: make lookup_untracked() available outside of dir.c
  fsmonitor: teach git to optionally utilize a file system monitor to
    speed up detecting new or changed files.
  fsmonitor: add test cases for fsmonitor extension
  fsmonitor: add documentation for the fsmonitor extension.
  fsmonitor: add a sample query-fsmonitor hook script for Watchman

 Documentation/config.txt                 |   7 +
 Documentation/githooks.txt               |  23 +++
 Documentation/technical/index-format.txt |  18 +++
 Makefile                                 |   1 +
 builtin/update-index.c                   |   1 +
 cache.h                                  |   5 +
 compat/bswap.h                           |   4 +
 config.c                                 |   5 +
 dir.c                                    |  15 +-
 dir.h                                    |   5 +
 entry.c                                  |   1 +
 environment.c                            |   1 +
 fsmonitor.c                              | 231 +++++++++++++++++++++++++++++++
 fsmonitor.h                              |   9 ++
 read-cache.c                             |  28 +++-
 t/t7519-status-fsmonitor.sh              | 153 ++++++++++++++++++++
 templates/hooks--query-fsmonitor.sample  |  27 ++++
 unpack-trees.c                           |   1 +
 18 files changed, 532 insertions(+), 3 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h
 create mode 100755 t/t7519-status-fsmonitor.sh
 create mode 100644 templates/hooks--query-fsmonitor.sample

-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/6] bswap: add 64 bit endianness helper get_be64
  2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
@ 2017-05-18 20:13 ` Ben Peart
  2017-05-18 20:13 ` [PATCH v2 2/6] dir: make lookup_untracked() available outside of dir.c Ben Peart
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Ben Peart @ 2017-05-18 20:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

Add a new get_be64 macro to enable 64 bit endian conversions on memory
that may or may not be aligned.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 compat/bswap.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/compat/bswap.h b/compat/bswap.h
index d47c003544..f89fe7f4b5 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -158,6 +158,7 @@ static inline uint64_t git_bswap64(uint64_t x)
 
 #define get_be16(p)	ntohs(*(unsigned short *)(p))
 #define get_be32(p)	ntohl(*(unsigned int *)(p))
+#define get_be64(p)	ntohll(*(uint64_t *)(p))
 #define put_be32(p, v)	do { *(unsigned int *)(p) = htonl(v); } while (0)
 
 #else
@@ -170,6 +171,9 @@ static inline uint64_t git_bswap64(uint64_t x)
 	(*((unsigned char *)(p) + 1) << 16) | \
 	(*((unsigned char *)(p) + 2) <<  8) | \
 	(*((unsigned char *)(p) + 3) <<  0) )
+#define get_be64(p)	( \
+	((uint64_t)get_be32((unsigned char *)(p) + 0) << 32) | \
+	((uint64_t)get_be32((unsigned char *)(p) + 4) <<  0)
 #define put_be32(p, v)	do { \
 	unsigned int __v = (v); \
 	*((unsigned char *)(p) + 0) = __v >> 24; \
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 2/6] dir: make lookup_untracked() available outside of dir.c
  2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
  2017-05-18 20:13 ` [PATCH v2 1/6] bswap: add 64 bit endianness helper get_be64 Ben Peart
@ 2017-05-18 20:13 ` Ben Peart
  2017-05-18 20:13 ` [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Ben Peart @ 2017-05-18 20:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

Remove the static qualifier from lookup_untracked() and make it
available to other modules by exporting it from dir.h.  This will be
used later when we need to find entries to mark 'fsmonitor dirty.'

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 dir.c | 2 +-
 dir.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index f451bfa48c..1b5558fdf9 100644
--- a/dir.c
+++ b/dir.c
@@ -660,7 +660,7 @@ static void trim_trailing_spaces(char *buf)
  *
  * If "name" has the trailing slash, it'll be excluded in the search.
  */
-static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
 						    struct untracked_cache_dir *dir,
 						    const char *name, int len)
 {
diff --git a/dir.h b/dir.h
index bf23a470af..9e387551bd 100644
--- a/dir.h
+++ b/dir.h
@@ -339,4 +339,7 @@ extern void connect_work_tree_and_git_dir(const char *work_tree, const char *git
 extern void relocate_gitdir(const char *path,
 			    const char *old_git_dir,
 			    const char *new_git_dir);
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+					     struct untracked_cache_dir *dir,
+					     const char *name, int len);
 #endif
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
  2017-05-18 20:13 ` [PATCH v2 1/6] bswap: add 64 bit endianness helper get_be64 Ben Peart
  2017-05-18 20:13 ` [PATCH v2 2/6] dir: make lookup_untracked() available outside of dir.c Ben Peart
@ 2017-05-18 20:13 ` Ben Peart
  2017-05-19 15:33   ` Ben Peart
  2017-05-24 12:30   ` Christian Couder
  2017-05-18 20:13 ` [PATCH v2 4/6] fsmonitor: add test cases for fsmonitor extension Ben Peart
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Ben Peart @ 2017-05-18 20:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

When the index is read from disk, the query-fsmonitor index extension is
used to flag the last known potentially dirty index and untracked cach
entries.

If git finds out some entries are 'fsmonitor-dirty', but are really
unchanged (e.g. the file was changed, then reverted back), then Git will
clear the marking in the extension. If git adds or updates an index
entry, it is marked 'fsmonitor-dirty' to ensure it is checked for
changes in the working directory.

Before the 'fsmonitor-dirty' flags are used to limit the scope of the
files to be checked, the query-fsmonitor hook proc is called with the
time the index was last updated.  The hook proc returns the list of
files changed since that last updated time and the list of
potentially dirty entries is updated to reflect the current state.

refresh_index() and valid_cached_dir() are updated so that any entry not
flagged as potentially dirty is not checked as it cannot have any
changes.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 Makefile               |   1 +
 builtin/update-index.c |   1 +
 cache.h                |   5 ++
 config.c               |   5 ++
 dir.c                  |  13 +++
 dir.h                  |   2 +
 entry.c                |   1 +
 environment.c          |   1 +
 fsmonitor.c            | 231 +++++++++++++++++++++++++++++++++++++++++++++++++
 fsmonitor.h            |   9 ++
 read-cache.c           |  28 +++++-
 unpack-trees.c         |   1 +
 12 files changed, 296 insertions(+), 2 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h

diff --git a/Makefile b/Makefile
index e35542e631..488a4466cc 100644
--- a/Makefile
+++ b/Makefile
@@ -760,6 +760,7 @@ LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
diff --git a/builtin/update-index.c b/builtin/update-index.c
index ebfc09faa0..32fd977b43 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -232,6 +232,7 @@ static int mark_ce_flags(const char *path, int flag, int mark)
 		else
 			active_cache[pos]->ce_flags &= ~flag;
 		active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
+		active_cache[pos]->ce_flags |= CE_FSMONITOR_DIRTY;
 		cache_tree_invalidate_path(&the_index, path);
 		active_cache_changed |= CE_ENTRY_CHANGED;
 		return 0;
diff --git a/cache.h b/cache.h
index 188811920c..36423c77cc 100644
--- a/cache.h
+++ b/cache.h
@@ -201,6 +201,7 @@ struct cache_entry {
 #define CE_ADDED             (1 << 19)
 
 #define CE_HASHED            (1 << 20)
+#define CE_FSMONITOR_DIRTY   (1 << 21)
 #define CE_WT_REMOVE         (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED        (1 << 23)
 
@@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED	(1 << 5)
 #define SPLIT_INDEX_ORDERED	(1 << 6)
 #define UNTRACKED_CHANGED	(1 << 7)
+#define FSMONITOR_CHANGED	(1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -342,6 +344,8 @@ struct index_state {
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
 	struct untracked_cache *untracked;
+	time_t fsmonitor_last_update;
+	struct ewah_bitmap *fsmonitor_dirty_bitmap;
 };
 
 extern struct index_state the_index;
@@ -766,6 +770,7 @@ extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
 extern int git_db_env, git_index_env, git_graft_env, git_common_dir_env;
+extern int core_fsmonitor;
 
 /*
  * Include broken refs in all ref iterations, which will
diff --git a/config.c b/config.c
index bb4d735701..1a8108636d 100644
--- a/config.c
+++ b/config.c
@@ -1232,6 +1232,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.fsmonitor")) {
+		core_fsmonitor = git_config_bool(var, value);
+		return 0;
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return 0;
 }
diff --git a/dir.c b/dir.c
index 1b5558fdf9..da428489e2 100644
--- a/dir.c
+++ b/dir.c
@@ -1652,6 +1652,18 @@ static int valid_cached_dir(struct dir_struct *dir,
 	if (!untracked)
 		return 0;
 
+	refresh_by_fsmonitor(&the_index);
+	if (dir->untracked->use_fsmonitor) {
+		/*
+		 * With fsmonitor, we can trust the untracked cache's
+		 * valid field.
+		 */
+		if (untracked->valid)
+			goto skip_stat;
+		else
+			invalidate_directory(dir->untracked, untracked);
+	}
+
 	if (stat(path->len ? path->buf : ".", &st)) {
 		invalidate_directory(dir->untracked, untracked);
 		memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
@@ -1665,6 +1677,7 @@ static int valid_cached_dir(struct dir_struct *dir,
 		return 0;
 	}
 
+skip_stat:
 	if (untracked->check_only != !!check_only) {
 		invalidate_directory(dir->untracked, untracked);
 		return 0;
diff --git a/dir.h b/dir.h
index 9e387551bd..ff6a00abcc 100644
--- a/dir.h
+++ b/dir.h
@@ -139,6 +139,8 @@ struct untracked_cache {
 	int gitignore_invalidated;
 	int dir_invalidated;
 	int dir_opened;
+	/* fsmonitor invalidation data */
+	unsigned int use_fsmonitor : 1;
 };
 
 struct dir_struct {
diff --git a/entry.c b/entry.c
index d2b512da90..c2d3c1079c 100644
--- a/entry.c
+++ b/entry.c
@@ -221,6 +221,7 @@ static int write_entry(struct cache_entry *ce,
 			lstat(ce->name, &st);
 		fill_stat_cache_info(ce, &st);
 		ce->ce_flags |= CE_UPDATE_IN_BASE;
+		ce->ce_flags |= CE_FSMONITOR_DIRTY;
 		state->istate->cache_changed |= CE_ENTRY_CHANGED;
 	}
 	return 0;
diff --git a/environment.c b/environment.c
index 560408953c..1afabbae8c 100644
--- a/environment.c
+++ b/environment.c
@@ -64,6 +64,7 @@ int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
 enum hide_dotfiles_type hide_dotfiles = HIDE_DOTFILES_DOTGITONLY;
 enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
+int core_fsmonitor;
 
 #ifndef PROTECT_HFS_DEFAULT
 #define PROTECT_HFS_DEFAULT 0
diff --git a/fsmonitor.c b/fsmonitor.c
new file mode 100644
index 0000000000..6356dc795e
--- /dev/null
+++ b/fsmonitor.c
@@ -0,0 +1,231 @@
+#include "cache.h"
+#include "dir.h"
+#include "ewah/ewok.h"
+#include "run-command.h"
+#include "strbuf.h"
+#include "fsmonitor.h"
+
+static struct untracked_cache_dir *find_untracked_cache_dir(
+	struct untracked_cache *uc, struct untracked_cache_dir *ucd,
+	const char *name)
+{
+	const char *end;
+	struct untracked_cache_dir *dir = ucd;
+
+	if (!*name)
+		return dir;
+
+	end = strchr(name, '/');
+	if (end) {
+		dir = lookup_untracked(uc, ucd, name, end - name);
+		if (dir)
+			return find_untracked_cache_dir(uc, dir, end + 1);
+	}
+
+	return dir;
+}
+
+/* This function will be passed to ewah_each_bit() */
+static void mark_no_fsmonitor(size_t pos, void *is)
+{
+	struct index_state *istate = is;
+	struct untracked_cache_dir *dir;
+	struct cache_entry *ce = istate->cache[pos];
+
+	assert(pos < istate->cache_nr);
+	ce->ce_flags |= CE_FSMONITOR_DIRTY;
+
+	if (!istate->untracked || !istate->untracked->root)
+		return;
+
+	dir = find_untracked_cache_dir(istate->untracked, istate->untracked->root, ce->name);
+	if (dir)
+		dir->valid = 0;
+}
+
+int read_fsmonitor_extension(struct index_state *istate, const void *data,
+	unsigned long sz)
+{
+	const char *index = data;
+	uint32_t hdr_version;
+	uint32_t ewah_size;
+	int ret;
+
+	if (sz < sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t))
+		return error("corrupt fsmonitor extension (too short)");
+
+	hdr_version = get_be32(index);
+	index += sizeof(uint32_t);
+	if (hdr_version != 1)
+		return error("bad fsmonitor version %d", hdr_version);
+
+	istate->fsmonitor_last_update = (time_t)get_be64(index);
+	index += sizeof(uint64_t);
+
+	ewah_size = get_be32(index);
+	index += sizeof(uint32_t);
+
+	istate->fsmonitor_dirty_bitmap = ewah_new();
+	ret = ewah_read_mmap(istate->fsmonitor_dirty_bitmap, index, ewah_size);
+	if (ret != ewah_size) {
+		ewah_free(istate->fsmonitor_dirty_bitmap);
+		istate->fsmonitor_dirty_bitmap = NULL;
+		return error("failed to parse ewah bitmap reading fsmonitor index extension");
+	}
+
+	return 0;
+}
+
+void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
+{
+	uint32_t hdr_version;
+	uint64_t tm;
+	struct ewah_bitmap *bitmap;
+	int i;
+	uint32_t ewah_start;
+	uint32_t ewah_size = 0;
+	int fixup = 0;
+
+	hdr_version = htonl(1);
+	strbuf_add(sb, &hdr_version, sizeof(uint32_t));
+
+	tm = htonll((uint64_t)istate->fsmonitor_last_update);
+	strbuf_add(sb, &tm, sizeof(uint64_t));
+	fixup = sb->len;
+	strbuf_add(sb, &ewah_size, sizeof(uint32_t)); /* we'll fix this up later */
+
+	ewah_start = sb->len;
+	bitmap = ewah_new();
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_FSMONITOR_DIRTY)
+			ewah_set(bitmap, i);
+	ewah_serialize_strbuf(bitmap, sb);
+	ewah_free(bitmap);
+
+	/* fix up size field */
+	ewah_size = htonl(sb->len - ewah_start);
+	memcpy(sb->buf + fixup, &ewah_size, sizeof(uint32_t));
+}
+
+static void mark_file_dirty(struct index_state *istate, const char *name)
+{
+	struct untracked_cache_dir *dir;
+	int pos;
+
+	/* find it in the index and mark that entry as dirty */
+	pos = index_name_pos(istate, name, strlen(name));
+	if (pos >= 0)
+		istate->cache[pos]->ce_flags |= CE_FSMONITOR_DIRTY;
+
+	/*
+	 * Find the corresponding directory in the untracked cache
+	 * and mark it as invalid
+	 */
+	if (!istate->untracked || !istate->untracked->root)
+		return;
+
+	dir = find_untracked_cache_dir(istate->untracked, istate->untracked->root, name);
+	if (dir)
+		dir->valid = 0;
+}
+
+/*
+ * Call the query-fsmonitor hook passing the time of the last saved results.
+ */
+static int query_fsmonitor(time_t last_update, struct strbuf *query_result)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	char date[64];
+	const char *argv[3];
+
+	if (!(argv[0] = find_hook("query-fsmonitor")))
+		return -1;
+
+	snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update);
+	argv[1] = date;
+	argv[2] = NULL;
+	cp.argv = argv;
+	cp.out = -1;
+
+	return capture_command(&cp, query_result, 1024);
+}
+
+void process_fsmonitor_extension(struct index_state *istate)
+{
+	if (!istate->fsmonitor_dirty_bitmap)
+		return;
+
+	ewah_each_bit(istate->fsmonitor_dirty_bitmap, mark_no_fsmonitor, istate);
+	ewah_free(istate->fsmonitor_dirty_bitmap);
+	istate->fsmonitor_dirty_bitmap = NULL;
+}
+
+void refresh_by_fsmonitor(struct index_state *istate)
+{
+	static has_run_once = FALSE;
+	struct strbuf query_result = STRBUF_INIT;
+	int query_success = 0;
+	size_t bol = 0; /* beginning of line */
+	time_t last_update;
+	char *buf, *entry;
+	int i;
+
+	if (!core_fsmonitor || has_run_once)
+		return;
+	has_run_once = TRUE;
+
+	/*
+	 * This could be racy so save the date/time now and the hook
+	 * should be inclusive to ensure we don't miss potential changes.
+	 */
+	last_update = time(NULL);
+
+	/* If we have a last update time, call query-monitor for the set of changes since that time */
+	if (istate->fsmonitor_last_update) {
+		query_success = !query_fsmonitor(istate->fsmonitor_last_update, &query_result);
+	}
+
+	if (query_success) {
+		/* Mark all entries returned by the monitor as dirty */
+		buf = entry = query_result.buf;
+		for (i = 0; i < query_result.len; i++) {
+			if (buf[i] != '\0')
+				continue;
+			mark_file_dirty(istate, buf + bol);
+			bol = i + 1;
+		}
+		if (bol < query_result.len)
+			mark_file_dirty(istate, buf + bol);
+
+		/* Mark all clean entries up-to-date */
+		for (i = 0; i < istate->cache_nr; i++) {
+			struct cache_entry *ce = istate->cache[i];
+			if (ce_stage(ce) || (ce->ce_flags & CE_FSMONITOR_DIRTY))
+				continue;
+			ce_mark_uptodate(ce);
+		}
+
+		/*
+		 * Now that we've marked the invalid entries in the
+		 * untracked-cache itself, we can mark the untracked cache for
+		 * fsmonitor usage.
+		 */
+		if (istate->untracked) {
+			istate->untracked->use_fsmonitor = 1;
+		}
+	}
+	else {
+		/* if we can't update the cache, fall back to checking them all */
+		for (i = 0; i < istate->cache_nr; i++)
+			istate->cache[i]->ce_flags |= CE_FSMONITOR_DIRTY;
+
+		/* mark the untracked cache as unusable for fsmonitor */
+		if (istate->untracked)
+			istate->untracked->use_fsmonitor = 0;
+	}
+	strbuf_release(&query_result);
+
+	/* Now that we've updated istate, save the last_update time */
+	istate->fsmonitor_last_update = last_update;
+	istate->cache_changed |= FSMONITOR_CHANGED;
+}
diff --git a/fsmonitor.h b/fsmonitor.h
new file mode 100644
index 0000000000..57b061688f
--- /dev/null
+++ b/fsmonitor.h
@@ -0,0 +1,9 @@
+#ifndef FSMONITOR_H
+#define FSMONITOR_H
+
+int read_fsmonitor_extension(struct index_state *istate, const void *data, unsigned long sz);
+void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate);
+void process_fsmonitor_extension(struct index_state *istate);
+void refresh_by_fsmonitor(struct index_state *istate);
+
+#endif
diff --git a/read-cache.c b/read-cache.c
index 3339de8124..5b52f08db2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -18,6 +18,7 @@
 #include "varint.h"
 #include "split-index.h"
 #include "utf8.h"
+#include "fsmonitor.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -37,11 +38,12 @@
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
+#define CACHE_EXT_FSMONITOR 0x46534D4E	  /* "FSMN" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
 		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
-		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED)
+		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | FSMONITOR_CHANGED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -61,6 +63,7 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
 	free(old);
 	set_index_entry(istate, nr, ce);
 	ce->ce_flags |= CE_UPDATE_IN_BASE;
+	ce->ce_flags |= CE_FSMONITOR_DIRTY;
 	istate->cache_changed |= CE_ENTRY_CHANGED;
 }
 
@@ -777,6 +780,7 @@ int chmod_index_entry(struct index_state *istate, struct cache_entry *ce,
 	}
 	cache_tree_invalidate_path(istate, ce->name);
 	ce->ce_flags |= CE_UPDATE_IN_BASE;
+	ce->ce_flags |= CE_FSMONITOR_DIRTY;
 	istate->cache_changed |= CE_ENTRY_CHANGED;
 
 	return 0;
@@ -1344,6 +1348,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	const char *added_fmt;
 	const char *unmerged_fmt;
 
+	refresh_by_fsmonitor(istate);
+
 	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
 	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
 	typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
@@ -1380,8 +1386,11 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 			continue;
 
 		new = refresh_cache_ent(istate, ce, options, &cache_errno, &changed);
-		if (new == ce)
+		if (new == ce) {
+			ce->ce_flags &= ~CE_FSMONITOR_DIRTY;
 			continue;
+		}
+
 		if (!new) {
 			const char *fmt;
 
@@ -1391,6 +1400,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 				 */
 				ce->ce_flags &= ~CE_VALID;
 				ce->ce_flags |= CE_UPDATE_IN_BASE;
+				ce->ce_flags |= CE_FSMONITOR_DIRTY;
 				istate->cache_changed |= CE_ENTRY_CHANGED;
 			}
 			if (quiet)
@@ -1549,6 +1559,9 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_UNTRACKED:
 		istate->untracked = read_untracked_extension(data, sz);
 		break;
+	case CACHE_EXT_FSMONITOR:
+		read_fsmonitor_extension(istate, data, sz);
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1721,6 +1734,7 @@ static void post_read_index_from(struct index_state *istate)
 	check_ce_order(istate);
 	tweak_untracked_cache(istate);
 	tweak_split_index(istate);
+	process_fsmonitor_extension(istate);
 }
 
 /* remember to discard_cache() before reading a different cache! */
@@ -2300,6 +2314,16 @@ static int do_write_index(struct index_state *istate, int newfd,
 		if (err)
 			return -1;
 	}
+	if (!strip_extensions && istate->fsmonitor_last_update) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_fsmonitor_extension(&sb, istate);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_FSMONITOR, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
 		return -1;
diff --git a/unpack-trees.c b/unpack-trees.c
index aa15111fef..259e6960b9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -412,6 +412,7 @@ static int apply_sparse_checkout(struct index_state *istate,
 		ce->ce_flags &= ~CE_SKIP_WORKTREE;
 	if (was_skip_worktree != ce_skip_worktree(ce)) {
 		ce->ce_flags |= CE_UPDATE_IN_BASE;
+		ce->ce_flags |= CE_FSMONITOR_DIRTY;
 		istate->cache_changed |= CE_ENTRY_CHANGED;
 	}
 
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 4/6] fsmonitor: add test cases for fsmonitor extension
  2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
                   ` (2 preceding siblings ...)
  2017-05-18 20:13 ` [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
@ 2017-05-18 20:13 ` Ben Peart
  2017-05-20 16:55   ` Torsten Bögershausen
  2017-05-18 20:13 ` [PATCH v2 5/6] fsmonitor: add documentation for the " Ben Peart
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Ben Peart @ 2017-05-18 20:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

Add test cases that ensure status results are correct when using the new
fsmonitor extension.  Test untracked, modified, and new files by
ensuring the results are identical to when not using the extension.

Add a test to ensure updates to the index properly mark corresponding
entries in the index extension as dirty so that the status is correct
after commands that modify the index but don't trigger changes in the
working directory.

Add a test that verifies that if the fsmonitor extension doesn't tell
git about a change, it doesn't discover it on its own.  This ensures
git is honoring the extension and that we get the performance benefits
desired.

All test hooks output a marker file that is used to ensure the hook
was actually used to generate the test results.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 153 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 153 insertions(+)
 create mode 100755 t/t7519-status-fsmonitor.sh

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
new file mode 100755
index 0000000000..d3cffc758f
--- /dev/null
+++ b/t/t7519-status-fsmonitor.sh
@@ -0,0 +1,153 @@
+#!/bin/sh
+
+test_description='git status with file system watcher'
+
+. ./test-lib.sh
+
+clean_repo () {
+	git reset --hard HEAD
+	git clean -fd
+	rm marker -f
+}
+
+dirty_repo () {
+	: >untracked
+	: >dir1/untracked
+	: >dir2/untracked
+	echo 1 >modified
+	echo 2 >dir1/modified
+	echo 3 >dir2/modified
+	echo 4 >new
+	echo 5 >dir1/new
+	echo 6 >dir2/new
+	git add new
+	git add dir1/new
+	git add dir2/new
+}
+
+# The test query-fsmonitor hook proc will output a marker file we can use to
+# ensure the hook was actually used to generate the correct results.
+
+test_expect_success 'setup' '
+	mkdir -p .git/hooks &&
+	write_script .git/hooks/query-fsmonitor<<-\EOF &&
+	: >marker
+	printf "untracked\0"
+	printf "dir1/untracked\0"
+	printf "dir2/untracked\0"
+	printf "modified\0"
+	printf "dir1/modified\0"
+	printf "dir2/modified\0"
+	printf "new\0""
+	printf "dir1/new\0"
+	printf "dir2/new\0"
+	EOF
+	: >tracked &&
+	: >modified &&
+	mkdir dir1 &&
+	: >dir1/tracked &&
+	: >dir1/modified &&
+	mkdir dir2 &&
+	: >dir2/tracked &&
+	: >dir2/modified &&
+	git add . &&
+	test_tick &&
+	git commit -m initial &&
+	dirty_repo
+'
+
+cat >.gitignore <<\EOF
+.gitignore
+expect*
+output*
+marker*
+EOF
+
+# Status is well tested elsewhere so we'll just ensure that the results are
+# the same when using core.fsmonitor. First call after turning on the option
+# does a complete scan so need to do two calls to ensure we test the new
+# codepath.
+
+test_expect_success 'status with core.untrackedcache true' '
+	git config core.fsmonitor true  &&
+	git config core.untrackedcache true &&
+	git -c core.fsmonitor=false -c core.untrackedcache=true status >expect &&
+	clean_repo &&
+	git status &&
+	test_path_is_missing marker &&
+	dirty_repo &&
+	git status >output &&
+	test_path_is_file marker &&
+	test_i18ncmp expect output
+'
+
+
+test_expect_success 'status with core.untrackedcache false' '
+	git config core.fsmonitor true &&
+	git config core.untrackedcache false &&
+	git -c core.fsmonitor=false -c core.untrackedcache=false status >expect &&
+	clean_repo &&
+	git status &&
+	test_path_is_missing marker &&
+	dirty_repo &&
+	git status >output &&
+	test_path_is_file marker &&
+	test_i18ncmp expect output
+'
+
+# Ensure commands that call refresh_index() to move the index back in time
+# properly invalidate the fsmonitor cache
+
+test_expect_success 'refresh_index() invalidates fsmonitor cache' '
+	git config core.fsmonitor true &&
+	git config core.untrackedcache true &&
+	clean_repo &&
+	git status &&
+	test_path_is_missing marker &&
+	dirty_repo &&
+	write_script .git/hooks/query-fsmonitor<<-\EOF &&
+	:>marker
+	EOF
+	git add . &&
+	git commit -m "to reset" &&
+	git status &&
+	test_path_is_file marker &&
+	git reset HEAD~1 &&
+	git status >output &&
+	test_path_is_file marker &&
+	git -c core.fsmonitor=false status >expect &&
+	test_i18ncmp expect output
+'
+
+# Now make sure it's actually skipping the check for modified and untracked
+# files unless it is told about them.  Note, after "git reset --hard HEAD" no
+# extensions exist other than 'TREE' so do a "git status" to get the extension
+# written before testing the results.
+
+test_expect_success 'status doesnt detect unreported modifications' '
+	git config core.fsmonitor true &&
+	git config core.untrackedcache true &&
+	write_script .git/hooks/query-fsmonitor<<-\EOF &&
+	:>marker
+	EOF
+	clean_repo &&
+	git status &&
+	test_path_is_missing marker &&
+	: >untracked &&
+	echo 2 >dir1/modified &&
+	git status >output &&
+	test_path_is_file marker &&
+	test_i18ngrep ! "Changes not staged for commit:" output &&
+	test_i18ngrep ! "Untracked files:" output &&
+	write_script .git/hooks/query-fsmonitor<<-\EOF &&
+	:>marker
+	printf "untracked%s\0"
+	printf "dir1/modified\0"
+	EOF
+	git status >output &&
+	test_path_is_file marker &&
+	test_i18ngrep "Changes not staged for commit:" output &&
+	test_i18ngrep "Untracked files:" output
+'
+
+test_done
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.
  2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
                   ` (3 preceding siblings ...)
  2017-05-18 20:13 ` [PATCH v2 4/6] fsmonitor: add test cases for fsmonitor extension Ben Peart
@ 2017-05-18 20:13 ` Ben Peart
  2017-05-20 11:28   ` Junio C Hamano
  2017-05-20 12:10   ` Ævar Arnfjörð Bjarmason
  2017-05-18 20:13 ` [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman Ben Peart
  2017-05-24 10:54 ` [PATCH v2 0/6] Fast git status via a file system watcher Christian Couder
  6 siblings, 2 replies; 30+ messages in thread
From: Ben Peart @ 2017-05-18 20:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

This includes the core.fsmonitor setting, the query-fsmonitor hook,
and the fsmonitor index extension.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 Documentation/config.txt                 |  7 +++++++
 Documentation/githooks.txt               | 23 +++++++++++++++++++++++
 Documentation/technical/index-format.txt | 18 ++++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 96e9cf8b73..4ffbf0d4c2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -389,6 +389,13 @@ core.protectNTFS::
 	8.3 "short" names.
 	Defaults to `true` on Windows, and `false` elsewhere.
 
+core.fsmonitor::
+	If set to true, call the query-fsmonitor hook proc which will
+	identify all files that may have had changes since the last
+	request. This information is used to speed up operations like
+	'git commit' and 'git status' by limiting what git must scan to
+	detect changes.
+
 core.trustctime::
 	If false, the ctime differences between the index and the
 	working tree are ignored; useful when the inode change time
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 706091a569..f7b4b4a844 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -448,6 +448,29 @@ The commits are guaranteed to be listed in the order that they were
 processed by rebase.
 
 
+[[query-fsmonitor]]
+query-fsmonitor
+~~~~~~~~~~~~
+
+This hook is invoked when the configuration option core.fsmonitor is
+set and git needs to identify changed or untracked files.  It takes
+a single argument which is the time in elapsed seconds since midnight,
+January 1, 1970.
+
+The hook should output to stdout the list of all files in the working
+directory that may have changed since the requested time.  The logic
+should be inclusive so that it does not miss any potential changes.
+The paths should be relative to the root of the working directory
+and be separated by a single NUL.
+
+Git will limit what files it checks for changes as well as which
+directories are checked for untracked files based on the path names
+given.
+
+The exit status determines whether git will use the data from the
+hook to limit its search.  On error, it will fall back to verifying
+all files and folders.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index ade0b0c445..b002d23c05 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -295,3 +295,21 @@ The remaining data of each directory block is grouped by type:
     in the previous ewah bitmap.
 
   - One NUL.
+
+== File System Monitor cache
+
+  The file system monitor cache tracks files for which the query-fsmonitor
+  hook has told us about changes.  The signature for this extension is
+  { 'F', 'S', 'M', 'N' }.
+
+  The extension starts with
+
+  - 32-bit version number: the current supported version is 1.
+
+  - 64-bit time: the extension data reflects all changes through the given
+	time which is stored as the seconds elapsed since midnight, January 1, 1970.
+
+  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+    is CE_FSMONITOR_DIRTY.
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman
  2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
                   ` (4 preceding siblings ...)
  2017-05-18 20:13 ` [PATCH v2 5/6] fsmonitor: add documentation for the " Ben Peart
@ 2017-05-18 20:13 ` Ben Peart
  2017-05-24 13:12   ` Christian Couder
  2017-05-25 21:05   ` Ævar Arnfjörð Bjarmason
  2017-05-24 10:54 ` [PATCH v2 0/6] Fast git status via a file system watcher Christian Couder
  6 siblings, 2 replies; 30+ messages in thread
From: Ben Peart @ 2017-05-18 20:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

This hook script integrates the new fsmonitor capabilities of git with
the cross platform Watchman file watching service. To use the script:

Download and install Watchman from https://facebook.github.io/watchman/
and instruct Watchman to watch your working directory for changes
('watchman watch-project /usr/src/git').

Rename the sample integration hook from query-fsmonitor.sample to
query-fsmonitor.

Configure git to use the extension ('git config core.fsmonitor true')
and optionally turn on the untracked cache for optimal performance
('git config core.untrackedcache true').

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 templates/hooks--query-fsmonitor.sample | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 templates/hooks--query-fsmonitor.sample

diff --git a/templates/hooks--query-fsmonitor.sample b/templates/hooks--query-fsmonitor.sample
new file mode 100644
index 0000000000..4bd22f21d8
--- /dev/null
+++ b/templates/hooks--query-fsmonitor.sample
@@ -0,0 +1,27 @@
+#!/bin/sh
+#
+# An example hook script to integrate Watchman
+# (https://facebook.github.io/watchman/) with git to provide fast
+# git status.
+#
+# The hook is passed a time_t formatted as a string and outputs to
+# stdout all files that have been modified since the given time.
+# Paths must be relative to the root of the working tree and
+# separated by a single NUL.
+#
+# To enable this hook, rename this file to "query-fsmonitor"
+
+# Convert unix style paths to escaped Windows style paths
+case "$(uname -s)" in
+MINGW*|MSYS_NT*)
+  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,\\\\,g')"
+  ;;
+*)
+  GIT_WORK_TREE="$PWD"
+  ;;
+esac
+
+# Query Watchman for all the changes since the requested time
+echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1, \"fields\":[\"name\"]}]" | \
+watchman -j | \
+perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));'
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-18 20:13 ` [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
@ 2017-05-19 15:33   ` Ben Peart
  2017-05-20 10:41     ` Junio C Hamano
  2017-05-24 12:30   ` Christian Couder
  1 sibling, 1 reply; 30+ messages in thread
From: Ben Peart @ 2017-05-19 15:33 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

After sending this, I noticed that using a different compiler generated 
a couple of warnings I should fix.  I'm assuming I'll need another 
re-roll but if not, here are the changes that will be squashed in.

Ben



diff --git a/dir.c b/dir.c
index da428489e2..37f1c580a5 100644
--- a/dir.c
+++ b/dir.c
@@ -16,6 +16,7 @@
  #include "utf8.h"
  #include "varint.h"
  #include "ewah/ewok.h"
+#include "fsmonitor.h"


diff --git a/fsmonitor.c b/fsmonitor.c
index 6356dc795e..9f08e66db9 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -162,7 +162,7 @@ void process_fsmonitor_extension(struct index_state 
*istate)

  void refresh_by_fsmonitor(struct index_state *istate)
  {
-       static has_run_once = FALSE;
+       static int has_run_once = FALSE;
         struct strbuf query_result = STRBUF_INIT;
         int query_success = 0;
         size_t bol = 0; /* beginning of line */


On 5/18/2017 4:13 PM, Ben Peart wrote:
> When the index is read from disk, the query-fsmonitor index extension is
> used to flag the last known potentially dirty index and untracked cach
> entries.
>
> If git finds out some entries are 'fsmonitor-dirty', but are really
> unchanged (e.g. the file was changed, then reverted back), then Git will
> clear the marking in the extension. If git adds or updates an index
> entry, it is marked 'fsmonitor-dirty' to ensure it is checked for
> changes in the working directory.
>
> Before the 'fsmonitor-dirty' flags are used to limit the scope of the
> files to be checked, the query-fsmonitor hook proc is called with the
> time the index was last updated.  The hook proc returns the list of
> files changed since that last updated time and the list of
> potentially dirty entries is updated to reflect the current state.
>
> refresh_index() and valid_cached_dir() are updated so that any entry not
> flagged as potentially dirty is not checked as it cannot have any
> changes.
>
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
>  Makefile               |   1 +
>  builtin/update-index.c |   1 +
>  cache.h                |   5 ++
>  config.c               |   5 ++
>  dir.c                  |  13 +++
>  dir.h                  |   2 +
>  entry.c                |   1 +
>  environment.c          |   1 +
>  fsmonitor.c            | 231 +++++++++++++++++++++++++++++++++++++++++++++++++
>  fsmonitor.h            |   9 ++
>  read-cache.c           |  28 +++++-
>  unpack-trees.c         |   1 +
>  12 files changed, 296 insertions(+), 2 deletions(-)
>  create mode 100644 fsmonitor.c
>  create mode 100644 fsmonitor.h
>
> diff --git a/Makefile b/Makefile
> index e35542e631..488a4466cc 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -760,6 +760,7 @@ LIB_OBJS += ewah/ewah_rlw.o
>  LIB_OBJS += exec_cmd.o
>  LIB_OBJS += fetch-pack.o
>  LIB_OBJS += fsck.o
> +LIB_OBJS += fsmonitor.o
>  LIB_OBJS += gettext.o
>  LIB_OBJS += gpg-interface.o
>  LIB_OBJS += graph.o
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index ebfc09faa0..32fd977b43 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -232,6 +232,7 @@ static int mark_ce_flags(const char *path, int flag, int mark)
>  		else
>  			active_cache[pos]->ce_flags &= ~flag;
>  		active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
> +		active_cache[pos]->ce_flags |= CE_FSMONITOR_DIRTY;
>  		cache_tree_invalidate_path(&the_index, path);
>  		active_cache_changed |= CE_ENTRY_CHANGED;
>  		return 0;
> diff --git a/cache.h b/cache.h
> index 188811920c..36423c77cc 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -201,6 +201,7 @@ struct cache_entry {
>  #define CE_ADDED             (1 << 19)
>
>  #define CE_HASHED            (1 << 20)
> +#define CE_FSMONITOR_DIRTY   (1 << 21)
>  #define CE_WT_REMOVE         (1 << 22) /* remove in work directory */
>  #define CE_CONFLICTED        (1 << 23)
>
> @@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int mode)
>  #define CACHE_TREE_CHANGED	(1 << 5)
>  #define SPLIT_INDEX_ORDERED	(1 << 6)
>  #define UNTRACKED_CHANGED	(1 << 7)
> +#define FSMONITOR_CHANGED	(1 << 8)
>
>  struct split_index;
>  struct untracked_cache;
> @@ -342,6 +344,8 @@ struct index_state {
>  	struct hashmap dir_hash;
>  	unsigned char sha1[20];
>  	struct untracked_cache *untracked;
> +	time_t fsmonitor_last_update;
> +	struct ewah_bitmap *fsmonitor_dirty_bitmap;
>  };
>
>  extern struct index_state the_index;
> @@ -766,6 +770,7 @@ extern int precomposed_unicode;
>  extern int protect_hfs;
>  extern int protect_ntfs;
>  extern int git_db_env, git_index_env, git_graft_env, git_common_dir_env;
> +extern int core_fsmonitor;
>
>  /*
>   * Include broken refs in all ref iterations, which will
> diff --git a/config.c b/config.c
> index bb4d735701..1a8108636d 100644
> --- a/config.c
> +++ b/config.c
> @@ -1232,6 +1232,11 @@ static int git_default_core_config(const char *var, const char *value)
>  		return 0;
>  	}
>
> +	if (!strcmp(var, "core.fsmonitor")) {
> +		core_fsmonitor = git_config_bool(var, value);
> +		return 0;
> +	}
> +
>  	/* Add other config variables here and to Documentation/config.txt. */
>  	return 0;
>  }
> diff --git a/dir.c b/dir.c
> index 1b5558fdf9..da428489e2 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1652,6 +1652,18 @@ static int valid_cached_dir(struct dir_struct *dir,
>  	if (!untracked)
>  		return 0;
>
> +	refresh_by_fsmonitor(&the_index);
> +	if (dir->untracked->use_fsmonitor) {
> +		/*
> +		 * With fsmonitor, we can trust the untracked cache's
> +		 * valid field.
> +		 */
> +		if (untracked->valid)
> +			goto skip_stat;
> +		else
> +			invalidate_directory(dir->untracked, untracked);
> +	}
> +
>  	if (stat(path->len ? path->buf : ".", &st)) {
>  		invalidate_directory(dir->untracked, untracked);
>  		memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
> @@ -1665,6 +1677,7 @@ static int valid_cached_dir(struct dir_struct *dir,
>  		return 0;
>  	}
>
> +skip_stat:
>  	if (untracked->check_only != !!check_only) {
>  		invalidate_directory(dir->untracked, untracked);
>  		return 0;
> diff --git a/dir.h b/dir.h
> index 9e387551bd..ff6a00abcc 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -139,6 +139,8 @@ struct untracked_cache {
>  	int gitignore_invalidated;
>  	int dir_invalidated;
>  	int dir_opened;
> +	/* fsmonitor invalidation data */
> +	unsigned int use_fsmonitor : 1;
>  };
>
>  struct dir_struct {
> diff --git a/entry.c b/entry.c
> index d2b512da90..c2d3c1079c 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -221,6 +221,7 @@ static int write_entry(struct cache_entry *ce,
>  			lstat(ce->name, &st);
>  		fill_stat_cache_info(ce, &st);
>  		ce->ce_flags |= CE_UPDATE_IN_BASE;
> +		ce->ce_flags |= CE_FSMONITOR_DIRTY;
>  		state->istate->cache_changed |= CE_ENTRY_CHANGED;
>  	}
>  	return 0;
> diff --git a/environment.c b/environment.c
> index 560408953c..1afabbae8c 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -64,6 +64,7 @@ int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
>  unsigned long pack_size_limit_cfg;
>  enum hide_dotfiles_type hide_dotfiles = HIDE_DOTFILES_DOTGITONLY;
>  enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
> +int core_fsmonitor;
>
>  #ifndef PROTECT_HFS_DEFAULT
>  #define PROTECT_HFS_DEFAULT 0
> diff --git a/fsmonitor.c b/fsmonitor.c
> new file mode 100644
> index 0000000000..6356dc795e
> --- /dev/null
> +++ b/fsmonitor.c
> @@ -0,0 +1,231 @@
> +#include "cache.h"
> +#include "dir.h"
> +#include "ewah/ewok.h"
> +#include "run-command.h"
> +#include "strbuf.h"
> +#include "fsmonitor.h"
> +
> +static struct untracked_cache_dir *find_untracked_cache_dir(
> +	struct untracked_cache *uc, struct untracked_cache_dir *ucd,
> +	const char *name)
> +{
> +	const char *end;
> +	struct untracked_cache_dir *dir = ucd;
> +
> +	if (!*name)
> +		return dir;
> +
> +	end = strchr(name, '/');
> +	if (end) {
> +		dir = lookup_untracked(uc, ucd, name, end - name);
> +		if (dir)
> +			return find_untracked_cache_dir(uc, dir, end + 1);
> +	}
> +
> +	return dir;
> +}
> +
> +/* This function will be passed to ewah_each_bit() */
> +static void mark_no_fsmonitor(size_t pos, void *is)
> +{
> +	struct index_state *istate = is;
> +	struct untracked_cache_dir *dir;
> +	struct cache_entry *ce = istate->cache[pos];
> +
> +	assert(pos < istate->cache_nr);
> +	ce->ce_flags |= CE_FSMONITOR_DIRTY;
> +
> +	if (!istate->untracked || !istate->untracked->root)
> +		return;
> +
> +	dir = find_untracked_cache_dir(istate->untracked, istate->untracked->root, ce->name);
> +	if (dir)
> +		dir->valid = 0;
> +}
> +
> +int read_fsmonitor_extension(struct index_state *istate, const void *data,
> +	unsigned long sz)
> +{
> +	const char *index = data;
> +	uint32_t hdr_version;
> +	uint32_t ewah_size;
> +	int ret;
> +
> +	if (sz < sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t))
> +		return error("corrupt fsmonitor extension (too short)");
> +
> +	hdr_version = get_be32(index);
> +	index += sizeof(uint32_t);
> +	if (hdr_version != 1)
> +		return error("bad fsmonitor version %d", hdr_version);
> +
> +	istate->fsmonitor_last_update = (time_t)get_be64(index);
> +	index += sizeof(uint64_t);
> +
> +	ewah_size = get_be32(index);
> +	index += sizeof(uint32_t);
> +
> +	istate->fsmonitor_dirty_bitmap = ewah_new();
> +	ret = ewah_read_mmap(istate->fsmonitor_dirty_bitmap, index, ewah_size);
> +	if (ret != ewah_size) {
> +		ewah_free(istate->fsmonitor_dirty_bitmap);
> +		istate->fsmonitor_dirty_bitmap = NULL;
> +		return error("failed to parse ewah bitmap reading fsmonitor index extension");
> +	}
> +
> +	return 0;
> +}
> +
> +void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate)
> +{
> +	uint32_t hdr_version;
> +	uint64_t tm;
> +	struct ewah_bitmap *bitmap;
> +	int i;
> +	uint32_t ewah_start;
> +	uint32_t ewah_size = 0;
> +	int fixup = 0;
> +
> +	hdr_version = htonl(1);
> +	strbuf_add(sb, &hdr_version, sizeof(uint32_t));
> +
> +	tm = htonll((uint64_t)istate->fsmonitor_last_update);
> +	strbuf_add(sb, &tm, sizeof(uint64_t));
> +	fixup = sb->len;
> +	strbuf_add(sb, &ewah_size, sizeof(uint32_t)); /* we'll fix this up later */
> +
> +	ewah_start = sb->len;
> +	bitmap = ewah_new();
> +	for (i = 0; i < istate->cache_nr; i++)
> +		if (istate->cache[i]->ce_flags & CE_FSMONITOR_DIRTY)
> +			ewah_set(bitmap, i);
> +	ewah_serialize_strbuf(bitmap, sb);
> +	ewah_free(bitmap);
> +
> +	/* fix up size field */
> +	ewah_size = htonl(sb->len - ewah_start);
> +	memcpy(sb->buf + fixup, &ewah_size, sizeof(uint32_t));
> +}
> +
> +static void mark_file_dirty(struct index_state *istate, const char *name)
> +{
> +	struct untracked_cache_dir *dir;
> +	int pos;
> +
> +	/* find it in the index and mark that entry as dirty */
> +	pos = index_name_pos(istate, name, strlen(name));
> +	if (pos >= 0)
> +		istate->cache[pos]->ce_flags |= CE_FSMONITOR_DIRTY;
> +
> +	/*
> +	 * Find the corresponding directory in the untracked cache
> +	 * and mark it as invalid
> +	 */
> +	if (!istate->untracked || !istate->untracked->root)
> +		return;
> +
> +	dir = find_untracked_cache_dir(istate->untracked, istate->untracked->root, name);
> +	if (dir)
> +		dir->valid = 0;
> +}
> +
> +/*
> + * Call the query-fsmonitor hook passing the time of the last saved results.
> + */
> +static int query_fsmonitor(time_t last_update, struct strbuf *query_result)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	char date[64];
> +	const char *argv[3];
> +
> +	if (!(argv[0] = find_hook("query-fsmonitor")))
> +		return -1;
> +
> +	snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update);
> +	argv[1] = date;
> +	argv[2] = NULL;
> +	cp.argv = argv;
> +	cp.out = -1;
> +
> +	return capture_command(&cp, query_result, 1024);
> +}
> +
> +void process_fsmonitor_extension(struct index_state *istate)
> +{
> +	if (!istate->fsmonitor_dirty_bitmap)
> +		return;
> +
> +	ewah_each_bit(istate->fsmonitor_dirty_bitmap, mark_no_fsmonitor, istate);
> +	ewah_free(istate->fsmonitor_dirty_bitmap);
> +	istate->fsmonitor_dirty_bitmap = NULL;
> +}
> +
> +void refresh_by_fsmonitor(struct index_state *istate)
> +{
> +	static has_run_once = FALSE;
> +	struct strbuf query_result = STRBUF_INIT;
> +	int query_success = 0;
> +	size_t bol = 0; /* beginning of line */
> +	time_t last_update;
> +	char *buf, *entry;
> +	int i;
> +
> +	if (!core_fsmonitor || has_run_once)
> +		return;
> +	has_run_once = TRUE;
> +
> +	/*
> +	 * This could be racy so save the date/time now and the hook
> +	 * should be inclusive to ensure we don't miss potential changes.
> +	 */
> +	last_update = time(NULL);
> +
> +	/* If we have a last update time, call query-monitor for the set of changes since that time */
> +	if (istate->fsmonitor_last_update) {
> +		query_success = !query_fsmonitor(istate->fsmonitor_last_update, &query_result);
> +	}
> +
> +	if (query_success) {
> +		/* Mark all entries returned by the monitor as dirty */
> +		buf = entry = query_result.buf;
> +		for (i = 0; i < query_result.len; i++) {
> +			if (buf[i] != '\0')
> +				continue;
> +			mark_file_dirty(istate, buf + bol);
> +			bol = i + 1;
> +		}
> +		if (bol < query_result.len)
> +			mark_file_dirty(istate, buf + bol);
> +
> +		/* Mark all clean entries up-to-date */
> +		for (i = 0; i < istate->cache_nr; i++) {
> +			struct cache_entry *ce = istate->cache[i];
> +			if (ce_stage(ce) || (ce->ce_flags & CE_FSMONITOR_DIRTY))
> +				continue;
> +			ce_mark_uptodate(ce);
> +		}
> +
> +		/*
> +		 * Now that we've marked the invalid entries in the
> +		 * untracked-cache itself, we can mark the untracked cache for
> +		 * fsmonitor usage.
> +		 */
> +		if (istate->untracked) {
> +			istate->untracked->use_fsmonitor = 1;
> +		}
> +	}
> +	else {
> +		/* if we can't update the cache, fall back to checking them all */
> +		for (i = 0; i < istate->cache_nr; i++)
> +			istate->cache[i]->ce_flags |= CE_FSMONITOR_DIRTY;
> +
> +		/* mark the untracked cache as unusable for fsmonitor */
> +		if (istate->untracked)
> +			istate->untracked->use_fsmonitor = 0;
> +	}
> +	strbuf_release(&query_result);
> +
> +	/* Now that we've updated istate, save the last_update time */
> +	istate->fsmonitor_last_update = last_update;
> +	istate->cache_changed |= FSMONITOR_CHANGED;
> +}
> diff --git a/fsmonitor.h b/fsmonitor.h
> new file mode 100644
> index 0000000000..57b061688f
> --- /dev/null
> +++ b/fsmonitor.h
> @@ -0,0 +1,9 @@
> +#ifndef FSMONITOR_H
> +#define FSMONITOR_H
> +
> +int read_fsmonitor_extension(struct index_state *istate, const void *data, unsigned long sz);
> +void write_fsmonitor_extension(struct strbuf *sb, struct index_state *istate);
> +void process_fsmonitor_extension(struct index_state *istate);
> +void refresh_by_fsmonitor(struct index_state *istate);
> +
> +#endif
> diff --git a/read-cache.c b/read-cache.c
> index 3339de8124..5b52f08db2 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -18,6 +18,7 @@
>  #include "varint.h"
>  #include "split-index.h"
>  #include "utf8.h"
> +#include "fsmonitor.h"
>
>  /* Mask for the name length in ce_flags in the on-disk index */
>
> @@ -37,11 +38,12 @@
>  #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
>  #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
>  #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
> +#define CACHE_EXT_FSMONITOR 0x46534D4E	  /* "FSMN" */
>
>  /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
>  #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
>  		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
> -		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED)
> +		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | FSMONITOR_CHANGED)
>
>  struct index_state the_index;
>  static const char *alternate_index_output;
> @@ -61,6 +63,7 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
>  	free(old);
>  	set_index_entry(istate, nr, ce);
>  	ce->ce_flags |= CE_UPDATE_IN_BASE;
> +	ce->ce_flags |= CE_FSMONITOR_DIRTY;
>  	istate->cache_changed |= CE_ENTRY_CHANGED;
>  }
>
> @@ -777,6 +780,7 @@ int chmod_index_entry(struct index_state *istate, struct cache_entry *ce,
>  	}
>  	cache_tree_invalidate_path(istate, ce->name);
>  	ce->ce_flags |= CE_UPDATE_IN_BASE;
> +	ce->ce_flags |= CE_FSMONITOR_DIRTY;
>  	istate->cache_changed |= CE_ENTRY_CHANGED;
>
>  	return 0;
> @@ -1344,6 +1348,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>  	const char *added_fmt;
>  	const char *unmerged_fmt;
>
> +	refresh_by_fsmonitor(istate);
> +
>  	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
>  	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
>  	typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
> @@ -1380,8 +1386,11 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>  			continue;
>
>  		new = refresh_cache_ent(istate, ce, options, &cache_errno, &changed);
> -		if (new == ce)
> +		if (new == ce) {
> +			ce->ce_flags &= ~CE_FSMONITOR_DIRTY;
>  			continue;
> +		}
> +
>  		if (!new) {
>  			const char *fmt;
>
> @@ -1391,6 +1400,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>  				 */
>  				ce->ce_flags &= ~CE_VALID;
>  				ce->ce_flags |= CE_UPDATE_IN_BASE;
> +				ce->ce_flags |= CE_FSMONITOR_DIRTY;
>  				istate->cache_changed |= CE_ENTRY_CHANGED;
>  			}
>  			if (quiet)
> @@ -1549,6 +1559,9 @@ static int read_index_extension(struct index_state *istate,
>  	case CACHE_EXT_UNTRACKED:
>  		istate->untracked = read_untracked_extension(data, sz);
>  		break;
> +	case CACHE_EXT_FSMONITOR:
> +		read_fsmonitor_extension(istate, data, sz);
> +		break;
>  	default:
>  		if (*ext < 'A' || 'Z' < *ext)
>  			return error("index uses %.4s extension, which we do not understand",
> @@ -1721,6 +1734,7 @@ static void post_read_index_from(struct index_state *istate)
>  	check_ce_order(istate);
>  	tweak_untracked_cache(istate);
>  	tweak_split_index(istate);
> +	process_fsmonitor_extension(istate);
>  }
>
>  /* remember to discard_cache() before reading a different cache! */
> @@ -2300,6 +2314,16 @@ static int do_write_index(struct index_state *istate, int newfd,
>  		if (err)
>  			return -1;
>  	}
> +	if (!strip_extensions && istate->fsmonitor_last_update) {
> +		struct strbuf sb = STRBUF_INIT;
> +
> +		write_fsmonitor_extension(&sb, istate);
> +		err = write_index_ext_header(&c, newfd, CACHE_EXT_FSMONITOR, sb.len) < 0
> +			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
> +		strbuf_release(&sb);
> +		if (err)
> +			return -1;
> +	}
>
>  	if (ce_flush(&c, newfd, istate->sha1) || fstat(newfd, &st))
>  		return -1;
> diff --git a/unpack-trees.c b/unpack-trees.c
> index aa15111fef..259e6960b9 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -412,6 +412,7 @@ static int apply_sparse_checkout(struct index_state *istate,
>  		ce->ce_flags &= ~CE_SKIP_WORKTREE;
>  	if (was_skip_worktree != ce_skip_worktree(ce)) {
>  		ce->ce_flags |= CE_UPDATE_IN_BASE;
> +		ce->ce_flags |= CE_FSMONITOR_DIRTY;
>  		istate->cache_changed |= CE_ENTRY_CHANGED;
>  	}
>
>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-19 15:33   ` Ben Peart
@ 2017-05-20 10:41     ` Junio C Hamano
  0 siblings, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2017-05-20 10:41 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, benpeart, pclouds, johannes.schindelin, David.Turner, peff

Ben Peart <peartben@gmail.com> writes:

> After sending this, I noticed that using a different compiler
> generated a couple of warnings I should fix.  I'm assuming I'll need
> another re-roll but if not, here are the changes that will be squashed
> in.
>
> Ben

Thanks, in addition, I am missing the definition of TRUE and FALSE.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.
  2017-05-18 20:13 ` [PATCH v2 5/6] fsmonitor: add documentation for the " Ben Peart
@ 2017-05-20 11:28   ` Junio C Hamano
  2017-05-20 12:10   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 30+ messages in thread
From: Junio C Hamano @ 2017-05-20 11:28 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, benpeart, pclouds, johannes.schindelin, David.Turner, peff

Ben Peart <peartben@gmail.com> writes:

> diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
> index 706091a569..f7b4b4a844 100644
> --- a/Documentation/githooks.txt
> +++ b/Documentation/githooks.txt
> @@ -448,6 +448,29 @@ The commits are guaranteed to be listed in the order that they were
>  processed by rebase.
>  
>  
> +[[query-fsmonitor]]
> +query-fsmonitor
> +~~~~~~~~~~~~

This underline is short by 3 '~' for the string it underlines.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.
  2017-05-18 20:13 ` [PATCH v2 5/6] fsmonitor: add documentation for the " Ben Peart
  2017-05-20 11:28   ` Junio C Hamano
@ 2017-05-20 12:10   ` Ævar Arnfjörð Bjarmason
  2017-05-22 16:18     ` Ben Peart
  1 sibling, 1 reply; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-20 12:10 UTC (permalink / raw)
  To: Ben Peart
  Cc: Git Mailing List, Junio C Hamano, benpeart,
	Nguyễn Thái Ngọc Duy, Johannes Schindelin,
	David.Turner, Jeff King

On Thu, May 18, 2017 at 10:13 PM, Ben Peart <peartben@gmail.com> wrote:
> This includes the core.fsmonitor setting, the query-fsmonitor hook,
> and the fsmonitor index extension.
>
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
>  Documentation/config.txt                 |  7 +++++++
>  Documentation/githooks.txt               | 23 +++++++++++++++++++++++
>  Documentation/technical/index-format.txt | 18 ++++++++++++++++++
>  3 files changed, 48 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 96e9cf8b73..4ffbf0d4c2 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -389,6 +389,13 @@ core.protectNTFS::
>         8.3 "short" names.
>         Defaults to `true` on Windows, and `false` elsewhere.
>
> +core.fsmonitor::
> +       If set to true, call the query-fsmonitor hook proc which will
> +       identify all files that may have had changes since the last
> +       request. This information is used to speed up operations like
> +       'git commit' and 'git status' by limiting what git must scan to
> +       detect changes.
> +
>  core.trustctime::
>         If false, the ctime differences between the index and the
>         working tree are ignored; useful when the inode change time
> diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
> index 706091a569..f7b4b4a844 100644
> --- a/Documentation/githooks.txt
> +++ b/Documentation/githooks.txt
> @@ -448,6 +448,29 @@ The commits are guaranteed to be listed in the order that they were
>  processed by rebase.
>
>
> +[[query-fsmonitor]]
> +query-fsmonitor
> +~~~~~~~~~~~~
> +
> +This hook is invoked when the configuration option core.fsmonitor is
> +set and git needs to identify changed or untracked files.  It takes
> +a single argument which is the time in elapsed seconds since midnight,
> +January 1, 1970.
> +
> +The hook should output to stdout the list of all files in the working
> +directory that may have changed since the requested time.  The logic
> +should be inclusive so that it does not miss any potential changes.
> +The paths should be relative to the root of the working directory
> +and be separated by a single NUL.
> +
> +Git will limit what files it checks for changes as well as which
> +directories are checked for untracked files based on the path names
> +given.
> +
> +The exit status determines whether git will use the data from the
> +hook to limit its search.  On error, it will fall back to verifying
> +all files and folders.
> +
>  GIT
>  ---
>  Part of the linkgit:git[1] suite
> diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
> index ade0b0c445..b002d23c05 100644
> --- a/Documentation/technical/index-format.txt
> +++ b/Documentation/technical/index-format.txt
> @@ -295,3 +295,21 @@ The remaining data of each directory block is grouped by type:
>      in the previous ewah bitmap.
>
>    - One NUL.
> +
> +== File System Monitor cache
> +
> +  The file system monitor cache tracks files for which the query-fsmonitor
> +  hook has told us about changes.  The signature for this extension is
> +  { 'F', 'S', 'M', 'N' }.
> +
> +  The extension starts with
> +
> +  - 32-bit version number: the current supported version is 1.
> +
> +  - 64-bit time: the extension data reflects all changes through the given
> +       time which is stored as the seconds elapsed since midnight, January 1, 1970.
> +
> +  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
> +
> +  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
> +    is CE_FSMONITOR_DIRTY.

We already have a uint64_t in one place in the codebase (getnanotime)
which uses a 64 bit time for nanosecond accuracy, and numerous
filesystems already support nanosecond timestamps (ext4, that new
Apple thingy...).

I don't know if any of the inotify/fsmonitor APIs support that yet,
but it seems inevitable that that'll be added if not, in some
pathological cases we can have a lot of files modified in 1 second, so
using nanosecond accuracy means there'll be a lot less data to
consider in some cases.

It does mean this'll only work until the year ~2500, but that seems
like an acceptable trade-off.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/6] fsmonitor: add test cases for fsmonitor extension
  2017-05-18 20:13 ` [PATCH v2 4/6] fsmonitor: add test cases for fsmonitor extension Ben Peart
@ 2017-05-20 16:55   ` Torsten Bögershausen
  0 siblings, 0 replies; 30+ messages in thread
From: Torsten Bögershausen @ 2017-05-20 16:55 UTC (permalink / raw)
  To: Ben Peart, git
  Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> new file mode 100755
> index 0000000000..d3cffc758f
> --- /dev/null
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -0,0 +1,153 @@
> +#!/bin/sh
> +
> +test_description='git status with file system watcher'
> +
> +. ./test-lib.sh
> +
> +clean_repo () {
> +	git reset --hard HEAD
> +	git clean -fd
> +	rm marker -f

This Works under Linux, but not e.g. Mac OS X. Should be

rm -f marker

> +}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.
  2017-05-20 12:10   ` Ævar Arnfjörð Bjarmason
@ 2017-05-22 16:18     ` Ben Peart
  2017-05-22 17:28       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Peart @ 2017-05-22 16:18 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, benpeart,
	Nguyễn Thái Ngọc Duy, Johannes Schindelin,
	David.Turner, Jeff King



On 5/20/2017 8:10 AM, Ævar Arnfjörð Bjarmason wrote:
>> +== File System Monitor cache
>> +
>> +  The file system monitor cache tracks files for which the query-fsmonitor
>> +  hook has told us about changes.  The signature for this extension is
>> +  { 'F', 'S', 'M', 'N' }.
>> +
>> +  The extension starts with
>> +
>> +  - 32-bit version number: the current supported version is 1.
>> +
>> +  - 64-bit time: the extension data reflects all changes through the given
>> +       time which is stored as the seconds elapsed since midnight, January 1, 1970.
>> +
>> +  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
>> +
>> +  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
>> +    is CE_FSMONITOR_DIRTY.
>
> We already have a uint64_t in one place in the codebase (getnanotime)
> which uses a 64 bit time for nanosecond accuracy, and numerous
> filesystems already support nanosecond timestamps (ext4, that new
> Apple thingy...).
>
> I don't know if any of the inotify/fsmonitor APIs support that yet,
> but it seems inevitable that that'll be added if not, in some
> pathological cases we can have a lot of files modified in 1 second, so
> using nanosecond accuracy means there'll be a lot less data to
> consider in some cases.
>
> It does mean this'll only work until the year ~2500, but that seems
> like an acceptable trade-off.
>

I really don't think nano-second resolution is needed in this case for a 
few reasons.

The number of files that can change within a given second is limited by 
the IO throughput of the underlying device. Even assuming a very fast 
device and very small files and changes, this won't be that many files.

Without this patch, git would have scanned all those files every time. 
With this patch, git will only scan those files a 2nd time that are 
modified in the same second that it did the first scan *that came before 
the first scan started* (the "lots of files modified" section in the 1 
second timeline below).

|------------------------- one second ---------------------|
|-lots of files modified - git status - more file modified-|

Yes, some duplicate status checks can be made but its still a 
significant win in any reasonable scenario. Especially when you consider 
that it is pretty unusual to do git status/add/commit calls in the 
middle of making lots of changes to files.

In addition, the backing file system monitor (Watchman) supports number 
of seconds since the unix epoch (unix time_t style).  This means any 
support of nano seconds by git is academic until someone provides a file 
system watcher that does support nano second granularity.

Finally, the fsmonitor index extension is versioned so that we can 
seamlessly upgrade to nano second resolution later if we desire.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.
  2017-05-22 16:18     ` Ben Peart
@ 2017-05-22 17:28       ` Ævar Arnfjörð Bjarmason
  2017-05-25 13:49         ` Ben Peart
  0 siblings, 1 reply; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-22 17:28 UTC (permalink / raw)
  To: Ben Peart
  Cc: Git Mailing List, Junio C Hamano, benpeart,
	Nguyễn Thái Ngọc Duy, Johannes Schindelin,
	David.Turner, Jeff King

On Mon, May 22, 2017 at 6:18 PM, Ben Peart <peartben@gmail.com> wrote:
> On 5/20/2017 8:10 AM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> +== File System Monitor cache
>>> +
>>> +  The file system monitor cache tracks files for which the
>>> query-fsmonitor
>>> +  hook has told us about changes.  The signature for this extension is
>>> +  { 'F', 'S', 'M', 'N' }.
>>> +
>>> +  The extension starts with
>>> +
>>> +  - 32-bit version number: the current supported version is 1.
>>> +
>>> +  - 64-bit time: the extension data reflects all changes through the
>>> given
>>> +       time which is stored as the seconds elapsed since midnight,
>>> January 1, 1970.
>>> +
>>> +  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
>>> +
>>> +  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
>>> +    is CE_FSMONITOR_DIRTY.
>>
>>
>> We already have a uint64_t in one place in the codebase (getnanotime)
>> which uses a 64 bit time for nanosecond accuracy, and numerous
>> filesystems already support nanosecond timestamps (ext4, that new
>> Apple thingy...).
>>
>> I don't know if any of the inotify/fsmonitor APIs support that yet,
>> but it seems inevitable that that'll be added if not, in some
>> pathological cases we can have a lot of files modified in 1 second, so
>> using nanosecond accuracy means there'll be a lot less data to
>> consider in some cases.
>>
>> It does mean this'll only work until the year ~2500, but that seems
>> like an acceptable trade-off.
>>
>
> I really don't think nano-second resolution is needed in this case for a few
> reasons.
>
> The number of files that can change within a given second is limited by the
> IO throughput of the underlying device. Even assuming a very fast device and
> very small files and changes, this won't be that many files.
>
> Without this patch, git would have scanned all those files every time. With
> this patch, git will only scan those files a 2nd time that are modified in
> the same second that it did the first scan *that came before the first scan
> started* (the "lots of files modified" section in the 1 second timeline
> below).
>
> |------------------------- one second ---------------------|
> |-lots of files modified - git status - more file modified-|
>
> Yes, some duplicate status checks can be made but its still a significant
> win in any reasonable scenario. Especially when you consider that it is
> pretty unusual to do git status/add/commit calls in the middle of making
> lots of changes to files.

I understand that we get most of the wins either way.

I'm just wondering why not make it nanosecond-resolution now from the
beginning since that's where major filesystems are heading already,
and changing stuff like this can be a hassle once it's initially out
there, whereas just dividing by 10^9 for APIs that need seconds from
the beginning is easy & future-proof.

There are some uses of git where this would probably matter in practice.

E.g. consider a git-annex backed fileserver using ext4, currently
git-annex comes with its own FS watcher as a daemon invoked via `git
annex watch` to constantly add new files without killing performance
via a status/add in a loop, with this a daemon like that could just
run status/add in a loop, but on a busy repo the 1s interval size
might start to matter as you're constantly inspecting larger
intervals.

More importantly though, I can't think of any case where having it in
nanoseconds to begin with would do any harm.

> In addition, the backing file system monitor (Watchman) supports number of
> seconds since the unix epoch (unix time_t style).  This means any support of
> nano seconds by git is academic until someone provides a file system watcher
> that does support nano second granularity.

I haven't used watchman for anything non-trivial, but the
documentation for the query API you seem to be using says you can
request a {ctime,mtime}_ns field:

https://github.com/facebook/watchman/blob/master/website/_docs/cmd.query.markdown#user-content-available-fields

And isn't this the documentation for the "since" query you're using,
saying you can specify any arbitrary fs timing field, such as a _ns
time field:

https://github.com/facebook/watchman/blob/master/website/_docs/expr.since.md

?

> Finally, the fsmonitor index extension is versioned so that we can
> seamlessly upgrade to nano second resolution later if we desire.

Aside from my nanosecond bikeshedding, let's say we change the
semantics of any of this in the future: The index has the version, but
there's one argument passed to the hook without a version. Is the hook
expected to potentially be reading the version from the index to make
sense of whether (in this case) the argument is a mtime or mtime_ns?

Wouldn't this make more sense than that on top, i.e. pass the version
to the hook, untested (and probably whines about the sprintf
format...):

$ git diff -U1
diff --git a/cache.h b/cache.h
index 6eafd34fff..3c63f179f8 100644
--- a/cache.h
+++ b/cache.h
@@ -346,2 +346,3 @@ struct index_state {
        struct untracked_cache *untracked;
+       uint32_t fsmonitor_version;
        time_t fsmonitor_last_update;
diff --git a/fsmonitor.c b/fsmonitor.c
index f5a9f818e8..7236b538ac 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -60,2 +60,4 @@ int read_fsmonitor_extension(struct index_state
*istate, const void *data,
                return error("bad fsmonitor version %d", hdr_version);
+       else
+               istate->fsmonitor_version = hdr_version;

@@ -137,2 +139,3 @@ static int query_fsmonitor(time_t last_update,
struct strbuf *query_result)
        struct child_process cp = CHILD_PROCESS_INIT;
+       char version[1];
        char date[64];
@@ -143,2 +146,3 @@ static int query_fsmonitor(time_t last_update,
struct strbuf *query_result)

+       snprintf(version, sizeof(version), "%d", istate->fsmonitor_version);
        snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update)

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] Fast git status via a file system watcher
  2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
                   ` (5 preceding siblings ...)
  2017-05-18 20:13 ` [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman Ben Peart
@ 2017-05-24 10:54 ` Christian Couder
  2017-05-25 13:55   ` Ben Peart
  6 siblings, 1 reply; 30+ messages in thread
From: Christian Couder @ 2017-05-24 10:54 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King

> Design
> ~~~~~~
>
> A new git hook (query-fsmonitor) must exist and be enabled
> (core.fsmonitor=true) that takes a time_t formatted as a string and
> outputs to stdout all files that have been modified since the requested
> time.

Is there a reason why there is a new hook, instead of a
"core.fsmonitorquery" config option to which you could pass whatever
command line with options?

> A new 'fsmonitor' index extension has been added to store the time the
> fsmonitor hook was last queried and a ewah bitmap of the current
> 'fsmonitor-dirty' files. Unmarked entries are 'fsmonitor-clean', marked
> entries are 'fsmonitor-dirty.'
>
> As needed, git will call the query-fsmonitor hook proc for the set of
> changes since the index was last updated. Git then uses this set of
> files along with the list saved in the fsmonitor index extension to flag
> the potentially dirty index and untracked cache entries.

So this can work only if "core.untrackedCache" is set to true?

Thanks for working on this,
Christian.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-18 20:13 ` [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
  2017-05-19 15:33   ` Ben Peart
@ 2017-05-24 12:30   ` Christian Couder
  1 sibling, 0 replies; 30+ messages in thread
From: Christian Couder @ 2017-05-24 12:30 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King

On Thu, May 18, 2017 at 10:13 PM, Ben Peart <peartben@gmail.com> wrote:
> When the index is read from disk, the query-fsmonitor index extension is
> used to flag the last known potentially dirty index and untracked cach

s/cach/cache/

> entries.

[...]

> diff --git a/cache.h b/cache.h
> index 188811920c..36423c77cc 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -201,6 +201,7 @@ struct cache_entry {
>  #define CE_ADDED             (1 << 19)
>
>  #define CE_HASHED            (1 << 20)
> +#define CE_FSMONITOR_DIRTY   (1 << 21)

It looks like the (1 << 21) value was used before (as CE_UNHASHED) and
was removed in:

419a597f64 (name-hash.c: remove cache entries instead of marking them
CE_UNHASHED, 2013-11-14)

I wondered if using this value again could make old and new versions
of git interact badly, but it looks like these are in memory only
flags, so it should be ok.

>  #define CE_WT_REMOVE         (1 << 22) /* remove in work directory */
>  #define CE_CONFLICTED        (1 << 23)
>
>  struct split_index;
>  struct untracked_cache;
> @@ -342,6 +344,8 @@ struct index_state {
>         struct hashmap dir_hash;
>         unsigned char sha1[20];
>         struct untracked_cache *untracked;
> +       time_t fsmonitor_last_update;
> +       struct ewah_bitmap *fsmonitor_dirty_bitmap;

Maybe you could remove "_bitmap" at the end of the name.

>  };


> diff --git a/dir.c b/dir.c
> index 1b5558fdf9..da428489e2 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1652,6 +1652,18 @@ static int valid_cached_dir(struct dir_struct *dir,
>         if (!untracked)
>                 return 0;
>
> +       refresh_by_fsmonitor(&the_index);
> +       if (dir->untracked->use_fsmonitor) {
> +               /*
> +                * With fsmonitor, we can trust the untracked cache's
> +                * valid field.
> +                */
> +               if (untracked->valid)
> +                       goto skip_stat;

Maybe you could avoid this goto using a valid_cached_dir_after_stat()
function that would do what is after the "skip_stat:" label below?

> +               else
> +                       invalidate_directory(dir->untracked, untracked);
> +       }
> +
>         if (stat(path->len ? path->buf : ".", &st)) {
>                 invalidate_directory(dir->untracked, untracked);
>                 memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
> @@ -1665,6 +1677,7 @@ static int valid_cached_dir(struct dir_struct *dir,
>                 return 0;
>         }
>
> +skip_stat:
>         if (untracked->check_only != !!check_only) {
>                 invalidate_directory(dir->untracked, untracked);
>                 return 0;

[...]

> +void refresh_by_fsmonitor(struct index_state *istate)
> +{
> +       static has_run_once = FALSE;
> +       struct strbuf query_result = STRBUF_INIT;
> +       int query_success = 0;
> +       size_t bol = 0; /* beginning of line */
> +       time_t last_update;
> +       char *buf, *entry;
> +       int i;
> +
> +       if (!core_fsmonitor || has_run_once)
> +               return;
> +       has_run_once = TRUE;
> +
> +       /*
> +        * This could be racy so save the date/time now and the hook
> +        * should be inclusive to ensure we don't miss potential changes.
> +        */
> +       last_update = time(NULL);
> +
> +       /* If we have a last update time, call query-monitor for the set of changes since that time */
> +       if (istate->fsmonitor_last_update) {
> +               query_success = !query_fsmonitor(istate->fsmonitor_last_update, &query_result);
> +       }

Braces can be removed.

> +       if (query_success) {
> +               /* Mark all entries returned by the monitor as dirty */
> +               buf = entry = query_result.buf;
> +               for (i = 0; i < query_result.len; i++) {
> +                       if (buf[i] != '\0')
> +                               continue;
> +                       mark_file_dirty(istate, buf + bol);
> +                       bol = i + 1;
> +               }
> +               if (bol < query_result.len)
> +                       mark_file_dirty(istate, buf + bol);
> +
> +               /* Mark all clean entries up-to-date */
> +               for (i = 0; i < istate->cache_nr; i++) {
> +                       struct cache_entry *ce = istate->cache[i];
> +                       if (ce_stage(ce) || (ce->ce_flags & CE_FSMONITOR_DIRTY))
> +                               continue;
> +                       ce_mark_uptodate(ce);
> +               }
> +
> +               /*
> +                * Now that we've marked the invalid entries in the
> +                * untracked-cache itself, we can mark the untracked cache for
> +                * fsmonitor usage.
> +                */
> +               if (istate->untracked) {
> +                       istate->untracked->use_fsmonitor = 1;
> +               }

Braces can be removed.

> +       }
> +       else {
> +               /* if we can't update the cache, fall back to checking them all */
> +               for (i = 0; i < istate->cache_nr; i++)
> +                       istate->cache[i]->ce_flags |= CE_FSMONITOR_DIRTY;
> +
> +               /* mark the untracked cache as unusable for fsmonitor */
> +               if (istate->untracked)
> +                       istate->untracked->use_fsmonitor = 0;
> +       }
> +       strbuf_release(&query_result);
> +
> +       /* Now that we've updated istate, save the last_update time */
> +       istate->fsmonitor_last_update = last_update;
> +       istate->cache_changed |= FSMONITOR_CHANGED;
> +}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman
  2017-05-18 20:13 ` [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman Ben Peart
@ 2017-05-24 13:12   ` Christian Couder
  2017-05-26  9:47     ` Ævar Arnfjörð Bjarmason
  2017-05-25 21:05   ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 30+ messages in thread
From: Christian Couder @ 2017-05-24 13:12 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King

On Thu, May 18, 2017 at 10:13 PM, Ben Peart <peartben@gmail.com> wrote:
> This hook script integrates the new fsmonitor capabilities of git with
> the cross platform Watchman file watching service. To use the script:
>
> Download and install Watchman from https://facebook.github.io/watchman/
> and instruct Watchman to watch your working directory for changes
> ('watchman watch-project /usr/src/git').
>
> Rename the sample integration hook from query-fsmonitor.sample to
> query-fsmonitor.
>
> Configure git to use the extension ('git config core.fsmonitor true')
> and optionally turn on the untracked cache for optimal performance
> ('git config core.untrackedcache true').

Ok, it looks like the untracked cache can be used, but it could work without it.

> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  templates/hooks--query-fsmonitor.sample | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>  create mode 100644 templates/hooks--query-fsmonitor.sample
>
> diff --git a/templates/hooks--query-fsmonitor.sample b/templates/hooks--query-fsmonitor.sample
> new file mode 100644
> index 0000000000..4bd22f21d8
> --- /dev/null
> +++ b/templates/hooks--query-fsmonitor.sample
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +#
> +# An example hook script to integrate Watchman
> +# (https://facebook.github.io/watchman/) with git to provide fast
> +# git status.
> +#
> +# The hook is passed a time_t formatted as a string and outputs to
> +# stdout all files that have been modified since the given time.
> +# Paths must be relative to the root of the working tree and
> +# separated by a single NUL.
> +#
> +# To enable this hook, rename this file to "query-fsmonitor"
> +
> +# Convert unix style paths to escaped Windows style paths
> +case "$(uname -s)" in
> +MINGW*|MSYS_NT*)
> +  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,\\\\,g')"
> +  ;;
> +*)
> +  GIT_WORK_TREE="$PWD"
> +  ;;
> +esac
> +
> +# Query Watchman for all the changes since the requested time
> +echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1, \"fields\":[\"name\"]}]" | \
> +watchman -j | \
> +perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));'

Maybe put the above small perl script on multiple lines for improved
readability.

Is JSON::PP always available in Perl >= 5.8?
What happens if watchman is not installed or not in the PATH?
It seems to me that the git process will not die, and will just work
as if the hook does not exist, except that it will call the hook which
will probably output error messages.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.
  2017-05-22 17:28       ` Ævar Arnfjörð Bjarmason
@ 2017-05-25 13:49         ` Ben Peart
  0 siblings, 0 replies; 30+ messages in thread
From: Ben Peart @ 2017-05-25 13:49 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, benpeart,
	Nguyễn Thái Ngọc Duy, Johannes Schindelin,
	David.Turner, Jeff King



On 5/22/2017 1:28 PM, Ævar Arnfjörð Bjarmason wrote:
> On Mon, May 22, 2017 at 6:18 PM, Ben Peart <peartben@gmail.com> wrote:
>> On 5/20/2017 8:10 AM, Ævar Arnfjörð Bjarmason wrote:
>>>>
>>>> +== File System Monitor cache
>>>> +
>>>> +  The file system monitor cache tracks files for which the
>>>> query-fsmonitor
>>>> +  hook has told us about changes.  The signature for this extension is
>>>> +  { 'F', 'S', 'M', 'N' }.
>>>> +
>>>> +  The extension starts with
>>>> +
>>>> +  - 32-bit version number: the current supported version is 1.
>>>> +
>>>> +  - 64-bit time: the extension data reflects all changes through the
>>>> given
>>>> +       time which is stored as the seconds elapsed since midnight,
>>>> January 1, 1970.
>>>> +
>>>> +  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
>>>> +
>>>> +  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
>>>> +    is CE_FSMONITOR_DIRTY.
>>>
>>>
>>> We already have a uint64_t in one place in the codebase (getnanotime)
>>> which uses a 64 bit time for nanosecond accuracy, and numerous
>>> filesystems already support nanosecond timestamps (ext4, that new
>>> Apple thingy...).
>>>
>>> I don't know if any of the inotify/fsmonitor APIs support that yet,
>>> but it seems inevitable that that'll be added if not, in some
>>> pathological cases we can have a lot of files modified in 1 second, so
>>> using nanosecond accuracy means there'll be a lot less data to
>>> consider in some cases.
>>>
>>> It does mean this'll only work until the year ~2500, but that seems
>>> like an acceptable trade-off.
>>>
>>
>> I really don't think nano-second resolution is needed in this case for a few
>> reasons.
>>
>> The number of files that can change within a given second is limited by the
>> IO throughput of the underlying device. Even assuming a very fast device and
>> very small files and changes, this won't be that many files.
>>
>> Without this patch, git would have scanned all those files every time. With
>> this patch, git will only scan those files a 2nd time that are modified in
>> the same second that it did the first scan *that came before the first scan
>> started* (the "lots of files modified" section in the 1 second timeline
>> below).
>>
>> |------------------------- one second ---------------------|
>> |-lots of files modified - git status - more file modified-|
>>
>> Yes, some duplicate status checks can be made but its still a significant
>> win in any reasonable scenario. Especially when you consider that it is
>> pretty unusual to do git status/add/commit calls in the middle of making
>> lots of changes to files.
> 
> I understand that we get most of the wins either way.
> 
> I'm just wondering why not make it nanosecond-resolution now from the
> beginning since that's where major filesystems are heading already,
> and changing stuff like this can be a hassle once it's initially out
> there, whereas just dividing by 10^9 for APIs that need seconds from
> the beginning is easy & future-proof.
> 
> There are some uses of git where this would probably matter in practice.
> 
> E.g. consider a git-annex backed fileserver using ext4, currently
> git-annex comes with its own FS watcher as a daemon invoked via `git
> annex watch` to constantly add new files without killing performance
> via a status/add in a loop, with this a daemon like that could just
> run status/add in a loop, but on a busy repo the 1s interval size
> might start to matter as you're constantly inspecting larger
> intervals.
> 
> More importantly though, I can't think of any case where having it in
> nanoseconds to begin with would do any harm.

You're right, it's not hard to support nano second resolution and it 
doesn't do any harm.  I switch the index format and interface as I don't 
expect this code will still be running when the timer rolls over. 
Someone long after me will have to fix it if it is. :)

> 
>> In addition, the backing file system monitor (Watchman) supports number of
>> seconds since the unix epoch (unix time_t style).  This means any support of
>> nano seconds by git is academic until someone provides a file system watcher
>> that does support nano second granularity.
> 
> I haven't used watchman for anything non-trivial, but the
> documentation for the query API you seem to be using says you can
> request a {ctime,mtime}_ns field:
> 
> https://github.com/facebook/watchman/blob/master/website/_docs/cmd.query.markdown#user-content-available-fields
> 
> And isn't this the documentation for the "since" query you're using,
> saying you can specify any arbitrary fs timing field, such as a _ns
> time field:
> 
> https://github.com/facebook/watchman/blob/master/website/_docs/expr.since.md
> 
> ?

To keep the index extension and hook interface generic, I have not 
adopted the Watchman specific clock format.  This enables anyone to 
provide a different file system watcher that can inter-operate as nano 
seconds since epoc is easy for anyone to support.

> 
>> Finally, the fsmonitor index extension is versioned so that we can
>> seamlessly upgrade to nano second resolution later if we desire.
> 
> Aside from my nanosecond bikeshedding, let's say we change the
> semantics of any of this in the future: The index has the version, but
> there's one argument passed to the hook without a version. Is the hook
> expected to potentially be reading the version from the index to make
> sense of whether (in this case) the argument is a mtime or mtime_ns?
> 
> Wouldn't this make more sense than that on top, i.e. pass the version
> to the hook, untested (and probably whines about the sprintf
> format...):

It's a good point that the index extension is versioned and the hook 
interface is not.  I've not seen versioning in any hook interface to 
date but its never to late to start.  I'll add a separate version to the 
hook interface as well so they can be updated independently if needed.

> 
> $ git diff -U1
> diff --git a/cache.h b/cache.h
> index 6eafd34fff..3c63f179f8 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -346,2 +346,3 @@ struct index_state {
>          struct untracked_cache *untracked;
> +       uint32_t fsmonitor_version;
>          time_t fsmonitor_last_update;
> diff --git a/fsmonitor.c b/fsmonitor.c
> index f5a9f818e8..7236b538ac 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -60,2 +60,4 @@ int read_fsmonitor_extension(struct index_state
> *istate, const void *data,
>                  return error("bad fsmonitor version %d", hdr_version);
> +       else
> +               istate->fsmonitor_version = hdr_version;
> 
> @@ -137,2 +139,3 @@ static int query_fsmonitor(time_t last_update,
> struct strbuf *query_result)
>          struct child_process cp = CHILD_PROCESS_INIT;
> +       char version[1];
>          char date[64];
> @@ -143,2 +146,3 @@ static int query_fsmonitor(time_t last_update,
> struct strbuf *query_result)
> 
> +       snprintf(version, sizeof(version), "%d", istate->fsmonitor_version);
>          snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update)
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] Fast git status via a file system watcher
  2017-05-24 10:54 ` [PATCH v2 0/6] Fast git status via a file system watcher Christian Couder
@ 2017-05-25 13:55   ` Ben Peart
  2017-05-27  6:57     ` Christian Couder
  2017-05-31  7:59     ` Christian Couder
  0 siblings, 2 replies; 30+ messages in thread
From: Ben Peart @ 2017-05-25 13:55 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King



On 5/24/2017 6:54 AM, Christian Couder wrote:
>> Design
>> ~~~~~~
>>
>> A new git hook (query-fsmonitor) must exist and be enabled
>> (core.fsmonitor=true) that takes a time_t formatted as a string and
>> outputs to stdout all files that have been modified since the requested
>> time.
> 
> Is there a reason why there is a new hook, instead of a
> "core.fsmonitorquery" config option to which you could pass whatever
> command line with options?

A hook is a simple and well defined way to integrate git with another 
process.  If there is some fixed set of arguments that need to be passed 
to a file system monitor (beyond the timestamp stored in the index 
extension), they can be encoded in the integration script like I've done 
in the Watchman integration sample hook.

> 
>> A new 'fsmonitor' index extension has been added to store the time the
>> fsmonitor hook was last queried and a ewah bitmap of the current
>> 'fsmonitor-dirty' files. Unmarked entries are 'fsmonitor-clean', marked
>> entries are 'fsmonitor-dirty.'
>>
>> As needed, git will call the query-fsmonitor hook proc for the set of
>> changes since the index was last updated. Git then uses this set of
>> files along with the list saved in the fsmonitor index extension to flag
>> the potentially dirty index and untracked cache entries.
> 
> So this can work only if "core.untrackedCache" is set to true?
> 

This works with core.untrackedCache set to true or false.  If it is set 
to false, you get valid results, you just don't get the speed up when 
checking for untracked files.

> Thanks for working on this,
> Christian.
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman
  2017-05-18 20:13 ` [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman Ben Peart
  2017-05-24 13:12   ` Christian Couder
@ 2017-05-25 21:05   ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-25 21:05 UTC (permalink / raw)
  To: Ben Peart
  Cc: Git Mailing List, Junio C Hamano, benpeart,
	Nguyễn Thái Ngọc Duy, Johannes Schindelin,
	David.Turner, Jeff King

On Thu, May 18, 2017 at 10:13 PM, Ben Peart <peartben@gmail.com> wrote:
> This hook script integrates the new fsmonitor capabilities of git with
> the cross platform Watchman file watching service. To use the script:
>
> Download and install Watchman from https://facebook.github.io/watchman/
> and instruct Watchman to watch your working directory for changes
> ('watchman watch-project /usr/src/git').
>
> Rename the sample integration hook from query-fsmonitor.sample to
> query-fsmonitor.
>
> Configure git to use the extension ('git config core.fsmonitor true')
> and optionally turn on the untracked cache for optimal performance
> ('git config core.untrackedcache true').
>
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  templates/hooks--query-fsmonitor.sample | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>  create mode 100644 templates/hooks--query-fsmonitor.sample
>
> diff --git a/templates/hooks--query-fsmonitor.sample b/templates/hooks--query-fsmonitor.sample
> new file mode 100644
> index 0000000000..4bd22f21d8
> --- /dev/null
> +++ b/templates/hooks--query-fsmonitor.sample
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +#
> +# An example hook script to integrate Watchman
> +# (https://facebook.github.io/watchman/) with git to provide fast
> +# git status.
> +#
> +# The hook is passed a time_t formatted as a string and outputs to
> +# stdout all files that have been modified since the given time.
> +# Paths must be relative to the root of the working tree and
> +# separated by a single NUL.
> +#
> +# To enable this hook, rename this file to "query-fsmonitor"
> +
> +# Convert unix style paths to escaped Windows style paths
> +case "$(uname -s)" in
> +MINGW*|MSYS_NT*)
> +  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,\\\\,g')"
> +  ;;
> +*)
> +  GIT_WORK_TREE="$PWD"
> +  ;;
> +esac
> +
> +# Query Watchman for all the changes since the requested time
> +echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1, \"fields\":[\"name\"]}]" | \
> +watchman -j | \
> +perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));'

I couldn't get watchman to work for me (built from source, keep
getting [1]), but I hacked up something you can hopefully test &
squash on top of this:

 # Query Watchman for all the changes since the requested time
-echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $time_t,
\"fields\":[\"name\"]}]" | \
-watchman -j | \
-perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("",
<>)); die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if
defined($o-
+echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $time_t,
\"fields\":[\"name\"]}]" |
+       watchman -j |
+       perl -0666 -e '
+               use strict;
+               use warnings;
+
+               my $stdin = <>;
+               die "Watchman: The watchman command returned no
output, error above?\n" if $stdin eq "";
+               die "Watchman: Invalid input: $stdin\n" unless $stdin =~ /^\{/;
+
+               my $json_pkg;
+               eval {
+                       require JSON::XS;
+                       $json_pkg = "JSON::XS";
+                       1;
+               } or do {
+                       require JSON::PP;
+                       $json_pkg = "JSON::PP";
+               };
+
+               my $o = $json_pkg->new->utf8->decode($stdin);
+               die "Watchman: $o->{error}.\nFalling back to scanning...\n"
+                       if $o->{error};
+
+               local $, = "\0";
+               print @{$o->{files}};
+       '

Rationale:

 * We use the much faster JSON::XS if it's installed.
 * We use strict/warnings
 * Micro optimization: Replace joining stdin with an equivalent -0666
   invocation. See "perldoc perlrun".
 * Micro optimization: No need to join up the possibly large list of
   files into one big string, just set $, to \0 and stream out the
   array.
 * Error handling: My watchman is broken (so actually this isn't
   tested), it only spews to stderr and exits. Handle that by checking
   whether stdin is "".

Those changes are available at
https://github.com/avar/git/commits/avar/fsmonitor



1. watchman: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version
`CXXABI_1.3.11' not found (required by watchman)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman
  2017-05-24 13:12   ` Christian Couder
@ 2017-05-26  9:47     ` Ævar Arnfjörð Bjarmason
  2017-05-26 16:02       ` Ben Peart
  0 siblings, 1 reply; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-26  9:47 UTC (permalink / raw)
  To: Christian Couder
  Cc: Ben Peart, git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King

On Wed, May 24, 2017 at 3:12 PM, Christian Couder
<christian.couder@gmail.com> wrote:
> On Thu, May 18, 2017 at 10:13 PM, Ben Peart <peartben@gmail.com> wrote:
>> This hook script integrates the new fsmonitor capabilities of git with
>> the cross platform Watchman file watching service. To use the script:
>>
>> Download and install Watchman from https://facebook.github.io/watchman/
>> and instruct Watchman to watch your working directory for changes
>> ('watchman watch-project /usr/src/git').
>>
>> Rename the sample integration hook from query-fsmonitor.sample to
>> query-fsmonitor.
>>
>> Configure git to use the extension ('git config core.fsmonitor true')
>> and optionally turn on the untracked cache for optimal performance
>> ('git config core.untrackedcache true').
>
> Ok, it looks like the untracked cache can be used, but it could work without it.
>
>> Signed-off-by: Ben Peart <benpeart@microsoft.com>
>> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> ---
>>  templates/hooks--query-fsmonitor.sample | 27 +++++++++++++++++++++++++++
>>  1 file changed, 27 insertions(+)
>>  create mode 100644 templates/hooks--query-fsmonitor.sample
>>
>> diff --git a/templates/hooks--query-fsmonitor.sample b/templates/hooks--query-fsmonitor.sample
>> new file mode 100644
>> index 0000000000..4bd22f21d8
>> --- /dev/null
>> +++ b/templates/hooks--query-fsmonitor.sample
>> @@ -0,0 +1,27 @@
>> +#!/bin/sh
>> +#
>> +# An example hook script to integrate Watchman
>> +# (https://facebook.github.io/watchman/) with git to provide fast
>> +# git status.
>> +#
>> +# The hook is passed a time_t formatted as a string and outputs to
>> +# stdout all files that have been modified since the given time.
>> +# Paths must be relative to the root of the working tree and
>> +# separated by a single NUL.
>> +#
>> +# To enable this hook, rename this file to "query-fsmonitor"
>> +
>> +# Convert unix style paths to escaped Windows style paths
>> +case "$(uname -s)" in
>> +MINGW*|MSYS_NT*)
>> +  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,\\\\,g')"
>> +  ;;
>> +*)
>> +  GIT_WORK_TREE="$PWD"
>> +  ;;
>> +esac
>> +
>> +# Query Watchman for all the changes since the requested time
>> +echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1, \"fields\":[\"name\"]}]" | \
>> +watchman -j | \
>> +perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));'
>
> Maybe put the above small perl script on multiple lines for improved
> readability.
>
> Is JSON::PP always available in Perl >= 5.8?

No, it's shipped with perl as of 5.14.0, which came out in May 2011.

I think depending on that is fine. FWIW we bumped the required core
dependency (but you might still need to install modules) in 2010 in my
d48b284183 ("perl: bump the required Perl version to 5.8 from
5.6.[21]", 2010-09-24). Maybe we should be bumping that again...

> What happens if watchman is not installed or not in the PATH?
> It seems to me that the git process will not die, and will just work
> as if the hook does not exist, except that it will call the hook which
> will probably output error messages.

I think a good solution to this is to just add "set -euo pipefail" to
that script.

Then we'll print out on stderr that we couldn't find the watchman
command. Right now (with my patch) it'll make its way to the perl
program and get empty input.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman
  2017-05-26  9:47     ` Ævar Arnfjörð Bjarmason
@ 2017-05-26 16:02       ` Ben Peart
  0 siblings, 0 replies; 30+ messages in thread
From: Ben Peart @ 2017-05-26 16:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Christian Couder
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King



On 5/26/2017 5:47 AM, Ævar Arnfjörð Bjarmason wrote:
> On Wed, May 24, 2017 at 3:12 PM, Christian Couder
> <christian.couder@gmail.com> wrote:
>> On Thu, May 18, 2017 at 10:13 PM, Ben Peart <peartben@gmail.com> wrote:
>>> This hook script integrates the new fsmonitor capabilities of git with
>>> the cross platform Watchman file watching service. To use the script:
>>>
>>> Download and install Watchman from https://facebook.github.io/watchman/
>>> and instruct Watchman to watch your working directory for changes
>>> ('watchman watch-project /usr/src/git').
>>>
>>> Rename the sample integration hook from query-fsmonitor.sample to
>>> query-fsmonitor.
>>>
>>> Configure git to use the extension ('git config core.fsmonitor true')
>>> and optionally turn on the untracked cache for optimal performance
>>> ('git config core.untrackedcache true').
>>
>> Ok, it looks like the untracked cache can be used, but it could work without it.
>>
>>> Signed-off-by: Ben Peart <benpeart@microsoft.com>
>>> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>>> ---
>>>   templates/hooks--query-fsmonitor.sample | 27 +++++++++++++++++++++++++++
>>>   1 file changed, 27 insertions(+)
>>>   create mode 100644 templates/hooks--query-fsmonitor.sample
>>>
>>> diff --git a/templates/hooks--query-fsmonitor.sample b/templates/hooks--query-fsmonitor.sample
>>> new file mode 100644
>>> index 0000000000..4bd22f21d8
>>> --- /dev/null
>>> +++ b/templates/hooks--query-fsmonitor.sample
>>> @@ -0,0 +1,27 @@
>>> +#!/bin/sh
>>> +#
>>> +# An example hook script to integrate Watchman
>>> +# (https://facebook.github.io/watchman/) with git to provide fast
>>> +# git status.
>>> +#
>>> +# The hook is passed a time_t formatted as a string and outputs to
>>> +# stdout all files that have been modified since the given time.
>>> +# Paths must be relative to the root of the working tree and
>>> +# separated by a single NUL.
>>> +#
>>> +# To enable this hook, rename this file to "query-fsmonitor"
>>> +
>>> +# Convert unix style paths to escaped Windows style paths
>>> +case "$(uname -s)" in
>>> +MINGW*|MSYS_NT*)
>>> +  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,\\\\,g')"
>>> +  ;;
>>> +*)
>>> +  GIT_WORK_TREE="$PWD"
>>> +  ;;
>>> +esac
>>> +
>>> +# Query Watchman for all the changes since the requested time
>>> +echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1, \"fields\":[\"name\"]}]" | \
>>> +watchman -j | \
>>> +perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));'
>>
>> Maybe put the above small perl script on multiple lines for improved
>> readability.
>>

I'll pick up the suggestions from Ævar on the perl script.  I believe 
that fixes all the issues you have raised.

I've also tested the various error cases of watchman not installed and 
when watchman returns an error and they are all properly handled by 1) 
giving an error message and 2) falling back to the git codepath without 
fsmonitor so that the git command succeeds.

>> Is JSON::PP always available in Perl >= 5.8?
> 
> No, it's shipped with perl as of 5.14.0, which came out in May 2011.
> 
> I think depending on that is fine. FWIW we bumped the required core
> dependency (but you might still need to install modules) in 2010 in my
> d48b284183 ("perl: bump the required Perl version to 5.8 from
> 5.6.[21]", 2010-09-24). Maybe we should be bumping that again...
> 
>> What happens if watchman is not installed or not in the PATH?
>> It seems to me that the git process will not die, and will just work
>> as if the hook does not exist, except that it will call the hook which
>> will probably output error messages.
> 
> I think a good solution to this is to just add "set -euo pipefail" to
> that script.
> 
> Then we'll print out on stderr that we couldn't find the watchman
> command. Right now (with my patch) it'll make its way to the perl
> program and get empty input.
> 

With or without "set -euo pipefail" the output if watchman is not 
installed is:

.git/hooks/query-fsmonitor: line 37: watchman: command not found
Watchman: command returned no output.
Falling back to scanning...
On branch fsmonitor
Your branch is up-to-date with 'benpeart/fsmonitor'.


To try this out on Mac, you can install the package maintained by 
MacPorts by running "sudo port install watchman"

https://facebook.github.io/watchman/docs/install.html


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] Fast git status via a file system watcher
  2017-05-25 13:55   ` Ben Peart
@ 2017-05-27  6:57     ` Christian Couder
  2017-05-30 18:05       ` Ben Peart
  2017-05-31  7:59     ` Christian Couder
  1 sibling, 1 reply; 30+ messages in thread
From: Christian Couder @ 2017-05-27  6:57 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King

On Thu, May 25, 2017 at 3:55 PM, Ben Peart <peartben@gmail.com> wrote:
>
>
> On 5/24/2017 6:54 AM, Christian Couder wrote:
>>>
>>> Design
>>> ~~~~~~
>>>
>>> A new git hook (query-fsmonitor) must exist and be enabled
>>> (core.fsmonitor=true) that takes a time_t formatted as a string and
>>> outputs to stdout all files that have been modified since the requested
>>> time.
>>
>>
>> Is there a reason why there is a new hook, instead of a
>> "core.fsmonitorquery" config option to which you could pass whatever
>> command line with options?
>
>
> A hook is a simple and well defined way to integrate git with another
> process.  If there is some fixed set of arguments that need to be passed to
> a file system monitor (beyond the timestamp stored in the index extension),
> they can be encoded in the integration script like I've done in the Watchman
> integration sample hook.

Yeah, they could be encoded in the integration script, but it could be
better if it was possible to just configure a generic command line.

For example if the directory that should be watched for filesystem
changes could be passed as well as the time since the last changes,
perhaps only a generic command line would be need.

I am also wondering about sparse checkout, as we might want to pass
all the directories we are interested in.
How is it supposed to work with sparse checkout?

>>> A new 'fsmonitor' index extension has been added to store the time the
>>> fsmonitor hook was last queried and a ewah bitmap of the current
>>> 'fsmonitor-dirty' files. Unmarked entries are 'fsmonitor-clean', marked
>>> entries are 'fsmonitor-dirty.'
>>>
>>> As needed, git will call the query-fsmonitor hook proc for the set of
>>> changes since the index was last updated. Git then uses this set of
>>> files along with the list saved in the fsmonitor index extension to flag
>>> the potentially dirty index and untracked cache entries.
>>
>>
>> So this can work only if "core.untrackedCache" is set to true?
>>
>
> This works with core.untrackedCache set to true or false.  If it is set to
> false, you get valid results, you just don't get the speed up when checking
> for untracked files.

Great!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] Fast git status via a file system watcher
  2017-05-27  6:57     ` Christian Couder
@ 2017-05-30 18:05       ` Ben Peart
  2017-05-30 20:33         ` Christian Couder
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Peart @ 2017-05-30 18:05 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King



On 5/27/2017 2:57 AM, Christian Couder wrote:
> On Thu, May 25, 2017 at 3:55 PM, Ben Peart <peartben@gmail.com> wrote:
>>
>>
>> On 5/24/2017 6:54 AM, Christian Couder wrote:
>>>>
>>>> Design
>>>> ~~~~~~
>>>>
>>>> A new git hook (query-fsmonitor) must exist and be enabled
>>>> (core.fsmonitor=true) that takes a time_t formatted as a string and
>>>> outputs to stdout all files that have been modified since the requested
>>>> time.
>>>
>>>
>>> Is there a reason why there is a new hook, instead of a
>>> "core.fsmonitorquery" config option to which you could pass whatever
>>> command line with options?
>>
>>
>> A hook is a simple and well defined way to integrate git with another
>> process.  If there is some fixed set of arguments that need to be passed to
>> a file system monitor (beyond the timestamp stored in the index extension),
>> they can be encoded in the integration script like I've done in the Watchman
>> integration sample hook.
> 
> Yeah, they could be encoded in the integration script, but it could be
> better if it was possible to just configure a generic command line.
> 
> For example if the directory that should be watched for filesystem
> changes could be passed as well as the time since the last changes,
> perhaps only a generic command line would be need.

Maybe I'm not understanding what you have in mind but I haven't found 
this to be the case in the two integrations I've done with file system 
watchers (one internal and Watchman).  They require you download, 
install, and configure them by telling them about the folders you want 
monitored.  Then you can start querying them for changes and processing 
the output to match what git expects.  While the download and install 
steps vary, having that query + process and return results wrapped up in 
an integration hook has worked well.

> 
> I am also wondering about sparse checkout, as we might want to pass
> all the directories we are interested in.
> How is it supposed to work with sparse checkout?
> 

The fsmonitor code works well with or without a sparse-checkout.  The 
file system monitor is unaware of the sparse checkout so will notify git 
about any change irrespective of whether git will eventually ignore it 
because the skip worktree bit is set.

>>>> A new 'fsmonitor' index extension has been added to store the time the
>>>> fsmonitor hook was last queried and a ewah bitmap of the current
>>>> 'fsmonitor-dirty' files. Unmarked entries are 'fsmonitor-clean', marked
>>>> entries are 'fsmonitor-dirty.'
>>>>
>>>> As needed, git will call the query-fsmonitor hook proc for the set of
>>>> changes since the index was last updated. Git then uses this set of
>>>> files along with the list saved in the fsmonitor index extension to flag
>>>> the potentially dirty index and untracked cache entries.
>>>
>>>
>>> So this can work only if "core.untrackedCache" is set to true?
>>>
>>
>> This works with core.untrackedCache set to true or false.  If it is set to
>> false, you get valid results, you just don't get the speed up when checking
>> for untracked files.
> 
> Great!
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] Fast git status via a file system watcher
  2017-05-30 18:05       ` Ben Peart
@ 2017-05-30 20:33         ` Christian Couder
  2017-05-30 23:11           ` Ben Peart
  0 siblings, 1 reply; 30+ messages in thread
From: Christian Couder @ 2017-05-30 20:33 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King

On Tue, May 30, 2017 at 8:05 PM, Ben Peart <peartben@gmail.com> wrote:
>
>
> On 5/27/2017 2:57 AM, Christian Couder wrote:
>>
>> On Thu, May 25, 2017 at 3:55 PM, Ben Peart <peartben@gmail.com> wrote:
>>>
>>>
>>>
>>> On 5/24/2017 6:54 AM, Christian Couder wrote:
>>>>>
>>>>>
>>>>> Design
>>>>> ~~~~~~
>>>>>
>>>>> A new git hook (query-fsmonitor) must exist and be enabled
>>>>> (core.fsmonitor=true) that takes a time_t formatted as a string and
>>>>> outputs to stdout all files that have been modified since the requested
>>>>> time.
>>>>
>>>>
>>>>
>>>> Is there a reason why there is a new hook, instead of a
>>>> "core.fsmonitorquery" config option to which you could pass whatever
>>>> command line with options?
>>>
>>>
>>>
>>> A hook is a simple and well defined way to integrate git with another
>>> process.  If there is some fixed set of arguments that need to be passed
>>> to
>>> a file system monitor (beyond the timestamp stored in the index
>>> extension),
>>> they can be encoded in the integration script like I've done in the
>>> Watchman
>>> integration sample hook.
>>
>>
>> Yeah, they could be encoded in the integration script, but it could be
>> better if it was possible to just configure a generic command line.
>>
>> For example if the directory that should be watched for filesystem
>> changes could be passed as well as the time since the last changes,
>> perhaps only a generic command line would be need.
>
>
> Maybe I'm not understanding what you have in mind but I haven't found this
> to be the case in the two integrations I've done with file system watchers
> (one internal and Watchman).  They require you download, install, and
> configure them by telling them about the folders you want monitored.  Then
> you can start querying them for changes and processing the output to match
> what git expects.  While the download and install steps vary, having that
> query + process and return results wrapped up in an integration hook has
> worked well.

It looks like one can also just ask watchman to monitor a directory with:

watchman watch /path/to/dir

or:

echo '["watch", "/path/to/dir"]' | watchman -j

Also for example on Linux people might want to use command line tools like:

https://linux.die.net/man/1/inotifywait

and you can pass the directories you want to be watched as arguments
to this kind of tools.

So it would be nice, if we didn't require the user to configure
anything and we could just configure the watching of what we need in
the hook (or a command configured using a config option). If the hook
(or configured command) could be passed the directory by git, it could
also be generic.

>> I am also wondering about sparse checkout, as we might want to pass
>> all the directories we are interested in.
>> How is it supposed to work with sparse checkout?
>
> The fsmonitor code works well with or without a sparse-checkout.  The file
> system monitor is unaware of the sparse checkout so will notify git about
> any change irrespective of whether git will eventually ignore it because the
> skip worktree bit is set.

I was wondering if it could ease the job for the monitoring service
and perhaps improve performance to just ask to watch the directories
we are interested in when using sparse checkout.
On Linux it looks like a separate inotify watch is created for every
subdirectory and there is maximum amount of inotify watches per user.
This can be increased by writing in
/proc/sys/fs/inotify/max_user_watches, but it is not nice to have to
ask admins to increase this.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] Fast git status via a file system watcher
  2017-05-30 20:33         ` Christian Couder
@ 2017-05-30 23:11           ` Ben Peart
  2017-05-31  7:37             ` Christian Couder
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Peart @ 2017-05-30 23:11 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King



On 5/30/2017 4:33 PM, Christian Couder wrote:
> On Tue, May 30, 2017 at 8:05 PM, Ben Peart <peartben@gmail.com> wrote:
>>
>>
>> On 5/27/2017 2:57 AM, Christian Couder wrote:
>>>
>>> On Thu, May 25, 2017 at 3:55 PM, Ben Peart <peartben@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On 5/24/2017 6:54 AM, Christian Couder wrote:
>>>>>>
>>>>>>
>>>>>> Design
>>>>>> ~~~~~~
>>>>>>
>>>>>> A new git hook (query-fsmonitor) must exist and be enabled
>>>>>> (core.fsmonitor=true) that takes a time_t formatted as a string and
>>>>>> outputs to stdout all files that have been modified since the requested
>>>>>> time.
>>>>>
>>>>>
>>>>>
>>>>> Is there a reason why there is a new hook, instead of a
>>>>> "core.fsmonitorquery" config option to which you could pass whatever
>>>>> command line with options?
>>>>
>>>>
>>>>
>>>> A hook is a simple and well defined way to integrate git with another
>>>> process.  If there is some fixed set of arguments that need to be passed
>>>> to
>>>> a file system monitor (beyond the timestamp stored in the index
>>>> extension),
>>>> they can be encoded in the integration script like I've done in the
>>>> Watchman
>>>> integration sample hook.
>>>
>>>
>>> Yeah, they could be encoded in the integration script, but it could be
>>> better if it was possible to just configure a generic command line.
>>>
>>> For example if the directory that should be watched for filesystem
>>> changes could be passed as well as the time since the last changes,
>>> perhaps only a generic command line would be need.
>>
>>
>> Maybe I'm not understanding what you have in mind but I haven't found this
>> to be the case in the two integrations I've done with file system watchers
>> (one internal and Watchman).  They require you download, install, and
>> configure them by telling them about the folders you want monitored.  Then
>> you can start querying them for changes and processing the output to match
>> what git expects.  While the download and install steps vary, having that
>> query + process and return results wrapped up in an integration hook has
>> worked well.
> 
> It looks like one can also just ask watchman to monitor a directory with:
> 
> watchman watch /path/to/dir
> 
> or:
> 
> echo '["watch", "/path/to/dir"]' | watchman -j
> 
> Also for example on Linux people might want to use command line tools like:
> 
> https://linux.die.net/man/1/inotifywait
> 
> and you can pass the directories you want to be watched as arguments
> to this kind of tools.
> 
> So it would be nice, if we didn't require the user to configure
> anything and we could just configure the watching of what we need in
> the hook (or a command configured using a config option). If the hook
> (or configured command) could be passed the directory by git, it could
> also be generic.
> 

OK, I think I understand what you're attempting to accomplish now. 
Often, Watchman (and other similar tools) are used to do much more than 
speed up git (in fact, _all_ use cases today are not used for that since 
this patch series hasn't been accepted yet :)).  They trigger builds, 
run verification tools, test passes, or other tasks.

I'm afraid that attempting to have the user configure git to configure 
the tool "automatically" is just adding an extra layer of complexity 
rather than making it simpler.  I'll leave that to a future patch series 
to work out.

>>> I am also wondering about sparse checkout, as we might want to pass
>>> all the directories we are interested in.
>>> How is it supposed to work with sparse checkout?
>>
>> The fsmonitor code works well with or without a sparse-checkout.  The file
>> system monitor is unaware of the sparse checkout so will notify git about
>> any change irrespective of whether git will eventually ignore it because the
>> skip worktree bit is set.
> 
> I was wondering if it could ease the job for the monitoring service
> and perhaps improve performance to just ask to watch the directories
> we are interested in when using sparse checkout.
> On Linux it looks like a separate inotify watch is created for every
> subdirectory and there is maximum amount of inotify watches per user.
> This can be increased by writing in
> /proc/sys/fs/inotify/max_user_watches, but it is not nice to have to
> ask admins to increase this.
> 

Having a single instance that watches the root of the working directory 
is the simplest model and minimizes use of system resources like inotify 
as there is only one needed per clone.

In addition, when the sparse-checkout file is modified, there is no need 
to try and automatically update the monitor by adding and removing 
folders as necessary.

Finally, if files or directories are excluded via sparse-checkout, they 
are removed from the working directory at checkout time so don't add any 
additional overhead to the file system watcher anyway as they clearly 
can't generate write events if they don't exist.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] Fast git status via a file system watcher
  2017-05-30 23:11           ` Ben Peart
@ 2017-05-31  7:37             ` Christian Couder
  0 siblings, 0 replies; 30+ messages in thread
From: Christian Couder @ 2017-05-31  7:37 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King

>>>> Yeah, they could be encoded in the integration script, but it could be
>>>> better if it was possible to just configure a generic command line.
>>>>
>>>> For example if the directory that should be watched for filesystem
>>>> changes could be passed as well as the time since the last changes,
>>>> perhaps only a generic command line would be need.
>>>
>>> Maybe I'm not understanding what you have in mind but I haven't found
>>> this
>>> to be the case in the two integrations I've done with file system
>>> watchers
>>> (one internal and Watchman).  They require you download, install, and
>>> configure them by telling them about the folders you want monitored.
>>> Then
>>> you can start querying them for changes and processing the output to
>>> match
>>> what git expects.  While the download and install steps vary, having that
>>> query + process and return results wrapped up in an integration hook has
>>> worked well.
>>
>> It looks like one can also just ask watchman to monitor a directory with:
>>
>> watchman watch /path/to/dir
>>
>> or:
>>
>> echo '["watch", "/path/to/dir"]' | watchman -j
>>
>> Also for example on Linux people might want to use command line tools
>> like:
>>
>> https://linux.die.net/man/1/inotifywait
>>
>> and you can pass the directories you want to be watched as arguments
>> to this kind of tools.
>>
>> So it would be nice, if we didn't require the user to configure
>> anything and we could just configure the watching of what we need in
>> the hook (or a command configured using a config option). If the hook
>> (or configured command) could be passed the directory by git, it could
>> also be generic.
>
> OK, I think I understand what you're attempting to accomplish now. Often,
> Watchman (and other similar tools) are used to do much more than speed up
> git (in fact, _all_ use cases today are not used for that since this patch
> series hasn't been accepted yet :)).  They trigger builds, run verification
> tools, test passes, or other tasks.

Yeah, but some people might only be interested in installing Watchman
or similar tools to speed up "git status".

> I'm afraid that attempting to have the user configure git to configure the
> tool "automatically" is just adding an extra layer of complexity rather than
> making it simpler.

I think that for the user it makes things simpler, as the user
wouldn't have to configure anything.

For example if the hook does something like the following :

# Check that watchman is already watching the worktree
if ! watchman watch-list | grep "\"$GIT_WORK_TREE\""
then
       # Ask watchman to watch the worktree
       watchman watch "$GIT_WORK_TREE"
       exit 1
fi

# Query Watchman for all the changes since the requested time
echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1,
\"fields\":[\"name\"]}]" | \
watchman -j | \
...

Then users might not need to configure Watchman in the first place,
and if they move their repo they might not need to reconfigure
Watchman.

> I'll leave that to a future patch series to work out.

Yeah, the above improvement can be done later.

>>>> I am also wondering about sparse checkout, as we might want to pass
>>>> all the directories we are interested in.
>>>> How is it supposed to work with sparse checkout?
>>>
>>>
>>> The fsmonitor code works well with or without a sparse-checkout.  The
>>> file
>>> system monitor is unaware of the sparse checkout so will notify git about
>>> any change irrespective of whether git will eventually ignore it because
>>> the
>>> skip worktree bit is set.
>>
>> I was wondering if it could ease the job for the monitoring service
>> and perhaps improve performance to just ask to watch the directories
>> we are interested in when using sparse checkout.
>> On Linux it looks like a separate inotify watch is created for every
>> subdirectory and there is maximum amount of inotify watches per user.
>> This can be increased by writing in
>> /proc/sys/fs/inotify/max_user_watches, but it is not nice to have to
>> ask admins to increase this.
>
> Having a single instance that watches the root of the working directory is
> the simplest model and minimizes use of system resources like inotify as
> there is only one needed per clone.

From https://linux.die.net/man/1/inotifywait:

-r, --recursive

Watch all subdirectories of any directories passed as arguments.
Watches will be set up recursively to an unlimited depth. Symbolic
links are not traversed. Newly created subdirectories will also be
watched.

Warning: If you use this option while watching the root directory of a
large tree, it may take quite a while until all inotify watches are
established, and events will not be received in this time. Also, since
one inotify watch will be established per subdirectory, it is possible
that the maximum amount of inotify watches per user will be reached.
The default maximum is 8192; it can be increased by writing to
/proc/sys/fs/inotify/max_user_watches.

> In addition, when the sparse-checkout file is modified, there is no need to
> try and automatically update the monitor by adding and removing folders as
> necessary.

Yeah, I agree it is simpler to not have to worry about the
sparse-checkout file being modified.

> Finally, if files or directories are excluded via sparse-checkout, they are
> removed from the working directory at checkout time so don't add any
> additional overhead to the file system watcher anyway as they clearly can't
> generate write events if they don't exist.

Yeah, but if you configure sparse-checkout when you already have
untracked files in many directories, like .o files, these files and
the directories they are in are not removed when you perform a
checkout, so the directories are still watched by the file system
watcher.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] Fast git status via a file system watcher
  2017-05-25 13:55   ` Ben Peart
  2017-05-27  6:57     ` Christian Couder
@ 2017-05-31  7:59     ` Christian Couder
  2017-05-31 13:37       ` Ben Peart
  1 sibling, 1 reply; 30+ messages in thread
From: Christian Couder @ 2017-05-31  7:59 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King

On Thu, May 25, 2017 at 3:55 PM, Ben Peart <peartben@gmail.com> wrote:
>
> On 5/24/2017 6:54 AM, Christian Couder wrote:
>>>
>>> A new git hook (query-fsmonitor) must exist and be enabled
>>> (core.fsmonitor=true) that takes a time_t formatted as a string and
>>> outputs to stdout all files that have been modified since the requested
>>> time.
>>
>> Is there a reason why there is a new hook, instead of a
>> "core.fsmonitorquery" config option to which you could pass whatever
>> command line with options?
>
> A hook is a simple and well defined way to integrate git with another
> process.  If there is some fixed set of arguments that need to be passed to
> a file system monitor (beyond the timestamp stored in the index extension),
> they can be encoded in the integration script like I've done in the Watchman
> integration sample hook.

Yeah, but a hook must also be called everytime git wants to
communicate with the file system monitor. And we could perhaps get a
speed up if we could have only one long running process to communicate
with the file system monitor.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] Fast git status via a file system watcher
  2017-05-31  7:59     ` Christian Couder
@ 2017-05-31 13:37       ` Ben Peart
  2017-05-31 14:10         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Peart @ 2017-05-31 13:37 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Ben Peart, Nguyen Thai Ngoc Duy,
	Johannes Schindelin, David Turner, Jeff King



On 5/31/2017 3:59 AM, Christian Couder wrote:
> On Thu, May 25, 2017 at 3:55 PM, Ben Peart <peartben@gmail.com> wrote:
>>
>> On 5/24/2017 6:54 AM, Christian Couder wrote:
>>>>
>>>> A new git hook (query-fsmonitor) must exist and be enabled
>>>> (core.fsmonitor=true) that takes a time_t formatted as a string and
>>>> outputs to stdout all files that have been modified since the requested
>>>> time.
>>>
>>> Is there a reason why there is a new hook, instead of a
>>> "core.fsmonitorquery" config option to which you could pass whatever
>>> command line with options?
>>
>> A hook is a simple and well defined way to integrate git with another
>> process.  If there is some fixed set of arguments that need to be passed to
>> a file system monitor (beyond the timestamp stored in the index extension),
>> they can be encoded in the integration script like I've done in the Watchman
>> integration sample hook.
> 
> Yeah, but a hook must also be called everytime git wants to
> communicate with the file system monitor. And we could perhaps get a
> speed up if we could have only one long running process to communicate
> with the file system monitor.
> 

In this particular case a long running background processes isn't 
helpful because refresh_by_fsmonitor() already has logic to ensure the 
file system monitor is only called once per git process.

The overhead of that one call isn't significant as demonstrated by the 
performance numbers I sent out in the cover letter.  Even with the cost 
of the hook and all the associated post-processing, this patch series 
still results in a nice performance win (even on Windows where spawning 
processes is typically more expensive than on other platforms).

I appreciate the close look at this patch series!  Even when I'm pushing 
back, I'm glad someone is asking questions and making sure I've thought 
through things.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/6] Fast git status via a file system watcher
  2017-05-31 13:37       ` Ben Peart
@ 2017-05-31 14:10         ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-05-31 14:10 UTC (permalink / raw)
  To: Ben Peart
  Cc: Christian Couder, git, Junio C Hamano, Ben Peart,
	Nguyen Thai Ngoc Duy, Johannes Schindelin, David Turner,
	Jeff King

On Wed, May 31, 2017 at 3:37 PM, Ben Peart <peartben@gmail.com> wrote:
> On 5/31/2017 3:59 AM, Christian Couder wrote:
>>
>> On Thu, May 25, 2017 at 3:55 PM, Ben Peart <peartben@gmail.com> wrote:
>>>
>>>
>>> On 5/24/2017 6:54 AM, Christian Couder wrote:
>>>>>
>>>>>
>>>>> A new git hook (query-fsmonitor) must exist and be enabled
>>>>> (core.fsmonitor=true) that takes a time_t formatted as a string and
>>>>> outputs to stdout all files that have been modified since the requested
>>>>> time.
>>>>
>>>>
>>>> Is there a reason why there is a new hook, instead of a
>>>> "core.fsmonitorquery" config option to which you could pass whatever
>>>> command line with options?
>>>
>>>
>>> A hook is a simple and well defined way to integrate git with another
>>> process.  If there is some fixed set of arguments that need to be passed
>>> to
>>> a file system monitor (beyond the timestamp stored in the index
>>> extension),
>>> they can be encoded in the integration script like I've done in the
>>> Watchman
>>> integration sample hook.
>>
>>
>> Yeah, but a hook must also be called everytime git wants to
>> communicate with the file system monitor. And we could perhaps get a
>> speed up if we could have only one long running process to communicate
>> with the file system monitor.
>>
>
> In this particular case a long running background processes isn't helpful
> because refresh_by_fsmonitor() already has logic to ensure the file system
> monitor is only called once per git process.
>
> The overhead of that one call isn't significant as demonstrated by the
> performance numbers I sent out in the cover letter.  Even with the cost of
> the hook and all the associated post-processing, this patch series still
> results in a nice performance win (even on Windows where spawning processes
> is typically more expensive than on other platforms).
>
> I appreciate the close look at this patch series!  Even when I'm pushing
> back, I'm glad someone is asking questions and making sure I've thought
> through things.

I think where this series is going for now makes perfect sense.
Looking forward to the next version, need to get watchman compiling
though.

I think what Christian may be getting at here (correct me if I'm
wrong) is that David had a series 1-2 years ago that went a bit beyond
what this is doing, i.e. it had a persistent index-helper daemon
talking to watchman, so "git status" would stream the index state over
from this daemon, and it would consult watchman.

That's an interesting *additional* optimization, but a huge change on
top of this, so it makes perfect sense to start with something simpler
as this series does, and my hunch from having tested/looked at both is
that just talking to watchman is going to be by far the biggest win.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2017-05-31 14:10 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
2017-05-18 20:13 ` [PATCH v2 1/6] bswap: add 64 bit endianness helper get_be64 Ben Peart
2017-05-18 20:13 ` [PATCH v2 2/6] dir: make lookup_untracked() available outside of dir.c Ben Peart
2017-05-18 20:13 ` [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
2017-05-19 15:33   ` Ben Peart
2017-05-20 10:41     ` Junio C Hamano
2017-05-24 12:30   ` Christian Couder
2017-05-18 20:13 ` [PATCH v2 4/6] fsmonitor: add test cases for fsmonitor extension Ben Peart
2017-05-20 16:55   ` Torsten Bögershausen
2017-05-18 20:13 ` [PATCH v2 5/6] fsmonitor: add documentation for the " Ben Peart
2017-05-20 11:28   ` Junio C Hamano
2017-05-20 12:10   ` Ævar Arnfjörð Bjarmason
2017-05-22 16:18     ` Ben Peart
2017-05-22 17:28       ` Ævar Arnfjörð Bjarmason
2017-05-25 13:49         ` Ben Peart
2017-05-18 20:13 ` [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman Ben Peart
2017-05-24 13:12   ` Christian Couder
2017-05-26  9:47     ` Ævar Arnfjörð Bjarmason
2017-05-26 16:02       ` Ben Peart
2017-05-25 21:05   ` Ævar Arnfjörð Bjarmason
2017-05-24 10:54 ` [PATCH v2 0/6] Fast git status via a file system watcher Christian Couder
2017-05-25 13:55   ` Ben Peart
2017-05-27  6:57     ` Christian Couder
2017-05-30 18:05       ` Ben Peart
2017-05-30 20:33         ` Christian Couder
2017-05-30 23:11           ` Ben Peart
2017-05-31  7:37             ` Christian Couder
2017-05-31  7:59     ` Christian Couder
2017-05-31 13:37       ` Ben Peart
2017-05-31 14:10         ` Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).