git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH v1 0/5] Fast git status via a file system watcher
@ 2017-05-15 19:13 Ben Peart
  2017-05-15 19:13 ` [PATCH v1 1/5] dir: make lookup_untracked() available outside of dir.c Ben Peart
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Ben Peart @ 2017-05-15 19:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 6117 bytes --]

Goal
~~~~
 
Today, git must check existing files to see if there have been changes
and scan the working directory looking for new, untracked files.  As the
number of files and folders in the working directory increases, the time
to perform these checks can become very expensive O(# files in working
directory).

Given the number of new or modified files is typically a very small
percentage of the total number of files, it would be much more
performant if git only had to check files and folders that potentially
had changes. This reduces the cost to O(# modified files).

This patch series makes it possible to optionally add a hook process
that can return the set of files that may have been changed since the
requested time.  Git can then use this to limit its scan to only those
files and folders that potentially have changes.

Design
~~~~~~

A new git hook (query-fsmonitor) must exist and be enabled 
(core.fsmonitor=true) that takes a time_t formatted as a string and
outputs to stdout all files that have been modified since the requested
time.

A new 'fsmonitor' index extension has been added to store the time the
fsmonitor hook was last queried and a ewah bitmap of the current
'fsmonitor-dirty' files. Unmarked entries are 'fsmonitor-clean', marked
entries are 'fsmonitor-dirty.'

As needed, git will call the query-fsmonitor hook proc for the set of
changes since the index was last updated. Git then uses this set of
files along with the list saved in the fsmonitor index extension to flag
the potentially dirty index and untracked cache entries.  

refresh_index() and valid_cached_dir() are updated so that any entry not
flagged as potentially dirty is not checked as it cannot have any
changes. This saves all the work of checking files and folders for
changes that are already known to be clean.

If git finds out some entries are 'fsmonitor-dirty', but are really
unchanged (e.g. the file was changed, then reverted back), then Git will
clear the marking in the extension. If git adds or updates an index
entry, it is marked 'fsmonitor-dirty' to ensure it is checked for
changes.

The code is conservative so in case of any error (missing index
extension, error from hook, etc) it falls back to normal logic of
checking everything.

A sample hook is provided in query-fsmonitor.sample to integrate with
the cross platform Watchman file watching service
https://facebook.github.io/watchman/


Performance
~~~~~~~~~~~

The performance wins of this model are pretty dramatic. Each test was
run 3 times and averaged.  "Files" is the number of files in the working
directory.  Tests were done with a cold file system cache as well as
with a warm file system cache on a HDD.  SSD speeds were typically about
10x faster than the HDD.  Typical real world results would fall
somewhere between these extremes. 

*--------------------------------------------------------*
| Repo on HDD | Cache | fsmonitor=false | fsmonitor=true |
*--------------------------------------------------------*
| 3K Files    | Cold  |           0.77s |          0.55s |
+--------------------------------------------------------+
| 100K Files  | Cold  |          38.76s |          2.17s |
+--------------------------------------------------------+
| 3M Files    | Cold  |         421.55s |         18.57s |
+--------------------------------------------------------+
| 3K Files    | Warm  |           0.05s |          0.24s |
+--------------------------------------------------------+
| 100K Files  | Warm  |           1.13s |          0.40s |
+--------------------------------------------------------+
| 3M Files    | Warm  |          59.33s |          4.19s |
+--------------------------------------------------------+

Note that with the smallest repo, warm times actually increase slightly
as the overhead of calling the hook, watchman and perl outweighs the
savings of not scanning the working directory.


Open Issues
~~~~~~~~~~~

The index extension currently has a 32 bit version number, a 64 bit time
and a 32 bit bitmap size.  Do I need to quad-align the version and
bitmap size in the index extension or can all supported platforms handle
dereferencing memory that isn't quad aligned?


Credits
~~~~~~~

Idea taken and code refactored from 
http://public-inbox.org/git/1466914464-10358-1-git-send-email-novalis@novalis.org/

Current version as a fork of GFW on GitHub here: 
https://github.com/benpeart/git-for-windows/tree/fsmonitor


Ben Peart (5):
  dir: make lookup_untracked() available outside of dir.c
  Teach git to optionally utilize a file system monitor to speed up
    detecting new or changed files.
  fsmonitor: add test cases for fsmonitor extension
  Add documentation for the fsmonitor extension.  This includes the
    core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor
    index extension.
  Add a sample query-fsmonitor hook script that integrates with the
    cross platform Watchman file watching service.

 Documentation/config.txt                 |   7 +
 Documentation/githooks.txt               |  23 +++
 Documentation/technical/index-format.txt |  18 +++
 Makefile                                 |   1 +
 builtin/update-index.c                   |   1 +
 cache.h                                  |   5 +
 config.c                                 |   5 +
 dir.c                                    |  15 +-
 dir.h                                    |   5 +
 entry.c                                  |   1 +
 environment.c                            |   1 +
 fsmonitor.c                              | 233 +++++++++++++++++++++++++++++++
 fsmonitor.h                              |   9 ++
 read-cache.c                             |  28 +++-
 t/t7519-status-fsmonitor.sh              | 134 ++++++++++++++++++
 templates/hooks--query-fsmonitor.sample  |  27 ++++
 unpack-trees.c                           |   1 +
 17 files changed, 511 insertions(+), 3 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h
 create mode 100644 t/t7519-status-fsmonitor.sh
 create mode 100644 templates/hooks--query-fsmonitor.sample

-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v1 1/5] dir: make lookup_untracked() available outside of dir.c
  2017-05-15 19:13 [PATCH v1 0/5] Fast git status via a file system watcher Ben Peart
@ 2017-05-15 19:13 ` Ben Peart
  2017-05-16  5:01   ` Junio C Hamano
  2017-05-15 19:13 ` [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Ben Peart @ 2017-05-15 19:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

Remove the static qualifier from lookup_untracked() and make it
available to other modules by exporting it from dir.h.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 dir.c | 2 +-
 dir.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index f451bfa48c..1b5558fdf9 100644
--- a/dir.c
+++ b/dir.c
@@ -660,7 +660,7 @@ static void trim_trailing_spaces(char *buf)
  *
  * If "name" has the trailing slash, it'll be excluded in the search.
  */
-static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
 						    struct untracked_cache_dir *dir,
 						    const char *name, int len)
 {
diff --git a/dir.h b/dir.h
index bf23a470af..9e387551bd 100644
--- a/dir.h
+++ b/dir.h
@@ -339,4 +339,7 @@ extern void connect_work_tree_and_git_dir(const char *work_tree, const char *git
 extern void relocate_gitdir(const char *path,
 			    const char *old_git_dir,
 			    const char *new_git_dir);
+struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
+					     struct untracked_cache_dir *dir,
+					     const char *name, int len);
 #endif
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-15 19:13 [PATCH v1 0/5] Fast git status via a file system watcher Ben Peart
  2017-05-15 19:13 ` [PATCH v1 1/5] dir: make lookup_untracked() available outside of dir.c Ben Peart
@ 2017-05-15 19:13 ` Ben Peart
  2017-05-15 21:21   ` David Turner
                     ` (2 more replies)
  2017-05-15 19:13 ` [PATCH v1 3/5] fsmonitor: add test cases for fsmonitor extension Ben Peart
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 26+ messages in thread
From: Ben Peart @ 2017-05-15 19:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

When the index is read from disk, the query-fsmonitor index extension is
used to flag the last known potentially dirty index and untracked cach
entries.

If git finds out some entries are 'fsmonitor-dirty', but are really
unchanged (e.g. the file was changed, then reverted back), then Git will
clear the marking in the extension. If git adds or updates an index
entry, it is marked 'fsmonitor-dirty' to ensure it is checked for
changes in the working directory.

Before the 'fsmonitor-dirty' flags are used to limit the scope of the
files to be checked, the query-fsmonitor hook proc is called with the
time the index was last updated.  The hook proc returns the list of
files changed since that last updated time and the list of
potentially dirty entries is updated to reflect the current state.

refresh_index() and valid_cached_dir() are updated so that any entry not
flagged as potentially dirty is not checked as it cannot have any
changes.

Signed-off-by: Ben Peart <benpeart@microsoft.com>

---
 Makefile               |   1 +
 builtin/update-index.c |   1 +
 cache.h                |   5 ++
 config.c               |   5 ++
 dir.c                  |  13 +++
 dir.h                  |   2 +
 entry.c                |   1 +
 environment.c          |   1 +
 fsmonitor.c            | 233 +++++++++++++++++++++++++++++++++++++++++++++++++
 fsmonitor.h            |   9 ++
 read-cache.c           |  28 +++++-
 unpack-trees.c         |   1 +
 12 files changed, 298 insertions(+), 2 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h

diff --git a/Makefile b/Makefile
index 94cce645a5..89acff1f46 100644
--- a/Makefile
+++ b/Makefile
@@ -761,6 +761,7 @@ LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
diff --git a/builtin/update-index.c b/builtin/update-index.c
index ebfc09faa0..32fd977b43 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -232,6 +232,7 @@ static int mark_ce_flags(const char *path, int flag, int mark)
 		else
 			active_cache[pos]->ce_flags &= ~flag;
 		active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
+		active_cache[pos]->ce_flags |= CE_FSMONITOR_DIRTY;
 		cache_tree_invalidate_path(&the_index, path);
 		active_cache_changed |= CE_ENTRY_CHANGED;
 		return 0;
diff --git a/cache.h b/cache.h
index 40ec032a2d..64aa6e57cd 100644
--- a/cache.h
+++ b/cache.h
@@ -201,6 +201,7 @@ struct cache_entry {
 #define CE_ADDED             (1 << 19)
 
 #define CE_HASHED            (1 << 20)
+#define CE_FSMONITOR_DIRTY   (1 << 21)
 #define CE_WT_REMOVE         (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED        (1 << 23)
 
@@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED	(1 << 5)
 #define SPLIT_INDEX_ORDERED	(1 << 6)
 #define UNTRACKED_CHANGED	(1 << 7)
+#define FSMONITOR_CHANGED	(1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -342,6 +344,8 @@ struct index_state {
 	struct hashmap dir_hash;
 	unsigned char sha1[20];
 	struct untracked_cache *untracked;
+	time_t last_update;
+	struct ewah_bitmap *bitmap;
 };
 
 extern struct index_state the_index;
@@ -767,6 +771,7 @@ extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
 extern int git_db_env, git_index_env, git_graft_env, git_common_dir_env;
+extern int core_fsmonitor;
 
 /*
  * Include broken refs in all ref iterations, which will
diff --git a/config.c b/config.c
index d971cc3474..d146c88399 100644
--- a/config.c
+++ b/config.c
@@ -1224,6 +1224,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.fsmonitor")) {
+		core_fsmonitor = git_config_bool(var, value);
+		return 0;
+	}
+
 	/* Add other config variables here and to Documentation/config.txt. */
 	return platform_core_config(var, value);
 }
diff --git a/dir.c b/dir.c
index 1b5558fdf9..da428489e2 100644
--- a/dir.c
+++ b/dir.c
@@ -1652,6 +1652,18 @@ static int valid_cached_dir(struct dir_struct *dir,
 	if (!untracked)
 		return 0;
 
+	refresh_by_fsmonitor(&the_index);
+	if (dir->untracked->use_fsmonitor) {
+		/*
+		 * With fsmonitor, we can trust the untracked cache's
+		 * valid field.
+		 */
+		if (untracked->valid)
+			goto skip_stat;
+		else
+			invalidate_directory(dir->untracked, untracked);
+	}
+
 	if (stat(path->len ? path->buf : ".", &st)) {
 		invalidate_directory(dir->untracked, untracked);
 		memset(&untracked->stat_data, 0, sizeof(untracked->stat_data));
@@ -1665,6 +1677,7 @@ static int valid_cached_dir(struct dir_struct *dir,
 		return 0;
 	}
 
+skip_stat:
 	if (untracked->check_only != !!check_only) {
 		invalidate_directory(dir->untracked, untracked);
 		return 0;
diff --git a/dir.h b/dir.h
index 9e387551bd..ff6a00abcc 100644
--- a/dir.h
+++ b/dir.h
@@ -139,6 +139,8 @@ struct untracked_cache {
 	int gitignore_invalidated;
 	int dir_invalidated;
 	int dir_opened;
+	/* fsmonitor invalidation data */
+	unsigned int use_fsmonitor : 1;
 };
 
 struct dir_struct {
diff --git a/entry.c b/entry.c
index d2b512da90..c2d3c1079c 100644
--- a/entry.c
+++ b/entry.c
@@ -221,6 +221,7 @@ static int write_entry(struct cache_entry *ce,
 			lstat(ce->name, &st);
 		fill_stat_cache_info(ce, &st);
 		ce->ce_flags |= CE_UPDATE_IN_BASE;
+		ce->ce_flags |= CE_FSMONITOR_DIRTY;
 		state->istate->cache_changed |= CE_ENTRY_CHANGED;
 	}
 	return 0;
diff --git a/environment.c b/environment.c
index c57731c468..3758e76159 100644
--- a/environment.c
+++ b/environment.c
@@ -63,6 +63,7 @@ int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
 enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
+int core_fsmonitor;
 
 #ifndef PROTECT_HFS_DEFAULT
 #define PROTECT_HFS_DEFAULT 0
diff --git a/fsmonitor.c b/fsmonitor.c
new file mode 100644
index 0000000000..466a0df6a1
--- /dev/null
+++ b/fsmonitor.c
@@ -0,0 +1,233 @@
+#include "cache.h"
+#include "dir.h"
+#include "ewah/ewok.h"
+#include "run-command.h"
+#include "strbuf.h"
+#include "fsmonitor.h"
+
+static struct untracked_cache_dir *find_untracked_cache_dir(
+	struct untracked_cache *uc, struct untracked_cache_dir *ucd,
+	const char *name)
+{
+	const char *end;
+	struct untracked_cache_dir *dir = ucd;
+
+	if (!*name)
+		return dir;
+
+	end = strchr(name, '/');
+	if (end) {
+		dir = lookup_untracked(uc, ucd, name, end - name);
+		if (dir)
+			return find_untracked_cache_dir(uc, dir, end + 1);
+	}
+
+	return dir;
+}
+
+/* This function will be passed to ewah_each_bit() */
+static void mark_no_fsmonitor(size_t pos, void *is)
+{
+	struct index_state *istate = is;
+	struct untracked_cache_dir *dir;
+	struct cache_entry *ce = istate->cache[pos];
+
+	assert(pos < istate->cache_nr);
+	ce->ce_flags |= CE_FSMONITOR_DIRTY;
+
+	if (!istate->untracked || !istate->untracked->root)
+		return;
+
+	dir = find_untracked_cache_dir(istate->untracked, istate->untracked->root, ce->name);
+	if (dir)
+		dir->valid = 0;
+}
+
+int read_fsmonitor_extension(struct index_state *istate, const void *data,
+	unsigned long sz)
+{
+	const char *index = data;
+	uint32_t hdr_version;
+	uint32_t ewah_size;
+	int ret;
+
+	if (sz < sizeof(uint32_t) + sizeof(uint64_t) + sizeof(uint32_t))
+		return error("corrupt fsmonitor extension (too short)");
+
+	hdr_version = ntohl(*(uint32_t *)index);
+	index += sizeof(uint32_t);
+	if (hdr_version != 1)
+		return error("bad fsmonitor version %d", hdr_version);
+
+	istate->last_update = (time_t)ntohll(*(uint64_t *)index);
+	index += sizeof(uint64_t);
+
+	ewah_size = ntohl(*(uint32_t *)index);
+	index += sizeof(uint32_t);
+
+	istate->bitmap = ewah_new();
+	ret = ewah_read_mmap(istate->bitmap, index, ewah_size);
+	if (ret != ewah_size) {
+		ewah_free(istate->bitmap);
+		istate->bitmap = NULL;
+		return error("failed to parse ewah bitmap reading fsmonitor index extension");
+	}
+
+	return 0;
+}
+
+void write_fsmonitor_extension(struct strbuf *sb, struct index_state* istate)
+{
+	uint32_t hdr_version;
+	uint64_t tm;
+	struct ewah_bitmap *bitmap;
+	int i;
+	uint32_t ewah_start;
+	uint32_t ewah_size = 0;
+	int fixup = 0;
+
+	hdr_version = htonl(1);
+	strbuf_add(sb, &hdr_version, sizeof(uint32_t));
+
+	tm = htonll((uint64_t)istate->last_update);
+	strbuf_add(sb, &tm, sizeof(uint64_t));
+	fixup = sb->len;
+	strbuf_add(sb, &ewah_size, sizeof(uint32_t)); /* we'll fix this up later */
+
+	ewah_start = sb->len;
+	bitmap = ewah_new();
+	for (i = 0; i < istate->cache_nr; i++)
+		if (istate->cache[i]->ce_flags & CE_FSMONITOR_DIRTY)
+			ewah_set(bitmap, i);
+	ewah_serialize_strbuf(bitmap, sb);
+	ewah_free(bitmap);
+
+	/* fix up size field */
+	ewah_size = htonl(sb->len - ewah_start);
+	memcpy(sb->buf + fixup, &ewah_size, sizeof(uint32_t));
+}
+
+static int update_istate(const char *name, void *is)
+{
+	struct index_state *istate = is;
+	struct untracked_cache_dir *dir;
+	int pos;
+
+	/* find it in the index and mark that entry as dirty */
+	pos = index_name_pos(istate, name, strlen(name));
+	if (pos >= 0)
+		istate->cache[pos]->ce_flags |= CE_FSMONITOR_DIRTY;
+
+	/*
+	 * Find the corresponding directory in the untracked cache
+	 * and mark it as invalid
+	 */
+	if (!istate->untracked || !istate->untracked->root)
+		return 0;
+
+	dir = find_untracked_cache_dir(istate->untracked, istate->untracked->root, name);
+	if (dir)
+		dir->valid = 0;
+
+	return 0;
+}
+
+/*
+ * Call the query-fsmonitor hook passing the time of the last saved results.
+ */
+static int query_fsmonitor(time_t last_update, struct strbuf *buffer)
+{
+	struct child_process cp = CHILD_PROCESS_INIT;
+	char date[64];
+	const char *argv[3];
+
+	if (!(argv[0] = find_hook("query-fsmonitor")))
+		return -1;
+
+	snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update);
+	argv[1] = date;
+	argv[2] = NULL;
+	cp.argv = argv;
+	cp.out = -1;
+
+	return capture_command(&cp, buffer, 1024);
+}
+
+void process_fsmonitor_extension(struct index_state *istate)
+{
+	if (!istate->bitmap)
+		return;
+
+	ewah_each_bit(istate->bitmap, mark_no_fsmonitor, istate);
+	ewah_free(istate->bitmap);
+	istate->bitmap = NULL;
+}
+
+void refresh_by_fsmonitor(struct index_state *istate)
+{
+	static has_run_once = FALSE;
+	struct strbuf buffer = STRBUF_INIT;
+	int query_success = 0;
+	size_t bol = 0; /* beginning of line */
+	time_t last_update;
+	char *buf, *entry;
+	int i;
+
+	if (!core_fsmonitor || has_run_once)
+		return;
+	has_run_once = TRUE;
+
+	/*
+	 * This could be racy so save the date/time now and the hook
+	 * should be inclusive to ensure we don't miss potential changes.
+	 */
+	last_update = time(NULL);
+
+	/* If we have a last update time, call query-monitor for the set of changes since that time */
+	if (istate->last_update) {
+		query_success = !query_fsmonitor(istate->last_update, &buffer);
+	}
+
+	if (query_success) {
+		/* Mark all entries returned by the monitor as dirty */
+		buf = entry = buffer.buf;
+		for (i = 0; i < buffer.len; i++) {
+			if (buf[i] != '\0')
+				continue;
+			update_istate(buf + bol, istate);
+			bol = i + 1;
+		}
+		if (bol < buffer.len)
+			update_istate(buf + bol, istate);
+
+		/* Mark all clean entries up-to-date */
+		for (i = 0; i < istate->cache_nr; i++) {
+			struct cache_entry *ce = istate->cache[i];
+			if (ce_stage(ce) || (ce->ce_flags & CE_FSMONITOR_DIRTY))
+				continue;
+			ce_mark_uptodate(ce);
+		}
+
+		/*
+		 * Now that we've marked the invalid entries in the
+		 * untracked-cache itself, we can mark the untracked cache for
+		 * fsmonitor usage.
+		 */
+		if (istate->untracked) {
+			istate->untracked->use_fsmonitor = 1;
+		}
+	}
+	else {
+		/* if we can't update the cache, fall back to checking them all */
+		for (i = 0; i < istate->cache_nr; i++)
+			istate->cache[i]->ce_flags |= CE_FSMONITOR_DIRTY;
+
+		/* mark the untracked cache as unusable for fsmonitor */
+		if (istate->untracked)
+			istate->untracked->use_fsmonitor = 0;
+	}
+
+	/* Now that we've updated istate, save the last_update time */
+	istate->last_update = last_update;
+	istate->cache_changed |= FSMONITOR_CHANGED;
+}
diff --git a/fsmonitor.h b/fsmonitor.h
new file mode 100644
index 0000000000..28e61602b1
--- /dev/null
+++ b/fsmonitor.h
@@ -0,0 +1,9 @@
+#ifndef FSMONITOR_H
+#define FSMONITOR_H
+
+int read_fsmonitor_extension(struct index_state *istate, const void *data, unsigned long sz);
+void write_fsmonitor_extension(struct strbuf *sb, struct index_state* istate);
+void process_fsmonitor_extension(struct index_state *istate);
+void refresh_by_fsmonitor(struct index_state *istate);
+
+#endif
diff --git a/read-cache.c b/read-cache.c
index dce05c1dde..a081c4a1d6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -18,6 +18,7 @@
 #include "varint.h"
 #include "split-index.h"
 #include "utf8.h"
+#include "fsmonitor.h"
 
 #ifndef NO_PTHREADS
 #include <pthread.h>
@@ -41,11 +42,12 @@
 #define CACHE_EXT_RESOLVE_UNDO 0x52455543 /* "REUC" */
 #define CACHE_EXT_LINK 0x6c696e6b	  /* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452	  /* "UNTR" */
+#define CACHE_EXT_FSMONITOR 0x46534D4E	  /* "FSMN" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
 		 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
-		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED)
+		 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | FSMONITOR_CHANGED)
 
 struct index_state the_index;
 static const char *alternate_index_output;
@@ -65,6 +67,7 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
 	free(old);
 	set_index_entry(istate, nr, ce);
 	ce->ce_flags |= CE_UPDATE_IN_BASE;
+	ce->ce_flags |= CE_FSMONITOR_DIRTY;
 	istate->cache_changed |= CE_ENTRY_CHANGED;
 }
 
@@ -781,6 +784,7 @@ int chmod_index_entry(struct index_state *istate, struct cache_entry *ce,
 	}
 	cache_tree_invalidate_path(istate, ce->name);
 	ce->ce_flags |= CE_UPDATE_IN_BASE;
+	ce->ce_flags |= CE_FSMONITOR_DIRTY;
 	istate->cache_changed |= CE_ENTRY_CHANGED;
 
 	return 0;
@@ -1348,6 +1352,8 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	const char *added_fmt;
 	const char *unmerged_fmt;
 
+	refresh_by_fsmonitor(istate);
+
 	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
 	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
 	typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
@@ -1384,8 +1390,11 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 			continue;
 
 		new = refresh_cache_ent(istate, ce, options, &cache_errno, &changed);
-		if (new == ce)
+		if (new == ce) {
+			ce->ce_flags &= ~CE_FSMONITOR_DIRTY;
 			continue;
+		}
+
 		if (!new) {
 			const char *fmt;
 
@@ -1395,6 +1404,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 				 */
 				ce->ce_flags &= ~CE_VALID;
 				ce->ce_flags |= CE_UPDATE_IN_BASE;
+				ce->ce_flags |= CE_FSMONITOR_DIRTY;
 				istate->cache_changed |= CE_ENTRY_CHANGED;
 			}
 			if (quiet)
@@ -1581,6 +1591,9 @@ static int read_index_extension(struct index_state *istate,
 	case CACHE_EXT_UNTRACKED:
 		istate->untracked = read_untracked_extension(data, sz);
 		break;
+	case CACHE_EXT_FSMONITOR:
+		read_fsmonitor_extension(istate, data, sz);
+		break;
 	default:
 		if (*ext < 'A' || 'Z' < *ext)
 			return error("index uses %.4s extension, which we do not understand",
@@ -1753,6 +1766,7 @@ static void post_read_index_from(struct index_state *istate)
 	check_ce_order(istate);
 	tweak_untracked_cache(istate);
 	tweak_split_index(istate);
+	process_fsmonitor_extension(istate);
 }
 
 /* remember to discard_cache() before reading a different cache! */
@@ -2358,6 +2372,16 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		if (err)
 			return -1;
 	}
+	if (!strip_extensions && istate->last_update) {
+		struct strbuf sb = STRBUF_INIT;
+
+		write_fsmonitor_extension(&sb, istate);
+		err = write_index_ext_header(&c, newfd, CACHE_EXT_FSMONITOR, sb.len) < 0
+			|| ce_write(&c, newfd, sb.buf, sb.len) < 0;
+		strbuf_release(&sb);
+		if (err)
+			return -1;
+	}
 
 	if (ce_flush(&c, newfd, istate->sha1))
 		return -1;
diff --git a/unpack-trees.c b/unpack-trees.c
index aa15111fef..259e6960b9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -412,6 +412,7 @@ static int apply_sparse_checkout(struct index_state *istate,
 		ce->ce_flags &= ~CE_SKIP_WORKTREE;
 	if (was_skip_worktree != ce_skip_worktree(ce)) {
 		ce->ce_flags |= CE_UPDATE_IN_BASE;
+		ce->ce_flags |= CE_FSMONITOR_DIRTY;
 		istate->cache_changed |= CE_ENTRY_CHANGED;
 	}
 
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v1 3/5] fsmonitor: add test cases for fsmonitor extension
  2017-05-15 19:13 [PATCH v1 0/5] Fast git status via a file system watcher Ben Peart
  2017-05-15 19:13 ` [PATCH v1 1/5] dir: make lookup_untracked() available outside of dir.c Ben Peart
  2017-05-15 19:13 ` [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
@ 2017-05-15 19:13 ` Ben Peart
  2017-05-16  4:59   ` Junio C Hamano
  2017-05-15 19:13 ` [PATCH v1 4/5] Add documentation for the fsmonitor extension. This includes the core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor index extension Ben Peart
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Ben Peart @ 2017-05-15 19:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

Add test cases that ensure status results are correct when using the new
fsmonitor extension.  Test untracked, modified, and new files by
ensuring the results are identical to when not using the extension.

Add a test to ensure updates to the index properly mark corresponding
entries in the index extension as dirty so that the status is correct
after commands that modify the index but don't trigger changes in the
working directory.

Add a test that verifies that if the fsmonitor extension doesn't tell
git about a change, it doesn't discover it on its own.  This ensures
git is honoring the extension and that we get the performance benefits
desired.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 t/t7519-status-fsmonitor.sh | 134 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100644 t/t7519-status-fsmonitor.sh

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
new file mode 100644
index 0000000000..2d63efc27b
--- /dev/null
+++ b/t/t7519-status-fsmonitor.sh
@@ -0,0 +1,134 @@
+#!/bin/sh
+
+test_description='git status with file system watcher'
+
+. ./test-lib.sh
+
+clean_repo () {
+	git reset --hard HEAD
+	git clean -fd
+}
+
+dirty_repo () {
+	: >untracked
+	: >dir1/untracked
+	: >dir2/untracked
+	echo 1 >modified
+	echo 2 >dir1/modified
+	echo 3 >dir2/modified
+	echo 4 >new
+	echo 5 >dir1/new
+	echo 6 >dir2/new
+	git add new
+	git add dir1/new
+	git add dir2/new
+}
+
+test_expect_success 'setup' '
+	mkdir -p .git/hooks &&
+	write_script .git/hooks/query-fsmonitor<<-\EOF &&
+	printf "untracked\0"
+	printf "dir1/untracked\0"
+	printf "dir2/untracked\0"
+	printf "modified\0"
+	printf "dir1/modified\0"
+	printf "dir2/modified\0"
+	printf "new\0""
+	printf "dir1/new\0"
+	printf "dir2/new\0"
+	EOF
+	: >tracked &&
+	: >modified &&
+	mkdir dir1 &&
+	: >dir1/tracked &&
+	: >dir1/modified &&
+	mkdir dir2 &&
+	: >dir2/tracked &&
+	: >dir2/modified &&
+	git add . &&
+	test_tick &&
+	git commit -m initial &&
+	dirty_repo
+'
+
+cat >.gitignore <<\EOF
+.gitignore
+expect*
+output*
+EOF
+
+# Status is well tested elsewhere so we'll just ensure that the results are
+# the same when using core.fsmonitor. First call after turning on the option
+# does a complete scan so need to do two calls to ensure we test the new
+# codepath.
+
+test_expect_success 'status with core.untrackedcache true' '
+	git config core.fsmonitor true  &&
+	git config core.untrackedcache true &&
+	git -c core.fsmonitor=false -c core.untrackedcache=true status >expect &&
+	clean_repo &&
+	git status &&
+	dirty_repo &&
+	git status >output &&
+	test_i18ncmp expect output
+'
+
+test_expect_success 'status with core.untrackedcache false' '
+	git config core.fsmonitor true &&
+	git config core.untrackedcache false &&
+	git -c core.fsmonitor=false -c core.untrackedcache=false status >expect &&
+	clean_repo &&
+	git status &&
+	dirty_repo &&
+	git status >output &&
+	test_i18ncmp expect output
+'
+
+# Ensure commands that call refresh_index() to move the index back in time
+# properly invalidate the fsmonitor cache
+
+test_expect_success 'refresh_index() invalidates fsmonitor cache' '
+	git config core.fsmonitor true &&
+	git config core.untrackedcache true &&
+	clean_repo &&
+	git status &&
+	dirty_repo &&
+	write_script .git/hooks/query-fsmonitor<<-\EOF &&
+	EOF
+	git add . &&
+	git commit -m "to reset" &&
+	git status &&
+	git reset HEAD~1 &&
+	git status >output &&
+	git -c core.fsmonitor=false status >expect &&
+	test_i18ncmp expect output
+'
+
+# Now make sure it's actually skipping the check for modified and untracked
+# files unless it is told about them.  Note, after "git reset --hard HEAD" no
+# extensions exist other than 'TREE' so do a "git status" to get the extension
+# written before testing the results.
+
+test_expect_success 'status doesnt detect unreported modifications' '
+	git config core.fsmonitor true &&
+	git config core.untrackedcache true &&
+	write_script .git/hooks/query-fsmonitor<<-\EOF &&
+	:
+	EOF
+	clean_repo &&
+	git status &&
+	: >untracked &&
+	echo 2 >dir1/modified &&
+	git status >output &&
+	test_i18ngrep ! "Changes not staged for commit:" output &&
+	test_i18ngrep ! "Untracked files:" output &&
+	write_script .git/hooks/query-fsmonitor<<-\EOF &&
+	printf "untracked%s\0"
+	printf "dir1/modified\0"
+	EOF
+	git status >output &&
+	test_i18ngrep "Changes not staged for commit:" output &&
+	test_i18ngrep "Untracked files:" output
+'
+
+test_done
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v1 4/5] Add documentation for the fsmonitor extension.  This includes the core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor index extension.
  2017-05-15 19:13 [PATCH v1 0/5] Fast git status via a file system watcher Ben Peart
                   ` (2 preceding siblings ...)
  2017-05-15 19:13 ` [PATCH v1 3/5] fsmonitor: add test cases for fsmonitor extension Ben Peart
@ 2017-05-15 19:13 ` Ben Peart
  2017-05-15 19:13 ` [PATCH v1 5/5] Add a sample query-fsmonitor hook script that integrates with the cross platform Watchman file watching service Ben Peart
  2017-05-16  5:00 ` [PATCH v1 0/5] Fast git status via a file system watcher Junio C Hamano
  5 siblings, 0 replies; 26+ messages in thread
From: Ben Peart @ 2017-05-15 19:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 Documentation/config.txt                 |  7 +++++++
 Documentation/githooks.txt               | 23 +++++++++++++++++++++++
 Documentation/technical/index-format.txt | 18 ++++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index bc7088b287..a9a58cb8a6 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -391,6 +391,13 @@ core.protectNTFS::
 	8.3 "short" names.
 	Defaults to `true` on Windows, and `false` elsewhere.
 
+core.fsmonitor::
+	If set to true, call the query-fsmonitor hook proc which will
+	identify all files that may have had changes since the last
+	request. This information is used to speed up operations like
+	'git commit' and 'git status' by limiting what git must scan to
+	detect changes.
+
 core.trustctime::
 	If false, the ctime differences between the index and the
 	working tree are ignored; useful when the inode change time
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 706091a569..f7b4b4a844 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -448,6 +448,29 @@ The commits are guaranteed to be listed in the order that they were
 processed by rebase.
 
 
+[[query-fsmonitor]]
+query-fsmonitor
+~~~~~~~~~~~~
+
+This hook is invoked when the configuration option core.fsmonitor is
+set and git needs to identify changed or untracked files.  It takes
+a single argument which is the time in elapsed seconds since midnight,
+January 1, 1970.
+
+The hook should output to stdout the list of all files in the working
+directory that may have changed since the requested time.  The logic
+should be inclusive so that it does not miss any potential changes.
+The paths should be relative to the root of the working directory
+and be separated by a single NUL.
+
+Git will limit what files it checks for changes as well as which
+directories are checked for untracked files based on the path names
+given.
+
+The exit status determines whether git will use the data from the
+hook to limit its search.  On error, it will fall back to verifying
+all files and folders.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/Documentation/technical/index-format.txt b/Documentation/technical/index-format.txt
index ade0b0c445..b002d23c05 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -295,3 +295,21 @@ The remaining data of each directory block is grouped by type:
     in the previous ewah bitmap.
 
   - One NUL.
+
+== File System Monitor cache
+
+  The file system monitor cache tracks files for which the query-fsmonitor
+  hook has told us about changes.  The signature for this extension is
+  { 'F', 'S', 'M', 'N' }.
+
+  The extension starts with
+
+  - 32-bit version number: the current supported version is 1.
+
+  - 64-bit time: the extension data reflects all changes through the given
+	time which is stored as the seconds elapsed since midnight, January 1, 1970.
+
+  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
+
+  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
+    is CE_FSMONITOR_DIRTY.
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v1 5/5] Add a sample query-fsmonitor hook script that integrates with the cross platform Watchman file watching service.
  2017-05-15 19:13 [PATCH v1 0/5] Fast git status via a file system watcher Ben Peart
                   ` (3 preceding siblings ...)
  2017-05-15 19:13 ` [PATCH v1 4/5] Add documentation for the fsmonitor extension. This includes the core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor index extension Ben Peart
@ 2017-05-15 19:13 ` Ben Peart
  2017-05-15 19:50   ` David Turner
  2017-05-16  5:00 ` [PATCH v1 0/5] Fast git status via a file system watcher Junio C Hamano
  5 siblings, 1 reply; 26+ messages in thread
From: Ben Peart @ 2017-05-15 19:13 UTC (permalink / raw)
  To: git; +Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

To use the script:

Download and install Watchman from https://facebook.github.io/watchman/
and instruct Watchman to watch your working directory for changes
('watchman watch-project /usr/src/git').

Rename the sample integration hook from query-fsmonitor.sample to
query-fsmonitor.

Configure git to use the extension ('git config core.fsmonitor true')
and optionally turn on the untracked cache for optimal performance
('git config core.untrackedcache true').

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 templates/hooks--query-fsmonitor.sample | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 templates/hooks--query-fsmonitor.sample

diff --git a/templates/hooks--query-fsmonitor.sample b/templates/hooks--query-fsmonitor.sample
new file mode 100644
index 0000000000..4bd22f21d8
--- /dev/null
+++ b/templates/hooks--query-fsmonitor.sample
@@ -0,0 +1,27 @@
+#!/bin/sh
+#
+# An example hook script to integrate Watchman
+# (https://facebook.github.io/watchman/) with git to provide fast
+# git status.
+#
+# The hook is passed a time_t formatted as a string and outputs to
+# stdout all files that have been modified since the given time.
+# Paths must be relative to the root of the working tree and
+# separated by a single NUL.
+#
+# To enable this hook, rename this file to "query-fsmonitor"
+
+# Convert unix style paths to escaped Windows style paths
+case "$(uname -s)" in
+MINGW*|MSYS_NT*)
+  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,\\\\,g')"
+  ;;
+*)
+  GIT_WORK_TREE="$PWD"
+  ;;
+esac
+
+# Query Watchman for all the changes since the requested time
+echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1, \"fields\":[\"name\"]}]" | \
+watchman -j | \
+perl -e 'use JSON::PP; my $o = JSON::PP->new->utf8->decode(join("", <>)); die "Watchman: $o->{'error'}.\nFalling back to scanning...\n" if defined($o->{"error"}); print(join("\0", @{$o->{"files"}}));'
-- 
2.13.0.windows.1.6.g4597375fc3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* RE: [PATCH v1 5/5] Add a sample query-fsmonitor hook script that integrates with the cross platform Watchman file watching service.
  2017-05-15 19:13 ` [PATCH v1 5/5] Add a sample query-fsmonitor hook script that integrates with the cross platform Watchman file watching service Ben Peart
@ 2017-05-15 19:50   ` David Turner
  2017-05-15 20:10     ` Ben Peart
  0 siblings, 1 reply; 26+ messages in thread
From: David Turner @ 2017-05-15 19:50 UTC (permalink / raw)
  To: 'Ben Peart', git@vger.kernel.org
  Cc: gitster@pobox.com, benpeart@microsoft.com, pclouds@gmail.com,
	johannes.schindelin@gmx.de, peff@peff.net



> -----Original Message-----
> From: Ben Peart [mailto:peartben@gmail.com]
> Sent: Monday, May 15, 2017 3:14 PM
> To: git@vger.kernel.org
> Cc: gitster@pobox.com; benpeart@microsoft.com; pclouds@gmail.com;
> johannes.schindelin@gmx.de; David Turner <David.Turner@twosigma.com>;
> peff@peff.net
> Subject: [PATCH v1 5/5] Add a sample query-fsmonitor hook script that
> integrates with the cross platform Watchman file watching service.
> 
> To use the script:
> 
> Download and install Watchman from https://facebook.github.io/watchman/
> and instruct Watchman to watch your working directory for changes
> ('watchman watch-project /usr/src/git').
> 
> Rename the sample integration hook from query-fsmonitor.sample to query-
> fsmonitor.
> 
> Configure git to use the extension ('git config core.fsmonitor true') and
> optionally turn on the untracked cache for optimal performance ('git config
> core.untrackedcache true').
> 
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  templates/hooks--query-fsmonitor.sample | 27
> +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>  create mode 100644 templates/hooks--query-fsmonitor.sample
> 
> diff --git a/templates/hooks--query-fsmonitor.sample b/templates/hooks--
> query-fsmonitor.sample
> new file mode 100644
> index 0000000000..4bd22f21d8
> --- /dev/null
> +++ b/templates/hooks--query-fsmonitor.sample
> @@ -0,0 +1,27 @@
> +#!/bin/sh
> +#
> +# An example hook script to integrate Watchman #
> +(https://facebook.github.io/watchman/) with git to provide fast # git
> +status.
> +#
> +# The hook is passed a time_t formatted as a string and outputs to #
> +stdout all files that have been modified since the given time.
> +# Paths must be relative to the root of the working tree and #
> +separated by a single NUL.
> +#
> +# To enable this hook, rename this file to "query-fsmonitor"
> +
> +# Convert unix style paths to escaped Windows style paths case "$(uname
> +-s)" in
> +MINGW*|MSYS_NT*)
> +  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,\\\\,g')"
> +  ;;
> +*)
> +  GIT_WORK_TREE="$PWD"
> +  ;;
> +esac
> +
> +# Query Watchman for all the changes since the requested time echo
> +"[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1,
> +\"fields\":[\"name\"]}]" | \ watchman -j | \ perl -e 'use JSON::PP; my
> +$o = JSON::PP->new->utf8->decode(join("", <>)); die "Watchman: $o-
> >{'error'}.\nFalling back to scanning...\n" if defined($o->{"error"});
> print(join("\0", @{$o->{"files"}}));'

Last time I checked, the argument to 'since' was not a time_t -- it was a 
watchman clock spec.  Have you tested this?  Does it work?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 5/5] Add a sample query-fsmonitor hook script that integrates with the cross platform Watchman file watching service.
  2017-05-15 19:50   ` David Turner
@ 2017-05-15 20:10     ` Ben Peart
  0 siblings, 0 replies; 26+ messages in thread
From: Ben Peart @ 2017-05-15 20:10 UTC (permalink / raw)
  To: David Turner, git@vger.kernel.org
  Cc: gitster@pobox.com, benpeart@microsoft.com, pclouds@gmail.com,
	johannes.schindelin@gmx.de, peff@peff.net



On 5/15/2017 3:50 PM, David Turner wrote:
>
>> -----Original Message-----
>> From: Ben Peart [mailto:peartben@gmail.com]
>> Sent: Monday, May 15, 2017 3:14 PM
>> To: git@vger.kernel.org
>> Cc: gitster@pobox.com; benpeart@microsoft.com; pclouds@gmail.com;
>> johannes.schindelin@gmx.de; David Turner <David.Turner@twosigma.com>;
>> peff@peff.net
>> Subject: [PATCH v1 5/5] Add a sample query-fsmonitor hook script that
>> integrates with the cross platform Watchman file watching service.
>>
>> To use the script:
>>
>> Download and install Watchman from https://facebook.github.io/watchman/
>> and instruct Watchman to watch your working directory for changes
>> ('watchman watch-project /usr/src/git').
>>
>> Rename the sample integration hook from query-fsmonitor.sample to query-
>> fsmonitor.
>>
>> Configure git to use the extension ('git config core.fsmonitor true') and
>> optionally turn on the untracked cache for optimal performance ('git config
>> core.untrackedcache true').
>>
>> Signed-off-by: Ben Peart <benpeart@microsoft.com>
>> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> ---
>>   templates/hooks--query-fsmonitor.sample | 27
>> +++++++++++++++++++++++++++
>>   1 file changed, 27 insertions(+)
>>   create mode 100644 templates/hooks--query-fsmonitor.sample
>>
>> diff --git a/templates/hooks--query-fsmonitor.sample b/templates/hooks--
>> query-fsmonitor.sample
>> new file mode 100644
>> index 0000000000..4bd22f21d8
>> --- /dev/null
>> +++ b/templates/hooks--query-fsmonitor.sample
>> @@ -0,0 +1,27 @@
>> +#!/bin/sh
>> +#
>> +# An example hook script to integrate Watchman #
>> +(https://facebook.github.io/watchman/) with git to provide fast # git
>> +status.
>> +#
>> +# The hook is passed a time_t formatted as a string and outputs to #
>> +stdout all files that have been modified since the given time.
>> +# Paths must be relative to the root of the working tree and #
>> +separated by a single NUL.
>> +#
>> +# To enable this hook, rename this file to "query-fsmonitor"
>> +
>> +# Convert unix style paths to escaped Windows style paths case "$(uname
>> +-s)" in
>> +MINGW*|MSYS_NT*)
>> +  GIT_WORK_TREE="$(cygpath -aw "$PWD" | sed 's,\\,\\\\,g')"
>> +  ;;
>> +*)
>> +  GIT_WORK_TREE="$PWD"
>> +  ;;
>> +esac
>> +
>> +# Query Watchman for all the changes since the requested time echo
>> +"[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1,
>> +\"fields\":[\"name\"]}]" | \ watchman -j | \ perl -e 'use JSON::PP; my
>> +$o = JSON::PP->new->utf8->decode(join("", <>)); die "Watchman: $o-
>>> {'error'}.\nFalling back to scanning...\n" if defined($o->{"error"});
>> print(join("\0", @{$o->{"files"}}));'
> Last time I checked, the argument to 'since' was not a time_t -- it was a
> watchman clock spec.  Have you tested this?  Does it work?
>

Watchman also accepts a Unix time value for "since" as documented here 
(https://facebook.github.io/watchman/docs/expr/since.html).

Yes, this has been tested and works correctly as long as you have a 
recent version that contains the patch 
(https://github.com/facebook/watchman/commit/67b26a8938336f08918fc7187129b6c1a571f35b) 
that made sure it was greedy when using the Unix time.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-15 19:13 ` [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
@ 2017-05-15 21:21   ` David Turner
  2017-05-16  1:15     ` Ben Peart
  2017-05-16  0:22   ` brian m. carlson
  2017-05-16 21:41   ` Jonathan Tan
  2 siblings, 1 reply; 26+ messages in thread
From: David Turner @ 2017-05-15 21:21 UTC (permalink / raw)
  To: 'Ben Peart', git@vger.kernel.org
  Cc: gitster@pobox.com, benpeart@microsoft.com, pclouds@gmail.com,
	johannes.schindelin@gmx.de, peff@peff.net


> -----Original Message-----
> From: Ben Peart [mailto:peartben@gmail.com]
> Sent: Monday, May 15, 2017 3:14 PM
> To: git@vger.kernel.org
> Cc: gitster@pobox.com; benpeart@microsoft.com; pclouds@gmail.com;
> johannes.schindelin@gmx.de; David Turner <David.Turner@twosigma.com>;
> peff@peff.net
> Subject: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to
> speed up detecting new or changed files.

> @@ -342,6 +344,8 @@ struct index_state {
>  	struct hashmap dir_hash;
>  	unsigned char sha1[20];
>  	struct untracked_cache *untracked;
> +	time_t last_update;
> +	struct ewah_bitmap *bitmap;

The name 'bitmap' doesn't tell the reader much about what it used for.

> +static int update_istate(const char *name, void *is) {

Rename to mark_file_dirty?  Also why does it take a void pointer?  Or return int (rather than void)?

> +void refresh_by_fsmonitor(struct index_state *istate) {
> +	static has_run_once = FALSE;
> +	struct strbuf buffer = STRBUF_INIT;

Rename to query_result? Also I think you're leaking it.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-15 19:13 ` [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
  2017-05-15 21:21   ` David Turner
@ 2017-05-16  0:22   ` brian m. carlson
  2017-05-16  0:34     ` Jeff King
  2017-05-16 21:41   ` Jonathan Tan
  2 siblings, 1 reply; 26+ messages in thread
From: brian m. carlson @ 2017-05-16  0:22 UTC (permalink / raw)
  To: Ben Peart
  Cc: git, gitster, benpeart, pclouds, johannes.schindelin,
	David.Turner, peff

[-- Attachment #1: Type: text/plain, Size: 1395 bytes --]

On Mon, May 15, 2017 at 03:13:44PM -0400, Ben Peart wrote:
> +	istate->last_update = (time_t)ntohll(*(uint64_t *)index);
> +	index += sizeof(uint64_t);
> +
> +	ewah_size = ntohl(*(uint32_t *)index);
> +	index += sizeof(uint32_t);

To answer the question you asked in your cover letter, you cannot write
this unless you can guarantee (((uintptr_t)index & 7) == 0) is true.
Otherwise, this will produce a SIGBUS on SPARC, Alpha, MIPS, and some
ARM systems, and it will perform poorly on PowerPC and other ARM
systems[0].

If you got that pointer from malloc and have only indexed multiples of 8
on it, you're good.  But if you're not sure, you probably want to use
memcpy.  If the compiler can determine that it's not necessary, it will
omit the copy and perform a direct load.

[0] To be technically correct, all of those systems except SPARC can
have unaligned access fixed up automatically, depending on the kernel
settings.  But such a fixup involves taking a trap into the kernel,
performing two aligned loads and bit shifting, and returning to
userspace, which performs about as well as you'd expect.  For that
reason, Debian build machines have such fixups turned off and will just
SIGBUS.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-16  0:22   ` brian m. carlson
@ 2017-05-16  0:34     ` Jeff King
  2017-05-16  1:55       ` Ben Peart
  0 siblings, 1 reply; 26+ messages in thread
From: Jeff King @ 2017-05-16  0:34 UTC (permalink / raw)
  To: brian m. carlson, Ben Peart, git, gitster, benpeart, pclouds,
	johannes.schindelin, David.Turner

On Tue, May 16, 2017 at 12:22:14AM +0000, brian m. carlson wrote:

> On Mon, May 15, 2017 at 03:13:44PM -0400, Ben Peart wrote:
> > +	istate->last_update = (time_t)ntohll(*(uint64_t *)index);
> > +	index += sizeof(uint64_t);
> > +
> > +	ewah_size = ntohl(*(uint32_t *)index);
> > +	index += sizeof(uint32_t);
> 
> To answer the question you asked in your cover letter, you cannot write
> this unless you can guarantee (((uintptr_t)index & 7) == 0) is true.
> Otherwise, this will produce a SIGBUS on SPARC, Alpha, MIPS, and some
> ARM systems, and it will perform poorly on PowerPC and other ARM
> systems[0].
> 
> If you got that pointer from malloc and have only indexed multiples of 8
> on it, you're good.  But if you're not sure, you probably want to use
> memcpy.  If the compiler can determine that it's not necessary, it will
> omit the copy and perform a direct load.

I think get_be32() does exactly what we want for the ewah_size read. For
the last_update one, we don't have a get_be64() yet, but it should be
easy to make based on the 16/32 versions.

(I note also that time_t is not necessarily 64-bits in the first place,
but David said something about this not really being a time_t).

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-15 21:21   ` David Turner
@ 2017-05-16  1:15     ` Ben Peart
  0 siblings, 0 replies; 26+ messages in thread
From: Ben Peart @ 2017-05-16  1:15 UTC (permalink / raw)
  To: David Turner, git@vger.kernel.org
  Cc: gitster@pobox.com, benpeart@microsoft.com, pclouds@gmail.com,
	johannes.schindelin@gmx.de, peff@peff.net


On 5/15/2017 5:21 PM, David Turner wrote:
>
>> -----Original Message-----
>> From: Ben Peart [mailto:peartben@gmail.com]
>> Sent: Monday, May 15, 2017 3:14 PM
>> To: git@vger.kernel.org
>> Cc: gitster@pobox.com; benpeart@microsoft.com; pclouds@gmail.com;
>> johannes.schindelin@gmx.de; David Turner <David.Turner@twosigma.com>;
>> peff@peff.net
>> Subject: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to
>> speed up detecting new or changed files.
>
>> @@ -342,6 +344,8 @@ struct index_state {
>>  	struct hashmap dir_hash;
>>  	unsigned char sha1[20];
>>  	struct untracked_cache *untracked;
>> +	time_t last_update;
>> +	struct ewah_bitmap *bitmap;
>
> The name 'bitmap' doesn't tell the reader much about what it used for.
>
>> +static int update_istate(const char *name, void *is) {
>
> Rename to mark_file_dirty?  Also why does it take a void pointer?  Or return int (rather than void)?
>

Thanks for the feedback.  I'll do some renaming and change the types passed.


>> +void refresh_by_fsmonitor(struct index_state *istate) {
>> +	static has_run_once = FALSE;
>> +	struct strbuf buffer = STRBUF_INIT;
>
> Rename to query_result? Also I think you're leaking it.
>

Good catch!  I missed the leak there.  Fixed for the next roll.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-16  0:34     ` Jeff King
@ 2017-05-16  1:55       ` Ben Peart
  2017-05-16  2:51         ` Jeff King
  2017-05-16 17:17         ` Ben Peart
  0 siblings, 2 replies; 26+ messages in thread
From: Ben Peart @ 2017-05-16  1:55 UTC (permalink / raw)
  To: Jeff King, brian m. carlson, git, gitster, benpeart, pclouds,
	johannes.schindelin, David.Turner



On 5/15/2017 8:34 PM, Jeff King wrote:
> On Tue, May 16, 2017 at 12:22:14AM +0000, brian m. carlson wrote:
>
>> On Mon, May 15, 2017 at 03:13:44PM -0400, Ben Peart wrote:
>>> +	istate->last_update = (time_t)ntohll(*(uint64_t *)index);
>>> +	index += sizeof(uint64_t);
>>> +
>>> +	ewah_size = ntohl(*(uint32_t *)index);
>>> +	index += sizeof(uint32_t);
>>
>> To answer the question you asked in your cover letter, you cannot write
>> this unless you can guarantee (((uintptr_t)index & 7) == 0) is true.
>> Otherwise, this will produce a SIGBUS on SPARC, Alpha, MIPS, and some
>> ARM systems, and it will perform poorly on PowerPC and other ARM
>> systems[0].
>>
>> If you got that pointer from malloc and have only indexed multiples of 8
>> on it, you're good.  But if you're not sure, you probably want to use
>> memcpy.  If the compiler can determine that it's not necessary, it will
>> omit the copy and perform a direct load.
>
> I think get_be32() does exactly what we want for the ewah_size read. For
> the last_update one, we don't have a get_be64() yet, but it should be
> easy to make based on the 16/32 versions.

Thanks for the pointers.  I'll update this to use the existing get_be32 
and have created a get_be64 and will use that for the last_update.

>
> (I note also that time_t is not necessarily 64-bits in the first place,
> but David said something about this not really being a time_t).
>

The in memory representation is a time_t as that is the return value of 
time(NULL) but it is converted to/from a 64 bit value when written/read 
to the index extension so that the index format is the same no matter 
the native size of time_t.

> -Peff
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-16  1:55       ` Ben Peart
@ 2017-05-16  2:51         ` Jeff King
  2017-05-16 17:17         ` Ben Peart
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff King @ 2017-05-16  2:51 UTC (permalink / raw)
  To: Ben Peart
  Cc: brian m. carlson, git, gitster, benpeart, pclouds,
	johannes.schindelin, David.Turner

On Mon, May 15, 2017 at 09:55:12PM -0400, Ben Peart wrote:

> > > > +	istate->last_update = (time_t)ntohll(*(uint64_t *)index);
> [...]
> > (I note also that time_t is not necessarily 64-bits in the first place,
> > but David said something about this not really being a time_t).
> 
> The in memory representation is a time_t as that is the return value of
> time(NULL) but it is converted to/from a 64 bit value when written/read to
> the index extension so that the index format is the same no matter the
> native size of time_t.

OK. I guess your cast here will truncate on 32-bit systems, but
presumably not until 2038, so we can perhaps ignore it for now (and
anyway, time(NULL) will be broken on such a system at that point).

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 3/5] fsmonitor: add test cases for fsmonitor extension
  2017-05-15 19:13 ` [PATCH v1 3/5] fsmonitor: add test cases for fsmonitor extension Ben Peart
@ 2017-05-16  4:59   ` Junio C Hamano
  2017-05-16 14:28     ` Ben Peart
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2017-05-16  4:59 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, benpeart, pclouds, johannes.schindelin, David.Turner, peff

Ben Peart <peartben@gmail.com> writes:

> Add test cases that ensure status results are correct when using the new
> fsmonitor extension.  Test untracked, modified, and new files by
> ensuring the results are identical to when not using the extension.
>
> Add a test to ensure updates to the index properly mark corresponding
> entries in the index extension as dirty so that the status is correct
> after commands that modify the index but don't trigger changes in the
> working directory.
>
> Add a test that verifies that if the fsmonitor extension doesn't tell
> git about a change, it doesn't discover it on its own.  This ensures
> git is honoring the extension and that we get the performance benefits
> desired.
>
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
>  t/t7519-status-fsmonitor.sh | 134 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 134 insertions(+)
>  create mode 100644 t/t7519-status-fsmonitor.sh

Please make this executable.

> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> new file mode 100644
> index 0000000000..2d63efc27b
> --- /dev/null
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -0,0 +1,134 @@
> ...
> +# Ensure commands that call refresh_index() to move the index back in time
> +# properly invalidate the fsmonitor cache
> +...
> +	git status >output &&
> +	git -c core.fsmonitor=false status >expect &&
> +	test_i18ncmp expect output
> +'

Hmm. I wonder if we can somehow detect the case where we got the
correct and expected result only because fsmonitor was not in
effect, even though the test requested it to be used?  Not limited
to this particular test piece, but applies to all of them in this
file.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 0/5] Fast git status via a file system watcher
  2017-05-15 19:13 [PATCH v1 0/5] Fast git status via a file system watcher Ben Peart
                   ` (4 preceding siblings ...)
  2017-05-15 19:13 ` [PATCH v1 5/5] Add a sample query-fsmonitor hook script that integrates with the cross platform Watchman file watching service Ben Peart
@ 2017-05-16  5:00 ` Junio C Hamano
  5 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2017-05-16  5:00 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, benpeart, pclouds, johannes.schindelin, David.Turner, peff

Ben Peart <peartben@gmail.com> writes:

>   Add documentation for the fsmonitor extension.  This includes the
>     core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor
>     index extension.
>   Add a sample query-fsmonitor hook script that integrates with the
>     cross platform Watchman file watching service.

These two have looong titles ;-)  Accident?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 1/5] dir: make lookup_untracked() available outside of dir.c
  2017-05-15 19:13 ` [PATCH v1 1/5] dir: make lookup_untracked() available outside of dir.c Ben Peart
@ 2017-05-16  5:01   ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2017-05-16  5:01 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, benpeart, pclouds, johannes.schindelin, David.Turner, peff

Ben Peart <peartben@gmail.com> writes:

> Remove the static qualifier from lookup_untracked() and make it
> available to other modules by exporting it from dir.h.

Surely that is what you did in this patch, but leaves readers in
suspense wondering why this helper needs to be available to others
in the first place ;-)  Let's read on.

>
> Signed-off-by: Ben Peart <benpeart@microsoft.com>
> ---
>  dir.c | 2 +-
>  dir.h | 3 +++
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/dir.c b/dir.c
> index f451bfa48c..1b5558fdf9 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -660,7 +660,7 @@ static void trim_trailing_spaces(char *buf)
>   *
>   * If "name" has the trailing slash, it'll be excluded in the search.
>   */
> -static struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
> +struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
>  						    struct untracked_cache_dir *dir,
>  						    const char *name, int len)
>  {
> diff --git a/dir.h b/dir.h
> index bf23a470af..9e387551bd 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -339,4 +339,7 @@ extern void connect_work_tree_and_git_dir(const char *work_tree, const char *git
>  extern void relocate_gitdir(const char *path,
>  			    const char *old_git_dir,
>  			    const char *new_git_dir);
> +struct untracked_cache_dir *lookup_untracked(struct untracked_cache *uc,
> +					     struct untracked_cache_dir *dir,
> +					     const char *name, int len);
>  #endif

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 3/5] fsmonitor: add test cases for fsmonitor extension
  2017-05-16  4:59   ` Junio C Hamano
@ 2017-05-16 14:28     ` Ben Peart
  0 siblings, 0 replies; 26+ messages in thread
From: Ben Peart @ 2017-05-16 14:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, benpeart, pclouds, johannes.schindelin, David.Turner, peff



On 5/16/2017 12:59 AM, Junio C Hamano wrote:
> Ben Peart <peartben@gmail.com> writes:
>
>> Add test cases that ensure status results are correct when using the new
>> fsmonitor extension.  Test untracked, modified, and new files by
>> ensuring the results are identical to when not using the extension.
>>
>> Add a test to ensure updates to the index properly mark corresponding
>> entries in the index extension as dirty so that the status is correct
>> after commands that modify the index but don't trigger changes in the
>> working directory.
>>
>> Add a test that verifies that if the fsmonitor extension doesn't tell
>> git about a change, it doesn't discover it on its own.  This ensures
>> git is honoring the extension and that we get the performance benefits
>> desired.
>>
>> Signed-off-by: Ben Peart <benpeart@microsoft.com>
>> ---
>>  t/t7519-status-fsmonitor.sh | 134 ++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 134 insertions(+)
>>  create mode 100644 t/t7519-status-fsmonitor.sh
>
> Please make this executable.
>

Sorry, long time Windows developer so I forgot this extra step.  Fixed 
for next roll.

>> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
>> new file mode 100644
>> index 0000000000..2d63efc27b
>> --- /dev/null
>> +++ b/t/t7519-status-fsmonitor.sh
>> @@ -0,0 +1,134 @@
>> ...
>> +# Ensure commands that call refresh_index() to move the index back in time
>> +# properly invalidate the fsmonitor cache
>> +...
>> +	git status >output &&
>> +	git -c core.fsmonitor=false status >expect &&
>> +	test_i18ncmp expect output
>> +'
>
> Hmm. I wonder if we can somehow detect the case where we got the
> correct and expected result only because fsmonitor was not in
> effect, even though the test requested it to be used?  Not limited
> to this particular test piece, but applies to all of them in this
> file.
>

I have tested this manually by editing the test hook proc to output 
invalid results and ensured that the test failed as a result but adding 
that to the test script was kind of ugly (all tests end up getting 
duplicated - one ensuring success, one ensuring failure).

On further reflection, a better idea is to have the test hook proc 
output a marker file that can be tested for existence.  If it exists, 
the hook was used to update the results, if it doesn't exist, then the 
hook proc wasn't used.  A much cleaner solution that doesn't require 
duplicating the tests.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-16  1:55       ` Ben Peart
  2017-05-16  2:51         ` Jeff King
@ 2017-05-16 17:17         ` Ben Peart
  2017-05-16 17:49           ` Jeff King
  2017-05-16 19:13           ` Johannes Sixt
  1 sibling, 2 replies; 26+ messages in thread
From: Ben Peart @ 2017-05-16 17:17 UTC (permalink / raw)
  To: Jeff King, brian m. carlson, git, gitster, benpeart, pclouds,
	johannes.schindelin, David.Turner



On 5/15/2017 9:55 PM, Ben Peart wrote:
>
>
> On 5/15/2017 8:34 PM, Jeff King wrote:
>> On Tue, May 16, 2017 at 12:22:14AM +0000, brian m. carlson wrote:
>>
>>> On Mon, May 15, 2017 at 03:13:44PM -0400, Ben Peart wrote:
>>>> +    istate->last_update = (time_t)ntohll(*(uint64_t *)index);
>>>> +    index += sizeof(uint64_t);
>>>> +
>>>> +    ewah_size = ntohl(*(uint32_t *)index);
>>>> +    index += sizeof(uint32_t);
>>>
>>> To answer the question you asked in your cover letter, you cannot write
>>> this unless you can guarantee (((uintptr_t)index & 7) == 0) is true.
>>> Otherwise, this will produce a SIGBUS on SPARC, Alpha, MIPS, and some
>>> ARM systems, and it will perform poorly on PowerPC and other ARM
>>> systems[0].
>>>
>>> If you got that pointer from malloc and have only indexed multiples of 8
>>> on it, you're good.  But if you're not sure, you probably want to use
>>> memcpy.  If the compiler can determine that it's not necessary, it will
>>> omit the copy and perform a direct load.
>>
>> I think get_be32() does exactly what we want for the ewah_size read. For
>> the last_update one, we don't have a get_be64() yet, but it should be
>> easy to make based on the 16/32 versions.
>
> Thanks for the pointers.  I'll update this to use the existing get_be32
> and have created a get_be64 and will use that for the last_update.
>

OK, now I'm confused as to the best path for adding a get_be64.  This 
one is trivial:

#define get_be64(p)	ntohll(*(uint64_t *)(p))

but should the unaligned version be:

#define get_be64(p)	( \
	(*((unsigned char *)(p) + 0) << 56) | \
	(*((unsigned char *)(p) + 1) << 48) | \
	(*((unsigned char *)(p) + 2) << 40) | \
	(*((unsigned char *)(p) + 3) << 32) | \
	(*((unsigned char *)(p) + 4) << 24) | \
	(*((unsigned char *)(p) + 5) << 16) | \
	(*((unsigned char *)(p) + 6) <<  8) | \
	(*((unsigned char *)(p) + 7) <<  0) )

or would it be better to do it like this:

#define get_be64(p)	( \
	((uint64_t)get_be32((unsigned char *)(p) + 0) << 32) | \
	((uint64_t)get_be32((unsigned char *)(p) + 4) <<  0)

or with a static inline function like git_bswap64:

or something else entirely?

I'm not sure why the different styles in this one file and which I 
should be emulating.


>>
>> (I note also that time_t is not necessarily 64-bits in the first place,
>> but David said something about this not really being a time_t).
>>
>
> The in memory representation is a time_t as that is the return value of
> time(NULL) but it is converted to/from a 64 bit value when written/read
> to the index extension so that the index format is the same no matter
> the native size of time_t.
>
>> -Peff
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-16 17:17         ` Ben Peart
@ 2017-05-16 17:49           ` Jeff King
  2017-05-16 19:13           ` Johannes Sixt
  1 sibling, 0 replies; 26+ messages in thread
From: Jeff King @ 2017-05-16 17:49 UTC (permalink / raw)
  To: Ben Peart
  Cc: brian m. carlson, git, gitster, benpeart, pclouds,
	johannes.schindelin, David.Turner

On Tue, May 16, 2017 at 01:17:56PM -0400, Ben Peart wrote:

> > Thanks for the pointers.  I'll update this to use the existing get_be32
> > and have created a get_be64 and will use that for the last_update.
> 
> OK, now I'm confused as to the best path for adding a get_be64.  This one is
> trivial:
> 
> #define get_be64(p)	ntohll(*(uint64_t *)(p))
> 
> but should the unaligned version be:
> 
> #define get_be64(p)	( \
> 	(*((unsigned char *)(p) + 0) << 56) | \
> 	(*((unsigned char *)(p) + 1) << 48) | \
> 	(*((unsigned char *)(p) + 2) << 40) | \
> 	(*((unsigned char *)(p) + 3) << 32) | \
> 	(*((unsigned char *)(p) + 4) << 24) | \
> 	(*((unsigned char *)(p) + 5) << 16) | \
> 	(*((unsigned char *)(p) + 6) <<  8) | \
> 	(*((unsigned char *)(p) + 7) <<  0) )
> 
> or would it be better to do it like this:
> 
> #define get_be64(p)	( \
> 	((uint64_t)get_be32((unsigned char *)(p) + 0) << 32) | \
> 	((uint64_t)get_be32((unsigned char *)(p) + 4) <<  0)

I'd imagine the compiler would generate quite similar code between the
two, and the second is much shorter and easier to read, so I'd probably
prefer it.

> or with a static inline function like git_bswap64:

Try "git log -Sinline compat/bswap.h", which turns up the history of why
it went from a macro to an inline function.

The get_be macros are simple enough that they can remain as macros,
though I'd have no objection personally to them being inline functions.
I'd expect modern compilers to be able to optimize similarly, and it
removes the restriction that you can't call the macro with an argument
that has side effects.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-16 17:17         ` Ben Peart
  2017-05-16 17:49           ` Jeff King
@ 2017-05-16 19:13           ` Johannes Sixt
  2017-05-17 14:26             ` Ben Peart
  1 sibling, 1 reply; 26+ messages in thread
From: Johannes Sixt @ 2017-05-16 19:13 UTC (permalink / raw)
  To: Ben Peart
  Cc: Jeff King, brian m. carlson, git, gitster, benpeart, pclouds,
	johannes.schindelin, David.Turner

Am 16.05.2017 um 19:17 schrieb Ben Peart:
> OK, now I'm confused as to the best path for adding a get_be64.  This 
> one is trivial:
> 
> #define get_be64(p)    ntohll(*(uint64_t *)(p))

I cringe when I see a cast like this. Unless you can guarantee that p is 
char* (bare or signed or unsigned), you fall pray to strict aliasing 
violations, aka undefined behavior. And I'm not even mentioning correct 
alignment, yet.

-- Hannes

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-15 19:13 ` [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
  2017-05-15 21:21   ` David Turner
  2017-05-16  0:22   ` brian m. carlson
@ 2017-05-16 21:41   ` Jonathan Tan
  2017-05-17  3:35     ` Ben Peart
  2 siblings, 1 reply; 26+ messages in thread
From: Jonathan Tan @ 2017-05-16 21:41 UTC (permalink / raw)
  To: Ben Peart, git
  Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff

I'm not very familiar with this part of the code - here is a partial review.

Firstly, if someone invokes update-index, I wonder if it's better just 
to do a full refresh (e.g. by deleting the last_update time from the index).

Also, the change to unpack-trees.c doesn't match my mental model. I 
notice that it is in a function related to sparse checkout, but if the 
working tree changes for whatever reason, it seems simpler to just let 
the hook do its thing. As far as I can tell, it is fine to have files 
overzealously marked as FSMONITOR_DIRTY.

On 05/15/2017 12:13 PM, Ben Peart wrote:
> diff --git a/cache.h b/cache.h
> index 40ec032a2d..64aa6e57cd 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -201,6 +201,7 @@ struct cache_entry {
>  #define CE_ADDED             (1 << 19)
>
>  #define CE_HASHED            (1 << 20)
> +#define CE_FSMONITOR_DIRTY   (1 << 21)
>  #define CE_WT_REMOVE         (1 << 22) /* remove in work directory */
>  #define CE_CONFLICTED        (1 << 23)
>
> @@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int mode)
>  #define CACHE_TREE_CHANGED	(1 << 5)
>  #define SPLIT_INDEX_ORDERED	(1 << 6)
>  #define UNTRACKED_CHANGED	(1 << 7)
> +#define FSMONITOR_CHANGED	(1 << 8)
>
>  struct split_index;
>  struct untracked_cache;
> @@ -342,6 +344,8 @@ struct index_state {
>  	struct hashmap dir_hash;
>  	unsigned char sha1[20];
>  	struct untracked_cache *untracked;
> +	time_t last_update;
> +	struct ewah_bitmap *bitmap;

Here a bitmap is introduced, presumably corresponding to the entries in 
"struct cache_entry **cache", but there is also a CE_FSMONITOR_DIRTY 
that can be set in each "struct cache_entry". This seems redundant and 
probably at least worth explaining in a comment.

> +/*
> + * Call the query-fsmonitor hook passing the time of the last saved results.
> + */
> +static int query_fsmonitor(time_t last_update, struct strbuf *buffer)
> +{
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	char date[64];
> +	const char *argv[3];
> +
> +	if (!(argv[0] = find_hook("query-fsmonitor")))
> +		return -1;
> +
> +	snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update);
> +	argv[1] = date;
> +	argv[2] = NULL;
> +	cp.argv = argv;
> +	cp.out = -1;
> +
> +	return capture_command(&cp, buffer, 1024);
> +}

Output argument could probably be named better.

Also, would the output of this command be very large? If yes, it might 
be better to process it little by little instead of buffering the whole 
thing first.

> +void write_fsmonitor_extension(struct strbuf *sb, struct index_state* istate);

Space before * (in the .h and .c files).


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-16 21:41   ` Jonathan Tan
@ 2017-05-17  3:35     ` Ben Peart
  0 siblings, 0 replies; 26+ messages in thread
From: Ben Peart @ 2017-05-17  3:35 UTC (permalink / raw)
  To: Jonathan Tan, git
  Cc: gitster, benpeart, pclouds, johannes.schindelin, David.Turner,
	peff



On 5/16/2017 5:41 PM, Jonathan Tan wrote:
> I'm not very familiar with this part of the code - here is a partial
> review.
>
> Firstly, if someone invokes update-index, I wonder if it's better just
> to do a full refresh (e.g. by deleting the last_update time from the
> index).

A full refresh can be very expensive when the working directory is large 
(the specific case this patch series is trying to improve).  Instead, 
the code does the minimal update required to keep things fast but still 
return correct results.

>
> Also, the change to unpack-trees.c doesn't match my mental model. I
> notice that it is in a function related to sparse checkout, but if the
> working tree changes for whatever reason, it seems simpler to just let
> the hook do its thing. As far as I can tell, it is fine to have files
> overzealously marked as FSMONITOR_DIRTY.

The case this (and the others like it) is solving is when the index is 
updated but there may not be any change to the associated file in the 
working directory.  When this occurs, the hook won't indicate any change 
has happened so the index and working directory could be out of sync. 
To be sure this doesn't happen, the index entry is marked 
CE_FSMONITOR_DIRTY to ensure the file is checked.

This is pretty simple to demonstrate - a simple "git reset HEAD~1" will 
do it as a mixed reset updates the index but doesn't touch the files in 
the working directory.

>
> On 05/15/2017 12:13 PM, Ben Peart wrote:
>> diff --git a/cache.h b/cache.h
>> index 40ec032a2d..64aa6e57cd 100644
>> --- a/cache.h
>> +++ b/cache.h
>> @@ -201,6 +201,7 @@ struct cache_entry {
>>  #define CE_ADDED             (1 << 19)
>>
>>  #define CE_HASHED            (1 << 20)
>> +#define CE_FSMONITOR_DIRTY   (1 << 21)
>>  #define CE_WT_REMOVE         (1 << 22) /* remove in work directory */
>>  #define CE_CONFLICTED        (1 << 23)
>>
>> @@ -324,6 +325,7 @@ static inline unsigned int canon_mode(unsigned int
>> mode)
>>  #define CACHE_TREE_CHANGED    (1 << 5)
>>  #define SPLIT_INDEX_ORDERED    (1 << 6)
>>  #define UNTRACKED_CHANGED    (1 << 7)
>> +#define FSMONITOR_CHANGED    (1 << 8)
>>
>>  struct split_index;
>>  struct untracked_cache;
>> @@ -342,6 +344,8 @@ struct index_state {
>>      struct hashmap dir_hash;
>>      unsigned char sha1[20];
>>      struct untracked_cache *untracked;
>> +    time_t last_update;
>> +    struct ewah_bitmap *bitmap;
>
> Here a bitmap is introduced, presumably corresponding to the entries in
> "struct cache_entry **cache", but there is also a CE_FSMONITOR_DIRTY
> that can be set in each "struct cache_entry". This seems redundant and
> probably at least worth explaining in a comment.
>

The ewah bitmap is loaded from the index extension and saved until it 
can be processed after the untracked cache has been loaded and 
initialized in post_read_index_from().  I'm not opposed to documenting 
that to make it clearer but I've just followed the same pattern the 
untracked cache, and split index extensions use which don't specifically 
document it either.

>> +/*
>> + * Call the query-fsmonitor hook passing the time of the last saved
>> results.
>> + */
>> +static int query_fsmonitor(time_t last_update, struct strbuf *buffer)
>> +{
>> +    struct child_process cp = CHILD_PROCESS_INIT;
>> +    char date[64];
>> +    const char *argv[3];
>> +
>> +    if (!(argv[0] = find_hook("query-fsmonitor")))
>> +        return -1;
>> +
>> +    snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update);
>> +    argv[1] = date;
>> +    argv[2] = NULL;
>> +    cp.argv = argv;
>> +    cp.out = -1;
>> +
>> +    return capture_command(&cp, buffer, 1024);
>> +}
>
> Output argument could probably be named better.

I agree.  I've renamed it query_result for the next iteration.

>
> Also, would the output of this command be very large? If yes, it might
> be better to process it little by little instead of buffering the whole
> thing first.
>

The output is usually quite small as it is is the list of files modified 
in the working directory since the last command that requested the 
updated list.

>> +void write_fsmonitor_extension(struct strbuf *sb, struct index_state*
>> istate);
>
> Space before * (in the .h and .c files).
>

Thanks, missed that.  I'll fix it for the next iteration.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-16 19:13           ` Johannes Sixt
@ 2017-05-17 14:26             ` Ben Peart
  2017-05-17 18:15               ` Johannes Sixt
  0 siblings, 1 reply; 26+ messages in thread
From: Ben Peart @ 2017-05-17 14:26 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Jeff King, brian m. carlson, git, gitster, benpeart, pclouds,
	johannes.schindelin, David.Turner



On 5/16/2017 3:13 PM, Johannes Sixt wrote:
> Am 16.05.2017 um 19:17 schrieb Ben Peart:
>> OK, now I'm confused as to the best path for adding a get_be64.  This
>> one is trivial:
>>
>> #define get_be64(p)    ntohll(*(uint64_t *)(p))
>
> I cringe when I see a cast like this. Unless you can guarantee that p is
> char* (bare or signed or unsigned), you fall pray to strict aliasing
> violations, aka undefined behavior. And I'm not even mentioning correct
> alignment, yet.
>
> -- Hannes

Note, this macro is only used where the CPU architecture is OK with 
unaligned memory access.  You can see it in context with many similar 
macros and casts in bswap.h.  It's outside the scope of this patch 
series to fix them all.  Perhaps a separate patch series?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-17 14:26             ` Ben Peart
@ 2017-05-17 18:15               ` Johannes Sixt
  2017-05-18  4:52                 ` Jeff King
  0 siblings, 1 reply; 26+ messages in thread
From: Johannes Sixt @ 2017-05-17 18:15 UTC (permalink / raw)
  To: Ben Peart
  Cc: Jeff King, brian m. carlson, git, gitster, benpeart, pclouds,
	johannes.schindelin, David.Turner

Am 17.05.2017 um 16:26 schrieb Ben Peart:
> On 5/16/2017 3:13 PM, Johannes Sixt wrote:
>> Am 16.05.2017 um 19:17 schrieb Ben Peart:
>>> OK, now I'm confused as to the best path for adding a get_be64.  This
>>> one is trivial:
>>>
>>> #define get_be64(p)    ntohll(*(uint64_t *)(p))
>>
>> I cringe when I see a cast like this. Unless you can guarantee that p is
>> char* (bare or signed or unsigned), you fall pray to strict aliasing
>> violations, aka undefined behavior. And I'm not even mentioning correct
>> alignment, yet.
> 
> Note, this macro is only used where the CPU architecture is OK with 
> unaligned memory access.

I'm not worried about the unaligned memory access: It either works, or 
we get a SIGBUS. The undefined behavior is more worrisome because the 
code may work or not, and we can never be sure which it is.

-- Hannes

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  2017-05-17 18:15               ` Johannes Sixt
@ 2017-05-18  4:52                 ` Jeff King
  0 siblings, 0 replies; 26+ messages in thread
From: Jeff King @ 2017-05-18  4:52 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: Ben Peart, brian m. carlson, git, gitster, benpeart, pclouds,
	johannes.schindelin, David.Turner

On Wed, May 17, 2017 at 08:15:03PM +0200, Johannes Sixt wrote:

> Am 17.05.2017 um 16:26 schrieb Ben Peart:
> > On 5/16/2017 3:13 PM, Johannes Sixt wrote:
> > > Am 16.05.2017 um 19:17 schrieb Ben Peart:
> > > > OK, now I'm confused as to the best path for adding a get_be64.  This
> > > > one is trivial:
> > > > 
> > > > #define get_be64(p)    ntohll(*(uint64_t *)(p))
> > > 
> > > I cringe when I see a cast like this. Unless you can guarantee that p is
> > > char* (bare or signed or unsigned), you fall pray to strict aliasing
> > > violations, aka undefined behavior. And I'm not even mentioning correct
> > > alignment, yet.
> > 
> > Note, this macro is only used where the CPU architecture is OK with
> > unaligned memory access.
> 
> I'm not worried about the unaligned memory access: It either works, or we
> get a SIGBUS. The undefined behavior is more worrisome because the code may
> work or not, and we can never be sure which it is.

I don't think there's much we can do, though. That's how all of the
get_be* macros are designed to work (and there's really no point in
using them on something that isn't a char pointer).

I agree it would be nice to have some type safety there if we can get
it, though. I wonder if:

  static inline uint32_t get_be32(unsigned char *p)
  {
	return ntohl(*(unsigned int *)p);
  }

would generate the same code. It does mean we may have problems between
signed/unsigned buffers, though.

-Peff

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-05-18  4:52 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-15 19:13 [PATCH v1 0/5] Fast git status via a file system watcher Ben Peart
2017-05-15 19:13 ` [PATCH v1 1/5] dir: make lookup_untracked() available outside of dir.c Ben Peart
2017-05-16  5:01   ` Junio C Hamano
2017-05-15 19:13 ` [PATCH v1 2/5] Teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
2017-05-15 21:21   ` David Turner
2017-05-16  1:15     ` Ben Peart
2017-05-16  0:22   ` brian m. carlson
2017-05-16  0:34     ` Jeff King
2017-05-16  1:55       ` Ben Peart
2017-05-16  2:51         ` Jeff King
2017-05-16 17:17         ` Ben Peart
2017-05-16 17:49           ` Jeff King
2017-05-16 19:13           ` Johannes Sixt
2017-05-17 14:26             ` Ben Peart
2017-05-17 18:15               ` Johannes Sixt
2017-05-18  4:52                 ` Jeff King
2017-05-16 21:41   ` Jonathan Tan
2017-05-17  3:35     ` Ben Peart
2017-05-15 19:13 ` [PATCH v1 3/5] fsmonitor: add test cases for fsmonitor extension Ben Peart
2017-05-16  4:59   ` Junio C Hamano
2017-05-16 14:28     ` Ben Peart
2017-05-15 19:13 ` [PATCH v1 4/5] Add documentation for the fsmonitor extension. This includes the core.fsmonitor setting, the query-fsmonitor hook, and the fsmonitor index extension Ben Peart
2017-05-15 19:13 ` [PATCH v1 5/5] Add a sample query-fsmonitor hook script that integrates with the cross platform Watchman file watching service Ben Peart
2017-05-15 19:50   ` David Turner
2017-05-15 20:10     ` Ben Peart
2017-05-16  5:00 ` [PATCH v1 0/5] Fast git status via a file system watcher Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).