git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* Re: Some rough edges of core.fsmonitor
  2018-01-27  0:28 Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
@ 2018-01-27  1:36 ` Duy Nguyen
  2018-01-27  1:39   ` [PATCH] trace: measure where the time is spent in the index-heavy operations Nguyễn Thái Ngọc Duy
  2018-01-27 11:43   ` Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
  2018-01-28 20:44 ` Johannes Schindelin
  2018-01-30 22:57 ` Some rough edges of core.fsmonitor Ben Peart
  2 siblings, 2 replies; 28+ messages in thread
From: Duy Nguyen @ 2018-01-27  1:36 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Ben Peart, Alex Vandiver, Christian Couder, David Turner

On Sat, Jan 27, 2018 at 7:28 AM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> 3) A lot of time spend reading the index (or something..)

I'm resending a patch from my old index-helper series. It should
measure all big time consuming blocks in git. Maybe we should get it
merged...

> While the hook itself takes ~20ms (and watchman itself 1/4 of that)
> status as a whole takes much longer. gprof reveals:
>
>     Each sample counts as 0.01 seconds.
>       %   cumulative   self              self     total
>      time   seconds   seconds    calls  ms/call  ms/call  name
>      15.38      0.02     0.02   221690     0.00     0.00  memihash

This sounds like name-hash to me.

>      15.38      0.04     0.02   221689     0.00     0.00  create_from_disk
>       7.69      0.05     0.01  2216897     0.00     0.00  git_bswap32
>       7.69      0.06     0.01   222661     0.00     0.00  ce_path_match
>       7.69      0.07     0.01   221769     0.00     0.00  hashmap_add
>       7.69      0.08     0.01    39941     0.00     0.00  prep_exclude
>       7.69      0.09     0.01    39940     0.00     0.00  strbuf_addch
>       7.69      0.10     0.01        1    10.00    10.00  read_one
>       7.69      0.11     0.01        1    10.00    10.00  refresh_index
>       7.69      0.12     0.01        1    10.00    10.00  tweak_fsmonitor
>       7.69      0.13     0.01                             preload_thread
>
> The index is 24M in this case, I guess it's unpacking it, but I wonder
> if this couldn't be much faster if we saved away the result of the last
> "status" in something that's quick to access, and then if nothing

No we could do better, we could cache parsed index content so
everybody benefits. I demonstrated it with my "index v254" patch a
while back:

https://public-inbox.org/git/1399980019-8706-1-git-send-email-pclouds@gmail.com/

With the patch I'm sending soon, we can see how much time reading an
index take out of that ~140-150ms (and we probably can cut down index
read time to like 10-20% when cached).

> changed we just report that, and no need to re-write the index (or just
> write the "it was clean at this time" part).

Hmm.. does an index write increase that much time?
-- 
Duy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] trace: measure where the time is spent in the index-heavy operations
  2018-01-27  1:36 ` Duy Nguyen
@ 2018-01-27  1:39   ` Nguyễn Thái Ngọc Duy
  2018-01-27 11:58     ` Thomas Gummerer
  2018-01-27 11:43   ` Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 28+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-01-27  1:39 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, benpeart, alexmv,
	christian.couder, dturner, Junio C Hamano,
	Nguyễn Thái Ngọc Duy

All the known heavy code blocks are measured (except object database
access). This should help identify if an optimization is effective or
not. An unoptimized git-status would give something like below (92% of
time is accounted).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 This was in my old index-helper series. The series was replaced by
 fsmonitor but perhaps some measurements like this still helps.

 In my old version I measured packed-refs read time too. But
 packed-refs is mmap'd now, no need to worry about it (or at least its
 initial cost).

 diff-lib.c      |  4 ++++
 dir.c           |  2 ++
 name-hash.c     |  3 +++
 preload-index.c |  2 ++
 read-cache.c    | 11 +++++++++++
 5 files changed, 22 insertions(+)

diff --git a/diff-lib.c b/diff-lib.c
index 8104603a3b..a228e1a219 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -92,6 +92,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 	int diff_unmerged_stage = revs->max_count;
 	unsigned ce_option = ((option & DIFF_RACY_IS_MODIFIED)
 			      ? CE_MATCH_RACY_IS_DIRTY : 0);
+	uint64_t start = getnanotime();
 
 	diff_set_mnemonic_prefix(&revs->diffopt, "i/", "w/");
 
@@ -246,6 +247,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 	}
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
+	trace_performance_since(start, "diff-files");
 	return 0;
 }
 
@@ -512,6 +514,7 @@ static int diff_cache(struct rev_info *revs,
 int run_diff_index(struct rev_info *revs, int cached)
 {
 	struct object_array_entry *ent;
+	uint64_t start = getnanotime();
 
 	ent = revs->pending.objects;
 	if (diff_cache(revs, &ent->item->oid, ent->name, cached))
@@ -521,6 +524,7 @@ int run_diff_index(struct rev_info *revs, int cached)
 	diffcore_fix_diff_index(&revs->diffopt);
 	diffcore_std(&revs->diffopt);
 	diff_flush(&revs->diffopt);
+	trace_performance_since(start, "diff-index");
 	return 0;
 }
 
diff --git a/dir.c b/dir.c
index 7c4b45e30e..4479a02a49 100644
--- a/dir.c
+++ b/dir.c
@@ -2248,6 +2248,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		   const char *path, int len, const struct pathspec *pathspec)
 {
 	struct untracked_cache_dir *untracked;
+	uint64_t start = getnanotime();
 
 	if (has_symlink_leading_path(path, len))
 		return dir->nr;
@@ -2286,6 +2287,7 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 		dir->nr = i;
 	}
 
+	trace_performance_since(start, "read directory %.*s", len, path);
 	if (dir->untracked) {
 		static struct trace_key trace_untracked_stats = TRACE_KEY_INIT(UNTRACKED_STATS);
 		trace_printf_key(&trace_untracked_stats,
diff --git a/name-hash.c b/name-hash.c
index 45c98db0a0..ada66f066a 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -578,6 +578,8 @@ static void threaded_lazy_init_name_hash(
 
 static void lazy_init_name_hash(struct index_state *istate)
 {
+	uint64_t start = getnanotime();
+
 	if (istate->name_hash_initialized)
 		return;
 	hashmap_init(&istate->name_hash, cache_entry_cmp, NULL, istate->cache_nr);
@@ -600,6 +602,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	}
 
 	istate->name_hash_initialized = 1;
+	trace_performance_since(start, "initialize name hash");
 }
 
 /*
diff --git a/preload-index.c b/preload-index.c
index 2a83255e4e..4d08d44874 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -78,6 +78,7 @@ static void preload_index(struct index_state *index,
 {
 	int threads, i, work, offset;
 	struct thread_data data[MAX_PARALLEL];
+	uint64_t start = getnanotime();
 
 	if (!core_preload_index)
 		return;
@@ -108,6 +109,7 @@ static void preload_index(struct index_state *index,
 		if (pthread_join(p->pthread, NULL))
 			die("unable to join threaded lstat");
 	}
+	trace_performance_since(start, "preload index");
 }
 #endif
 
diff --git a/read-cache.c b/read-cache.c
index 2eb81a66b9..1f00aee6a2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1372,6 +1372,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	const char *typechange_fmt;
 	const char *added_fmt;
 	const char *unmerged_fmt;
+	uint64_t start = getnanotime();
 
 	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
 	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
@@ -1442,6 +1443,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 
 		replace_index_entry(istate, i, new);
 	}
+	trace_performance_since(start, "refresh index");
 	return has_errors;
 }
 
@@ -1877,12 +1879,15 @@ int read_index_from(struct index_state *istate, const char *path)
 	int ret;
 	char *base_sha1_hex;
 	const char *base_path;
+	uint64_t start;
 
 	/* istate->initialized covers both .git/index and .git/sharedindex.xxx */
 	if (istate->initialized)
 		return istate->cache_nr;
 
+	start = getnanotime();
 	ret = do_read_index(istate, path, 0);
+	trace_performance_since(start, "read cache %s", path);
 
 	split_index = istate->split_index;
 	if (!split_index || is_null_sha1(split_index->base_sha1)) {
@@ -1897,6 +1902,7 @@ int read_index_from(struct index_state *istate, const char *path)
 
 	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
 	base_path = git_path("sharedindex.%s", base_sha1_hex);
+	start = getnanotime();
 	ret = do_read_index(split_index->base, base_path, 1);
 	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
 		die("broken index, expect %s in %s, got %s",
@@ -1906,6 +1912,9 @@ int read_index_from(struct index_state *istate, const char *path)
 	freshen_shared_index(base_sha1_hex, 0);
 	merge_base_index(istate);
 	post_read_index_from(istate);
+	trace_performance_since(start, "read cache %s",
+				git_path("sharedindex.%s",
+					 sha1_to_hex(split_index->base_sha1)));
 	return ret;
 }
 
@@ -2244,6 +2253,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 	struct ondisk_cache_entry_extended ondisk;
 	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 	int drop_cache_tree = 0;
+	uint64_t start = getnanotime();
 
 	for (i = removed = extended = 0; i < entries; i++) {
 		if (cache[i]->ce_flags & CE_REMOVE)
@@ -2374,6 +2384,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
 		return -1;
 	istate->timestamp.sec = (unsigned int)st.st_mtime;
 	istate->timestamp.nsec = ST_MTIME_NSEC(st);
+	trace_performance_since(start, "write index, changed mask = %x", istate->cache_changed);
 	return 0;
 }
 
-- 
2.16.1.205.g271f633410


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-27  1:36 ` Duy Nguyen
  2018-01-27  1:39   ` [PATCH] trace: measure where the time is spent in the index-heavy operations Nguyễn Thái Ngọc Duy
@ 2018-01-27 11:43   ` Ævar Arnfjörð Bjarmason
  2018-01-27 12:39     ` Duy Nguyen
  2018-01-29  9:40     ` Duy Nguyen
  1 sibling, 2 replies; 28+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-27 11:43 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git, Ben Peart, Alex Vandiver, Christian Couder, David Turner


On Sat, Jan 27 2018, Duy Nguyen jotted:

> On Sat, Jan 27, 2018 at 7:28 AM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> 3) A lot of time spend reading the index (or something..)
>
> I'm resending a patch from my old index-helper series. It should
> measure all big time consuming blocks in git. Maybe we should get it
> merged...
>
>> While the hook itself takes ~20ms (and watchman itself 1/4 of that)
>> status as a whole takes much longer. gprof reveals:
>>
>>     Each sample counts as 0.01 seconds.
>>       %   cumulative   self              self     total
>>      time   seconds   seconds    calls  ms/call  ms/call  name
>>      15.38      0.02     0.02   221690     0.00     0.00  memihash
>
> This sounds like name-hash to me.
>
>>      15.38      0.04     0.02   221689     0.00     0.00  create_from_disk
>>       7.69      0.05     0.01  2216897     0.00     0.00  git_bswap32
>>       7.69      0.06     0.01   222661     0.00     0.00  ce_path_match
>>       7.69      0.07     0.01   221769     0.00     0.00  hashmap_add
>>       7.69      0.08     0.01    39941     0.00     0.00  prep_exclude
>>       7.69      0.09     0.01    39940     0.00     0.00  strbuf_addch
>>       7.69      0.10     0.01        1    10.00    10.00  read_one
>>       7.69      0.11     0.01        1    10.00    10.00  refresh_index
>>       7.69      0.12     0.01        1    10.00    10.00  tweak_fsmonitor
>>       7.69      0.13     0.01                             preload_thread
>>
>> The index is 24M in this case, I guess it's unpacking it, but I wonder
>> if this couldn't be much faster if we saved away the result of the last
>> "status" in something that's quick to access, and then if nothing
>
> No we could do better, we could cache parsed index content so
> everybody benefits. I demonstrated it with my "index v254" patch a
> while back:
>
> https://public-inbox.org/git/1399980019-8706-1-git-send-email-pclouds@gmail.com/
>
> With the patch I'm sending soon, we can see how much time reading an
> index take out of that ~140-150ms (and we probably can cut down index
> read time to like 10-20% when cached).
>
>> changed we just report that, and no need to re-write the index (or just
>> write the "it was clean at this time" part).
>
> Hmm.. does an index write increase that much time?

Your patch is very useful. Here's (with gcc -03) some runtimes. This
also includes my .git exclusion patch.

These are all best out of 5, and with the top (until <0.5% time) of
strace -c output (run as another invocation, timing not done with
strace)::

a) no fsmonitor

    $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
    12:32:44.947651 read-cache.c:1890       performance: 0.053153609 s: read cache .git/index
    12:32:44.967943 preload-index.c:112     performance: 0.020161093 s: preload index
    12:32:44.974217 read-cache.c:1446       performance: 0.006230611 s: refresh index
    12:32:44.979083 diff-lib.c:250          performance: 0.004649994 s: diff-files
    12:32:44.982511 diff-lib.c:527          performance: 0.002918416 s: diff-index
    12:32:45.037880 dir.c:2290              performance: 0.055331063 s: read directory
    On branch master
    Your branch is up to date with 'origin/master'.

    nothing to commit, working tree clean
    12:32:45.040666 trace.c:417             performance: 0.146724289 s: git command: '/home/aearnfjord/g/git/git-status'

    real    0m0.153s
    user    0m0.110s
    sys     0m0.354s
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     59.93    0.031924           1     39978         9 stat
     35.86    0.019104        6368         3           futex
      0.84    0.000446          12        36           mprotect
      0.73    0.000389          13        29           munmap
      0.66    0.000349           6        62           mmap
      0.53    0.000285          14        20           clone

b) with fsmonitor

    $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
    12:34:23.833625 read-cache.c:1890       performance: 0.049485685 s: read cache .git/index
    12:34:23.838622 preload-index.c:112     performance: 0.001221197 s: preload index
    12:34:23.858723 fsmonitor.c:170         performance: 0.020059647 s: fsmonitor process '.git/hooks/fsmonitor-watchman'
    12:34:23.871532 read-cache.c:1446       performance: 0.032870818 s: refresh index
    12:34:23.876427 diff-lib.c:250          performance: 0.004731427 s: diff-files
    12:34:23.880669 diff-lib.c:527          performance: 0.003944422 s: diff-index
    12:34:23.899225 dir.c:2290              performance: 0.018509066 s: read directory
    On branch master
    Your branch is up to date with 'origin/master'.

    nothing to commit, working tree clean
    12:34:23.901914 trace.c:417             performance: 0.118250995 s: git command: '/home/aearnfjord/g/git/git-status'

    real    0m0.125s
    user    0m0.086s
    sys     0m0.043s
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     36.61    0.001281          61        21           clone
     33.84    0.001184          41        29           munmap
      5.09    0.000178           5        36           mprotect
      4.34    0.000152           0       619           brk
      4.17    0.000146           2        62           mmap
      3.34    0.000117           2        55        20 open
      3.00    0.000105           2        60        27 lstat
      1.77    0.000062           2        37         9 stat
      1.60    0.000056           1        51           read
      1.57    0.000055           5        12           write
      1.17    0.000041          41         1           wait4
      0.83    0.000029           1        41           close
      0.83    0.000029           1        22           getcwd
      0.80    0.000028           1        34           fstat

c) with fsmonitor + don't write index

    $ time GIT_TRACE_PERFORMANCE=1 GIT_OPTIONAL_LOCKS=0 ~/g/git/git-status
    12:36:03.176866 read-cache.c:1890       performance: 0.048292088 s: read cache .git/index
    12:36:03.181006 preload-index.c:112     performance: 0.001343593 s: preload index
    12:36:03.200970 fsmonitor.c:170         performance: 0.019936338 s: fsmonitor process '.git/hooks/fsmonitor-watchman'
    12:36:03.210556 read-cache.c:1446       performance: 0.029525531 s: refresh index
    12:36:03.213366 diff-lib.c:250          performance: 0.002718730 s: diff-files
    12:36:03.216273 diff-lib.c:527          performance: 0.002666604 s: diff-index
    12:36:03.233579 dir.c:2290              performance: 0.017261702 s: read directory
    On branch master
    Your branch is up to date with 'origin/master'.

    nothing to commit, working tree clean
    12:36:03.233733 trace.c:417             performance: 0.105632105 s: git command: '/home/aearnfjord/g/git/git-status'

    real    0m0.111s
    user    0m0.073s
    sys     0m0.044s
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     24.42    0.001081          37        29           munmap
     20.74    0.000918          44        21           clone
      7.63    0.000338           5        62           mmap
      7.43    0.000329           6        54        20 open
      6.05    0.000268           7        36           mprotect
      5.99    0.000265           0       619           brk
      4.99    0.000221           4        60        27 lstat
      4.13    0.000183           4        51           read
      3.68    0.000163           9        19        10 access
      3.25    0.000144           4        34           fstat
      2.78    0.000123           3        40           close
      2.48    0.000110           3        37         9 stat
      2.24    0.000099           5        21           getcwd
      1.15    0.000051           4        12           write
      0.99    0.000044          22         2           poll

For comparison just the output from the hook:

    $ time ('.git/hooks/fsmonitor-watchman' '1' '1517052856494406191')
    .git
    real    0m0.017s
    user    0m0.011s
    sys     0m0.003s

And this is what goes on behind the scenes after we get rid of the
.git/hooks/fsmonitor-watchman overhead:
    
    $ time (echo '["query", "/home/aearnfjord/git_tree/2015-04-03-1M-git", {
                            "since": 1517052856,
                            "fields": ["name"],
                            "expression": ["not", ["allof", ["since", 1517052856, "cclock"], ["not", "exists"]]]
                    }]' | watchman -j)
    {
        "version": "4.9.0",
        "is_fresh_instance": false,
        "clock": "c:1517006351:31165:4:1113968",
        "files": [
            ".git"
        ]
    }
    
    real    0m0.003s
    user    0m0.000s
    sys     0m0.004s

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] trace: measure where the time is spent in the index-heavy operations
  2018-01-27  1:39   ` [PATCH] trace: measure where the time is spent in the index-heavy operations Nguyễn Thái Ngọc Duy
@ 2018-01-27 11:58     ` Thomas Gummerer
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Gummerer @ 2018-01-27 11:58 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy
  Cc: git, Ævar Arnfjörð Bjarmason, benpeart, alexmv,
	christian.couder, dturner, Junio C Hamano

On 01/27, Nguyễn Thái Ngọc Duy wrote:
> All the known heavy code blocks are measured (except object database
> access). This should help identify if an optimization is effective or
> not. An unoptimized git-status would give something like below (92% of
> time is accounted).
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  This was in my old index-helper series. The series was replaced by
>  fsmonitor but perhaps some measurements like this still helps.
> 
>  In my old version I measured packed-refs read time too. But
>  packed-refs is mmap'd now, no need to worry about it (or at least its
>  initial cost).
> 
>  diff-lib.c      |  4 ++++
>  dir.c           |  2 ++
>  name-hash.c     |  3 +++
>  preload-index.c |  2 ++
>  read-cache.c    | 11 +++++++++++
>  5 files changed, 22 insertions(+)
>
> [...]
>  
> diff --git a/read-cache.c b/read-cache.c
> index 2eb81a66b9..1f00aee6a2 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1372,6 +1372,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>  	const char *typechange_fmt;
>  	const char *added_fmt;
>  	const char *unmerged_fmt;
> +	uint64_t start = getnanotime();
>  
>  	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
>  	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
> @@ -1442,6 +1443,7 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>  
>  		replace_index_entry(istate, i, new);
>  	}
> +	trace_performance_since(start, "refresh index");
>  	return has_errors;
>  }
>  
> @@ -1877,12 +1879,15 @@ int read_index_from(struct index_state *istate, const char *path)
>  	int ret;
>  	char *base_sha1_hex;
>  	const char *base_path;
> +	uint64_t start;
>  
>  	/* istate->initialized covers both .git/index and .git/sharedindex.xxx */
>  	if (istate->initialized)
>  		return istate->cache_nr;
>  
> +	start = getnanotime();
>  	ret = do_read_index(istate, path, 0);
> +	trace_performance_since(start, "read cache %s", path);
>  
>  	split_index = istate->split_index;
>  	if (!split_index || is_null_sha1(split_index->base_sha1)) {
> @@ -1897,6 +1902,7 @@ int read_index_from(struct index_state *istate, const char *path)
>  
>  	base_sha1_hex = sha1_to_hex(split_index->base_sha1);
>  	base_path = git_path("sharedindex.%s", base_sha1_hex);
> +	start = getnanotime();
>  	ret = do_read_index(split_index->base, base_path, 1);
>  	if (hashcmp(split_index->base_sha1, split_index->base->sha1))
>  		die("broken index, expect %s in %s, got %s",
> @@ -1906,6 +1912,9 @@ int read_index_from(struct index_state *istate, const char *path)
>  	freshen_shared_index(base_sha1_hex, 0);
>  	merge_base_index(istate);
>  	post_read_index_from(istate);
> +	trace_performance_since(start, "read cache %s",
> +				git_path("sharedindex.%s",
> +					 sha1_to_hex(split_index->base_sha1)));

Would it be worth doing this on top of tg/split-index-fixes?  OTOH
this will only give a wrong output when tracing performance is on, and
it should be easy enough to figure out where the sharedindex actually
is.  So it might be better to keep this separate, and then just add a
patch on top for fixing the path later, which might be less work for
Junio.

So dunno what the best way is, just wanted to mention it.

>  	return ret;
>  }
>  
> @@ -2244,6 +2253,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>  	struct ondisk_cache_entry_extended ondisk;
>  	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
>  	int drop_cache_tree = 0;
> +	uint64_t start = getnanotime();
>  
>  	for (i = removed = extended = 0; i < entries; i++) {
>  		if (cache[i]->ce_flags & CE_REMOVE)
> @@ -2374,6 +2384,7 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>  		return -1;
>  	istate->timestamp.sec = (unsigned int)st.st_mtime;
>  	istate->timestamp.nsec = ST_MTIME_NSEC(st);
> +	trace_performance_since(start, "write index, changed mask = %x", istate->cache_changed);
>  	return 0;
>  }
>  
> -- 
> 2.16.1.205.g271f633410
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-27 11:43   ` Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
@ 2018-01-27 12:39     ` Duy Nguyen
  2018-01-27 13:09       ` Duy Nguyen
  2018-01-29  9:40     ` Duy Nguyen
  1 sibling, 1 reply; 28+ messages in thread
From: Duy Nguyen @ 2018-01-27 12:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Ben Peart, Alex Vandiver, Christian Couder, David Turner

On Sat, Jan 27, 2018 at 6:43 PM, Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> a) no fsmonitor
>
>     $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>     12:32:44.947651 read-cache.c:1890       performance: 0.053153609 s: read cache .git/index
>     12:32:44.967943 preload-index.c:112     performance: 0.020161093 s: preload index
>     12:32:44.974217 read-cache.c:1446       performance: 0.006230611 s: refresh index
>
> ...
>
> b) with fsmonitor
>
>     $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>     12:34:23.833625 read-cache.c:1890       performance: 0.049485685 s: read cache .git/index
>     12:34:23.838622 preload-index.c:112     performance: 0.001221197 s: preload index
>     12:34:23.858723 fsmonitor.c:170         performance: 0.020059647 s: fsmonitor process '.git/hooks/fsmonitor-watchman'
>     12:34:23.871532 read-cache.c:1446       performance: 0.032870818 s: refresh index

Hmm.. why does refresh take longer with fsmonitor/watchman? With the
help from watchman, we know what files are modified. We don't need
manual stat()'ing and this line should be lower than the "no
fsmonitor" case, which is 0.006230611s.

>     12:34:23.876427 diff-lib.c:250          performance: 0.004731427 s: diff-files
>     12:34:23.880669 diff-lib.c:527          performance: 0.003944422 s: diff-index
>     12:34:23.899225 dir.c:2290              performance: 0.018509066 s: read directory
>     12:34:23.901914 trace.c:417             performance: 0.118250995 s: git command: '/home/aearnfjord/g/git/git-status'

I don't see any "write index" line here, which is interesting since
your case c) is about "don't write index".

> c) with fsmonitor + don't write index
-- 
Duy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-27 12:39     ` Duy Nguyen
@ 2018-01-27 13:09       ` Duy Nguyen
  2018-01-27 19:01         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 28+ messages in thread
From: Duy Nguyen @ 2018-01-27 13:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Ben Peart, Alex Vandiver, Christian Couder

On Sat, Jan 27, 2018 at 07:39:27PM +0700, Duy Nguyen wrote:
> On Sat, Jan 27, 2018 at 6:43 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> > a) no fsmonitor
> >
> >     $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
> >     12:32:44.947651 read-cache.c:1890       performance: 0.053153609 s: read cache .git/index
> >     12:32:44.967943 preload-index.c:112     performance: 0.020161093 s: preload index
> >     12:32:44.974217 read-cache.c:1446       performance: 0.006230611 s: refresh index
> >
> > ...
> >
> > b) with fsmonitor
> >
> >     $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
> >     12:34:23.833625 read-cache.c:1890       performance: 0.049485685 s: read cache .git/index
> >     12:34:23.838622 preload-index.c:112     performance: 0.001221197 s: preload index
> >     12:34:23.858723 fsmonitor.c:170         performance: 0.020059647 s: fsmonitor process '.git/hooks/fsmonitor-watchman'
> >     12:34:23.871532 read-cache.c:1446       performance: 0.032870818 s: refresh index
> 
> Hmm.. why does refresh take longer with fsmonitor/watchman? With the
> help from watchman, we know what files are modified. We don't need
> manual stat()'ing and this line should be lower than the "no
> fsmonitor" case, which is 0.006230611s.

Ahh.. my patch probably does not see that fsmonitor could be activated
lazily inside refresh_index() call. The patch below should fix it.

But between your normal refresh time (0.020 preload + 0.006 actual
refresh) and fsmonitor taking 0.020 just to talk to watchman, this
repo seems "too small" for fsmonitor/watchman to shine.

I'm still a bit curious that refresh index time, after excluding 0.020
for fsmonitor, is stil 0.012s. What does it do? It should really be
doing nothing. Either way, read index time seems to be the elephant in
the room now.

-- 8< --
diff --git a/read-cache.c b/read-cache.c
index eac74bc9f1..d60e0a8480 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1367,12 +1367,21 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	unsigned int options = (CE_MATCH_REFRESH |
 				(really ? CE_MATCH_IGNORE_VALID : 0) |
 				(not_new ? CE_MATCH_IGNORE_MISSING : 0));
+	int ignore_fsmonitor = options & CE_MATCH_IGNORE_FSMONITOR;
 	const char *modified_fmt;
 	const char *deleted_fmt;
 	const char *typechange_fmt;
 	const char *added_fmt;
 	const char *unmerged_fmt;
-	uint64_t start = getnanotime();
+	uint64_t start;
+
+	/*
+	 * If fsmonitor is used, force its communication early to
+	 * accurately measure how long this function takes without it.
+	 */
+	if (!ignore_fsmonitor)
+		refresh_fsmonitor(istate);
+	start = getnanotime();
 
 	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
 	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
-- 8< --

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-27 13:09       ` Duy Nguyen
@ 2018-01-27 19:01         ` Ævar Arnfjörð Bjarmason
  2018-01-30 22:41           ` Ben Peart
  0 siblings, 1 reply; 28+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-27 19:01 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: git, Ben Peart, Alex Vandiver, Christian Couder


On Sat, Jan 27 2018, Duy Nguyen jotted:

> On Sat, Jan 27, 2018 at 07:39:27PM +0700, Duy Nguyen wrote:
>> On Sat, Jan 27, 2018 at 6:43 PM, Ævar Arnfjörð Bjarmason
>> <avarab@gmail.com> wrote:
>> > a) no fsmonitor
>> >
>> >     $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>> >     12:32:44.947651 read-cache.c:1890       performance: 0.053153609 s: read cache .git/index
>> >     12:32:44.967943 preload-index.c:112     performance: 0.020161093 s: preload index
>> >     12:32:44.974217 read-cache.c:1446       performance: 0.006230611 s: refresh index
>> >
>> > ...
>> >
>> > b) with fsmonitor
>> >
>> >     $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>> >     12:34:23.833625 read-cache.c:1890       performance: 0.049485685 s: read cache .git/index
>> >     12:34:23.838622 preload-index.c:112     performance: 0.001221197 s: preload index
>> >     12:34:23.858723 fsmonitor.c:170         performance: 0.020059647 s: fsmonitor process '.git/hooks/fsmonitor-watchman'
>> >     12:34:23.871532 read-cache.c:1446       performance: 0.032870818 s: refresh index
>>
>> Hmm.. why does refresh take longer with fsmonitor/watchman? With the
>> help from watchman, we know what files are modified. We don't need
>> manual stat()'ing and this line should be lower than the "no
>> fsmonitor" case, which is 0.006230611s.
>
> Ahh.. my patch probably does not see that fsmonitor could be activated
> lazily inside refresh_index() call. The patch below should fix it.

Will have to get those numbers to you later, or alternatively clone
https://github.com/avar/2015-04-03-1M-git (or some other test repo) and
test it yourself, sorry. Don't have time to follow-up much this weekend.

> But between your normal refresh time (0.020 preload + 0.006 actual
> refresh) and fsmonitor taking 0.020 just to talk to watchman, this
> repo seems "too small" for fsmonitor/watchman to shine.

Surely that's an implementation limitation and not something inherent,
given that watchman itself returns in 5ms?

I.e. status could work like this, no?:

 1. At start, record the timestamp & find out canonical state via some
    expansive method.
 2. Print out xyz changed, abc added etc.
 3. Record *just* what status would report about xyz, abc etc.
 4. On subsequent git status, just amend that information, e.g. if
    watchman says nothing changed $(cat .git/last-status-output).

We shouldn't need to be reading the entire index in the common case
where just a few things change.

There's also a lot of things that use status to just check "are we
clean?", those would only need to record the last known timestamp when
the tree was clean, and then ask watchman if there were any changes, if
not we're done.

> I'm still a bit curious that refresh index time, after excluding 0.020
> for fsmonitor, is stil 0.012s. What does it do? It should really be
> doing nothing. Either way, read index time seems to be the elephant in
> the room now.
>
> -- 8< --
> diff --git a/read-cache.c b/read-cache.c
> index eac74bc9f1..d60e0a8480 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1367,12 +1367,21 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>  	unsigned int options = (CE_MATCH_REFRESH |
>  				(really ? CE_MATCH_IGNORE_VALID : 0) |
>  				(not_new ? CE_MATCH_IGNORE_MISSING : 0));
> +	int ignore_fsmonitor = options & CE_MATCH_IGNORE_FSMONITOR;
>  	const char *modified_fmt;
>  	const char *deleted_fmt;
>  	const char *typechange_fmt;
>  	const char *added_fmt;
>  	const char *unmerged_fmt;
> -	uint64_t start = getnanotime();
> +	uint64_t start;
> +
> +	/*
> +	 * If fsmonitor is used, force its communication early to
> +	 * accurately measure how long this function takes without it.
> +	 */
> +	if (!ignore_fsmonitor)
> +		refresh_fsmonitor(istate);
> +	start = getnanotime();
>
>  	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
>  	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
> -- 8< --

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-27  0:28 Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
  2018-01-27  1:36 ` Duy Nguyen
@ 2018-01-28 20:44 ` Johannes Schindelin
  2018-01-28 22:28   ` Ævar Arnfjörð Bjarmason
  2018-01-30 22:57 ` Some rough edges of core.fsmonitor Ben Peart
  2 siblings, 1 reply; 28+ messages in thread
From: Johannes Schindelin @ 2018-01-28 20:44 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Ben Peart, Alex Vandiver, Christian Couder, David Turner

[-- Attachment #1: Type: text/plain, Size: 1674 bytes --]

Hi,

On Sat, 27 Jan 2018, Ævar Arnfjörð Bjarmason wrote:

> I just got around to testing this since it landed, for context some
> previous poking of mine in [1].
> 
> Issues / stuff I've noticed:
> 
> 1) We end up invalidating the untracked cache because stuff in .git/
> changed. For example:
> 
>     01:09:24.975524 fsmonitor.c:173         fsmonitor process '.git/hooks/fsmonitor-watchman' returned success
>     01:09:24.975548 fsmonitor.c:138         fsmonitor_refresh_callback '.git'
>     01:09:24.975556 fsmonitor.c:138         fsmonitor_refresh_callback '.git/config'
>     01:09:24.975568 fsmonitor.c:138         fsmonitor_refresh_callback '.git/index'
>     01:09:25.122726 fsmonitor.c:91          write fsmonitor extension successful
> 
> Am I missing something or should we do something like:
> 
>     diff --git a/fsmonitor.c b/fsmonitor.c
>     index 0af7c4edba..5067b89bda 100644
>     --- a/fsmonitor.c
>     +++ b/fsmonitor.c
>     @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
> 
>      static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>      {
>     -       int pos = index_name_pos(istate, name, strlen(name));
>     +       int pos;
>     +
>     +       if (!strcmp(name, ".git") || starts_with(name, ".git/"))
>     +               return;
>     +
>     +       pos = index_name_pos(istate, name, strlen(name));

I would much rather have the fsmonitor hook already exclude those.

If you *must* add these comparisons, you have to use fspathcmp() and
fspathncmp() instead (because case-insensitivity).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-28 20:44 ` Johannes Schindelin
@ 2018-01-28 22:28   ` Ævar Arnfjörð Bjarmason
  2018-01-30  1:21     ` Ben Peart
  0 siblings, 1 reply; 28+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-28 22:28 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Ben Peart, Alex Vandiver, Christian Couder, David Turner

On Sun, Jan 28, 2018 at 9:44 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Sat, 27 Jan 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> I just got around to testing this since it landed, for context some
>> previous poking of mine in [1].
>>
>> Issues / stuff I've noticed:
>>
>> 1) We end up invalidating the untracked cache because stuff in .git/
>> changed. For example:
>>
>>     01:09:24.975524 fsmonitor.c:173         fsmonitor process '.git/hooks/fsmonitor-watchman' returned success
>>     01:09:24.975548 fsmonitor.c:138         fsmonitor_refresh_callback '.git'
>>     01:09:24.975556 fsmonitor.c:138         fsmonitor_refresh_callback '.git/config'
>>     01:09:24.975568 fsmonitor.c:138         fsmonitor_refresh_callback '.git/index'
>>     01:09:25.122726 fsmonitor.c:91          write fsmonitor extension successful
>>
>> Am I missing something or should we do something like:
>>
>>     diff --git a/fsmonitor.c b/fsmonitor.c
>>     index 0af7c4edba..5067b89bda 100644
>>     --- a/fsmonitor.c
>>     +++ b/fsmonitor.c
>>     @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
>>
>>      static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>>      {
>>     -       int pos = index_name_pos(istate, name, strlen(name));
>>     +       int pos;
>>     +
>>     +       if (!strcmp(name, ".git") || starts_with(name, ".git/"))
>>     +               return;
>>     +
>>     +       pos = index_name_pos(istate, name, strlen(name));
>
> I would much rather have the fsmonitor hook already exclude those.

As documented the fsmonitor-watchman hook we ship doesn't work as
described in githooks(5), unless "files in the working directory" is
taken to include .git/, but I haven't seen that ever used.

On the other hand relying on arbitrary user-provided hooks to do the
right thing at the cost of silent performance degradation is bad. If
we're going to expect the hook to remove these we should probably
warn/die here if it does send us .git/* files.

> If you *must* add these comparisons, you have to use fspathcmp() and
> fspathncmp() instead (because case-insensitivity).

Thanks.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-27 11:43   ` Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
  2018-01-27 12:39     ` Duy Nguyen
@ 2018-01-29  9:40     ` Duy Nguyen
  2018-01-29 23:16       ` Ben Peart
  1 sibling, 1 reply; 28+ messages in thread
From: Duy Nguyen @ 2018-01-29  9:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Ben Peart, Alex Vandiver, Christian Couder

On Sat, Jan 27, 2018 at 12:43:41PM +0100, Ævar Arnfjörð Bjarmason wrote:
> b) with fsmonitor
> 
>     $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>     12:34:23.833625 read-cache.c:1890       performance: 0.049485685 s: read cache .git/index

This is sort of off topic but may be interesting for big repo guys. It
looks like read cache's time is partially dominated by malloc().

This is the performance breakdown of do_read_index()

$ GIT_TRACE_PERFORMANCE=2 ~/w/git/t/helper/test-read-cache
0.000078489 s: open/mmap/close
0.038915239 s: main entries
0.018983150 s: ext TREE
0.012667080 s: ext UNTR
0.000005372 s: ext FSMN
0.001473470 s: munmap
0.072386911 s: read cache .git/index

Reading main index entries takes like half of the time (0.038 vs
0.072). With the below patch to take out hundred thousands of malloc()
we have this, loading main entries now only takes 0.012s:

$ GIT_TRACE_PERFORMANCE=2 ~/w/git/t/helper/test-read-cache
0.000046587 s: open/mmap/close
0.012077300 s: main entries
0.020477683 s: ext TREE
0.010259998 s: ext UNTR
0.000010250 s: ext FSMN
0.000753854 s: munmap
0.043906473 s: read cache .git/index

We used to do less malloc until debed2a629 (read-cache.c: allocate
index entries individually - 2011-10-24) but I don't think we can
simply revert that (not worth the extra complexity of the old way).

Now "TREE" and "UNTR" extensions become a bigger problem.

-- 8< --
diff --git a/read-cache.c b/read-cache.c
index d60e0a8480..88f4213c99 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1622,7 +1622,12 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on
 						   const char *name,
 						   size_t len)
 {
+#if 0
 	struct cache_entry *ce = xmalloc(cache_entry_size(len));
+#else
+	static char buf[1024];
+	struct cache_entry *ce = (struct cache_entry *)buf;
+#endif
 
 	ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
 	ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
index 48255eef31..e1d21d17a3 100644
--- a/t/helper/test-read-cache.c
+++ b/t/helper/test-read-cache.c
@@ -8,7 +8,9 @@ int cmd_main(int argc, const char **argv)
 	setup_git_directory();
 	for (i = 0; i < cnt; i++) {
 		read_cache();
+#if 0
 		discard_cache();
+#endif
 	}
 	return 0;
 }
-- 8< --

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-29  9:40     ` Duy Nguyen
@ 2018-01-29 23:16       ` Ben Peart
  2018-02-01 10:40         ` Duy Nguyen
  0 siblings, 1 reply; 28+ messages in thread
From: Ben Peart @ 2018-01-29 23:16 UTC (permalink / raw)
  To: Duy Nguyen, Ævar Arnfjörð Bjarmason
  Cc: git, Ben Peart, Alex Vandiver, Christian Couder, jamill



On 1/29/2018 4:40 AM, Duy Nguyen wrote:
> On Sat, Jan 27, 2018 at 12:43:41PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> b) with fsmonitor
>>
>>      $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>>      12:34:23.833625 read-cache.c:1890       performance: 0.049485685 s: read cache .git/index
> 
> This is sort of off topic but may be interesting for big repo guys. It
> looks like read cache's time is partially dominated by malloc().
> 

That is correct.  We have tried a few different ways to address this. 
First was my patch series [1] that would parallelize all of the read 
cache code.

We quickly found that malloc() was the biggest culprit and by speeding 
that up, we got most of the wins.  At Peff's recommendation [2], we 
looked into using tcmalloc but found that 1) it has bugs on Windows and 
2) it isn't being actively maintained so it didn't seem those bugs would 
ever get fixed.

We are currently working on a patch that will use a refactored version 
of the mem_pool in fast-import.c to do block allocations of the cache 
entries which is giving us about a 22% improvement in "git status" 
times.  One challenge has been ensuring that cache entries are not 
passed from one index/mem_pool to another which could cause access after 
free bugs.

[1] 
https://public-inbox.org/git/20171109141737.47976-1-benpeart@microsoft.com/
[2] 
https://public-inbox.org/git/20171120153846.v5b7ho42yzrznqoh@sigill.intra.peff.net/


> This is the performance breakdown of do_read_index()
> 
> $ GIT_TRACE_PERFORMANCE=2 ~/w/git/t/helper/test-read-cache
> 0.000078489 s: open/mmap/close
> 0.038915239 s: main entries
> 0.018983150 s: ext TREE
> 0.012667080 s: ext UNTR
> 0.000005372 s: ext FSMN
> 0.001473470 s: munmap
> 0.072386911 s: read cache .git/index
> 
> Reading main index entries takes like half of the time (0.038 vs
> 0.072). With the below patch to take out hundred thousands of malloc()
> we have this, loading main entries now only takes 0.012s:
> 
> $ GIT_TRACE_PERFORMANCE=2 ~/w/git/t/helper/test-read-cache
> 0.000046587 s: open/mmap/close
> 0.012077300 s: main entries
> 0.020477683 s: ext TREE
> 0.010259998 s: ext UNTR
> 0.000010250 s: ext FSMN
> 0.000753854 s: munmap
> 0.043906473 s: read cache .git/index
> 
> We used to do less malloc until debed2a629 (read-cache.c: allocate
> index entries individually - 2011-10-24) but I don't think we can
> simply revert that (not worth the extra complexity of the old way).
> 
> Now "TREE" and "UNTR" extensions become a bigger problem.
> 
> -- 8< --
> diff --git a/read-cache.c b/read-cache.c
> index d60e0a8480..88f4213c99 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1622,7 +1622,12 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on
>   						   const char *name,
>   						   size_t len)
>   {
> +#if 0
>   	struct cache_entry *ce = xmalloc(cache_entry_size(len));
> +#else
> +	static char buf[1024];
> +	struct cache_entry *ce = (struct cache_entry *)buf;
> +#endif
>   
>   	ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);
>   	ce->ce_stat_data.sd_mtime.sec = get_be32(&ondisk->mtime.sec);
> diff --git a/t/helper/test-read-cache.c b/t/helper/test-read-cache.c
> index 48255eef31..e1d21d17a3 100644
> --- a/t/helper/test-read-cache.c
> +++ b/t/helper/test-read-cache.c
> @@ -8,7 +8,9 @@ int cmd_main(int argc, const char **argv)
>   	setup_git_directory();
>   	for (i = 0; i < cnt; i++) {
>   		read_cache();
> +#if 0
>   		discard_cache();
> +#endif
>   	}
>   	return 0;
>   }
> -- 8< --
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-28 22:28   ` Ævar Arnfjörð Bjarmason
@ 2018-01-30  1:21     ` Ben Peart
  2018-01-31 10:15       ` Duy Nguyen
  0 siblings, 1 reply; 28+ messages in thread
From: Ben Peart @ 2018-01-30  1:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Johannes Schindelin
  Cc: git, Ben Peart, Alex Vandiver, Christian Couder, David Turner



On 1/28/2018 5:28 PM, Ævar Arnfjörð Bjarmason wrote:
> On Sun, Jan 28, 2018 at 9:44 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>> Hi,
>>
>> On Sat, 27 Jan 2018, Ævar Arnfjörð Bjarmason wrote:
>>
>>> I just got around to testing this since it landed, for context some
>>> previous poking of mine in [1].
>>>
>>> Issues / stuff I've noticed:
>>>
>>> 1) We end up invalidating the untracked cache because stuff in .git/
>>> changed. For example:
>>>
>>>      01:09:24.975524 fsmonitor.c:173         fsmonitor process '.git/hooks/fsmonitor-watchman' returned success
>>>      01:09:24.975548 fsmonitor.c:138         fsmonitor_refresh_callback '.git'
>>>      01:09:24.975556 fsmonitor.c:138         fsmonitor_refresh_callback '.git/config'
>>>      01:09:24.975568 fsmonitor.c:138         fsmonitor_refresh_callback '.git/index'
>>>      01:09:25.122726 fsmonitor.c:91          write fsmonitor extension successful
>>>
>>> Am I missing something or should we do something like:
>>>
>>>      diff --git a/fsmonitor.c b/fsmonitor.c
>>>      index 0af7c4edba..5067b89bda 100644
>>>      --- a/fsmonitor.c
>>>      +++ b/fsmonitor.c
>>>      @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
>>>
>>>       static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>>>       {
>>>      -       int pos = index_name_pos(istate, name, strlen(name));
>>>      +       int pos;
>>>      +
>>>      +       if (!strcmp(name, ".git") || starts_with(name, ".git/"))
>>>      +               return;
>>>      +
>>>      +       pos = index_name_pos(istate, name, strlen(name));
>>
>> I would much rather have the fsmonitor hook already exclude those.
> 
> As documented the fsmonitor-watchman hook we ship doesn't work as
> described in githooks(5), unless "files in the working directory" is
> taken to include .git/, but I haven't seen that ever used.
> 
> On the other hand relying on arbitrary user-provided hooks to do the
> right thing at the cost of silent performance degradation is bad. If
> we're going to expect the hook to remove these we should probably
> warn/die here if it does send us .git/* files.
> 

I'm not sure how often something is modified in the git directory when 
nothing was modified in the working directory but this seems like a nice 
optimization.

We can't just blindly ignore changes under ".git" as the git directory 
may have been moved somewhere else.  Instead we'd need to use get_git_dir().

Rather than assuming the hook will optimize for this particular case, I 
think a better solution would be to update 
untracked_cache_invalidate_path() so that it doesn't invalidate the 
untracked cache and mark the index as dirty when it was asked to 
invalidate a path under GIT_DIR.  I can't think of a case when that 
would be the desired behavior.

Somewhat off topic but related to the overall performance discussion: 
I've also thought the untracked cache shouldn't mark the index as dirty 
except in the case where the extension is being added or removed.  We've 
observed that it causes unnecessary index writes that actually slows 
down overall performance.

Since it is a cache, it does not require the index to be written out for 
correctness, it can simply update the cache again the next time it is 
needed. This is typically faster than the cost of the index write so 
makes things faster overall.  I adopted this same model with the 
fsmonitor extension.

>> If you *must* add these comparisons, you have to use fspathcmp() and
>> fspathncmp() instead (because case-insensitivity).
> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-27 19:01         ` Ævar Arnfjörð Bjarmason
@ 2018-01-30 22:41           ` Ben Peart
  0 siblings, 0 replies; 28+ messages in thread
From: Ben Peart @ 2018-01-30 22:41 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Duy Nguyen
  Cc: git, Ben Peart, Alex Vandiver, Christian Couder



On 1/27/2018 2:01 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Sat, Jan 27 2018, Duy Nguyen jotted:
> 
>> On Sat, Jan 27, 2018 at 07:39:27PM +0700, Duy Nguyen wrote:
>>> On Sat, Jan 27, 2018 at 6:43 PM, Ævar Arnfjörð Bjarmason
>>> <avarab@gmail.com> wrote:
>>>> a) no fsmonitor
>>>>
>>>>      $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>>>>      12:32:44.947651 read-cache.c:1890       performance: 0.053153609 s: read cache .git/index
>>>>      12:32:44.967943 preload-index.c:112     performance: 0.020161093 s: preload index
>>>>      12:32:44.974217 read-cache.c:1446       performance: 0.006230611 s: refresh index
>>>>
>>>> ...
>>>>
>>>> b) with fsmonitor
>>>>
>>>>      $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>>>>      12:34:23.833625 read-cache.c:1890       performance: 0.049485685 s: read cache .git/index
>>>>      12:34:23.838622 preload-index.c:112     performance: 0.001221197 s: preload index
>>>>      12:34:23.858723 fsmonitor.c:170         performance: 0.020059647 s: fsmonitor process '.git/hooks/fsmonitor-watchman'
>>>>      12:34:23.871532 read-cache.c:1446       performance: 0.032870818 s: refresh index
>>>
>>> Hmm.. why does refresh take longer with fsmonitor/watchman? With the
>>> help from watchman, we know what files are modified. We don't need
>>> manual stat()'ing and this line should be lower than the "no
>>> fsmonitor" case, which is 0.006230611s.
>>
>> Ahh.. my patch probably does not see that fsmonitor could be activated
>> lazily inside refresh_index() call. The patch below should fix it.
> 
> Will have to get those numbers to you later, or alternatively clone
> https://github.com/avar/2015-04-03-1M-git (or some other test repo) and
> test it yourself, sorry. Don't have time to follow-up much this weekend.
> 
>> But between your normal refresh time (0.020 preload + 0.006 actual
>> refresh) and fsmonitor taking 0.020 just to talk to watchman, this
>> repo seems "too small" for fsmonitor/watchman to shine.
> 
> Surely that's an implementation limitation and not something inherent,
> given that watchman itself returns in 5ms?
> 
> I.e. status could work like this, no?:
> 
>   1. At start, record the timestamp & find out canonical state via some
>      expansive method.
>   2. Print out xyz changed, abc added etc.
>   3. Record *just* what status would report about xyz, abc etc.
>   4. On subsequent git status, just amend that information, e.g. if
>      watchman says nothing changed $(cat .git/last-status-output).
> 
> We shouldn't need to be reading the entire index in the common case
> where just a few things change.
> 

I agree that reading the entire index in the common case is rather 
expensive. It is, however, the model we have today and all the code in 
git assumes all cache entries are in memory.

We are interested in pursuing a patch series that would enable higher 
performance especially with large and/or sparse repos by making the 
index sparse, hierarchical, and incrementally readable/writable.  As you 
might expect, that is a lot of work and is far beyond what we can 
address in this patch series.

> There's also a lot of things that use status to just check "are we
> clean?", those would only need to record the last known timestamp when
> the tree was clean, and then ask watchman if there were any changes, if
> not we're done.
> 
>> I'm still a bit curious that refresh index time, after excluding 0.020
>> for fsmonitor, is stil 0.012s. What does it do? It should really be
>> doing nothing. Either way, read index time seems to be the elephant in
>> the room now.
>>
>> -- 8< --
>> diff --git a/read-cache.c b/read-cache.c
>> index eac74bc9f1..d60e0a8480 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1367,12 +1367,21 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>>   	unsigned int options = (CE_MATCH_REFRESH |
>>   				(really ? CE_MATCH_IGNORE_VALID : 0) |
>>   				(not_new ? CE_MATCH_IGNORE_MISSING : 0));
>> +	int ignore_fsmonitor = options & CE_MATCH_IGNORE_FSMONITOR;
>>   	const char *modified_fmt;
>>   	const char *deleted_fmt;
>>   	const char *typechange_fmt;
>>   	const char *added_fmt;
>>   	const char *unmerged_fmt;
>> -	uint64_t start = getnanotime();
>> +	uint64_t start;
>> +
>> +	/*
>> +	 * If fsmonitor is used, force its communication early to
>> +	 * accurately measure how long this function takes without it.
>> +	 */
>> +	if (!ignore_fsmonitor)
>> +		refresh_fsmonitor(istate);
>> +	start = getnanotime();
>>
>>   	modified_fmt = (in_porcelain ? "M\t%s\n" : "%s: needs update\n");
>>   	deleted_fmt = (in_porcelain ? "D\t%s\n" : "%s: needs update\n");
>> -- 8< --

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-27  0:28 Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
  2018-01-27  1:36 ` Duy Nguyen
  2018-01-28 20:44 ` Johannes Schindelin
@ 2018-01-30 22:57 ` Ben Peart
  2018-01-30 23:16   ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 28+ messages in thread
From: Ben Peart @ 2018-01-30 22:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Ben Peart, Alex Vandiver, Christian Couder, David Turner

While some of these issues have been discussed in other threads, I 
thought I'd summarize my thoughts here.

On 1/26/2018 7:28 PM, Ævar Arnfjörð Bjarmason wrote:
> I just got around to testing this since it landed, for context some
> previous poking of mine in [1].
> 
> Issues / stuff I've noticed:
> 
> 1) We end up invalidating the untracked cache because stuff in .git/
> changed. For example:
> 
>      01:09:24.975524 fsmonitor.c:173         fsmonitor process '.git/hooks/fsmonitor-watchman' returned success
>      01:09:24.975548 fsmonitor.c:138         fsmonitor_refresh_callback '.git'
>      01:09:24.975556 fsmonitor.c:138         fsmonitor_refresh_callback '.git/config'
>      01:09:24.975568 fsmonitor.c:138         fsmonitor_refresh_callback '.git/index'
>      01:09:25.122726 fsmonitor.c:91          write fsmonitor extension successful
> 
> Am I missing something or should we do something like:
> 
>      diff --git a/fsmonitor.c b/fsmonitor.c
>      index 0af7c4edba..5067b89bda 100644
>      --- a/fsmonitor.c
>      +++ b/fsmonitor.c
>      @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
> 
>       static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>       {
>      -       int pos = index_name_pos(istate, name, strlen(name));
>      +       int pos;
>      +
>      +       if (!strcmp(name, ".git") || starts_with(name, ".git/"))
>      +               return;
>      +
>      +       pos = index_name_pos(istate, name, strlen(name));
> 
>              if (pos >= 0) {
>                      struct cache_entry *ce = istate->cache[pos];
> 
> With that patch applied status on a large repo[2] goes from a consistent
> ~180-200ms to ~140-150ms, since we're not invalidating some of the UC
> structure
> 

I favor making this optimization by updating 
untracked_cache_invalidate_path() so that it ignores paths under 
get_git_dir() and doesn't invalidate the untracked cache or flag the 
index as dirty.

> 2) We re-write out the index even though we know nothing changed
> 
> When you first run with core.fsmonitor it needs to
> mark_fsmonitor_clean() for every path, but is there a reason for why we
> wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are
> marked and we know from the hook that nothing changed? Why write out the
> index again?
> 

Writing out the index when core.fsmonitor is first turned on is 
necessary to get the index extension added with the current state of the 
dirty flags.  Given it is a one time cost, I don't think we have 
anything worth trying to optimize here.

> 3) A lot of time spend reading the index (or something..)
> 
> While the hook itself takes ~20ms (and watchman itself 1/4 of that)
> status as a whole takes much longer. gprof reveals:
> 
>      Each sample counts as 0.01 seconds.
>        %   cumulative   self              self     total
>       time   seconds   seconds    calls  ms/call  ms/call  name
>       15.38      0.02     0.02   221690     0.00     0.00  memihash
>       15.38      0.04     0.02   221689     0.00     0.00  create_from_disk
>        7.69      0.05     0.01  2216897     0.00     0.00  git_bswap32
>        7.69      0.06     0.01   222661     0.00     0.00  ce_path_match
>        7.69      0.07     0.01   221769     0.00     0.00  hashmap_add
>        7.69      0.08     0.01    39941     0.00     0.00  prep_exclude
>        7.69      0.09     0.01    39940     0.00     0.00  strbuf_addch
>        7.69      0.10     0.01        1    10.00    10.00  read_one
>        7.69      0.11     0.01        1    10.00    10.00  refresh_index
>        7.69      0.12     0.01        1    10.00    10.00  tweak_fsmonitor
>        7.69      0.13     0.01                             preload_thread
> 
> The index is 24M in this case, I guess it's unpacking it, but I wonder
> if this couldn't be much faster if we saved away the result of the last
> "status" in something that's quick to access, and then if nothing
> changed we just report that, and no need to re-write the index (or just
> write the "it was clean at this time" part).

Yes, reading the index is slow.  We've made some improvements (not 
computing the SHA, not validating the sort order, etc) and have one more 
in progress that will reduce the malloc() cost.  I haven't found any 
other easy optimizations but it would be great if you could find more! 
To make significant improvements, I'm afraid it will take more 
substantial changes to the in memory and on disk formats and updates to 
the code to take advantage of those changes.

> 
> 4) core.fsmonitor=false behaves unexpectedly
> 
> The code that reads this variable just treats it as a string, so we do a
> bunch of work for nothing (and nothing warns) if this is set and 'false'
> is executed. Any reason we couldn't do our standard boolean parsing
> here? You couldn't call your hook 0/1/true/false, but that doesn't seem
> like a big loss.
> 
> 1. https://public-inbox.org/git/CACBZZX5a6Op7dH_g9WOFBnejh2zgNK4b34ygxA8daNDqvitFVA@mail.gmail.com/
> 2. https://github.com/avar/2015-04-03-1M-git
> 

I'm torn on this one.  The core.fsmontior setting isn't a boolean value, 
its a string that is the command to run when we need file system 
changes.  It would be pretty simple to add a call to 
git_parse_maybe_bool_text() to treat "false," "no," or "off" the same as 
an empty string but that makes it look even more like a boolean when it 
isn't.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-30 22:57 ` Some rough edges of core.fsmonitor Ben Peart
@ 2018-01-30 23:16   ` Ævar Arnfjörð Bjarmason
  2018-01-31 16:12     ` Ben Peart
  0 siblings, 1 reply; 28+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-01-30 23:16 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, Ben Peart, Alex Vandiver, Christian Couder, David Turner


On Tue, Jan 30 2018, Ben Peart jotted:

> While some of these issues have been discussed in other threads, I
> thought I'd summarize my thoughts here.

Thanks for this & your other reply. I'm going to get to testing some of
Duy's patches soon, and if you have some other relevant WIP I'd be happy
to try them, but meanwhile replying to a few of these:

> On 1/26/2018 7:28 PM, Ævar Arnfjörð Bjarmason wrote:
>> I just got around to testing this since it landed, for context some
>> previous poking of mine in [1].
>>
>> Issues / stuff I've noticed:
>>
>> 1) We end up invalidating the untracked cache because stuff in .git/
>> changed. For example:
>>
>>      01:09:24.975524 fsmonitor.c:173         fsmonitor process '.git/hooks/fsmonitor-watchman' returned success
>>      01:09:24.975548 fsmonitor.c:138         fsmonitor_refresh_callback '.git'
>>      01:09:24.975556 fsmonitor.c:138         fsmonitor_refresh_callback '.git/config'
>>      01:09:24.975568 fsmonitor.c:138         fsmonitor_refresh_callback '.git/index'
>>      01:09:25.122726 fsmonitor.c:91          write fsmonitor extension successful
>>
>> Am I missing something or should we do something like:
>>
>>      diff --git a/fsmonitor.c b/fsmonitor.c
>>      index 0af7c4edba..5067b89bda 100644
>>      --- a/fsmonitor.c
>>      +++ b/fsmonitor.c
>>      @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
>>
>>       static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>>       {
>>      -       int pos = index_name_pos(istate, name, strlen(name));
>>      +       int pos;
>>      +
>>      +       if (!strcmp(name, ".git") || starts_with(name, ".git/"))
>>      +               return;
>>      +
>>      +       pos = index_name_pos(istate, name, strlen(name));
>>
>>              if (pos >= 0) {
>>                      struct cache_entry *ce = istate->cache[pos];
>>
>> With that patch applied status on a large repo[2] goes from a consistent
>> ~180-200ms to ~140-150ms, since we're not invalidating some of the UC
>> structure
>>
>
> I favor making this optimization by updating
> untracked_cache_invalidate_path() so that it ignores paths under
> get_git_dir() and doesn't invalidate the untracked cache or flag the
> index as dirty.

*nod*

>> 2) We re-write out the index even though we know nothing changed
>>
>> When you first run with core.fsmonitor it needs to
>> mark_fsmonitor_clean() for every path, but is there a reason for why we
>> wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are
>> marked and we know from the hook that nothing changed? Why write out the
>> index again?
>>
>
> Writing out the index when core.fsmonitor is first turned on is
> necessary to get the index extension added with the current state of
> the dirty flags.  Given it is a one time cost, I don't think we have
> anything worth trying to optimize here.

Indeed, that makes sense. What I was showing here is even after the
initial setup we continue to write it out when we know nothing changed.

We do that anyway without fsmonitor, but this seemed like a worthwhile
thing to optimize.

>> 3) A lot of time spend reading the index (or something..)
>>
>> While the hook itself takes ~20ms (and watchman itself 1/4 of that)
>> status as a whole takes much longer. gprof reveals:
>>
>>      Each sample counts as 0.01 seconds.
>>        %   cumulative   self              self     total
>>       time   seconds   seconds    calls  ms/call  ms/call  name
>>       15.38      0.02     0.02   221690     0.00     0.00  memihash
>>       15.38      0.04     0.02   221689     0.00     0.00  create_from_disk
>>        7.69      0.05     0.01  2216897     0.00     0.00  git_bswap32
>>        7.69      0.06     0.01   222661     0.00     0.00  ce_path_match
>>        7.69      0.07     0.01   221769     0.00     0.00  hashmap_add
>>        7.69      0.08     0.01    39941     0.00     0.00  prep_exclude
>>        7.69      0.09     0.01    39940     0.00     0.00  strbuf_addch
>>        7.69      0.10     0.01        1    10.00    10.00  read_one
>>        7.69      0.11     0.01        1    10.00    10.00  refresh_index
>>        7.69      0.12     0.01        1    10.00    10.00  tweak_fsmonitor
>>        7.69      0.13     0.01                             preload_thread
>>
>> The index is 24M in this case, I guess it's unpacking it, but I wonder
>> if this couldn't be much faster if we saved away the result of the last
>> "status" in something that's quick to access, and then if nothing
>> changed we just report that, and no need to re-write the index (or just
>> write the "it was clean at this time" part).
>
> Yes, reading the index is slow.  We've made some improvements (not
> computing the SHA, not validating the sort order, etc) and have one
> more in progress that will reduce the malloc() cost.  I haven't found
> any other easy optimizations but it would be great if you could find
> more! To make significant improvements, I'm afraid it will take more
> substantial changes to the in memory and on disk formats and updates
> to the code to take advantage of those changes.

What I was wondering (not very clearly) is whether an easier
optimization for now would be to speed up the case where nothing
changed, that would involve just reading some flag in the index (or
elsewhere) saying nothing changed last time, then the timestamp the
fsmonitor writes, and trusting the hook when it says nothing changed
since that timestamp.

>>
>> 4) core.fsmonitor=false behaves unexpectedly
>>
>> The code that reads this variable just treats it as a string, so we do a
>> bunch of work for nothing (and nothing warns) if this is set and 'false'
>> is executed. Any reason we couldn't do our standard boolean parsing
>> here? You couldn't call your hook 0/1/true/false, but that doesn't seem
>> like a big loss.
>>
>> 1. https://public-inbox.org/git/CACBZZX5a6Op7dH_g9WOFBnejh2zgNK4b34ygxA8daNDqvitFVA@mail.gmail.com/
>> 2. https://github.com/avar/2015-04-03-1M-git
>>
>
> I'm torn on this one.  The core.fsmontior setting isn't a boolean
> value, its a string that is the command to run when we need file
> system changes.  It would be pretty simple to add a call to
> git_parse_maybe_bool_text() to treat "false," "no," or "off" the same
> as an empty string but that makes it look even more like a boolean
> when it isn't.

Yes, that makes sense. I wonder though if we should warn if the hook is
set and returning non-zero, right now that failure case is silent, maybe
the hook would like to return an exit code for "don't ask me", but it
seems better to make that 125 (like git bisect skip) instead of !0,
since that makes the hook failing completely indistinguishable to the
user from the hook doing its job.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-30  1:21     ` Ben Peart
@ 2018-01-31 10:15       ` Duy Nguyen
  2018-02-04  9:38         ` [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache Nguyễn Thái Ngọc Duy
  0 siblings, 1 reply; 28+ messages in thread
From: Duy Nguyen @ 2018-01-31 10:15 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ævar Arnfjörð Bjarmason, Johannes Schindelin, git,
	Ben Peart, Alex Vandiver, Christian Couder, David Turner

On Tue, Jan 30, 2018 at 8:21 AM, Ben Peart <peartben@gmail.com> wrote:
>
>
> On 1/28/2018 5:28 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Sun, Jan 28, 2018 at 9:44 PM, Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>>>
>>> Hi,
>>>
>>> On Sat, 27 Jan 2018, Ævar Arnfjörð Bjarmason wrote:
>>>
>>>> I just got around to testing this since it landed, for context some
>>>> previous poking of mine in [1].
>>>>
>>>> Issues / stuff I've noticed:
>>>>
>>>> 1) We end up invalidating the untracked cache because stuff in .git/
>>>> changed. For example:
>>>>
>>>>      01:09:24.975524 fsmonitor.c:173         fsmonitor process
>>>> '.git/hooks/fsmonitor-watchman' returned success
>>>>      01:09:24.975548 fsmonitor.c:138         fsmonitor_refresh_callback
>>>> '.git'
>>>>      01:09:24.975556 fsmonitor.c:138         fsmonitor_refresh_callback
>>>> '.git/config'
>>>>      01:09:24.975568 fsmonitor.c:138         fsmonitor_refresh_callback
>>>> '.git/index'
>>>>      01:09:25.122726 fsmonitor.c:91          write fsmonitor extension
>>>> successful
>>>>
>>>> Am I missing something or should we do something like:
>>>>
>>>>      diff --git a/fsmonitor.c b/fsmonitor.c
>>>>      index 0af7c4edba..5067b89bda 100644
>>>>      --- a/fsmonitor.c
>>>>      +++ b/fsmonitor.c
>>>>      @@ -118,7 +118,12 @@ static int query_fsmonitor(int version,
>>>> uint64_t last_update, struct strbuf *que
>>>>
>>>>       static void fsmonitor_refresh_callback(struct index_state *istate,
>>>> const char *name)
>>>>       {
>>>>      -       int pos = index_name_pos(istate, name, strlen(name));
>>>>      +       int pos;
>>>>      +
>>>>      +       if (!strcmp(name, ".git") || starts_with(name, ".git/"))
>>>>      +               return;
>>>>      +
>>>>      +       pos = index_name_pos(istate, name, strlen(name));
>>>
>>>
>>> I would much rather have the fsmonitor hook already exclude those.
>>
>>
>> As documented the fsmonitor-watchman hook we ship doesn't work as
>> described in githooks(5), unless "files in the working directory" is
>> taken to include .git/, but I haven't seen that ever used.
>>
>> On the other hand relying on arbitrary user-provided hooks to do the
>> right thing at the cost of silent performance degradation is bad. If
>> we're going to expect the hook to remove these we should probably
>> warn/die here if it does send us .git/* files.
>>
>
> I'm not sure how often something is modified in the git directory when
> nothing was modified in the working directory but this seems like a nice
> optimization.
>
> We can't just blindly ignore changes under ".git" as the git directory may
> have been moved somewhere else.  Instead we'd need to use get_git_dir().

In theory. But we do blindly ignore changes under ".git" in some
cases, see treat_path() in dir.c for example.

> Rather than assuming the hook will optimize for this particular case, I
> think a better solution would be to update untracked_cache_invalidate_path()
> so that it doesn't invalidate the untracked cache and mark the index as
> dirty when it was asked to invalidate a path under GIT_DIR.  I can't think
> of a case when that would be the desired behavior.

You see, my only problem with this is tying the check with $GIT_DIR,
which may involve normalizing the path and all that stuff because the
current code base is not prepared to deal with that. Simply ignoring
anything in ".git" may work too. Not pretty but it's more in line with
what we have. Though I'm still not sure how it interacts with ".git"
from submodules which is why I still have not sent a patch to update
untracked_cache_invalidate_path() because it does make sense to fix it
in there.

> Somewhat off topic but related to the overall performance discussion: I've
> also thought the untracked cache shouldn't mark the index as dirty except in
> the case where the extension is being added or removed.  We've observed that
> it causes unnecessary index writes that actually slows down overall
> performance.
>
> Since it is a cache, it does not require the index to be written out for
> correctness, it can simply update the cache again the next time it is
> needed. This is typically faster than the cost of the index write so makes
> things faster overall.  I adopted this same model with the fsmonitor
> extension.

If you turn on split index, the write cost should be much much less
(but I think read cost increases a bit due to merging the two indexes
in core; I noticed this but haven't really dug down). You basically
pay writing modified index entries and extensions.

But yeah not writing is possible. The index's dirty flag can show that
only untracked cache extension is dirty, then it's a judgement call
whether to write it down or drop it. You still need to occasionally
write it down though. Dirty directories will fall back to slow path.
If you don't write it down, the set of dirty paths keeps increasing
and will start to slow git-status down. I don't know at what point we
should write it down though if you choose not to go the split-index
route.
-- 
Duy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-30 23:16   ` Ævar Arnfjörð Bjarmason
@ 2018-01-31 16:12     ` Ben Peart
  0 siblings, 0 replies; 28+ messages in thread
From: Ben Peart @ 2018-01-31 16:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Ben Peart, Alex Vandiver, Christian Couder, David Turner, pclouds



On 1/30/2018 6:16 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Jan 30 2018, Ben Peart jotted:
> 
>> While some of these issues have been discussed in other threads, I
>> thought I'd summarize my thoughts here.
> 
> Thanks for this & your other reply. I'm going to get to testing some of
> Duy's patches soon, and if you have some other relevant WIP I'd be happy
> to try them, but meanwhile replying to a few of these:
> 
>> On 1/26/2018 7:28 PM, Ævar Arnfjörð Bjarmason wrote:
>>> I just got around to testing this since it landed, for context some
>>> previous poking of mine in [1].
>>>
>>> Issues / stuff I've noticed:
>>>
>>> 1) We end up invalidating the untracked cache because stuff in .git/
>>> changed. For example:
>>>
>>>       01:09:24.975524 fsmonitor.c:173         fsmonitor process '.git/hooks/fsmonitor-watchman' returned success
>>>       01:09:24.975548 fsmonitor.c:138         fsmonitor_refresh_callback '.git'
>>>       01:09:24.975556 fsmonitor.c:138         fsmonitor_refresh_callback '.git/config'
>>>       01:09:24.975568 fsmonitor.c:138         fsmonitor_refresh_callback '.git/index'
>>>       01:09:25.122726 fsmonitor.c:91          write fsmonitor extension successful
>>>
>>> Am I missing something or should we do something like:
>>>
>>>       diff --git a/fsmonitor.c b/fsmonitor.c
>>>       index 0af7c4edba..5067b89bda 100644
>>>       --- a/fsmonitor.c
>>>       +++ b/fsmonitor.c
>>>       @@ -118,7 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
>>>
>>>        static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>>>        {
>>>       -       int pos = index_name_pos(istate, name, strlen(name));
>>>       +       int pos;
>>>       +
>>>       +       if (!strcmp(name, ".git") || starts_with(name, ".git/"))
>>>       +               return;
>>>       +
>>>       +       pos = index_name_pos(istate, name, strlen(name));
>>>
>>>               if (pos >= 0) {
>>>                       struct cache_entry *ce = istate->cache[pos];
>>>
>>> With that patch applied status on a large repo[2] goes from a consistent
>>> ~180-200ms to ~140-150ms, since we're not invalidating some of the UC
>>> structure
>>>
>>
>> I favor making this optimization by updating
>> untracked_cache_invalidate_path() so that it ignores paths under
>> get_git_dir() and doesn't invalidate the untracked cache or flag the
>> index as dirty.
> 
> *nod*
> 
>>> 2) We re-write out the index even though we know nothing changed
>>>
>>> When you first run with core.fsmonitor it needs to
>>> mark_fsmonitor_clean() for every path, but is there a reason for why we
>>> wouldn't supply the equivalent of GIT_OPTIONAL_LOCKS=0 if all paths are
>>> marked and we know from the hook that nothing changed? Why write out the
>>> index again?
>>>
>>
>> Writing out the index when core.fsmonitor is first turned on is
>> necessary to get the index extension added with the current state of
>> the dirty flags.  Given it is a one time cost, I don't think we have
>> anything worth trying to optimize here.
> 
> Indeed, that makes sense. What I was showing here is even after the
> initial setup we continue to write it out when we know nothing changed.
> 
> We do that anyway without fsmonitor, but this seemed like a worthwhile
> thing to optimize.
> 

There is already logic to avoid writing out the index unless there is 
something that requires it.  In my testing, it was often the untracked 
cache marking the index as dirty that was causing the unexpected writes.

The patch to make the untracked cache only flag the index as dirty when 
the feature is being turned on or off is pretty simple (see below).  The 
challenge was that many of the untracked cache tests assume all changes 
are saved to disk after every command so they fail after making this 
change.  The patch below does a simple hack of checking for a test 
environment variable and only forcing the index writes if it is set.

Internally, we simply turned off the untracked cache as it's usefulness 
in combination with GVFS is limited and without the patch, it actually 
slowed down common operations more than it sped them up.

Typically, changes to the untracked cache don't accumulate long before 
the user does some operation that requires the index to be written out 
at which point the untracked cache is updated as well.


diff --git a/dir.c b/dir.c
index 5e93a1350b..af1d33aae1 100644
--- a/dir.c
+++ b/dir.c
@@ -2256,7 +2256,8 @@ int read_directory(struct dir_struct *dir, struct 
index_state *istate,
                                  dir->untracked->gitignore_invalidated,
                                  dir->untracked->dir_invalidated,
                                  dir->untracked->dir_opened);
-               if (dir->untracked == istate->untracked &&
+               if (getenv("GIT_TEST_UNTRACKED_CACHE") &&
+                       dir->untracked == istate->untracked &&
                     (dir->untracked->dir_opened ||
                      dir->untracked->gitignore_invalidated ||
                      dir->untracked->dir_invalidated))
diff --git a/t/t7063-status-untracked-cache.sh 
b/t/t7063-status-untracked-cache.sh
index ea9383e8cb..e5811b6ef2 100755
--- a/t/t7063-status-untracked-cache.sh
+++ b/t/t7063-status-untracked-cache.sh
@@ -14,6 +14,8 @@ test_description='test untracked cache'
  # See <20160803174522.5571-1-pclouds@gmail.com> if you want to know
  # more.

+export GIT_TEST_UNTRACKED_CACHE=true
+
  sync_mtime () {
         if test_have_prereq BUSYBOX
         then



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Some rough edges of core.fsmonitor
  2018-01-29 23:16       ` Ben Peart
@ 2018-02-01 10:40         ` Duy Nguyen
  0 siblings, 0 replies; 28+ messages in thread
From: Duy Nguyen @ 2018-02-01 10:40 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ævar Arnfjörð Bjarmason, git, Ben Peart,
	Alex Vandiver, Christian Couder, jamill

On Tue, Jan 30, 2018 at 6:16 AM, Ben Peart <peartben@gmail.com> wrote:
>
>
> On 1/29/2018 4:40 AM, Duy Nguyen wrote:
>>
>> On Sat, Jan 27, 2018 at 12:43:41PM +0100, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> b) with fsmonitor
>>>
>>>      $ time GIT_TRACE_PERFORMANCE=1 ~/g/git/git-status
>>>      12:34:23.833625 read-cache.c:1890       performance: 0.049485685 s:
>>> read cache .git/index
>>
>>
>> This is sort of off topic but may be interesting for big repo guys. It
>> looks like read cache's time is partially dominated by malloc().
>>
>
> That is correct.  We have tried a few different ways to address this. First
> was my patch series [1] that would parallelize all of the read cache code.
>
> We quickly found that malloc() was the biggest culprit and by speeding that
> up, we got most of the wins.  At Peff's recommendation [2], we looked into
> using tcmalloc but found that 1) it has bugs on Windows and 2) it isn't
> being actively maintained so it didn't seem those bugs would ever get fixed.
>
> We are currently working on a patch that will use a refactored version of
> the mem_pool in fast-import.c to do block allocations of the cache entries
> which is giving us about a 22% improvement in "git status" times.  One

My apologies if this has been discussed in the second half of 2017
which I have no idea what happened.

I just wonder if it's possible to design a "file format" that is
basically a memory dump of lots of struct cache_entry? This "file"
will stay in a shared memory somewhere and never get written down to
disk. Since it's very close to the data structure we have in core, the
most we need to do after mmap'ing it (and keeping it mmap'd until the
end) is adjust some pointers. These entries are of course read-only.
When you modify/create new entries, they are created new, using the
old malloc(). We just need to make sure not free the read-only
cache_entry entries and munmap() the whole thing when we
discard_index().

This opens up another option to deal with the large UNTR and TREE
extensions in a similar way. These will be your next headache after
you have reduced parse time for main entries.

> challenge has been ensuring that cache entries are not passed from one
> index/mem_pool to another which could cause access after free bugs.

We kind of have something close to that, but not entirely the same.
When split index is used, the same cache_entry can appear in two
index_state structs. Of course you can free only one of them (and you
can only do so when you know both index_state are gone). I see some
code cleanup opportunity :)

>
> [1]
> https://public-inbox.org/git/20171109141737.47976-1-benpeart@microsoft.com/
> [2]
> https://public-inbox.org/git/20171120153846.v5b7ho42yzrznqoh@sigill.intra.peff.net/
-- 
Duy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache
  2018-01-31 10:15       ` Duy Nguyen
@ 2018-02-04  9:38         ` Nguyễn Thái Ngọc Duy
  2018-02-05 17:44           ` Ben Peart
  2018-02-07  9:21           ` [PATCH v2] " Nguyễn Thái Ngọc Duy
  0 siblings, 2 replies; 28+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-02-04  9:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christian Couder, Johannes Schindelin, Ben Peart,
	Ben Peart, Alex Vandiver, git,
	Nguyễn Thái Ngọc Duy

read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.

This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".

Without this patch, we invalidate the root directory unncessarily,
which:

- makes read_directory() fall back to slow path for root directory
  (slower)

- makes the index dirty (because UNTR extension is updated). Depending
  on the index size, writing it down could also be slow.

Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Sorry for the resend, I forgot git@vger.

 dir.c             | 13 ++++++++++++-
 git-compat-util.h |  2 ++
 wrapper.c         | 12 ++++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index 7c4b45e30e..f8b4cabba9 100644
--- a/dir.c
+++ b/dir.c
@@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	if (!de)
 		return treat_path_fast(dir, untracked, cdir, istate, path,
 				       baselen, pathspec);
-	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
+	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
 		return path_none;
 	strbuf_setlen(path, baselen);
 	strbuf_addstr(path, de->d_name);
@@ -2970,8 +2970,19 @@ static int invalidate_one_component(struct untracked_cache *uc,
 void untracked_cache_invalidate_path(struct index_state *istate,
 				     const char *path)
 {
+	const char *end;
+	int skipped;
+
 	if (!istate->untracked || !istate->untracked->root)
 		return;
+	if (!fspathcmp(path, ".git"))
+		return;
+	if (ignore_case)
+		skipped = skip_caseprefix(path, "/.git", &end);
+	else
+		skipped = skip_prefix(path, "/.git", &end);
+	if (skipped && (*end == '\0' || *end == '/'))
+		return;
 	invalidate_one_component(istate->untracked, istate->untracked->root,
 				 path, strlen(path));
 }
diff --git a/git-compat-util.h b/git-compat-util.h
index 68b2ad531e..27e0b761a3 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -484,6 +484,8 @@ static inline int skip_prefix(const char *str, const char *prefix,
 	return 0;
 }
 
+int skip_caseprefix(const char *str, const char *prefix, const char **out);
+
 /*
  * If the string "str" is the same as the string in "prefix", then the "arg"
  * parameter is set to the "def" parameter and 1 is returned.
diff --git a/wrapper.c b/wrapper.c
index d20356a776..bb888d9401 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -690,3 +690,15 @@ int xgethostname(char *buf, size_t len)
 		buf[len - 1] = 0;
 	return ret;
 }
+
+int skip_caseprefix(const char *str, const char *prefix, const char **out)
+{
+	do {
+		if (!*prefix) {
+			*out = str;
+			return 1;
+		}
+	} while (tolower(*str++) == tolower(*prefix++));
+
+	return 0;
+}
-- 
2.16.1.207.gedba492059


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache
  2018-02-04  9:38         ` [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache Nguyễn Thái Ngọc Duy
@ 2018-02-05 17:44           ` Ben Peart
  2018-02-06 12:02             ` Duy Nguyen
  2018-02-07  9:21           ` [PATCH v2] " Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 28+ messages in thread
From: Ben Peart @ 2018-02-05 17:44 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christian Couder, Johannes Schindelin, Ben Peart,
	Alex Vandiver, git



On 2/4/2018 4:38 AM, Nguyễn Thái Ngọc Duy wrote:
> read_directory() code ignores all paths named ".git" even if it's not
> a valid git repository. See treat_path() for details. Since ".git" is
> basically invisible to read_directory(), when we are asked to
> invalidate a path that contains ".git", we can safely ignore it
> because the slow path would not consider it anyway.
> 
> This helps when fsmonitor is used and we have a real ".git" repo at
> worktree top. Occasionally .git/index will be updated and if the
> fsmonitor hook does not filter it, untracked cache is asked to
> invalidate the path ".git/index".
> 
> Without this patch, we invalidate the root directory unncessarily,
> which:
> 
> - makes read_directory() fall back to slow path for root directory
>    (slower)
> 
> - makes the index dirty (because UNTR extension is updated). Depending
>    on the index size, writing it down could also be slow.
> 
> Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>   Sorry for the resend, I forgot git@vger.
> 
>   dir.c             | 13 ++++++++++++-
>   git-compat-util.h |  2 ++
>   wrapper.c         | 12 ++++++++++++
>   3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/dir.c b/dir.c
> index 7c4b45e30e..f8b4cabba9 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
>   	if (!de)
>   		return treat_path_fast(dir, untracked, cdir, istate, path,
>   				       baselen, pathspec);
> -	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
> +	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
>   		return path_none;
>   	strbuf_setlen(path, baselen);
>   	strbuf_addstr(path, de->d_name);
> @@ -2970,8 +2970,19 @@ static int invalidate_one_component(struct untracked_cache *uc,
>   void untracked_cache_invalidate_path(struct index_state *istate,
>   				     const char *path)
>   {
> +	const char *end;
> +	int skipped;
> +
>   	if (!istate->untracked || !istate->untracked->root)
>   		return;

Thank you for this patch!  It's great to see people helping improve the 
performance of git.

I ran a quick test and this is not sufficient to prevent the index from 
getting flagged as dirty and written out to disk when fsmonitor detects 
changes for files under the .git folder.  What I'm seeing is that when 
".git/index" is returned, the test below doesn't catch them and end 
early as would be expected.

As a result, invalidate_one_component() is called which calls 
invalidate_one_directory() which increments dir_invalidated counter and 
the regular dirty process continues which triggers the index to be 
written to disk unnecessarily.

> +	if (!fspathcmp(path, ".git"))
> +		return;
> +	if (ignore_case)
> +		skipped = skip_caseprefix(path, "/.git", &end);
> +	else
> +		skipped = skip_prefix(path, "/.git", &end);
> +	if (skipped && (*end == '\0' || *end == '/'))
> +		return;

If I replace the above lines with:

	if (ignore_case)
		skipped = skip_caseprefix(path, ".git", &end);
	else
		skipped = skip_prefix(path, ".git", &end);

Then it correctly skips _all_ files under ".git".  I'm not sure if by 
removing the leading slash, I'm breaking some other case but I was not 
able to find where that was expected or needed. Removing the leading 
slash also allows me to remove the fsmpathcmp() call as it is now redundant.

I wondered what would happen if there was some other directory named 
".git" and how that would behave.  With or without this patch and with 
or without the untracked cache, a file "dir1/.git/foo" is ignored by git 
so no change in behavior there either.

>   	invalidate_one_component(istate->untracked, istate->untracked->root,
>   				 path, strlen(path));
>   }
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 68b2ad531e..27e0b761a3 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -484,6 +484,8 @@ static inline int skip_prefix(const char *str, const char *prefix,
>   	return 0;
>   }
>   
> +int skip_caseprefix(const char *str, const char *prefix, const char **out);
> +
>   /*
>    * If the string "str" is the same as the string in "prefix", then the "arg"
>    * parameter is set to the "def" parameter and 1 is returned.
> diff --git a/wrapper.c b/wrapper.c
> index d20356a776..bb888d9401 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -690,3 +690,15 @@ int xgethostname(char *buf, size_t len)
>   		buf[len - 1] = 0;
>   	return ret;
>   }
> +
> +int skip_caseprefix(const char *str, const char *prefix, const char **out)
> +{
> +	do {
> +		if (!*prefix) {
> +			*out = str;
> +			return 1;
> +		}
> +	} while (tolower(*str++) == tolower(*prefix++));
> +
> +	return 0;
> +}
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache
  2018-02-05 17:44           ` Ben Peart
@ 2018-02-06 12:02             ` Duy Nguyen
  0 siblings, 0 replies; 28+ messages in thread
From: Duy Nguyen @ 2018-02-06 12:02 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christian Couder, Johannes Schindelin, Ben Peart, Alex Vandiver,
	Git Mailing List

On Tue, Feb 6, 2018 at 12:44 AM, Ben Peart <peartben@gmail.com> wrote:
>
>
> On 2/4/2018 4:38 AM, Nguyễn Thái Ngọc Duy wrote:
>>
>> read_directory() code ignores all paths named ".git" even if it's not
>> a valid git repository. See treat_path() for details. Since ".git" is
>> basically invisible to read_directory(), when we are asked to
>> invalidate a path that contains ".git", we can safely ignore it
>> because the slow path would not consider it anyway.
>>
>> This helps when fsmonitor is used and we have a real ".git" repo at
>> worktree top. Occasionally .git/index will be updated and if the
>> fsmonitor hook does not filter it, untracked cache is asked to
>> invalidate the path ".git/index".
>>
>> Without this patch, we invalidate the root directory unncessarily,
>> which:
>>
>> - makes read_directory() fall back to slow path for root directory
>>    (slower)
>>
>> - makes the index dirty (because UNTR extension is updated). Depending
>>    on the index size, writing it down could also be slow.
>>
>> Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>>   Sorry for the resend, I forgot git@vger.
>>
>>   dir.c             | 13 ++++++++++++-
>>   git-compat-util.h |  2 ++
>>   wrapper.c         | 12 ++++++++++++
>>   3 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/dir.c b/dir.c
>> index 7c4b45e30e..f8b4cabba9 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct
>> dir_struct *dir,
>>         if (!de)
>>                 return treat_path_fast(dir, untracked, cdir, istate, path,
>>                                        baselen, pathspec);
>> -       if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
>> +       if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name,
>> ".git"))
>>                 return path_none;
>>         strbuf_setlen(path, baselen);
>>         strbuf_addstr(path, de->d_name);
>> @@ -2970,8 +2970,19 @@ static int invalidate_one_component(struct
>> untracked_cache *uc,
>>   void untracked_cache_invalidate_path(struct index_state *istate,
>>                                      const char *path)
>>   {
>> +       const char *end;
>> +       int skipped;
>> +
>>         if (!istate->untracked || !istate->untracked->root)
>>                 return;
>
>
> Thank you for this patch!  It's great to see people helping improve the
> performance of git.
>
> I ran a quick test and this is not sufficient to prevent the index from
> getting flagged as dirty and written out to disk when fsmonitor detects
> changes for files under the .git folder.  What I'm seeing is that when
> ".git/index" is returned, the test below doesn't catch them and end early as
> would be expected.

Right. I focused too much on ".git" in the middle and the end of the
path and forgot that it's also at the beginning.

> As a result, invalidate_one_component() is called which calls
> invalidate_one_directory() which increments dir_invalidated counter and the
> regular dirty process continues which triggers the index to be written to
> disk unnecessarily.
>
>> +       if (!fspathcmp(path, ".git"))
>> +               return;
>> +       if (ignore_case)
>> +               skipped = skip_caseprefix(path, "/.git", &end);
>> +       else
>> +               skipped = skip_prefix(path, "/.git", &end);
>> +       if (skipped && (*end == '\0' || *end == '/'))
>> +               return;
>
>
> If I replace the above lines with:
>
>         if (ignore_case)
>                 skipped = skip_caseprefix(path, ".git", &end);
>         else
>                 skipped = skip_prefix(path, ".git", &end);
>
> Then it correctly skips _all_ files under ".git".  I'm not sure if by
> removing the leading slash, I'm breaking some other case but I was not able
> to find where that was expected or needed. Removing the leading slash also
> allows me to remove the fsmpathcmp() call as it is now redundant.

Removing "/" could catch things like abc/lala.git/def, which
treat_path does not consider special and may show up as untracked
entries. In that sense, the updated invalidate_... is broken if we
don't invalidate them properly.

I think what we need here is something like

    if (!fspathcmp(path, ".git") || starts_with(path, ".git/"))

but can handle case-insensitivity as well (starts_with can't).

> I wondered what would happen if there was some other directory named ".git"
> and how that would behave.  With or without this patch and with or without
> the untracked cache, a file "dir1/.git/foo" is ignored by git so no change
> in behavior there either.

That's what I meant by treat_path() and invisibility in my commit
message. We always ignore directories (or even files) named ".git". It
does not matter if it contains anything remotely related to git. I'm
not saying it's a good thing, but it's how it is and changing that
requires a lot more work, possibly some performance degradation if you
start to check if ".git" is a valid repo before ignoring. So I focus
on improving this function alone first.
-- 
Duy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
  2018-02-04  9:38         ` [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache Nguyễn Thái Ngọc Duy
  2018-02-05 17:44           ` Ben Peart
@ 2018-02-07  9:21           ` " Nguyễn Thái Ngọc Duy
  2018-02-07  9:21             ` Nguyễn Thái Ngọc Duy
  1 sibling, 1 reply; 28+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-02-07  9:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christian Couder, Johannes Schindelin, Ben Peart,
	Ben Peart, Alex Vandiver, git,
	Nguyễn Thái Ngọc Duy

read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.

This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".

Without this patch, we invalidate the root directory unncessarily,
which:

- makes read_directory() fall back to slow path for root directory
  (slower)

- makes the index dirty (because UNTR extension is updated). Depending
  on the index size, writing it down could also be slow.

A note about the new "safe_path" knob. Since this new check could be
relatively expensive, avoid it when we know it's not needed. If the
path comes from the index, it can't contain ".git". If it does
contain, we may be screwed up at many more levels, not just this one.

Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 My v1 was rubbish. It's no wonder Ben didn't see my intention. v2
 corrects the "is .git in a given path?" logic and adds a test to
 verify it.

 dir.c                       | 10 ++++++----
 dir.h                       |  2 +-
 fsmonitor.c                 |  2 +-
 fsmonitor.h                 |  2 +-
 t/t7519-status-fsmonitor.sh | 39 +++++++++++++++++++++++++++++++++++++
 unpack-trees.c              |  2 +-
 6 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index 7c4b45e30e..fce45fc55e 100644
--- a/dir.c
+++ b/dir.c
@@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	if (!de)
 		return treat_path_fast(dir, untracked, cdir, istate, path,
 				       baselen, pathspec);
-	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
+	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
 		return path_none;
 	strbuf_setlen(path, baselen);
 	strbuf_addstr(path, de->d_name);
@@ -2968,10 +2968,12 @@ static int invalidate_one_component(struct untracked_cache *uc,
 }
 
 void untracked_cache_invalidate_path(struct index_state *istate,
-				     const char *path)
+				     const char *path, int safe_path)
 {
 	if (!istate->untracked || !istate->untracked->root)
 		return;
+	if (!safe_path && !verify_path(path))
+		return;
 	invalidate_one_component(istate->untracked, istate->untracked->root,
 				 path, strlen(path));
 }
@@ -2979,13 +2981,13 @@ void untracked_cache_invalidate_path(struct index_state *istate,
 void untracked_cache_remove_from_index(struct index_state *istate,
 				       const char *path)
 {
-	untracked_cache_invalidate_path(istate, path);
+	untracked_cache_invalidate_path(istate, path, 1);
 }
 
 void untracked_cache_add_to_index(struct index_state *istate,
 				  const char *path)
 {
-	untracked_cache_invalidate_path(istate, path);
+	untracked_cache_invalidate_path(istate, path, 1);
 }
 
 /* Update gitfile and core.worktree setting to connect work tree and git dir */
diff --git a/dir.h b/dir.h
index 11a047ba48..06df057054 100644
--- a/dir.h
+++ b/dir.h
@@ -350,7 +350,7 @@ static inline int dir_path_match(const struct dir_entry *ent,
 int cmp_dir_entry(const void *p1, const void *p2);
 int check_dir_entry_contains(const struct dir_entry *out, const struct dir_entry *in);
 
-void untracked_cache_invalidate_path(struct index_state *, const char *);
+void untracked_cache_invalidate_path(struct index_state *, const char *, int safe_path);
 void untracked_cache_remove_from_index(struct index_state *, const char *);
 void untracked_cache_add_to_index(struct index_state *, const char *);
 
diff --git a/fsmonitor.c b/fsmonitor.c
index 0af7c4edba..6d7bcd5d0e 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -130,7 +130,7 @@ static void fsmonitor_refresh_callback(struct index_state *istate, const char *n
 	 * as it could be a new untracked file.
 	 */
 	trace_printf_key(&trace_fsmonitor, "fsmonitor_refresh_callback '%s'", name);
-	untracked_cache_invalidate_path(istate, name);
+	untracked_cache_invalidate_path(istate, name, 0);
 }
 
 void refresh_fsmonitor(struct index_state *istate)
diff --git a/fsmonitor.h b/fsmonitor.h
index cd3cc0ccf2..65f3743636 100644
--- a/fsmonitor.h
+++ b/fsmonitor.h
@@ -65,7 +65,7 @@ static inline void mark_fsmonitor_invalid(struct index_state *istate, struct cac
 {
 	if (core_fsmonitor) {
 		ce->ce_flags &= ~CE_FSMONITOR_VALID;
-		untracked_cache_invalidate_path(istate, ce->name);
+		untracked_cache_invalidate_path(istate, ce->name, 1);
 		trace_printf_key(&trace_fsmonitor, "mark_fsmonitor_invalid '%s'", ce->name);
 	}
 }
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index eb2d13bbcf..756beb0d8e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -314,4 +314,43 @@ test_expect_success 'splitting the index results in the same state' '
 	test_cmp expect actual
 '
 
+test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' '
+	test_create_repo dot-git &&
+	(
+		cd dot-git &&
+		mkdir -p .git/hooks &&
+		: >tracked &&
+		: >modified &&
+		mkdir dir1 &&
+		: >dir1/tracked &&
+		: >dir1/modified &&
+		mkdir dir2 &&
+		: >dir2/tracked &&
+		: >dir2/modified &&
+		write_integration_script &&
+		git config core.fsmonitor .git/hooks/fsmonitor-test &&
+		git update-index --untracked-cache &&
+		git update-index --fsmonitor &&
+		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
+		git status &&
+		test-dump-untracked-cache >../before
+	) &&
+	cat >>dot-git/.git/hooks/fsmonitor-test <<-\EOF &&
+	printf ".git\0"
+	printf ".git/index\0"
+	printf "dir1/.git\0"
+	printf "dir1/.git/index\0"
+	EOF
+	(
+		cd dot-git &&
+		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
+		git status &&
+		test-dump-untracked-cache >../after
+	) &&
+	grep "directory invalidation" trace-before >>before &&
+	grep "directory invalidation" trace-after >>after &&
+	# UNTR extension unchanged, dir invalidation count unchanged
+	test_cmp before after
+'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 96c3327f19..9a327696c5 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1506,7 +1506,7 @@ static void invalidate_ce_path(const struct cache_entry *ce,
 	if (!ce)
 		return;
 	cache_tree_invalidate_path(o->src_index, ce->name);
-	untracked_cache_invalidate_path(o->src_index, ce->name);
+	untracked_cache_invalidate_path(o->src_index, ce->name, 1);
 }
 
 /*
-- 
2.16.1.207.gedba492059


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
  2018-02-07  9:21           ` [PATCH v2] " Nguyễn Thái Ngọc Duy
@ 2018-02-07  9:21             ` Nguyễn Thái Ngọc Duy
  2018-02-07 16:59               ` Ben Peart
  0 siblings, 1 reply; 28+ messages in thread
From: Nguyễn Thái Ngọc Duy @ 2018-02-07  9:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christian Couder, Johannes Schindelin, Ben Peart,
	Ben Peart, Alex Vandiver, git,
	Nguyễn Thái Ngọc Duy

read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.

This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".

Without this patch, we invalidate the root directory unncessarily,
which:

- makes read_directory() fall back to slow path for root directory
  (slower)

- makes the index dirty (because UNTR extension is updated). Depending
  on the index size, writing it down could also be slow.

A note about the new "safe_path" knob. Since this new check could be
relatively expensive, avoid it when we know it's not needed. If the
path comes from the index, it can't contain ".git". If it does
contain, we may be screwed up at many more levels, not just this one.

Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 My v1 was rubbish. It's no wonder Ben didn't see my intention. v2
 corrects the "is .git in a given path?" logic and adds a test to
 verify it.

 dir.c                       | 10 ++++++----
 dir.h                       |  2 +-
 fsmonitor.c                 |  2 +-
 fsmonitor.h                 |  2 +-
 t/t7519-status-fsmonitor.sh | 39 +++++++++++++++++++++++++++++++++++++
 unpack-trees.c              |  2 +-
 6 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/dir.c b/dir.c
index 7c4b45e30e..fce45fc55e 100644
--- a/dir.c
+++ b/dir.c
@@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	if (!de)
 		return treat_path_fast(dir, untracked, cdir, istate, path,
 				       baselen, pathspec);
-	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
+	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
 		return path_none;
 	strbuf_setlen(path, baselen);
 	strbuf_addstr(path, de->d_name);
@@ -2968,10 +2968,12 @@ static int invalidate_one_component(struct untracked_cache *uc,
 }
 
 void untracked_cache_invalidate_path(struct index_state *istate,
-				     const char *path)
+				     const char *path, int safe_path)
 {
 	if (!istate->untracked || !istate->untracked->root)
 		return;
+	if (!safe_path && !verify_path(path))
+		return;
 	invalidate_one_component(istate->untracked, istate->untracked->root,
 				 path, strlen(path));
 }
@@ -2979,13 +2981,13 @@ void untracked_cache_invalidate_path(struct index_state *istate,
 void untracked_cache_remove_from_index(struct index_state *istate,
 				       const char *path)
 {
-	untracked_cache_invalidate_path(istate, path);
+	untracked_cache_invalidate_path(istate, path, 1);
 }
 
 void untracked_cache_add_to_index(struct index_state *istate,
 				  const char *path)
 {
-	untracked_cache_invalidate_path(istate, path);
+	untracked_cache_invalidate_path(istate, path, 1);
 }
 
 /* Update gitfile and core.worktree setting to connect work tree and git dir */
diff --git a/dir.h b/dir.h
index 11a047ba48..06df057054 100644
--- a/dir.h
+++ b/dir.h
@@ -350,7 +350,7 @@ static inline int dir_path_match(const struct dir_entry *ent,
 int cmp_dir_entry(const void *p1, const void *p2);
 int check_dir_entry_contains(const struct dir_entry *out, const struct dir_entry *in);
 
-void untracked_cache_invalidate_path(struct index_state *, const char *);
+void untracked_cache_invalidate_path(struct index_state *, const char *, int safe_path);
 void untracked_cache_remove_from_index(struct index_state *, const char *);
 void untracked_cache_add_to_index(struct index_state *, const char *);
 
diff --git a/fsmonitor.c b/fsmonitor.c
index 0af7c4edba..6d7bcd5d0e 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -130,7 +130,7 @@ static void fsmonitor_refresh_callback(struct index_state *istate, const char *n
 	 * as it could be a new untracked file.
 	 */
 	trace_printf_key(&trace_fsmonitor, "fsmonitor_refresh_callback '%s'", name);
-	untracked_cache_invalidate_path(istate, name);
+	untracked_cache_invalidate_path(istate, name, 0);
 }
 
 void refresh_fsmonitor(struct index_state *istate)
diff --git a/fsmonitor.h b/fsmonitor.h
index cd3cc0ccf2..65f3743636 100644
--- a/fsmonitor.h
+++ b/fsmonitor.h
@@ -65,7 +65,7 @@ static inline void mark_fsmonitor_invalid(struct index_state *istate, struct cac
 {
 	if (core_fsmonitor) {
 		ce->ce_flags &= ~CE_FSMONITOR_VALID;
-		untracked_cache_invalidate_path(istate, ce->name);
+		untracked_cache_invalidate_path(istate, ce->name, 1);
 		trace_printf_key(&trace_fsmonitor, "mark_fsmonitor_invalid '%s'", ce->name);
 	}
 }
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index eb2d13bbcf..756beb0d8e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -314,4 +314,43 @@ test_expect_success 'splitting the index results in the same state' '
 	test_cmp expect actual
 '
 
+test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' '
+	test_create_repo dot-git &&
+	(
+		cd dot-git &&
+		mkdir -p .git/hooks &&
+		: >tracked &&
+		: >modified &&
+		mkdir dir1 &&
+		: >dir1/tracked &&
+		: >dir1/modified &&
+		mkdir dir2 &&
+		: >dir2/tracked &&
+		: >dir2/modified &&
+		write_integration_script &&
+		git config core.fsmonitor .git/hooks/fsmonitor-test &&
+		git update-index --untracked-cache &&
+		git update-index --fsmonitor &&
+		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
+		git status &&
+		test-dump-untracked-cache >../before
+	) &&
+	cat >>dot-git/.git/hooks/fsmonitor-test <<-\EOF &&
+	printf ".git\0"
+	printf ".git/index\0"
+	printf "dir1/.git\0"
+	printf "dir1/.git/index\0"
+	EOF
+	(
+		cd dot-git &&
+		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
+		git status &&
+		test-dump-untracked-cache >../after
+	) &&
+	grep "directory invalidation" trace-before >>before &&
+	grep "directory invalidation" trace-after >>after &&
+	# UNTR extension unchanged, dir invalidation count unchanged
+	test_cmp before after
+'
+
 test_done
diff --git a/unpack-trees.c b/unpack-trees.c
index 96c3327f19..9a327696c5 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1506,7 +1506,7 @@ static void invalidate_ce_path(const struct cache_entry *ce,
 	if (!ce)
 		return;
 	cache_tree_invalidate_path(o->src_index, ce->name);
-	untracked_cache_invalidate_path(o->src_index, ce->name);
+	untracked_cache_invalidate_path(o->src_index, ce->name, 1);
 }
 
 /*
-- 
2.16.1.207.gedba492059


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
  2018-02-07  9:21             ` Nguyễn Thái Ngọc Duy
@ 2018-02-07 16:59               ` Ben Peart
  2018-02-13 10:00                 ` Duy Nguyen
  0 siblings, 1 reply; 28+ messages in thread
From: Ben Peart @ 2018-02-07 16:59 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy,
	Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Christian Couder, Johannes Schindelin, Ben Peart,
	Alex Vandiver, git



On 2/7/2018 4:21 AM, Nguyễn Thái Ngọc Duy wrote:
> read_directory() code ignores all paths named ".git" even if it's not
> a valid git repository. See treat_path() for details. Since ".git" is
> basically invisible to read_directory(), when we are asked to
> invalidate a path that contains ".git", we can safely ignore it
> because the slow path would not consider it anyway.
> 
> This helps when fsmonitor is used and we have a real ".git" repo at
> worktree top. Occasionally .git/index will be updated and if the
> fsmonitor hook does not filter it, untracked cache is asked to
> invalidate the path ".git/index".
> 
> Without this patch, we invalidate the root directory unncessarily,
> which:
> 
> - makes read_directory() fall back to slow path for root directory
>    (slower)
> 
> - makes the index dirty (because UNTR extension is updated). Depending
>    on the index size, writing it down could also be slow.
> 

Thank you again, this patch makes much more sense to me.

> A note about the new "safe_path" knob. Since this new check could be
> relatively expensive, avoid it when we know it's not needed. If the
> path comes from the index, it can't contain ".git". If it does
> contain, we may be screwed up at many more levels, not just this one.
> 

I do have a simplifying suggestion to make.  I noticed that other uses 
of verify_path() check when the potentially erroneous path is passed in 
and then all the underlying code can assume it is valid.  I think that 
makes sense here as well and it makes for a smaller patch.


> diff --git a/fsmonitor.h b/fsmonitor.h
> index cd3cc0ccf2..65f3743636 100644
> --- a/fsmonitor.h
> +++ b/fsmonitor.h
> @@ -65,7 +65,7 @@ static inline void mark_fsmonitor_invalid(struct index_state *istate, struct cac
>   {
>   	if (core_fsmonitor) {
>   		ce->ce_flags &= ~CE_FSMONITOR_VALID;
> -		untracked_cache_invalidate_path(istate, ce->name);
> +		untracked_cache_invalidate_path(istate, ce->name, 1);

This test isn't needed because we're pulling the name right out of the 
cache entry so it doesn't need to be verified.

Here is a modified version of your patch for consideration:

================

read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.

This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".

Without this patch, we invalidate the root directory unnecessarily,
which:

- makes read_directory() fall back to slow path for root directory
   (slower)

- makes the index dirty (because UNTR extension is updated). Depending
   on the index size, writing it down could also be slow.

Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
---

Notes:
     Base Ref: master
     Web-Diff: https://github.com/benpeart/git/commit/218a577618
     Checkout: git fetch https://github.com/benpeart/git verify_path-v1 
&& git checkout 218a577618

  dir.c                       |  2 +-
  fsmonitor.c                 |  6 +++++-
  t/t7519-status-fsmonitor.sh | 39 +++++++++++++++++++++++++++++++++++++++
  3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 7c4b45e30e..d431da46f5 100644
--- a/dir.c
+++ b/dir.c
@@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct 
dir_struct *dir,
  	if (!de)
  		return treat_path_fast(dir, untracked, cdir, istate, path,
  				       baselen, pathspec);
-	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
+	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
  		return path_none;
  	strbuf_setlen(path, baselen);
  	strbuf_addstr(path, de->d_name);
diff --git a/fsmonitor.c b/fsmonitor.c
index 0af7c4edba..019576f306 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -118,8 +118,12 @@ static int query_fsmonitor(int version, uint64_t 
last_update, struct strbuf *que

  static void fsmonitor_refresh_callback(struct index_state *istate, 
const char *name)
  {
-	int pos = index_name_pos(istate, name, strlen(name));
+	int pos;

+	if (!verify_path(name))
+		return;
+
+	pos = index_name_pos(istate, name, strlen(name));
  	if (pos >= 0) {
  		struct cache_entry *ce = istate->cache[pos];
  		ce->ce_flags &= ~CE_FSMONITOR_VALID;
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index eb2d13bbcf..756beb0d8e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -314,4 +314,43 @@ test_expect_success 'splitting the index results in 
the same state' '
  	test_cmp expect actual
  '

+test_expect_success UNTRACKED_CACHE 'ignore .git changes when 
invalidating UNTR' '
+	test_create_repo dot-git &&
+	(
+		cd dot-git &&
+		mkdir -p .git/hooks &&
+		: >tracked &&
+		: >modified &&
+		mkdir dir1 &&
+		: >dir1/tracked &&
+		: >dir1/modified &&
+		mkdir dir2 &&
+		: >dir2/tracked &&
+		: >dir2/modified &&
+		write_integration_script &&
+		git config core.fsmonitor .git/hooks/fsmonitor-test &&
+		git update-index --untracked-cache &&
+		git update-index --fsmonitor &&
+		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
+		git status &&
+		test-dump-untracked-cache >../before
+	) &&
+	cat >>dot-git/.git/hooks/fsmonitor-test <<-\EOF &&
+	printf ".git\0"
+	printf ".git/index\0"
+	printf "dir1/.git\0"
+	printf "dir1/.git/index\0"
+	EOF
+	(
+		cd dot-git &&
+		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
+		git status &&
+		test-dump-untracked-cache >../after
+	) &&
+	grep "directory invalidation" trace-before >>before &&
+	grep "directory invalidation" trace-after >>after &&
+	# UNTR extension unchanged, dir invalidation count unchanged
+	test_cmp before after
+'
+
  test_done

base-commit: 5be1f00a9a701532232f57958efab4be8c959a29
-- 
2.15.0.windows.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
  2018-02-07 16:59               ` Ben Peart
@ 2018-02-13 10:00                 ` Duy Nguyen
  2018-02-13 17:57                   ` Junio C Hamano
  0 siblings, 1 reply; 28+ messages in thread
From: Duy Nguyen @ 2018-02-13 10:00 UTC (permalink / raw)
  To: Ben Peart
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Christian Couder, Johannes Schindelin, Ben Peart, Alex Vandiver,
	Git Mailing List

On Wed, Feb 7, 2018 at 11:59 PM, Ben Peart <peartben@gmail.com> wrote:
> diff --git a/dir.c b/dir.c
> index 7c4b45e30e..d431da46f5 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct
> dir_struct *dir,
>         if (!de)
>                 return treat_path_fast(dir, untracked, cdir, istate, path,
>                                        baselen, pathspec);
> -       if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
> +       if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
>                 return path_none;
>         strbuf_setlen(path, baselen);
>         strbuf_addstr(path, de->d_name);
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 0af7c4edba..019576f306 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -118,8 +118,12 @@ static int query_fsmonitor(int version, uint64_t
> last_update, struct strbuf *que
>
>  static void fsmonitor_refresh_callback(struct index_state *istate, const
> char *name)
>  {
> -       int pos = index_name_pos(istate, name, strlen(name));
> +       int pos;
>
> +       if (!verify_path(name))
> +               return;
> +
> +       pos = index_name_pos(istate, name, strlen(name));
>         if (pos >= 0) {
>                 struct cache_entry *ce = istate->cache[pos];
>                 ce->ce_flags &= ~CE_FSMONITOR_VALID;
>

It's very tempting considering that the amount of changes is much
smaller. But I think we should go with my version. The hope is when a
_new_ call site appears, the author would think twice before passing
zero or one to the safe_path argument.

> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> index eb2d13bbcf..756beb0d8e 100755
> --- a/t/t7519-status-fsmonitor.sh
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -314,4 +314,43 @@ test_expect_success 'splitting the index results in the
> same state' '
>         test_cmp expect actual
>  '
>
> +test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating
> UNTR' '
> +       test_create_repo dot-git &&
> +       (
> +               cd dot-git &&
> +               mkdir -p .git/hooks &&
> +               : >tracked &&
> +               : >modified &&
> +               mkdir dir1 &&
> +               : >dir1/tracked &&
> +               : >dir1/modified &&
> +               mkdir dir2 &&
> +               : >dir2/tracked &&
> +               : >dir2/modified &&
> +               write_integration_script &&
> +               git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +               git update-index --untracked-cache &&
> +               git update-index --fsmonitor &&
> +               GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
> +               git status &&
> +               test-dump-untracked-cache >../before
> +       ) &&
> +       cat >>dot-git/.git/hooks/fsmonitor-test <<-\EOF &&
> +       printf ".git\0"
> +       printf ".git/index\0"
> +       printf "dir1/.git\0"
> +       printf "dir1/.git/index\0"
> +       EOF
> +       (
> +               cd dot-git &&
> +               GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
> +               git status &&
> +               test-dump-untracked-cache >../after
> +       ) &&
> +       grep "directory invalidation" trace-before >>before &&
> +       grep "directory invalidation" trace-after >>after &&
> +       # UNTR extension unchanged, dir invalidation count unchanged
> +       test_cmp before after
> +'
> +
>  test_done
>
> base-commit: 5be1f00a9a701532232f57958efab4be8c959a29
> --
> 2.15.0.windows.1
>



-- 
Duy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
  2018-02-13 10:00                 ` Duy Nguyen
@ 2018-02-13 17:57                   ` Junio C Hamano
  2018-02-14  1:24                     ` Duy Nguyen
  0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2018-02-13 17:57 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Ben Peart, Ævar Arnfjörð Bjarmason,
	Christian Couder, Johannes Schindelin, Ben Peart, Alex Vandiver,
	Git Mailing List

Duy Nguyen <pclouds@gmail.com> writes:

> It's very tempting considering that the amount of changes is much
> smaller. But I think we should go with my version. The hope is when a
> _new_ call site appears, the author would think twice before passing
> zero or one to the safe_path argument.

Wouldn't it be a better API if the author of new callsite does not
have to think twice and can instead rely on the called function
untracked_cache_invalidate_path() to always do the right thing?



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
  2018-02-13 17:57                   ` Junio C Hamano
@ 2018-02-14  1:24                     ` Duy Nguyen
  2018-02-14  8:00                       ` Junio C Hamano
  0 siblings, 1 reply; 28+ messages in thread
From: Duy Nguyen @ 2018-02-14  1:24 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ben Peart, Ævar Arnfjörð Bjarmason,
	Christian Couder, Johannes Schindelin, Ben Peart, Alex Vandiver,
	Git Mailing List

On Wed, Feb 14, 2018 at 12:57 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Duy Nguyen <pclouds@gmail.com> writes:
>
>> It's very tempting considering that the amount of changes is much
>> smaller. But I think we should go with my version. The hope is when a
>> _new_ call site appears, the author would think twice before passing
>> zero or one to the safe_path argument.
>
> Wouldn't it be a better API if the author of new callsite does not
> have to think twice and can instead rely on the called function
> untracked_cache_invalidate_path() to always do the right thing?

I am worried that always doing the right thing may carry performance
penalty (this is based purely on reading verify_path() code, no actual
benchmarking). For safety, you can always set safe_path to zero. But
if you do a lot of invalidation and something starts to slow down,
then you can consider setting safe_path to 1 (if it's actually safe to
do so). I think we do mass invalidation in some case, so I will try to
actually benchmark that and see if this safe_path argument is
justified or if we can always call verify_path().
-- 
Duy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache
  2018-02-14  1:24                     ` Duy Nguyen
@ 2018-02-14  8:00                       ` Junio C Hamano
  0 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2018-02-14  8:00 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Ben Peart, Ævar Arnfjörð Bjarmason,
	Christian Couder, Johannes Schindelin, Ben Peart, Alex Vandiver,
	Git Mailing List

On Tue, Feb 13, 2018 at 5:24 PM, Duy Nguyen <pclouds@gmail.com> wrote:
> I am worried that always doing the right thing may carry performance
> penalty (this is based purely on reading verify_path() code, no actual
> benchmarking). For safety, you can always set safe_path to zero. But
> if you do a lot of invalidation and something starts to slow down,
> then you can consider setting safe_path to 1 (if it's actually safe to
> do so).

Fair enough. Thanks for articulating the reasoning.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, back to index

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-27  0:28 Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-27  1:36 ` Duy Nguyen
2018-01-27  1:39   ` [PATCH] trace: measure where the time is spent in the index-heavy operations Nguyễn Thái Ngọc Duy
2018-01-27 11:58     ` Thomas Gummerer
2018-01-27 11:43   ` Some rough edges of core.fsmonitor Ævar Arnfjörð Bjarmason
2018-01-27 12:39     ` Duy Nguyen
2018-01-27 13:09       ` Duy Nguyen
2018-01-27 19:01         ` Ævar Arnfjörð Bjarmason
2018-01-30 22:41           ` Ben Peart
2018-01-29  9:40     ` Duy Nguyen
2018-01-29 23:16       ` Ben Peart
2018-02-01 10:40         ` Duy Nguyen
2018-01-28 20:44 ` Johannes Schindelin
2018-01-28 22:28   ` Ævar Arnfjörð Bjarmason
2018-01-30  1:21     ` Ben Peart
2018-01-31 10:15       ` Duy Nguyen
2018-02-04  9:38         ` [PATCH] dir.c: ignore paths containing .git when invalidating untracked cache Nguyễn Thái Ngọc Duy
2018-02-05 17:44           ` Ben Peart
2018-02-06 12:02             ` Duy Nguyen
2018-02-07  9:21           ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2018-02-07  9:21             ` Nguyễn Thái Ngọc Duy
2018-02-07 16:59               ` Ben Peart
2018-02-13 10:00                 ` Duy Nguyen
2018-02-13 17:57                   ` Junio C Hamano
2018-02-14  1:24                     ` Duy Nguyen
2018-02-14  8:00                       ` Junio C Hamano
2018-01-30 22:57 ` Some rough edges of core.fsmonitor Ben Peart
2018-01-30 23:16   ` Ævar Arnfjörð Bjarmason
2018-01-31 16:12     ` Ben Peart

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox