From: Karsten Blees <karsten.blees@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: kusmabite@gmail.com, Ramkumar Ramachandra <artagnon@gmail.com>,
Robert Zeh <robert.allan.zeh@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Git List <git@vger.kernel.org>,
finnag@pvv.org
Subject: Re: inotify to minimize stat() calls
Date: Tue, 12 Feb 2013 21:48:18 +0100 [thread overview]
Message-ID: <511AAA92.4030508@gmail.com> (raw)
In-Reply-To: <CACsJy8AWyJ=dW5f44huWyPPe4X62xyi+R9CNM5Tg6u6TYf+thQ@mail.gmail.com>
Am 11.02.2013 04:53, schrieb Duy Nguyen:
> On Sun, Feb 10, 2013 at 11:58 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> Karsten Blees has done something similar-ish on Windows, and he posted
>> the results here:
>>
>> https://groups.google.com/forum/#!topic/msysgit/fL_jykUmUNE/discussion
>>
The new hashtable implementation in fscache [1] supports O(1) removal and has no mingw dependencies - might come in handy for anyone trying to implement an inotify daemon.
[1] https://github.com/kblees/git/commit/f7eb85c2
>> I also seem to remember he doing a ReadDirectoryChangesW version, but
>> I don't remember what happened with that.
>
> Thanks. I came across that but did not remember. For one thing, we
> know the inotify alternative for Windows: ReadDirectoryChangesW.
>
I dropped ReadDirectoryChangesW because maintaining a 'live' file system cache became more and more complicated. For example, according to MSDN docs, ReadDirectoryChangesW *may* report short DOS 8.3 names (i.e. "PROGRA~1" instead of "Program Files"), so a correct and fast cache implementation would have to be indexed by long *and* short names...
Another problem was that the 'live' cache had quite negative performance impact on mutating git commands (checkout, reset...). An inotify daemon running as a background process (not in-process as fscache) will probably affect everyone that modifies the working copy, e.g. running 'make' or the test-suite. This should be considered in the design.
> I copy "git status"'s (impressive) numbers from fscache-v0 for those
> who are interested in:
>
> preload | -u | normal | cached | gain
> --------+-----+--------+--------+------
> false | all | 25.144 | 3.055 | 8.2
> false | no | 22.822 | 1.748 | 12.8
> true | all | 9.234 | 2.179 | 4.2
> true | no | 6.833 | 0.955 | 7.2
>
Note that I wasn't able to reproduce such bad 'normal' values in later tests, I guess disk fragmentation and/or virus scanner must have tricked me on that day...gain factors of 2.5 - 5 are more appropriate.
However, the difference between git status -uall and -uno was always about 1.3 s in all fscache versions, even though opendir/readdir/closedir was served entirely from the cache. I added a bit of performance tracing to find the cause, and I think most of the time spent in wt_status_collect_untracked can be eliminated:
1.) 0.939 s is spent in dir.c/excluded (i.e. checking .gitignore). This check is done for *every* file in the working copy, including files in the index. Checking the index first could eliminate most of that, i.e.:
(Note: patches are for discussion only, I'm aware that they may have unintended side effects...)
@@ -1097,6 +1097,8 @@ static enum path_treatment treat_path(struct dir_struct *dir,
return path_ignored;
strbuf_setlen(path, baselen);
strbuf_addstr(path, de->d_name);
+ if (cache_name_exists(path->buf, path->len, ignore_case))
+ return path_ignored;
if (simplify_away(path->buf, path->len, simplify))
return path_ignored;
---
2.) 0.135 s is spent in name-hash.c/hash_index_entry_directories, reindexing the same directories over and over again. In the end, the hashtable contains 939k directory entries, even though the WebKit test repo only has 7k directories. Checking if a directory entry already exists could reduce that, i.e.:
@@ -53,14 +55,23 @@ static void hash_index_entry_directories(struct index_state *istate, struct cach
unsigned int hash;
void **pos;
double t = ticks();
+ struct cache_entry *ce2;
+ int len = ce_namelen(ce);
- const char *ptr = ce->name;
- while (*ptr) {
- while (*ptr && *ptr != '/')
- ++ptr;
- if (*ptr == '/') {
- ++ptr;
- hash = hash_name(ce->name, ptr - ce->name);
+ while (len > 0) {
+ while (len > 0 && ce->name[len - 1] != '/')
+ len--;
+ if (len > 0) {
+ hash = hash_name(ce->name, len);
+ ce2 = lookup_hash(hash, &istate->name_hash);
+ while (ce2) {
+ if (same_name(ce2, ce->name, len, ignore_case)) {
+ add_since(t, &hash_dirs);
+ return;
+ }
+ ce2 = ce2->dir_next;
+ }
+ len--;
pos = insert_hash(hash, ce, &istate->name_hash);
if (pos) {
ce->dir_next = *pos;
---
Tests were done with the WebKit repo (~200k files, ~7k directories, 15 .gitignore files, ~100 entries in root .gitignore). Instrumented code can be found here: https://github.com/kblees/git/tree/kb/git-status-performance-tracing
Here's the performance traces of 'git status -s -uall'
Before patches:
trace: at builtin/commit.c:1221, time: 0.523429 s: cmd_status/read_cache_preload
trace: at builtin/commit.c:1223, time: 0.00403477 s: cmd_status/refresh_index
trace: at builtin/commit.c:1231, time: 0.00318494 s: cmd_status/hold_locked_index
trace: at wt-status.c:539, time: 0.00527396 s: wt_status_collect_changes_worktree
trace: at wt-status.c:544, time: 0.00545771 s: wt_status_collect_changes
trace: at wt-status.c:546, time: 1.286 s: wt_status_collect_untracked
trace: at builtin/commit.c:1233, time: 1.29852 s: cmd_status/wt_status_collect
trace: at dir.c:1540, time: 0.00170986 s: read_directory_recursive/strbuf_add
trace: at dir.c:1541, time: 0.00623972 s: read_directory_recursive/opendir
trace: at dir.c:1542, time: 0.00517881 s: read_directory_recursive/readdir
trace: at dir.c:1543, time: 0.992936 s: read_directory_recursive/treat_path
trace: at dir.c:1544, time: 0.277942 s: read_directory_recursive/dir_add_name
trace: at dir.c:1545, time: 0.0014594 s: read_directory_recursive/close
trace: at dir.c:1546, time: 0.939349 s: treat_one_path/excluded
trace: at dir.c:1547, time: 0.0050811 s: treat_one_path/dir_add_ignored
trace: at dir.c:1548, time: 0.00515875 s: treat_one_path/get_dtype
trace: at dir.c:1549, time: 0.00329322 s: treat_one_path/treat_directory
trace: at dir.c:1550, time: 0.222969 s: excluded/prep_exclude
trace: at dir.c:1551, time: 0.00443398 s: excluded/excluded_from_list[EXC_CMDL]
trace: at dir.c:1552, time: 0.699602 s: excluded/excluded_from_list[EXC_DIRS]
trace: at dir.c:1553, time: 0.00475736 s: excluded/excluded_from_list[EXC_FILE]
trace: at read-cache.c:460, time: 0.00967987 s: index_name_pos
trace: at name-hash.c:213, time: 0.190481 s: lazy_init_name_hash
trace: at name-hash.c:216, time: 0.135248 s: hash_index_entry_directories (938865 entries)
trace: at name-hash.c:217, time: 0.0806647 s: index_name_exists
trace: at compat/mingw.c:2137, time: 1.97424 s: command: c:\git\msysgit\git\git-status.exe -s -uall
After patches:
trace: at builtin/commit.c:1221, time: 0.517511 s: cmd_status/read_cache_preload
trace: at builtin/commit.c:1223, time: 0.00405227 s: cmd_status/refresh_index
trace: at builtin/commit.c:1231, time: 0.00322796 s: cmd_status/hold_locked_index
trace: at wt-status.c:539, time: 0.00530057 s: wt_status_collect_changes_worktree
trace: at wt-status.c:544, time: 0.00546062 s: wt_status_collect_changes
trace: at wt-status.c:546, time: 0.322799 s: wt_status_collect_untracked
trace: at builtin/commit.c:1233, time: 0.33536 s: cmd_status/wt_status_collect
trace: at dir.c:1542, time: 0.00120529 s: read_directory_recursive/strbuf_add
trace: at dir.c:1543, time: 0.00476647 s: read_directory_recursive/opendir
trace: at dir.c:1544, time: 0.00502022 s: read_directory_recursive/readdir
trace: at dir.c:1545, time: 0.310515 s: read_directory_recursive/treat_path
trace: at dir.c:1546, time: 0 s: read_directory_recursive/dir_add_name
trace: at dir.c:1547, time: 0.000831234 s: read_directory_recursive/close
trace: at dir.c:1548, time: 0.0668582 s: treat_one_path/excluded
trace: at dir.c:1549, time: 0.000173174 s: treat_one_path/dir_add_ignored
trace: at dir.c:1550, time: 0.000174267 s: treat_one_path/get_dtype
trace: at dir.c:1551, time: 0.00315468 s: treat_one_path/treat_directory
trace: at dir.c:1552, time: 0.039733 s: excluded/prep_exclude
trace: at dir.c:1553, time: 0.000185205 s: excluded/excluded_from_list[EXC_CMDL]
trace: at dir.c:1554, time: 0.0264496 s: excluded/excluded_from_list[EXC_DIRS]
trace: at dir.c:1555, time: 0.000170622 s: excluded/excluded_from_list[EXC_FILE]
trace: at read-cache.c:460, time: 0.00260636 s: index_name_pos
trace: at name-hash.c:224, time: 0.126637 s: lazy_init_name_hash
trace: at name-hash.c:227, time: 0.0500866 s: hash_index_entry_directories (7152 entries)
trace: at name-hash.c:228, time: 0.0790143 s: index_name_exists
trace: at compat/mingw.c:2137, time: 1.00595 s: command: c:\git\msysgit\git\git-status.exe -s -uall
next prev parent reply other threads:[~2013-02-12 20:48 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-08 21:10 inotify to minimize stat() calls Ramkumar Ramachandra
2013-02-08 22:15 ` Junio C Hamano
2013-02-08 22:45 ` Junio C Hamano
2013-02-09 2:10 ` Duy Nguyen
2013-02-09 2:37 ` Junio C Hamano
2013-02-09 2:56 ` Junio C Hamano
2013-02-09 3:36 ` Robert Zeh
2013-02-09 12:05 ` Ramkumar Ramachandra
2013-02-09 12:11 ` Ramkumar Ramachandra
2013-02-09 12:53 ` Ramkumar Ramachandra
2013-02-09 12:59 ` Duy Nguyen
2013-02-09 17:10 ` Ramkumar Ramachandra
2013-02-09 18:56 ` Ramkumar Ramachandra
2013-02-10 5:24 ` Duy Nguyen
2013-02-10 11:17 ` Duy Nguyen
2013-02-10 11:22 ` Duy Nguyen
2013-02-10 20:16 ` Junio C Hamano
2013-02-11 2:56 ` Duy Nguyen
2013-02-11 11:12 ` Duy Nguyen
2013-03-07 22:16 ` Torsten Bögershausen
2013-03-08 0:04 ` Junio C Hamano
2013-03-08 7:01 ` Torsten Bögershausen
2013-03-08 8:15 ` Junio C Hamano
2013-03-08 9:24 ` Torsten Bögershausen
2013-03-08 10:53 ` Duy Nguyen
2013-03-10 8:23 ` Ramkumar Ramachandra
2013-03-13 12:59 ` [PATCH] status: hint the user about -uno if read_directory takes too long Nguyễn Thái Ngọc Duy
2013-03-13 15:21 ` Torsten Bögershausen
2013-03-13 16:16 ` Junio C Hamano
2013-03-14 10:22 ` Duy Nguyen
2013-03-14 15:05 ` Junio C Hamano
2013-03-15 12:30 ` Duy Nguyen
2013-03-15 15:52 ` Torsten Bögershausen
2013-03-15 15:57 ` Ramkumar Ramachandra
2013-03-15 16:53 ` Junio C Hamano
2013-03-15 17:41 ` Torsten Bögershausen
2013-03-15 20:06 ` Junio C Hamano
2013-03-15 21:14 ` Torsten Bögershausen
2013-03-15 21:59 ` Junio C Hamano
2013-03-16 7:21 ` Torsten Bögershausen
2013-03-17 4:47 ` Junio C Hamano
2013-03-16 1:51 ` Duy Nguyen
2013-02-10 13:26 ` inotify to minimize stat() calls demerphq
2013-02-10 15:35 ` Duy Nguyen
2013-02-14 14:36 ` Magnus Bäck
2013-02-10 16:45 ` Ramkumar Ramachandra
2013-02-11 3:03 ` Duy Nguyen
2013-02-10 16:58 ` Erik Faye-Lund
2013-02-11 3:53 ` Duy Nguyen
2013-02-12 20:48 ` Karsten Blees [this message]
2013-02-13 10:06 ` Duy Nguyen
2013-02-13 12:15 ` Duy Nguyen
2013-02-13 18:18 ` Jeff King
2013-02-13 19:47 ` Jeff King
2013-02-13 20:25 ` Karsten Blees
2013-02-13 22:55 ` Jeff King
2013-02-14 0:48 ` Karsten Blees
2013-02-27 14:45 ` [PATCH] name-hash.c: fix endless loop with core.ignorecase=true Karsten Blees
2013-02-27 16:53 ` Junio C Hamano
2013-02-27 21:52 ` Karsten Blees
2013-02-27 23:57 ` [PATCH v2] " Karsten Blees
2013-02-28 0:27 ` Junio C Hamano
2013-02-19 9:49 ` inotify to minimize stat() calls Ramkumar Ramachandra
2013-02-19 14:25 ` Karsten Blees
2013-02-19 13:16 ` Drew Northup
2013-02-19 13:47 ` Duy Nguyen
2013-02-09 19:35 ` Junio C Hamano
2013-02-10 19:03 ` Robert Zeh
2013-02-10 19:26 ` Martin Fick
2013-02-10 20:18 ` Robert Zeh
2013-02-11 3:21 ` Duy Nguyen
2013-02-11 14:13 ` Robert Zeh
2013-02-19 9:57 ` Ramkumar Ramachandra
2013-04-24 17:20 ` [PATCH] " Robert Zeh
2013-04-24 21:32 ` Duy Nguyen
2013-04-25 19:44 ` Robert Zeh
2013-04-25 21:20 ` Duy Nguyen
2013-04-26 15:35 ` Robert Zeh
2013-04-25 8:18 ` Thomas Rast
2013-04-25 19:37 ` Robert Zeh
2013-04-25 19:59 ` Thomas Rast
2013-04-27 13:51 ` Thomas Rast
2013-04-27 23:56 ` Duy Nguyen
[not found] ` <CAKXa9=r2A7UeBV2s2H3wVGdPkS1zZ9huNJhtvTC-p0S5Ed12xA@mail.gmail.com>
2013-04-30 0:27 ` Duy Nguyen
2013-02-09 11:32 ` Ramkumar Ramachandra
2013-02-14 15:16 ` Ævar Arnfjörð Bjarmason
2013-02-14 16:31 ` Junio C Hamano
2013-02-19 9:40 ` Ramkumar Ramachandra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=511AAA92.4030508@gmail.com \
--to=karsten.blees@gmail.com \
--cc=artagnon@gmail.com \
--cc=blees@dcon.de \
--cc=finnag@pvv.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kusmabite@gmail.com \
--cc=pclouds@gmail.com \
--cc=robert.allan.zeh@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).