git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Christian Couder <christian.couder@gmail.com>
To: Ben Peart <peartben@gmail.com>
Cc: git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>,
	Ben Peart <benpeart@microsoft.com>,
	Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Johannes Schindelin <johannes.schindelin@gmx.de>,
	David Turner <David.Turner@twosigma.com>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH v2 0/6] Fast git status via a file system watcher
Date: Wed, 31 May 2017 09:37:17 +0200	[thread overview]
Message-ID: <CAP8UFD3aZU1D+bbqDEV05QG06gySt9PPL7XD4Xz0TP84zM8M6A@mail.gmail.com> (raw)
In-Reply-To: <0d553422-5b4f-cfdd-961a-d007826b68cb@gmail.com>

>>>> Yeah, they could be encoded in the integration script, but it could be
>>>> better if it was possible to just configure a generic command line.
>>>>
>>>> For example if the directory that should be watched for filesystem
>>>> changes could be passed as well as the time since the last changes,
>>>> perhaps only a generic command line would be need.
>>>
>>> Maybe I'm not understanding what you have in mind but I haven't found
>>> this
>>> to be the case in the two integrations I've done with file system
>>> watchers
>>> (one internal and Watchman).  They require you download, install, and
>>> configure them by telling them about the folders you want monitored.
>>> Then
>>> you can start querying them for changes and processing the output to
>>> match
>>> what git expects.  While the download and install steps vary, having that
>>> query + process and return results wrapped up in an integration hook has
>>> worked well.
>>
>> It looks like one can also just ask watchman to monitor a directory with:
>>
>> watchman watch /path/to/dir
>>
>> or:
>>
>> echo '["watch", "/path/to/dir"]' | watchman -j
>>
>> Also for example on Linux people might want to use command line tools
>> like:
>>
>> https://linux.die.net/man/1/inotifywait
>>
>> and you can pass the directories you want to be watched as arguments
>> to this kind of tools.
>>
>> So it would be nice, if we didn't require the user to configure
>> anything and we could just configure the watching of what we need in
>> the hook (or a command configured using a config option). If the hook
>> (or configured command) could be passed the directory by git, it could
>> also be generic.
>
> OK, I think I understand what you're attempting to accomplish now. Often,
> Watchman (and other similar tools) are used to do much more than speed up
> git (in fact, _all_ use cases today are not used for that since this patch
> series hasn't been accepted yet :)).  They trigger builds, run verification
> tools, test passes, or other tasks.

Yeah, but some people might only be interested in installing Watchman
or similar tools to speed up "git status".

> I'm afraid that attempting to have the user configure git to configure the
> tool "automatically" is just adding an extra layer of complexity rather than
> making it simpler.

I think that for the user it makes things simpler, as the user
wouldn't have to configure anything.

For example if the hook does something like the following :

# Check that watchman is already watching the worktree
if ! watchman watch-list | grep "\"$GIT_WORK_TREE\""
then
       # Ask watchman to watch the worktree
       watchman watch "$GIT_WORK_TREE"
       exit 1
fi

# Query Watchman for all the changes since the requested time
echo "[\"query\", \"$GIT_WORK_TREE\", {\"since\": $1,
\"fields\":[\"name\"]}]" | \
watchman -j | \
...

Then users might not need to configure Watchman in the first place,
and if they move their repo they might not need to reconfigure
Watchman.

> I'll leave that to a future patch series to work out.

Yeah, the above improvement can be done later.

>>>> I am also wondering about sparse checkout, as we might want to pass
>>>> all the directories we are interested in.
>>>> How is it supposed to work with sparse checkout?
>>>
>>>
>>> The fsmonitor code works well with or without a sparse-checkout.  The
>>> file
>>> system monitor is unaware of the sparse checkout so will notify git about
>>> any change irrespective of whether git will eventually ignore it because
>>> the
>>> skip worktree bit is set.
>>
>> I was wondering if it could ease the job for the monitoring service
>> and perhaps improve performance to just ask to watch the directories
>> we are interested in when using sparse checkout.
>> On Linux it looks like a separate inotify watch is created for every
>> subdirectory and there is maximum amount of inotify watches per user.
>> This can be increased by writing in
>> /proc/sys/fs/inotify/max_user_watches, but it is not nice to have to
>> ask admins to increase this.
>
> Having a single instance that watches the root of the working directory is
> the simplest model and minimizes use of system resources like inotify as
> there is only one needed per clone.

From https://linux.die.net/man/1/inotifywait:

-r, --recursive

Watch all subdirectories of any directories passed as arguments.
Watches will be set up recursively to an unlimited depth. Symbolic
links are not traversed. Newly created subdirectories will also be
watched.

Warning: If you use this option while watching the root directory of a
large tree, it may take quite a while until all inotify watches are
established, and events will not be received in this time. Also, since
one inotify watch will be established per subdirectory, it is possible
that the maximum amount of inotify watches per user will be reached.
The default maximum is 8192; it can be increased by writing to
/proc/sys/fs/inotify/max_user_watches.

> In addition, when the sparse-checkout file is modified, there is no need to
> try and automatically update the monitor by adding and removing folders as
> necessary.

Yeah, I agree it is simpler to not have to worry about the
sparse-checkout file being modified.

> Finally, if files or directories are excluded via sparse-checkout, they are
> removed from the working directory at checkout time so don't add any
> additional overhead to the file system watcher anyway as they clearly can't
> generate write events if they don't exist.

Yeah, but if you configure sparse-checkout when you already have
untracked files in many directories, like .o files, these files and
the directories they are in are not removed when you perform a
checkout, so the directories are still watched by the file system
watcher.

  reply	other threads:[~2017-05-31  7:37 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
2017-05-18 20:13 ` [PATCH v2 1/6] bswap: add 64 bit endianness helper get_be64 Ben Peart
2017-05-18 20:13 ` [PATCH v2 2/6] dir: make lookup_untracked() available outside of dir.c Ben Peart
2017-05-18 20:13 ` [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
2017-05-19 15:33   ` Ben Peart
2017-05-20 10:41     ` Junio C Hamano
2017-05-24 12:30   ` Christian Couder
2017-05-18 20:13 ` [PATCH v2 4/6] fsmonitor: add test cases for fsmonitor extension Ben Peart
2017-05-20 16:55   ` Torsten Bögershausen
2017-05-18 20:13 ` [PATCH v2 5/6] fsmonitor: add documentation for the " Ben Peart
2017-05-20 11:28   ` Junio C Hamano
2017-05-20 12:10   ` Ævar Arnfjörð Bjarmason
2017-05-22 16:18     ` Ben Peart
2017-05-22 17:28       ` Ævar Arnfjörð Bjarmason
2017-05-25 13:49         ` Ben Peart
2017-05-18 20:13 ` [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman Ben Peart
2017-05-24 13:12   ` Christian Couder
2017-05-26  9:47     ` Ævar Arnfjörð Bjarmason
2017-05-26 16:02       ` Ben Peart
2017-05-25 21:05   ` Ævar Arnfjörð Bjarmason
2017-05-24 10:54 ` [PATCH v2 0/6] Fast git status via a file system watcher Christian Couder
2017-05-25 13:55   ` Ben Peart
2017-05-27  6:57     ` Christian Couder
2017-05-30 18:05       ` Ben Peart
2017-05-30 20:33         ` Christian Couder
2017-05-30 23:11           ` Ben Peart
2017-05-31  7:37             ` Christian Couder [this message]
2017-05-31  7:59     ` Christian Couder
2017-05-31 13:37       ` Ben Peart
2017-05-31 14:10         ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP8UFD3aZU1D+bbqDEV05QG06gySt9PPL7XD4Xz0TP84zM8M6A@mail.gmail.com \
    --to=christian.couder@gmail.com \
    --cc=David.Turner@twosigma.com \
    --cc=benpeart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=pclouds@gmail.com \
    --cc=peartben@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).