git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* What to do with fsmonitor-watchman hook and config-based hooks?
@ 2021-03-11 18:42 Emily Shaffer
  2021-03-11 19:23 ` Derrick Stolee
  0 siblings, 1 reply; 6+ messages in thread
From: Emily Shaffer @ 2021-03-11 18:42 UTC (permalink / raw)
  To: Git List
  Cc: Jonathan Nieder, Jonathan Tan, Kevin Willford, Derrick Stolee,
	Jeff Hostetler, Jeff King

Hi folks, I grabbed a bunch of CC from 'git blame fsmonitor.c' so
sorry if you don't care about fsmonitor-watchman anymore... :) Note
that this whole conversation has to do with the series proposed at
https://lore.kernel.org/git/20210311021037.3001235-1-emilyshaffer@google.com.

When I was looking through the remaining hooks in
Documentation/githooks.txt I noticed that the fsmonitor-watchman hook
is implemented somewhat differently than most other hooks. As I
understand it, to use that hook someone needs to:

1. Configure core.fsmonitor with a path to some fsmonitor-watchman
hook. The documentation in 'Documentation/githooks.txt' claims that it
needs to point to '.git/hooks/fsmonitor-watchman' or
'.git/hooks/fsmonitor-watchmanv2', but I don't see that constraint
enforced when the config is read (config.c:git_config_get_fsmonitor()
and fsmonitor.c:query_fsmonitor()), so it seems that core.fsmonitor
can point to wherever it wants. (Plus
'templates/blt/hooks/fsmonitor-watchman.sample' suggests setting
'core.fsmonitor' = '.git/hooks/query-watchman'...)
2. Configure core.fsmonitorhookversion to 1 or 2, to indicate the arg
structure for the executable specified in core.fsmonitor.

Because the executable doesn't necessarily live in .git/hooks/,
fsmonitor.c:query_fsmonitor() completely skips the "API" for running
hooks (run-command.h:run_hook_le()) and just uses
run-command.h:capture_command() directly.

Interestingly, this is really similar to the way config-based hooks
(https://lore.kernel.org/git/20210311021037.3001235-1-emilyshaffer@google.com)
work - but different enough that I think it may be awkward to
transition fsmonitor-watchman to work like everything else. So, some
questions, plus a proposal:

Q1: Does fsmonitor-watchman make sense to provide more than once
(since config-based hooks support multiple execs per hook)? (I get the
feeling the answer is no here - but what if, for example, I have an
NTFS partition with some repos and an ext4 partition with some other
repos, and only want to configure fsmonitor magic once at global
scope? I don't know anything about fsmonitor so maybe that is a very
stupid question ;) )
Q2: Does it make sense to specify anywhere other than per-repo scope?
(system/global scope, e.g., applying to all repos? One could imagine
providing it in a config which is inherited by submodules of a
superproject?) It looks like it operates on CWD + timestamp-ish, so it
seems to me like some user might want to only set it up once (e.g.
globally) and have it Just Work in all their repos afterwards. In
fact, I think you could configure core.fsmonitor in ~/.gitconfig today
and it would work for everything.

If we want to support checking vs multiple fsmonitor-watchman execs
(that is, if Q1's answer is "yes"), then I propose:
1. Check for values on 'hook.fsmonitor-watchman.command' and add them
to the execution list in config order.
  1a. Here we could either check for
'hookcmd.<command-value>.fsmonitor-watchman-version' or we could just
check separately for 'hook.fsmonitor-watchmanv2.command' as implied by
'githooks.txt' to determine the arg syntax for each hook.
2. Check for values on 'core.fsmonitor' and
'core.fsmonitorhookversion' to add a "legacy" hook at the end of the
execution list. (This is the same thing we do for hooks that live in
.git/hooks/<hookname>.)

If we don't want to support adding multiple fsmonitor-watchman execs,
then I propose:
1. Check hook.runHookDir's value to decide whether we believe in "legacy" hooks.
2. If we do, then check for core.fsmonitor and
core.fsmonitorhookversion, and run that one.
3. If we don't, or if core.fsmonitor was unset, then check for
(hook.fsmonitor-watchman +
hookcmd.<command-value>.fsmonitor-watchman-version) or
(hook.fsmonitor-watchman.command or
hook.fsmonitor-watchmanv2.command), error if we find more than one
candidate, and run the one candidate we do find. (Or I could see us
running only the most-local one, but that might be confusing for the
user.)

Thanks, all - I'm curious to learn more about this hook and the right
way to move forward :)

 - Emily

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: What to do with fsmonitor-watchman hook and config-based hooks?
  2021-03-11 18:42 What to do with fsmonitor-watchman hook and config-based hooks? Emily Shaffer
@ 2021-03-11 19:23 ` Derrick Stolee
  2021-03-11 19:36   ` Emily Shaffer
  0 siblings, 1 reply; 6+ messages in thread
From: Derrick Stolee @ 2021-03-11 19:23 UTC (permalink / raw)
  To: Emily Shaffer, Git List
  Cc: Jonathan Nieder, Jonathan Tan, Kevin Willford, Derrick Stolee,
	Jeff Hostetler, Jeff King

On 3/11/2021 1:42 PM, Emily Shaffer wrote:
> Hi folks, I grabbed a bunch of CC from 'git blame fsmonitor.c' so
> sorry if you don't care about fsmonitor-watchman anymore... :) Note
> that this whole conversation has to do with the series proposed at
> https://lore.kernel.org/git/20210311021037.3001235-1-emilyshaffer@google.com.
> 
> When I was looking through the remaining hooks in
> Documentation/githooks.txt I noticed that the fsmonitor-watchman hook
> is implemented somewhat differently than most other hooks. As I
> understand it, to use that hook someone needs to:
> 
> 1. Configure core.fsmonitor with a path to some fsmonitor-watchman
> hook. The documentation in 'Documentation/githooks.txt' claims that it
> needs to point to '.git/hooks/fsmonitor-watchman' or
> '.git/hooks/fsmonitor-watchmanv2', but I don't see that constraint
> enforced when the config is read (config.c:git_config_get_fsmonitor()
> and fsmonitor.c:query_fsmonitor()), so it seems that core.fsmonitor
> can point to wherever it wants. (Plus
> 'templates/blt/hooks/fsmonitor-watchman.sample' suggests setting
> 'core.fsmonitor' = '.git/hooks/query-watchman'...)
> 2. Configure core.fsmonitorhookversion to 1 or 2, to indicate the arg
> structure for the executable specified in core.fsmonitor.

This is correct.

> Because the executable doesn't necessarily live in .git/hooks/,
> fsmonitor.c:query_fsmonitor() completely skips the "API" for running
> hooks (run-command.h:run_hook_le()) and just uses
> run-command.h:capture_command() directly.
> 
> Interestingly, this is really similar to the way config-based hooks
> (https://lore.kernel.org/git/20210311021037.3001235-1-emilyshaffer@google.com)
> work - but different enough that I think it may be awkward to
> transition fsmonitor-watchman to work like everything else. So, some
> questions, plus a proposal:

You'll want to get Jeff Hostetler's perspective first, but I'm of
the opinion that we'll want to stop recommending the Watchman hook
when the Git-native FS Monitor feature lands, with some time to
let things release and simmer before we remove the core.fsmonitor
config option. We would also need a Linux implementation, but that
is planned.

If we think that the plan of "eventually, FS Monitor won't use hooks"
is reasonable, then how much do you want to spend time unifying it
with your config-based hooks? Can they live together temporarily?

(Naturally, deprecating FS Monitor through the hook might not be
a reasonable plan.)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: What to do with fsmonitor-watchman hook and config-based hooks?
  2021-03-11 19:23 ` Derrick Stolee
@ 2021-03-11 19:36   ` Emily Shaffer
  2021-03-12 16:51     ` Jeff Hostetler
  0 siblings, 1 reply; 6+ messages in thread
From: Emily Shaffer @ 2021-03-11 19:36 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Git List, Jonathan Nieder, Jonathan Tan, Kevin Willford,
	Derrick Stolee, Jeff Hostetler, Jeff King

On Thu, Mar 11, 2021 at 02:23:03PM -0500, Derrick Stolee wrote:
> 
> On 3/11/2021 1:42 PM, Emily Shaffer wrote:
> > Hi folks, I grabbed a bunch of CC from 'git blame fsmonitor.c' so
> > sorry if you don't care about fsmonitor-watchman anymore... :) Note
> > that this whole conversation has to do with the series proposed at
> > https://lore.kernel.org/git/20210311021037.3001235-1-emilyshaffer@google.com.
> > 
> > When I was looking through the remaining hooks in
> > Documentation/githooks.txt I noticed that the fsmonitor-watchman hook
> > is implemented somewhat differently than most other hooks. As I
> > understand it, to use that hook someone needs to:
> > 
> > 1. Configure core.fsmonitor with a path to some fsmonitor-watchman
> > hook. The documentation in 'Documentation/githooks.txt' claims that it
> > needs to point to '.git/hooks/fsmonitor-watchman' or
> > '.git/hooks/fsmonitor-watchmanv2', but I don't see that constraint
> > enforced when the config is read (config.c:git_config_get_fsmonitor()
> > and fsmonitor.c:query_fsmonitor()), so it seems that core.fsmonitor
> > can point to wherever it wants. (Plus
> > 'templates/blt/hooks/fsmonitor-watchman.sample' suggests setting
> > 'core.fsmonitor' = '.git/hooks/query-watchman'...)
> > 2. Configure core.fsmonitorhookversion to 1 or 2, to indicate the arg
> > structure for the executable specified in core.fsmonitor.
> 
> This is correct.
> 
> > Because the executable doesn't necessarily live in .git/hooks/,
> > fsmonitor.c:query_fsmonitor() completely skips the "API" for running
> > hooks (run-command.h:run_hook_le()) and just uses
> > run-command.h:capture_command() directly.
> > 
> > Interestingly, this is really similar to the way config-based hooks
> > (https://lore.kernel.org/git/20210311021037.3001235-1-emilyshaffer@google.com)
> > work - but different enough that I think it may be awkward to
> > transition fsmonitor-watchman to work like everything else. So, some
> > questions, plus a proposal:
> 
> You'll want to get Jeff Hostetler's perspective first, but I'm of
> the opinion that we'll want to stop recommending the Watchman hook
> when the Git-native FS Monitor feature lands, with some time to
> let things release and simmer before we remove the core.fsmonitor
> config option. We would also need a Linux implementation, but that
> is planned.
> 
> If we think that the plan of "eventually, FS Monitor won't use hooks"
> is reasonable, then how much do you want to spend time unifying it
> with your config-based hooks? Can they live together temporarily?

Oh, that's useful context. If fsmonitor-watchman hook is going away, I
don't think it's necessary to convert it at all, unless someone starts
asking for multihooks or something. There's no practical conflict between
config-based hooks and the current implementation - it's just a
surprising inconsistency. I'll be curious to hear Jeff's opinion, of
course, but given what you're describing, I'm not convinced it's worth
spending any time on - and when we're ready to stop checking
core.fsmonitor then the inconsistency will just go away.

The documentation in githooks.txt could use an update, though. :)

 - Emily

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: What to do with fsmonitor-watchman hook and config-based hooks?
  2021-03-11 19:36   ` Emily Shaffer
@ 2021-03-12 16:51     ` Jeff Hostetler
  2021-03-12 18:33       ` Ævar Arnfjörð Bjarmason
  2021-03-12 20:33       ` Emily Shaffer
  0 siblings, 2 replies; 6+ messages in thread
From: Jeff Hostetler @ 2021-03-12 16:51 UTC (permalink / raw)
  To: Emily Shaffer, Derrick Stolee
  Cc: Git List, Jonathan Nieder, Jonathan Tan, Kevin Willford,
	Derrick Stolee, Jeff Hostetler, Jeff King



On 3/11/21 2:36 PM, Emily Shaffer wrote:
> On Thu, Mar 11, 2021 at 02:23:03PM -0500, Derrick Stolee wrote:
>>
>> On 3/11/2021 1:42 PM, Emily Shaffer wrote:
>>> Hi folks, I grabbed a bunch of CC from 'git blame fsmonitor.c' so
>>> sorry if you don't care about fsmonitor-watchman anymore... :) Note
>>> that this whole conversation has to do with the series proposed at
>>> https://lore.kernel.org/git/20210311021037.3001235-1-emilyshaffer@google.com.
>>>
>>> When I was looking through the remaining hooks in
>>> Documentation/githooks.txt I noticed that the fsmonitor-watchman hook
>>> is implemented somewhat differently than most other hooks. As I
>>> understand it, to use that hook someone needs to:
>>>
>>> 1. Configure core.fsmonitor with a path to some fsmonitor-watchman
>>> hook. The documentation in 'Documentation/githooks.txt' claims that it
>>> needs to point to '.git/hooks/fsmonitor-watchman' or
>>> '.git/hooks/fsmonitor-watchmanv2', but I don't see that constraint
>>> enforced when the config is read (config.c:git_config_get_fsmonitor()
>>> and fsmonitor.c:query_fsmonitor()), so it seems that core.fsmonitor
>>> can point to wherever it wants. (Plus
>>> 'templates/blt/hooks/fsmonitor-watchman.sample' suggests setting
>>> 'core.fsmonitor' = '.git/hooks/query-watchman'...)
>>> 2. Configure core.fsmonitorhookversion to 1 or 2, to indicate the arg
>>> structure for the executable specified in core.fsmonitor.
>>
>> This is correct.
>>
>>> Because the executable doesn't necessarily live in .git/hooks/,
>>> fsmonitor.c:query_fsmonitor() completely skips the "API" for running
>>> hooks (run-command.h:run_hook_le()) and just uses
>>> run-command.h:capture_command() directly.
>>>
>>> Interestingly, this is really similar to the way config-based hooks
>>> (https://lore.kernel.org/git/20210311021037.3001235-1-emilyshaffer@google.com)
>>> work - but different enough that I think it may be awkward to
>>> transition fsmonitor-watchman to work like everything else. So, some
>>> questions, plus a proposal:
>>
>> You'll want to get Jeff Hostetler's perspective first, but I'm of
>> the opinion that we'll want to stop recommending the Watchman hook
>> when the Git-native FS Monitor feature lands, with some time to
>> let things release and simmer before we remove the core.fsmonitor
>> config option. We would also need a Linux implementation, but that
>> is planned.
>>
>> If we think that the plan of "eventually, FS Monitor won't use hooks"
>> is reasonable, then how much do you want to spend time unifying it
>> with your config-based hooks? Can they live together temporarily?
> 
> Oh, that's useful context. If fsmonitor-watchman hook is going away, I
> don't think it's necessary to convert it at all, unless someone starts
> asking for multihooks or something. There's no practical conflict between
> config-based hooks and the current implementation - it's just a
> surprising inconsistency. I'll be curious to hear Jeff's opinion, of
> course, but given what you're describing, I'm not convinced it's worth
> spending any time on - and when we're ready to stop checking
> core.fsmonitor then the inconsistency will just go away.
> 
> The documentation in githooks.txt could use an update, though. :)
> 
>   - Emily
> 

I don't think it makes sense to have multiple fsmonitors for a given
working directory.  They are fairly expensive to operate (listening
to the kernel for events and building an in-memory database of changed
files) and I'm not sure how two running concurrently (and listening to
the same kernel event stream) would come up with different answers.

The thing about the fsmonitor-watchman or query-watchman hook is that
it is a bash/perl script that talks to a long-running service daemon.
The hook script itself is just a proxy to decode the JSON response from
Watchman and emit it on stdout in a way that the foreground Git command
expects.  Git cannot directly talk to Watchman because it doesn't
currently have the plumbing to talk to it on anything other than a fd
pair that it sets up to give to the hook child.

So your example of a watcher for NTFS and a separate watcher for ext4
means you could maybe have two services running, but the foreground
Git command would only need to spawn a single hook and maybe it would
decide which service to talk to based upon the filesystem type of the
working directory.  Or you set the repo-local config for each repo to
point to a hook script that knew which service to talk to.  Either way
you only need to invoke one hook.

Setting it globally is an option, but fsmonitor is beneficial for large
repos and working directories.  There is an overhead to having it
running and talking to it.  (Spawning a hook written in PERL, having
the hook talk to Watchman via some form of IPC, the hook parsing the 
mess of JSON returned, pumping that data back over stdout to Git, and
having Git apply the results to the ce_flags.)  All of that has to
happen *before* Git actually starts to do anything useful.  For small
repos, all of that overhead costs more than the cost of letting the
foreground `git status` just lstat() everything.  Of course all of this
depends on the speed of your filesystem and the size of your working
directory (and whether you have a sparse-checkout and etc), but there
are lots of repos that just don't need fsmonitor.

So yes, I would leave the existing fsmonitor code as is and not try
to combine it with your current efforts (even if I wasn't working on
a replacement for the fsmonitor-watchman setup).

As Stolee mentioned I'm currently working on a builtin fsmonitor
feature.  It will have a native coded-in-Git-C-code daemon to watch
the filesystem.  Clients, such as `git status`, will directly talk
to it via my "Simple IPC" patch series and completely bypass all of
the hook stuff.

Long term both fsmonitor solutions can co-exist.  Or we can eventually
deprecate the hook version.  Given that, I don't see a need to change
the existing fsmonitor hook code.

Hope this helps,
Jeff


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: What to do with fsmonitor-watchman hook and config-based hooks?
  2021-03-12 16:51     ` Jeff Hostetler
@ 2021-03-12 18:33       ` Ævar Arnfjörð Bjarmason
  2021-03-12 20:33       ` Emily Shaffer
  1 sibling, 0 replies; 6+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-12 18:33 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Emily Shaffer, Derrick Stolee, Git List, Jonathan Nieder,
	Jonathan Tan, Kevin Willford, Derrick Stolee, Jeff Hostetler,
	Jeff King


On Fri, Mar 12 2021, Jeff Hostetler wrote:

> On 3/11/21 2:36 PM, Emily Shaffer wrote:
>> On Thu, Mar 11, 2021 at 02:23:03PM -0500, Derrick Stolee wrote:
>>>
>>> On 3/11/2021 1:42 PM, Emily Shaffer wrote:
>>>> Hi folks, I grabbed a bunch of CC from 'git blame fsmonitor.c' so
>>>> sorry if you don't care about fsmonitor-watchman anymore... :) Note
>>>> that this whole conversation has to do with the series proposed at
>>>> https://lore.kernel.org/git/20210311021037.3001235-1-emilyshaffer@google.com.
>>>>
>>>> When I was looking through the remaining hooks in
>>>> Documentation/githooks.txt I noticed that the fsmonitor-watchman hook
>>>> is implemented somewhat differently than most other hooks. As I
>>>> understand it, to use that hook someone needs to:
>>>>
>>>> 1. Configure core.fsmonitor with a path to some fsmonitor-watchman
>>>> hook. The documentation in 'Documentation/githooks.txt' claims that it
>>>> needs to point to '.git/hooks/fsmonitor-watchman' or
>>>> '.git/hooks/fsmonitor-watchmanv2', but I don't see that constraint
>>>> enforced when the config is read (config.c:git_config_get_fsmonitor()
>>>> and fsmonitor.c:query_fsmonitor()), so it seems that core.fsmonitor
>>>> can point to wherever it wants. (Plus
>>>> 'templates/blt/hooks/fsmonitor-watchman.sample' suggests setting
>>>> 'core.fsmonitor' = '.git/hooks/query-watchman'...)
>>>> 2. Configure core.fsmonitorhookversion to 1 or 2, to indicate the arg
>>>> structure for the executable specified in core.fsmonitor.
>>>
>>> This is correct.
>>>
>>>> Because the executable doesn't necessarily live in .git/hooks/,
>>>> fsmonitor.c:query_fsmonitor() completely skips the "API" for running
>>>> hooks (run-command.h:run_hook_le()) and just uses
>>>> run-command.h:capture_command() directly.
>>>>
>>>> Interestingly, this is really similar to the way config-based hooks
>>>> (https://lore.kernel.org/git/20210311021037.3001235-1-emilyshaffer@google.com)
>>>> work - but different enough that I think it may be awkward to
>>>> transition fsmonitor-watchman to work like everything else. So, some
>>>> questions, plus a proposal:
>>>
>>> You'll want to get Jeff Hostetler's perspective first, but I'm of
>>> the opinion that we'll want to stop recommending the Watchman hook
>>> when the Git-native FS Monitor feature lands, with some time to
>>> let things release and simmer before we remove the core.fsmonitor
>>> config option. We would also need a Linux implementation, but that
>>> is planned.
>>>
>>> If we think that the plan of "eventually, FS Monitor won't use hooks"
>>> is reasonable, then how much do you want to spend time unifying it
>>> with your config-based hooks? Can they live together temporarily?
>> Oh, that's useful context. If fsmonitor-watchman hook is going away,
>> I
>> don't think it's necessary to convert it at all, unless someone starts
>> asking for multihooks or something. There's no practical conflict between
>> config-based hooks and the current implementation - it's just a
>> surprising inconsistency. I'll be curious to hear Jeff's opinion, of
>> course, but given what you're describing, I'm not convinced it's worth
>> spending any time on - and when we're ready to stop checking
>> core.fsmonitor then the inconsistency will just go away.
>> The documentation in githooks.txt could use an update, though. :)
>>   - Emily
>> 
>
> I don't think it makes sense to have multiple fsmonitors for a given
> working directory.  They are fairly expensive to operate (listening
> to the kernel for events and building an in-memory database of changed
> files) and I'm not sure how two running concurrently (and listening to
> the same kernel event stream) would come up with different answers.
>
> The thing about the fsmonitor-watchman or query-watchman hook is that
> it is a bash/perl script that talks to a long-running service daemon.
> The hook script itself is just a proxy to decode the JSON response from
> Watchman and emit it on stdout in a way that the foreground Git command
> expects.  Git cannot directly talk to Watchman because it doesn't
> currently have the plumbing to talk to it on anything other than a fd
> pair that it sets up to give to the hook child.
>
> So your example of a watcher for NTFS and a separate watcher for ext4
> means you could maybe have two services running, but the foreground
> Git command would only need to spawn a single hook and maybe it would
> decide which service to talk to based upon the filesystem type of the
> working directory.  Or you set the repo-local config for each repo to
> point to a hook script that knew which service to talk to.  Either way
> you only need to invoke one hook.
>
> Setting it globally is an option, but fsmonitor is beneficial for large
> repos and working directories.  There is an overhead to having it
> running and talking to it.  (Spawning a hook written in PERL, having
> the hook talk to Watchman via some form of IPC, the hook parsing the
> mess of JSON returned, pumping that data back over stdout to Git, and
> having Git apply the results to the ce_flags.)  All of that has to
> happen *before* Git actually starts to do anything useful.  For small
> repos, all of that overhead costs more than the cost of letting the
> foreground `git status` just lstat() everything.  Of course all of this
> depends on the speed of your filesystem and the size of your working
> directory (and whether you have a sparse-checkout and etc), but there
> are lots of repos that just don't need fsmonitor.
>
> So yes, I would leave the existing fsmonitor code as is and not try
> to combine it with your current efforts (even if I wasn't working on
> a replacement for the fsmonitor-watchman setup).
>
> As Stolee mentioned I'm currently working on a builtin fsmonitor
> feature.  It will have a native coded-in-Git-C-code daemon to watch
> the filesystem.  Clients, such as `git status`, will directly talk
> to it via my "Simple IPC" patch series and completely bypass all of
> the hook stuff.
>
> Long term both fsmonitor solutions can co-exist.  Or we can eventually
> deprecate the hook version.  Given that, I don't see a need to change
> the existing fsmonitor hook code.

I'm all for built-in watchman, but this claim about Perl etc. being the
slow part and the hook parsing etc. just doesn't add up with performance
testing I've done on it.

See numbers at:
https://lore.kernel.org/git/CACBZZX6D8oC34qat7kdrDOWC5eYm-DRkMWG9eOPPvKKsQtgPyw@mail.gmail.com/#t

I don't have this re-paged into my memory, but I remember the invoking
of watchman etc. being trivially cheap, the cost was taking the
information from the hook, updating git's view of the world (index
etc.), and spewing out info for the user.

It was something like "git status" taking 500ms, invoking the watchman
binary (talking to the daemon) would take 2-5ms, the whole Perl
etc. parsing another mfew ms, and the rest of "git status" 10x that time
at least...


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: What to do with fsmonitor-watchman hook and config-based hooks?
  2021-03-12 16:51     ` Jeff Hostetler
  2021-03-12 18:33       ` Ævar Arnfjörð Bjarmason
@ 2021-03-12 20:33       ` Emily Shaffer
  1 sibling, 0 replies; 6+ messages in thread
From: Emily Shaffer @ 2021-03-12 20:33 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Derrick Stolee, Git List, Jonathan Nieder, Jonathan Tan,
	Kevin Willford, Derrick Stolee, Jeff Hostetler, Jeff King

On Fri, Mar 12, 2021 at 11:51:38AM -0500, Jeff Hostetler wrote:
> I don't think it makes sense to have multiple fsmonitors for a given
> working directory.  They are fairly expensive to operate (listening
> to the kernel for events and building an in-memory database of changed
> files) and I'm not sure how two running concurrently (and listening to
> the same kernel event stream) would come up with different answers.
> 
> The thing about the fsmonitor-watchman or query-watchman hook is that
> it is a bash/perl script that talks to a long-running service daemon.
> The hook script itself is just a proxy to decode the JSON response from
> Watchman and emit it on stdout in a way that the foreground Git command
> expects.  Git cannot directly talk to Watchman because it doesn't
> currently have the plumbing to talk to it on anything other than a fd
> pair that it sets up to give to the hook child.
> 
> So your example of a watcher for NTFS and a separate watcher for ext4
> means you could maybe have two services running, but the foreground
> Git command would only need to spawn a single hook and maybe it would
> decide which service to talk to based upon the filesystem type of the
> working directory.  Or you set the repo-local config for each repo to
> point to a hook script that knew which service to talk to.  Either way
> you only need to invoke one hook.

Thanks, that makes a lot of sense!

> Setting it globally is an option, but fsmonitor is beneficial for large
> repos and working directories.  There is an overhead to having it
> running and talking to it.  (Spawning a hook written in PERL, having
> the hook talk to Watchman via some form of IPC, the hook parsing the mess of
> JSON returned, pumping that data back over stdout to Git, and
> having Git apply the results to the ce_flags.)  All of that has to
> happen *before* Git actually starts to do anything useful.  For small
> repos, all of that overhead costs more than the cost of letting the
> foreground `git status` just lstat() everything.  Of course all of this
> depends on the speed of your filesystem and the size of your working
> directory (and whether you have a sparse-checkout and etc), but there
> are lots of repos that just don't need fsmonitor.
> 
> So yes, I would leave the existing fsmonitor code as is and not try
> to combine it with your current efforts (even if I wasn't working on
> a replacement for the fsmonitor-watchman setup).
> 
> As Stolee mentioned I'm currently working on a builtin fsmonitor
> feature.  It will have a native coded-in-Git-C-code daemon to watch
> the filesystem.  Clients, such as `git status`, will directly talk
> to it via my "Simple IPC" patch series and completely bypass all of
> the hook stuff.
> 
> Long term both fsmonitor solutions can co-exist.  Or we can eventually
> deprecate the hook version.  Given that, I don't see a need to change
> the existing fsmonitor hook code.

Yeah, that seems like the direction everyone agrees with. Thanks very
much for the detailed explanation, that helped me feel sure that doing
nothing is the right approach (how convenient...) :)

I think, then, all that's needed is a patch to the githooks.txt doc 1)
removing the incorrect info about core.fsmonitor's contents needing to
point to a specific name in .git/hooks/ and 2) explaining that because
it uses this special config, it doesn't use the usual hook.*.command
approach.

 - Emily

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-03-12 20:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-11 18:42 What to do with fsmonitor-watchman hook and config-based hooks? Emily Shaffer
2021-03-11 19:23 ` Derrick Stolee
2021-03-11 19:36   ` Emily Shaffer
2021-03-12 16:51     ` Jeff Hostetler
2021-03-12 18:33       ` Ævar Arnfjörð Bjarmason
2021-03-12 20:33       ` Emily Shaffer

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).