git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Thomas Rast <tr@thomasrast.ch>
To: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH/WIP v2 00/14] inotify support
Date: Sun, 19 Jan 2014 18:04:36 +0100	[thread overview]
Message-ID: <87mwirewd7.fsf@thomasrast.ch> (raw)
In-Reply-To: <1389952060-12297-1-git-send-email-pclouds@gmail.com> ("Nguyễn	Thái Ngọc Duy"'s message of "Fri, 17 Jan 2014 16:47:26 +0700")

Nguyễn Thái Ngọc Duy <pclouds@gmail.com> writes:
>   read-cache: save trailing sha-1
>   read-cache: new extension to mark what file is watched
>   read-cache: connect to file watcher
>   read-cache: ask file watcher to watch files
>   read-cache: put some limits on file watching
>   read-cache: get modified file list from file watcher
>   read-cache: add config to start file watcher automatically
>   read-cache: add GIT_TEST_FORCE_WATCHER for testing
>   file-watcher: add --shutdown and --log options
>   file-watcher: automatically quit
>   file-watcher: support inotify
>   file-watcher: exit when cwd is gone
>   pkt-line.c: increase buffer size to 8192
>   t1301: exclude sockets from file permission check

I never got the last three patches, did you send them?

Also, this doesn't cleanly apply anywhere at my end.  Can you push it
somewhere for easier experimentation?

> This is getting in better shape. Still wondering if the design is
> right, so documentation, tests and some error cases are still
> neglected. I have not addressed Jonathan's and Jeff's comments in this
> reroll, but I haven't forgotten them yet. The test suite seems to be
> fine when file-watcher is forced on with GIT_TEST_FORCE_WATCHER set..

I tried to figure out whether there were any corners you are painting it
into, but doing so without a clear overview of the protocol makes my
head spin.

So here's what I gather from the patches (I would have started from the
final result, but see above).

The slow path, before the watcher is ready, works like:

  spawn watcher
  send "hello $index_sha1"
  receive "hello "
  mark most files CE_WATCHED
  send "watch $paths" for each CE_WATCHED
    the paths are actually a pkt-line-encoded series of paths
    get back "fine $n" (n = number of watches processed)
  do work as usual (including lstat())

The fast path:

  load CE_WATCHED from index
  send "hello $index_sha1"
  receive "hello $index_sha1"
  send "status"
    get back "new $path" for each changed file
    again aggregate
  mark files that are "new" as ~CE_WATCHED
  files that are CE_WATCHED can skip their lstat(), since they are unchanged

On index writing (both paths):

  save CE_WATCHED in index extension
  send "bye $index_sha1"
  send "forget $index_sha1 $path" for every path that is known to be changed

So I think the watcher protocol can be specified roughly as:

  Definitions: The watcher is a separate process that maintains a set of
  _watched paths_.  Git uses the commands below to add or remove paths
  from this set.

  In addition it maintains a set of _changed paths_, which is a subset
  of the watched paths.  Git occasionally queries the changed paths.  If
  at any time after a path P was added to the "watched" set, P has or
  may have changed, it MUST be added to the "changed" set.

  Note that a path being "unchanged" under this definition does NOT mean
  that it is unchanged in the index as per `git diff`.  It may have been
  changed before the watch was issued!  Therefore, Git MUST test whether
  the path started out unchanged using lstat() AFTER it was added to the
  "watched" set.

  Handshake::
    On connecting, git sends "hello <magic>".  <magic> is an opaque
    string that identifies the state of the index.  The watcher MUST
    respond with "hello <previous_magic>", where previous_magic is
    obtained from the "bye" command (below).  If <magic> !=
    <previous_magic>, the watcher MUST reset state for this repository.

  Watch::
    Git sends "watch <pathlist>" where the <pathlist> consists of a
    series of 4-digit lengths and literal pathnames (as in the pkt-line
    format) for each path it is interested in.  The watcher MUST respond
    with "fine <n>" after processing a message that contained <n> paths.
    The watcher MUST add each path to the "watched" set and remove it
    from the "changed" set (no error if it was already watched, or not
    changed).

  Status::
    Git sends "status".  The watcher MUST respond with its current
    "changed" set.  This uses the format "new <pathlist>", with
    <pathlist> formatted as for the "watch" command.

  Bye::
    Git sends "bye <magic>" to indicate it has finished writing the
    index.  The watcher must store <magic> so as to use it as the
    <previous_magic> in a "hello" response.

  Forget::
    Git sends "forget <pathlist>", with the <pathlist> formatted as for
    the "watch" command.  The watcher SHOULD remove each path in
    <pathlist> from its watched set.

Did I get that approximately straight?  Perhaps you can fix up as needed
and then turn it into the documentation for the protocol.

There are several points about this that I don't like:

* What does the datagram socket buy us?  You already do a lot of
  pkt-line work for the really big messages, so why not just use
  pkt-line everywhere and a streaming socket?

* The handshake should have room for capabilities, so that we can extend
  the protocol.

* The hello/hello handshake is confusing, and probably would allow you
  to point two watchers at each other without them noticing.  Make it
  hello/ready, or some other unambiguous choice.

* I took some liberty and tried to fully define the transitions between
  the sets.  So even though I'm not sure this is currently handled, I
  made it well-defined to issue "watch" for a path that is in the
  changed set.

* "bye" is confusing, because in practice git issues "forget"s after
  "bye".  The best I can come up with is "setstate", I'm sure you have
  better ideas.

There's also the problem of ordering guarantees between the socket and
inotify.  I haven't found any, so I would conservatively assume that the
socket messages may in fact arrive before inotify, which is a race in
the current code.  E.g., in the sequence 'touch foo; git status' the
daemon may see

  socket                    inotify                  
  < hello...
  < status
  > new <empty list>
                            touch foo

I think a clever way to handle this would be to add a new command:

  Wait::
    This command serves synchronization.  Git creates a file of its
    choice in $GIT_DIR/watch (say, `.git/watch/wait.<random>`).  Then it
    sends "wait <path>".  The watcher MUST block until it has processed
    all change notifications up to and including <path>.

This assumes the FS notification API to be ordered, which appears to be
the case for inotify; from inotify(7):

  The events returned by reading from an inotify file descriptor form an
  ordered queue.  Thus, for example, it is guaranteed that when renaming
  from one directory to another, events will be produced in the correct
  order on the inotify file descriptor.

As a corollary, from watching .git/watch you get free notification of
'rm .git/watch/socket' as a termination signal.

We also need to think about how other OS's APIs work with this.  From
what I've heard about the Windows API, it should work well, so that's
great.  I suspect the *bsd/darwin API corresponding to inotify won't be
too different, but it's better to work this out now.

> Thomas, you were a proponent of per-user daemon last time. I agree
> that is a better solution when you need to support submodules. So if
> you have time, have a look and see if anything I did may prevent
> per-user daemon changes later (hint, I have a few unfriendly exit() in
> file-watcher.c). You also worked with inotify before maybe you can
> help spot some mishandling too as I'm totally new to inotify.

As you note, the protocol for "help, I don't know what's true any more"
(i.e., it got an inotify buffer overflow event) needs to be a special
"reset" message, not exit().

The per-repo approach does require keeping open an extra FD per repo
though, regardless of whether the daemon is actually for all of the
user's repos.  The linuxen I run as desktop systems usually default
ulimit -n to 1024, so that's not a massive restriction, but still.

As a reminder to other reviewers, the reason I wanted a per-user watcher
is that /proc/sys/fs/inotify/max_user_instances defaults to 128 on my
systems.  We need one inotify FD per watcher process, and given that a
full android tree had something on the order of 300 repos last I looked,
that just won't fly.

As far as inotify corner-cases go, the only one I'm aware of is
directory renames.  I suspect we'll have to watch directories all the
way up to the repository root to reliably detect when this happens.  Not
sure how to best handle this.  Perhaps we should declare Git completely
agnostic wrt such issues, and behind the scenes issue all watches up to
the root even if we don't need them for anything other than directory
renames.


Ok, that's probably a confused sum of rambles.  Let me know if you can
make any sense of it.

-- 
Thomas Rast
tr@thomasrast.ch

  parent reply	other threads:[~2014-01-19 17:05 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-12 11:03 [PATCH 0/6] inotify support Nguyễn Thái Ngọc Duy
2014-01-12 11:03 ` [PATCH 1/6] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
2014-01-12 11:03 ` [PATCH 2/6] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
2014-01-13 17:02   ` Jonathan Nieder
2014-01-14  1:25     ` Duy Nguyen
2014-01-14  1:39   ` Duy Nguyen
2014-01-12 11:03 ` [PATCH 3/6] read-cache: connect to file watcher Nguyễn Thái Ngọc Duy
2014-01-15 10:58   ` Jeff King
2014-01-12 11:03 ` [PATCH 4/6] read-cache: get "updated" path list from " Nguyễn Thái Ngọc Duy
2014-01-12 11:03 ` [PATCH 5/6] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
2014-01-12 11:03 ` [PATCH 6/6] file-watcher: support inotify Nguyễn Thái Ngọc Duy
2014-01-17  9:47 ` [PATCH/WIP v2 00/14] inotify support Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 01/14] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 02/14] read-cache: new extension to mark what file is watched Nguyễn Thái Ngọc Duy
2014-01-17 11:19     ` Thomas Gummerer
2014-01-19 17:06     ` Thomas Rast
2014-01-20  1:38       ` Duy Nguyen
2014-01-17  9:47   ` [PATCH/WIP v2 03/14] read-cache: connect to file watcher Nguyễn Thái Ngọc Duy
2014-01-17 15:24     ` Torsten Bögershausen
2014-01-17 16:21       ` Duy Nguyen
2014-01-17  9:47   ` [PATCH/WIP v2 04/14] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 05/14] read-cache: put some limits on file watching Nguyễn Thái Ngọc Duy
2014-01-19 17:06     ` Thomas Rast
2014-01-20  1:36       ` Duy Nguyen
2014-01-17  9:47   ` [PATCH/WIP v2 06/14] read-cache: get modified file list from file watcher Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 07/14] read-cache: add config to start file watcher automatically Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 08/14] read-cache: add GIT_TEST_FORCE_WATCHER for testing Nguyễn Thái Ngọc Duy
2014-01-19 17:04     ` Thomas Rast
2014-01-20  1:32       ` Duy Nguyen
2014-01-17  9:47   ` [PATCH/WIP v2 09/14] file-watcher: add --shutdown and --log options Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 10/14] file-watcher: automatically quit Nguyễn Thái Ngọc Duy
2014-01-17  9:47   ` [PATCH/WIP v2 11/14] file-watcher: support inotify Nguyễn Thái Ngọc Duy
2014-01-19 17:04   ` Thomas Rast [this message]
2014-01-20  1:28     ` [PATCH/WIP v2 00/14] inotify support Duy Nguyen
2014-01-20 21:51       ` Thomas Rast
2014-01-28 10:46     ` Duy Nguyen
2014-02-03  4:28   ` [PATCH v3 00/26] " Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 01/26] pkt-line.c: rename global variable buffer[] to something less generic Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 02/26] pkt-line.c: add packet_write_timeout() Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 03/26] pkt-line.c: add packet_read_line_timeout() Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 04/26] unix-socket: make unlink() optional in unix_stream_listen() Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 05/26] Add git-file-watcher and basic connection handling logic Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 06/26] file-watcher: check socket directory permission Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 07/26] file-watcher: remove socket on exit Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 08/26] file-watcher: add --detach Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 09/26] read-cache: save trailing sha-1 Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 10/26] read-cache: new flag CE_WATCHED to mark what file is watched Nguyễn Thái Ngọc Duy
2014-02-03  4:28     ` [PATCH v3 11/26] Clear CE_WATCHED when set CE_VALID alone Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 12/26] read-cache: basic hand shaking to the file watcher Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 13/26] read-cache: ask file watcher to watch files Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 14/26] read-cache: put some limits on file watching Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 15/26] read-cache: get changed file list from file watcher Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 16/26] git-compat-util.h: add inotify stubs on non-Linux platforms Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 17/26] file-watcher: inotify support, watching part Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 18/26] file-watcher: inotify support, notification part Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 19/26] Wrap CE_VALID test with ce_valid() Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 20/26] read-cache: new variable to verify file-watcher results Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 21/26] Support running file watcher with the test suite Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 22/26] file-watcher: quit if $WATCHER/socket is gone Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 23/26] file-watcher: tests for the daemon Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 24/26] ls-files: print CE_WATCHED as W (or "w" with CE_VALID) Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 25/26] file-watcher: tests for the client side Nguyễn Thái Ngọc Duy
2014-02-03  4:29     ` [PATCH v3 26/26] Disable file-watcher with system inotify on some tests Nguyễn Thái Ngọc Duy
2014-02-08  8:04     ` [PATCH v3 00/26] inotify support Torsten Bögershausen
2014-02-08  8:53       ` Duy Nguyen
2014-02-09 20:19         ` Torsten Bögershausen
2014-02-10 10:37           ` Duy Nguyen
2014-02-10 16:55             ` Torsten Bögershausen
2014-02-10 23:34               ` Duy Nguyen
2014-02-17 12:36           ` Duy Nguyen
2014-02-19 20:35 ` [PATCH 0/6] " Shawn Pearce
2014-02-19 23:45   ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mwirewd7.fsf@thomasrast.ch \
    --to=tr@thomasrast.ch \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).