git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Ben Peart <peartben@gmail.com>
Cc: "Git Mailing List" <git@vger.kernel.org>,
	"Junio C Hamano" <gitster@pobox.com>,
	benpeart@microsoft.com,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	David.Turner@twosigma.com, "Jeff King" <peff@peff.net>
Subject: Re: [PATCH v2 5/6] fsmonitor: add documentation for the fsmonitor extension.
Date: Mon, 22 May 2017 19:28:12 +0200	[thread overview]
Message-ID: <CACBZZX6LENwuVuTyU-RetaXz8jZtUp1dv+gmQQ281sPx1czPnA@mail.gmail.com> (raw)
In-Reply-To: <5ab333a4-c3cd-1cb5-ba3e-6b08fa14c9e7@gmail.com>

On Mon, May 22, 2017 at 6:18 PM, Ben Peart <peartben@gmail.com> wrote:
> On 5/20/2017 8:10 AM, Ævar Arnfjörð Bjarmason wrote:
>>>
>>> +== File System Monitor cache
>>> +
>>> +  The file system monitor cache tracks files for which the
>>> query-fsmonitor
>>> +  hook has told us about changes.  The signature for this extension is
>>> +  { 'F', 'S', 'M', 'N' }.
>>> +
>>> +  The extension starts with
>>> +
>>> +  - 32-bit version number: the current supported version is 1.
>>> +
>>> +  - 64-bit time: the extension data reflects all changes through the
>>> given
>>> +       time which is stored as the seconds elapsed since midnight,
>>> January 1, 1970.
>>> +
>>> +  - 32-bit bitmap size: the size of the CE_FSMONITOR_DIRTY bitmap.
>>> +
>>> +  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
>>> +    is CE_FSMONITOR_DIRTY.
>>
>>
>> We already have a uint64_t in one place in the codebase (getnanotime)
>> which uses a 64 bit time for nanosecond accuracy, and numerous
>> filesystems already support nanosecond timestamps (ext4, that new
>> Apple thingy...).
>>
>> I don't know if any of the inotify/fsmonitor APIs support that yet,
>> but it seems inevitable that that'll be added if not, in some
>> pathological cases we can have a lot of files modified in 1 second, so
>> using nanosecond accuracy means there'll be a lot less data to
>> consider in some cases.
>>
>> It does mean this'll only work until the year ~2500, but that seems
>> like an acceptable trade-off.
>>
>
> I really don't think nano-second resolution is needed in this case for a few
> reasons.
>
> The number of files that can change within a given second is limited by the
> IO throughput of the underlying device. Even assuming a very fast device and
> very small files and changes, this won't be that many files.
>
> Without this patch, git would have scanned all those files every time. With
> this patch, git will only scan those files a 2nd time that are modified in
> the same second that it did the first scan *that came before the first scan
> started* (the "lots of files modified" section in the 1 second timeline
> below).
>
> |------------------------- one second ---------------------|
> |-lots of files modified - git status - more file modified-|
>
> Yes, some duplicate status checks can be made but its still a significant
> win in any reasonable scenario. Especially when you consider that it is
> pretty unusual to do git status/add/commit calls in the middle of making
> lots of changes to files.

I understand that we get most of the wins either way.

I'm just wondering why not make it nanosecond-resolution now from the
beginning since that's where major filesystems are heading already,
and changing stuff like this can be a hassle once it's initially out
there, whereas just dividing by 10^9 for APIs that need seconds from
the beginning is easy & future-proof.

There are some uses of git where this would probably matter in practice.

E.g. consider a git-annex backed fileserver using ext4, currently
git-annex comes with its own FS watcher as a daemon invoked via `git
annex watch` to constantly add new files without killing performance
via a status/add in a loop, with this a daemon like that could just
run status/add in a loop, but on a busy repo the 1s interval size
might start to matter as you're constantly inspecting larger
intervals.

More importantly though, I can't think of any case where having it in
nanoseconds to begin with would do any harm.

> In addition, the backing file system monitor (Watchman) supports number of
> seconds since the unix epoch (unix time_t style).  This means any support of
> nano seconds by git is academic until someone provides a file system watcher
> that does support nano second granularity.

I haven't used watchman for anything non-trivial, but the
documentation for the query API you seem to be using says you can
request a {ctime,mtime}_ns field:

https://github.com/facebook/watchman/blob/master/website/_docs/cmd.query.markdown#user-content-available-fields

And isn't this the documentation for the "since" query you're using,
saying you can specify any arbitrary fs timing field, such as a _ns
time field:

https://github.com/facebook/watchman/blob/master/website/_docs/expr.since.md

?

> Finally, the fsmonitor index extension is versioned so that we can
> seamlessly upgrade to nano second resolution later if we desire.

Aside from my nanosecond bikeshedding, let's say we change the
semantics of any of this in the future: The index has the version, but
there's one argument passed to the hook without a version. Is the hook
expected to potentially be reading the version from the index to make
sense of whether (in this case) the argument is a mtime or mtime_ns?

Wouldn't this make more sense than that on top, i.e. pass the version
to the hook, untested (and probably whines about the sprintf
format...):

$ git diff -U1
diff --git a/cache.h b/cache.h
index 6eafd34fff..3c63f179f8 100644
--- a/cache.h
+++ b/cache.h
@@ -346,2 +346,3 @@ struct index_state {
        struct untracked_cache *untracked;
+       uint32_t fsmonitor_version;
        time_t fsmonitor_last_update;
diff --git a/fsmonitor.c b/fsmonitor.c
index f5a9f818e8..7236b538ac 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -60,2 +60,4 @@ int read_fsmonitor_extension(struct index_state
*istate, const void *data,
                return error("bad fsmonitor version %d", hdr_version);
+       else
+               istate->fsmonitor_version = hdr_version;

@@ -137,2 +139,3 @@ static int query_fsmonitor(time_t last_update,
struct strbuf *query_result)
        struct child_process cp = CHILD_PROCESS_INIT;
+       char version[1];
        char date[64];
@@ -143,2 +146,3 @@ static int query_fsmonitor(time_t last_update,
struct strbuf *query_result)

+       snprintf(version, sizeof(version), "%d", istate->fsmonitor_version);
        snprintf(date, sizeof(date), "%" PRIuMAX, (uintmax_t)last_update)

  reply	other threads:[~2017-05-22 17:28 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-18 20:13 [PATCH v2 0/6] Fast git status via a file system watcher Ben Peart
2017-05-18 20:13 ` [PATCH v2 1/6] bswap: add 64 bit endianness helper get_be64 Ben Peart
2017-05-18 20:13 ` [PATCH v2 2/6] dir: make lookup_untracked() available outside of dir.c Ben Peart
2017-05-18 20:13 ` [PATCH v2 3/6] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files Ben Peart
2017-05-19 15:33   ` Ben Peart
2017-05-20 10:41     ` Junio C Hamano
2017-05-24 12:30   ` Christian Couder
2017-05-18 20:13 ` [PATCH v2 4/6] fsmonitor: add test cases for fsmonitor extension Ben Peart
2017-05-20 16:55   ` Torsten Bögershausen
2017-05-18 20:13 ` [PATCH v2 5/6] fsmonitor: add documentation for the " Ben Peart
2017-05-20 11:28   ` Junio C Hamano
2017-05-20 12:10   ` Ævar Arnfjörð Bjarmason
2017-05-22 16:18     ` Ben Peart
2017-05-22 17:28       ` Ævar Arnfjörð Bjarmason [this message]
2017-05-25 13:49         ` Ben Peart
2017-05-18 20:13 ` [PATCH v2 6/6] fsmonitor: add a sample query-fsmonitor hook script for Watchman Ben Peart
2017-05-24 13:12   ` Christian Couder
2017-05-26  9:47     ` Ævar Arnfjörð Bjarmason
2017-05-26 16:02       ` Ben Peart
2017-05-25 21:05   ` Ævar Arnfjörð Bjarmason
2017-05-24 10:54 ` [PATCH v2 0/6] Fast git status via a file system watcher Christian Couder
2017-05-25 13:55   ` Ben Peart
2017-05-27  6:57     ` Christian Couder
2017-05-30 18:05       ` Ben Peart
2017-05-30 20:33         ` Christian Couder
2017-05-30 23:11           ` Ben Peart
2017-05-31  7:37             ` Christian Couder
2017-05-31  7:59     ` Christian Couder
2017-05-31 13:37       ` Ben Peart
2017-05-31 14:10         ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACBZZX6LENwuVuTyU-RetaXz8jZtUp1dv+gmQQ281sPx1czPnA@mail.gmail.com \
    --to=avarab@gmail.com \
    --cc=David.Turner@twosigma.com \
    --cc=benpeart@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=pclouds@gmail.com \
    --cc=peartben@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).