git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: Ben Peart <peartben@gmail.com>
Cc: "Brandon Williams" <bmwill@google.com>,
	"Ben Peart" <benpeart@microsoft.com>,
	"Git Mailing List" <git@vger.kernel.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache
Date: Tue, 6 Feb 2018 19:27:07 +0700	[thread overview]
Message-ID: <CACsJy8DLP=j-h3knwX9zOpejAfUbv1YJwfB-iw4476oy0hTfxg@mail.gmail.com> (raw)
In-Reply-To: <6fb43664-7546-7865-0488-8ed6292d77a6@gmail.com>

On Tue, Feb 6, 2018 at 8:48 AM, Ben Peart <peartben@gmail.com> wrote:
> With the new behavior, making a change in dir1/, then calling status would
> update the dir1/ untracked cache entry but not write it out. On the next
> status, git would detect the change in dir1/ again and update the untracked

Thing only missing piece here I would add is, this dir1/ detection is
done by watchman. We have to contact watchman and ask the set of
changed paths since $TIME where $TIME is the last time we updated
untracked cache and invalidate those paths in core. Then it should
work correctly. I checked the watchman query in the fsmonitor hook and
I think it's correct so far.

Except one point. What if you activate fsmonitor, write down the
fsmonitor_last_update field there but _not_ create UNTR extension for
the same timestamp? UNTR extension is created only when
read_directory() is executed and we don't do that in every command. I
haven't checked fsmonitor.c carefully, maybe I'm missing something.

One thing I'd like to point out in this patch is untracked cache will
start degrading if you just make the initial UNTR extension once and
don't ever update it again. Dirty paths hit slow path and will start
poking the disk. If somebody accidentally change a minor thing in
every single directory (worst case scenario), the untracked cache
becomes useless. We need a threshold or something to start repairing
UNTR extension if it gets damaged too much.

If you rely on random index updates (e.g. the main entries got updated
and .git/index must be updated) to write UNTR extension down together.
Please don't do that, at least not this way. cache_changed mask should
reflect all dirty parts in .git/index. If UNTR extension is not marked
updated, it's legit to just skip generating/writing it down (e.g. if I
kept the old UNTR extension from the last time I read .git/index
around in memory)

> cache.  All of the other cached entries are still valid and the cache would
> be used for them.  The updated cache entry for dir1/ would not get persisted
> to disk until something that required the index to be written out.
>
> The behavior is correct in both cases.  You just don't get the benefit of
> the updated cache for the dir1/ entry until the index is persisted again.
> What you gain in exchange is that you don't have to write out the index
> which is (typically) a lot more expensive than checking dir1/ for changes.

This is another thing that bugs me. I know you're talking about huge
index files, but at what size should we start this sort of
optimization? Writing down a few MBs on linux is cheap enough that I
won't bother optimizing (and I get my UNTR extension repaired all the
time, so reduced lstat calls and stuff). This "typically" only comes
at certain size, what size?
-- 
Duy

  reply	other threads:[~2018-02-06 12:27 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-05 19:56 [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache Ben Peart
2018-02-05 20:55 ` Junio C Hamano
2018-02-06  1:39   ` Ben Peart
2018-02-05 21:58 ` Brandon Williams
2018-02-06  1:48   ` Ben Peart
2018-02-06 12:27     ` Duy Nguyen [this message]
2018-02-06 12:55       ` Duy Nguyen
2018-02-07 10:59         ` Duy Nguyen
2018-02-07 13:46           ` Ben Peart
2018-02-06 14:50       ` Junio C Hamano
2018-02-07 14:13       ` Ben Peart
2018-02-12 10:20         ` Duy Nguyen
2018-02-12 17:57           ` Ben Peart
2018-02-13  9:57             ` Duy Nguyen
2018-02-08 10:33 ` Jeff King
2018-02-28 21:27   ` Junio C Hamano
2018-03-01  7:42     ` Jeff King
2018-03-01 12:35       ` Ben Peart

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACsJy8DLP=j-h3knwX9zOpejAfUbv1YJwfB-iw4476oy0hTfxg@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=avarab@gmail.com \
    --cc=benpeart@microsoft.com \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=peartben@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).