git@vger.kernel.org mailing list mirror (one of many)
From: Ben Peart <peartben@gmail.com>
To: Brandon Williams <bmwill@google.com>, Ben Peart <benpeart@microsoft.com>
Cc: git@vger.kernel.org, pclouds@gmail.com, avarab@gmail.com
Subject: Re: [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache
Date: Mon, 5 Feb 2018 20:48:00 -0500
Message-ID: <6fb43664-7546-7865-0488-8ed6292d77a6@gmail.com>
In-Reply-To: <20180205215805.GA90084@google.com>



On 2/5/2018 4:58 PM, Brandon Williams wrote:
> On 02/05, Ben Peart wrote:
>> The untracked cache saves its current state in the UNTR index extension.
>> Currently, _any_ change to that state causes the index to be flagged as dirty
>> and written out to disk.  Unfortunately, the cost to write out the index can
>> exceed the savings gained by using the untracked cache.  Since it is a cache
>> that can be updated from the current state of the working directory, there is
>> no functional requirement that the index be written out for every change to the
>> untracked cache.
>>
>> Update the untracked cache logic so that it no longer forces the index to be
>> written to disk except in the case where the extension is being turned on or
>> off.  When some other git command requires the index to be written to disk, the
>> untracked cache will take advantage of that to save its updated state as well.
>> This results in a performance win when looked at over common sequences of git
>> commands (e.g., a status followed by add, commit, etc.).
>>
>> After this patch, all the logic to track statistics for the untracked cache
>> could be removed as it is only used by debug tracing used to debug the untracked
>> cache.
> 
> So we don't need to update it every time because it's just a cache
> and if it's inaccurate between status calls that's ok?  So only
> operations like add and commit will actually write out the untracked
> cache (as a part of writing out the index).  Sounds ok.
> 
> What benefit is there to using the untracked cache then?  Sounds like
> you should just turn it off instead?
> (I'm sure this is a naive question :D )

The parts of the untracked cache that have not changed since the 
extension was updated are still cached and valid.  Only those 
directories that have changes will need to be checked.

With the old behavior, making a change in dir1/, then calling status 
would update the dir1/ untracked cache entry, flag the index as dirty 
and write it out.  On the next status, git would detect that no changes 
have been made and use the cached data for dir1/.

With the new behavior, making a change in dir1/, then calling status 
would update the dir1/ untracked cache entry but not write it out. On 
the next status, git would detect the change in dir1/ again and update 
the untracked cache.  All of the other cached entries are still valid 
and the cache would be used for them.  The updated cache entry for dir1/ 
would not get persisted to disk until some other command required the 
index to be written out.

The behavior is correct in both cases.  You just don't get the benefit 
of the updated cache for the dir1/ entry until the index is persisted 
again.  What you gain in exchange is that you don't have to write out 
the index, which is (typically) a lot more expensive than checking dir1/ 
for changes.
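
To make the trade-off concrete, here is a toy model in C.  It is only a 
sketch and not git's actual code: the names toy_index, toy_status and 
toy_write_index are made up for illustration, and a single integer 
"stamp" per directory stands in for whatever the real cache compares 
(an assumption on my part to keep it short).  The 
write_on_cache_change flag selects the old behavior (true) or the 
behavior after this patch (false).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NDIRS 4

/* Hypothetical stand-in for the on-disk index and its UNTR extension. */
struct toy_index {
	int cached_stamp[TOY_NDIRS]; /* per-directory state saved in the cache */
	int index_writes;            /* number of times the index was written */
};

/* What one simulated "git status" run did. */
struct toy_status_result {
	int dirs_rescanned;  /* directories whose cache entry was stale */
	bool index_written;  /* did this run write the index? */
};

/*
 * Simulate any command that writes the index (add, commit, ...).
 * The refreshed untracked cache state is saved along with it.
 */
void toy_write_index(struct toy_index *idx, const int *worktree)
{
	size_t i;

	assert(idx && worktree);
	for (i = 0; i < TOY_NDIRS; i++)
		idx->cached_stamp[i] = worktree[i];
	idx->index_writes++;
}

/*
 * Simulate "git status".  Directories whose saved stamp matches the
 * working tree are served from the cache; the rest are rescanned.
 * Only the old behavior writes the index because the cache changed.
 */
struct toy_status_result toy_status(struct toy_index *idx,
				    const int *worktree,
				    bool write_on_cache_change)
{
	struct toy_status_result res = { 0, false };
	size_t i;

	assert(idx && worktree);
	for (i = 0; i < TOY_NDIRS; i++)
		if (idx->cached_stamp[i] != worktree[i])
			res.dirs_rescanned++;

	if (write_on_cache_change && res.dirs_rescanned > 0) {
		toy_write_index(idx, worktree);
		res.index_written = true;
	}
	return res;
}
```

In this model, with one changed directory and the old behavior, the 
first toy_status() call rescans one directory and writes the index, and 
the second rescans nothing.  With the new behavior, every toy_status() 
call rescans that one directory and never writes the index, until 
something calls toy_write_index(), after which the next status rescans 
nothing.  The other three directories are served from the cache in 
both cases.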

> 
>>
>> Signed-off-by: Ben Peart <benpeart@microsoft.com>
>> ---
>>
>> Notes:
>>      Base Ref: master
>>      Web-Diff: https://github.com/benpeart/git/commit/20c2e8d787
>>      Checkout: git fetch https://github.com/benpeart/git untracked-cache-v1 && git checkout 20c2e8d787
>>
>>   dir.c                             | 3 ++-
>>   t/t7063-status-untracked-cache.sh | 3 +++
>>   2 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/dir.c b/dir.c
>> index 7c4b45e30e..da93374f0c 100644
>> --- a/dir.c
>> +++ b/dir.c
>> @@ -2297,7 +2297,8 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>>   				 dir->untracked->gitignore_invalidated,
>>   				 dir->untracked->dir_invalidated,
>>   				 dir->untracked->dir_opened);
>> -		if (dir->untracked == istate->untracked &&
>> +		if (getenv("GIT_TEST_UNTRACKED_CACHE") &&
>> +			dir->untracked == istate->untracked &&
>>   		    (dir->untracked->dir_opened ||
>>   		     dir->untracked->gitignore_invalidated ||
>>   		     dir->untracked->dir_invalidated))
>> diff --git a/t/t7063-status-untracked-cache.sh b/t/t7063-status-untracked-cache.sh
>> index e5fb892f95..6ef520e823 100755
>> --- a/t/t7063-status-untracked-cache.sh
>> +++ b/t/t7063-status-untracked-cache.sh
>> @@ -14,6 +14,9 @@ test_description='test untracked cache'
>>   # See <20160803174522.5571-1-pclouds@gmail.com> if you want to know
>>   # more.
>>   
>> +GIT_TEST_UNTRACKED_CACHE=true
>> +export GIT_TEST_UNTRACKED_CACHE
>> +
>>   sync_mtime () {
>>   	find . -type d -ls >/dev/null
>>   }
>>
>> base-commit: 5be1f00a9a701532232f57958efab4be8c959a29
>> -- 
>> 2.15.0.windows.1
>>
> 
