git@vger.kernel.org mailing list mirror (one of many)
* [PATCH] dir: force untracked cache with core.untrackedCache
@ 2022-02-14 17:37 Derrick Stolee via GitGitGadget
  2022-02-14 20:16 ` Junio C Hamano
  2022-02-17 21:00 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  0 siblings, 2 replies; 5+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-14 17:37 UTC (permalink / raw)
  To: git; +Cc: gitster, pclouds, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The GIT_FORCE_UNTRACKED_CACHE environment variable writes the untracked
cache more frequently than the core.untrackedCache config variable. This
is due to how read_directory() handles the creation of an untracked
cache. The old mechanism required using something like 'git update-index
--untracked-cache' before the index would actually contain an untracked
cache. This was noted as a performance problem on macOS in the past, and
this is a resolution for that issue.

The decision to not write the untracked cache without an environment
variable tracks back to fc9ecbeb9 (dir.c: don't flag the index as dirty
for changes to the untracked cache, 2018-02-05). The motivation of that
change is that writing the index is expensive, and if the untracked
cache is the only thing that needs to be written, then it is more
expensive than the benefit of the cache. However, this also means that
the untracked cache never gets populated, so the user who enabled it via
config does not actually get the extension until running 'git
update-index --untracked-cache' manually or using the environment
variable.

We have had a version of this change in the microsoft/git fork for a few
major releases now. It has been working well to get users into a good
state. Yes, that first index write is slow, but the remaining index
writes are much faster than they would be without this change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
    dir: force untracked cache with core.untrackedCache
    
    We have seen users in the wild that have had core.untrackedCache
    enabled, but never actually have an untracked cache created for them. We
    have a test in t7063 that shows git status should write the untracked
    cache, so I'm not exactly sure how users are in this state for long.
    
    This patch fixes the situation. I also know of another group that sets
    GIT_FORCE_UNTRACKED_CACHE=1 in their developer environment in order to
    get this behavior.
    
    -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1058%2Fderrickstolee%2Funtracked-cache-write-more-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1058/derrickstolee/untracked-cache-write-more-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1058

 dir.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index d91295f2bcd..79a5f6918c8 100644
--- a/dir.c
+++ b/dir.c
@@ -2936,7 +2936,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 
 		if (force_untracked_cache < 0)
 			force_untracked_cache =
-				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
+				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", -1);
+		if (force_untracked_cache < 0)
+			force_untracked_cache = (istate->repo->settings.core_untracked_cache == UNTRACKED_CACHE_WRITE);
 		if (force_untracked_cache &&
 			dir->untracked == istate->untracked &&
 		    (dir->untracked->dir_opened ||

base-commit: b80121027d1247a0754b3cc46897fee75c050b44
-- 
gitgitgadget
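
For readers following the thread, the fallback order in the hunk above can
be sketched as a small standalone C function. This is only an illustration
under stated assumptions: `enum uc_setting`, `env_bool()` and
`should_force_untracked_cache()` are hypothetical names, and `env_bool()` is
a simplified stand-in for git_env_bool(), not the real helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the setting parsed from core.untrackedCache. */
enum uc_setting { UC_KEEP, UC_REMOVE, UC_WRITE };

/*
 * Simplified stand-in for git_env_bool(): returns `def` when the variable
 * is unset (NULL); otherwise 0 for "", "0", "false", "no", "off", else 1.
 * The real helper accepts more spellings and rejects invalid values.
 */
static int env_bool(const char *value, int def)
{
	static const char *const falsy[] = { "", "0", "false", "no", "off" };
	size_t i;

	if (!value)
		return def;
	for (i = 0; i < sizeof(falsy) / sizeof(falsy[0]); i++)
		if (!strcmp(value, falsy[i]))
			return 0;
	return 1;
}

/*
 * The environment variable wins whenever it is set. Only when it is
 * unset (-1) does the config setting decide, and only "write" forces
 * the untracked cache to be written.
 */
static int should_force_untracked_cache(const char *env_value,
					enum uc_setting config)
{
	int force = env_bool(env_value, -1);

	assert(force >= -1 && force <= 1);
	if (force < 0)
		force = (config == UC_WRITE);
	return force;
}
```

Passing -1 as the default is what lets "unset" be told apart from an
explicit GIT_FORCE_UNTRACKED_CACHE=0, so the environment variable can still
switch the behavior off when the config is true.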

* Re: [PATCH] dir: force untracked cache with core.untrackedCache
  2022-02-14 17:37 [PATCH] dir: force untracked cache with core.untrackedCache Derrick Stolee via GitGitGadget
@ 2022-02-14 20:16 ` Junio C Hamano
  2022-02-14 20:40   ` Derrick Stolee
  2022-02-17 21:00 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  1 sibling, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2022-02-14 20:16 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, pclouds, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The GIT_FORCE_UNTRACKED_CACHE environment variable writes the untracked
> cache more frequently than the core.untrackedCache config variable. This
> is due to how read_directory() handles the creation of an untracked
> cache. The old mechanism required using something like 'git update-index
> --untracked-cache' before the index would actually contain an untracked
> cache. This was noted as a performance problem on macOS in the past, and
> this is a resolution for that issue.

Does "the old mechanism" mean "core.untrackedCache does not add a new
one; it only updates an existing one"?  It is not quite clear what
the "this" that was noted as a problem on macOS refers to; is it
"writing the untracked cache is a performance problem"?  And is the
last "this", the resolution, "not adding untrackedCache merely
because the configuration variable says we are allowed to use it"?

> The decision to not write the untracked cache without an environment
> variable tracks back to fc9ecbeb9 (dir.c: don't flag the index as dirty
> for changes to the untracked cache, 2018-02-05). The motivation of that
> change is that writing the index is expensive, and if the untracked
> cache is the only thing that needs to be written, then it is more
> expensive than the benefit of the cache. However, this also means that
> the untracked cache never gets populated, so the user who enabled it via
> config does not actually get the extension until running 'git
> update-index --untracked-cache' manually or using the environment
> variable.

OK.  It seems the environment variable was invented solely as a test
mechanism, but at least in the workflow of Microsoft folks, once we
have spent cycles to prepare the UNTR data, it helps their future use
of the index to spend a few more cycles writing it out instead of
discarding it.

I have to wonder if there are workflows sufficiently different from
what Microsoft folks use that the write-out cost of more frequent
updates to the untracked cache outweighs the runtime performance
boost of not having to run around and readdir() for untracked files.

ad0fb659 (repo-settings: parse core.untrackedCache, 2019-08-13)
explains that an unset core.untrackedCache means "keep", and "true"
means the untracked cache is "automatically added", which this change
does not invalidate, so I guess there is no need to update anything
in the documentation.  In fact, we might be able to sell this change
as a bugfix (i.e. "I set the configuration to 'true' but the cache
wasn't written out when it should have been").
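
The mapping described above (unset or "keep" leaves the index alone, "true"
adds the cache, "false" removes it) can be sketched in C as follows. This
is a minimal illustration, not the repo-settings code: the names are
hypothetical, only lowercase spellings are handled, and falling back to
"keep" for unrecognized values is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the parsed core.untrackedCache setting. */
enum uc_setting { UC_KEEP, UC_REMOVE, UC_WRITE };

/*
 * Map a config string to a setting. NULL models an unset variable.
 * Assumption: unrecognized values fall back to "keep" in this sketch.
 */
static enum uc_setting parse_untracked_cache(const char *value)
{
	if (!value || !strcmp(value, "keep"))
		return UC_KEEP;
	if (!strcmp(value, "true"))
		return UC_WRITE;
	if (!strcmp(value, "false"))
		return UC_REMOVE;
	return UC_KEEP;
}

/*
 * With the patch applied, only "write" asks for the extension to be
 * created in an index that does not have one yet.
 */
static int setting_creates_cache(enum uc_setting s)
{
	assert(s == UC_KEEP || s == UC_REMOVE || s == UC_WRITE);
	return s == UC_WRITE;
}
```

Under this reading, a user with core.untrackedCache=true gets
setting_creates_cache() == 1, which matches the "bugfix" framing: the
documented meaning of "true" stays the same, and the code catches up to it.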

> diff --git a/dir.c b/dir.c
> index d91295f2bcd..79a5f6918c8 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -2936,7 +2936,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
>  
>  		if (force_untracked_cache < 0)
>  			force_untracked_cache =
> -				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
> +				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", -1);
> +		if (force_untracked_cache < 0)
> +			force_untracked_cache = (istate->repo->settings.core_untracked_cache == UNTRACKED_CACHE_WRITE);
>  		if (force_untracked_cache &&
>  			dir->untracked == istate->untracked &&
>  		    (dir->untracked->dir_opened ||
>
> base-commit: b80121027d1247a0754b3cc46897fee75c050b44

* Re: [PATCH] dir: force untracked cache with core.untrackedCache
  2022-02-14 20:16 ` Junio C Hamano
@ 2022-02-14 20:40   ` Derrick Stolee
  0 siblings, 0 replies; 5+ messages in thread
From: Derrick Stolee @ 2022-02-14 20:40 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git, pclouds

On 2/14/2022 3:16 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> The GIT_FORCE_UNTRACKED_CACHE environment variable writes the untracked
>> cache more frequently than the core.untrackedCache config variable. This
>> is due to how read_directory() handles the creation of an untracked
>> cache. The old mechanism required using something like 'git update-index
>> --untracked-cache' before the index would actually contain an untracked
>> cache. This was noted as a performance problem on macOS in the past, and
>> this is a resolution for that issue.
> 
> Does "the old mechanism" mean "core.untrackedCache does not add a new
> one; it only updates an existing one"?  It is not quite clear what
> the "this" that was noted as a problem on macOS refers to; is it
> "writing the untracked cache is a performance problem"?  And is the
> last "this", the resolution, "not adding untrackedCache merely
> because the configuration variable says we are allowed to use it"?

Right. I can see how that is confusing. Here's another attempt:

   The GIT_FORCE_UNTRACKED_CACHE environment variable writes the untracked
   cache more frequently than the core.untrackedCache config variable. This
   is due to how read_directory() handles the creation of an untracked
   cache.

   Before this change, Git would not create the untracked cache extension
   for an index that did not already have one. Users would need to run a
   command such as 'git update-index --untracked-cache' before the index
   would actually contain an untracked cache.

   In particular, users noticed that the untracked cache would not appear
   even with core.untrackedCache=true. Some users reported setting
   GIT_FORCE_UNTRACKED_CACHE=1 in their engineering system environment to
   ensure the untracked cache would be created.
 
>> The decision to not write the untracked cache without an environment
>> variable tracks back to fc9ecbeb9 (dir.c: don't flag the index as dirty
>> for changes to the untracked cache, 2018-02-05). The motivation of that
>> change is that writing the index is expensive, and if the untracked
>> cache is the only thing that needs to be written, then it is more
>> expensive than the benefit of the cache. However, this also means that
>> the untracked cache never gets populated, so the user who enabled it via
>> config does not actually get the extension until running 'git
>> update-index --untracked-cache' manually or using the environment
>> variable.
> 
> OK.  It seems the environment variable was invented solely as a test
> mechanism, but at least in the workflow of Microsoft folks, once we
> have spent cycles to prepare the UNTR data, it helps their future use
> of the index to spend a few more cycles writing it out instead of
> discarding it.
> 
> I have to wonder if there are workflows sufficiently different from
> what Microsoft folks use that the write-out cost of more frequent
> updates to the untracked cache outweighs the runtime performance
> boost of not having to run around and readdir() for untracked files.

I think the only difference here is the transition state from no cache
to an existing cache. From then on, the cache is kept up-to-date with
the same frequency as without this change.

> ad0fb659 (repo-settings: parse core.untrackedCache, 2019-08-13)
> explains that an unset core.untrackedCache means "keep", and "true"
> means the untracked cache is "automatically added", which this change
> does not invalidate, so I guess there is no need to update anything
> in the documentation.  In fact, we might be able to sell this change
> as a bugfix (i.e. "I set the configuration to 'true' but the cache
> wasn't written out when it should have been").

Yes, I believe this to be the case. Hopefully the rewritten
paragraphs above make this clearer.

Thanks,
-Stolee

* [PATCH v2] dir: force untracked cache with core.untrackedCache
  2022-02-14 17:37 [PATCH] dir: force untracked cache with core.untrackedCache Derrick Stolee via GitGitGadget
  2022-02-14 20:16 ` Junio C Hamano
@ 2022-02-17 21:00 ` Derrick Stolee via GitGitGadget
  2022-02-17 22:51   ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2022-02-17 21:00 UTC (permalink / raw)
  To: git; +Cc: gitster, pclouds, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <derrickstolee@github.com>

The GIT_FORCE_UNTRACKED_CACHE environment variable writes the untracked
cache more frequently than the core.untrackedCache config variable. This
is due to how read_directory() handles the creation of an untracked
cache.

Before this change, Git would not create the untracked cache extension
for an index that did not already have one. Users would need to run a
command such as 'git update-index --untracked-cache' before the index
would actually contain an untracked cache.

In particular, users noticed that the untracked cache would not appear
even with core.untrackedCache=true. Some users reported setting
GIT_FORCE_UNTRACKED_CACHE=1 in their engineering system environment to
ensure the untracked cache would be created.

The decision to not write the untracked cache without an environment
variable tracks back to fc9ecbeb9 (dir.c: don't flag the index as dirty
for changes to the untracked cache, 2018-02-05). The motivation of that
change is that writing the index is expensive, and if the untracked
cache is the only thing that needs to be written, then it is more
expensive than the benefit of the cache. However, this also means that
the untracked cache never gets populated, so the user who enabled it via
config does not actually get the extension until running 'git
update-index --untracked-cache' manually or using the environment
variable.

We have had a version of this change in the microsoft/git fork for a few
major releases now. It has been working well to get users into a good
state. Yes, that first index write is slow, but the remaining index
writes are much faster than they would be without this change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
    dir: force untracked cache with core.untrackedCache
    
    We have seen users in the wild that have had core.untrackedCache
    enabled, but never actually have an untracked cache created for them. We
    have a test in t7063 that shows git status should write the untracked
    cache, so I'm not exactly sure how users are in this state for long.
    
    This patch fixes the situation. I also know of another group that sets
    GIT_FORCE_UNTRACKED_CACHE=1 in their developer environment in order to
    get this behavior.
    
    -Stolee
    
    
    Update in v2
    ============
    
     * Edited the commit message to be clearer.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1058%2Fderrickstolee%2Funtracked-cache-write-more-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1058/derrickstolee/untracked-cache-write-more-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1058

Range-diff vs v1:

 1:  061ee9a379d ! 1:  8d132bc5566 dir: force untracked cache with core.untrackedCache
     @@ Commit message
          The GIT_FORCE_UNTRACKED_CACHE environment variable writes the untracked
          cache more frequently than the core.untrackedCache config variable. This
          is due to how read_directory() handles the creation of an untracked
     -    cache. The old mechanism required using something like 'git update-index
     -    --untracked-cache' before the index would actually contain an untracked
     -    cache. This was noted as a performance problem on macOS in the past, and
     -    this is a resolution for that issue.
     +    cache.
     +
     +    Before this change, Git would not create the untracked cache extension
     +    for an index that did not already have one. Users would need to run a
     +    command such as 'git update-index --untracked-cache' before the index
     +    would actually contain an untracked cache.
     +
     +    In particular, users noticed that the untracked cache would not appear
     +    even with core.untrackedCache=true. Some users reported setting
     +    GIT_FORCE_UNTRACKED_CACHE=1 in their engineering system environment to
     +    ensure the untracked cache would be created.
      
          The decision to not write the untracked cache without an environment
          variable tracks back to fc9ecbeb9 (dir.c: don't flag the index as dirty


 dir.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/dir.c b/dir.c
index d91295f2bcd..79a5f6918c8 100644
--- a/dir.c
+++ b/dir.c
@@ -2936,7 +2936,9 @@ int read_directory(struct dir_struct *dir, struct index_state *istate,
 
 		if (force_untracked_cache < 0)
 			force_untracked_cache =
-				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", 0);
+				git_env_bool("GIT_FORCE_UNTRACKED_CACHE", -1);
+		if (force_untracked_cache < 0)
+			force_untracked_cache = (istate->repo->settings.core_untracked_cache == UNTRACKED_CACHE_WRITE);
 		if (force_untracked_cache &&
 			dir->untracked == istate->untracked &&
 		    (dir->untracked->dir_opened ||

base-commit: b80121027d1247a0754b3cc46897fee75c050b44
-- 
gitgitgadget

* Re: [PATCH v2] dir: force untracked cache with core.untrackedCache
  2022-02-17 21:00 ` [PATCH v2] " Derrick Stolee via GitGitGadget
@ 2022-02-17 22:51   ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2022-02-17 22:51 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, pclouds, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> The GIT_FORCE_UNTRACKED_CACHE environment variable writes the untracked
> cache more frequently than the core.untrackedCache config variable. This
> is due to how read_directory() handles the creation of an untracked
> cache.
>
> Before this change, Git would not create the untracked cache extension
> for an index that did not already have one. Users would need to run a
> command such as 'git update-index --untracked-cache' before the index
> would actually contain an untracked cache.
>
> In particular, users noticed that the untracked cache would not appear
> even with core.untrackedCache=true. Some users reported setting
> GIT_FORCE_UNTRACKED_CACHE=1 in their engineering system environment to
> ensure the untracked cache would be created.
>
> The decision to not write the untracked cache without an environment
> variable tracks back to fc9ecbeb9 (dir.c: don't flag the index as dirty
> for changes to the untracked cache, 2018-02-05). The motivation of that
> change is that writing the index is expensive, and if the untracked
> cache is the only thing that needs to be written, then it is more
> expensive than the benefit of the cache. However, this also means that
> the untracked cache never gets populated, so the user who enabled it via
> config does not actually get the extension until running 'git
> update-index --untracked-cache' manually or using the environment
> variable.
>
> We have had a version of this change in the microsoft/git fork for a few
> major releases now. It has been working well to get users into a good
> state. Yes, that first index write is slow, but the remaining index
> writes are much faster than they would be without this change.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---

Will queue.

Thanks.
