git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] index: add trace2 region for clear skip worktree
@ 2022-10-26  0:05 Anh Le via GitGitGadget
  2022-10-26  3:16 ` Junio C Hamano
  2022-10-28  0:46 ` [PATCH v2] " Anh Le via GitGitGadget
  0 siblings, 2 replies; 18+ messages in thread
From: Anh Le via GitGitGadget @ 2022-10-26  0:05 UTC (permalink / raw)
  To: git; +Cc: Timothy Jones, Jeff Hostetler, Anh Le, Anh Le

From: Anh Le <anh@canva.com>

In a large repository using sparse checkout, checking
whether a file marked with skip worktree is present
on disk and its skip worktree bit should be cleared
can take a considerable amount of time. Add a trace2
region to surface this information.

Signed-off-by: Anh Le <anh@canva.com>
---
    index: add trace2 region for clear skip worktree
    
    In large repository using sparse checkout, checking whether a file
    marked with skip worktree is present on disk and its skip worktree bit
    should be cleared can take a considerable amount of time. Add a trace2
    region to surface this information.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1368%2FHaizzz%2Fmaster-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1368/Haizzz/master-v1
Pull-Request: https://github.com/git/git/pull/1368

 sparse-index.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index e4a54ce1943..d11049c8aeb 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -493,24 +493,33 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
 	int dir_found = 1;
 
 	int i;
+	intmax_t path_count = 0;
+	intmax_t restart_count = 0;
 
 	if (!core_apply_sparse_checkout ||
 	    sparse_expect_files_outside_of_patterns)
 		return;
 
+	trace2_region_enter("index", "clear_skip_worktree_from_present_files", istate->repo);
 restart:
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 
-		if (ce_skip_worktree(ce) &&
-		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
-			if (S_ISSPARSEDIR(ce->ce_mode)) {
-				ensure_full_index(istate);
-				goto restart;
+		if (ce_skip_worktree(ce)) {
+			path_count++;
+			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
+				if (S_ISSPARSEDIR(ce->ce_mode)) {
+					ensure_full_index(istate);
+					restart_count++;
+					goto restart;
+				}
+				ce->ce_flags &= ~CE_SKIP_WORKTREE;
 			}
-			ce->ce_flags &= ~CE_SKIP_WORKTREE;
 		}
 	}
+	trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_count);
+	trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/restart_count", restart_count);
+	trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);
 }
 
 /*

base-commit: 1fc3c0ad407008c2f71dd9ae1241d8b75f8ef886
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] index: add trace2 region for clear skip worktree
  2022-10-26  0:05 [PATCH] index: add trace2 region for clear skip worktree Anh Le via GitGitGadget
@ 2022-10-26  3:16 ` Junio C Hamano
  2022-10-26 14:13   ` Jeff Hostetler
  2022-10-28  0:46 ` [PATCH v2] " Anh Le via GitGitGadget
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-10-26  3:16 UTC (permalink / raw)
  To: Anh Le via GitGitGadget; +Cc: git, Timothy Jones, Jeff Hostetler, Anh Le

"Anh Le via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Anh Le <anh@canva.com>
>
> In a large repository using sparse checkout, checking
> whether a file marked with skip worktree is present
> on disk and its skip worktree bit should be cleared
> can take a considerable amount of time. Add a trace2
> region to surface this information.
>
> Signed-off-by: Anh Le <anh@canva.com>
> ---
>     index: add trace2 region for clear skip worktree
>     
>     In large repository using sparse checkout, checking whether a file
>     marked with skip worktree is present on disk and its skip worktree bit
>     should be cleared can take a considerable amount of time. Add a trace2
>     region to surface this information.

It is easy to see that the change is no-op from functionality's
standpoint.  The condition under which ce->ce_flags loses the
CE_SKIP_WORKTREE bit is the same as before, and the only change is
that in the iteration a couple of variables are incremented, which
may (or may not) have performance impact, but shouldn't break
correctness.

I am not sure about the value of these counters, honestly.

If we hit a sparse dir just once, we fully flatten the index and go
back to the restart state, and it would be a bug if we still see a
sparsified directory in the in-core index after that point, so
restart_count would be at most one, wouldn't it?  Why do we even
need to count with intmax_t?

Similarly, path_count is bounded by istate->cache_nr.  As you know
the approximate size of your project, I am not sure what extra
information you want to get by counting the paths with the skip bit
set before the first sparsified directory in the in-core index
twice, and the other paths with the skip bit set just once, and
adding these numbers together.  Also again, I am not quite sure what
the point is to count the paths in intmax_t.

> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1368%2FHaizzz%2Fmaster-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1368/Haizzz/master-v1
> Pull-Request: https://github.com/git/git/pull/1368
>
>  sparse-index.c | 21 +++++++++++++++------
>  1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/sparse-index.c b/sparse-index.c
> index e4a54ce1943..d11049c8aeb 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -493,24 +493,33 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>  	int dir_found = 1;
>  
>  	int i;
> +	intmax_t path_count = 0;
> +	intmax_t restart_count = 0;
>  
>  	if (!core_apply_sparse_checkout ||
>  	    sparse_expect_files_outside_of_patterns)
>  		return;
>  
> +	trace2_region_enter("index", "clear_skip_worktree_from_present_files", istate->repo);
>  restart:
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
>  
> -		if (ce_skip_worktree(ce) &&
> -		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
> -			if (S_ISSPARSEDIR(ce->ce_mode)) {
> -				ensure_full_index(istate);
> -				goto restart;
> +		if (ce_skip_worktree(ce)) {
> +			path_count++;
> +			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
> +				if (S_ISSPARSEDIR(ce->ce_mode)) {
> +					ensure_full_index(istate);
> +					restart_count++;
> +					goto restart;
> +				}
> +				ce->ce_flags &= ~CE_SKIP_WORKTREE;
>  			}
> -			ce->ce_flags &= ~CE_SKIP_WORKTREE;
>  		}
>  	}
> +	trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_count);
> +	trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/restart_count", restart_count);
> +	trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);
>  }
>  
>  /*
>
> base-commit: 1fc3c0ad407008c2f71dd9ae1241d8b75f8ef886

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] index: add trace2 region for clear skip worktree
  2022-10-26  3:16 ` Junio C Hamano
@ 2022-10-26 14:13   ` Jeff Hostetler
  2022-10-26 16:01     ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Hostetler @ 2022-10-26 14:13 UTC (permalink / raw)
  To: Junio C Hamano, Anh Le via GitGitGadget
  Cc: git, Timothy Jones, Jeff Hostetler, Anh Le



On 10/25/22 11:16 PM, Junio C Hamano wrote:
> "Anh Le via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> From: Anh Le <anh@canva.com>
>>
>> In a large repository using sparse checkout, checking
>> whether a file marked with skip worktree is present
>> on disk and its skip worktree bit should be cleared
>> can take a considerable amount of time. Add a trace2
>> region to surface this information.
>>
>> Signed-off-by: Anh Le <anh@canva.com>
>> ---
>>      index: add trace2 region for clear skip worktree
>>      
>>      In large repository using sparse checkout, checking whether a file
>>      marked with skip worktree is present on disk and its skip worktree bit
>>      should be cleared can take a considerable amount of time. Add a trace2
>>      region to surface this information.
> 
> It is easy to see that the change is no-op from functionality's
> standpoint.  The condition under which ce->ce_flags loses the
> CE_SKIP_WORKTREE bit is the same as before, and the only change is
> that in the iteration a couple of variables are incremented, which
> may (or may not) have performance impact, but shouldn't break
> correctness.
> 
> I am not sure about the value of these counters, honestly.

I suggested adding the counters while the series was still over
on GitGitGadget.

The original goal was to measure the time spent in that loop
with the region_enter/region_leave events.  They will tell you
whether the loop was slow or not and you can compare that with
other (peer or containing) regions to see if it significant.

However, it doesn't tell us anything about WHY it was slow or
was slower than another instance.  The path_count gives us a
simple estimate on the number of lstat() calls -- which is
one of frequent sources of slowness.  Yes, it is bounded by
`istate->cache_nr`, but if I have a repo with 1M cache-entries
and 95% are sparse, this loop is going to be much slower than
if only 5% are sparse.

The restart_count (which we only expect to be 0 or 1), will
tell us whether or not at some point mid-loop we had to fully
inflate the index -- which is another source of slowness.
Granted, `ensure_full_index()` probably contains a region of
its own, so the restart_count here may not absolutely necessary,
but in the context of this loop -- this counter may point to the
other smoking gun.

In the worst case, we walk the entire index and lstat() for a
significant number of skipped-and-not-present files, then near
the end of the loop, we find a skipped-but-present directory
and have to restart the loop.  The second pass will still run
the full loop again.  Will the second pass actually see any
skipped cache-entries?  Will it re-lstat() them?  Could the
`goto restart` just be a `break` or `return`?

I haven't had time to look under the hood here, but I was
hoping that these two counters would help the series author
collect telemetry over many runs and gain more insight into
the perf problem.

Continuing the example from above, if we've already paid the
costs to lstat() the 95% sparse files AND THEN near the bottom
of the loop we have to do a restart, then we should expect
this loop to be doubly slow.  It was my hope that this combination
of counters would help us understand the variations in perf.

WRT the `intmax_t` vs just `int`: either is fine.  I suggested
the former because that is what `trace2_data_intmax()` takes.
I really don't expect any usage to overflow a regular 32bit int.

[...]
>> +	trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_count);
>> +	trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/restart_count", restart_count);

One thing I forgot to mention in my GGG suggestion was to only
emit these data events if the counter is > 0.  We've been trying
to avoid logging zeroes.

>> +	trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);
[...]

Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] index: add trace2 region for clear skip worktree
  2022-10-26 14:13   ` Jeff Hostetler
@ 2022-10-26 16:01     ` Junio C Hamano
  2022-10-26 18:29       ` Jeff Hostetler
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-10-26 16:01 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Anh Le via GitGitGadget, git, Timothy Jones, Jeff Hostetler,
	Anh Le

Jeff Hostetler <git@jeffhostetler.com> writes:

> In the worst case, we walk the entire index and lstat() for a
> significant number of skipped-and-not-present files, then near
> the end of the loop, we find a skipped-but-present directory
> and have to restart the loop.  The second pass will still run
> the full loop again.  Will the second pass actually see any
> skipped cache-entries?  Will it re-lstat() them?  Could the
> `goto restart` just be a `break` or `return`?
>
> I haven't had time to look under the hood here, but I was
> hoping that these two counters would help the series author
> collect telemetry over many runs and gain more insight into
> the perf problem.

Without being able to answer these questions, would we be able to
interpret the numbers reported from these counters?

> Continuing the example from above, if we've already paid the
> costs to lstat() the 95% sparse files AND THEN near the bottom
> of the loop we have to do a restart, then we should expect
> this loop to be doubly slow.  It was my hope that this combination
> of counters would help us understand the variations in perf.

Yes, I understand that double-counting may give some clue to detect
that, but it just looked roundabout way to do that.  Perhaps
counting the path inspected during the first iteration and the path
inspected during the second iteration, separately, without the "how
many times did we repeat?", would give you a better picture?  "After
inspecting 2400 paths, we need to go back and then ended up scanning
3000 paths in the flattened index in the second round" would be
easier to interpret than "We needed flattening, and scanned 5400
paths in total in the two iterations".

> WRT the `intmax_t` vs just `int`: either is fine.

I thought "int" was supposed to be natural machine word, while
incrementing "intmax_t" is allowed to be much slower than "int".
Do we expect more than 2 billion paths?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] index: add trace2 region for clear skip worktree
  2022-10-26 16:01     ` Junio C Hamano
@ 2022-10-26 18:29       ` Jeff Hostetler
  2022-10-27  0:04         ` Anh Le
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Hostetler @ 2022-10-26 18:29 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Anh Le via GitGitGadget, git, Timothy Jones, Jeff Hostetler,
	Anh Le



On 10/26/22 12:01 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> In the worst case, we walk the entire index and lstat() for a
>> significant number of skipped-and-not-present files, then near
>> the end of the loop, we find a skipped-but-present directory
>> and have to restart the loop.  The second pass will still run
>> the full loop again.  Will the second pass actually see any
>> skipped cache-entries?  Will it re-lstat() them?  Could the
>> `goto restart` just be a `break` or `return`?
>>
>> I haven't had time to look under the hood here, but I was
>> hoping that these two counters would help the series author
>> collect telemetry over many runs and gain more insight into
>> the perf problem.
> 
> Without being able to answer these questions, would we be able to
> interpret the numbers reported from these counters?
> 
>> Continuing the example from above, if we've already paid the
>> costs to lstat() the 95% sparse files AND THEN near the bottom
>> of the loop we have to do a restart, then we should expect
>> this loop to be doubly slow.  It was my hope that this combination
>> of counters would help us understand the variations in perf.
> 
> Yes, I understand that double-counting may give some clue to detect
> that, but it just looked roundabout way to do that.  Perhaps
> counting the path inspected during the first iteration and the path
> inspected during the second iteration, separately, without the "how
> many times did we repeat?", would give you a better picture?  "After
> inspecting 2400 paths, we need to go back and then ended up scanning
> 3000 paths in the flattened index in the second round" would be
> easier to interpret than "We needed flattening, and scanned 5400
> paths in total in the two iterations".

Good point and I was wondering about whether we might see "2 x 95%"
values for path_count in really slow cases.  And yes, it would be
better to have `int path_count[2]` and tally each loop pass
independently.

I guess I was looking for a first-order "where is the pain?" knowing
that we could always dig deeper later.  That is, is the lstat's or
is it the ensure-full and index rewrite?  Or both?

We have other places that do lstat() over the cache-entries and have
added code to spread the loop across n threads.  I doubt that is needed
here and I didn't want to lead with it.


> 
>> WRT the `intmax_t` vs just `int`: either is fine.
> 
> I thought "int" was supposed to be natural machine word, while
> incrementing "intmax_t" is allowed to be much slower than "int".
> Do we expect more than 2 billion paths?
> 

You're right.  An `int` would be fine here.

Thanks,
Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] index: add trace2 region for clear skip worktree
  2022-10-26 18:29       ` Jeff Hostetler
@ 2022-10-27  0:04         ` Anh Le
  0 siblings, 0 replies; 18+ messages in thread
From: Anh Le @ 2022-10-27  0:04 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Junio C Hamano, Anh Le via GitGitGadget, git, Timothy Jones,
	Jeff Hostetler

Thank you Junio and Jeff for the feedback! I hadn't considered that
`ensure_full_index()` would mean the loop can only at most restart
once. I'll action the feedbacks:
- count both loop iterations separately with int instead of intmax_t
- remove the restart counter
- don't log metrics if it's 0

Regards,
Anh


Regards,
Anh


On Thu, Oct 27, 2022 at 5:29 AM Jeff Hostetler <git@jeffhostetler.com> wrote:
>
>
>
> On 10/26/22 12:01 PM, Junio C Hamano wrote:
> > Jeff Hostetler <git@jeffhostetler.com> writes:
> >
> >> In the worst case, we walk the entire index and lstat() for a
> >> significant number of skipped-and-not-present files, then near
> >> the end of the loop, we find a skipped-but-present directory
> >> and have to restart the loop.  The second pass will still run
> >> the full loop again.  Will the second pass actually see any
> >> skipped cache-entries?  Will it re-lstat() them?  Could the
> >> `goto restart` just be a `break` or `return`?
> >>
> >> I haven't had time to look under the hood here, but I was
> >> hoping that these two counters would help the series author
> >> collect telemetry over many runs and gain more insight into
> >> the perf problem.
> >
> > Without being able to answer these questions, would we be able to
> > interpret the numbers reported from these counters?
> >
> >> Continuing the example from above, if we've already paid the
> >> costs to lstat() the 95% sparse files AND THEN near the bottom
> >> of the loop we have to do a restart, then we should expect
> >> this loop to be doubly slow.  It was my hope that this combination
> >> of counters would help us understand the variations in perf.
> >
> > Yes, I understand that double-counting may give some clue to detect
> > that, but it just looked roundabout way to do that.  Perhaps
> > counting the path inspected during the first iteration and the path
> > inspected during the second iteration, separately, without the "how
> > many times did we repeat?", would give you a better picture?  "After
> > inspecting 2400 paths, we need to go back and then ended up scanning
> > 3000 paths in the flattened index in the second round" would be
> > easier to interpret than "We needed flattening, and scanned 5400
> > paths in total in the two iterations".
>
> Good point and I was wondering about whether we might see "2 x 95%"
> values for path_count in really slow cases.  And yes, it would be
> better to have `int path_count[2]` and tally each loop pass
> independently.
>
> I guess I was looking for a first-order "where is the pain?" knowing
> that we could always dig deeper later.  That is, is the lstat's or
> is it the ensure-full and index rewrite?  Or both?
>
> We have other places that do lstat() over the cache-entries and have
> added code to spread the loop across n threads.  I doubt that is needed
> here and I didn't want to lead with it.
>
>
> >
> >> WRT the `intmax_t` vs just `int`: either is fine.
> >
> > I thought "int" was supposed to be natural machine word, while
> > incrementing "intmax_t" is allowed to be much slower than "int".
> > Do we expect more than 2 billion paths?
> >
>
> You're right.  An `int` would be fine here.
>
> Thanks,
> Jeff

-- 
**
** <https://www.canva.com/>Empowering the world to design
We're hiring, 
apply here <https://www.canva.com/careers/>! Check out the latest news and 
learnings from our team on the Canva Newsroom 
<https://www.canva.com/newsroom/news/>.
 <https://twitter.com/canva> 
<https://facebook.com/canva> <https://au.linkedin.com/company/canva> 
<https://twitter.com/canva>  <https://facebook.com/canva>  
<https://www.linkedin.com/company/canva> 
 <https://instagram.com/canva>




^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2] index: add trace2 region for clear skip worktree
  2022-10-26  0:05 [PATCH] index: add trace2 region for clear skip worktree Anh Le via GitGitGadget
  2022-10-26  3:16 ` Junio C Hamano
@ 2022-10-28  0:46 ` Anh Le via GitGitGadget
  2022-10-28 15:49   ` Derrick Stolee
                     ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Anh Le via GitGitGadget @ 2022-10-28  0:46 UTC (permalink / raw)
  To: git; +Cc: Timothy Jones, Jeff Hostetler, Jeff Hostetler, Anh Le, Anh Le

From: Anh Le <anh@canva.com>

In a large repository using sparse checkout, checking
whether a file marked with skip worktree is present
on disk and its skip worktree bit should be cleared
can take a considerable amount of time. Add a trace2
region to surface this information, keeping a count of how many paths
have been checked and separately
keep counts for after a full index is
materialised.

Signed-off-by: Anh Le <anh@canva.com>
---
    index: add trace2 region for clear skip worktree
    
    In large repository using sparse checkout, checking whether a file
    marked with skip worktree is present on disk and its skip worktree bit
    should be cleared can take a considerable amount of time. Add a trace2
    region to surface this information.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1368%2FHaizzz%2Fmaster-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1368/Haizzz/master-v2
Pull-Request: https://github.com/git/git/pull/1368

Range-diff vs v1:

 1:  7d1bb3cba2a ! 1:  effe6b5b912 index: add trace2 region for clear skip worktree
     @@ Commit message
          whether a file marked with skip worktree is present
          on disk and its skip worktree bit should be cleared
          can take a considerable amount of time. Add a trace2
     -    region to surface this information.
     +    region to surface this information, keeping a count of how many paths
     +    have been checked and separately
     +    keep counts for after a full index is
     +    materialised.
      
          Signed-off-by: Anh Le <anh@canva.com>
      
     @@ sparse-index.c: void clear_skip_worktree_from_present_files(struct index_state *
       	int dir_found = 1;
       
       	int i;
     -+	intmax_t path_count = 0;
     -+	intmax_t restart_count = 0;
     ++	int path_counts[2] = {0, 0};
     ++	int restarted = 0;
       
       	if (!core_apply_sparse_checkout ||
       	    sparse_expect_files_outside_of_patterns)
     @@ sparse-index.c: void clear_skip_worktree_from_present_files(struct index_state *
      -				ensure_full_index(istate);
      -				goto restart;
      +		if (ce_skip_worktree(ce)) {
     -+			path_count++;
     ++			path_counts[restarted]++;
      +			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
      +				if (S_ISSPARSEDIR(ce->ce_mode)) {
      +					ensure_full_index(istate);
     -+					restart_count++;
     ++					restarted = 1;
      +					goto restart;
      +				}
      +				ce->ce_flags &= ~CE_SKIP_WORKTREE;
     @@ sparse-index.c: void clear_skip_worktree_from_present_files(struct index_state *
      -			ce->ce_flags &= ~CE_SKIP_WORKTREE;
       		}
       	}
     -+	trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_count);
     -+	trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/restart_count", restart_count);
     ++
     ++	if (path_counts[0] > 0) {
     ++		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_counts[0]);
     ++	}
     ++	if (restarted) {
     ++		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/full_index/path_count", path_counts[1]);
     ++	}
      +	trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);
       }
       


 sparse-index.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index e4a54ce1943..dbf647949c1 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -493,24 +493,38 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
 	int dir_found = 1;
 
 	int i;
+	int path_counts[2] = {0, 0};
+	int restarted = 0;
 
 	if (!core_apply_sparse_checkout ||
 	    sparse_expect_files_outside_of_patterns)
 		return;
 
+	trace2_region_enter("index", "clear_skip_worktree_from_present_files", istate->repo);
 restart:
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 
-		if (ce_skip_worktree(ce) &&
-		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
-			if (S_ISSPARSEDIR(ce->ce_mode)) {
-				ensure_full_index(istate);
-				goto restart;
+		if (ce_skip_worktree(ce)) {
+			path_counts[restarted]++;
+			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
+				if (S_ISSPARSEDIR(ce->ce_mode)) {
+					ensure_full_index(istate);
+					restarted = 1;
+					goto restart;
+				}
+				ce->ce_flags &= ~CE_SKIP_WORKTREE;
 			}
-			ce->ce_flags &= ~CE_SKIP_WORKTREE;
 		}
 	}
+
+	if (path_counts[0] > 0) {
+		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_counts[0]);
+	}
+	if (restarted) {
+		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/full_index/path_count", path_counts[1]);
+	}
+	trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);
 }
 
 /*

base-commit: 1fc3c0ad407008c2f71dd9ae1241d8b75f8ef886
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] index: add trace2 region for clear skip worktree
  2022-10-28  0:46 ` [PATCH v2] " Anh Le via GitGitGadget
@ 2022-10-28 15:49   ` Derrick Stolee
  2022-10-28 17:17     ` Junio C Hamano
  2022-10-28 16:50   ` Jeff Hostetler
  2022-10-31  0:56   ` [PATCH v3] " Anh Le via GitGitGadget
  2 siblings, 1 reply; 18+ messages in thread
From: Derrick Stolee @ 2022-10-28 15:49 UTC (permalink / raw)
  To: Anh Le via GitGitGadget, git
  Cc: Timothy Jones, Jeff Hostetler, Jeff Hostetler, Anh Le

On 10/27/2022 8:46 PM, Anh Le via GitGitGadget wrote:
> From: Anh Le <anh@canva.com>
> 
> In a large repository using sparse checkout, checking
> whether a file marked with skip worktree is present
> on disk and its skip worktree bit should be cleared
> can take a considerable amount of time. Add a trace2
> region to surface this information, keeping a count of how many paths
> have been checked and separately
> keep counts for after a full index is
> materialised.
You have some strange line wrapping here. You can wrap them better
using several editors, but here is brief technique if your core.editor
is vim (do within 'git commit --amend'):

1. Use "SHIFT+V" to get into visual mode.
2. Use up and down arrows to select the entire paragraph.
3. Type "G" then "Q" to word wrap the selection.
4. Save and close the editor.
>  	int i;
> +	int path_counts[2] = {0, 0};
> +	int restarted = 0;

This count mechanism is a good one. Nice and simple.

>  
>  	if (!core_apply_sparse_checkout ||
>  	    sparse_expect_files_outside_of_patterns)
>  		return;
>  
> +	trace2_region_enter("index", "clear_skip_worktree_from_present_files", istate->repo);
>  restart:
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
>  
> -		if (ce_skip_worktree(ce) &&
> -		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
> -			if (S_ISSPARSEDIR(ce->ce_mode)) {
> -				ensure_full_index(istate);
> -				goto restart;
> +		if (ce_skip_worktree(ce)) {
> +			path_counts[restarted]++;
> +			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
> +				if (S_ISSPARSEDIR(ce->ce_mode)) {
> +					ensure_full_index(istate);
> +					restarted = 1;
> +					goto restart;
> +				}
> +				ce->ce_flags &= ~CE_SKIP_WORKTREE;
>  			}
> -			ce->ce_flags &= ~CE_SKIP_WORKTREE;
>  		}
>  	}
> +
> +	if (path_counts[0] > 0) {
> +		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_counts[0]);
> +	}
> +	if (restarted) {
> +		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/full_index/path_count", path_counts[1]);
> +	}

A few style things:

1. Use "if (path_counts[0])" to detect a non-zero value.
2. Don't use { } around a single-line block.
3. Your lines are quite long, due a lot from your long keys.
   Shorten the keys (maybe "sparse_path_count" and "restarted_count"
   is enough context) and consider using a line break in
   the middle of the parameters.

> +	trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] index: add trace2 region for clear skip worktree
  2022-10-28  0:46 ` [PATCH v2] " Anh Le via GitGitGadget
  2022-10-28 15:49   ` Derrick Stolee
@ 2022-10-28 16:50   ` Jeff Hostetler
  2022-10-31  0:56   ` [PATCH v3] " Anh Le via GitGitGadget
  2 siblings, 0 replies; 18+ messages in thread
From: Jeff Hostetler @ 2022-10-28 16:50 UTC (permalink / raw)
  To: Anh Le via GitGitGadget, git; +Cc: Timothy Jones, Jeff Hostetler, Anh Le



On 10/27/22 8:46 PM, Anh Le via GitGitGadget wrote:
[...]
> diff --git a/sparse-index.c b/sparse-index.c
> index e4a54ce1943..dbf647949c1 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -493,24 +493,38 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>   	int dir_found = 1;
>   
>   	int i;
> +	int path_counts[2] = {0, 0};
> +	int restarted = 0;
>   
>   	if (!core_apply_sparse_checkout ||
>   	    sparse_expect_files_outside_of_patterns)
>   		return;
>   
> +	trace2_region_enter("index", "clear_skip_worktree_from_present_files", istate->repo);
>   restart:
>   	for (i = 0; i < istate->cache_nr; i++) {
>   		struct cache_entry *ce = istate->cache[i];
>   
> -		if (ce_skip_worktree(ce) &&
> -		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
> -			if (S_ISSPARSEDIR(ce->ce_mode)) {
> -				ensure_full_index(istate);
> -				goto restart;
> +		if (ce_skip_worktree(ce)) {
> +			path_counts[restarted]++;
> +			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
> +				if (S_ISSPARSEDIR(ce->ce_mode)) {
> +					ensure_full_index(istate);
> +					restarted = 1;
> +					goto restart;
> +				}
> +				ce->ce_flags &= ~CE_SKIP_WORKTREE;
>   			}
> -			ce->ce_flags &= ~CE_SKIP_WORKTREE;
>   		}
>   	}
> +
> +	if (path_counts[0] > 0) {
> +		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_counts[0]);
> +	}
> +	if (restarted) {
> +		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/full_index/path_count", path_counts[1]);
> +	}
> +	trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);
>   }

This looks good.  And I think it sets us up to later
answer some of our earlier perf questions after you
get some data.

For example, if the path_count[0] phase is expensive,
does the restart always need to restart the loop at i=0?
Granted, the `ensure_full_index()` may create a new cache-entry
array, so the value of `i` prior to the call may not correspond
to the same cell after the call at the `goto`, but we could
cache the strdup the pathname of cache[i] before the call and
do a find on the new cache[] afterwards to get the corresponding
i-prime value and let the restart start the loop on i-prime.
That's just an idea, but let's wait for the data to see if that
would be helpful.


Thanks!
Jeff


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] index: add trace2 region for clear skip worktree
  2022-10-28 15:49   ` Derrick Stolee
@ 2022-10-28 17:17     ` Junio C Hamano
  2022-10-30 23:28       ` Anh Le
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-10-28 17:17 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Anh Le via GitGitGadget, git, Timothy Jones, Jeff Hostetler,
	Jeff Hostetler, Anh Le

Derrick Stolee <derrickstolee@github.com> writes:

> A few style things:
>
> 1. Use "if (path_counts[0])" to detect a non-zero value.
> 2. Don't use { } around a single-line block.
> 3. Your lines are quite long, due a lot from your long keys.
>    Shorten the keys (maybe "sparse_path_count" and "restarted_count"
>    is enough context) and consider using a line break in
>    the middle of the parameters.

All good suggestions.  Let me add another one.

  4. Call an array you primarily access its individual elements in
     singular.  That way, you can say path_count[0] (i.e. the count
     for the zeroth round).  An array that exists mostly to be
     passed around as a whole to represent a "set of things", on the
     other hand, can be called in plural.

Taking them all together, perhaps something like this is what you
meant?

 sparse-index.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git c/sparse-index.c w/sparse-index.c
index dbf647949c..8c269dab80 100644
--- c/sparse-index.c
+++ w/sparse-index.c
@@ -493,22 +493,25 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
 	int dir_found = 1;
 
 	int i;
-	int path_counts[2] = {0, 0};
+	int path_count[2] = {0, 0};
 	int restarted = 0;
 
 	if (!core_apply_sparse_checkout ||
 	    sparse_expect_files_outside_of_patterns)
 		return;
 
-	trace2_region_enter("index", "clear_skip_worktree_from_present_files", istate->repo);
+	trace2_region_enter("index", "clear_skip_worktree_from_present_files",
+			    istate->repo);
 restart:
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 
 		if (ce_skip_worktree(ce)) {
-			path_counts[restarted]++;
+			path_count[restarted]++;
 			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
 				if (S_ISSPARSEDIR(ce->ce_mode)) {
+					if (restarted)
+						BUG("ensure-full-index did not fully flatten?");
 					ensure_full_index(istate);
 					restarted = 1;
 					goto restart;
@@ -518,13 +521,14 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
 		}
 	}
 
-	if (path_counts[0] > 0) {
-		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_counts[0]);
-	}
-	if (restarted) {
-		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/full_index/path_count", path_counts[1]);
-	}
-	trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);
+	if (path_count[0])
+		trace2_data_intmax("index", istate->repo,
+				   "sparse_path_count", path_count[0]);
+	if (restarted)
+		trace2_data_intmax("index", istate->repo,
+				   "sparse_path_count_full", path_count[1]);
+	trace2_region_leave("index", "clear_skip_worktree_from_present_files",
+			    istate->repo);
 }
 
 /*


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] index: add trace2 region for clear skip worktree
  2022-10-28 17:17     ` Junio C Hamano
@ 2022-10-30 23:28       ` Anh Le
  0 siblings, 0 replies; 18+ messages in thread
From: Anh Le @ 2022-10-30 23:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Anh Le via GitGitGadget, git, Timothy Jones,
	Jeff Hostetler, Jeff Hostetler

Thank you everyone, lots of things for me to take notes of. I'll send
in another patch.

Regards,
Anh

On Sat, Oct 29, 2022 at 4:17 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Derrick Stolee <derrickstolee@github.com> writes:
>
> > A few style things:
> >
> > 1. Use "if (path_counts[0])" to detect a non-zero value.
> > 2. Don't use { } around a single-line block.
> > 3. Your lines are quite long, due a lot from your long keys.
> >    Shorten the keys (maybe "sparse_path_count" and "restarted_count"
> >    is enough context) and consider using a line break in
> >    the middle of the parameters.
>
> All good suggestions.  Let me add another one.
>
>   4. Call an array you primarily access its individual elements in
>      singular.  That way, you can say path_count[0] (i.e. the count
>      for the zeroth round).  An array that exists mostly to be
>      passed around as a whole to represent a "set of things", on the
>      other hand, can be called in plural.
>
> Taking them all together, perhaps something like this is what you
> meant?
>
>  sparse-index.c | 24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
>
> diff --git c/sparse-index.c w/sparse-index.c
> index dbf647949c..8c269dab80 100644
> --- c/sparse-index.c
> +++ w/sparse-index.c
> @@ -493,22 +493,25 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>         int dir_found = 1;
>
>         int i;
> -       int path_counts[2] = {0, 0};
> +       int path_count[2] = {0, 0};
>         int restarted = 0;
>
>         if (!core_apply_sparse_checkout ||
>             sparse_expect_files_outside_of_patterns)
>                 return;
>
> -       trace2_region_enter("index", "clear_skip_worktree_from_present_files", istate->repo);
> +       trace2_region_enter("index", "clear_skip_worktree_from_present_files",
> +                           istate->repo);
>  restart:
>         for (i = 0; i < istate->cache_nr; i++) {
>                 struct cache_entry *ce = istate->cache[i];
>
>                 if (ce_skip_worktree(ce)) {
> -                       path_counts[restarted]++;
> +                       path_count[restarted]++;
>                         if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
>                                 if (S_ISSPARSEDIR(ce->ce_mode)) {
> +                                       if (restarted)
> +                                               BUG("ensure-full-index did not fully flatten?");
>                                         ensure_full_index(istate);
>                                         restarted = 1;
>                                         goto restart;
> @@ -518,13 +521,14 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>                 }
>         }
>
> -       if (path_counts[0] > 0) {
> -               trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_counts[0]);
> -       }
> -       if (restarted) {
> -               trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/full_index/path_count", path_counts[1]);
> -       }
> -       trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);
> +       if (path_count[0])
> +               trace2_data_intmax("index", istate->repo,
> +                                  "sparse_path_count", path_count[0]);
> +       if (restarted)
> +               trace2_data_intmax("index", istate->repo,
> +                                  "sparse_path_count_full", path_count[1]);
> +       trace2_region_leave("index", "clear_skip_worktree_from_present_files",
> +                           istate->repo);
>  }
>
>  /*
>

-- 
**
** <https://www.canva.com/>Empowering the world to design
We're hiring, 
apply here <https://www.canva.com/careers/>! Check out the latest news and 
learnings from our team on the Canva Newsroom 
<https://www.canva.com/newsroom/news/>.
 <https://twitter.com/canva> 
<https://facebook.com/canva> <https://au.linkedin.com/company/canva> 
<https://twitter.com/canva>  <https://facebook.com/canva>  
<https://www.linkedin.com/company/canva> 
 <https://instagram.com/canva>




^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3] index: add trace2 region for clear skip worktree
  2022-10-28  0:46 ` [PATCH v2] " Anh Le via GitGitGadget
  2022-10-28 15:49   ` Derrick Stolee
  2022-10-28 16:50   ` Jeff Hostetler
@ 2022-10-31  0:56   ` Anh Le via GitGitGadget
  2022-10-31 22:34     ` Taylor Blau
  2022-11-03 23:04     ` [PATCH v4 0/2] " Anh Le via GitGitGadget
  2 siblings, 2 replies; 18+ messages in thread
From: Anh Le via GitGitGadget @ 2022-10-31  0:56 UTC (permalink / raw)
  To: git
  Cc: Timothy Jones, Jeff Hostetler, Jeff Hostetler, Derrick Stolee,
	Anh Le, Anh Le

From: Anh Le <anh@canva.com>

In a large repository using sparse checkout, checking whether a file marked
with skip worktree is present on disk and its skip worktree bit should be
cleared can take a considerable amount of time. Add a trace2 region to
surface this information, keeping a count of how many paths have been
checked and separately keep counts for after a full index is materialised.

Signed-off-by: Anh Le <anh@canva.com>
---
    index: add trace2 region for clear skip worktree
    
    In large repository using sparse checkout, checking whether a file
    marked with skip worktree is present on disk and its skip worktree bit
    should be cleared can take a considerable amount of time. Add a trace2
    region to surface this information.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1368%2FHaizzz%2Fmaster-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1368/Haizzz/master-v3
Pull-Request: https://github.com/git/git/pull/1368

Range-diff vs v2:

 1:  effe6b5b912 ! 1:  d0d9f258b08 index: add trace2 region for clear skip worktree
     @@ Metadata
       ## Commit message ##
          index: add trace2 region for clear skip worktree
      
     -    In a large repository using sparse checkout, checking
     -    whether a file marked with skip worktree is present
     -    on disk and its skip worktree bit should be cleared
     -    can take a considerable amount of time. Add a trace2
     -    region to surface this information, keeping a count of how many paths
     -    have been checked and separately
     -    keep counts for after a full index is
     -    materialised.
     +    In a large repository using sparse checkout, checking whether a file marked
     +    with skip worktree is present on disk and its skip worktree bit should be
     +    cleared can take a considerable amount of time. Add a trace2 region to
     +    surface this information, keeping a count of how many paths have been
     +    checked and separately keep counts for after a full index is materialised.
      
          Signed-off-by: Anh Le <anh@canva.com>
      
     @@ sparse-index.c: void clear_skip_worktree_from_present_files(struct index_state *
       	int dir_found = 1;
       
       	int i;
     -+	int path_counts[2] = {0, 0};
     ++	int path_count[2] = {0, 0};
      +	int restarted = 0;
       
       	if (!core_apply_sparse_checkout ||
       	    sparse_expect_files_outside_of_patterns)
       		return;
       
     -+	trace2_region_enter("index", "clear_skip_worktree_from_present_files", istate->repo);
     ++	trace2_region_enter("index", "clear_skip_worktree_from_present_files",
     ++											istate->repo);
       restart:
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce = istate->cache[i];
     @@ sparse-index.c: void clear_skip_worktree_from_present_files(struct index_state *
      -				ensure_full_index(istate);
      -				goto restart;
      +		if (ce_skip_worktree(ce)) {
     -+			path_counts[restarted]++;
     ++			path_count[restarted]++;
      +			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
      +				if (S_ISSPARSEDIR(ce->ce_mode)) {
     ++					if (restarted)
     ++						BUG("ensure-full-index did not fully flatten?");
      +					ensure_full_index(istate);
      +					restarted = 1;
      +					goto restart;
     @@ sparse-index.c: void clear_skip_worktree_from_present_files(struct index_state *
       		}
       	}
      +
     -+	if (path_counts[0] > 0) {
     -+		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/path_count", path_counts[0]);
     -+	}
     -+	if (restarted) {
     -+		trace2_data_intmax("index", istate->repo, "clear_skip_worktree_from_present_files/full_index/path_count", path_counts[1]);
     -+	}
     -+	trace2_region_leave("index", "clear_skip_worktree_from_present_files", istate->repo);
     ++	if (path_count[0])
     ++		trace2_data_intmax("index", istate->repo,
     ++											 "sparse_path_count", path_count[0]);
     ++	if (restarted)
     ++		trace2_data_intmax("index", istate->repo,
     ++											 "sparse_path_count_full", path_count[1]);
     ++	trace2_region_leave("index", "clear_skip_worktree_from_present_files",
     ++										  istate->repo);
       }
       
       /*


 sparse-index.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index e4a54ce1943..8301129bf8f 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -493,24 +493,42 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
 	int dir_found = 1;
 
 	int i;
+	int path_count[2] = {0, 0};
+	int restarted = 0;
 
 	if (!core_apply_sparse_checkout ||
 	    sparse_expect_files_outside_of_patterns)
 		return;
 
+	trace2_region_enter("index", "clear_skip_worktree_from_present_files",
+											istate->repo);
 restart:
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 
-		if (ce_skip_worktree(ce) &&
-		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
-			if (S_ISSPARSEDIR(ce->ce_mode)) {
-				ensure_full_index(istate);
-				goto restart;
+		if (ce_skip_worktree(ce)) {
+			path_count[restarted]++;
+			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
+				if (S_ISSPARSEDIR(ce->ce_mode)) {
+					if (restarted)
+						BUG("ensure-full-index did not fully flatten?");
+					ensure_full_index(istate);
+					restarted = 1;
+					goto restart;
+				}
+				ce->ce_flags &= ~CE_SKIP_WORKTREE;
 			}
-			ce->ce_flags &= ~CE_SKIP_WORKTREE;
 		}
 	}
+
+	if (path_count[0])
+		trace2_data_intmax("index", istate->repo,
+											 "sparse_path_count", path_count[0]);
+	if (restarted)
+		trace2_data_intmax("index", istate->repo,
+											 "sparse_path_count_full", path_count[1]);
+	trace2_region_leave("index", "clear_skip_worktree_from_present_files",
+										  istate->repo);
 }
 
 /*

base-commit: 1fc3c0ad407008c2f71dd9ae1241d8b75f8ef886
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] index: add trace2 region for clear skip worktree
  2022-10-31  0:56   ` [PATCH v3] " Anh Le via GitGitGadget
@ 2022-10-31 22:34     ` Taylor Blau
  2022-11-03 23:04     ` [PATCH v4 0/2] " Anh Le via GitGitGadget
  1 sibling, 0 replies; 18+ messages in thread
From: Taylor Blau @ 2022-10-31 22:34 UTC (permalink / raw)
  To: Anh Le via GitGitGadget
  Cc: git, Timothy Jones, Jeff Hostetler, Jeff Hostetler,
	Derrick Stolee, Anh Le

On Mon, Oct 31, 2022 at 12:56:31AM +0000, Anh Le via GitGitGadget wrote:
> From: Anh Le <anh@canva.com>
>
> In a large repository using sparse checkout, checking whether a file marked
> with skip worktree is present on disk and its skip worktree bit should be
> cleared can take a considerable amount of time. Add a trace2 region to
> surface this information, keeping a count of how many paths have been
> checked and separately keep counts for after a full index is materialised.

I think the text you wrote here is fine, but it did take me a couple of
passes to make sure I understood it correctly.

Perhaps the following might be clearer:

    When using sparse checkout, clear_skip_worktree_from_present_files()
    must enumerate index entries to find ones with the SKIP_WORKTREE bit
    to determine whether those index entries exist on disk (in which
    case their SKIP_WORKTREE bit should be removed).

    In a large repository, this may take considerable time depending on
    the size of the index.

    Add a trace2 region to surface this information, keeping a count o
    fhow many paths have been checked. Separately, keep counts after a
    full index is materialized.

Is that accurate?

> diff --git a/sparse-index.c b/sparse-index.c
> index e4a54ce1943..8301129bf8f 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -493,24 +493,42 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>  	int dir_found = 1;
>
>  	int i;
> +	int path_count[2] = {0, 0};
> +	int restarted = 0;

Looks good.

>  	if (!core_apply_sparse_checkout ||
>  	    sparse_expect_files_outside_of_patterns)
>  		return;
>
> +	trace2_region_enter("index", "clear_skip_worktree_from_present_files",
> +											istate->repo);

Here and below, there is some funky indentation. When I apply the patch
locally, I find that I need to squash this in:

--- >8 ---
diff --git a/sparse-index.c b/sparse-index.c
index 8301129bf8..23f997e103 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -501,7 +501,7 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
 		return;

 	trace2_region_enter("index", "clear_skip_worktree_from_present_files",
-											istate->repo);
+			    istate->repo);
 restart:
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
--- 8< ---

>  restart:
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
>
> -		if (ce_skip_worktree(ce) &&
> -		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
> -			if (S_ISSPARSEDIR(ce->ce_mode)) {
> -				ensure_full_index(istate);
> -				goto restart;
> +		if (ce_skip_worktree(ce)) {
> +			path_count[restarted]++;
> +			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
> +				if (S_ISSPARSEDIR(ce->ce_mode)) {
> +					if (restarted)
> +						BUG("ensure-full-index did not fully flatten?");
> +					ensure_full_index(istate);
> +					restarted = 1;
> +					goto restart;
> +				}
> +				ce->ce_flags &= ~CE_SKIP_WORKTREE;
>  			}
> -			ce->ce_flags &= ~CE_SKIP_WORKTREE;

Looks fine, though the "if (restarted) BUG(...)" is new. It's not wrong,
per-se, but seeing it in a separate commit either before or after the
main change would be appreciated.

>  		}
>  	}
> +
> +	if (path_count[0])
> +		trace2_data_intmax("index", istate->repo,
> +											 "sparse_path_count", path_count[0]);
> +	if (restarted)
> +		trace2_data_intmax("index", istate->repo,
> +											 "sparse_path_count_full", path_count[1]);
> +	trace2_region_leave("index", "clear_skip_worktree_from_present_files",
> +										  istate->repo);

Same note about indentation here.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 0/2] index: add trace2 region for clear skip worktree
  2022-10-31  0:56   ` [PATCH v3] " Anh Le via GitGitGadget
  2022-10-31 22:34     ` Taylor Blau
@ 2022-11-03 23:04     ` Anh Le via GitGitGadget
  2022-11-03 23:05       ` [PATCH v4 1/2] " Anh Le via GitGitGadget
                         ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Anh Le via GitGitGadget @ 2022-11-03 23:04 UTC (permalink / raw)
  To: git
  Cc: Timothy Jones, Jeff Hostetler, Jeff Hostetler, Derrick Stolee,
	Taylor Blau, Anh Le

In large repository using sparse checkout, checking whether a file marked
with skip worktree is present on disk and its skip worktree bit should be
cleared can take a considerable amount of time. Add a trace2 region to
surface this information.

Anh Le (2):
  index: add trace2 region for clear skip worktree
  index: raise a bug if the index is materialised more than once

 sparse-index.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)


base-commit: 1fc3c0ad407008c2f71dd9ae1241d8b75f8ef886
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1368%2FHaizzz%2Fmaster-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1368/Haizzz/master-v4
Pull-Request: https://github.com/git/git/pull/1368

Range-diff vs v3:

 1:  d0d9f258b08 ! 1:  33e9b2afd91 index: add trace2 region for clear skip worktree
     @@ Metadata
       ## Commit message ##
          index: add trace2 region for clear skip worktree
      
     -    In a large repository using sparse checkout, checking whether a file marked
     -    with skip worktree is present on disk and its skip worktree bit should be
     -    cleared can take a considerable amount of time. Add a trace2 region to
     -    surface this information, keeping a count of how many paths have been
     -    checked and separately keep counts for after a full index is materialised.
     +    When using sparse checkout, clear_skip_worktree_from_present_files() must
     +    enumerate index entries to find ones with the SKIP_WORKTREE bit to
     +    determine whether those index entries exist on disk (in which case their
     +    SKIP_WORKTREE bit should be removed).
     +
     +    In a large repository, this may take considerable time depending on the
     +    size of the index.
     +
     +    Add a trace2 region to surface this information, keeping a count of how
     +    many paths have been checked. Separately, keep counts after a full index is
     +    materialized.
      
          Signed-off-by: Anh Le <anh@canva.com>
      
     @@ sparse-index.c: void clear_skip_worktree_from_present_files(struct index_state *
       		return;
       
      +	trace2_region_enter("index", "clear_skip_worktree_from_present_files",
     -+											istate->repo);
     ++			    istate->repo);
       restart:
       	for (i = 0; i < istate->cache_nr; i++) {
       		struct cache_entry *ce = istate->cache[i];
     @@ sparse-index.c: void clear_skip_worktree_from_present_files(struct index_state *
      +			path_count[restarted]++;
      +			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
      +				if (S_ISSPARSEDIR(ce->ce_mode)) {
     -+					if (restarted)
     -+						BUG("ensure-full-index did not fully flatten?");
      +					ensure_full_index(istate);
      +					restarted = 1;
      +					goto restart;
     @@ sparse-index.c: void clear_skip_worktree_from_present_files(struct index_state *
      +
      +	if (path_count[0])
      +		trace2_data_intmax("index", istate->repo,
     -+											 "sparse_path_count", path_count[0]);
     ++				   "sparse_path_count", path_count[0]);
      +	if (restarted)
      +		trace2_data_intmax("index", istate->repo,
     -+											 "sparse_path_count_full", path_count[1]);
     ++				   "sparse_path_count_full", path_count[1]);
      +	trace2_region_leave("index", "clear_skip_worktree_from_present_files",
     -+										  istate->repo);
     ++			    istate->repo);
       }
       
       /*
 -:  ----------- > 2:  91ad7973307 index: raise a bug if the index is materialised more than once

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v4 1/2] index: add trace2 region for clear skip worktree
  2022-11-03 23:04     ` [PATCH v4 0/2] " Anh Le via GitGitGadget
@ 2022-11-03 23:05       ` Anh Le via GitGitGadget
  2022-11-03 23:05       ` [PATCH v4 2/2] index: raise a bug if the index is materialised more than once Anh Le via GitGitGadget
  2022-11-05  0:29       ` [PATCH v4 0/2] index: add trace2 region for clear skip worktree Taylor Blau
  2 siblings, 0 replies; 18+ messages in thread
From: Anh Le via GitGitGadget @ 2022-11-03 23:05 UTC (permalink / raw)
  To: git
  Cc: Timothy Jones, Jeff Hostetler, Jeff Hostetler, Derrick Stolee,
	Taylor Blau, Anh Le, Anh Le

From: Anh Le <anh@canva.com>

When using sparse checkout, clear_skip_worktree_from_present_files() must
enumerate index entries to find ones with the SKIP_WORKTREE bit to
determine whether those index entries exist on disk (in which case their
SKIP_WORKTREE bit should be removed).

In a large repository, this may take considerable time depending on the
size of the index.

Add a trace2 region to surface this information, keeping a count of how
many paths have been checked. Separately, keep counts after a full index is
materialized.

Signed-off-by: Anh Le <anh@canva.com>
---
 sparse-index.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index e4a54ce1943..8713a15611d 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -493,24 +493,40 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
 	int dir_found = 1;
 
 	int i;
+	int path_count[2] = {0, 0};
+	int restarted = 0;
 
 	if (!core_apply_sparse_checkout ||
 	    sparse_expect_files_outside_of_patterns)
 		return;
 
+	trace2_region_enter("index", "clear_skip_worktree_from_present_files",
+			    istate->repo);
 restart:
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
 
-		if (ce_skip_worktree(ce) &&
-		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
-			if (S_ISSPARSEDIR(ce->ce_mode)) {
-				ensure_full_index(istate);
-				goto restart;
+		if (ce_skip_worktree(ce)) {
+			path_count[restarted]++;
+			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
+				if (S_ISSPARSEDIR(ce->ce_mode)) {
+					ensure_full_index(istate);
+					restarted = 1;
+					goto restart;
+				}
+				ce->ce_flags &= ~CE_SKIP_WORKTREE;
 			}
-			ce->ce_flags &= ~CE_SKIP_WORKTREE;
 		}
 	}
+
+	if (path_count[0])
+		trace2_data_intmax("index", istate->repo,
+				   "sparse_path_count", path_count[0]);
+	if (restarted)
+		trace2_data_intmax("index", istate->repo,
+				   "sparse_path_count_full", path_count[1]);
+	trace2_region_leave("index", "clear_skip_worktree_from_present_files",
+			    istate->repo);
 }
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v4 2/2] index: raise a bug if the index is materialised more than once
  2022-11-03 23:04     ` [PATCH v4 0/2] " Anh Le via GitGitGadget
  2022-11-03 23:05       ` [PATCH v4 1/2] " Anh Le via GitGitGadget
@ 2022-11-03 23:05       ` Anh Le via GitGitGadget
  2022-11-05  0:29       ` [PATCH v4 0/2] index: add trace2 region for clear skip worktree Taylor Blau
  2 siblings, 0 replies; 18+ messages in thread
From: Anh Le via GitGitGadget @ 2022-11-03 23:05 UTC (permalink / raw)
  To: git
  Cc: Timothy Jones, Jeff Hostetler, Jeff Hostetler, Derrick Stolee,
	Taylor Blau, Anh Le, Anh Le

From: Anh Le <anh@canva.com>

If clear_skip_worktree_from_present_files() encounter a sparse directory,
it fully materialise the index which should expand any sparse directories
and start going through each entries again. If this happens more than once,
raise it with a BUG.

Signed-off-by: Anh Le <anh@canva.com>
---
 sparse-index.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sparse-index.c b/sparse-index.c
index 8713a15611d..8c269dab803 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -510,6 +510,8 @@ restart:
 			path_count[restarted]++;
 			if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
 				if (S_ISSPARSEDIR(ce->ce_mode)) {
+					if (restarted)
+						BUG("ensure-full-index did not fully flatten?");
 					ensure_full_index(istate);
 					restarted = 1;
 					goto restart;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/2] index: add trace2 region for clear skip worktree
  2022-11-03 23:04     ` [PATCH v4 0/2] " Anh Le via GitGitGadget
  2022-11-03 23:05       ` [PATCH v4 1/2] " Anh Le via GitGitGadget
  2022-11-03 23:05       ` [PATCH v4 2/2] index: raise a bug if the index is materialised more than once Anh Le via GitGitGadget
@ 2022-11-05  0:29       ` Taylor Blau
  2022-11-07 20:50         ` Derrick Stolee
  2 siblings, 1 reply; 18+ messages in thread
From: Taylor Blau @ 2022-11-05  0:29 UTC (permalink / raw)
  To: Anh Le via GitGitGadget
  Cc: git, Timothy Jones, Jeff Hostetler, Jeff Hostetler,
	Derrick Stolee, Taylor Blau, Anh Le

On Thu, Nov 03, 2022 at 11:04:59PM +0000, Anh Le via GitGitGadget wrote:
> In large repository using sparse checkout, checking whether a file marked
> with skip worktree is present on disk and its skip worktree bit should be
> cleared can take a considerable amount of time. Add a trace2 region to
> surface this information.
>
> Anh Le (2):
>   index: add trace2 region for clear skip worktree
>   index: raise a bug if the index is materialised more than once
>
>  sparse-index.c | 30 ++++++++++++++++++++++++------
>  1 file changed, 24 insertions(+), 6 deletions(-)

This version is looking good. Thanks, will queue.


Thanks,
Taylor

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v4 0/2] index: add trace2 region for clear skip worktree
  2022-11-05  0:29       ` [PATCH v4 0/2] index: add trace2 region for clear skip worktree Taylor Blau
@ 2022-11-07 20:50         ` Derrick Stolee
  0 siblings, 0 replies; 18+ messages in thread
From: Derrick Stolee @ 2022-11-07 20:50 UTC (permalink / raw)
  To: Taylor Blau, Anh Le via GitGitGadget
  Cc: git, Timothy Jones, Jeff Hostetler, Jeff Hostetler, Anh Le

On 11/4/22 8:29 PM, Taylor Blau wrote:
> On Thu, Nov 03, 2022 at 11:04:59PM +0000, Anh Le via GitGitGadget wrote:
>> In large repository using sparse checkout, checking whether a file marked
>> with skip worktree is present on disk and its skip worktree bit should be
>> cleared can take a considerable amount of time. Add a trace2 region to
>> surface this information.
>>
>> Anh Le (2):
>>   index: add trace2 region for clear skip worktree
>>   index: raise a bug if the index is materialised more than once
>>
>>  sparse-index.c | 30 ++++++++++++++++++++++++------
>>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> This version is looking good. Thanks, will queue.

I also took a full re-read of this version and agree
that it is ready to merge.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-11-07 20:51 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-26  0:05 [PATCH] index: add trace2 region for clear skip worktree Anh Le via GitGitGadget
2022-10-26  3:16 ` Junio C Hamano
2022-10-26 14:13   ` Jeff Hostetler
2022-10-26 16:01     ` Junio C Hamano
2022-10-26 18:29       ` Jeff Hostetler
2022-10-27  0:04         ` Anh Le
2022-10-28  0:46 ` [PATCH v2] " Anh Le via GitGitGadget
2022-10-28 15:49   ` Derrick Stolee
2022-10-28 17:17     ` Junio C Hamano
2022-10-30 23:28       ` Anh Le
2022-10-28 16:50   ` Jeff Hostetler
2022-10-31  0:56   ` [PATCH v3] " Anh Le via GitGitGadget
2022-10-31 22:34     ` Taylor Blau
2022-11-03 23:04     ` [PATCH v4 0/2] " Anh Le via GitGitGadget
2022-11-03 23:05       ` [PATCH v4 1/2] " Anh Le via GitGitGadget
2022-11-03 23:05       ` [PATCH v4 2/2] index: raise a bug if the index is materialised more than once Anh Le via GitGitGadget
2022-11-05  0:29       ` [PATCH v4 0/2] index: add trace2 region for clear skip worktree Taylor Blau
2022-11-07 20:50         ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).