git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] fetch: limit shared symref check only for local branches
@ 2022-05-16  8:37 Orgad Shaneh via GitGitGadget
  2022-05-16  8:41 ` [PATCH v2] " Orgad Shaneh via GitGitGadget
  0 siblings, 1 reply; 8+ messages in thread
From: Orgad Shaneh via GitGitGadget @ 2022-05-16  8:37 UTC (permalink / raw)
  To: git; +Cc: Orgad Shaneh, Orgad Shaneh

From: Orgad Shaneh <orgads@gmail.com>

This check was introduced in 8ee5d73137f (Fix fetch/pull when run without
--update-head-ok, 2008-10-13) in order to protect against replacing the ref
of the active branch by mistake, for example by running git fetch origin
master:master.

It was later extended in 8bc1f39f411 (fetch: protect branches checked out
in all worktrees, 2021-12-01) to scan all worktrees.

This operation is very expensive (takes about 30s in my repository) when
there are many tags or branches, and it is executed on every fetch, even if
no local heads are updated at all.

Limit it to protect only refs/heads/* to improve fetch performance.

Signed-off-by: Orgad Shaneh <orgads@gmail.com>
---
    fetch: limit shared symref check only for local branches

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1266%2Forgads%2Ffetch-perf-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1266/orgads/fetch-perf-v1
Pull-Request: https://github.com/git/git/pull/1266

 builtin/fetch.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e3791f09ed5..4f16fc357b0 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1440,6 +1440,7 @@ static void check_not_current_branch(struct ref *ref_map,
 	const struct worktree *wt;
 	for (; ref_map; ref_map = ref_map->next)
 		if (ref_map->peer_ref &&
+			starts_with(ref_map->peer_ref->name, "refs/heads/") &&
 		    (wt = find_shared_symref(worktrees, "HEAD",
 					     ref_map->peer_ref->name)) &&
 		    !wt->is_bare)

base-commit: 277cf0bc36094f6dc4297d8c9cef79df045b735d
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2] fetch: limit shared symref check only for local branches
  2022-05-16  8:37 [PATCH] fetch: limit shared symref check only for local branches Orgad Shaneh via GitGitGadget
@ 2022-05-16  8:41 ` Orgad Shaneh via GitGitGadget
  2022-05-16 16:00   ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Orgad Shaneh via GitGitGadget @ 2022-05-16  8:41 UTC (permalink / raw)
  To: git; +Cc: Orgad Shaneh, Orgad Shaneh

From: Orgad Shaneh <orgads@gmail.com>

This check was introduced in 8ee5d73137f (Fix fetch/pull when run without
--update-head-ok, 2008-10-13) in order to protect against replacing the ref
of the active branch by mistake, for example by running git fetch origin
master:master.

It was later extended in 8bc1f39f411 (fetch: protect branches checked out
in all worktrees, 2021-12-01) to scan all worktrees.

This operation is very expensive (takes about 30s in my repository) when
there are many tags or branches, and it is executed on every fetch, even if
no local heads are updated at all.

Limit it to protect only refs/heads/* to improve fetch performance.

Signed-off-by: Orgad Shaneh <orgads@gmail.com>
---
    fetch: limit shared symref check only for local branches

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1266%2Forgads%2Ffetch-perf-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1266/orgads/fetch-perf-v2
Pull-Request: https://github.com/git/git/pull/1266

Range-diff vs v1:

 1:  5e86dc86d3d ! 1:  72bea90b26f fetch: limit shared symref check only for local branches
     @@ builtin/fetch.c: static void check_not_current_branch(struct ref *ref_map,
       	const struct worktree *wt;
       	for (; ref_map; ref_map = ref_map->next)
       		if (ref_map->peer_ref &&
     -+			starts_with(ref_map->peer_ref->name, "refs/heads/") &&
     ++		    starts_with(ref_map->peer_ref->name, "refs/heads/") &&
       		    (wt = find_shared_symref(worktrees, "HEAD",
       					     ref_map->peer_ref->name)) &&
       		    !wt->is_bare)


 builtin/fetch.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index e3791f09ed5..eeee5ac8f15 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1440,6 +1440,7 @@ static void check_not_current_branch(struct ref *ref_map,
 	const struct worktree *wt;
 	for (; ref_map; ref_map = ref_map->next)
 		if (ref_map->peer_ref &&
+		    starts_with(ref_map->peer_ref->name, "refs/heads/") &&
 		    (wt = find_shared_symref(worktrees, "HEAD",
 					     ref_map->peer_ref->name)) &&
 		    !wt->is_bare)

base-commit: 277cf0bc36094f6dc4297d8c9cef79df045b735d
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] fetch: limit shared symref check only for local branches
  2022-05-16  8:41 ` [PATCH v2] " Orgad Shaneh via GitGitGadget
@ 2022-05-16 16:00   ` Junio C Hamano
  2022-05-16 17:57     ` Junio C Hamano
  2022-05-17  6:05     ` Orgad Shaneh
  0 siblings, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2022-05-16 16:00 UTC (permalink / raw)
  To: Orgad Shaneh via GitGitGadget; +Cc: git, Orgad Shaneh

"Orgad Shaneh via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Orgad Shaneh <orgads@gmail.com>
>
> This check was introduced in 8ee5d73137f (Fix fetch/pull when run without
> --update-head-ok, 2008-10-13) in order to protect against replacing the ref
> of the active branch by mistake, for example by running git fetch origin
> master:master.
>
> It was later extended in 8bc1f39f411 (fetch: protect branches checked out
> in all worktrees, 2021-12-01) to scan all worktrees.
>
> This operation is very expensive (takes about 30s in my repository) when
> there are many tags or branches, and it is executed on every fetch, even if
> no local heads are updated at all.
>
> Limit it to protect only refs/heads/* to improve fetch performance.

The point of the check is to prevent the index+working tree in the
worktrees to go out of sync with HEAD, and HEAD by definition can
point only into refs/heads/*, this change should be OK.

It is surprising find_shared_symref() is so expensive, though.  If
you have a dozen worktrees linked to the current repository, there
are at most a dozen HEAD that point at various refs in refs/heads/
namespace.  Even if you need to check a thousand ref_map elements,
it should cost almost nothing if you build a hashmap to find matches
with these dozen HEADs upfront, no?

Another thing that is surprising is that you say this loop is
expensive when there are many tags or branches.  Do you mean it is
expensive when there are many tags and branches that are updated, or
it is expensive to merely have thousands of dormant tags and
branches?  If the latter, I wonder if it is sensible to limit the
check only to the refs that are going to be updated.

> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index e3791f09ed5..eeee5ac8f15 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1440,6 +1440,7 @@ static void check_not_current_branch(struct ref *ref_map,
>  	const struct worktree *wt;
>  	for (; ref_map; ref_map = ref_map->next)
>  		if (ref_map->peer_ref &&
> +		    starts_with(ref_map->peer_ref->name, "refs/heads/") &&
>  		    (wt = find_shared_symref(worktrees, "HEAD",
>  					     ref_map->peer_ref->name)) &&
>  		    !wt->is_bare)
>
> base-commit: 277cf0bc36094f6dc4297d8c9cef79df045b735d

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] fetch: limit shared symref check only for local branches
  2022-05-16 16:00   ` Junio C Hamano
@ 2022-05-16 17:57     ` Junio C Hamano
  2022-05-17  6:05     ` Orgad Shaneh
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2022-05-16 17:57 UTC (permalink / raw)
  To: Orgad Shaneh via GitGitGadget; +Cc: git, Orgad Shaneh

Junio C Hamano <gitster@pobox.com> writes:

>> Limit it to protect only refs/heads/* to improve fetch performance.
>
> The point of the check is to prevent the index+working tree in the
> worktrees to go out of sync with HEAD, and HEAD by definition can
> point only into refs/heads/*, this change should be OK.

I am willing to take the above change as-is as a standalone patch,
but independently from that ...

> It is surprising find_shared_symref() is so expensive, though.  If
> you have a dozen worktrees linked to the current repository, there
> are at most a dozen HEAD that point at various refs in refs/heads/
> namespace.  Even if you need to check a thousand ref_map elements,
> it should cost almost nothing if you build a hashmap to find matches
> with these dozen HEADs upfront, no?
>
> Another thing that is surprising is that you say this loop is
> expensive when there are many tags or branches.  Do you mean it is
> expensive when there are many tags and branches that are updated, or
> it is expensive to merely have thousands of dormant tags and
> branches?  If the latter, I wonder if it is sensible to limit the
> check only to the refs that are going to be updated.

... I was wondering if an approach like the attached might be a
better way to proceed in the longer run.  Instead of (or in addition
to) reducing the number of calls to the function, we should see if
we can make the function less expensive.

In short, find_shared_symref(), and is_worktree_being_FUTZed(), is
following a wrong API pattern that encourages a loop to call them
repeatedly by taking 'target' parameter and doing a comparison
itself.  If you want to find if any of your 1000 refs violate what
they are trying to check, you'd need to call them 1000 times.

Instead, you should be able to learn which branch is to be protected
per each worktree and do the asking about your 1000 refs yourself.

If we instead find out what branch, other than the one pointed at by
the HEAD symref, each worktree is working on, just like we find out
what branch is pointed at by the HEAD symref, and store that
findings in the worktree structure, you should be able to walk the
list of worktrees just once to build a hashmap that lets you tell
which branches are not to be messed with before deciding if the
fetch should go through.

The following is not even compile tested, and some details may be
wrong, but I hope it is good enough to illustrate the idea.

 worktree.c | 73 ++++++++++++++++++++++++++++++++++----------------------------
 worktree.h |  5 +++++
 2 files changed, 45 insertions(+), 33 deletions(-)

diff --git c/worktree.c w/worktree.c
index 90fc085f76..2d96bd9dd1 100644
--- c/worktree.c
+++ w/worktree.c
@@ -15,6 +15,7 @@ void free_worktrees(struct worktree **worktrees)
 		free(worktrees[i]->path);
 		free(worktrees[i]->id);
 		free(worktrees[i]->head_ref);
+		free(worktrees[i]->protected_branch);
 		free(worktrees[i]->lock_reason);
 		free(worktrees[i]->prune_reason);
 		free(worktrees[i]);
@@ -22,13 +23,28 @@ void free_worktrees(struct worktree **worktrees)
 	free (worktrees);
 }
 
+int is_worktree_being_rebased(const struct worktree *wt,
+			      const char *target)
+{
+	return ((wt->protected_reason == WT_BRANCH_REBASING) &&
+		(!strcmp(target, wt->protected_branch)));
+}
+
+int is_worktree_being_bisected(const struct worktree *wt,
+			       const char *target)
+{
+	return ((wt->protected_reason == WT_BRANCH_BISECTING) &&
+		(!strcmp(target, wt->protected_branch)));
+}
+
 /**
- * Update head_oid, head_ref and is_detached of the given worktree
+ * Grab HEAD-related information of the given worktree
  */
 static void add_head_info(struct worktree *wt)
 {
 	int flags;
 	const char *target;
+	struct wt_status_state state;
 
 	target = refs_resolve_ref_unsafe(get_worktree_ref_store(wt),
 					 "HEAD",
@@ -41,6 +57,29 @@ static void add_head_info(struct worktree *wt)
 		wt->head_ref = xstrdup(target);
 	else
 		wt->is_detached = 1;
+
+	wt->protected_reason = 0;
+	memset(&state, 0, sizeof(state));
+	if (wt_status_check_rebase(wt, &state) &&
+	    (state.rebase_in_progress ||
+	     state.rebase_interactive_in_progress) &&
+	    state.branch &&
+	    skip_prefix(state.branch, "refs/heads/", &target)) {
+		wt->protected_branch = xstrdup(target);
+		wt->protected_reason = WT_BRANCH_REBASING;
+	}
+	wt_status_state_free_buffers(&state);
+
+	memset(&state, 0, sizeof(state));
+	if (wt_status_check_bisect(wt, &state) && state.branch &&
+	    skip_prefix(state.branch, "refs/heads/", &target)) {
+		if (wt->protected_reason)
+			BUG("branch '%s' being bisected and rebased at the same time?",
+			    wt->protected_branch);
+		wt->protected_branch = xstrdup(target);
+		wt->protected_reason = WT_BRANCH_BISECTING;
+	}
+	wt_status_state_free_buffers(&state);
 }
 
 /**
@@ -365,38 +404,6 @@ void update_worktree_location(struct worktree *wt, const char *path_)
 	strbuf_release(&path);
 }
 
-int is_worktree_being_rebased(const struct worktree *wt,
-			      const char *target)
-{
-	struct wt_status_state state;
-	int found_rebase;
-
-	memset(&state, 0, sizeof(state));
-	found_rebase = wt_status_check_rebase(wt, &state) &&
-		       (state.rebase_in_progress ||
-			state.rebase_interactive_in_progress) &&
-		       state.branch &&
-		       skip_prefix(target, "refs/heads/", &target) &&
-		       !strcmp(state.branch, target);
-	wt_status_state_free_buffers(&state);
-	return found_rebase;
-}
-
-int is_worktree_being_bisected(const struct worktree *wt,
-			       const char *target)
-{
-	struct wt_status_state state;
-	int found_bisect;
-
-	memset(&state, 0, sizeof(state));
-	found_bisect = wt_status_check_bisect(wt, &state) &&
-		       state.branch &&
-		       skip_prefix(target, "refs/heads/", &target) &&
-		       !strcmp(state.branch, target);
-	wt_status_state_free_buffers(&state);
-	return found_bisect;
-}
-
 /*
  * note: this function should be able to detect shared symref even if
  * HEAD is temporarily detached (e.g. in the middle of rebase or
diff --git c/worktree.h w/worktree.h
index e9e839926b..4e9d06c26a 100644
--- c/worktree.h
+++ w/worktree.h
@@ -10,6 +10,11 @@ struct worktree {
 	char *path;
 	char *id;
 	char *head_ref;		/* NULL if HEAD is broken or detached */
+	char *protected_branch; /* being rebased or bisected, othewise NULL */
+	enum {
+		WT_BRANCH_REBASING = 1,
+		WT_BRANCH_BISECTING,
+	} protected_reason;
 	char *lock_reason;	/* private - use worktree_lock_reason */
 	char *prune_reason;     /* private - use worktree_prune_reason */
 	struct object_id head_oid;

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] fetch: limit shared symref check only for local branches
  2022-05-16 16:00   ` Junio C Hamano
  2022-05-16 17:57     ` Junio C Hamano
@ 2022-05-17  6:05     ` Orgad Shaneh
  2022-05-17 10:27       ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Orgad Shaneh @ 2022-05-17  6:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Orgad Shaneh via GitGitGadget, git

On Mon, May 16, 2022 at 7:01 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Orgad Shaneh via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Orgad Shaneh <orgads@gmail.com>
> >
> > This check was introduced in 8ee5d73137f (Fix fetch/pull when run without
> > --update-head-ok, 2008-10-13) in order to protect against replacing the ref
> > of the active branch by mistake, for example by running git fetch origin
> > master:master.
> >
> > It was later extended in 8bc1f39f411 (fetch: protect branches checked out
> > in all worktrees, 2021-12-01) to scan all worktrees.
> >
> > This operation is very expensive (takes about 30s in my repository) when
> > there are many tags or branches, and it is executed on every fetch, even if
> > no local heads are updated at all.
> >
> > Limit it to protect only refs/heads/* to improve fetch performance.
>
> The point of the check is to prevent the index+working tree in the
> worktrees to go out of sync with HEAD, and HEAD by definition can
> point only into refs/heads/*, this change should be OK.
>
> It is surprising find_shared_symref() is so expensive, though.  If
> you have a dozen worktrees linked to the current repository, there
> are at most a dozen HEAD that point at various refs in refs/heads/
> namespace.  Even if you need to check a thousand ref_map elements,
> it should cost almost nothing if you build a hashmap to find matches
> with these dozen HEADs upfront, no?

I also had this idea, but I'm not familiar enough with the codebase to
implement it. I see you already started that.

> Another thing that is surprising is that you say this loop is
> expensive when there are many tags or branches.  Do you mean it is
> expensive when there are many tags and branches that are updated, or
> it is expensive to merely have thousands of dormant tags and
> branches?  If the latter, I wonder if it is sensible to limit the
> check only to the refs that are going to be updated.

It's expensive even when *nothing* is updated. I have a repo with 44K
tags, 13K of the tags are annotated, 134 remote branches and 4
worktrees (except the main repo) with 33 local branches.

I counted the calls to find_shared_symref - it was called 35755 times,
and refs_read_raw_ref was called 357585 times.

- Orgad

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] fetch: limit shared symref check only for local branches
  2022-05-17  6:05     ` Orgad Shaneh
@ 2022-05-17 10:27       ` Junio C Hamano
  2022-05-17 10:41         ` Orgad Shaneh
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2022-05-17 10:27 UTC (permalink / raw)
  To: Orgad Shaneh; +Cc: Orgad Shaneh via GitGitGadget, git

Orgad Shaneh <orgads@gmail.com> writes:

>> Another thing that is surprising is that you say this loop is
>> expensive when there are many tags or branches.  Do you mean it is
>> expensive when there are many tags and branches that are updated, or
>> it is expensive to merely have thousands of dormant tags and
>> branches?  If the latter, I wonder if it is sensible to limit the
>> check only to the refs that are going to be updated.
>
> It's expensive even when *nothing* is updated. I have a repo with 44K
> tags, 13K of the tags are annotated, 134 remote branches and 4
> worktrees (except the main repo) with 33 local branches.
>
> I counted the calls to find_shared_symref - it was called 35755 times,
> and refs_read_raw_ref was called 357585 times.

That is exactly why I asked, as the above number hints that it could
be a viable optimization to omit calls for refs whose old_ and
new_oid are the same, just like you omit calls for refs that are not
inside refs/heads/ in your patch, perhaps?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] fetch: limit shared symref check only for local branches
  2022-05-17 10:27       ` Junio C Hamano
@ 2022-05-17 10:41         ` Orgad Shaneh
  2022-05-18 15:50           ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Orgad Shaneh @ 2022-05-17 10:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Orgad Shaneh via GitGitGadget, git

On Tue, May 17, 2022 at 1:27 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Orgad Shaneh <orgads@gmail.com> writes:
>
> >> Another thing that is surprising is that you say this loop is
> >> expensive when there are many tags or branches.  Do you mean it is
> >> expensive when there are many tags and branches that are updated, or
> >> it is expensive to merely have thousands of dormant tags and
> >> branches?  If the latter, I wonder if it is sensible to limit the
> >> check only to the refs that are going to be updated.
> >
> > It's expensive even when *nothing* is updated. I have a repo with 44K
> > tags, 13K of the tags are annotated, 134 remote branches and 4
> > worktrees (except the main repo) with 33 local branches.
> >
> > I counted the calls to find_shared_symref - it was called 35755 times,
> > and refs_read_raw_ref was called 357585 times.
>
> That is exactly why I asked, as the above number hints that it could
> be a viable optimization to omit calls for refs whose old_ and
> new_oid are the same, just like you omit calls for refs that are not
> inside refs/heads/ in your patch, perhaps?

This would require shuffling the code. check_not_current_branch() is
called by do_fetch before fetch_and_consume_refs (which updates
ref->old_oid and ref->new_oid).

- Orgad

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] fetch: limit shared symref check only for local branches
  2022-05-17 10:41         ` Orgad Shaneh
@ 2022-05-18 15:50           ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2022-05-18 15:50 UTC (permalink / raw)
  To: Orgad Shaneh; +Cc: Orgad Shaneh via GitGitGadget, git

Orgad Shaneh <orgads@gmail.com> writes:

> This would require shuffling the code. check_not_current_branch() is
> called by do_fetch before fetch_and_consume_refs (which updates
> ref->old_oid and ref->new_oid).

Oh, that's unfortunate.

We may not be matching them with the current values of the local
copies in ref_map in today's code until fetch_and_consume_refs()
calls store_updated_refs() to update the .new_oid member, but I
would have thought that we learned the equivalent of "git ls-remote"
output a lot earlier, at least most of the time, via a call to
transport_get_remote_refs(), and that was why I thought such an
optimization was doable.

Thanks.  

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-05-18 15:53 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-16  8:37 [PATCH] fetch: limit shared symref check only for local branches Orgad Shaneh via GitGitGadget
2022-05-16  8:41 ` [PATCH v2] " Orgad Shaneh via GitGitGadget
2022-05-16 16:00   ` Junio C Hamano
2022-05-16 17:57     ` Junio C Hamano
2022-05-17  6:05     ` Orgad Shaneh
2022-05-17 10:27       ` Junio C Hamano
2022-05-17 10:41         ` Orgad Shaneh
2022-05-18 15:50           ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).