* [PATCH] fetch: limit shared symref check only for local branches @ 2022-05-16 8:37 Orgad Shaneh via GitGitGadget 2022-05-16 8:41 ` [PATCH v2] " Orgad Shaneh via GitGitGadget 0 siblings, 1 reply; 8+ messages in thread From: Orgad Shaneh via GitGitGadget @ 2022-05-16 8:37 UTC (permalink / raw) To: git; +Cc: Orgad Shaneh, Orgad Shaneh From: Orgad Shaneh <orgads@gmail.com> This check was introduced in 8ee5d73137f (Fix fetch/pull when run without --update-head-ok, 2008-10-13) in order to protect against replacing the ref of the active branch by mistake, for example by running git fetch origin master:master. It was later extended in 8bc1f39f411 (fetch: protect branches checked out in all worktrees, 2021-12-01) to scan all worktrees. This operation is very expensive (takes about 30s in my repository) when there are many tags or branches, and it is executed on every fetch, even if no local heads are updated at all. Limit it to protect only refs/heads/* to improve fetch performance. Signed-off-by: Orgad Shaneh <orgads@gmail.com> --- fetch: limit shared symref check only for local branches Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1266%2Forgads%2Ffetch-perf-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1266/orgads/fetch-perf-v1 Pull-Request: https://github.com/git/git/pull/1266 builtin/fetch.c | 1 + 1 file changed, 1 insertion(+) diff --git a/builtin/fetch.c b/builtin/fetch.c index e3791f09ed5..4f16fc357b0 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1440,6 +1440,7 @@ static void check_not_current_branch(struct ref *ref_map, const struct worktree *wt; for (; ref_map; ref_map = ref_map->next) if (ref_map->peer_ref && + starts_with(ref_map->peer_ref->name, "refs/heads/") && (wt = find_shared_symref(worktrees, "HEAD", ref_map->peer_ref->name)) && !wt->is_bare) base-commit: 277cf0bc36094f6dc4297d8c9cef79df045b735d -- gitgitgadget ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2] fetch: limit shared symref check only for local branches 2022-05-16 8:37 [PATCH] fetch: limit shared symref check only for local branches Orgad Shaneh via GitGitGadget @ 2022-05-16 8:41 ` Orgad Shaneh via GitGitGadget 2022-05-16 16:00 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Orgad Shaneh via GitGitGadget @ 2022-05-16 8:41 UTC (permalink / raw) To: git; +Cc: Orgad Shaneh, Orgad Shaneh From: Orgad Shaneh <orgads@gmail.com> This check was introduced in 8ee5d73137f (Fix fetch/pull when run without --update-head-ok, 2008-10-13) in order to protect against replacing the ref of the active branch by mistake, for example by running git fetch origin master:master. It was later extended in 8bc1f39f411 (fetch: protect branches checked out in all worktrees, 2021-12-01) to scan all worktrees. This operation is very expensive (takes about 30s in my repository) when there are many tags or branches, and it is executed on every fetch, even if no local heads are updated at all. Limit it to protect only refs/heads/* to improve fetch performance. Signed-off-by: Orgad Shaneh <orgads@gmail.com> --- fetch: limit shared symref check only for local branches Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1266%2Forgads%2Ffetch-perf-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1266/orgads/fetch-perf-v2 Pull-Request: https://github.com/git/git/pull/1266 Range-diff vs v1: 1: 5e86dc86d3d ! 1: 72bea90b26f fetch: limit shared symref check only for local branches @@ builtin/fetch.c: static void check_not_current_branch(struct ref *ref_map, const struct worktree *wt; for (; ref_map; ref_map = ref_map->next) if (ref_map->peer_ref && -+ starts_with(ref_map->peer_ref->name, "refs/heads/") && ++ starts_with(ref_map->peer_ref->name, "refs/heads/") && (wt = find_shared_symref(worktrees, "HEAD", ref_map->peer_ref->name)) && !wt->is_bare) builtin/fetch.c | 1 + 1 file changed, 1 insertion(+) diff --git a/builtin/fetch.c b/builtin/fetch.c index e3791f09ed5..eeee5ac8f15 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1440,6 +1440,7 @@ static void check_not_current_branch(struct ref *ref_map, const struct worktree *wt; for (; ref_map; ref_map = ref_map->next) if (ref_map->peer_ref && + starts_with(ref_map->peer_ref->name, "refs/heads/") && (wt = find_shared_symref(worktrees, "HEAD", ref_map->peer_ref->name)) && !wt->is_bare) base-commit: 277cf0bc36094f6dc4297d8c9cef79df045b735d -- gitgitgadget ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] fetch: limit shared symref check only for local branches 2022-05-16 8:41 ` [PATCH v2] " Orgad Shaneh via GitGitGadget @ 2022-05-16 16:00 ` Junio C Hamano 2022-05-16 17:57 ` Junio C Hamano 2022-05-17 6:05 ` Orgad Shaneh 0 siblings, 2 replies; 8+ messages in thread From: Junio C Hamano @ 2022-05-16 16:00 UTC (permalink / raw) To: Orgad Shaneh via GitGitGadget; +Cc: git, Orgad Shaneh "Orgad Shaneh via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Orgad Shaneh <orgads@gmail.com> > > This check was introduced in 8ee5d73137f (Fix fetch/pull when run without > --update-head-ok, 2008-10-13) in order to protect against replacing the ref > of the active branch by mistake, for example by running git fetch origin > master:master. > > It was later extended in 8bc1f39f411 (fetch: protect branches checked out > in all worktrees, 2021-12-01) to scan all worktrees. > > This operation is very expensive (takes about 30s in my repository) when > there are many tags or branches, and it is executed on every fetch, even if > no local heads are updated at all. > > Limit it to protect only refs/heads/* to improve fetch performance. The point of the check is to prevent the index+working tree in the worktrees to go out of sync with HEAD, and HEAD by definition can point only into refs/heads/*, this change should be OK. It is surprising find_shared_symref() is so expensive, though. If you have a dozen worktrees linked to the current repository, there are at most a dozen HEAD that point at various refs in refs/heads/ namespace. Even if you need to check a thousand ref_map elements, it should cost almost nothing if you build a hashmap to find matches with these dozen HEADs upfront, no? Another thing that is surprising is that you say this loop is expensive when there are many tags or branches. Do you mean it is expensive when there are many tags and branches that are updated, or it is expensive to merely have thousands of dormant tags and branches? If the latter, I wonder if it is sensible to limit the check only to the refs that are going to be updated. > diff --git a/builtin/fetch.c b/builtin/fetch.c > index e3791f09ed5..eeee5ac8f15 100644 > --- a/builtin/fetch.c > +++ b/builtin/fetch.c > @@ -1440,6 +1440,7 @@ static void check_not_current_branch(struct ref *ref_map, > const struct worktree *wt; > for (; ref_map; ref_map = ref_map->next) > if (ref_map->peer_ref && > + starts_with(ref_map->peer_ref->name, "refs/heads/") && > (wt = find_shared_symref(worktrees, "HEAD", > ref_map->peer_ref->name)) && > !wt->is_bare) > > base-commit: 277cf0bc36094f6dc4297d8c9cef79df045b735d ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] fetch: limit shared symref check only for local branches 2022-05-16 16:00 ` Junio C Hamano @ 2022-05-16 17:57 ` Junio C Hamano 2022-05-17 6:05 ` Orgad Shaneh 1 sibling, 0 replies; 8+ messages in thread From: Junio C Hamano @ 2022-05-16 17:57 UTC (permalink / raw) To: Orgad Shaneh via GitGitGadget; +Cc: git, Orgad Shaneh Junio C Hamano <gitster@pobox.com> writes: >> Limit it to protect only refs/heads/* to improve fetch performance. > > The point of the check is to prevent the index+working tree in the > worktrees to go out of sync with HEAD, and HEAD by definition can > point only into refs/heads/*, this change should be OK. I am willing to take the above change as-is as a standalone patch, but independently from that ... > It is surprising find_shared_symref() is so expensive, though. If > you have a dozen worktrees linked to the current repository, there > are at most a dozen HEAD that point at various refs in refs/heads/ > namespace. Even if you need to check a thousand ref_map elements, > it should cost almost nothing if you build a hashmap to find matches > with these dozen HEADs upfront, no? > > Another thing that is surprising is that you say this loop is > expensive when there are many tags or branches. Do you mean it is > expensive when there are many tags and branches that are updated, or > it is expensive to merely have thousands of dormant tags and > branches? If the latter, I wonder if it is sensible to limit the > check only to the refs that are going to be updated. ... I was wondering if an approach like the attached might be a better way to proceed in the longer run. Instead of (or in addition to) reducing the number of calls to the function, we should see if we can make the function less expensive. In short, find_shared_symref(), and is_worktree_being_FUTZed(), is following a wrong API pattern that encourages a loop to call them repeatedly by taking 'target' parameter and doing a comparison itself. If you want to find if any of your 1000 refs violate what they are trying to check, you'd need to call them 1000 times. Instead, you should be able to learn which branch is to be protected per each worktree and do the asking about your 1000 refs yourself. If we instead find out what branch, other than the one pointed at by the HEAD symref, each worktree is working on, just like we find out what branch is pointed at by the HEAD symref, and store that findings in the worktree structure, you should be able to walk the list of worktrees just once to build a hashmap that lets you tell which branches are not to be messed with before deciding if the fetch should go through. The following is not even compile tested, and some details may be wrong, but I hope it is good enough to illustrate the idea. worktree.c | 73 ++++++++++++++++++++++++++++++++++---------------------------- worktree.h | 5 +++++ 2 files changed, 45 insertions(+), 33 deletions(-) diff --git c/worktree.c w/worktree.c index 90fc085f76..2d96bd9dd1 100644 --- c/worktree.c +++ w/worktree.c @@ -15,6 +15,7 @@ void free_worktrees(struct worktree **worktrees) free(worktrees[i]->path); free(worktrees[i]->id); free(worktrees[i]->head_ref); + free(worktrees[i]->protected_branch); free(worktrees[i]->lock_reason); free(worktrees[i]->prune_reason); free(worktrees[i]); @@ -22,13 +23,28 @@ void free_worktrees(struct worktree **worktrees) free (worktrees); } +int is_worktree_being_rebased(const struct worktree *wt, + const char *target) +{ + return ((wt->protected_reason == WT_BRANCH_REBASING) && + (!strcmp(target, wt->protected_branch))); +} + +int is_worktree_being_bisected(const struct worktree *wt, + const char *target) +{ + return ((wt->protected_reason == WT_BRANCH_BISECTING) && + (!strcmp(target, wt->protected_branch))); +} + /** - * Update head_oid, head_ref and is_detached of the given worktree + * Grab HEAD-related information of the given worktree */ static void add_head_info(struct worktree *wt) { int flags; const char *target; + struct wt_status_state state; target = refs_resolve_ref_unsafe(get_worktree_ref_store(wt), "HEAD", @@ -41,6 +57,29 @@ static void add_head_info(struct worktree *wt) wt->head_ref = xstrdup(target); else wt->is_detached = 1; + + wt->protected_reason = 0; + memset(&state, 0, sizeof(state)); + if (wt_status_check_rebase(wt, &state) && + (state.rebase_in_progress || + state.rebase_interactive_in_progress) && + state.branch && + skip_prefix(state.branch, "refs/heads/", &target)) { + wt->protected_branch = xstrdup(target); + wt->protected_reason = WT_BRANCH_REBASING; + } + wt_status_state_free_buffers(&state); + + memset(&state, 0, sizeof(state)); + if (wt_status_check_bisect(wt, &state) && state.branch && + skip_prefix(state.branch, "refs/heads/", &target)) { + if (wt->protected_reason) + BUG("branch '%s' being bisected and rebased at the same time?", + wt->protected_branch); + wt->protected_branch = xstrdup(target); + wt->protected_reason = WT_BRANCH_BISECTING; + } + wt_status_state_free_buffers(&state); } /** @@ -365,38 +404,6 @@ void update_worktree_location(struct worktree *wt, const char *path_) strbuf_release(&path); } -int is_worktree_being_rebased(const struct worktree *wt, - const char *target) -{ - struct wt_status_state state; - int found_rebase; - - memset(&state, 0, sizeof(state)); - found_rebase = wt_status_check_rebase(wt, &state) && - (state.rebase_in_progress || - state.rebase_interactive_in_progress) && - state.branch && - skip_prefix(target, "refs/heads/", &target) && - !strcmp(state.branch, target); - wt_status_state_free_buffers(&state); - return found_rebase; -} - -int is_worktree_being_bisected(const struct worktree *wt, - const char *target) -{ - struct wt_status_state state; - int found_bisect; - - memset(&state, 0, sizeof(state)); - found_bisect = wt_status_check_bisect(wt, &state) && - state.branch && - skip_prefix(target, "refs/heads/", &target) && - !strcmp(state.branch, target); - wt_status_state_free_buffers(&state); - return found_bisect; -} - /* * note: this function should be able to detect shared symref even if * HEAD is temporarily detached (e.g. in the middle of rebase or diff --git c/worktree.h w/worktree.h index e9e839926b..4e9d06c26a 100644 --- c/worktree.h +++ w/worktree.h @@ -10,6 +10,11 @@ struct worktree { char *path; char *id; char *head_ref; /* NULL if HEAD is broken or detached */ + char *protected_branch; /* being rebased or bisected, othewise NULL */ + enum { + WT_BRANCH_REBASING = 1, + WT_BRANCH_BISECTING, + } protected_reason; char *lock_reason; /* private - use worktree_lock_reason */ char *prune_reason; /* private - use worktree_prune_reason */ struct object_id head_oid; ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2] fetch: limit shared symref check only for local branches 2022-05-16 16:00 ` Junio C Hamano 2022-05-16 17:57 ` Junio C Hamano @ 2022-05-17 6:05 ` Orgad Shaneh 2022-05-17 10:27 ` Junio C Hamano 1 sibling, 1 reply; 8+ messages in thread From: Orgad Shaneh @ 2022-05-17 6:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: Orgad Shaneh via GitGitGadget, git On Mon, May 16, 2022 at 7:01 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Orgad Shaneh via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Orgad Shaneh <orgads@gmail.com> > > > > This check was introduced in 8ee5d73137f (Fix fetch/pull when run without > > --update-head-ok, 2008-10-13) in order to protect against replacing the ref > > of the active branch by mistake, for example by running git fetch origin > > master:master. > > > > It was later extended in 8bc1f39f411 (fetch: protect branches checked out > > in all worktrees, 2021-12-01) to scan all worktrees. > > > > This operation is very expensive (takes about 30s in my repository) when > > there are many tags or branches, and it is executed on every fetch, even if > > no local heads are updated at all. > > > > Limit it to protect only refs/heads/* to improve fetch performance. > > The point of the check is to prevent the index+working tree in the > worktrees to go out of sync with HEAD, and HEAD by definition can > point only into refs/heads/*, this change should be OK. > > It is surprising find_shared_symref() is so expensive, though. If > you have a dozen worktrees linked to the current repository, there > are at most a dozen HEAD that point at various refs in refs/heads/ > namespace. Even if you need to check a thousand ref_map elements, > it should cost almost nothing if you build a hashmap to find matches > with these dozen HEADs upfront, no? I also had this idea, but I'm not familiar enough with the codebase to implement it. I see you already started that. > Another thing that is surprising is that you say this loop is > expensive when there are many tags or branches. Do you mean it is > expensive when there are many tags and branches that are updated, or > it is expensive to merely have thousands of dormant tags and > branches? If the latter, I wonder if it is sensible to limit the > check only to the refs that are going to be updated. It's expensive even when *nothing* is updated. I have a repo with 44K tags, 13K of the tags are annotated, 134 remote branches and 4 worktrees (except the main repo) with 33 local branches. I counted the calls to find_shared_symref - it was called 35755 times, and refs_read_raw_ref was called 357585 times. - Orgad ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] fetch: limit shared symref check only for local branches 2022-05-17 6:05 ` Orgad Shaneh @ 2022-05-17 10:27 ` Junio C Hamano 2022-05-17 10:41 ` Orgad Shaneh 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2022-05-17 10:27 UTC (permalink / raw) To: Orgad Shaneh; +Cc: Orgad Shaneh via GitGitGadget, git Orgad Shaneh <orgads@gmail.com> writes: >> Another thing that is surprising is that you say this loop is >> expensive when there are many tags or branches. Do you mean it is >> expensive when there are many tags and branches that are updated, or >> it is expensive to merely have thousands of dormant tags and >> branches? If the latter, I wonder if it is sensible to limit the >> check only to the refs that are going to be updated. > > It's expensive even when *nothing* is updated. I have a repo with 44K > tags, 13K of the tags are annotated, 134 remote branches and 4 > worktrees (except the main repo) with 33 local branches. > > I counted the calls to find_shared_symref - it was called 35755 times, > and refs_read_raw_ref was called 357585 times. That is exactly why I asked, as the above number hints that it could be a viable optimization to omit calls for refs whose old_ and new_oid are the same, just like you omit calls for refs that are not inside refs/heads/ in your patch, perhaps? ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] fetch: limit shared symref check only for local branches 2022-05-17 10:27 ` Junio C Hamano @ 2022-05-17 10:41 ` Orgad Shaneh 2022-05-18 15:50 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Orgad Shaneh @ 2022-05-17 10:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: Orgad Shaneh via GitGitGadget, git On Tue, May 17, 2022 at 1:27 PM Junio C Hamano <gitster@pobox.com> wrote: > > Orgad Shaneh <orgads@gmail.com> writes: > > >> Another thing that is surprising is that you say this loop is > >> expensive when there are many tags or branches. Do you mean it is > >> expensive when there are many tags and branches that are updated, or > >> it is expensive to merely have thousands of dormant tags and > >> branches? If the latter, I wonder if it is sensible to limit the > >> check only to the refs that are going to be updated. > > > > It's expensive even when *nothing* is updated. I have a repo with 44K > > tags, 13K of the tags are annotated, 134 remote branches and 4 > > worktrees (except the main repo) with 33 local branches. > > > > I counted the calls to find_shared_symref - it was called 35755 times, > > and refs_read_raw_ref was called 357585 times. > > That is exactly why I asked, as the above number hints that it could > be a viable optimization to omit calls for refs whose old_ and > new_oid are the same, just like you omit calls for refs that are not > inside refs/heads/ in your patch, perhaps? This would require shuffling the code. check_not_current_branch() is called by do_fetch before fetch_and_consume_refs (which updates ref->old_oid and ref->new_oid). - Orgad ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2] fetch: limit shared symref check only for local branches 2022-05-17 10:41 ` Orgad Shaneh @ 2022-05-18 15:50 ` Junio C Hamano 0 siblings, 0 replies; 8+ messages in thread From: Junio C Hamano @ 2022-05-18 15:50 UTC (permalink / raw) To: Orgad Shaneh; +Cc: Orgad Shaneh via GitGitGadget, git Orgad Shaneh <orgads@gmail.com> writes: > This would require shuffling the code. check_not_current_branch() is > called by do_fetch before fetch_and_consume_refs (which updates > ref->old_oid and ref->new_oid). Oh, that's unfortunate. We may not be matching them with the current values of the local copies in ref_map in today's code until fetch_and_consume_refs() calls store_updated_refs() to update the .new_oid member, but I would have thought that we learned the equivalent of "git ls-remote" output a lot earlier, at least most of the time, via a call to transport_get_remote_refs(), and that was why I thought such an optimization was doable. Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-05-18 15:53 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-05-16 8:37 [PATCH] fetch: limit shared symref check only for local branches Orgad Shaneh via GitGitGadget 2022-05-16 8:41 ` [PATCH v2] " Orgad Shaneh via GitGitGadget 2022-05-16 16:00 ` Junio C Hamano 2022-05-16 17:57 ` Junio C Hamano 2022-05-17 6:05 ` Orgad Shaneh 2022-05-17 10:27 ` Junio C Hamano 2022-05-17 10:41 ` Orgad Shaneh 2022-05-18 15:50 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).