From: Phil Hord <phil.hord@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: Git <git@vger.kernel.org>,
"Martin Ågren" <martin.agren@gmail.com>,
"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH v2] use delete_refs when deleting tags or branches
Date: Fri, 15 Jan 2021 18:27:19 -0800 [thread overview]
Message-ID: <CABURp0p5e7vz-5tGJ_bByqYAmDcf+TQwAtV4gjnaqsdqDx0Zow@mail.gmail.com> (raw)
In-Reply-To: <CABPp-BEUPH5Yc08uDehAXNQ5-3fJ9YeW0xscVBR45hniDe+HEg@mail.gmail.com>
On Fri, Jan 15, 2021 at 10:43 AM Elijah Newren <newren@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jan 14, 2021 at 6:00 PM Phil Hord <phil.hord@gmail.com> wrote:
> >
> > I noticed this is still only in my local branch. Can I get an ACK/NAK?
>
> Sorry for missing this when you posted in August. Thanks for sending
> in the update from v1.
>
> For other reviewers: v1 is over here:
> https://lore.kernel.org/git/20190808035935.30023-1-phil.hord@gmail.com/,
> and has review comments from Martin, me, Peff, and Junio.
>
> > On Tue, Aug 13, 2019 at 7:49 PM Phil Hord <phil.hord@gmail.com> wrote:
> >>
> >> From: Phil Hord <phil.hord@gmail.com>
> >>
> >> 'git tag -d' and 'git branch -d' both accept one or more refs to
> >> delete, but each deletion is done by calling `delete_ref` on each argv.
> >> This is very slow when removing from packed refs as packed-refs is
> >> locked and rewritten each time. Use delete_refs instead so all the
> >> removals can be done inside a single transaction with a single update.
>
> Awesome, thanks for also fixing up git branch with v2.
>
> >> Since delete_refs performs all the packed-refs delete operations
> >> inside a single transaction, if any of the deletes fail then all
> >> them will be skipped. In practice, none of them should fail since
> >> we verify the hash of each one before calling delete_refs, but some
> >> network error or odd permissions problem could have different results
> >> after this change.
> >>
> >> Also, since the file-backed deletions are not performed in the same
> >> transaction, those could succeed even when the packed-refs transaction
> >> fails.
> >>
> >> After deleting refs, report the deletion's success only if the ref was
> >> actually deleted. For branch deletion, remove the branch config only
> >> if the branch ref is actually removed.
> >>
> >> A manual test deleting 24,000 tags took about 30 minutes using
> >> delete_ref. It takes about 5 seconds using delete_refs.
>
> As I said on v1, it's really nice to have this fixed. Thanks for doing it.
>
> >>
> >> Signed-off-by: Phil Hord <phil.hord@gmail.com>
> >> ---
> >> This reroll adds the same delete_refs change to 'git branch'. It checks
> >> individual refs after the operation to report correctly on each whether
> >> it was successfully deleted or not. Maybe this is an unnecessary step,
> >> though. This handles the weird case where some file system error
> >> prevented us from deleting refs, leaving us with an error from
> >> delete_refs but without any idea which refs might have been affected.
> >>
> >> builtin/branch.c | 50 +++++++++++++++++++++++++++++-------------------
> >> builtin/tag.c | 45 +++++++++++++++++++++++++++++++++----------
> >> 2 files changed, 65 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/builtin/branch.c b/builtin/branch.c
> >> index 2ef214632f..2273239f41 100644
> >> --- a/builtin/branch.c
> >> +++ b/builtin/branch.c
> >> @@ -202,6 +202,9 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
> >> int remote_branch = 0;
> >> struct strbuf bname = STRBUF_INIT;
> >> unsigned allowed_interpret;
> >> + struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
> >> + struct string_list_item *item;
> >> + int refname_pos = 0;
> >>
> >> switch (kinds) {
> >> case FILTER_REFS_REMOTES:
> >> @@ -209,12 +212,13 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
> >> /* For subsequent UI messages */
> >> remote_branch = 1;
> >> allowed_interpret = INTERPRET_BRANCH_REMOTE;
> >> -
> >> + refname_pos = 13;
> >> force = 1;
> >> break;
> >> case FILTER_REFS_BRANCHES:
> >> fmt = "refs/heads/%s";
> >> allowed_interpret = INTERPRET_BRANCH_LOCAL;
> >> + refname_pos = 11;
> >> break;
> >> default:
> >> die(_("cannot use -a with -d"));
> >> @@ -265,30 +269,36 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
> >> goto next;
> >> }
> >>
> >> - if (delete_ref(NULL, name, is_null_oid(&oid) ? NULL : &oid,
> >> - REF_NO_DEREF)) {
> >> - error(remote_branch
> >> - ? _("Error deleting remote-tracking branch '%s'")
> >> - : _("Error deleting branch '%s'"),
> >> - bname.buf);
> >> - ret = 1;
> >> - goto next;
>
> The code used to set the return code to 1 if it failed to delete a branch
>
> >> - }
> >> - if (!quiet) {
> >> - printf(remote_branch
> >> - ? _("Deleted remote-tracking branch %s (was %s).\n")
> >> - : _("Deleted branch %s (was %s).\n"),
> >> - bname.buf,
> >> - (flags & REF_ISBROKEN) ? "broken"
> >> - : (flags & REF_ISSYMREF) ? target
> >> - : find_unique_abbrev(&oid, DEFAULT_ABBREV));
> >> - }
> >> - delete_branch_config(bname.buf);
> >> + item = string_list_append(&refs_to_delete, name);
> >> + item->util = xstrdup((flags & REF_ISBROKEN) ? "broken"
> >> + : (flags & REF_ISSYMREF) ? target
> >> + : find_unique_abbrev(&oid, DEFAULT_ABBREV));
> >>
> >> next:
> >> free(target);
> >> }
> >>
> >> + delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
> >> +
> >> + for_each_string_list_item(item, &refs_to_delete) {
> >> + char * describe_ref = item->util;
> >> + char * name = item->string;
> >> + if (ref_exists(name))
> >> + ret = 1;
>
> Now it sets the return code if the branch still exists after trying to
> delete. I thought that was subtly different...but I tried doing a
> branch deletion of a non-existent branch since I thought that would be
> the only difference -- however, that errors out earlier in the
> codepath before even getting to the stage of deleting refs. So I
> think these are effectively the same.
>
> >> + else {
> >> + char * refname = name + refname_pos;
> >> + if (!quiet)
> >> + printf(remote_branch
> >> + ? _("Deleted remote-tracking branch %s (was %s).\n")
> >> + : _("Deleted branch %s (was %s).\n"),
> >> + name + refname_pos, describe_ref);
>
> Neither remote_branch nor refname_pos are changing throughout this
> loop, which I at first thought was in error, but it looks like git
> branch only allows you to delete one type or the other -- not a
> mixture. So this is correct.
>
> >> +
> >> + delete_branch_config(refname);
> >> + }
> >> + free(describe_ref);
> >> + }
> >> + string_list_clear(&refs_to_delete, 0);
> >> +
> >> free(name);
> >> strbuf_release(&bname);
> >>
> >> diff --git a/builtin/tag.c b/builtin/tag.c
> >> index e0a4c25382..0d11ffcd04 100644
> >> --- a/builtin/tag.c
> >> +++ b/builtin/tag.c
> >> @@ -72,10 +72,10 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >> }
> >>
> >> typedef int (*each_tag_name_fn)(const char *name, const char *ref,
> >> - const struct object_id *oid, const void *cb_data);
> >> + const struct object_id *oid, void *cb_data);
> >>
> >> static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
> >> - const void *cb_data)
> >> + void *cb_data)
> >> {
> >> const char **p;
> >> struct strbuf ref = STRBUF_INIT;
> >> @@ -97,18 +97,43 @@ static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
> >> return had_error;
> >> }
> >>
> >> -static int delete_tag(const char *name, const char *ref,
> >> - const struct object_id *oid, const void *cb_data)
> >> +static int collect_tags(const char *name, const char *ref,
> >> + const struct object_id *oid, void *cb_data)
> >> {
> >> - if (delete_ref(NULL, ref, oid, 0))
> >> - return 1;
>
> This used to return 1 if it failed to delete a ref.
>
> >> - printf(_("Deleted tag '%s' (was %s)\n"), name,
> >> - find_unique_abbrev(oid, DEFAULT_ABBREV));
> >> + struct string_list *ref_list = cb_data;
> >> +
> >> + string_list_append(ref_list, ref);
> >> + ref_list->items[ref_list->nr - 1].util = oiddup(oid);
> >> return 0;
>
> Now it unconditionally returns 0.
>
> >> }
> >>
> >> +static int delete_tags(const char **argv)
> >> +{
> >> + int result;
> >> + struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
> >> + struct string_list_item *item;
> >> +
> >> + result = for_each_tag_name(argv, collect_tags, (void *)&refs_to_delete);
> >> + delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
>
> You now only look at the result of collecting the tags, and ignore the
> result of trying to delete them...
>
> >> +
> >> + for_each_string_list_item(item, &refs_to_delete) {
> >> + const char * name = item->string;
> >> + struct object_id * oid = item->util;
> >> + if (ref_exists(name))
> >> + result = 1;
>
> ...except that you check if the refs still exist afterward and set the
> return code based on it. Like with the branch case, I can't come up
> with a case where the difference matters. I suspect there's a race
> condition there somewhere, but once you start going down that road I
> think the old code may have had a bunch of races too. It might be
> nice to document with a comment that there's a small race condition
> with someone else trying to forcibly re-create the ref at the same
> time you are trying to delete, but I don't think it's a big deal.
>
> If you did use the result of delete_refs(), you might have to double
> check that the callers (git.c:handle_builtin() -> git.c:run_builtin()
> -> builtin/tag.c:cmd_tag() -> builtin/tag.c:delete_tags()) are all
> okay with the return code; it looks like handle_builtin() would pass
> the return code to exit() and the git-tag manpage doesn't document the
> return status, so you've at least got some leeway in terms of what
> values are acceptable. Or you could just normalize the return value
> of delete_refs() down to 0 or 1. But you'd only need to worry about
> that if the race condition is something we're worried enough to
> tackle.
Interesting. I was worried about imposing a requirement on delete_refs
that any non-zero return must mean that something was not deleted
which should have been. Maybe that's not such a worry, though, and it
would be acceptable to return a 1 even if all the refs were deleted
even though some error occurred further down the line.
I tried normalizing the value and then also verifying each ref was
removed, but that seemed wrong. Maybe it's ok to just normalize it
and not react to still-existing refs.
> >> + else
> >> + printf(_("Deleted tag '%s' (was %s)\n"),
> >> + item->string + 10,
> >> + find_unique_abbrev(oid, DEFAULT_ABBREV));
> >> +
> >> + free(oid);
> >> + }
> >> + string_list_clear(&refs_to_delete, 0);
> >> + return result;
> >> +}
> >> +
> >> static int verify_tag(const char *name, const char *ref,
> >> - const struct object_id *oid, const void *cb_data)
> >> + const struct object_id *oid, void *cb_data)
> >> {
> >> int flags;
> >> const struct ref_format *format = cb_data;
> >> @@ -511,7 +536,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
> >> if (filter.merge_commit)
> >> die(_("--merged and --no-merged options are only allowed in list mode"));
> >> if (cmdmode == 'd')
> >> - return for_each_tag_name(argv, delete_tag, NULL);
> >> + return delete_tags(argv);
> >> if (cmdmode == 'v') {
> >> if (format.format && verify_ref_format(&format))
> >> usage_with_options(git_tag_usage, options);
> >> --
> >> 2.23.0.rc1.174.g4cc1b04b4c
>
> Overall, I like the patch. Peff commented on v1 that the basic idea
> (use the part of the refs API that batches operations) is the right
> thing to do. I'm not that familiar with refs-touching code, but your
> patch makes sense to me. I think I spotted a minor issue (you ignore
> the return status of delete_refs(), then later check the existence of
> the refs afterwards to determine success, which I believe is a minor
> and unlikely race condition), but I'm not sure it's worth fixing;
> perhaps just mark it with #leftoverbits and move on -- the faster
> branch and tag deletion is a very nice improvement.
>
> I notice Martin said on v1 that there was a testcase that had problems
> with your patch; I tested v2 and it looks like you fixed any such
> issues. I think you also addressed the feedback from Junio, though
> his comments about the return code and the minor race condition I
> noticed around it might mean it'd be good to get his comments.
>
> Anyway,
> Acked-by: Elijah Newren <newren@gmail.com>
>
> I would say Reviewed-by, but I'd like to get Junio's comments on the
> return code and minor race.
Thanks for the detailed review and thoughts.
prev parent reply other threads:[~2021-01-16 2:29 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-14 2:49 [PATCH v2] use delete_refs when deleting tags or branches Phil Hord
[not found] ` <CABURp0rtkmo7MSSCVrdNXT0UzV9XqV_kXOGkC23C+_vMENNJUg@mail.gmail.com>
2021-01-15 18:43 ` Elijah Newren
2021-01-15 20:06 ` Junio C Hamano
2021-01-15 22:04 ` Junio C Hamano
2021-01-16 2:27 ` Phil Hord [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CABURp0p5e7vz-5tGJ_bByqYAmDcf+TQwAtV4gjnaqsdqDx0Zow@mail.gmail.com \
--to=phil.hord@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=martin.agren@gmail.com \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).