git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH v2] use delete_refs when deleting tags or branches
@ 2019-08-14  2:49 Phil Hord
       [not found] ` <CABURp0rtkmo7MSSCVrdNXT0UzV9XqV_kXOGkC23C+_vMENNJUg@mail.gmail.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Phil Hord @ 2019-08-14  2:49 UTC (permalink / raw)
  To: git; +Cc: Martin Ågren, Elijah Newren, Junio C Hamano, Phil Hord

From: Phil Hord <phil.hord@gmail.com>

'git tag -d' and 'git branch -d' both accept one or more refs to
delete, but each deletion is done by calling `delete_ref` on each argv.
This is very slow when removing from packed refs as packed-refs is
locked and rewritten each time. Use delete_refs instead so all the
removals can be done inside a single transaction with a single update.

Since delete_refs performs all the packed-refs delete operations
inside a single transaction, if any of the deletes fail then all
them will be skipped. In practice, none of them should fail since
we verify the hash of each one before calling delete_refs, but some
network error or odd permissions problem could have different results
after this change.

Also, since the file-backed deletions are not performed in the same
transaction, those could succeed even when the packed-refs transaction
fails.

After deleting refs, report the deletion's success only if the ref was
actually deleted. For branch deletion, remove the branch config only
if the branch ref is actually removed.

A manual test deleting 24,000 tags took about 30 minutes using
delete_ref.  It takes about 5 seconds using delete_refs.

Signed-off-by: Phil Hord <phil.hord@gmail.com>
---
This reroll adds the same delete_refs change to 'git branch'. It checks 
individual refs after the operation to report correctly on each whether
it was successfully deleted or not. Maybe this is an unnecessary step,
though. This handles the weird case where some file system error
prevented us from deleting refs, leaving us with an error from 
delete_refs but without any idea which refs might have been affected.

 builtin/branch.c | 50 +++++++++++++++++++++++++++++-------------------
 builtin/tag.c    | 45 +++++++++++++++++++++++++++++++++----------
 2 files changed, 65 insertions(+), 30 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 2ef214632f..2273239f41 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -202,6 +202,9 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
 	int remote_branch = 0;
 	struct strbuf bname = STRBUF_INIT;
 	unsigned allowed_interpret;
+	struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
+	struct string_list_item *item;
+	int refname_pos = 0;
 
 	switch (kinds) {
 	case FILTER_REFS_REMOTES:
@@ -209,12 +212,13 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
 		/* For subsequent UI messages */
 		remote_branch = 1;
 		allowed_interpret = INTERPRET_BRANCH_REMOTE;
-
+		refname_pos = 13;
 		force = 1;
 		break;
 	case FILTER_REFS_BRANCHES:
 		fmt = "refs/heads/%s";
 		allowed_interpret = INTERPRET_BRANCH_LOCAL;
+		refname_pos = 11;
 		break;
 	default:
 		die(_("cannot use -a with -d"));
@@ -265,30 +269,36 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
 			goto next;
 		}
 
-		if (delete_ref(NULL, name, is_null_oid(&oid) ? NULL : &oid,
-			       REF_NO_DEREF)) {
-			error(remote_branch
-			      ? _("Error deleting remote-tracking branch '%s'")
-			      : _("Error deleting branch '%s'"),
-			      bname.buf);
-			ret = 1;
-			goto next;
-		}
-		if (!quiet) {
-			printf(remote_branch
-			       ? _("Deleted remote-tracking branch %s (was %s).\n")
-			       : _("Deleted branch %s (was %s).\n"),
-			       bname.buf,
-			       (flags & REF_ISBROKEN) ? "broken"
-			       : (flags & REF_ISSYMREF) ? target
-			       : find_unique_abbrev(&oid, DEFAULT_ABBREV));
-		}
-		delete_branch_config(bname.buf);
+		item = string_list_append(&refs_to_delete, name);
+		item->util = xstrdup((flags & REF_ISBROKEN) ? "broken"
+				    : (flags & REF_ISSYMREF) ? target
+				    : find_unique_abbrev(&oid, DEFAULT_ABBREV));
 
 	next:
 		free(target);
 	}
 
+	delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
+
+	for_each_string_list_item(item, &refs_to_delete) {
+		char * describe_ref = item->util;
+		char * name = item->string;
+		if (ref_exists(name))
+			ret = 1;
+		else {
+			char * refname = name + refname_pos;
+			if (!quiet)
+				printf(remote_branch
+					? _("Deleted remote-tracking branch %s (was %s).\n")
+					: _("Deleted branch %s (was %s).\n"),
+					name + refname_pos, describe_ref);
+
+			delete_branch_config(refname);
+		}
+		free(describe_ref);
+	}
+	string_list_clear(&refs_to_delete, 0);
+
 	free(name);
 	strbuf_release(&bname);
 
diff --git a/builtin/tag.c b/builtin/tag.c
index e0a4c25382..0d11ffcd04 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -72,10 +72,10 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 }
 
 typedef int (*each_tag_name_fn)(const char *name, const char *ref,
-				const struct object_id *oid, const void *cb_data);
+				const struct object_id *oid, void *cb_data);
 
 static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
-			     const void *cb_data)
+			     void *cb_data)
 {
 	const char **p;
 	struct strbuf ref = STRBUF_INIT;
@@ -97,18 +97,43 @@ static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
 	return had_error;
 }
 
-static int delete_tag(const char *name, const char *ref,
-		      const struct object_id *oid, const void *cb_data)
+static int collect_tags(const char *name, const char *ref,
+			const struct object_id *oid, void *cb_data)
 {
-	if (delete_ref(NULL, ref, oid, 0))
-		return 1;
-	printf(_("Deleted tag '%s' (was %s)\n"), name,
-	       find_unique_abbrev(oid, DEFAULT_ABBREV));
+	struct string_list *ref_list = cb_data;
+
+	string_list_append(ref_list, ref);
+	ref_list->items[ref_list->nr - 1].util = oiddup(oid);
 	return 0;
 }
 
+static int delete_tags(const char **argv)
+{
+	int result;
+	struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
+	struct string_list_item *item;
+
+	result = for_each_tag_name(argv, collect_tags, (void *)&refs_to_delete);
+	delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
+
+	for_each_string_list_item(item, &refs_to_delete) {
+		const char * name = item->string;
+		struct object_id * oid = item->util;
+		if (ref_exists(name))
+			result = 1;
+		else
+			printf(_("Deleted tag '%s' (was %s)\n"),
+				item->string + 10,
+				find_unique_abbrev(oid, DEFAULT_ABBREV));
+
+		free(oid);
+	}
+	string_list_clear(&refs_to_delete, 0);
+	return result;
+}
+
 static int verify_tag(const char *name, const char *ref,
-		      const struct object_id *oid, const void *cb_data)
+		      const struct object_id *oid, void *cb_data)
 {
 	int flags;
 	const struct ref_format *format = cb_data;
@@ -511,7 +536,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	if (filter.merge_commit)
 		die(_("--merged and --no-merged options are only allowed in list mode"));
 	if (cmdmode == 'd')
-		return for_each_tag_name(argv, delete_tag, NULL);
+		return delete_tags(argv);
 	if (cmdmode == 'v') {
 		if (format.format && verify_ref_format(&format))
 			usage_with_options(git_tag_usage, options);
-- 
2.23.0.rc1.174.g4cc1b04b4c


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] use delete_refs when deleting tags or branches
       [not found] ` <CABURp0rtkmo7MSSCVrdNXT0UzV9XqV_kXOGkC23C+_vMENNJUg@mail.gmail.com>
@ 2021-01-15 18:43   ` Elijah Newren
  2021-01-15 20:06     ` Junio C Hamano
                       ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Elijah Newren @ 2021-01-15 18:43 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git, Martin Ågren, Junio C Hamano

Hi,

On Thu, Jan 14, 2021 at 6:00 PM Phil Hord <phil.hord@gmail.com> wrote:
>
> I noticed this is still only in my local branch.   Can I get an ACK/NAK?

Sorry for missing this when you posted in August.  Thanks for sending
in the update from v1.

For other reviewers: v1 is over here:
https://lore.kernel.org/git/20190808035935.30023-1-phil.hord@gmail.com/,
and has review comments from Martin, me, Peff, and Junio.

> On Tue, Aug 13, 2019 at 7:49 PM Phil Hord <phil.hord@gmail.com> wrote:
>>
>> From: Phil Hord <phil.hord@gmail.com>
>>
>> 'git tag -d' and 'git branch -d' both accept one or more refs to
>> delete, but each deletion is done by calling `delete_ref` on each argv.
>> This is very slow when removing from packed refs as packed-refs is
>> locked and rewritten each time. Use delete_refs instead so all the
>> removals can be done inside a single transaction with a single update.

Awesome, thanks for also fixing up git branch with v2.

>> Since delete_refs performs all the packed-refs delete operations
>> inside a single transaction, if any of the deletes fail then all
>> them will be skipped. In practice, none of them should fail since
>> we verify the hash of each one before calling delete_refs, but some
>> network error or odd permissions problem could have different results
>> after this change.
>>
>> Also, since the file-backed deletions are not performed in the same
>> transaction, those could succeed even when the packed-refs transaction
>> fails.
>>
>> After deleting refs, report the deletion's success only if the ref was
>> actually deleted. For branch deletion, remove the branch config only
>> if the branch ref is actually removed.
>>
>> A manual test deleting 24,000 tags took about 30 minutes using
>> delete_ref.  It takes about 5 seconds using delete_refs.

As I said on v1, it's really nice to have this fixed.  Thanks for doing it.

>>
>> Signed-off-by: Phil Hord <phil.hord@gmail.com>
>> ---
>> This reroll adds the same delete_refs change to 'git branch'. It checks
>> individual refs after the operation to report correctly on each whether
>> it was successfully deleted or not. Maybe this is an unnecessary step,
>> though. This handles the weird case where some file system error
>> prevented us from deleting refs, leaving us with an error from
>> delete_refs but without any idea which refs might have been affected.
>>
>>  builtin/branch.c | 50 +++++++++++++++++++++++++++++-------------------
>>  builtin/tag.c    | 45 +++++++++++++++++++++++++++++++++----------
>>  2 files changed, 65 insertions(+), 30 deletions(-)
>>
>> diff --git a/builtin/branch.c b/builtin/branch.c
>> index 2ef214632f..2273239f41 100644
>> --- a/builtin/branch.c
>> +++ b/builtin/branch.c
>> @@ -202,6 +202,9 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>>         int remote_branch = 0;
>>         struct strbuf bname = STRBUF_INIT;
>>         unsigned allowed_interpret;
>> +       struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
>> +       struct string_list_item *item;
>> +       int refname_pos = 0;
>>
>>         switch (kinds) {
>>         case FILTER_REFS_REMOTES:
>> @@ -209,12 +212,13 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>>                 /* For subsequent UI messages */
>>                 remote_branch = 1;
>>                 allowed_interpret = INTERPRET_BRANCH_REMOTE;
>> -
>> +               refname_pos = 13;
>>                 force = 1;
>>                 break;
>>         case FILTER_REFS_BRANCHES:
>>                 fmt = "refs/heads/%s";
>>                 allowed_interpret = INTERPRET_BRANCH_LOCAL;
>> +               refname_pos = 11;
>>                 break;
>>         default:
>>                 die(_("cannot use -a with -d"));
>> @@ -265,30 +269,36 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
>>                         goto next;
>>                 }
>>
>> -               if (delete_ref(NULL, name, is_null_oid(&oid) ? NULL : &oid,
>> -                              REF_NO_DEREF)) {
>> -                       error(remote_branch
>> -                             ? _("Error deleting remote-tracking branch '%s'")
>> -                             : _("Error deleting branch '%s'"),
>> -                             bname.buf);
>> -                       ret = 1;
>> -                       goto next;

The code used to set the return code to 1 if it failed to delete a branch

>> -               }
>> -               if (!quiet) {
>> -                       printf(remote_branch
>> -                              ? _("Deleted remote-tracking branch %s (was %s).\n")
>> -                              : _("Deleted branch %s (was %s).\n"),
>> -                              bname.buf,
>> -                              (flags & REF_ISBROKEN) ? "broken"
>> -                              : (flags & REF_ISSYMREF) ? target
>> -                              : find_unique_abbrev(&oid, DEFAULT_ABBREV));
>> -               }
>> -               delete_branch_config(bname.buf);
>> +               item = string_list_append(&refs_to_delete, name);
>> +               item->util = xstrdup((flags & REF_ISBROKEN) ? "broken"
>> +                                   : (flags & REF_ISSYMREF) ? target
>> +                                   : find_unique_abbrev(&oid, DEFAULT_ABBREV));
>>
>>         next:
>>                 free(target);
>>         }
>>
>> +       delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
>> +
>> +       for_each_string_list_item(item, &refs_to_delete) {
>> +               char * describe_ref = item->util;
>> +               char * name = item->string;
>> +               if (ref_exists(name))
>> +                       ret = 1;

Now it sets the return code if the branch still exists after trying to
delete.  I thought that was subtly different...but I tried doing a
branch deletion of a non-existent branch since I thought that would be
the only difference -- however, that errors out earlier in the
codepath before even getting to the stage of deleting refs.  So I
think these are effectively the same.

>> +               else {
>> +                       char * refname = name + refname_pos;
>> +                       if (!quiet)
>> +                               printf(remote_branch
>> +                                       ? _("Deleted remote-tracking branch %s (was %s).\n")
>> +                                       : _("Deleted branch %s (was %s).\n"),
>> +                                       name + refname_pos, describe_ref);

Neither remote_branch nor refname_pos are changing throughout this
loop, which I at first thought was in error, but it looks like git
branch only allows you to delete one type or the other -- not a
mixture.  So this is correct.

>> +
>> +                       delete_branch_config(refname);
>> +               }
>> +               free(describe_ref);
>> +       }
>> +       string_list_clear(&refs_to_delete, 0);
>> +
>>         free(name);
>>         strbuf_release(&bname);
>>
>> diff --git a/builtin/tag.c b/builtin/tag.c
>> index e0a4c25382..0d11ffcd04 100644
>> --- a/builtin/tag.c
>> +++ b/builtin/tag.c
>> @@ -72,10 +72,10 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
>>  }
>>
>>  typedef int (*each_tag_name_fn)(const char *name, const char *ref,
>> -                               const struct object_id *oid, const void *cb_data);
>> +                               const struct object_id *oid, void *cb_data);
>>
>>  static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
>> -                            const void *cb_data)
>> +                            void *cb_data)
>>  {
>>         const char **p;
>>         struct strbuf ref = STRBUF_INIT;
>> @@ -97,18 +97,43 @@ static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
>>         return had_error;
>>  }
>>
>> -static int delete_tag(const char *name, const char *ref,
>> -                     const struct object_id *oid, const void *cb_data)
>> +static int collect_tags(const char *name, const char *ref,
>> +                       const struct object_id *oid, void *cb_data)
>>  {
>> -       if (delete_ref(NULL, ref, oid, 0))
>> -               return 1;

This used to return 1 if it failed to delete a ref.

>> -       printf(_("Deleted tag '%s' (was %s)\n"), name,
>> -              find_unique_abbrev(oid, DEFAULT_ABBREV));
>> +       struct string_list *ref_list = cb_data;
>> +
>> +       string_list_append(ref_list, ref);
>> +       ref_list->items[ref_list->nr - 1].util = oiddup(oid);
>>         return 0;

Now it unconditionally returns 0.

>>  }
>>
>> +static int delete_tags(const char **argv)
>> +{
>> +       int result;
>> +       struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
>> +       struct string_list_item *item;
>> +
>> +       result = for_each_tag_name(argv, collect_tags, (void *)&refs_to_delete);
>> +       delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);

You now only look at the result of collecting the tags, and ignore the
result of trying to delete them...

>> +
>> +       for_each_string_list_item(item, &refs_to_delete) {
>> +               const char * name = item->string;
>> +               struct object_id * oid = item->util;
>> +               if (ref_exists(name))
>> +                       result = 1;

...except that you check if the refs still exist afterward and set the
return code based on it.  Like with the branch case, I can't come up
with a case where the difference matters.  I suspect there's a race
condition there somewhere, but once you start going down that road I
think the old code may have had a bunch of races too.  It might be
nice to document with a comment that there's a small race condition
with someone else trying to forcibly re-create the ref at the same
time you are trying to delete, but I don't think it's a big deal.

If you did use the result of delete_refs(), you might have to double
check that the callers (git.c:handle_builtin() -> git.c:run_builtin()
-> builtin/tag.c:cmd_tag() -> builtin/tag.c:delete_tags()) are all
okay with the return code; it looks like handle_builtin() would pass
the return code to exit() and the git-tag manpage doesn't document the
return status, so you've at least got some leeway in terms of what
values are acceptable.  Or you could just normalize the return value
of delete_refs() down to 0 or 1.  But you'd only need to worry about
that if the race condition is something we're worried enough to
tackle.

>> +               else
>> +                       printf(_("Deleted tag '%s' (was %s)\n"),
>> +                               item->string + 10,
>> +                               find_unique_abbrev(oid, DEFAULT_ABBREV));
>> +
>> +               free(oid);
>> +       }
>> +       string_list_clear(&refs_to_delete, 0);
>> +       return result;
>> +}
>> +
>>  static int verify_tag(const char *name, const char *ref,
>> -                     const struct object_id *oid, const void *cb_data)
>> +                     const struct object_id *oid, void *cb_data)
>>  {
>>         int flags;
>>         const struct ref_format *format = cb_data;
>> @@ -511,7 +536,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>>         if (filter.merge_commit)
>>                 die(_("--merged and --no-merged options are only allowed in list mode"));
>>         if (cmdmode == 'd')
>> -               return for_each_tag_name(argv, delete_tag, NULL);
>> +               return delete_tags(argv);
>>         if (cmdmode == 'v') {
>>                 if (format.format && verify_ref_format(&format))
>>                         usage_with_options(git_tag_usage, options);
>> --
>> 2.23.0.rc1.174.g4cc1b04b4c

Overall, I like the patch.  Peff commented on v1 that the basic idea
(use the part of the refs API that batches operations) is the right
thing to do.  I'm not that familiar with refs-touching code, but your
patch makes sense to me.  I think I spotted a minor issue (you ignore
the return status of delete_refs(), then later check the existence of
the refs afterwards to determine success, which I believe is a minor
and unlikely race condition), but I'm not sure it's worth fixing;
perhaps just mark it with #leftoverbits and move on -- the faster
branch and tag deletion is a very nice improvement.

I notice Martin said on v1 that there was a testcase that had problems
with your patch; I tested v2 and it looks like you fixed any such
issues.  I think you also addressed the feedback from Junio, though
his comments about the return code and the minor race condition I
noticed around it might mean it'd be good to get his comments.

Anyway,
Acked-by: Elijah Newren <newren@gmail.com>

I would say Reviewed-by, but I'd like to get Junio's comments on the
return code and minor race.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] use delete_refs when deleting tags or branches
  2021-01-15 18:43   ` Elijah Newren
@ 2021-01-15 20:06     ` Junio C Hamano
  2021-01-15 22:04     ` Junio C Hamano
  2021-01-16  2:27     ` Phil Hord
  2 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2021-01-15 20:06 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Phil Hord, Git, Martin Ågren

Elijah Newren <newren@gmail.com> writes:

> Overall, I like the patch.  Peff commented on v1 that the basic idea
> (use the part of the refs API that batches operations) is the right
> thing to do.

I concur; I agree with the general direction.

Where do we reject branch deletion based on the merged-ness?  When
we ask to delete three branches and one is not merged yet and cannot
be removed (let's say we are not "--force"ing) but the other two are
OK to remove, we do want the other two branches to go.  I didn't
check what the patch does with respect to this point, but as long as
the updated code does not make it "all or none", it would be great.

> I would say Reviewed-by, but I'd like to get Junio's comments on the
> return code and minor race.

It would have to wait a bit more.  Thanks for a review.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] use delete_refs when deleting tags or branches
  2021-01-15 18:43   ` Elijah Newren
  2021-01-15 20:06     ` Junio C Hamano
@ 2021-01-15 22:04     ` Junio C Hamano
  2021-01-16  2:27     ` Phil Hord
  2 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2021-01-15 22:04 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Phil Hord, Git, Martin Ågren

Elijah Newren <newren@gmail.com> writes:

>>>                 /* For subsequent UI messages */
>>>                 remote_branch = 1;
>>>                 allowed_interpret = INTERPRET_BRANCH_REMOTE;
>>> -
>>> +               refname_pos = 13;
>>>                 force = 1;
>>>                 break;
>>>         case FILTER_REFS_BRANCHES:
>>>                 fmt = "refs/heads/%s";
>>>                 allowed_interpret = INTERPRET_BRANCH_LOCAL;
>>> +               refname_pos = 11;
>>>                 break;

Hmph, I can tell 11 is strlen("refs/heads/") and 13 is probably for
"refs/remotes/", but can't we do better?

		switch (kinds) {
		case ...:
			fmt = "refs/remotes/%s";
			...
		}
		refname_skip_bytes = strcspn(fmt, "%");

or something like that?

I think refname_pos is a misleading variable name.  The refname is
the whole thing, so it would always be 0.  What we are looking for
is the position at which we find human readable "branch name"
begins, or, the number of bytes to skip to get there from the
beginning (the latter is where 'refname_skip_bytes' come from).

I am also OK with 'branch_name_pos'.

>>> +       delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);

Earlier in a separate message, I wondered where the branches that
are not merged are rejected; it seems that the code reuses the
original code that jumps to the end of the loop and refrains from
placing such a branch to the refs_to_delete list, so from the point
of view of "correctness", the patch would not make it "all or none",
which is good.

The order in which the failures are reported will change, though.

In the original code, we got errors in the order we iterate through
the branches.  With the patch, we'll see all the "not merged and can
not delete" first from A to Z, and then "we tried to delete but this
exists after that" in a separate batch from A to Z (we won't report
for the same ref twice).  I think it probably is OK and not worth
refactoring the error reporting mechanism of check_branch_commit()
just in order to restore the order of the error messages, but I am
pointing it out just for completeness.

>>> +       for_each_string_list_item(item, &refs_to_delete) {
>>> +               char * describe_ref = item->util;
>>> +               char * name = item->string;

Style.  The asterisk sticks to the identifier, i.e.

		char *name = item->string;

>>> +               if (ref_exists(name))
>>> +                       ret = 1;
>
> Now it sets the return code if the branch still exists after trying to
> delete.  I thought that was subtly different...but I tried doing a
> branch deletion of a non-existent branch since I thought that would be
> the only difference -- however, that errors out earlier in the
> codepath before even getting to the stage of deleting refs.  So I
> think these are effectively the same.

It checks if the branch exists after it tried to delete.  A branch
may exist because delete_refs() failed to delete, or it may exist
because somebody created it anew after delete_refs() returned.  The
original reported if a deletion was successful or not by checking
the status returned by delete_ref().

We may want to introduce a variant of delete_refs() that reports
which ones it failed to delete to the caller if we really want to
report the errors properly.  I dunno.

>>> diff --git a/builtin/tag.c b/builtin/tag.c
>>> ...
>>> +static int collect_tags(const char *name, const char *ref,
>>> +                       const struct object_id *oid, void *cb_data)
>>>  {
>>> -       if (delete_ref(NULL, ref, oid, 0))
>>> -               return 1;
>
> This used to return 1 if it failed to delete a ref.
>
>>> -       printf(_("Deleted tag '%s' (was %s)\n"), name,
>>> -              find_unique_abbrev(oid, DEFAULT_ABBREV));
>>> +       struct string_list *ref_list = cb_data;
>>> +
>>> +       string_list_append(ref_list, ref);
>>> +       ref_list->items[ref_list->nr - 1].util = oiddup(oid);
>>>         return 0;
>
> Now it unconditionally returns 0.

But this is just "collect what to delete" phase, and the return
value is to be consumed by for_each_tag_name() as the callback, no?

I do not see a problem here.

>>>  }
>>>
>>> +static int delete_tags(const char **argv)
>>> +{
>>> +       int result;
>>> +       struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
>>> +       struct string_list_item *item;
>>> +
>>> +       result = for_each_tag_name(argv, collect_tags, (void *)&refs_to_delete);
>>> +       delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
>
> You now only look at the result of collecting the tags, and ignore the
> result of trying to delete them...

That is a bigger problem, and is shared by the branch side, too.

I do think delete_refs() API is misdesigned in that it is not
equipped to report what happened to individual refs.  It only
says "we saw an error" when there is even a single ref that
encounters a trouble.

If we fix that, I suspect that this patch will become much simpler.

Another thing that I think is a misdesign of delete_refs() actually
can work for us here.  The list of refnames to be deleted is passed
as a "struct string_list" (which is not the most natural way to pass
bunch of strings, which is why I consider it a misdesign), and for
each ref, there is a place for the delete_refs() and its internal
helpers to store some piece of info, i.e. string_list_item.util.

Not even compile tested, but we could stuff NULL/non-NULL to the
.util member so that we can report per-ref status from the API,
perhaps like this:

diff --git i/refs/files-backend.c w/refs/files-backend.c
index 04e85e7002..bdc17f55ef 100644
--- i/refs/files-backend.c
+++ w/refs/files-backend.c
@@ -1224,8 +1224,12 @@ static int files_delete_refs(struct ref_store *ref_store, const char *msg,
 	for (i = 0; i < refnames->nr; i++) {
 		const char *refname = refnames->items[i].string;
 
-		if (refs_delete_ref(&refs->base, msg, refname, NULL, flags))
-			result |= error(_("could not remove reference %s"), refname);
+		if (!refs_delete_ref(&refs->base, msg, refname, NULL, flags)) {
+			refnames->items[i].util = NULL;
+			continue;
+		}
+		result |= error(_("could not remove reference %s"), refname);
+		refnames->items[i].util = "error";
 	}
 
 	strbuf_release(&err);
diff --git i/refs/packed-backend.c w/refs/packed-backend.c
index b912f2505f..a5b5071511 100644
--- i/refs/packed-backend.c
+++ w/refs/packed-backend.c
@@ -1541,7 +1541,10 @@ static int packed_delete_refs(struct ref_store *ref_store, const char *msg,
 					   flags, msg, &err)) {
 			warning(_("could not delete reference %s: %s"),
 				item->string, err.buf);
+			item->util = "error";
 			strbuf_reset(&err);
+		} else {
+			item->util = NULL;
 		}
 	}
 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] use delete_refs when deleting tags or branches
  2021-01-15 18:43   ` Elijah Newren
  2021-01-15 20:06     ` Junio C Hamano
  2021-01-15 22:04     ` Junio C Hamano
@ 2021-01-16  2:27     ` Phil Hord
  2 siblings, 0 replies; 5+ messages in thread
From: Phil Hord @ 2021-01-16  2:27 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git, Martin Ågren, Junio C Hamano

On Fri, Jan 15, 2021 at 10:43 AM Elijah Newren <newren@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jan 14, 2021 at 6:00 PM Phil Hord <phil.hord@gmail.com> wrote:
> >
> > I noticed this is still only in my local branch.   Can I get an ACK/NAK?
>
> Sorry for missing this when you posted in August.  Thanks for sending
> in the update from v1.
>
> For other reviewers: v1 is over here:
> https://lore.kernel.org/git/20190808035935.30023-1-phil.hord@gmail.com/,
> and has review comments from Martin, me, Peff, and Junio.
>
> > On Tue, Aug 13, 2019 at 7:49 PM Phil Hord <phil.hord@gmail.com> wrote:
> >>
> >> From: Phil Hord <phil.hord@gmail.com>
> >>
> >> 'git tag -d' and 'git branch -d' both accept one or more refs to
> >> delete, but each deletion is done by calling `delete_ref` on each argv.
> >> This is very slow when removing from packed refs as packed-refs is
> >> locked and rewritten each time. Use delete_refs instead so all the
> >> removals can be done inside a single transaction with a single update.
>
> Awesome, thanks for also fixing up git branch with v2.
>
> >> Since delete_refs performs all the packed-refs delete operations
> >> inside a single transaction, if any of the deletes fail then all
> >> them will be skipped. In practice, none of them should fail since
> >> we verify the hash of each one before calling delete_refs, but some
> >> network error or odd permissions problem could have different results
> >> after this change.
> >>
> >> Also, since the file-backed deletions are not performed in the same
> >> transaction, those could succeed even when the packed-refs transaction
> >> fails.
> >>
> >> After deleting refs, report the deletion's success only if the ref was
> >> actually deleted. For branch deletion, remove the branch config only
> >> if the branch ref is actually removed.
> >>
> >> A manual test deleting 24,000 tags took about 30 minutes using
> >> delete_ref.  It takes about 5 seconds using delete_refs.
>
> As I said on v1, it's really nice to have this fixed.  Thanks for doing it.
>
> >>
> >> Signed-off-by: Phil Hord <phil.hord@gmail.com>
> >> ---
> >> This reroll adds the same delete_refs change to 'git branch'. It checks
> >> individual refs after the operation to report correctly on each whether
> >> it was successfully deleted or not. Maybe this is an unnecessary step,
> >> though. This handles the weird case where some file system error
> >> prevented us from deleting refs, leaving us with an error from
> >> delete_refs but without any idea which refs might have been affected.
> >>
> >>  builtin/branch.c | 50 +++++++++++++++++++++++++++++-------------------
> >>  builtin/tag.c    | 45 +++++++++++++++++++++++++++++++++----------
> >>  2 files changed, 65 insertions(+), 30 deletions(-)
> >>
> >> diff --git a/builtin/branch.c b/builtin/branch.c
> >> index 2ef214632f..2273239f41 100644
> >> --- a/builtin/branch.c
> >> +++ b/builtin/branch.c
> >> @@ -202,6 +202,9 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
> >>         int remote_branch = 0;
> >>         struct strbuf bname = STRBUF_INIT;
> >>         unsigned allowed_interpret;
> >> +       struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
> >> +       struct string_list_item *item;
> >> +       int refname_pos = 0;
> >>
> >>         switch (kinds) {
> >>         case FILTER_REFS_REMOTES:
> >> @@ -209,12 +212,13 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
> >>                 /* For subsequent UI messages */
> >>                 remote_branch = 1;
> >>                 allowed_interpret = INTERPRET_BRANCH_REMOTE;
> >> -
> >> +               refname_pos = 13;
> >>                 force = 1;
> >>                 break;
> >>         case FILTER_REFS_BRANCHES:
> >>                 fmt = "refs/heads/%s";
> >>                 allowed_interpret = INTERPRET_BRANCH_LOCAL;
> >> +               refname_pos = 11;
> >>                 break;
> >>         default:
> >>                 die(_("cannot use -a with -d"));
> >> @@ -265,30 +269,36 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
> >>                         goto next;
> >>                 }
> >>
> >> -               if (delete_ref(NULL, name, is_null_oid(&oid) ? NULL : &oid,
> >> -                              REF_NO_DEREF)) {
> >> -                       error(remote_branch
> >> -                             ? _("Error deleting remote-tracking branch '%s'")
> >> -                             : _("Error deleting branch '%s'"),
> >> -                             bname.buf);
> >> -                       ret = 1;
> >> -                       goto next;
>
> The code used to set the return code to 1 if it failed to delete a branch
>
> >> -               }
> >> -               if (!quiet) {
> >> -                       printf(remote_branch
> >> -                              ? _("Deleted remote-tracking branch %s (was %s).\n")
> >> -                              : _("Deleted branch %s (was %s).\n"),
> >> -                              bname.buf,
> >> -                              (flags & REF_ISBROKEN) ? "broken"
> >> -                              : (flags & REF_ISSYMREF) ? target
> >> -                              : find_unique_abbrev(&oid, DEFAULT_ABBREV));
> >> -               }
> >> -               delete_branch_config(bname.buf);
> >> +               item = string_list_append(&refs_to_delete, name);
> >> +               item->util = xstrdup((flags & REF_ISBROKEN) ? "broken"
> >> +                                   : (flags & REF_ISSYMREF) ? target
> >> +                                   : find_unique_abbrev(&oid, DEFAULT_ABBREV));
> >>
> >>         next:
> >>                 free(target);
> >>         }
> >>
> >> +       delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
> >> +
> >> +       for_each_string_list_item(item, &refs_to_delete) {
> >> +               char * describe_ref = item->util;
> >> +               char * name = item->string;
> >> +               if (ref_exists(name))
> >> +                       ret = 1;
>
> Now it sets the return code if the branch still exists after trying to
> delete.  I thought that was subtly different...but I tried doing a
> branch deletion of a non-existent branch since I thought that would be
> the only difference -- however, that errors out earlier in the
> codepath before even getting to the stage of deleting refs.  So I
> think these are effectively the same.
>
> >> +               else {
> >> +                       char * refname = name + refname_pos;
> >> +                       if (!quiet)
> >> +                               printf(remote_branch
> >> +                                       ? _("Deleted remote-tracking branch %s (was %s).\n")
> >> +                                       : _("Deleted branch %s (was %s).\n"),
> >> +                                       name + refname_pos, describe_ref);
>
> Neither remote_branch nor refname_pos are changing throughout this
> loop, which I at first thought was in error, but it looks like git
> branch only allows you to delete one type or the other -- not a
> mixture.  So this is correct.
>
> >> +
> >> +                       delete_branch_config(refname);
> >> +               }
> >> +               free(describe_ref);
> >> +       }
> >> +       string_list_clear(&refs_to_delete, 0);
> >> +
> >>         free(name);
> >>         strbuf_release(&bname);
> >>
> >> diff --git a/builtin/tag.c b/builtin/tag.c
> >> index e0a4c25382..0d11ffcd04 100644
> >> --- a/builtin/tag.c
> >> +++ b/builtin/tag.c
> >> @@ -72,10 +72,10 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
> >>  }
> >>
> >>  typedef int (*each_tag_name_fn)(const char *name, const char *ref,
> >> -                               const struct object_id *oid, const void *cb_data);
> >> +                               const struct object_id *oid, void *cb_data);
> >>
> >>  static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
> >> -                            const void *cb_data)
> >> +                            void *cb_data)
> >>  {
> >>         const char **p;
> >>         struct strbuf ref = STRBUF_INIT;
> >> @@ -97,18 +97,43 @@ static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
> >>         return had_error;
> >>  }
> >>
> >> -static int delete_tag(const char *name, const char *ref,
> >> -                     const struct object_id *oid, const void *cb_data)
> >> +static int collect_tags(const char *name, const char *ref,
> >> +                       const struct object_id *oid, void *cb_data)
> >>  {
> >> -       if (delete_ref(NULL, ref, oid, 0))
> >> -               return 1;
>
> This used to return 1 if it failed to delete a ref.
>
> >> -       printf(_("Deleted tag '%s' (was %s)\n"), name,
> >> -              find_unique_abbrev(oid, DEFAULT_ABBREV));
> >> +       struct string_list *ref_list = cb_data;
> >> +
> >> +       string_list_append(ref_list, ref);
> >> +       ref_list->items[ref_list->nr - 1].util = oiddup(oid);
> >>         return 0;
>
> Now it unconditionally returns 0.
>
> >>  }
> >>
> >> +static int delete_tags(const char **argv)
> >> +{
> >> +       int result;
> >> +       struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
> >> +       struct string_list_item *item;
> >> +
> >> +       result = for_each_tag_name(argv, collect_tags, (void *)&refs_to_delete);
> >> +       delete_refs(NULL, &refs_to_delete, REF_NO_DEREF);
>
> You now only look at the result of collecting the tags, and ignore the
> result of trying to delete them...
>
> >> +
> >> +       for_each_string_list_item(item, &refs_to_delete) {
> >> +               const char * name = item->string;
> >> +               struct object_id * oid = item->util;
> >> +               if (ref_exists(name))
> >> +                       result = 1;
>
> ...except that you check if the refs still exist afterward and set the
> return code based on it.  Like with the branch case, I can't come up
> with a case where the difference matters.  I suspect there's a race
> condition there somewhere, but once you start going down that road I
> think the old code may have had a bunch of races too.  It might be
> nice to document with a comment that there's a small race condition
> with someone else trying to forcibly re-create the ref at the same
> time you are trying to delete, but I don't think it's a big deal.
>
> If you did use the result of delete_refs(), you might have to double
> check that the callers (git.c:handle_builtin() -> git.c:run_builtin()
> -> builtin/tag.c:cmd_tag() -> builtin/tag.c:delete_tags()) are all
> okay with the return code; it looks like handle_builtin() would pass
> the return code to exit() and the git-tag manpage doesn't document the
> return status, so you've at least got some leeway in terms of what
> values are acceptable.  Or you could just normalize the return value
> of delete_refs() down to 0 or 1.  But you'd only need to worry about
> that if the race condition is something we're worried enough to
> tackle.

Interesting. I was worried about imposing a requirement on delete_refs
that any non-zero return must mean that something was not deleted
which should have been. Maybe that's not such a worry, though, and it
would be acceptable to return a 1 even if all the refs were deleted
even though some error occurred further down the line.

I tried normalizing the value and then also verifying each ref was
removed, but that seemed wrong.  Maybe it's ok to just normalize it
and not react to still-existing refs.

> >> +               else
> >> +                       printf(_("Deleted tag '%s' (was %s)\n"),
> >> +                               item->string + 10,
> >> +                               find_unique_abbrev(oid, DEFAULT_ABBREV));
> >> +
> >> +               free(oid);
> >> +       }
> >> +       string_list_clear(&refs_to_delete, 0);
> >> +       return result;
> >> +}
> >> +
> >>  static int verify_tag(const char *name, const char *ref,
> >> -                     const struct object_id *oid, const void *cb_data)
> >> +                     const struct object_id *oid, void *cb_data)
> >>  {
> >>         int flags;
> >>         const struct ref_format *format = cb_data;
> >> @@ -511,7 +536,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
> >>         if (filter.merge_commit)
> >>                 die(_("--merged and --no-merged options are only allowed in list mode"));
> >>         if (cmdmode == 'd')
> >> -               return for_each_tag_name(argv, delete_tag, NULL);
> >> +               return delete_tags(argv);
> >>         if (cmdmode == 'v') {
> >>                 if (format.format && verify_ref_format(&format))
> >>                         usage_with_options(git_tag_usage, options);
> >> --
> >> 2.23.0.rc1.174.g4cc1b04b4c
>
> Overall, I like the patch.  Peff commented on v1 that the basic idea
> (use the part of the refs API that batches operations) is the right
> thing to do.  I'm not that familiar with refs-touching code, but your
> patch makes sense to me.  I think I spotted a minor issue (you ignore
> the return status of delete_refs(), then later check the existence of
> the refs afterwards to determine success, which I believe is a minor
> and unlikely race condition), but I'm not sure it's worth fixing;
> perhaps just mark it with #leftoverbits and move on -- the faster
> branch and tag deletion is a very nice improvement.
>
> I notice Martin said on v1 that there was a testcase that had problems
> with your patch; I tested v2 and it looks like you fixed any such
> issues.  I think you also addressed the feedback from Junio, though
> his comments about the return code and the minor race condition I
> noticed around it might mean it'd be good to get his comments.
>
> Anyway,
> Acked-by: Elijah Newren <newren@gmail.com>
>
> I would say Reviewed-by, but I'd like to get Junio's comments on the
> return code and minor race.

Thanks for the detailed review and thoughts.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-01-16  2:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-14  2:49 [PATCH v2] use delete_refs when deleting tags or branches Phil Hord
     [not found] ` <CABURp0rtkmo7MSSCVrdNXT0UzV9XqV_kXOGkC23C+_vMENNJUg@mail.gmail.com>
2021-01-15 18:43   ` Elijah Newren
2021-01-15 20:06     ` Junio C Hamano
2021-01-15 22:04     ` Junio C Hamano
2021-01-16  2:27     ` Phil Hord

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).