git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 1/1] delete multiple tags in a single transaction
@ 2019-08-08  3:59 Phil Hord
  2019-08-08 12:47 ` Martin Ågren
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Phil Hord @ 2019-08-08  3:59 UTC (permalink / raw)
  To: git; +Cc: Phil Hord

From: Phil Hord <phil.hord@gmail.com>

'git tag -d' accepts one or more tag refs to delete, but each deletion
is done by calling `delete_ref` on each argv. This is painfully slow
when removing from packed refs. Use delete_refs instead so all the
removals can be done inside a single transaction with a single write.

I have a repo with 24,000 tags, most of which are not useful to any
developers. Having this many refs slows down many operations that
would otherwise be very fast. Removing these tags when they've been
accidentally fetched again takes about 30 minutes using delete_ref.

    git tag -l feature/* | xargs git tag -d

Removing the same tags using delete_refs takes less than 5 seconds.

Signed-off-by: Phil Hord <phil.hord@gmail.com>
---
 builtin/tag.c | 52 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 10 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index e0a4c25382..f652af83e7 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -72,10 +72,10 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 }
 
 typedef int (*each_tag_name_fn)(const char *name, const char *ref,
-				const struct object_id *oid, const void *cb_data);
+				const struct object_id *oid, void *cb_data);
 
 static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
-			     const void *cb_data)
+			     void *cb_data)
 {
 	const char **p;
 	struct strbuf ref = STRBUF_INIT;
@@ -97,18 +97,50 @@ static int for_each_tag_name(const char **argv, each_tag_name_fn fn,
 	return had_error;
 }
 
-static int delete_tag(const char *name, const char *ref,
-		      const struct object_id *oid, const void *cb_data)
+struct tag_args {
+	char *oid_abbrev;
+	char *refname;
+};
+
+static int make_string_list(const char *name, const char *ref,
+			    const struct object_id *oid, void *cb_data)
 {
-	if (delete_ref(NULL, ref, oid, 0))
-		return 1;
-	printf(_("Deleted tag '%s' (was %s)\n"), name,
-	       find_unique_abbrev(oid, DEFAULT_ABBREV));
+	struct string_list *ref_list = cb_data;
+	struct tag_args *info = xmalloc(sizeof(struct tag_args));
+
+	string_list_append(ref_list, ref);
+
+	info->oid_abbrev = xstrdup(find_unique_abbrev(oid, DEFAULT_ABBREV));
+	info->refname = xstrdup(name);
+	ref_list->items[ref_list->nr - 1].util = info;
 	return 0;
 }
 
+static int delete_tags(const char **argv)
+{
+	int result;
+	struct string_list ref_list = STRING_LIST_INIT_DUP;
+	struct string_list_item *ref_list_item;
+
+	result = for_each_tag_name(argv, make_string_list, (void *) &ref_list);
+	if (!result)
+		result = delete_refs(NULL, &ref_list, REF_NO_DEREF);
+
+	for_each_string_list_item(ref_list_item, &ref_list) {
+		struct tag_args * info = ref_list_item->util;
+		if (!result)
+			printf(_("Deleted tag '%s' (was %s)\n"), info->refname,
+				info->oid_abbrev);
+		free(info->oid_abbrev);
+		free(info->refname);
+		free(info);
+	}
+	string_list_clear(&ref_list, 0);
+	return result;
+}
+
 static int verify_tag(const char *name, const char *ref,
-		      const struct object_id *oid, const void *cb_data)
+		      const struct object_id *oid, void *cb_data)
 {
 	int flags;
 	const struct ref_format *format = cb_data;
@@ -511,7 +543,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	if (filter.merge_commit)
 		die(_("--merged and --no-merged options are only allowed in list mode"));
 	if (cmdmode == 'd')
-		return for_each_tag_name(argv, delete_tag, NULL);
+		return delete_tags(argv);
 	if (cmdmode == 'v') {
 		if (format.format && verify_ref_format(&format))
 			usage_with_options(git_tag_usage, options);
-- 
2.23.0.rc1.174.g4cc1b04b4c.dirty


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] delete multiple tags in a single transaction
  2019-08-08  3:59 [PATCH 1/1] delete multiple tags in a single transaction Phil Hord
@ 2019-08-08 12:47 ` Martin Ågren
  2019-08-08 12:53   ` [PATCH] t7004: check existence of correct tag Martin Ågren
  2019-08-08 18:15 ` [PATCH 1/1] delete multiple tags in a single transaction Elijah Newren
  2019-08-08 19:39 ` Junio C Hamano
  2 siblings, 1 reply; 10+ messages in thread
From: Martin Ågren @ 2019-08-08 12:47 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git Mailing List

On Thu, 8 Aug 2019 at 06:09, Phil Hord <phil.hord@gmail.com> wrote:
> I have a repo with 24,000 tags, most of which are not useful to any
> developers. Having this many refs slows down many operations that
> would otherwise be very fast. Removing these tags when they've been
> accidentally fetched again takes about 30 minutes using delete_ref.
>
>     git tag -l feature/* | xargs git tag -d
>
> Removing the same tags using delete_refs takes less than 5 seconds.

This looks worthwhile pursuing...

> -static int delete_tag(const char *name, const char *ref,
> -                     const struct object_id *oid, const void *cb_data)
> +struct tag_args {
> +       char *oid_abbrev;
> +       char *refname;
> +};
> +
> +static int make_string_list(const char *name, const char *ref,
> +                           const struct object_id *oid, void *cb_data)
>  {
> -       if (delete_ref(NULL, ref, oid, 0))
> -               return 1;

This provides `oid` for verifying that the tag actually points at that
particular oid before deleting. As far as I can tell, `oid` is no longer
used like that in the post-image. I'm not sure it matters, since we just
looked it up, but that might be worth mentioning, perhaps.

> -       printf(_("Deleted tag '%s' (was %s)\n"), name,
> -              find_unique_abbrev(oid, DEFAULT_ABBREV));
> +       struct string_list *ref_list = cb_data;
> +       struct tag_args *info = xmalloc(sizeof(struct tag_args));
> +
> +       string_list_append(ref_list, ref);
> +
> +       info->oid_abbrev = xstrdup(find_unique_abbrev(oid, DEFAULT_ABBREV));
> +       info->refname = xstrdup(name);
> +       ref_list->items[ref_list->nr - 1].util = info;
>         return 0;
>  }
>
> +static int delete_tags(const char **argv)
> +{
> +       int result;
> +       struct string_list ref_list = STRING_LIST_INIT_DUP;
> +       struct string_list_item *ref_list_item;
> +
> +       result = for_each_tag_name(argv, make_string_list, (void *) &ref_list);

If any tag is non-existing (or some other error happens here), we don't
continue to the actual deleting. That breaks t7004 which has a test for
removing an existing and a non-existing tag -- it wants the existing one
to be removed and the non-existing one not to interfere.

> +       if (!result)
> +               result = delete_refs(NULL, &ref_list, REF_NO_DEREF);

So this should perhaps be something more like an unconditional

        result |= delete_refs(...);

That makes the test suite happy, but perhaps only short-term ... See
below...

> +       for_each_string_list_item(ref_list_item, &ref_list) {
> +               struct tag_args * info = ref_list_item->util;
> +               if (!result)
> +                       printf(_("Deleted tag '%s' (was %s)\n"), info->refname,
> +                               info->oid_abbrev);

Change this conditional here, too, methinks. You'd need to separate
errors from looking up tags from errors about deleting refs, so having a
single "result" is probably not sufficient.

Probably worth inspecting the output of that `git tag -d` a bit in
t7004, to make sure we just claim to delete one tag, and have errors.

Your patch reshuffles the error and success messages (for certain
usages). I think that's ok, but might be worth mentioning.

I'm not too familiar with the refs API, so take this with a grain of
salt...

> +               free(info->oid_abbrev);
> +               free(info->refname);
> +               free(info);
> +       }
> +       string_list_clear(&ref_list, 0);
> +       return result;
> +}

Martin

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] t7004: check existence of correct tag
  2019-08-08 12:47 ` Martin Ågren
@ 2019-08-08 12:53   ` Martin Ågren
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Ågren @ 2019-08-08 12:53 UTC (permalink / raw)
  To: Phil Hord; +Cc: git

We try to delete the non-existing tag "anothertag", but for the
verifications, we check that the tag "myhead" doesn't exist. "myhead"
isn't used in this test except for this checking. Comparing to the test
two tests earlier, it looks like a copy-paste mistake.

Perhaps it's overkill to check that `git tag -d` didn't decide to
*create* a tag. But since we're trying to be this careful, let's
actually check the correct tag. While we're doing this, let's use a more
descriptive tag name instead -- "nonexistingtag" should be obvious.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---

 > Probably worth inspecting the output of that `git tag -d` a bit in
 > t7004, to make sure we just claim to delete one tag, and have errors.

 Here's something I noticed while looking into this test. If you do
 expand on this test, feel free to pick this up, either as a preparatory
 patch or squash it into your "expand test" patch (if you do one). If
 you don't pick this up at all, I'll pursue it separately, later.

 Martin

 t/t7004-tag.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 80eb13d94e..e4cf605907 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -227,10 +227,10 @@ test_expect_success \
 test_expect_success \
 	'trying to delete two tags, existing and not, should fail in the 2nd' '
 	tag_exists mytag &&
-	! tag_exists myhead &&
-	test_must_fail git tag -d mytag anothertag &&
+	! tag_exists nonexistingtag &&
+	test_must_fail git tag -d mytag nonexistingtag &&
 	! tag_exists mytag &&
-	! tag_exists myhead
+	! tag_exists nonexistingtag
 '
 
 test_expect_success 'trying to delete an already deleted tag should fail' \
-- 
2.23.0.rc0.30.g51cf315870


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] delete multiple tags in a single transaction
  2019-08-08  3:59 [PATCH 1/1] delete multiple tags in a single transaction Phil Hord
  2019-08-08 12:47 ` Martin Ågren
@ 2019-08-08 18:15 ` Elijah Newren
  2019-08-08 23:43   ` Phil Hord
  2019-08-08 19:39 ` Junio C Hamano
  2 siblings, 1 reply; 10+ messages in thread
From: Elijah Newren @ 2019-08-08 18:15 UTC (permalink / raw)
  To: Phil Hord; +Cc: Git Mailing List

On Wed, Aug 7, 2019 at 9:11 PM Phil Hord <phil.hord@gmail.com> wrote:
>
> From: Phil Hord <phil.hord@gmail.com>
>
> 'git tag -d' accepts one or more tag refs to delete, but each deletion
> is done by calling `delete_ref` on each argv. This is painfully slow
> when removing from packed refs. Use delete_refs instead so all the
> removals can be done inside a single transaction with a single write.

Nice, thanks for working on this.

> I have a repo with 24,000 tags, most of which are not useful to any
> developers. Having this many refs slows down many operations that
> would otherwise be very fast. Removing these tags when they've been
> accidentally fetched again takes about 30 minutes using delete_ref.

I also get really slow times on a repo with ~20,000 tags (though order
~3 minutes rather than ~30, probably due to having an SSD on this
machine) -- but ONLY IF the refs are packed first (git pack-refs
--all).  If the refs are loose, it's relatively quick to delete a
dozen thousand or so tags (order of a few seconds).  It might be worth
mentioning in the commit message that this only makes a significant
difference in the case where the refs are packed.

>     git tag -l feature/* | xargs git tag -d
>
> Removing the same tags using delete_refs takes less than 5 seconds.

It appears this same bug also affects `git branch -d` when deleting
lots of branches (or remote tracking branches) and they are all
packed; could you apply the same fix there?

In constrast, it appears that `git update-ref --stdin` is fast
regardless of whether the refs are packed, e.g.
   git tag -l feature/* | sed -e 's%^%delete refs/tags/%' | git
update-ref --stdin
finishes quickly (order of a few seconds).

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] delete multiple tags in a single transaction
  2019-08-08  3:59 [PATCH 1/1] delete multiple tags in a single transaction Phil Hord
  2019-08-08 12:47 ` Martin Ågren
  2019-08-08 18:15 ` [PATCH 1/1] delete multiple tags in a single transaction Elijah Newren
@ 2019-08-08 19:39 ` Junio C Hamano
  2019-08-08 23:58   ` Phil Hord
  2 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2019-08-08 19:39 UTC (permalink / raw)
  To: Phil Hord; +Cc: git

Phil Hord <phil.hord@gmail.com> writes:

> From: Phil Hord <phil.hord@gmail.com>
>
> 'git tag -d' accepts one or more tag refs to delete, but each deletion
> is done by calling `delete_ref` on each argv. This is painfully slow
> when removing from packed refs. Use delete_refs instead so all the
> removals can be done inside a single transaction with a single write.
>
> I have a repo with 24,000 tags, most of which are not useful to any
> developers. Having this many refs slows down many operations that
> would otherwise be very fast. Removing these tags when they've been
> accidentally fetched again takes about 30 minutes using delete_ref.
>
>     git tag -l feature/* | xargs git tag -d
>
> Removing the same tags using delete_refs takes less than 5 seconds.

Makes sense.  As mentioned elsewhere in the thread already,
a batched update-ref would open the packed-refs ony once because
everything is done in a single transaction, so presumably a pipeline
like this

	git tag -l feature/* | 
	sed -e 's|^|delete refs/tags/|' |
	git update-ref --stdin

may work well, and "git tag -d" that gets these refs on the command
line should be capable of doing the same.

> -static int delete_tag(const char *name, const char *ref,
> -		      const struct object_id *oid, const void *cb_data)
> +struct tag_args {
> +	char *oid_abbrev;
> +	char *refname;
> +};
> +
> +static int make_string_list(const char *name, const char *ref,
> +			    const struct object_id *oid, void *cb_data)

Please think about a few more minutes before naming a function like
this, and make it a habit for your future patches.

We can see that the callback is used to insert more strings into a
string list, but the type (i.e. string_list) used to represent the
set is not all that important.  What is more important is why you
are building that set for, and saying what is in the set (as opposed
to saying that the container happens to be a string_list) would be a
good first step.

I presume that you are enumerating the tags to be deleted, together
with the data necessary for you to report the deletion of the tags?

>  {
> -	if (delete_ref(NULL, ref, oid, 0))
> -		return 1;
> -	printf(_("Deleted tag '%s' (was %s)\n"), name,
> -	       find_unique_abbrev(oid, DEFAULT_ABBREV));
> +	struct string_list *ref_list = cb_data;
> +	struct tag_args *info = xmalloc(sizeof(struct tag_args));
> +
> +	string_list_append(ref_list, ref);
> +
> +	info->oid_abbrev = xstrdup(find_unique_abbrev(oid, DEFAULT_ABBREV));
> +	info->refname = xstrdup(name);
> +	ref_list->items[ref_list->nr - 1].util = info;
>  	return 0;
>  }
>  
> +static int delete_tags(const char **argv)
> +{
> +	int result;
> +	struct string_list ref_list = STRING_LIST_INIT_DUP;
> +	struct string_list_item *ref_list_item;
> +
> +	result = for_each_tag_name(argv, make_string_list, (void *) &ref_list);
> +	if (!result)
> +		result = delete_refs(NULL, &ref_list, REF_NO_DEREF);
> +
> +	for_each_string_list_item(ref_list_item, &ref_list) {
> +		struct tag_args * info = ref_list_item->util;
> +		if (!result)
> +			printf(_("Deleted tag '%s' (was %s)\n"), info->refname,
> +				info->oid_abbrev);
> +		free(info->oid_abbrev);
> +		free(info->refname);
> +		free(info);

It is not performance critical, but info->refname is computable from
ref_list_item->string, isn't it?  I am just wondering if we can do
this without having to allocate the .util field for each of 20,000
tags.  We still need to remember oid (or oid_abbrev, but if I were
writing this, I'd record the full oid in .util and make the code
that prints call find_unique_abbrev() on it), so I guess we cannot
really leave .util NULL.

> +	}
> +	string_list_clear(&ref_list, 0);
> +	return result;

We used to return the returned value from for_each_tag_name() that
repeatedly called delete_tag().  

Now we return value from delete_refs().  Are our caller(s) OK with
the values that may come back from that function?  Can delete_refs()
return a value that is not appropriate to be returned from
cmd_tag(), for example a negative value?

> +}
> +
>  static int verify_tag(const char *name, const char *ref,
> -		      const struct object_id *oid, const void *cb_data)
> +		      const struct object_id *oid, void *cb_data)
>  {
>  	int flags;
>  	const struct ref_format *format = cb_data;
> @@ -511,7 +543,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>  	if (filter.merge_commit)
>  		die(_("--merged and --no-merged options are only allowed in list mode"));
>  	if (cmdmode == 'd')
> -		return for_each_tag_name(argv, delete_tag, NULL);
> +		return delete_tags(argv);

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] delete multiple tags in a single transaction
  2019-08-08 18:15 ` [PATCH 1/1] delete multiple tags in a single transaction Elijah Newren
@ 2019-08-08 23:43   ` Phil Hord
  2019-08-09  3:05     ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Phil Hord @ 2019-08-08 23:43 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Git Mailing List

On Thu, Aug 8, 2019 at 11:15 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, Aug 7, 2019 at 9:11 PM Phil Hord <phil.hord@gmail.com> wrote:
> >
> > From: Phil Hord <phil.hord@gmail.com>
> >
> > 'git tag -d' accepts one or more tag refs to delete, but each deletion
> > is done by calling `delete_ref` on each argv. This is painfully slow
> > when removing from packed refs. Use delete_refs instead so all the
> > removals can be done inside a single transaction with a single write.
>
> Nice, thanks for working on this.
>
> > I have a repo with 24,000 tags, most of which are not useful to any
> > developers. Having this many refs slows down many operations that
> > would otherwise be very fast. Removing these tags when they've been
> > accidentally fetched again takes about 30 minutes using delete_ref.
>
> I also get really slow times on a repo with ~20,000 tags (though order
> ~3 minutes rather than ~30, probably due to having an SSD on this
> machine) -- but ONLY IF the refs are packed first (git pack-refs
> --all).  If the refs are loose, it's relatively quick to delete a
> dozen thousand or so tags (order of a few seconds).  It might be worth
> mentioning in the commit message that this only makes a significant
> difference in the case where the refs are packed.

I'm also using an SSD but I still see about 10 tags per second being
deleted with the current code (and packed-refs).  I see that I'm
CPU-bound, so I guess most of the time is spent searching through
.git/packed-refs.  Probably it will run faster as it progresses. I
guess the 18,000 branches in my repo keep me on the wrong end of O(N).

My VM is on an all-flash storage array, but I can't say much about its
write throughput since it's one VM among many.

Previously I thought I saw a significant speedup between v2.7.4 (on my
development vm) and v2.22.0 (on my laptop). But this week I saw it was
slow again on my laptop.  I looked for the regression but didn't find
anyone touching that code. Then I wrote this patch.

But it should have occurred to me while I was in the code that there
is a different path for unpacked refs which could explain my previous
speeds.  I didn't think I had any unpacked refs, though, since every
time I look in .git/refs for what I want, I find it relatively empty.
I see 'git pack-refs --help' says that new refs should show up loose,
but I can't say that has happened for me.  Maybe a new clone uses
packed-refs for *everything* and only newly fetched things are loose.
Is that it?  I guess since I seldom fetch tags after the first clone,
it makes sense they would all be packed.

> >     git tag -l feature/* | xargs git tag -d
> >
> > Removing the same tags using delete_refs takes less than 5 seconds.
>
> It appears this same bug also affects `git branch -d` when deleting
> lots of branches (or remote tracking branches) and they are all
> packed; could you apply the same fix there?

Will do.

> In constrast, it appears that `git update-ref --stdin` is fast
> regardless of whether the refs are packed, e.g.
>    git tag -l feature/* | sed -e 's%^%delete refs/tags/%' | git
> update-ref --stdin
> finishes quickly (order of a few seconds).

Nice!  That trick is going in my wiki for devs to use on their VMs.
Thanks for that.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] delete multiple tags in a single transaction
  2019-08-08 19:39 ` Junio C Hamano
@ 2019-08-08 23:58   ` Phil Hord
  0 siblings, 0 replies; 10+ messages in thread
From: Phil Hord @ 2019-08-08 23:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git

On Thu, Aug 8, 2019 at 12:39 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Phil Hord <phil.hord@gmail.com> writes:
>
> > From: Phil Hord <phil.hord@gmail.com>
> >
> > 'git tag -d' accepts one or more tag refs to delete, but each deletion
> > is done by calling `delete_ref` on each argv. This is painfully slow
> > when removing from packed refs. Use delete_refs instead so all the
> > removals can be done inside a single transaction with a single write.
> >
> > I have a repo with 24,000 tags, most of which are not useful to any
> > developers. Having this many refs slows down many operations that
> > would otherwise be very fast. Removing these tags when they've been
> > accidentally fetched again takes about 30 minutes using delete_ref.
> >
> >     git tag -l feature/* | xargs git tag -d
> >
> > Removing the same tags using delete_refs takes less than 5 seconds.
>
> Makes sense.  As mentioned elsewhere in the thread already,
> a batched update-ref would open the packed-refs ony once because
> everything is done in a single transaction, so presumably a pipeline
> like this
>
>         git tag -l feature/* |
>         sed -e 's|^|delete refs/tags/|' |
>         git update-ref --stdin
>
> may work well, and "git tag -d" that gets these refs on the command
> line should be capable of doing the same.
>
> > -static int delete_tag(const char *name, const char *ref,
> > -                   const struct object_id *oid, const void *cb_data)
> > +struct tag_args {
> > +     char *oid_abbrev;
> > +     char *refname;
> > +};
> > +
> > +static int make_string_list(const char *name, const char *ref,
> > +                         const struct object_id *oid, void *cb_data)
>
> Please think about a few more minutes before naming a function like
> this, and make it a habit for your future patches.
>
> We can see that the callback is used to insert more strings into a
> string list, but the type (i.e. string_list) used to represent the
> set is not all that important.  What is more important is why you
> are building that set for, and saying what is in the set (as opposed
> to saying that the container happens to be a string_list) would be a
> good first step.
>
> I presume that you are enumerating the tags to be deleted, together
> with the data necessary for you to report the deletion of the tags?

Hm.  collect_tags?  collect_tags_to_delete?

It's true I didn't put enought thought into that.  I was experimenting
a bit here and was surprised how little code I ended up needing.

> >  {
> > -     if (delete_ref(NULL, ref, oid, 0))
> > -             return 1;
> > -     printf(_("Deleted tag '%s' (was %s)\n"), name,
> > -            find_unique_abbrev(oid, DEFAULT_ABBREV));
> > +     struct string_list *ref_list = cb_data;
> > +     struct tag_args *info = xmalloc(sizeof(struct tag_args));
> > +
> > +     string_list_append(ref_list, ref);
> > +
> > +     info->oid_abbrev = xstrdup(find_unique_abbrev(oid, DEFAULT_ABBREV));
> > +     info->refname = xstrdup(name);
> > +     ref_list->items[ref_list->nr - 1].util = info;
> >       return 0;
> >  }
> >
> > +static int delete_tags(const char **argv)
> > +{
> > +     int result;
> > +     struct string_list ref_list = STRING_LIST_INIT_DUP;
> > +     struct string_list_item *ref_list_item;
> > +
> > +     result = for_each_tag_name(argv, make_string_list, (void *) &ref_list);
> > +     if (!result)
> > +             result = delete_refs(NULL, &ref_list, REF_NO_DEREF);
> > +
> > +     for_each_string_list_item(ref_list_item, &ref_list) {
> > +             struct tag_args * info = ref_list_item->util;
> > +             if (!result)
> > +                     printf(_("Deleted tag '%s' (was %s)\n"), info->refname,
> > +                             info->oid_abbrev);
> > +             free(info->oid_abbrev);
> > +             free(info->refname);
> > +             free(info);
>
> It is not performance critical, but info->refname is computable from
> ref_list_item->string, isn't it?

Oh, I guess it is.  It's a fixed offset into the string, after all.
Thanks.  I did look for a way to avoid the struct noise. Just not
well.

> I am just wondering if we can do
> this without having to allocate the .util field for each of 20,000
> tags.  We still need to remember oid (or oid_abbrev, but if I were
> writing this, I'd record the full oid in .util and make the code
> that prints call find_unique_abbrev() on it), so I guess we cannot
> really leave .util NULL.

My original patch did this (.util = oid).  But then I needed a name.
I'll go back to keeping the oid.  Much cleaner.

>
> > +     }
> > +     string_list_clear(&ref_list, 0);
> > +     return result;
>
> We used to return the returned value from for_each_tag_name() that
> repeatedly called delete_tag().
>
> Now we return value from delete_refs().  Are our caller(s) OK with
> the values that may come back from that function?  Can delete_refs()
> return a value that is not appropriate to be returned from
> cmd_tag(), for example a negative value?

Yes it does.  Will fix.

>
> > +}
> > +
> >  static int verify_tag(const char *name, const char *ref,
> > -                   const struct object_id *oid, const void *cb_data)
> > +                   const struct object_id *oid, void *cb_data)
> >  {
> >       int flags;
> >       const struct ref_format *format = cb_data;
> > @@ -511,7 +543,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
> >       if (filter.merge_commit)
> >               die(_("--merged and --no-merged options are only allowed in list mode"));
> >       if (cmdmode == 'd')
> > -             return for_each_tag_name(argv, delete_tag, NULL);
> > +             return delete_tags(argv);
>
> Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/1] delete multiple tags in a single transaction
  2019-08-08 23:43   ` Phil Hord
@ 2019-08-09  3:05     ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2019-08-09  3:05 UTC (permalink / raw)
  To: Phil Hord; +Cc: Elijah Newren, Git Mailing List

On Thu, Aug 08, 2019 at 04:43:16PM -0700, Phil Hord wrote:

> > I also get really slow times on a repo with ~20,000 tags (though order
> > ~3 minutes rather than ~30, probably due to having an SSD on this
> > machine) -- but ONLY IF the refs are packed first (git pack-refs
> > --all).  If the refs are loose, it's relatively quick to delete a
> > dozen thousand or so tags (order of a few seconds).  It might be worth
> > mentioning in the commit message that this only makes a significant
> > difference in the case where the refs are packed.
> 
> I'm also using an SSD but I still see about 10 tags per second being
> deleted with the current code (and packed-refs).  I see that I'm
> CPU-bound, so I guess most of the time is spent searching through
> .git/packed-refs.  Probably it will run faster as it progresses. I
> guess the 18,000 branches in my repo keep me on the wrong end of O(N).

Right, deleting individually from packed-refs is inherently quadratic,
because each deletion has to rewrite the entire file. So if you delete
all (or the majority of them), that's O(n^2) individual entry writes.

The loose case is just touching the filesystem for each entry (and the
refs code is smart enough not to bother rewriting packed-refs if the
entry isn't present there). That _can_ be slow if you have a lot of
entries in the same directory (because some filesystems are particularly
bad at this).

So the actual backing storage speed isn't really that important. All the
time goes to copying the same packed-refs entries over and over, whether
they hit the disk or not.

Your solution (using a single transaction) is definitely the right one
(and probably should apply to "branch -d", too). That's what we did long
ago for update-ref, and I think nobody ever really noticed for the
porcelain commands because they don't tend to be used for such bulk
changes.

> But it should have occurred to me while I was in the code that there
> is a different path for unpacked refs which could explain my previous
> speeds.  I didn't think I had any unpacked refs, though, since every
> time I look in .git/refs for what I want, I find it relatively empty.
> I see 'git pack-refs --help' says that new refs should show up loose,
> but I can't say that has happened for me.  Maybe a new clone uses
> packed-refs for *everything* and only newly fetched things are loose.
> Is that it?  I guess since I seldom fetch tags after the first clone,
> it makes sense they would all be packed.

Right, a fresh clone always writes all of its entries as packed refs.
It used to be done by hand, but it happens in a special "initial
transaction" method these days, since 58f233ce1e
(initial_ref_transaction_commit(): function for initial ref creation,
2015-06-22).

> > In constrast, it appears that `git update-ref --stdin` is fast
> > regardless of whether the refs are packed, e.g.
> >    git tag -l feature/* | sed -e 's%^%delete refs/tags/%' | git
> > update-ref --stdin
> > finishes quickly (order of a few seconds).
> 
> Nice!  That trick is going in my wiki for devs to use on their VMs.
> Thanks for that.

Please do encourage people to use for-each-ref instead of the "tag -l"
porcelain, as the latter is subject to change. My usual bulk deletion
command is:

  git for-each-ref --format='delete %(refname)' refs/tags/feature/ |
  git update-ref --stdin

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] t7004: check existence of correct tag
@ 2019-11-13 19:39 Martin Ågren
  2019-11-14  6:35 ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Ågren @ 2019-11-13 19:39 UTC (permalink / raw)
  To: git

We try to delete the non-existing tag "anothertag", but for the
verifications, we check that the tag "myhead" doesn't exist. "myhead"
isn't used in this test except for this checking. Comparing to the test
two tests earlier, it looks like a copy-paste mistake.

Perhaps it's overkill to check that `git tag -d` didn't decide to
*create* a tag. But since we're trying to be this careful, let's
actually check the correct tag. While we're doing this, let's use a more
descriptive tag name instead -- "nonexistingtag" should be obvious.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
---
 This is a resend of
 https://public-inbox.org/git/20190808125330.3104954-1-martin.agren@gmail.com/

 t/t7004-tag.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index 80eb13d94e..e4cf605907 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -227,10 +227,10 @@ test_expect_success \
 test_expect_success \
 	'trying to delete two tags, existing and not, should fail in the 2nd' '
 	tag_exists mytag &&
-	! tag_exists myhead &&
-	test_must_fail git tag -d mytag anothertag &&
+	! tag_exists nonexistingtag &&
+	test_must_fail git tag -d mytag nonexistingtag &&
 	! tag_exists mytag &&
-	! tag_exists myhead
+	! tag_exists nonexistingtag
 '
 
 test_expect_success 'trying to delete an already deleted tag should fail' \
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] t7004: check existence of correct tag
  2019-11-13 19:39 [PATCH] t7004: check existence of correct tag Martin Ågren
@ 2019-11-14  6:35 ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2019-11-14  6:35 UTC (permalink / raw)
  To: Martin Ågren; +Cc: git

On Wed, Nov 13, 2019 at 08:39:02PM +0100, Martin Ågren wrote:

> We try to delete the non-existing tag "anothertag", but for the
> verifications, we check that the tag "myhead" doesn't exist. "myhead"
> isn't used in this test except for this checking. Comparing to the test
> two tests earlier, it looks like a copy-paste mistake.

Thanks, the description and patch look good to me.

> Perhaps it's overkill to check that `git tag -d` didn't decide to
> *create* a tag. But since we're trying to be this careful, let's
> actually check the correct tag. While we're doing this, let's use a more
> descriptive tag name instead -- "nonexistingtag" should be obvious.

Yes, that would be a spectacular bug indeed. :)

I think it's fine to keep it in, though, as it communicates the
expectations pretty clearly.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-11-14  6:35 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-08  3:59 [PATCH 1/1] delete multiple tags in a single transaction Phil Hord
2019-08-08 12:47 ` Martin Ågren
2019-08-08 12:53   ` [PATCH] t7004: check existence of correct tag Martin Ågren
2019-08-08 18:15 ` [PATCH 1/1] delete multiple tags in a single transaction Elijah Newren
2019-08-08 23:43   ` Phil Hord
2019-08-09  3:05     ` Jeff King
2019-08-08 19:39 ` Junio C Hamano
2019-08-08 23:58   ` Phil Hord
  -- strict thread matches above, loose matches on Subject: below --
2019-11-13 19:39 [PATCH] t7004: check existence of correct tag Martin Ågren
2019-11-14  6:35 ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).