git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Phil Hord <phil.hord@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/1] delete multiple tags in a single transaction
Date: Thu, 08 Aug 2019 12:39:10 -0700
Message-ID: <xmqq4l2rfnvl.fsf@gitster-ct.c.googlers.com> (raw)
In-Reply-To: <20190808035935.30023-1-phil.hord@gmail.com> (Phil Hord's message of "Wed, 7 Aug 2019 20:59:35 -0700")

Phil Hord <phil.hord@gmail.com> writes:

> From: Phil Hord <phil.hord@gmail.com>
>
> 'git tag -d' accepts one or more tag refs to delete, but each deletion
> is done by calling `delete_ref` on each argv. This is painfully slow
> when removing from packed refs. Use delete_refs instead so all the
> removals can be done inside a single transaction with a single write.
>
> I have a repo with 24,000 tags, most of which are not useful to any
> developers. Having this many refs slows down many operations that
> would otherwise be very fast. Removing these tags when they've been
> accidentally fetched again takes about 30 minutes using delete_ref.
>
>     git tag -l feature/* | xargs git tag -d
>
> Removing the same tags using delete_refs takes less than 5 seconds.

Makes sense.  As mentioned elsewhere in the thread already,
a batched update-ref would open the packed-refs ony once because
everything is done in a single transaction, so presumably a pipeline
like this

	git tag -l feature/* | 
	sed -e 's|^|delete refs/tags/|' |
	git update-ref --stdin

may work well, and "git tag -d" that gets these refs on the command
line should be capable of doing the same.

> -static int delete_tag(const char *name, const char *ref,
> -		      const struct object_id *oid, const void *cb_data)
> +struct tag_args {
> +	char *oid_abbrev;
> +	char *refname;
> +};
> +
> +static int make_string_list(const char *name, const char *ref,
> +			    const struct object_id *oid, void *cb_data)

Please think about a few more minutes before naming a function like
this, and make it a habit for your future patches.

We can see that the callback is used to insert more strings into a
string list, but the type (i.e. string_list) used to represent the
set is not all that important.  What is more important is why you
are building that set for, and saying what is in the set (as opposed
to saying that the container happens to be a string_list) would be a
good first step.

I presume that you are enumerating the tags to be deleted, together
with the data necessary for you to report the deletion of the tags?

>  {
> -	if (delete_ref(NULL, ref, oid, 0))
> -		return 1;
> -	printf(_("Deleted tag '%s' (was %s)\n"), name,
> -	       find_unique_abbrev(oid, DEFAULT_ABBREV));
> +	struct string_list *ref_list = cb_data;
> +	struct tag_args *info = xmalloc(sizeof(struct tag_args));
> +
> +	string_list_append(ref_list, ref);
> +
> +	info->oid_abbrev = xstrdup(find_unique_abbrev(oid, DEFAULT_ABBREV));
> +	info->refname = xstrdup(name);
> +	ref_list->items[ref_list->nr - 1].util = info;
>  	return 0;
>  }
>  
> +static int delete_tags(const char **argv)
> +{
> +	int result;
> +	struct string_list ref_list = STRING_LIST_INIT_DUP;
> +	struct string_list_item *ref_list_item;
> +
> +	result = for_each_tag_name(argv, make_string_list, (void *) &ref_list);
> +	if (!result)
> +		result = delete_refs(NULL, &ref_list, REF_NO_DEREF);
> +
> +	for_each_string_list_item(ref_list_item, &ref_list) {
> +		struct tag_args * info = ref_list_item->util;
> +		if (!result)
> +			printf(_("Deleted tag '%s' (was %s)\n"), info->refname,
> +				info->oid_abbrev);
> +		free(info->oid_abbrev);
> +		free(info->refname);
> +		free(info);

It is not performance critical, but info->refname is computable from
ref_list_item->string, isn't it?  I am just wondering if we can do
this without having to allocate the .util field for each of 20,000
tags.  We still need to remember oid (or oid_abbrev, but if I were
writing this, I'd record the full oid in .util and make the code
that prints call find_unique_abbrev() on it), so I guess we cannot
really leave .util NULL.

> +	}
> +	string_list_clear(&ref_list, 0);
> +	return result;

We used to return the returned value from for_each_tag_name() that
repeatedly called delete_tag().  

Now we return value from delete_refs().  Are our caller(s) OK with
the values that may come back from that function?  Can delete_refs()
return a value that is not appropriate to be returned from
cmd_tag(), for example a negative value?

> +}
> +
>  static int verify_tag(const char *name, const char *ref,
> -		      const struct object_id *oid, const void *cb_data)
> +		      const struct object_id *oid, void *cb_data)
>  {
>  	int flags;
>  	const struct ref_format *format = cb_data;
> @@ -511,7 +543,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
>  	if (filter.merge_commit)
>  		die(_("--merged and --no-merged options are only allowed in list mode"));
>  	if (cmdmode == 'd')
> -		return for_each_tag_name(argv, delete_tag, NULL);
> +		return delete_tags(argv);

Thanks.

  parent reply	other threads:[~2019-08-08 19:39 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-08  3:59 Phil Hord
2019-08-08 12:47 ` Martin Ågren
2019-08-08 12:53   ` [PATCH] t7004: check existence of correct tag Martin Ågren
2019-08-08 18:15 ` [PATCH 1/1] delete multiple tags in a single transaction Elijah Newren
2019-08-08 23:43   ` Phil Hord
2019-08-09  3:05     ` Jeff King
2019-08-08 19:39 ` Junio C Hamano [this message]
2019-08-08 23:58   ` Phil Hord

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq4l2rfnvl.fsf@gitster-ct.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=phil.hord@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for the project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git