From: Phil Hord <phil.hord@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Git <git@vger.kernel.org>
Subject: Re: [PATCH 1/1] delete multiple tags in a single transaction
Date: Thu, 8 Aug 2019 16:58:30 -0700 [thread overview]
Message-ID: <CABURp0qHmuNQD4qxL8A5fCJaRsNZfZ51d3e2N3nD-x0jCMhnBw@mail.gmail.com> (raw)
In-Reply-To: <xmqq4l2rfnvl.fsf@gitster-ct.c.googlers.com>
On Thu, Aug 8, 2019 at 12:39 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Phil Hord <phil.hord@gmail.com> writes:
>
> > From: Phil Hord <phil.hord@gmail.com>
> >
> > 'git tag -d' accepts one or more tag refs to delete, but each deletion
> > is done by calling `delete_ref` on each argv. This is painfully slow
> > when removing from packed refs. Use delete_refs instead so all the
> > removals can be done inside a single transaction with a single write.
> >
> > I have a repo with 24,000 tags, most of which are not useful to any
> > developers. Having this many refs slows down many operations that
> > would otherwise be very fast. Removing these tags when they've been
> > accidentally fetched again takes about 30 minutes using delete_ref.
> >
> > git tag -l feature/* | xargs git tag -d
> >
> > Removing the same tags using delete_refs takes less than 5 seconds.
>
> Makes sense. As mentioned elsewhere in the thread already,
> a batched update-ref would open the packed-refs ony once because
> everything is done in a single transaction, so presumably a pipeline
> like this
>
> git tag -l feature/* |
> sed -e 's|^|delete refs/tags/|' |
> git update-ref --stdin
>
> may work well, and "git tag -d" that gets these refs on the command
> line should be capable of doing the same.
>
> > -static int delete_tag(const char *name, const char *ref,
> > - const struct object_id *oid, const void *cb_data)
> > +struct tag_args {
> > + char *oid_abbrev;
> > + char *refname;
> > +};
> > +
> > +static int make_string_list(const char *name, const char *ref,
> > + const struct object_id *oid, void *cb_data)
>
> Please think about a few more minutes before naming a function like
> this, and make it a habit for your future patches.
>
> We can see that the callback is used to insert more strings into a
> string list, but the type (i.e. string_list) used to represent the
> set is not all that important. What is more important is why you
> are building that set for, and saying what is in the set (as opposed
> to saying that the container happens to be a string_list) would be a
> good first step.
>
> I presume that you are enumerating the tags to be deleted, together
> with the data necessary for you to report the deletion of the tags?
Hm. collect_tags? collect_tags_to_delete?
It's true I didn't put enought thought into that. I was experimenting
a bit here and was surprised how little code I ended up needing.
> > {
> > - if (delete_ref(NULL, ref, oid, 0))
> > - return 1;
> > - printf(_("Deleted tag '%s' (was %s)\n"), name,
> > - find_unique_abbrev(oid, DEFAULT_ABBREV));
> > + struct string_list *ref_list = cb_data;
> > + struct tag_args *info = xmalloc(sizeof(struct tag_args));
> > +
> > + string_list_append(ref_list, ref);
> > +
> > + info->oid_abbrev = xstrdup(find_unique_abbrev(oid, DEFAULT_ABBREV));
> > + info->refname = xstrdup(name);
> > + ref_list->items[ref_list->nr - 1].util = info;
> > return 0;
> > }
> >
> > +static int delete_tags(const char **argv)
> > +{
> > + int result;
> > + struct string_list ref_list = STRING_LIST_INIT_DUP;
> > + struct string_list_item *ref_list_item;
> > +
> > + result = for_each_tag_name(argv, make_string_list, (void *) &ref_list);
> > + if (!result)
> > + result = delete_refs(NULL, &ref_list, REF_NO_DEREF);
> > +
> > + for_each_string_list_item(ref_list_item, &ref_list) {
> > + struct tag_args * info = ref_list_item->util;
> > + if (!result)
> > + printf(_("Deleted tag '%s' (was %s)\n"), info->refname,
> > + info->oid_abbrev);
> > + free(info->oid_abbrev);
> > + free(info->refname);
> > + free(info);
>
> It is not performance critical, but info->refname is computable from
> ref_list_item->string, isn't it?
Oh, I guess it is. It's a fixed offset into the string, after all.
Thanks. I did look for a way to avoid the struct noise. Just not
well.
> I am just wondering if we can do
> this without having to allocate the .util field for each of 20,000
> tags. We still need to remember oid (or oid_abbrev, but if I were
> writing this, I'd record the full oid in .util and make the code
> that prints call find_unique_abbrev() on it), so I guess we cannot
> really leave .util NULL.
My original patch did this (.util = oid). But then I needed a name.
I'll go back to keeping the oid. Much cleaner.
>
> > + }
> > + string_list_clear(&ref_list, 0);
> > + return result;
>
> We used to return the returned value from for_each_tag_name() that
> repeatedly called delete_tag().
>
> Now we return value from delete_refs(). Are our caller(s) OK with
> the values that may come back from that function? Can delete_refs()
> return a value that is not appropriate to be returned from
> cmd_tag(), for example a negative value?
Yes it does. Will fix.
>
> > +}
> > +
> > static int verify_tag(const char *name, const char *ref,
> > - const struct object_id *oid, const void *cb_data)
> > + const struct object_id *oid, void *cb_data)
> > {
> > int flags;
> > const struct ref_format *format = cb_data;
> > @@ -511,7 +543,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
> > if (filter.merge_commit)
> > die(_("--merged and --no-merged options are only allowed in list mode"));
> > if (cmdmode == 'd')
> > - return for_each_tag_name(argv, delete_tag, NULL);
> > + return delete_tags(argv);
>
> Thanks.
prev parent reply other threads:[~2019-08-08 23:58 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-08 3:59 [PATCH 1/1] delete multiple tags in a single transaction Phil Hord
2019-08-08 12:47 ` Martin Ågren
2019-08-08 12:53 ` [PATCH] t7004: check existence of correct tag Martin Ågren
2019-08-08 18:15 ` [PATCH 1/1] delete multiple tags in a single transaction Elijah Newren
2019-08-08 23:43 ` Phil Hord
2019-08-09 3:05 ` Jeff King
2019-08-08 19:39 ` Junio C Hamano
2019-08-08 23:58 ` Phil Hord [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CABURp0qHmuNQD4qxL8A5fCJaRsNZfZ51d3e2N3nD-x0jCMhnBw@mail.gmail.com \
--to=phil.hord@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).