From: Phil Hord <phil.hord@gmail.com> To: Junio C Hamano <gitster@pobox.com> Cc: Git <git@vger.kernel.org> Subject: Re: [PATCH 1/1] delete multiple tags in a single transaction Date: Thu, 8 Aug 2019 16:58:30 -0700 Message-ID: <CABURp0qHmuNQD4qxL8A5fCJaRsNZfZ51d3e2N3nD-x0jCMhnBw@mail.gmail.com> (raw) In-Reply-To: <xmqq4l2rfnvl.fsf@gitster-ct.c.googlers.com> On Thu, Aug 8, 2019 at 12:39 PM Junio C Hamano <gitster@pobox.com> wrote: > > Phil Hord <phil.hord@gmail.com> writes: > > > From: Phil Hord <phil.hord@gmail.com> > > > > 'git tag -d' accepts one or more tag refs to delete, but each deletion > > is done by calling `delete_ref` on each argv. This is painfully slow > > when removing from packed refs. Use delete_refs instead so all the > > removals can be done inside a single transaction with a single write. > > > > I have a repo with 24,000 tags, most of which are not useful to any > > developers. Having this many refs slows down many operations that > > would otherwise be very fast. Removing these tags when they've been > > accidentally fetched again takes about 30 minutes using delete_ref. > > > > git tag -l feature/* | xargs git tag -d > > > > Removing the same tags using delete_refs takes less than 5 seconds. > > Makes sense. As mentioned elsewhere in the thread already, > a batched update-ref would open the packed-refs ony once because > everything is done in a single transaction, so presumably a pipeline > like this > > git tag -l feature/* | > sed -e 's|^|delete refs/tags/|' | > git update-ref --stdin > > may work well, and "git tag -d" that gets these refs on the command > line should be capable of doing the same. > > > -static int delete_tag(const char *name, const char *ref, > > - const struct object_id *oid, const void *cb_data) > > +struct tag_args { > > + char *oid_abbrev; > > + char *refname; > > +}; > > + > > +static int make_string_list(const char *name, const char *ref, > > + const struct object_id *oid, void *cb_data) > > Please think about a few more minutes before naming a function like > this, and make it a habit for your future patches. > > We can see that the callback is used to insert more strings into a > string list, but the type (i.e. string_list) used to represent the > set is not all that important. What is more important is why you > are building that set for, and saying what is in the set (as opposed > to saying that the container happens to be a string_list) would be a > good first step. > > I presume that you are enumerating the tags to be deleted, together > with the data necessary for you to report the deletion of the tags? Hm. collect_tags? collect_tags_to_delete? It's true I didn't put enought thought into that. I was experimenting a bit here and was surprised how little code I ended up needing. > > { > > - if (delete_ref(NULL, ref, oid, 0)) > > - return 1; > > - printf(_("Deleted tag '%s' (was %s)\n"), name, > > - find_unique_abbrev(oid, DEFAULT_ABBREV)); > > + struct string_list *ref_list = cb_data; > > + struct tag_args *info = xmalloc(sizeof(struct tag_args)); > > + > > + string_list_append(ref_list, ref); > > + > > + info->oid_abbrev = xstrdup(find_unique_abbrev(oid, DEFAULT_ABBREV)); > > + info->refname = xstrdup(name); > > + ref_list->items[ref_list->nr - 1].util = info; > > return 0; > > } > > > > +static int delete_tags(const char **argv) > > +{ > > + int result; > > + struct string_list ref_list = STRING_LIST_INIT_DUP; > > + struct string_list_item *ref_list_item; > > + > > + result = for_each_tag_name(argv, make_string_list, (void *) &ref_list); > > + if (!result) > > + result = delete_refs(NULL, &ref_list, REF_NO_DEREF); > > + > > + for_each_string_list_item(ref_list_item, &ref_list) { > > + struct tag_args * info = ref_list_item->util; > > + if (!result) > > + printf(_("Deleted tag '%s' (was %s)\n"), info->refname, > > + info->oid_abbrev); > > + free(info->oid_abbrev); > > + free(info->refname); > > + free(info); > > It is not performance critical, but info->refname is computable from > ref_list_item->string, isn't it? Oh, I guess it is. It's a fixed offset into the string, after all. Thanks. I did look for a way to avoid the struct noise. Just not well. > I am just wondering if we can do > this without having to allocate the .util field for each of 20,000 > tags. We still need to remember oid (or oid_abbrev, but if I were > writing this, I'd record the full oid in .util and make the code > that prints call find_unique_abbrev() on it), so I guess we cannot > really leave .util NULL. My original patch did this (.util = oid). But then I needed a name. I'll go back to keeping the oid. Much cleaner. > > > + } > > + string_list_clear(&ref_list, 0); > > + return result; > > We used to return the returned value from for_each_tag_name() that > repeatedly called delete_tag(). > > Now we return value from delete_refs(). Are our caller(s) OK with > the values that may come back from that function? Can delete_refs() > return a value that is not appropriate to be returned from > cmd_tag(), for example a negative value? Yes it does. Will fix. > > > +} > > + > > static int verify_tag(const char *name, const char *ref, > > - const struct object_id *oid, const void *cb_data) > > + const struct object_id *oid, void *cb_data) > > { > > int flags; > > const struct ref_format *format = cb_data; > > @@ -511,7 +543,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix) > > if (filter.merge_commit) > > die(_("--merged and --no-merged options are only allowed in list mode")); > > if (cmdmode == 'd') > > - return for_each_tag_name(argv, delete_tag, NULL); > > + return delete_tags(argv); > > Thanks.
prev parent reply other threads:[~2019-08-08 23:58 UTC|newest] Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-08-08 3:59 Phil Hord 2019-08-08 12:47 ` Martin Ågren 2019-08-08 12:53 ` [PATCH] t7004: check existence of correct tag Martin Ågren 2019-08-08 18:15 ` [PATCH 1/1] delete multiple tags in a single transaction Elijah Newren 2019-08-08 23:43 ` Phil Hord 2019-08-09 3:05 ` Jeff King 2019-08-08 19:39 ` Junio C Hamano 2019-08-08 23:58 ` Phil Hord [this message]
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style List information: http://vger.kernel.org/majordomo-info.html * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CABURp0qHmuNQD4qxL8A5fCJaRsNZfZ51d3e2N3nD-x0jCMhnBw@mail.gmail.com \ --to=phil.hord@gmail.com \ --cc=git@vger.kernel.org \ --cc=gitster@pobox.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
git@vger.kernel.org list mirror (unofficial, one of many) This inbox may be cloned and mirrored by anyone: git clone --mirror https://public-inbox.org/git git clone --mirror http://ou63pmih66umazou.onion/git git clone --mirror http://czquwvybam4bgbro.onion/git git clone --mirror http://hjrcffqmbrq6wope.onion/git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V1 git git/ https://public-inbox.org/git \ git@vger.kernel.org public-inbox-index git Example config snippet for mirrors. Newsgroups are available over NNTP: nntp://news.public-inbox.org/inbox.comp.version-control.git nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git nntp://news.gmane.io/gmane.comp.version-control.git note: .onion URLs require Tor: https://www.torproject.org/ code repositories for the project(s) associated with this inbox: https://80x24.org/mirrors/git.git AGPL code for this site: git clone https://public-inbox.org/public-inbox.git