From: Elijah Newren <newren@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 0/5] Add struct strmap and associated utility functions
Date: Fri, 21 Aug 2020 15:28:07 -0700
Message-ID: <CABPp-BFFOKcBsPEu+zueCKwPGMh9RgOE8QJonKxkNNE3-Ym7Gw@mail.gmail.com>
In-Reply-To: <CABPp-BGYiphp-93Bf=2z-ZLd-Y=buTA0BCp6zuTJF39n1x3Rfw@mail.gmail.com>

On Fri, Aug 21, 2020 at 2:33 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Fri, Aug 21, 2020 at 1:16 PM Jeff King <peff@peff.net> wrote:
> >
> > On Fri, Aug 21, 2020 at 06:52:24PM +0000, Elijah Newren via GitGitGadget wrote:
> >
> > > Here I introduce a new strmap type, which my new merge backend, merge-ort,
> > > uses heavily. (I also made significant use of it in my changes to
> > > diffcore-rename). This strmap type was based on Peff's proposal from a
> > > couple years ago[1], but has additions that I made as I used it. I also
> > > start the series off with a quick documentation improvement to hashmap.c to
> > > differentiate between hashmap_free() and hashmap_free_entries(), since I
> > > personally had difficulty understanding them and it affects how
> > > strmap_clear()/strmap_free() are written.
> >
> > I like the direction overall (unsurprisingly), but left a bunch of
> > comments. I do think if we're going to do this that it may be worth
> > cleaning up hashmap a bit first, especially around its clear/free
> > semantics, and its ability to lazy-allocate the table.
> >
> > I'm happy to work on that, but don't want to step on your toes.
>
> I have patches which introduce hashmap_clear() and
> hashmap_clear_entries() to hashmap.[ch], which allowed me to simplify
> strmap_clear(); instead of needing to call both
> hashmap_free[_entries]() && strmap_init(), I could just call
> hashmap_clear[_entries](). Doing that surprised me with a significant
> performance impact (in a good direction), at which point I started
> adding mem-pool integration into hashmap for storing the entries that
> hashmap.c allocates and got further good speedups.
>
> I thought those were better explained when I got to the performance
> stuff, so I had held off on those patches. I could pull them out and
> submit them first.
>
> However, there's an important difference here between what I've done
> and what you've suggested for hashmap: my method did not deallocate
> hashmap->table in hashmap_clear() and then use lazy initialization.
> In fact, I think not deallocating the table was part of the charm --
> the table had already naturally grown to the right size, and because
> the repository has approximately the same number of paths in various
> commits, this provided me a way of getting a table preallocated to a
> reasonable size for all merges after the first (and there are multiple
> merges either when recursiveness is needed due to multiple merge
> bases, OR when rebasing or cherry-picking a sequence of commits).
> This prevented, as hashmap.h puts it, "expensive resizing".
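
To make the table-reuse point concrete, here is a rough, self-contained
sketch. It is not the code in hashmap.c: toy_map, toy_put(), toy_clear(),
toy_release() and toy_fill() are made-up names, and the growth policy
(start at 8 buckets, double at load factor 1) is an assumption chosen only
for illustration. The point it tries to show is that a clear which keeps
the bucket array lets the next fill skip all of the resizing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types; not the structures in git's hashmap.[ch]. */
struct toy_entry {
	struct toy_entry *next;
	char *key;
	int value;
};

struct toy_map {
	struct toy_entry **table;	/* bucket array, lazily allocated */
	size_t tablesize;		/* number of buckets */
	size_t nr;			/* number of stored entries */
};

static size_t toy_hash(const char *s)
{
	size_t h = 5381;	/* djb2-style string hash */
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Double the bucket array and rehash every entry: the costly step. */
static void toy_grow(struct toy_map *m)
{
	size_t newsize = m->tablesize ? m->tablesize * 2 : 8;
	struct toy_entry **t = calloc(newsize, sizeof(*t));
	assert(t);
	for (size_t i = 0; i < m->tablesize; i++) {
		struct toy_entry *e = m->table[i];
		while (e) {
			struct toy_entry *next = e->next;
			size_t b = toy_hash(e->key) % newsize;
			e->next = t[b];
			t[b] = e;
			e = next;
		}
	}
	free(m->table);
	m->table = t;
	m->tablesize = newsize;
}

static int *toy_get(struct toy_map *m, const char *key)
{
	if (!m->tablesize)
		return NULL;
	for (struct toy_entry *e = m->table[toy_hash(key) % m->tablesize]; e; e = e->next)
		if (!strcmp(e->key, key))
			return &e->value;
	return NULL;
}

static void toy_put(struct toy_map *m, const char *key, int value)
{
	int *existing = toy_get(m, key);
	size_t len = strlen(key) + 1, b;
	struct toy_entry *e;

	if (existing) {
		*existing = value;
		return;
	}
	if (m->nr >= m->tablesize)	/* also covers the empty, unallocated map */
		toy_grow(m);
	e = malloc(sizeof(*e));
	assert(e);
	e->key = malloc(len);
	assert(e->key);
	memcpy(e->key, key, len);	/* the map owns a copy of the key */
	e->value = value;
	b = toy_hash(key) % m->tablesize;
	e->next = m->table[b];
	m->table[b] = e;
	m->nr++;
}

/* Clear for reuse: entries are freed, the grown bucket array is kept. */
static void toy_clear(struct toy_map *m)
{
	for (size_t i = 0; i < m->tablesize; i++) {
		while (m->table[i]) {
			struct toy_entry *next = m->table[i]->next;
			free(m->table[i]->key);
			free(m->table[i]);
			m->table[i] = next;
		}
	}
	m->nr = 0;
}

/* Full teardown: the bucket array goes too, so a later put starts over at 8. */
static void toy_release(struct toy_map *m)
{
	toy_clear(m);
	free(m->table);
	m->table = NULL;
	m->tablesize = 0;
}

/* Insert n synthetic "path/<i>" keys, standing in for one merge's paths. */
static void toy_fill(struct toy_map *m, int n)
{
	char key[32];
	for (int i = 0; i < n; i++) {
		snprintf(key, sizeof(key), "path/%d", i);
		toy_put(m, key, i);
	}
}
```

In this toy, toy_fill(&m, 100) on a zero-initialized map grows the table
five times (8, 16, 32, 64, 128 buckets). After toy_clear(&m), a second
toy_fill(&m, 100) never calls toy_grow(), whereas after toy_release(&m) it
would repeat all five resizes.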
>
> So, once again, my performance ideas might be clashing with some of
> your desires for the API. Any clever ideas for resolving that?
>
> Also, since you want to see hashmap cleanup first, should I submit
> just the hashmap_clear[_entries]() stuff, or should I also submit the
> API additions to allow mem-pool integration in hashmap (it's pretty
> small and self-contained, but it'll be a while before I submit the
> patches that use it...)?
Never mind, I misremembered. The mem-pool integration was added
specifically to strmap, not to hashmap, because strmap_put() does the
allocation of the str_entry. So I'll just pull out the
hashmap_clear[_entries]() stuff and send it up.
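
For anyone unfamiliar with why pool allocation fits where entries are
allocated, here is a rough sketch of the general idea. It is only an
illustration under stated assumptions: toy_pool, toy_pool_alloc(),
toy_entry_new() and toy_pool_discard() are made-up names rather than the
mem-pool.[ch] or strmap API, the 4096-byte block size is arbitrary, and
storing the key inline in the entry is a choice made for the sketch. It
needs C11 for max_align_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4096	/* arbitrary block size for the sketch */

/* Hypothetical bump allocator; not git's mem-pool.[ch]. */
struct toy_block {
	struct toy_block *next;
	size_t used, cap;	/* bytes handed out / bytes available */
	max_align_t data[];	/* payload, aligned for any object type */
};

struct toy_pool {
	struct toy_block *blocks;	/* newest block first */
	size_t nr_allocs;		/* how many times malloc() was called */
};

static void *toy_pool_alloc(struct toy_pool *p, size_t len)
{
	struct toy_block *b = p->blocks;
	void *ret;

	/* Round up so the next allocation from this block stays aligned. */
	len = (len + sizeof(max_align_t) - 1) / sizeof(max_align_t)
		* sizeof(max_align_t);
	if (!b || b->cap - b->used < len) {
		size_t cap = len > TOY_BLOCK_SIZE ? len : TOY_BLOCK_SIZE;
		b = malloc(sizeof(*b) + cap);
		assert(b);
		b->next = p->blocks;
		b->used = 0;
		b->cap = cap;
		p->blocks = b;
		p->nr_allocs++;
	}
	ret = (char *)b->data + b->used;
	b->used += len;
	return ret;
}

/* Hypothetical map entry with the key stored inline after the value. */
struct toy_str_entry {
	void *value;
	char key[];
};

/* What a put function would do when it needs a fresh entry. */
static struct toy_str_entry *toy_entry_new(struct toy_pool *p,
					   const char *key, void *value)
{
	size_t len = strlen(key) + 1;
	struct toy_str_entry *e = toy_pool_alloc(p, sizeof(*e) + len);

	e->value = value;
	memcpy(e->key, key, len);
	return e;
}

/* Free every entry at once; there is no per-entry free(). */
static void toy_pool_discard(struct toy_pool *p)
{
	while (p->blocks) {
		struct toy_block *next = p->blocks->next;
		free(p->blocks);
		p->blocks = next;
	}
}
```

The trade-off in this sketch is that individual entries cannot be returned
to the allocator; everything is released together by toy_pool_discard().
In exchange, many small entries share a handful of malloc() calls.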
>
> > I also wonder if you looked at the khash stuff at all. Especially for
> > storing integers, it makes things much more natural. You'd do something
> > like:
> >
> >   /* you might even be able to just write !strcmp in the macro below */
> >   static inline int streq(const char *a, const char *b)
> >   {
> >           return !strcmp(a, b);
> >   }
> >
> >   KHASH_INIT(strint_map, char *, int, 1, strhash, streq);
> >
> > and then you'd probably want a "put" wrapper that makes a copy of the
> > string. khash has its own charming awkwardness, but I'm just curious if you
> > looked at it and found it more awkward than hashmap.c, or if you just
> > didn't look at it.
>
> I did look at it, but only briefly. I had a further investigation on
> my TODO list for months, along with several other improvement ideas.
> But it seemed like my TODO list was really long, and my new merge
> backend hasn't benefited anyone yet. At some point, I decided to punt
> on it and other ideas and start cleaning up my code and submitting. I
> believe merge-ort is more accurate than merge-recursive (it fixes
> several test_expect_failures) and is a lot faster as well for the
> cases I'm looking at. So, for now, I've pulled it off my radar.
>
> But I'd be really happy if someone else wanted to jump in and try
> switching out hashmap for khash in the strmap API and see if it helps
> merge-ort performance. :-)