From: Jeff King <peff@peff.net>
To: Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Elijah Newren <newren@gmail.com>
Subject: Re: [PATCH 2/5] strmap: new utility functions
Date: Fri, 21 Aug 2020 15:48:57 -0400 [thread overview]
Message-ID: <20200821194857.GD1165@coredump.intra.peff.net> (raw)
In-Reply-To: <a86fd5fdcc47baf85046eb07257f4db9f9498084.1598035949.git.gitgitgadget@gmail.com>
On Fri, Aug 21, 2020 at 06:52:26PM +0000, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
>
> Add strmap as a new struct and associated utility functions,
> specifically for hashmaps that map strings to some value. The API is
> taken directly from Peff's proposal at
> https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/
Uh oh. You are encouraging me in the belief that I can send half-baked
ideas to the list and somebody will come along and implement them for
me. ;)
> Peff only included the header, not the implementation, so it isn't clear what
> the structure was he was going to use for the hash entries. Instead of having
> my str_entry struct have three subfields (the hashmap_entry, the string, and
> the void* value), I made it only have two -- the hashmap_entry and a
> string_list_item, for two reasons:
I'd probably have done:
struct strmap_entry {
struct hashmap_entry ent;
void *value;
char key[FLEX_ALLOC];
};
That saves 8 bytes (plus malloc overhead)per item, plus avoids an extra
pointer-chase for each item we consider when looking up.
> 1) a strmap is often the data structure we want where string_list has
> been used in the past. Using the same building block for
> individual entries in both makes it easier to adopt and reuse
> parts of the string_list API in strmap.
I can see that there might be some value in being able to interchange
the items for code that's expecting a string_list_item. But I have to
wonder if the potential for confusion is worth it. I.e., should that
code really be expecting a raw string pointer (possibly with a separate
void pointer, or even better an actual typed pointer).
I'll keep an eye out as I read the rest of the series for code which
uses this.
> 2) In some cases, after doing lots of other work, I want to be able
> to iterate over the items in my strmap in sorted order. hashmap
> obviously doesn't support that, but I wanted to be able to export
> the strmap to a string_list easily and then use its functions.
> (Note: I do not need the data structure to both be sorted and have
> efficient lookup at all times. If I did, I might use a B-tree
> instead, as suggested by brian in response to Peff in the thread
> noted above. In my case, most strmaps will never need sorting, but
> in one special case at the very end of a bunch of other work I want
> to iterate over the items in sorted order without doing any more
> lookups afterward.)
Hmm. Likewise, I'll keep an eye open for how this works in practice. I
do suspect that a B-tree might be a better solution here, but
implementing it is non-trivial, and most callers don't care about this
property.
If the interim solution is to just dump it to a string_list and sort
that, that's really not that bad, assuming it just happens once after
we've added everything. I'm not sure there's that big a benefit to using
string_list_item internally, since presumably that conversion needs to
write a whole new array of string_list_items anyway.
> Also, I removed the STRMAP_INIT macro, since it cannot be used to
> correctly initialize a strmap; the underlying hashmap needs a call to
> hashmap_init() to allocate the hash table first.
Since access to the underlying hashmap happens through strmap functions,
they could lazily initialize it. That's how oidmap works.
> diff --git a/strmap.c b/strmap.c
> new file mode 100644
> index 0000000000..1c9fdb3b1e
> --- /dev/null
> +++ b/strmap.c
> @@ -0,0 +1,81 @@
> +#include "git-compat-util.h"
> +#include "strmap.h"
> +
> +static int cmp_str_entry(const void *hashmap_cmp_fn_data,
> + const struct hashmap_entry *entry1,
> + const struct hashmap_entry *entry2,
> + const void *keydata)
> +{
> + const struct str_entry *e1, *e2;
> +
> + e1 = container_of(entry1, const struct str_entry, ent);
> + e2 = container_of(entry2, const struct str_entry, ent);
> + return strcmp(e1->item.string, e2->item.string);
> +}
If you do go the FLEX_ALLOC route, obviously lookups won't want to
allocate a str_entry struct for the lookup key. You'd use keydata there
(and prefer it over looking at entry2 at all). See remotes_hash_cmp()
for an example.
> +static struct str_entry *find_str_entry(struct strmap *map,
> + const char *str)
> +{
> + struct str_entry entry;
> + hashmap_entry_init(&entry.ent, strhash(str));
> + entry.item.string = (char *)str;
> + return hashmap_get_entry(&map->map, &entry, ent, NULL);
> +}
Casting away constness here is awkward. It could likewise benefit from
using keydata, so you wouldn't need to create a temporary
string_list_item (which is where the non-constness comes from).
> +void strmap_clear(struct strmap *map, int free_util)
> +{
> + struct hashmap_iter iter;
> + struct str_entry *e;
> +
> + if (!map)
> + return;
In a lazy-init world, this becomes:
if (!map || !map->map.table)
Of course it would be better still if the hashmap code learned to do the
lazy-init stuff itself.
> + hashmap_for_each_entry(&map->map, &iter, e, ent /* member name */) {
> + free(e->item.string);
> + if (free_util)
> + free(e->item.util);
> + }
> + hashmap_free_entries(&map->map, struct str_entry, ent);
With a flex-alloc struct, you can avoid the extra string free. But I
guess you still wouldn't avoid the loop if you want to support
free_entries().
I wonder if it would make the API simpler if the struct knew whether it
owned the void pointer values or not. Then you'd do:
struct strmap foo = { .free_values = 1 };
...
strmap_put(&foo, "key", value);
...
strmap_clear(&foo);
and wouldn't have to remember to do the right thing at clear-time. It is
a little less flexible (e.g., if you transfer ownership after a certain
point in the code), but I wonder if any callers actually need that (and
they could always set the free_values flag then).
> +/*
> + * Insert "str" into the map, pointing to "data". A copy of "str" is made, so
> + * it does not need to persist after the this function is called.
> + *
> + * If an entry for "str" already exists, its data pointer is overwritten, and
> + * the original data pointer returned. Otherwise, returns NULL.
> + */
> +void *strmap_put(struct strmap *map, const char *str, void *data)
Minor, but IMHO we should avoid copying the docstrings to the
implementation, since it gives two places that people have to remember
to update if the API changes.
> +void *strmap_put(struct strmap *map, const char *str, void *data)
> +{
> + struct str_entry *entry = find_str_entry(map, str);
> + void *old = NULL;
In a lazy-init world, this is:
if (!map->map.table) {
strmap_init(map);
entry = NULL;
} else {
entry = find_str_entry(map, str);
}
(or just call find_str_entry() in both cases and let it realize there's
nothing to find).
> + if (entry) {
> + old = entry->item.util;
> + entry->item.util = data;
> + } else {
> + entry = xmalloc(sizeof(*entry));
> + hashmap_entry_init(&entry->ent, strhash(str));
> + entry->item.string = strdup(str);
> + entry->item.util = data;
> + hashmap_add(&map->map, &entry->ent);
> + }
And in a flex-alloc world, this second block is:
FLEX_ALLOC_STR(entry, key, str);
hashmap_entry_init(&entry->ent, strhash(str));
entry->value = data;
hashmap_add(&map->map, &entry->ent);
> +void *strmap_get(struct strmap *map, const char *str)
> +{
> + struct str_entry *entry = find_str_entry(map, str);
> + return entry ? entry->item.util : NULL;
> +}
In a lazy world, this is:
if (!map->map.table)
return NULL;
> +int strmap_contains(struct strmap *map, const char *str)
> +{
> + return find_str_entry(map, str) != NULL;
> +}
And likewise:
if (!map->map.table)
return NULL;
It might actually be easier to stick that in find_str_entry().
The rest of it all looked good to me.
-Peff
next prev parent reply other threads:[~2020-08-21 19:49 UTC|newest]
Thread overview: 144+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-21 18:52 [PATCH 0/5] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
2020-08-21 18:52 ` [PATCH 1/5] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-08-21 19:22 ` Jeff King
2020-08-21 18:52 ` [PATCH 2/5] strmap: new utility functions Elijah Newren via GitGitGadget
2020-08-21 19:48 ` Jeff King [this message]
2020-08-21 18:52 ` [PATCH 3/5] strmap: add more " Elijah Newren via GitGitGadget
2020-08-21 19:58 ` Jeff King
2020-08-21 18:52 ` [PATCH 4/5] strmap: add strdup_strings option Elijah Newren via GitGitGadget
2020-08-21 20:01 ` Jeff King
2020-08-21 20:41 ` Elijah Newren
2020-08-21 21:03 ` Jeff King
2020-08-21 22:25 ` Elijah Newren
2020-08-28 7:08 ` Jeff King
2020-08-28 17:20 ` Elijah Newren
2020-08-21 18:52 ` [PATCH 5/5] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-08-21 20:10 ` Jeff King
2020-08-21 20:51 ` Elijah Newren
2020-08-21 21:05 ` Jeff King
2020-08-21 20:16 ` [PATCH 0/5] Add struct strmap and associated utility functions Jeff King
2020-08-21 21:33 ` Elijah Newren
2020-08-21 22:28 ` Elijah Newren
2020-08-28 7:03 ` Jeff King
2020-08-28 15:29 ` Elijah Newren
2020-09-01 9:27 ` Jeff King
2020-10-13 0:40 ` [PATCH v2 00/10] " Elijah Newren via GitGitGadget
2020-10-13 0:40 ` [PATCH v2 01/10] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-10-30 12:50 ` Jeff King
2020-10-30 19:55 ` Elijah Newren
2020-11-03 16:26 ` Jeff King
2020-11-03 16:48 ` Elijah Newren
2020-10-13 0:40 ` [PATCH v2 02/10] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-10-30 12:51 ` Jeff King
2020-10-13 0:40 ` [PATCH v2 03/10] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-10-30 13:35 ` Jeff King
2020-10-30 15:37 ` Elijah Newren
2020-11-03 16:08 ` Jeff King
2020-11-03 16:16 ` Elijah Newren
2020-10-13 0:40 ` [PATCH v2 04/10] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-10-30 13:41 ` Jeff King
2020-10-30 16:03 ` Elijah Newren
2020-11-03 16:10 ` Jeff King
2020-10-13 0:40 ` [PATCH v2 05/10] strmap: new utility functions Elijah Newren via GitGitGadget
2020-10-30 14:12 ` Jeff King
2020-10-30 16:26 ` Elijah Newren
2020-10-13 0:40 ` [PATCH v2 06/10] strmap: add more " Elijah Newren via GitGitGadget
2020-10-30 14:23 ` Jeff King
2020-10-30 16:43 ` Elijah Newren
2020-11-03 16:12 ` Jeff King
2020-10-13 0:40 ` [PATCH v2 07/10] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-10-30 14:27 ` Jeff King
2020-10-13 0:40 ` [PATCH v2 08/10] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-10-30 14:39 ` Jeff King
2020-10-30 17:28 ` Elijah Newren
2020-11-03 16:20 ` Jeff King
2020-11-03 16:46 ` Elijah Newren
2020-10-13 0:40 ` [PATCH v2 09/10] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-10-30 14:44 ` Jeff King
2020-10-30 18:02 ` Elijah Newren
2020-10-13 0:40 ` [PATCH v2 10/10] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-10-30 14:56 ` Jeff King
2020-10-30 19:31 ` Elijah Newren
2020-11-03 16:24 ` Jeff King
2020-11-02 18:55 ` [PATCH v3 00/13] Add struct strmap and associated utility functions Elijah Newren via GitGitGadget
2020-11-02 18:55 ` [PATCH v3 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-02 18:55 ` [PATCH v3 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-02 18:55 ` [PATCH v3 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-02 18:55 ` [PATCH v3 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-02 18:55 ` [PATCH v3 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-02 18:55 ` [PATCH v3 06/13] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-02 18:55 ` [PATCH v3 07/13] strmap: add more " Elijah Newren via GitGitGadget
2020-11-04 20:13 ` Jeff King
2020-11-04 20:24 ` Elijah Newren
2020-11-02 18:55 ` [PATCH v3 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-02 18:55 ` [PATCH v3 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-04 20:21 ` Jeff King
2020-11-02 18:55 ` [PATCH v3 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-04 20:31 ` Jeff King
2020-11-02 18:55 ` [PATCH v3 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-02 18:55 ` [PATCH v3 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-04 20:43 ` Jeff King
2020-11-02 18:55 ` [PATCH v3 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-04 20:48 ` Jeff King
2020-11-04 20:52 ` [PATCH v3 00/13] Add struct strmap and associated utility functions Jeff King
2020-11-04 22:20 ` Elijah Newren
2020-11-05 0:22 ` [PATCH v4 " Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 01/13] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 02/13] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 03/13] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 04/13] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 05/13] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 06/13] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 07/13] strmap: add more " Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 08/13] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 09/13] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 10/13] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 11/13] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 12/13] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-05 0:22 ` [PATCH v4 13/13] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-05 13:29 ` [PATCH v4 00/13] Add struct strmap and associated utility functions Jeff King
2020-11-05 20:25 ` Junio C Hamano
2020-11-05 21:17 ` Jeff King
2020-11-05 21:22 ` Elijah Newren
2020-11-05 22:15 ` Junio C Hamano
2020-11-06 0:24 ` [PATCH v5 00/15] " Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 06/15] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 07/15] strmap: add more " Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-11 17:33 ` Phillip Wood
2020-11-11 18:49 ` Elijah Newren
2020-11-11 19:01 ` Jeff King
2020-11-11 20:34 ` Chris Torek
2020-11-06 0:24 ` [PATCH v5 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-06 0:24 ` [PATCH v5 15/15] shortlog: use strset from strmap.h Elijah Newren via GitGitGadget
2020-11-06 2:00 ` [PATCH v5 00/15] Add struct strmap and associated utility functions Junio C Hamano
2020-11-06 2:42 ` Elijah Newren
2020-11-06 2:48 ` Jeff King
2020-11-06 17:32 ` Junio C Hamano
2020-11-11 20:02 ` [PATCH v6 " Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 01/15] hashmap: add usage documentation explaining hashmap_free[_entries]() Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 02/15] hashmap: adjust spacing to fix argument alignment Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 03/15] hashmap: allow re-use after hashmap_free() Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 04/15] hashmap: introduce a new hashmap_partial_clear() Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 05/15] hashmap: provide deallocation function names Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 06/15] strmap: new utility functions Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 07/15] strmap: add more " Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 08/15] strmap: enable faster clearing and reusing of strmaps Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 09/15] strmap: add functions facilitating use as a string->int map Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 10/15] strmap: split create_entry() out of strmap_put() Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 11/15] strmap: add a strset sub-type Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 12/15] strmap: enable allocations to come from a mem_pool Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 13/15] strmap: take advantage of FLEXPTR_ALLOC_STR when relevant Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 14/15] Use new HASHMAP_INIT macro to simplify hashmap initialization Elijah Newren via GitGitGadget
2020-11-11 20:02 ` [PATCH v6 15/15] shortlog: use strset from strmap.h Elijah Newren via GitGitGadget
2020-11-11 20:07 ` [PATCH v6 00/15] Add struct strmap and associated utility functions Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200821194857.GD1165@coredump.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=newren@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).